
Resolve "Update clinvar 202405"

Francesco Bettella requested to merge 102-update-clinvar-202405 into dev

Description

Dataset updated/created: ClinVar

Dataset version: 20240515

  • Generated data has been validated (specify below)
  • Versioning follows previous versioning pattern
  • Data has been uploaded to DigitalOcean

Notes to reviewer

Validation of data

  • Number of entries is reasonable
  • File(s) is/are not truncated
  • File size(s) is/are reasonable
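
The three checks in this list can be scripted. The following is a minimal sketch, assuming the dataset is a gzip-compatible (bgzip) compressed VCF; the function name `check_vcf_gz` is hypothetical and not part of ella-anno. It reads the compressed stream to the end, which fails if the stream is cut off mid-block, and returns the file size and the number of data rows. A file truncated exactly at a compressed block boundary would probably not be caught this way.

```python
import gzip
import os


def check_vcf_gz(path):
    """Return (size_in_bytes, n_records) for a gzip-compressed VCF.

    Reading to the end of the stream raises EOFError (or an OSError
    subclass such as gzip.BadGzipFile) if the file is truncated or corrupt.
    """
    n_records = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Header and meta lines start with '#'; everything else is a record.
            if line.strip() and not line.startswith("#"):
                n_records += 1
    return os.path.getsize(path), n_records
```

The record count returned here should agree with the `number of records` value in the bcftools stats summary, and the size can be compared by eye with the previous release.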
bcftools stats

# This file was produced by bcftools stats (1.13+htslib-1.13+ds) and can be plotted using plot-vcfstats.
# The command line was:	bcftools stats  /home/franbe/ssh/zed/storage/franbe/rep/amg/ella-anno/data/variantDBs/clinvar/clinvar_20240515.vcf.gz
#
# Definition of sets:
# ID	[2]id	[3]tab-separated file names
ID	0	/home/franbe/ssh/zed/storage/franbe/rep/amg/ella-anno/data/variantDBs/clinvar/clinvar_20240515.vcf.gz
# SN, Summary numbers:
#   number of records   .. number of data rows in the VCF
#   number of no-ALTs   .. reference-only sites, ALT is either "." or identical to REF
#   number of SNPs      .. number of rows with a SNP
#   number of MNPs      .. number of rows with a MNP, such as CC>TT
#   number of indels    .. number of rows with an indel
#   number of others    .. number of rows with other type, for example a symbolic allele or
#                          a complex substitution, such as ACT>TCGA
#   number of multiallelic sites     .. number of rows with multiple alternate alleles
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
# 
#   Note that rows containing multiple types will be counted multiple times, in each
#   counter. For example, a row with a SNP and an indel increments both the SNP and
#   the indel counter.
# 
# SN	[2]id	[3]key	[4]value
SN	0	number of samples:	0
SN	0	number of records:	2892503
SN	0	number of no-ALTs:	0
SN	0	number of SNPs:	2656894
SN	0	number of MNPs:	9231
SN	0	number of indels:	220330
SN	0	number of others:	6048
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
# TSTV, transitions/transversions:
# TSTV	[2]id	[3]ts	[4]tv	[5]ts/tv	[6]ts (1st ALT)	[7]tv (1st ALT)	[8]ts/tv (1st ALT)
TSTV	0	1754677	902217	1.94	1754677	902217	1.94
# SiS, Singleton stats:
# SiS	[2]id	[3]allele count	[4]number of SNPs	[5]number of transitions	[6]number of transversions	[7]number of indels	[8]repeat-consistent	[9]repeat-inconsistent	[10]not applicable
SiS	0	1	2656894	1754677	902217	220330	0	0	220330
# AF, Stats by non-reference allele frequency:
# AF	[2]id	[3]allele frequency	[4]number of SNPs	[5]number of transitions	[6]number of transversions	[7]number of indels	[8]repeat-consistent	[9]repeat-inconsistent	[10]not applicable
AF	0	0.000000	2656894	1754677	902217	220330	0	0	220330
# QUAL, Stats by quality
# QUAL	[2]id	[3]Quality	[4]number of SNPs	[5]number of transitions (1st ALT)	[6]number of transversions (1st ALT)	[7]number of indels
QUAL	0	5000.0	2656894	1754677	902217	220330
# IDD, InDel distribution:
# IDD	[2]id	[3]length (deletions negative)	[4]number of sites	[5]number of genotypes	[6]mean VAF
IDD	0	-60	2685	0	.
IDD	0	-59	17	0	.
IDD	0	-58	15	0	.
IDD	0	-57	29	0	.
IDD	0	-56	24	0	.
IDD	0	-55	30	0	.
IDD	0	-54	43	0	.
IDD	0	-53	24	0	.
IDD	0	-52	31	0	.
IDD	0	-51	26	0	.
IDD	0	-50	36	0	.
IDD	0	-49	24	0	.
IDD	0	-48	54	0	.
IDD	0	-47	25	0	.
IDD	0	-46	42	0	.
IDD	0	-45	62	0	.
IDD	0	-44	48	0	.
IDD	0	-43	35	0	.
IDD	0	-42	79	0	.
IDD	0	-41	56	0	.
IDD	0	-40	57	0	.
IDD	0	-39	68	0	.
IDD	0	-38	74	0	.
IDD	0	-37	78	0	.
IDD	0	-36	133	0	.
IDD	0	-35	88	0	.
IDD	0	-34	89	0	.
IDD	0	-33	127	0	.
IDD	0	-32	107	0	.
IDD	0	-31	115	0	.
IDD	0	-30	233	0	.
IDD	0	-29	143	0	.
IDD	0	-28	180	0	.
IDD	0	-27	396	0	.
IDD	0	-26	249	0	.
IDD	0	-25	224	0	.
IDD	0	-24	508	0	.
IDD	0	-23	299	0	.
IDD	0	-22	349	0	.
IDD	0	-21	553	0	.
IDD	0	-20	431	0	.
IDD	0	-19	443	0	.
IDD	0	-18	786	0	.
IDD	0	-17	551	0	.
IDD	0	-16	677	0	.
IDD	0	-15	1036	0	.
IDD	0	-14	868	0	.
IDD	0	-13	1035	0	.
IDD	0	-12	1713	0	.
IDD	0	-11	1296	0	.
IDD	0	-10	1600	0	.
IDD	0	-9	2119	0	.
IDD	0	-8	1912	0	.
IDD	0	-7	1981	0	.
IDD	0	-6	3310	0	.
IDD	0	-5	4361	0	.
IDD	0	-4	10885	0	.
IDD	0	-3	15132	0	.
IDD	0	-2	23362	0	.
IDD	0	-1	64919	0	.
IDD	0	1	39814	0	.
IDD	0	2	8362	0	.
IDD	0	3	4648	0	.
IDD	0	4	5533	0	.
IDD	0	5	1915	0	.
IDD	0	6	2312	0	.
IDD	0	7	891	0	.
IDD	0	8	1249	0	.
IDD	0	9	1140	0	.
IDD	0	10	661	0	.
IDD	0	11	392	0	.
IDD	0	12	833	0	.
IDD	0	13	314	0	.
IDD	0	14	372	0	.
IDD	0	15	531	0	.
IDD	0	16	377	0	.
IDD	0	17	278	0	.
IDD	0	18	609	0	.
IDD	0	19	265	0	.
IDD	0	20	305	0	.
IDD	0	21	430	0	.
IDD	0	22	243	0	.
IDD	0	23	174	0	.
IDD	0	24	273	0	.
IDD	0	25	140	0	.
IDD	0	26	134	0	.
IDD	0	27	168	0	.
IDD	0	28	90	0	.
IDD	0	29	91	0	.
IDD	0	30	147	0	.
IDD	0	31	66	0	.
IDD	0	32	72	0	.
IDD	0	33	61	0	.
IDD	0	34	52	0	.
IDD	0	35	45	0	.
IDD	0	36	77	0	.
IDD	0	37	33	0	.
IDD	0	38	26	0	.
IDD	0	39	48	0	.
IDD	0	40	36	0	.
IDD	0	41	31	0	.
IDD	0	42	59	0	.
IDD	0	43	23	0	.
IDD	0	44	36	0	.
IDD	0	45	38	0	.
IDD	0	46	22	0	.
IDD	0	47	17	0	.
IDD	0	48	36	0	.
IDD	0	49	23	0	.
IDD	0	50	14	0	.
IDD	0	51	19	0	.
IDD	0	52	26	0	.
IDD	0	53	9	0	.
IDD	0	54	30	0	.
IDD	0	55	16	0	.
IDD	0	56	11	0	.
IDD	0	57	26	0	.
IDD	0	58	11	0	.
IDD	0	59	16	0	.
IDD	0	60	788	0	.
# ST, Substitution types:
# ST	[2]id	[3]type	[4]count
ST	0	A>C	84428
ST	0	A>G	291245
ST	0	A>T	70065
ST	0	C>A	140561
ST	0	C>G	157477
ST	0	C>T	588506
ST	0	G>A	586389
ST	0	G>C	155649
ST	0	G>T	140798
ST	0	T>A	69008
ST	0	T>C	288537
ST	0	T>G	84231
# DP, Depth distribution
# DP	[2]id	[3]bin	[4]number of genotypes	[5]fraction of genotypes (%)	[6]number of sites	[7]fraction of sites (%)
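
Several counters in this report can be cross-checked against each other, which is a cheap way to back up the "number of entries is reasonable" item. The following is a sketch only: `parse_stats` and `check_consistency` are hypothetical helper names, and the checks assume a file with no multiallelic sites and one variant type per row, as the summary numbers here suggest. The excerpt reuses values from the report above.

```python
TRANSITIONS = {"A>G", "G>A", "C>T", "T>C"}


def parse_stats(text):
    """Collect the SN, ST and TSTV rows of a bcftools stats report."""
    sn, st, tstv = {}, {}, None
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "SN":
            sn[fields[2].rstrip(":")] = int(fields[3])
        elif fields[0] == "ST":
            st[fields[2]] = int(fields[3])
        elif fields[0] == "TSTV":
            tstv = (int(fields[2]), int(fields[3]))  # (ts, tv)
    return sn, st, tstv


def check_consistency(sn, st, tstv):
    """Return a dict of named boolean checks (assumptions in the lead-in)."""
    ts, tv = tstv
    by_type = sum(sn["number of " + k] for k in ("SNPs", "MNPs", "indels", "others"))
    return {
        "types sum to records": by_type == sn["number of records"],
        "ts + tv equals SNPs": ts + tv == sn["number of SNPs"],
        "ST total equals SNPs": sum(st.values()) == sn["number of SNPs"],
        "ST transitions equal ts": sum(v for k, v in st.items() if k in TRANSITIONS) == ts,
    }


# Excerpt of the report above (tab-separated, as bcftools writes it).
EXCERPT = "\n".join([
    "SN\t0\tnumber of records:\t2892503",
    "SN\t0\tnumber of SNPs:\t2656894",
    "SN\t0\tnumber of MNPs:\t9231",
    "SN\t0\tnumber of indels:\t220330",
    "SN\t0\tnumber of others:\t6048",
    "TSTV\t0\t1754677\t902217\t1.94\t1754677\t902217\t1.94",
    "ST\t0\tA>C\t84428", "ST\t0\tA>G\t291245", "ST\t0\tA>T\t70065",
    "ST\t0\tC>A\t140561", "ST\t0\tC>G\t157477", "ST\t0\tC>T\t588506",
    "ST\t0\tG>A\t586389", "ST\t0\tG>C\t155649", "ST\t0\tG>T\t140798",
    "ST\t0\tT>A\t69008", "ST\t0\tT>C\t288537", "ST\t0\tT>G\t84231",
])

for name, ok in check_consistency(*parse_stats(EXCERPT)).items():
    print(name, "OK" if ok else "MISMATCH")
```

For the numbers in this report all four checks hold: the per-type counts add up to 2892503 records, and both ts + tv and the substitution-type total equal the 2656894 SNPs.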
Edited by Francesco Bettella
