Resolve "Update clinvar 202405"
Description
Dataset updated/created:
Dataset version:
- Generated data has been validated (specify below)
- Versioning follows the previous versioning pattern
- Data has been uploaded to DigitalOcean
Notes to reviewer
Validation of data
- Number of entries is reasonable
- File(s) is/are not truncated
- File size(s) is/are reasonable
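The "not truncated" item can be checked without bcftools. Below is a minimal Python sketch. It assumes the dataset is a bgzip-compressed VCF, as the `.vcf.gz` name suggests. bgzip ends every file with a fixed 28-byte empty BGZF block, and fully decompressing a cut-off gzip stream fails with an error. The function names `has_bgzf_eof` and `decompresses_fully` are hypothetical helpers, not part of any existing tool.

```python
import gzip
import os

# bgzip terminates every file with this fixed 28-byte empty BGZF block
# (defined in the SAM/BAM specification; htslib warns when it is missing).
BGZF_EOF = bytes.fromhex("1f8b0804 0000000000 ff0600 42430200 1b0003") + bytes(9)


def has_bgzf_eof(path: str) -> bool:
    """Return True if the file ends with the BGZF end-of-file marker block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


def decompresses_fully(path: str, chunk_size: int = 1 << 20) -> bool:
    """Stream through the whole file; a cut-off gzip stream raises EOFError."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError):  # OSError also covers gzip.BadGzipFile
        return False
    return True
```

For example, `has_bgzf_eof("clinvar_20240515.vcf.gz") and decompresses_fully("clinvar_20240515.vcf.gz")` should be True for an intact file. The marker check is instant. The full decompression is slower but also catches damage in the middle of the file.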
bcftools stats output:
# This file was produced by bcftools stats (1.13+htslib-1.13+ds) and can be plotted using plot-vcfstats.
# The command line was: bcftools stats /home/franbe/ssh/zed/storage/franbe/rep/amg/ella-anno/data/variantDBs/clinvar/clinvar_20240515.vcf.gz
#
# Definition of sets:
# ID [2]id [3]tab-separated file names
ID 0 /home/franbe/ssh/zed/storage/franbe/rep/amg/ella-anno/data/variantDBs/clinvar/clinvar_20240515.vcf.gz
# SN, Summary numbers:
# number of records .. number of data rows in the VCF
# number of no-ALTs .. reference-only sites, ALT is either "." or identical to REF
# number of SNPs .. number of rows with a SNP
# number of MNPs .. number of rows with a MNP, such as CC>TT
# number of indels .. number of rows with an indel
# number of others .. number of rows with other type, for example a symbolic allele or
# a complex substitution, such as ACT>TCGA
# number of multiallelic sites .. number of rows with multiple alternate alleles
# number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
#
# Note that rows containing multiple types will be counted multiple times, in each
# counter. For example, a row with a SNP and an indel increments both the SNP and
# the indel counter.
#
# SN [2]id [3]key [4]value
SN 0 number of samples: 0
SN 0 number of records: 2892503
SN 0 number of no-ALTs: 0
SN 0 number of SNPs: 2656894
SN 0 number of MNPs: 9231
SN 0 number of indels: 220330
SN 0 number of others: 6048
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
# TSTV, transitions/transversions:
# TSTV [2]id [3]ts [4]tv [5]ts/tv [6]ts (1st ALT) [7]tv (1st ALT) [8]ts/tv (1st ALT)
TSTV 0 1754677 902217 1.94 1754677 902217 1.94
# SiS, Singleton stats:
# SiS [2]id [3]allele count [4]number of SNPs [5]number of transitions [6]number of transversions [7]number of indels [8]repeat-consistent [9]repeat-inconsistent [10]not applicable
SiS 0 1 2656894 1754677 902217 220330 0 0 220330
# AF, Stats by non-reference allele frequency:
# AF [2]id [3]allele frequency [4]number of SNPs [5]number of transitions [6]number of transversions [7]number of indels [8]repeat-consistent [9]repeat-inconsistent [10]not applicable
AF 0 0.000000 2656894 1754677 902217 220330 0 0 220330
# QUAL, Stats by quality
# QUAL [2]id [3]Quality [4]number of SNPs [5]number of transitions (1st ALT) [6]number of transversions (1st ALT) [7]number of indels
QUAL 0 5000.0 2656894 1754677 902217 220330
# IDD, InDel distribution:
# IDD [2]id [3]length (deletions negative) [4]number of sites [5]number of genotypes [6]mean VAF
IDD 0 -60 2685 0 .
IDD 0 -59 17 0 .
IDD 0 -58 15 0 .
IDD 0 -57 29 0 .
IDD 0 -56 24 0 .
IDD 0 -55 30 0 .
IDD 0 -54 43 0 .
IDD 0 -53 24 0 .
IDD 0 -52 31 0 .
IDD 0 -51 26 0 .
IDD 0 -50 36 0 .
IDD 0 -49 24 0 .
IDD 0 -48 54 0 .
IDD 0 -47 25 0 .
IDD 0 -46 42 0 .
IDD 0 -45 62 0 .
IDD 0 -44 48 0 .
IDD 0 -43 35 0 .
IDD 0 -42 79 0 .
IDD 0 -41 56 0 .
IDD 0 -40 57 0 .
IDD 0 -39 68 0 .
IDD 0 -38 74 0 .
IDD 0 -37 78 0 .
IDD 0 -36 133 0 .
IDD 0 -35 88 0 .
IDD 0 -34 89 0 .
IDD 0 -33 127 0 .
IDD 0 -32 107 0 .
IDD 0 -31 115 0 .
IDD 0 -30 233 0 .
IDD 0 -29 143 0 .
IDD 0 -28 180 0 .
IDD 0 -27 396 0 .
IDD 0 -26 249 0 .
IDD 0 -25 224 0 .
IDD 0 -24 508 0 .
IDD 0 -23 299 0 .
IDD 0 -22 349 0 .
IDD 0 -21 553 0 .
IDD 0 -20 431 0 .
IDD 0 -19 443 0 .
IDD 0 -18 786 0 .
IDD 0 -17 551 0 .
IDD 0 -16 677 0 .
IDD 0 -15 1036 0 .
IDD 0 -14 868 0 .
IDD 0 -13 1035 0 .
IDD 0 -12 1713 0 .
IDD 0 -11 1296 0 .
IDD 0 -10 1600 0 .
IDD 0 -9 2119 0 .
IDD 0 -8 1912 0 .
IDD 0 -7 1981 0 .
IDD 0 -6 3310 0 .
IDD 0 -5 4361 0 .
IDD 0 -4 10885 0 .
IDD 0 -3 15132 0 .
IDD 0 -2 23362 0 .
IDD 0 -1 64919 0 .
IDD 0 1 39814 0 .
IDD 0 2 8362 0 .
IDD 0 3 4648 0 .
IDD 0 4 5533 0 .
IDD 0 5 1915 0 .
IDD 0 6 2312 0 .
IDD 0 7 891 0 .
IDD 0 8 1249 0 .
IDD 0 9 1140 0 .
IDD 0 10 661 0 .
IDD 0 11 392 0 .
IDD 0 12 833 0 .
IDD 0 13 314 0 .
IDD 0 14 372 0 .
IDD 0 15 531 0 .
IDD 0 16 377 0 .
IDD 0 17 278 0 .
IDD 0 18 609 0 .
IDD 0 19 265 0 .
IDD 0 20 305 0 .
IDD 0 21 430 0 .
IDD 0 22 243 0 .
IDD 0 23 174 0 .
IDD 0 24 273 0 .
IDD 0 25 140 0 .
IDD 0 26 134 0 .
IDD 0 27 168 0 .
IDD 0 28 90 0 .
IDD 0 29 91 0 .
IDD 0 30 147 0 .
IDD 0 31 66 0 .
IDD 0 32 72 0 .
IDD 0 33 61 0 .
IDD 0 34 52 0 .
IDD 0 35 45 0 .
IDD 0 36 77 0 .
IDD 0 37 33 0 .
IDD 0 38 26 0 .
IDD 0 39 48 0 .
IDD 0 40 36 0 .
IDD 0 41 31 0 .
IDD 0 42 59 0 .
IDD 0 43 23 0 .
IDD 0 44 36 0 .
IDD 0 45 38 0 .
IDD 0 46 22 0 .
IDD 0 47 17 0 .
IDD 0 48 36 0 .
IDD 0 49 23 0 .
IDD 0 50 14 0 .
IDD 0 51 19 0 .
IDD 0 52 26 0 .
IDD 0 53 9 0 .
IDD 0 54 30 0 .
IDD 0 55 16 0 .
IDD 0 56 11 0 .
IDD 0 57 26 0 .
IDD 0 58 11 0 .
IDD 0 59 16 0 .
IDD 0 60 788 0 .
# ST, Substitution types:
# ST [2]id [3]type [4]count
ST 0 A>C 84428
ST 0 A>G 291245
ST 0 A>T 70065
ST 0 C>A 140561
ST 0 C>G 157477
ST 0 C>T 588506
ST 0 G>A 586389
ST 0 G>C 155649
ST 0 G>T 140798
ST 0 T>A 69008
ST 0 T>C 288537
ST 0 T>G 84231
# DP, Depth distribution
# DP [2]id [3]bin [4]number of genotypes [5]fraction of genotypes (%) [6]number of sites [7]fraction of sites (%)
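The summary sections above can be cross-checked mechanically. For this file the type counters already agree: 2656894 + 9231 + 220330 + 6048 = 2892503 records. The ST transitions (A>G, G>A, C>T, T>C) sum to the 1754677 reported in TSTV. The minimal Python sketch below automates these checks. It assumes the output was saved as plain text, for example with `bcftools stats clinvar_20240515.vcf.gz > stats.txt` (`stats.txt` is a hypothetical filename). The helpers `parse_stats` and `check_stats` are illustrative, not part of bcftools.

```python
import re

SN_LINE = re.compile(r"^SN\s+0\s+(.+?):\s+(\d+)\s*$")
ST_LINE = re.compile(r"^ST\s+0\s+([ACGT]>[ACGT])\s+(\d+)\s*$")
IDD_LINE = re.compile(r"^IDD\s+0\s+(-?\d+)\s+(\d+)\s")

# Purine<->purine and pyrimidine<->pyrimidine changes; all other ST rows are transversions
TRANSITIONS = {"A>G", "G>A", "C>T", "T>C"}
TYPE_KEYS = ("number of no-ALTs", "number of SNPs", "number of MNPs",
             "number of indels", "number of others")


def parse_stats(text: str):
    """Collect the SN, ST and IDD sections for set 0 from bcftools stats output."""
    sn, st, idd = {}, {}, {}
    for line in text.splitlines():
        if m := SN_LINE.match(line):
            sn[m.group(1)] = int(m.group(2))
        elif m := ST_LINE.match(line):
            st[m.group(1)] = int(m.group(2))
        elif m := IDD_LINE.match(line):
            idd[int(m.group(1))] = int(m.group(2))
    return sn, st, idd


def check_stats(text: str) -> dict:
    """Run simple internal-consistency checks on a bcftools stats report."""
    sn, st, idd = parse_stats(text)
    report = {}
    # With no multiallelic rows each record lands in exactly one type counter,
    # so the counters should add up to the record count.
    if sn.get("number of multiallelic sites", 0) == 0:
        report["types_sum_to_records"] = (
            sum(sn.get(k, 0) for k in TYPE_KEYS) == sn["number of records"]
        )
    # Recompute ts/tv from the substitution table; with biallelic sites only,
    # the ST rows should also sum to the SNP count.
    ts = sum(n for sub, n in st.items() if sub in TRANSITIONS)
    tv = sum(n for sub, n in st.items() if sub not in TRANSITIONS)
    report["ts"], report["tv"] = ts, tv
    report["ts_tv"] = round(ts / tv, 2) if tv else None
    report["st_sum_matches_snps"] = ts + tv == sn.get("number of SNPs")
    # Indel length balance; the outermost bins are reported separately because
    # they appear to collect all longer indels (note the jump in counts there).
    if idd:
        edge = max(abs(k) for k in idd)
        report["deletions"] = sum(n for k, n in idd.items() if k < 0)
        report["insertions"] = sum(n for k, n in idd.items() if k > 0)
        report["edge_bin_sites"] = idd.get(-edge, 0) + idd.get(edge, 0)
    return report
```

The ts/tv value recomputed from the ST rows should match the 1.94 in the TSTV line. Comparing `deletions`, `insertions` and `edge_bin_sites` against the previous release's report is one way to judge whether the entry counts are reasonable.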
Edited by Francesco Bettella