updated clinvar
Description
Dataset updated: ClinVar
Dataset version: 20201203
- Generated data has been validated (specify below)
- Versioning follows previous versioning pattern
- Data has been uploaded to DigitalOcean
Notes to reviewer
Validation of data
- Number of entries is reasonable
773782 entries (200907) -> 781281 entries (this version)
- File(s) is/are not truncated
$ zcat clinvar_20201203.vcf.gz | grep -v "^#" | cut -f1 | uniq -c
61080 1
81893 2
38849 3
22713 4
41387 5
32085 6
35808 7
26189 8
34795 9
27608 10
47534 11
34359 12
22908 13
23529 14
26845 15
42715 16
58750 17
12575 18
37391 19
13285 20
8747 21
14635 22
2811 MT
32748 X
42 Y
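The same per-chromosome tally can be reproduced without shell tools. The following is a minimal Python sketch, not part of the sync_data scripts; the function name `count_records_per_chrom` is hypothetical. Unlike `uniq -c`, it tallies by chromosome name regardless of record order. Because it reads the gzip stream to the end, a truncated file should raise `EOFError` rather than silently yield a short count.

```python
import gzip
from collections import Counter


def count_records_per_chrom(vcf_gz_path):
    """Count non-header VCF records per chromosome (first tab-separated column).

    Reading the whole stream also exercises the gzip end-of-stream marker:
    Python's gzip module raises EOFError if the file was cut short.
    """
    counts = Counter()
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            counts[line.split("\t", 1)[0]] += 1
    return counts


# Example usage (hypothetical, path taken from the command above):
#   counts = count_records_per_chrom("clinvar_20201203.vcf.gz")
#   print(sum(counts.values()))
```

The per-chromosome counts listed above sum to 781281, which matches the number of entries reported for this version.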
- File size(s) is/are reasonable
59M (200907) -> 60M (this version)
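The "reasonable" checks above compare this version against the previous one. As a rough illustration, the relative change can be computed as below. The helper `relative_change` and the 5% tolerance are hypothetical and were not part of the actual validation.

```python
def relative_change(previous, current):
    """Return the fractional change from previous to current."""
    return (current - previous) / previous


# Entry counts reported above: 773782 (200907) -> 781281 (this version).
entries_change = relative_change(773782, 781281)
# File sizes reported above, in MB (rounded): 59M -> 60M.
size_change = relative_change(59, 60)

print(f"entries: {entries_change:+.2%}")
print(f"size:    {size_change:+.2%}")

# Hypothetical tolerance, chosen only for illustration.
MAX_CHANGE = 0.05
assert abs(entries_change) < MAX_CHANGE
assert abs(size_change) < MAX_CHANGE
```

With the numbers above this gives about +0.97% for entries and about +1.69% for file size, so both grow by a similar small amount.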
Script output from $ANNO_DATA/SYNC_DATA_LOG:
2020-12-03 22:30:57,987 - sync_data - main:154 - INFO - Generating dataset clinvar
2020-12-03 22:30:57,988 - sync_data - main:213 - INFO - Running: python3 /anno/scripts/clinvar/clinvardb_to_vcf.py -np 20 -o clinvar_20201203.vcf --no-archive --debug
2020-12-03 23:11:33,757 - sync_data - main:213 - INFO - Running: python3 /anno/scripts/clinvar/pubmed_ids_from_clinvarvcf.py clinvar_20201203.vcf.gz > clinvar_20201203_pubmed_ids.txt
2020-12-03 23:12:08,048 - sync_data - main:213 - INFO - Running: mv clinvar_20201203.vcf.gz* /anno/data/variantDBs/clinvar
2020-12-03 23:12:08,157 - sync_data - main:213 - INFO - Running: mv clinvar_20201203_pubmed_ids.txt /anno/data/variantDBs/clinvar
2020-12-03 23:12:08.651684 - Finished hashing clinvar_20201203.vcf.gz 1/3 files (33.33%)
2020-12-03 23:12:08.651821 - Finished hashing clinvar_20201203.vcf.gz.tbi 2/3 files (66.67%)
2020-12-03 23:12:08.651845 - Finished hashing clinvar_20201203_pubmed_ids.txt 3/3 files (100.00%)
2020-12-03 23:12:08,704 - sync_data - update_vcfanno_toml:346 - INFO - Creating new entry for clinvar in /anno/data/vcfanno_config.toml with file(s) variantDBs/clinvar/clinvar_20201203.vcf.gz
2020-12-03 23:12:08,705 - sync_data - update_vcfanno_toml:349 - INFO - Updated /anno/data/vcfanno_config.toml successfully
2020-12-03 23:12:08,705 - sync_data - update_sources:315 - INFO - Updated /anno/data/sources.json for clinvar (version: 20201203)
2020-12-04 07:17:33,334 - sync_data - main:154 - INFO - Uploading dataset clinvar
2020-12-04 07:17:33,704 - datamanager - upload_package:200 - INFO - Uploading 5 files for clinvar
2020-12-04 07:17:35,605 - datamanager - upload_package:208 - INFO - Finished processing all files for clinvar
Edited by Øyvind Evju