
updated clinvar

Erik Severinsen requested to merge clinvar-20201203 into dev

Description

Dataset updated: ClinVar

Dataset version: 20201203

  • Generated data has been validated (specify below)
  • Versioning follows previous versioning pattern
  • Data has been uploaded to DigitalOcean

Notes to reviewer

Validation of data

  • Number of entries is reasonable

773782 entries (200907) -> 781281 entries (this version)
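
As a rough sanity check on the counts above, the growth between the two versions can be computed directly. This is a minimal sketch: the 5% tolerance and the names `relative_change` and `MAX_RELATIVE_CHANGE` are assumptions for illustration, not thresholds or helpers used by the anno pipeline.

```python
def relative_change(previous: int, current: int) -> float:
    """Return the fractional change from `previous` to `current`."""
    return (current - previous) / previous


previous_entries = 773782  # version 200907
current_entries = 781281   # version 20201203

added = current_entries - previous_entries
change = relative_change(previous_entries, current_entries)
print(f"{added:+d} entries ({change:+.2%})")

# Hypothetical tolerance: anything outside +/-5% would warrant a closer look.
MAX_RELATIVE_CHANGE = 0.05
is_reasonable = abs(change) <= MAX_RELATIVE_CHANGE
print("within tolerance" if is_reasonable else "needs manual review")
```

A small positive change is what one would expect from a database that mostly accumulates submissions between releases.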

  • File(s) is/are not truncated

$ zcat clinvar_20201203.vcf.gz | grep -v "^#" | cut -f1 | uniq -c
  61080 1
  81893 2
  38849 3
  22713 4
  41387 5
  32085 6
  35808 7
  26189 8
  34795 9
  27608 10
  47534 11
  34359 12
  22908 13
  23529 14
  26845 15
  42715 16
  58750 17
  12575 18
  37391 19
  13285 20
   8747 21
  14635 22
   2811 MT
  32748 X
     42 Y
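
The per-chromosome count above can also be sketched in Python, which makes one kind of truncation explicit: reading a gzip stream that ends in the middle of a compressed block raises `EOFError`. The function name `count_records_per_contig` is hypothetical and not part of the anno scripts. Because a `.tbi` index exists, the file is presumably bgzip-compressed, and a file cut exactly at a block boundary would not be caught this way, so the chromosome list still needs to be inspected by eye.

```python
import gzip
from collections import Counter


def count_records_per_contig(vcf_gz_path: str) -> Counter:
    """Count non-header VCF records per contig (first tab-separated column).

    The whole file is read, so a gzip stream that ends mid-block
    raises EOFError instead of silently returning partial counts.
    """
    counts: Counter = Counter()
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            contig = line.split("\t", 1)[0]
            counts[contig] += 1
    return counts
```

Calling `count_records_per_contig("clinvar_20201203.vcf.gz")` and summing the values should reproduce the total entry count; the per-chromosome figures listed above add up to 781281. Unlike `uniq -c`, a `Counter` totals each contig across the whole file, so the two agree only when the VCF is sorted by contig.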

  • File size(s) is/are reasonable

59M (200907) -> 60M (this version)

Relevant validation data (script output from $ANNO_DATA/SYNC_DATA_LOG):
2020-12-03 22:30:57,987 - sync_data - main:154 - INFO - Generating dataset clinvar
2020-12-03 22:30:57,988 - sync_data - main:213 - INFO - Running: python3 /anno/scripts/clinvar/clinvardb_to_vcf.py -np 20 -o clinvar_20201203.vcf --no-archive --debug
2020-12-03 23:11:33,757 - sync_data - main:213 - INFO - Running: python3 /anno/scripts/clinvar/pubmed_ids_from_clinvarvcf.py clinvar_20201203.vcf.gz > clinvar_20201203_pubmed_ids.txt
2020-12-03 23:12:08,048 - sync_data - main:213 - INFO - Running: mv clinvar_20201203.vcf.gz* /anno/data/variantDBs/clinvar
2020-12-03 23:12:08,157 - sync_data - main:213 - INFO - Running: mv clinvar_20201203_pubmed_ids.txt /anno/data/variantDBs/clinvar
2020-12-03 23:12:08.651684 - Finished hashing clinvar_20201203.vcf.gz 1/3 files (33.33%)
2020-12-03 23:12:08.651821 - Finished hashing clinvar_20201203.vcf.gz.tbi 2/3 files (66.67%)
2020-12-03 23:12:08.651845 - Finished hashing clinvar_20201203_pubmed_ids.txt 3/3 files (100.00%)
2020-12-03 23:12:08,704 - sync_data - update_vcfanno_toml:346 - INFO - Creating new entry for clinvar in /anno/data/vcfanno_config.toml with file(s) variantDBs/clinvar/clinvar_20201203.vcf.gz
2020-12-03 23:12:08,705 - sync_data - update_vcfanno_toml:349 - INFO - Updated /anno/data/vcfanno_config.toml successfully
2020-12-03 23:12:08,705 - sync_data - update_sources:315 - INFO - Updated  /anno/data/sources.json for clinvar (version: 20201203)

2020-12-04 07:17:33,334 - sync_data - main:154 - INFO - Uploading dataset clinvar
2020-12-04 07:17:33,704 - datamanager - upload_package:200 - INFO - Uploading 5 files for clinvar
2020-12-04 07:17:35,605 - datamanager - upload_package:208 - INFO - Finished processing all files for clinvar
Edited by Øyvind Evju
