Support new vep
Description
Closes issues: LA-1550
Related issues:
Issue: The code has used the gene symbol to fetch the HGNC ID from Ensembl features, because VEP does not annotate RefSeq features with an HGNC ID. In this version of VEP, some features are annotated with an outdated HGNC symbol.
- Use downloaded data sources (from genenames.org) to fetch the HGNC ID from either the NCBI gene ID, the Ensembl gene ID, or the gene symbol
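To illustrate the lookup described above, here is a minimal sketch, not the code in this MR. The column names (`hgnc_id`, `symbol`, `entrez_id`, `ensembl_gene_id`), the inline sample rows, the function names, and the lookup order (NCBI gene ID, then Ensembl gene ID, then symbol) are all assumptions made for the example.

```python
import csv
import io

# Tiny inline stand-in for a table downloaded from genenames.org.
# Column names and rows are illustrative assumptions, not the real file layout.
SAMPLE_TSV = (
    "hgnc_id\tsymbol\tentrez_id\tensembl_gene_id\n"
    "1100\tBRCA1\t672\tENSG00000012048\n"
    "1101\tBRCA2\t675\tENSG00000139618\n"
)


def build_hgnc_indexes(handle):
    """Build three dicts mapping NCBI gene ID, Ensembl gene ID and symbol to HGNC ID."""
    by_ncbi, by_ensembl, by_symbol = {}, {}, {}
    for row in csv.DictReader(handle, delimiter="\t"):
        hgnc_id = int(row["hgnc_id"])
        if row["entrez_id"]:
            by_ncbi[row["entrez_id"]] = hgnc_id
        if row["ensembl_gene_id"]:
            by_ensembl[row["ensembl_gene_id"]] = hgnc_id
        if row["symbol"]:
            by_symbol[row["symbol"]] = hgnc_id
    return by_ncbi, by_ensembl, by_symbol


def lookup_hgnc_id(indexes, ncbi_gene_id=None, ensembl_gene_id=None, symbol=None):
    """Return the HGNC ID for the first identifier that resolves, else None.

    The order (NCBI, Ensembl, symbol) is an assumption; the symbol is tried
    last because it is the identifier most likely to be outdated.
    """
    by_ncbi, by_ensembl, by_symbol = indexes
    if ncbi_gene_id is not None and str(ncbi_gene_id) in by_ncbi:
        return by_ncbi[str(ncbi_gene_id)]
    if ensembl_gene_id in by_ensembl:
        return by_ensembl[ensembl_gene_id]
    return by_symbol.get(symbol)


indexes = build_hgnc_indexes(io.StringIO(SAMPLE_TSV))
print(lookup_hgnc_id(indexes, ncbi_gene_id=672))
```

Trying stable gene IDs before the symbol means an outdated symbol only matters when no other identifier is available.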
Issue: New VEP with added RefSeq GFF files (ella-anno!3 (merged)) may annotate with the same transcript multiple times for the same variant.
- On deposit, prioritize the RefSeq sources per variant (1. latest GFF, 2. interim GFF, 3. VEP default) and discard the lower-priority sources.
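The source prioritization could look roughly like the sketch below. The `source` key, its three label values, and the function name are hypothetical; the actual annotation fields may differ.

```python
# Hypothetical source labels, highest priority first.
SOURCE_PRIORITY = ["refseq_gff_latest", "refseq_gff_interim", "vep_default"]


def keep_highest_priority_source(refseq_transcripts):
    """Keep only the RefSeq transcripts from the best source present for one variant.

    `refseq_transcripts` is a list of dicts with a hypothetical "source" key.
    If no known source is present, the input is returned unchanged.
    """
    present = {t["source"] for t in refseq_transcripts}
    for source in SOURCE_PRIORITY:
        if source in present:
            return [t for t in refseq_transcripts if t["source"] == source]
    return list(refseq_transcripts)


variant_transcripts = [
    {"transcript": "NM_000059.3", "source": "vep_default"},
    {"transcript": "NM_000059.3", "source": "refseq_gff_interim"},
    {"transcript": "NM_000059.4", "source": "refseq_gff_interim"},
]
print(keep_highest_priority_source(variant_transcripts))
```

In this example only the two interim GFF entries survive, so the same transcript is no longer deposited twice for the variant.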
Issue: New VEP may annotate with multiple versions of the same RefSeq transcript.
- Add logic to prioritize which transcript is selected. Choose the transcript that matches the genepanel transcript if available; otherwise, choose the latest available version.
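A minimal sketch of the version selection, assuming transcript names of the form `NM_xxxxx.N` and that a genepanel match means an exact match on name including version. The function names are hypothetical.

```python
def split_version(name):
    """Split 'NM_000059.3' into ('NM_000059', 3); a missing version counts as 0."""
    base, _, version = name.partition(".")
    return base, (int(version) if version.isdigit() else 0)


def select_transcript_versions(annotated, genepanel_transcripts):
    """Pick one version per RefSeq transcript.

    Prefer a version that is in the genepanel; otherwise take the highest
    version number among those annotated.
    """
    by_base = {}
    for name in annotated:
        base, _ = split_version(name)
        by_base.setdefault(base, []).append(name)

    selected = []
    for names in by_base.values():
        in_panel = [n for n in names if n in genepanel_transcripts]
        if in_panel:
            selected.append(in_panel[0])
        else:
            selected.append(max(names, key=lambda n: split_version(n)[1]))
    return selected


annotated = ["NM_000059.3", "NM_000059.4", "NM_007294.3", "NM_007294.4"]
print(select_transcript_versions(annotated, {"NM_000059.3"}))
```

Here `NM_000059.3` is kept because the genepanel names it, while `NM_007294` falls back to its latest annotated version.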
Notes to reviewer
Things to consider:
- Make HGNC ID non-nullable in annotationshadowtranscript
- We currently have ~100k annotations in our production database without an HGNC ID...
- Use downloaded HGNC resources to check genepanels on deposit
- How to ensure that we update HGNC sources periodically?
Manual testing:
Verified that annotation_transcripts_genepanel gives the same output as before on old VEP data. Tested with 6907 exonic (+/-20) variants from NA12878 on Mendel_v01. The only difference is that RefSeq transcripts of the form NM_xxxxx.x_duplxx are no longer chosen when the same transcript exists without the dupl suffix.
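The observed difference can be pictured with a small sketch. It assumes the suffix always starts with `_dupl` and treats the rule as a simple filter, which may be simpler than what the selection logic actually does. The function name is hypothetical.

```python
def drop_redundant_dupl(names):
    """Drop 'X_dupl...' entries when plain 'X' is also present in the list."""
    present = set(names)
    kept = []
    for name in names:
        base, sep, _ = name.partition("_dupl")
        if sep and base in present:
            # The same transcript exists without the dupl suffix; skip this one.
            continue
        kept.append(name)
    return kept


names = ["NM_000059.3", "NM_000059.3_dupl1", "NM_007294.3_dupl2"]
print(drop_redundant_dupl(names))
```

A dupl-suffixed transcript is kept when no unsuffixed counterpart exists, which matches the behaviour noted above.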
Type of change
Application (affects UI or general functionality):
- New feature
- Bug fix
- Improvement
Ops / admin / CI related only (not impacting users):
- New feature
- Bug fix
- Improvement
Tests
General
- Tests have been added that prove my fix is effective or that my feature works
- Related tests have been modified/removed
Hypothesis testing:
- Soak testing has been done
- Distribution between positive / negative cases has been checked
Database
- Includes changes to database schema
- Includes necessary database migrations
Configuration
- Includes changes to configuration
- Includes configuration migration instructions in documentation
Merge checklist
- Self-review of code performed
- Feature review against specification (if applicable)
- Need for documentation has been evaluated and, if necessary, updated
- Code and implementation is reviewed by other core developer (all changes, inc. changes based on initial review)