Support new vep

Øyvind Evju requested to merge support-new-vep into dev

Description

Closes issues: LA-1550
Related issues:

Issue: The code has used the gene symbol to fetch the HGNC ID from Ensembl features, since VEP does not annotate RefSeq features with an HGNC ID. In this version of VEP, some features are annotated with an outdated HGNC symbol.

  • Use downloaded data sources (from genenames.org) for fetching HGNC ID from either NCBI gene id, Ensembl gene id or gene symbol
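
A minimal sketch of such a lookup, not the code in this merge request. The table names, the function name `get_hgnc_id`, and the lookup order (NCBI gene id, then Ensembl gene id, then symbol) are assumptions made for illustration; the sample rows stand in for tables that would be built from the file downloaded from genenames.org.

```python
from typing import Dict, Optional

# Hypothetical lookup tables with sample rows for illustration only.
# In practice these would be populated from the genenames.org download.
NCBI_TO_HGNC: Dict[int, int] = {672: 1100, 675: 1101}
ENSEMBL_TO_HGNC: Dict[str, int] = {
    "ENSG00000012048": 1100,
    "ENSG00000139618": 1101,
}
SYMBOL_TO_HGNC: Dict[str, int] = {"BRCA1": 1100, "BRCA2": 1101}


def get_hgnc_id(
    ncbi_gene_id: Optional[int] = None,
    ensembl_gene_id: Optional[str] = None,
    symbol: Optional[str] = None,
) -> Optional[int]:
    """Return the HGNC ID for the first identifier that resolves, else None.

    Assumed order: stable gene ids are tried before the symbol, because
    the symbol is the identifier that may be outdated.
    """
    if ncbi_gene_id is not None and ncbi_gene_id in NCBI_TO_HGNC:
        return NCBI_TO_HGNC[ncbi_gene_id]
    if ensembl_gene_id is not None and ensembl_gene_id in ENSEMBL_TO_HGNC:
        return ENSEMBL_TO_HGNC[ensembl_gene_id]
    if symbol is not None and symbol in SYMBOL_TO_HGNC:
        return SYMBOL_TO_HGNC[symbol]
    return None
```

With this ordering, a feature carrying an outdated symbol still resolves as long as its NCBI or Ensembl gene id is present in the downloaded data.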

Issue: New VEP with added RefSeq GFF files (ella-anno!3 (merged)) may annotate with the same transcript multiple times for the same variant.

  • On deposit, prioritize the RefSeq sources per variant (1. latest GFF, 2. interim GFF, 3. VEP default) and discard the lower-priority sources.
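
The prioritization above could look roughly like the following sketch. The source labels, the `source` key, and the function name are hypothetical and chosen for illustration; the actual annotation structure used on deposit may differ.

```python
from typing import Dict, List

# Hypothetical source labels; a lower number means higher priority.
SOURCE_PRIORITY: Dict[str, int] = {
    "refseq_latest_gff": 1,
    "refseq_interim_gff": 2,
    "vep_default": 3,
}


def keep_highest_priority_source(annotations: List[dict]) -> List[dict]:
    """Keep only the RefSeq annotations from the best source present.

    `annotations` holds the RefSeq transcript annotations for a single
    variant, each with a "source" key (an assumed field name).
    """
    if not annotations:
        return []
    best = min(SOURCE_PRIORITY[a["source"]] for a in annotations)
    # Annotations from every lower-priority source are discarded.
    return [a for a in annotations if SOURCE_PRIORITY[a["source"]] == best]
```

Because the filter runs per variant, a variant that is only annotated by the VEP default source keeps those annotations.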

Issue: New VEP may annotate with multiple versions of the same RefSeq transcript.

  • Add logic to prioritize which transcript is selected: choose the transcript that matches the genepanel transcript if available; otherwise, choose the latest available version.
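
As a rough sketch of that selection rule, assuming transcripts are plain strings of the form `NM_xxxxx.x` and that all candidates are versions of the same RefSeq transcript. The function names are hypothetical, and special forms such as a dupl suffix are not handled here beyond being given the lowest version.

```python
from typing import List, Optional, Tuple


def split_version(transcript: str) -> Tuple[str, int]:
    """Split 'NM_000059.3' into ('NM_000059', 3).

    A missing or non-numeric version is treated as 0 (an assumption).
    """
    name, _, version = transcript.partition(".")
    return name, int(version) if version.isdigit() else 0


def select_transcript(
    candidates: List[str], genepanel_transcripts: List[str]
) -> Optional[str]:
    """Pick one version among candidates of the same RefSeq transcript."""
    if not candidates:
        return None
    # Prefer the exact version listed in the genepanel, if annotated.
    for candidate in candidates:
        if candidate in genepanel_transcripts:
            return candidate
    # Otherwise fall back to the highest version number, compared numerically.
    return max(candidates, key=lambda t: split_version(t)[1])
```

Versions are compared as integers so that `.10` ranks above `.2`, which a plain string comparison would get wrong.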

Notes to reviewer

Things to consider:

  • Make HGNC ID non-nullable in annotationshadowtranscript
    • We have ~100k annotations in our production database today without an HGNC ID...
  • Use downloaded HGNC resources to check genepanels on deposit
  • How to ensure that we update HGNC sources periodically?

Manual testing: Verified that annotation_transcripts_genepanel gives the same output as before on old VEP data. Tested with 6907 exonic (+/-20) variants from NA12878 on Mendel_v01. The only difference is that RefSeq transcripts of the form NM_xxxxx.x_duplxx are no longer chosen when that transcript exists without the dupl suffix.

Type of change

Application (affects UI or general functionality):

  • New feature
  • Bug fix
  • Improvement

Ops / admin / CI related only (not impacting users):

  • New feature
  • Bug fix
  • Improvement

Tests

General

  • Tests have been added that prove my fix is effective or that my feature works
  • Related tests have been modified/removed

Hypothesis testing:

  • Soak testing has been done
  • Distribution between positive / negative cases has been checked

Database

  • Includes changes to database schema
  • Includes necessary database migrations

Configuration

  • Includes changes to configuration
  • Includes configuration migration instructions in documentation

Merge checklist

  • Self-review of code performed
  • Feature review against specification (if applicable)
  • Need for documentation has been evaluated and, if necessary, updated
  • Code and implementation are reviewed by another core developer (all changes, incl. changes based on initial review)
Edited by Svein Tore Koksrud Seljebotn