Use splice prediction with SpliceAI

Background

Enabling splice prediction annotation with SpliceAI would bring several improvements:

  • Will enable a more sensitive filter for synonymous variants (BP7; see LA-1294).
  • May increase diagnostic rates by identifying more aberrant splice sites than would otherwise be detected.
  • May significantly simplify the analysis workflow, if used to replace copying and pasting of Alamut splice prediction results.

About SpliceAI

SpliceAI is an open-source tool from Illumina, released in January 2019. It predicts splice sites, including cryptic ones, reportedly far more accurately than older alternatives.

Data output

Details of SpliceAI INFO field:

ID      Description
ALLELE  Alternate allele
SYMBOL  Gene symbol
DS_AG   Delta score (acceptor gain)
DS_AL   Delta score (acceptor loss)
DS_DG   Delta score (donor gain)
DS_DL   Delta score (donor loss)
DP_AG   Delta position (acceptor gain)
DP_AL   Delta position (acceptor loss)
DP_DG   Delta position (donor gain)
DP_DL   Delta position (donor loss)
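
To make the field layout concrete, here is a minimal parsing sketch. It assumes the annotation value is a pipe-delimited string in the column order listed above, with comma-separated entries when several alleles or genes are annotated and "." for missing values. That serialization is taken from the SpliceAI README, not from this issue, and parse_spliceai is a hypothetical helper name.

```python
# Column order as listed in the table above (assumed to match the VCF header).
FIELDS = ("ALLELE", "SYMBOL", "DS_AG", "DS_AL", "DS_DG", "DS_DL",
          "DP_AG", "DP_AL", "DP_DG", "DP_DL")


def parse_spliceai(value):
    """Parse the value of a SpliceAI INFO entry into one dict per allele/gene.

    Assumption: entries are comma-separated, fields are pipe-separated,
    and "." marks a missing score or position.
    """
    records = []
    for entry in value.split(","):
        parts = entry.split("|")
        if len(parts) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} fields, got {len(parts)}: {entry!r}"
            )
        record = dict(zip(FIELDS, parts))
        for key in FIELDS[2:6]:   # delta scores, 0 to 1
            record[key] = None if record[key] == "." else float(record[key])
        for key in FIELDS[6:]:    # delta positions, signed offsets in bp
            record[key] = None if record[key] == "." else int(record[key])
        records.append(record)
    return records


# Illustrative value only, not taken from real data in this issue.
example = parse_spliceai("T|RYR1|0.00|0.00|0.91|0.08|-28|-46|-2|-31")
print(example[0]["SYMBOL"], example[0]["DS_DG"], example[0]["DP_DG"])
```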

Explanation of values:

  • Symbol:
    • Gene symbol, most likely from the GENCODE V24lift37 canonical annotation, judging from the README
  • Delta scores:
    • Probability of the variant being splice-altering. Beware differences in sensitivity between deep intronic and exonic/near exonic variants (see Fig 2F in article).
    • Values 0 to 1
    • Suggested cutoffs:
      • 0.2 (high recall/likely pathogenic)
      • 0.5 (recommended/pathogenic)
      • 0.8 (high precision/pathogenic)
      • No effect: <0.01 or <0.1?
  • Delta positions:
    • Location of splicing changes relative to the variant position (positive = downstream, negative = upstream)
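
A rough sketch of how these values could be combined for one annotated variant: pick the highest of the four delta scores, place it in one of the suggested cutoff bands, and turn the matching delta position into a coordinate. The function names are hypothetical. Treating the cutoffs as inclusive (>=) and applying the offset directly to the genomic coordinate of the variant are assumptions; neither is stated above.

```python
SCORE_KEYS = ("DS_AG", "DS_AL", "DS_DG", "DS_DL")


def max_delta(record):
    """Return (event, score, delta position) for the highest delta score."""
    key = max(SCORE_KEYS, key=lambda k: record[k])
    event = key[3:]  # "AG", "AL", "DG" or "DL"
    return event, record[key], record["DP_" + event]


def cutoff_band(score):
    """Map a delta score to the suggested cutoff bands (assumed inclusive)."""
    if score >= 0.8:
        return "high precision"
    if score >= 0.5:
        return "recommended"
    if score >= 0.2:
        return "high recall"
    return "below suggested cutoffs"


def affected_position(variant_pos, delta_pos):
    """Positive offsets are downstream, negative upstream of the variant."""
    return variant_pos + delta_pos


# Illustrative record only.
record = {"DS_AG": 0.00, "DS_AL": 0.00, "DS_DG": 0.91, "DS_DL": 0.08,
          "DP_AG": -28, "DP_AL": -46, "DP_DG": -2, "DP_DL": -31}
event, score, offset = max_delta(record)
print(event, score, cutoff_band(score), affected_position(1000, offset))
```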

Use of data

ACMG recommendations

ClinGen SVI Splicing Subgroup (Walker et al. 2023) recommends the following thresholds for variants outside +/-1,2bp donor/acceptor sites:

  • >=0.2 for PP3
  • <=0.1 for BP4
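
As a sketch, the two thresholds translate into a simple three-way decision. The function name is hypothetical, and using the maximum of the four delta scores as input is an assumption. The caller is expected to have excluded variants at the +/-1,2 donor/acceptor positions, which these thresholds do not cover.

```python
PP3_MIN = 0.2  # >= 0.2 supports a splicing effect (PP3)
BP4_MAX = 0.1  # <= 0.1 supports no splicing effect (BP4)


def splice_evidence(max_delta_score):
    """Return "PP3", "BP4" or None for scores between the two thresholds.

    Only meaningful for variants outside the +/-1,2 bp donor/acceptor sites.
    """
    if max_delta_score >= PP3_MIN:
        return "PP3"
    if max_delta_score <= BP4_MAX:
        return "BP4"
    return None  # 0.1 < score < 0.2: no evidence code applies


for s in (0.05, 0.15, 0.35):
    print(s, splice_evidence(s))
```

Scores strictly between 0.1 and 0.2 fall in neither category, so the function returns None for them rather than forcing a call.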

Limitations of precomputed dataset

The precomputed dataset has sparse documentation, but from notes in various issues it seems to have some noteworthy limitations:

  • Only variants "within genes" are computed. This appears to include the entire transcribed sequence (TX_START-TX_END), i.e. including deep intronic variants (see here).
  • Only one transcript seems to be included per gene, the "canonical transcript" (see here). It is unclear how this is defined, but it is most likely from Ensembl. Coordinates for each gene are given in spliceai/annotations/grch37.txt.
  • Max distance seems to be 50 bp (see here), equal to the default when running the program locally (see here). This differs from the web version, which defaults to 500 bp (we use 250 bp in links in ELLA at OUSAMG).
  • For indels, only 1-base insertions and 1-4 base deletions are included (see here).
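
These limits suggest a cheap pre-check for whether a variant can be expected in the precomputed dataset at all, so that a missing annotation is not mistaken for "no effect". The sketch below is hypothetical: it assumes normalized VCF-style alleles with a single shared anchor base for indels, and inclusive TX_START/TX_END coordinates in the same system as the variant position.

```python
def allele_in_scope(ref, alt):
    """True for SNVs, 1-base insertions and 1-4 base deletions."""
    if len(ref) == 1 and len(alt) == 1:
        return ref != alt                      # SNV
    if len(ref) == 1 and alt.startswith(ref):
        return len(alt) - len(ref) == 1        # 1-base insertion
    if len(alt) == 1 and ref.startswith(alt):
        return 1 <= len(ref) - len(alt) <= 4   # 1-4 base deletion
    return False                               # MNVs, complex or longer indels


def within_gene(pos, tx_start, tx_end):
    """True if pos lies in the transcribed region (assumed inclusive)."""
    return tx_start <= pos <= tx_end


print(allele_in_scope("A", "G"))      # SNV
print(allele_in_scope("A", "ATT"))    # 2-base insertion, not covered
print(allele_in_scope("ATTTT", "A"))  # 4-base deletion
```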

Also relates to: LA-1294, LA-523

Original user request: MEET-156

Implementation

See also ella-anno#15

Thresholds: Use recommendations from ClinGen SVI Splicing Subgroup:

  • >=0.2 for "spliceogenicity" (PP3)
  • <=0.1 for "non-spliceogenicity" (BP4)

Results should be used in:

  • UI, prediction section: Possibly replace manual entry altogether
  • Filter rules (synonymous, no splice effect = BP7). Blocked by #2203 (closed)
  • ACMG rules engine (BP7; REQ_no_splice_effect, PP3 and BP4). Note new guidelines coming Q4 2024, see &40 --> #2409 (closed)
Edited by Morten C. Eike