Use splice prediction with SpliceAI
Background
Enabling and using splice prediction annotation with SpliceAI represents several improvements:
- Will enable a more sensitive filter for synonymous variants (BP7; see LA-1294).
- May increase diagnostic rates by identifying more aberrant splice sites than would otherwise be detected.
- May reduce complexity of analysis workflow significantly, if used to replace cut/paste of Alamut splice prediction results.
About SpliceAI
SpliceAI was released by Illumina as open source in January 2019. It predicts splice sites, including cryptic ones, apparently far more accurately than older alternatives. Links:
- Literature reference (Cell)
- GitHub
- VEP plugin
- Might come in handy: https://github.com/david-a-parry/vase/blob/master/vase/spliceai_filter.py
Data output
Details of SpliceAI INFO field:
| ID | Description |
|---|---|
| ALLELE | Alternate allele |
| SYMBOL | Gene symbol |
| DS_AG | Delta score (acceptor gain) |
| DS_AL | Delta score (acceptor loss) |
| DS_DG | Delta score (donor gain) |
| DS_DL | Delta score (donor loss) |
| DP_AG | Delta position (acceptor gain) |
| DP_AL | Delta position (acceptor loss) |
| DP_DG | Delta position (donor gain) |
| DP_DL | Delta position (donor loss) |
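As a rough illustration of how the fields above could be unpacked, here is a minimal sketch. It assumes the annotation value is a pipe-delimited string in the same order as the table, with multiple entries separated by commas; the function name `parse_spliceai` and the example value are illustrative only and not taken from ELLA.

```python
# Field order assumed to follow the table above (assumption, not confirmed by this issue).
FIELDS = (
    "ALLELE", "SYMBOL",
    "DS_AG", "DS_AL", "DS_DG", "DS_DL",
    "DP_AG", "DP_AL", "DP_DG", "DP_DL",
)


def parse_spliceai(value):
    """Parse a SpliceAI annotation value into a list of dicts, one per entry."""
    records = []
    for entry in value.split(","):
        parts = entry.split("|")
        if len(parts) != len(FIELDS):
            raise ValueError(f"Expected {len(FIELDS)} fields, got {len(parts)}: {entry!r}")
        record = dict(zip(FIELDS, parts))
        # Delta scores are probabilities (floats); delta positions are integer offsets.
        for key in FIELDS[2:6]:
            record[key] = None if record[key] in ("", ".") else float(record[key])
        for key in FIELDS[6:]:
            record[key] = None if record[key] in ("", ".") else int(record[key])
        records.append(record)
    return records


# Illustrative value only
example = parse_spliceai("T|RYR1|0.00|0.00|0.91|0.08|-28|-46|-2|-31")
```

Missing values (empty or `.`) are mapped to `None` here so that downstream thresholding can tell "not scored" apart from a score of 0.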
Explanation of values:
- Symbol:
  - Gene symbol, most likely using GENCODE V24lift37 canonical annotation judging from README
- Delta scores:
  - Probability of the variant being splice-altering. Beware differences in sensitivity between deep intronic and exonic/near exonic variants (see Fig 2F in article).
  - Values 0 to 1
  - Suggested cutoffs:
    - 0.2 (high recall/likely pathogenic)
    - 0.5 (recommended/pathogenic)
    - 0.8 (high precision/pathogenic)
  - No effect: <0.01 or <0.1?
- Delta positions:
  - Location of splicing changes relative to the variant position (positive = downstream, negative = upstream)
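To make the relationship between delta scores and delta positions concrete, the sketch below picks the effect type with the highest delta score and converts its delta position to an absolute coordinate. It assumes the offsets are in genomic coordinates (so the affected position is simply variant position plus offset) and that the scores are already parsed into a dict keyed by the field IDs; `strongest_effect` is a hypothetical helper name.

```python
def strongest_effect(variant_pos, scores):
    """Return (effect, delta_score, affected_position) for the highest delta score.

    `scores` maps DS_*/DP_* field IDs to numbers. The affected position is
    computed as variant_pos + delta position, assuming genomic orientation
    (positive = downstream, negative = upstream).
    """
    effects = ("AG", "AL", "DG", "DL")  # acceptor gain/loss, donor gain/loss
    best = max(effects, key=lambda e: scores[f"DS_{e}"])
    return best, scores[f"DS_{best}"], variant_pos + scores[f"DP_{best}"]


# Illustrative values only
scores = {
    "DS_AG": 0.00, "DS_AL": 0.00, "DS_DG": 0.91, "DS_DL": 0.08,
    "DP_AG": -28, "DP_AL": -46, "DP_DG": -2, "DP_DL": -31,
}
effect, score, position = strongest_effect(38958362, scores)
```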
Use of data
ACMG recommendations
ClinGen SVI Splicing Subgroup (Walker et al. 2023) recommends the following thresholds for variants outside +/-1,2bp donor/acceptor sites:
- >=0.2 for PP3
- <=0.1 for BP4
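The thresholds above could be applied roughly as follows. This is a sketch only: it assumes the decision is based on the maximum of the four delta scores and that the variant has already been confirmed to lie outside the +/-1,2 donor/acceptor positions; `splice_evidence` is a hypothetical name, not an existing rule in the ACMG rules engine.

```python
PP3_MIN = 0.2  # >= 0.2 supports a splice effect (PP3)
BP4_MAX = 0.1  # <= 0.1 supports no splice effect (BP4)


def splice_evidence(delta_scores):
    """Map the four SpliceAI delta scores to 'PP3', 'BP4' or None (uninformative)."""
    max_score = max(delta_scores)  # assumption: the highest delta score decides
    if max_score >= PP3_MIN:
        return "PP3"
    if max_score <= BP4_MAX:
        return "BP4"
    return None  # between the thresholds: neither criterion applies
```

Scores strictly between 0.1 and 0.2 return `None`, so neither criterion is applied in that range.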
Limitations of precomputed dataset
The precomputed dataset has sparse documentation, but from notes in various issues it seems to have some noteworthy limitations:
- Only variants "within genes" are computed. From here, this appears to include the entire transcribed sequence (TX_START-TX_END), i.e. including deep intronic variants.
- Only one transcript seems to be included per gene, the "canonical transcript" (see here). It is unclear how this is defined, but it is most likely from Ensembl. Coordinates for each gene are given in spliceai/annotations/grch37.txt.
- Max distance seems to be 50 bp (see here), equal to the default when running the program locally (see here). This differs from the web version, which uses 500 bp as default (we use 250 bp in links in ELLA at OUSAMG).
- For indels, only 1 base insertions and 1-4 base deletions are included (see here).
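A quick way to reason about which variants can be expected in the precomputed dataset is sketched below. It assumes VCF-style left-anchored alleles (the first base of REF and ALT is shared for indels) and takes the transcript bounds as plain arguments; the function name and parameters are hypothetical, and the rules simply mirror the limitations listed above rather than any documented specification.

```python
def expected_in_precomputed(pos, ref, alt, tx_start, tx_end):
    """Return True if a variant falls within the apparent scope of the precomputed set.

    Scope, as inferred from the notes above: position within TX_START-TX_END,
    and the variant is an SNV, a 1-base insertion, or a 1-4 base deletion.
    """
    if not (tx_start <= pos <= tx_end):
        return False  # outside the transcribed region of the gene
    if len(ref) == 1 and len(alt) == 1:
        return True  # single nucleotide variant
    if len(ref) == 1 and len(alt) == 2 and alt[0] == ref:
        return True  # 1-base insertion (anchor base + inserted base)
    if len(alt) == 1 and 2 <= len(ref) <= 5 and ref[0] == alt:
        return True  # 1-4 base deletion (anchor base + deleted bases)
    return False
```

Variants for which this returns `False` would need to be scored by running SpliceAI locally if a prediction is required.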
Also relates to: LA-1294, LA-523
Original user request: MEET-156
Implementation
See also ella-anno#15
Thresholds: Use recommendations from ClinGen SVI Splicing Subgroup:
- >=0.2 for "spliceogenicity" (PP3)
- <=0.1 for "non-spliceogenicity" (BP4)
Results should be used in:
- UI, prediction section: Possibly replace manual entry altogether
- Filter rules (synonymous, no splice effect = BP7). Blocked by #2203 (closed)
- ACMG rules engine (BP7; REQ_no_splice_effect, PP3 and BP4). Note new guidelines coming Q4 2024, see &40--> #2409 (closed)
Edited by Morten C. Eike