Support CNV filtering
Background
Filters in ELLA should handle CNVs in addition to SNVs. Preferably, CNVs should share filters with SNVs, but some filters may need separate CNV versions, and some filtering may have to be run in the pipeline, at least in the beginning.
Some of these considerations (in particular frequency and quality) may already have been solved in the work on the SV variant calling pipeline, so this needs to be coordinated with that work.
Particulars for CNV
CNVs will have partly different types of annotation from SNVs, and we need to ensure that new annotation is implemented in a way that makes it available to the filters.
Filters for all samples:
- Classification: Should be handled in a "minimal" CNV solution
- Frequency:
  - CNV must use different in-house databases (in-house/SVDB) for meaningful data, but maybe not a problem to have all DBs enabled for any variant?
  - There's a separate structural variant dataset from gnomAD. See also literature reference.
- Region: The SNV filter removes deep intronic/UTR variants, and this will still be relevant for small CNVs. For larger CNVs, this will depend on whether the analysis is "whole genome" (large CNVs only) or gene panel-restricted. For the latter, only variants overlapping with gene panel genes should be considered, but this should perhaps be handled separately from ELLA filters. Or use the Consequence filter?
- Consequence: This uses the VEP CSQ field (currently only used for filtering synonymous variants), should be able to use this directly also for CNVs? E.g. could consider adding filtering for "intergenic_variant" and similar.
- Polypyrimidine: Not relevant for CNVs (as they will never be <3 nt)
- Quality: SNVs use allele ratio (WES/WGS)/NOT PASS (EHG target) - what can be used for CNVs? Should quality filtering be done in the pipeline only? In any case, filters need to handle CNVs and SNVs separately for this step.
- External:
- Gene:
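The gene panel restriction discussed under "Region" could be sketched as a plain interval overlap check. This is only an illustration under stated assumptions: `Interval`, `overlaps` and `cnvs_outside_panel` are hypothetical names and not part of ELLA, and coordinates are assumed to be 0-based, half-open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def cnvs_outside_panel(cnvs: dict, panel_genes: dict) -> set:
    """Return ids of CNVs that overlap no gene panel gene.

    cnvs: hypothetical mapping of variant id -> Interval
    panel_genes: hypothetical mapping of gene symbol -> Interval
    """
    return {
        cnv_id
        for cnv_id, cnv in cnvs.items()
        if not any(overlaps(cnv, gene) for gene in panel_genes.values())
    }


panel = {"GENE_A": Interval("1", 1000, 2000)}
cnvs = {
    1: Interval("1", 1500, 5000),  # overlaps GENE_A, so it is kept
    2: Interval("1", 2000, 3000),  # starts exactly where GENE_A ends
    3: Interval("2", 1000, 2000),  # different chromosome
}
filter_candidates = cnvs_outside_panel(cnvs, panel)
```

Whether this check belongs in an ELLA filter or in the pipeline is left open, as noted above.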
Single-specific filters:
- Inheritance model: Depends on annotation with which genes are affected/overlapping. When overlapping with a single gene, this should not pose any problems, but what about CNVs overlapping multiple genes?
Trio-specific filters:
- Segregation:
  - "parent genotype missing", "de novo" and "homozygous recessive" should not pose any particular problems?
  - "compound heterozygous": same considerations as "inheritance model"
Gene-level filters:
Combinations with SNV
For some filters, the considerations of possible combinations of CNVs and SNVs in the same gene are important:
- "Inheritance model": When checking for "single variant" in a gene, any combination of (unfiltered) SNV+SNV, CNV+CNV or SNV+CNV means the filter criteria are not fulfilled, and the variants should not be filtered
- "Segregation": When looking for compound heterozygotes, both SNVs and CNVs should be considered, e.g. SNV+CNV where each comes from a different parent means the variants should not be filtered.
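The two combination rules above could be sketched as follows. This is a minimal illustration, not ELLA's implementation: `Variant`, `lone_variants` and `compound_het_ids` are hypothetical names, and the sketch covers only the counting and parent-of-origin logic, not gene inheritance modes or genotypes. It also assumes, for CNVs overlapping multiple genes, that sharing a single gene with another unfiltered variant is enough to keep a variant; that question is still open in this issue.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record."""
    id: int
    kind: str                             # "snv" or "cnv"
    genes: frozenset                      # gene symbols the variant overlaps
    inherited_from: Optional[str] = None  # "father", "mother" or None
    filtered: bool = False                # removed by an earlier filter


def lone_variants(variants):
    """Ids of unfiltered variants that are alone in every gene they overlap.

    SNVs and CNVs are counted together, so SNV+SNV, CNV+CNV and SNV+CNV
    in the same gene all prevent filtering.
    """
    by_gene = defaultdict(set)
    for v in variants:
        if not v.filtered:
            for gene in v.genes:
                by_gene[gene].add(v.id)
    return {
        v.id
        for v in variants
        if not v.filtered
        and v.genes
        and all(len(by_gene[g]) == 1 for g in v.genes)
    }


def compound_het_ids(variants):
    """Ids of variants in genes with a hit from each parent, SNV or CNV."""
    by_gene = defaultdict(list)
    for v in variants:
        if not v.filtered and v.inherited_from in ("father", "mother"):
            for gene in v.genes:
                by_gene[gene].append(v)
    keep = set()
    for members in by_gene.values():
        if {m.inherited_from for m in members} == {"father", "mother"}:
            keep.update(m.id for m in members)
    return keep


variants = [
    Variant(1, "snv", frozenset({"GENE_A"}), "father"),
    Variant(2, "cnv", frozenset({"GENE_A", "GENE_B"}), "mother"),
    Variant(3, "snv", frozenset({"GENE_C"}), "father"),
    Variant(4, "snv", frozenset({"GENE_D"}), "mother", filtered=True),
    Variant(5, "cnv", frozenset({"GENE_D"}), "father"),
]
single = lone_variants(variants)          # candidates for filtering
compound = compound_het_ids(variants)     # should not be filtered
```

Here variants 1 and 2 share GENE_A and come from different parents, so neither is a filtering candidate, while variant 5 is alone in GENE_D because variant 4 was already filtered.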
Older notes
Possible filter variables:
- Quality
- Known SV - decide on what is a “match”
- Consequence, incl. gene-level annotation (e.g. haploinsufficiency)
- Inheritance
- Phenotype (HPO) - gene association?
- Selected regions in/out?
Implementation
[Describe suggested solution; This should possibly be broken down into multiple issues?]