Generic annotation filter
Background
With configurable annotation, we should be able to filter on arbitrary parts of the annotation JSON object (column annotations of the annotation table).
We would need to implement a separate filter class, e.g. GenericAnnotationFilter.
Implementation
Below are some options for how to implement this. The options are not mutually exclusive, and we can implement multiple options.
Perhaps it makes sense to implement JSON schema for full flexibility, and then implement a rule-based filter for the "easy" cases?
Option 1: JSON schema
The easiest option from a development perspective would be to use JSON schema to define the filter.
This would be a very flexible solution, but it would also be complex for the end user and leave a lot of room for errors.
It could work by validating the annotation JSON object against the JSON schema; if it validates, the variant passes the filter.
Example config:
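{
  "target": "external.varde",
  "schema": {
    "type": "object",
    "properties": {
      "ousamg": {
        "type": "object",
        "properties": {
          "classification": {
            "enum": ["1", "2"]
          }
        },
        "required": ["classification"],
        "additionalProperties": true
      }
    },
    "required": ["ousamg"],
    "additionalProperties": true
  }
}
The required keywords are needed because properties alone does not require a key to be present; without them, an object lacking ousamg would also match.
This would match the following annotation JSON object:
{
  "external": {
    "varde": {
      "ousamg": {
        "classification": "1"
      },
      "...": "..."
    },
    "...": "..."
  },
  "...": "..."
}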
Option 2: Rule based
Using a rule-based implementation requires a bit more development, but it would (hopefully) be easier to use for the end user.
Possible example config:
{
  "target": "prediction.spliceai",
  "is_array": true,
  "config": [
    {"key": "DS_AG", "operator": "<", "value": 0.05},
    {"key": "DS_AL", "operator": "<", "value": 0.05},
    {"key": "DS_DG", "operator": "<", "value": 0.05},
    {"key": "DS_DL", "operator": "<", "value": 0.05}
  ],
  "array_mode": "all"
}
This would match the following annotation JSON object:
{
  "prediction": {
    "spliceai": [
      {
        "DP_AG": -6,
        "DP_AL": -47,
        "DP_DG": -6,
        "DP_DL": 45,
        "DS_AG": 0.01,
        "DS_AL": 0.0,
        "DS_DG": 0.0,
        "DS_DL": 0.0,
        "SYMBOL": "NM_001369.2"
      },
      {
        "DP_AG": -62,
        "DP_AL": -4,
        "DP_DG": -61,
        "DP_DL": 4,
        "DS_AG": 0.0,
        "DS_AL": 0.02,
        "DS_DG": 0.01,
        "DS_DL": 0.03,
        "SYMBOL": "NM_001369.2"
      }
    ],
    "...": "..."
  },
  "...": "..."
}
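To make the rule-based option more concrete, here is a minimal sketch of how such a config could be evaluated. It is not an existing implementation: the function names (resolve_target, entry_matches, filter_matches) are hypothetical, and it assumes that target is a dot-separated path into the annotation object, that a missing key counts as "no match", and that only the comparison operators are supported.

```python
import operator
from typing import Any

# Hypothetical operator table; only comparison operators are covered here.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def resolve_target(annotation: dict, target: str) -> Any:
    """Walk a dot-separated path (assumed format); return None if a key is missing."""
    node: Any = annotation
    for key in target.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def entry_matches(entry: dict, rules: list) -> bool:
    """True if the entry satisfies every rule; a missing key counts as no match."""
    for rule in rules:
        if rule["key"] not in entry:
            return False
        compare = OPERATORS[rule["operator"]]
        if not compare(entry[rule["key"]], rule["value"]):
            return False
    return True


def filter_matches(annotation: dict, filter_config: dict) -> bool:
    """Evaluate one filter config against one annotation object."""
    data = resolve_target(annotation, filter_config["target"])
    if data is None:
        return False
    entries = data if filter_config.get("is_array") else [data]
    combine = all if filter_config.get("array_mode", "all") == "all" else any
    return combine(entry_matches(e, filter_config["config"]) for e in entries)


spliceai_filter = {
    "target": "prediction.spliceai",
    "is_array": True,
    "config": [
        {"key": "DS_AG", "operator": "<", "value": 0.05},
        {"key": "DS_AL", "operator": "<", "value": 0.05},
        {"key": "DS_DG", "operator": "<", "value": 0.05},
        {"key": "DS_DL", "operator": "<", "value": 0.05},
    ],
    "array_mode": "all",
}
annotation = {
    "prediction": {
        "spliceai": [
            {"DS_AG": 0.01, "DS_AL": 0.0, "DS_DG": 0.0, "DS_DL": 0.0},
            {"DS_AG": 0.0, "DS_AL": 0.02, "DS_DG": 0.01, "DS_DL": 0.03},
        ]
    }
}
print(filter_matches(annotation, spliceai_filter))  # True
```

Treating a missing target or key as "no match" is only one possible choice; raising a configuration error instead may be preferable if silent mismatches are a concern.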
Considerations
Operators
For this to work as a fully generic solution (not necessarily required), we would need to implement an exhaustive list of operators.
Possibly non-exhaustive list of operators (along with their supported types):
- is (None, bool)
- is not (None, bool)
- == (string, int, float)
- != (string, int, float)
- > (int, float)
- >= (int, float)
- < (int, float)
- <= (int, float)
- in (value: string, int, float, target: array, string)
- not in (value: string, int, float, target: array, string)
- contains (value: array, string, target: array, string)
- not contains (value: array, string, target: array, string)
- overlap (value: array, target: array)
- not overlap (value: array, target: array)
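One way to keep operators and their supported types together is a small registry that checks types before applying the operator. The sketch below is an illustration only: the names Op, OPS and apply_operator are hypothetical, "target" is assumed to be the data found in the annotation and "value" the value from the config, and for is / is not the listed types are assumed to restrict the config value only. contains / not contains are left out because their exact semantics would need to be pinned down first.

```python
import operator
from typing import Any, Callable, NamedTuple


def is_none_or_bool(x: Any) -> bool:
    return x is None or isinstance(x, bool)


def is_number(x: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def is_scalar(x: Any) -> bool:
    return isinstance(x, str) or is_number(x)


def is_array(x: Any) -> bool:
    return isinstance(x, list)


def is_array_or_str(x: Any) -> bool:
    return isinstance(x, (list, str))


def anything(x: Any) -> bool:
    return True


class Op(NamedTuple):
    func: Callable[[Any, Any], bool]  # called as func(target, value)
    target_ok: Callable[[Any], bool]
    value_ok: Callable[[Any], bool]


OPS = {
    # Assumption: for "is" / "is not" only the config value is type-restricted.
    "is": Op(lambda t, v: t is v, anything, is_none_or_bool),
    "is not": Op(lambda t, v: t is not v, anything, is_none_or_bool),
    "==": Op(operator.eq, is_scalar, is_scalar),
    "!=": Op(operator.ne, is_scalar, is_scalar),
    ">": Op(operator.gt, is_number, is_number),
    ">=": Op(operator.ge, is_number, is_number),
    "<": Op(operator.lt, is_number, is_number),
    "<=": Op(operator.le, is_number, is_number),
    # Assumption: "in" means the config value occurs in the annotation data.
    "in": Op(lambda t, v: v in t, is_array_or_str, is_scalar),
    "not in": Op(lambda t, v: v not in t, is_array_or_str, is_scalar),
    # Overlap: the two arrays share at least one element.
    "overlap": Op(lambda t, v: any(x in t for x in v), is_array, is_array),
    "not overlap": Op(lambda t, v: not any(x in t for x in v), is_array, is_array),
}


def apply_operator(name: str, target: Any, value: Any) -> bool:
    """Apply a named operator after checking that both operands are supported."""
    op = OPS[name]
    if not op.target_ok(target):
        raise TypeError(f"operator {name!r} does not support target {target!r}")
    if not op.value_ok(value):
        raise TypeError(f"operator {name!r} does not support value {value!r}")
    return op.func(target, value)
```

Checking types up front means a misconfigured rule fails loudly instead of silently comparing, for example, a string with a number.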
and/or
In addition, we would need to decide on a strategy for "and" and "or" operations. One way is to make the config an array of arrays, where the rules within each inner array are "and"-ed and the inner arrays are "or"-ed with each other.
E.g.:
{
  "config": [
    [
      {"key": "DS_AG", "operator": "<", "value": 0.05},
      {"key": "DS_AL", "operator": "<", "value": 0.05},
      {"key": "DS_DG", "operator": "<", "value": 0.05},
      {"key": "DS_DL", "operator": "<", "value": 0.05}
    ],
    [
      {"key": "DP_AG", "operator": "<", "value": -60}
    ],
    [
      {"key": "DP_AL", "operator": ">", "value": 60}
    ]
  ]
}
This can be read as "filter out if (DS_AG < 0.05 AND DS_AL < 0.05 AND DS_DG < 0.05 AND DS_DL < 0.05) OR (DP_AG < -60) OR (DP_AL > 60)".
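The array-of-arrays reading maps directly onto nested any/all calls. The following is a rough sketch under the same assumptions as the example config; rule_holds and config_matches are hypothetical names, and only the two operators used in the example are included.

```python
import operator

# Subset of operators, enough for the example config.
COMPARE = {"<": operator.lt, ">": operator.gt}


def rule_holds(entry: dict, rule: dict) -> bool:
    """A single rule holds if the key exists and the comparison is true."""
    if rule["key"] not in entry:
        return False
    return COMPARE[rule["operator"]](entry[rule["key"]], rule["value"])


def config_matches(entry: dict, config: list) -> bool:
    """Outer list is OR-ed; the rules in each inner list are AND-ed."""
    return any(all(rule_holds(entry, rule) for rule in group) for group in config)


config = [
    [
        {"key": "DS_AG", "operator": "<", "value": 0.05},
        {"key": "DS_AL", "operator": "<", "value": 0.05},
        {"key": "DS_DG", "operator": "<", "value": 0.05},
        {"key": "DS_DL", "operator": "<", "value": 0.05},
    ],
    [{"key": "DP_AG", "operator": "<", "value": -60}],
    [{"key": "DP_AL", "operator": ">", "value": 60}],
]

# High delta score, but DP_AG < -60, so the second group matches.
entry = {"DS_AG": 0.5, "DS_AL": 0.0, "DS_DG": 0.0, "DS_DL": 0.0, "DP_AG": -62, "DP_AL": -4}
print(config_matches(entry, config))  # True
```

Note that with this layout an empty outer array never matches, while an empty inner array always matches; whether that is acceptable or should be rejected at config validation time is a separate decision.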
We can also decide to have a single array, and then have a mode parameter that decides whether variants should pass if they match all the rules, or if they should pass if they match any of the rules. If the user requires a combination, they can just add multiple filters (or exception filters).
Array mode
We would also need to decide how to handle arrays. Should the filter pass if any of the values in the array match the rule, or only if all of the values in the array match the rule?
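The difference between the two modes, including the empty-array edge case, can be illustrated with a short sketch. The array_matches helper and the array_mode values "all" and "any" are assumptions for illustration, not a settled design.

```python
from typing import Callable


def array_matches(entries: list, predicate: Callable[[dict], bool], array_mode: str = "all") -> bool:
    """Combine per-entry results according to the (assumed) array_mode setting."""
    results = [predicate(entry) for entry in entries]
    if array_mode == "all":
        return all(results)
    if array_mode == "any":
        return any(results)
    raise ValueError(f"unknown array_mode: {array_mode!r}")


def low_al(entry: dict) -> bool:
    return entry["DS_AL"] < 0.01


entries = [{"DS_AL": 0.0}, {"DS_AL": 0.02}]

print(array_matches(entries, low_al, "all"))  # False: the second entry fails
print(array_matches(entries, low_al, "any"))  # True: the first entry matches

# Edge case: with an empty array, "all" is vacuously True and "any" is False.
print(array_matches([], low_al, "all"))  # True
print(array_matches([], low_al, "any"))  # False
```

The empty-array behaviour follows from Python's all and any; if a variant without any entries should never match, that case would need to be handled explicitly.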