Support dumping incidents from DW
Support dumping DW incidents ("issue occurrences" in DW parlance) and their requisites as a KCIDB dataset, detailed enough to support training the Kwai model.
- [ ] Decide on a way to represent the required data in KCIDB I/O
* Support dumping a timestamp range of data, so we can (re)dump in chunks.
* Download the requisite log files and cache them in an LZMA-compressed ZIP file (as has been done so far for Kwai). Additionally, optionally read from an arbitrary list of similar ZIP files containing already-downloaded files (as is also done already).
* We'll need the following objects in the dump: `incidents` (d'uh!), `issues` (to represent culprits/labels), `tests` (to represent output files and "test paths"), `builds` (to represent architectures). The `checkouts` might not be strictly necessary, but could be useful for debugging and, in theory, for future needs.
* Add an `evidence` array attribute to `incidents` (under `misc` to start with), listing all the places in the linked object's log files that point to the issue occurring. To match DW logic, each item in the array *alone* signifies the occurrence (in DW any one regex match is sufficient). For now, we're only describing matches in output file contents.
* DW has more precise (even if a bit haphazard) culprit identification than KCIDB. Put the extra classification under `misc` to augment e.g. `harness` when it's specified, at least to start with.
- [x] Implement querying all the necessary data from DW
* A temporary table of filtered incidents, and perhaps something else, and separate queries for separate object types seem to be in order.
- [ ] Implement issue matching logic, producing KCIDB data
* Suffer for now, but get rid of this in favor of simply dumping data generated by https://gitlab.com/cki-project/datawarehouse/-/issues/587
- [ ] Implement generating and dumping the KCIDB dataset representation
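
To make the `evidence` idea above more concrete, here is a minimal sketch of what one dumped incident could look like. Everything in it is an assumption for illustration, not a decided format: the helper names `find_evidence` and `make_incident`, the keys inside each evidence item (`file`, `pattern`, `start`, `end`), the sample IDs, and the sample log text are all hypothetical. Only the idea that `evidence` lives under `misc` and that each item alone signifies the occurrence comes from the plan above.

```python
import re


def find_evidence(log_text, patterns, file_name):
    """Return one evidence item per regex match in a log file's text.

    Each item stands alone: any single match is enough to signify
    the issue occurring, mirroring the DW logic described above.
    The item keys are hypothetical, not an agreed schema.
    """
    evidence = []
    for pattern in patterns:
        for match in re.finditer(pattern, log_text):
            evidence.append({
                "file": file_name,       # which output file matched
                "pattern": pattern,      # the regex that matched
                "start": match.start(),  # character offset of match start
                "end": match.end(),      # character offset of match end
            })
    return evidence


def make_incident(incident_id, origin, issue_id, issue_version,
                  test_id, evidence):
    """Build an incident-like dict with evidence stored under `misc`."""
    return {
        "id": incident_id,
        "origin": origin,
        "issue_id": issue_id,
        "issue_version": issue_version,
        "test_id": test_id,
        "present": bool(evidence),
        "misc": {"evidence": evidence},
    }


# Hypothetical sample data
log = "boot ok\nBUG: soft lockup - CPU#1 stuck\nrebooting\n"
evidence = find_evidence(log, [r"soft lockup"], "console.log")
incident = make_incident("redhat:1", "redhat", "redhat:42", 1,
                         "redhat:test-1", evidence)
```

Storing offsets rather than matched text keeps the dump small, but it only works if the cached log files are byte-for-byte the ones that were matched; storing the matched text as well might be a safer starting point.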
Jira: [CKI-6405](https://issues.redhat.com/browse/CKI-6405)