Support dumping incidents from DW
Support dumping DW incidents ("issue occurrences" in DW parlance) and their requisites as a KCIDB dataset, detailed enough to support training the Kwai model.

- [ ] Decide on a way to represent the required data in KCIDB I/O
    * Support dumping a timestamp range of data, so we can (re)dump in chunks.
    * Download the requisite log files and cache them in an LZMA-compressed ZIP file (similar to what was done so far for Kwai). Additionally/optionally, access an arbitrary list of similar ZIP files for already-downloaded files (as was already done too).
    * We'll need the following objects in the dump: `incidents` (d'uh!), `issues` (to represent culprits/labels), `tests` (to represent output files and "test paths"), `builds` (to represent architectures). The `checkouts` might not be strictly necessary, but could be useful for debugging, and possibly for other things in the future.
    * Add an `evidence` array attribute to `incidents` (in `misc` for a start), listing all the places in the linked object's log files that point to the issue occurring. To match DW logic, each item in the array *alone* signifies the occurrence (in DW any one regex match is sufficient). We're only describing matches against output file contents for now.
    * DW has more precise (even if a bit haphazard) culprit identification than KCIDB. Put the extra classification under `misc` to augment e.g. `harness` when it's specified. At least for a start.
- [x] Implement querying all the necessary data from DW
    * A temporary table of filtered incidents (and perhaps something else), plus separate queries for separate object types, seems to be in order.
- [ ] Implement issue matching logic, producing KCIDB data
    * Suffer for now, but get rid of this in favor of simply dumping data generated by https://gitlab.com/cki-project/datawarehouse/-/issues/587
- [ ] Implement generating and dumping the KCIDB dataset representation

Jira: [CKI-6405](https://issues.redhat.com/browse/CKI-6405)
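
For the log-file caching item, here is a minimal sketch of a read-through cache backed by an LZMA-compressed ZIP, with optional read-only fallback archives for already-downloaded files. It is not the existing Kwai code: the `LogCache` class, the `fetch` callable, and the choice to use the file name as the ZIP member name are all assumptions made for illustration. Only the standard `zipfile` module is used.

```python
import zipfile
from pathlib import Path
from typing import Callable, Iterable, Optional


class LogCache:
    """Hypothetical read-through cache of log files in an LZMA-compressed ZIP."""

    def __init__(self, path, fallbacks: Iterable = ()):
        # Archive we write newly downloaded files to
        self.path = Path(path)
        # Read-only archives consulted before downloading anything
        self.fallbacks = [Path(p) for p in fallbacks]

    @staticmethod
    def _read(archive: Path, name: str) -> Optional[bytes]:
        """Return the member's contents, or None if the archive lacks it."""
        if not archive.exists():
            return None
        with zipfile.ZipFile(archive) as zf:
            try:
                return zf.read(name)
            except KeyError:
                return None

    def get(self, name: str, fetch: Callable[[str], bytes]) -> bytes:
        """Return the file from any archive, downloading it only on a miss."""
        for archive in [self.path, *self.fallbacks]:
            data = self._read(archive, name)
            if data is not None:
                return data
        # Miss everywhere: download (fetch is supplied by the caller)
        # and append the result to the primary archive.
        data = fetch(name)
        with zipfile.ZipFile(
            self.path, "a", compression=zipfile.ZIP_LZMA
        ) as zf:
            zf.writestr(name, data)
        return data
```

Usage would look something like `LogCache("dump.zip", fallbacks=["kwai-old.zip"]).get(name, download)`, where `download` is whatever function actually retrieves the log. Reopening the archive on every call keeps the sketch simple; a real dump would likely keep the archives open for the duration of a chunk.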
issue