
feat: move vulnerability extraction to Prompt Library

What does this merge request do and why?

This MR migrates the existing vulnerability extraction script into the Prompt Library. The core logic remains the same; the changes are mostly refactoring:

  • Replaced the dataflow pipeline with the asyncio.run(tasks) pattern used by the other extraction scripts.
  • Separated the GraphQL request from the REST requests.
  • Replaced the GraphQL and REST call patterns with the ones we already use in Prompt Library.
  • Moved the vulnerability.signature check before the REST calls so we don't make the calls if the vulnerability already exists.
  • Added more try blocks so that a single error does not stop the whole pipeline.
  • Added a base-url flag so we can extract from various environments.
  • Added an option to write results to a local JSONL file instead of BQ.
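
For reviewers who want a rough mental model of the refactored flow, here is a minimal sketch of how the points above could fit together. It is not the code in this MR: every function name, the `--output-file` flag, the default URL, and the record fields are hypothetical, and the REST calls are replaced by a stub so the example runs without network access.

```python
import argparse
import asyncio
import json
from pathlib import Path


async def fetch_details(base_url: str, vuln: dict) -> dict:
    """Hypothetical stand-in for the REST calls (no real HTTP request is made)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if vuln.get("broken"):
        raise RuntimeError("simulated REST failure")
    return {**vuln, "source": f"{base_url}/vulnerabilities/{vuln['id']}"}


async def process(base_url: str, vuln: dict, known_signatures: set):
    # Check the signature first so no REST call is made for a known vulnerability.
    if vuln["signature"] in known_signatures:
        return None
    # Isolate failures so one bad record does not stop the whole run.
    try:
        return await fetch_details(base_url, vuln)
    except Exception as exc:
        print(f"skipping vulnerability {vuln['id']}: {exc}")
        return None


async def extract(base_url: str, vulns: list, known_signatures: set) -> list:
    # Run all per-vulnerability tasks concurrently and drop skipped/failed ones.
    results = await asyncio.gather(
        *(process(base_url, v, known_signatures) for v in vulns)
    )
    return [r for r in results if r is not None]


def write_jsonl(path: str, rows: list) -> None:
    # One JSON object per line.
    with Path(path).open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # The default URL and the --output-file flag name are assumptions.
    parser.add_argument("--base-url", default="https://gitlab.example.com")
    parser.add_argument("--output-file", default=None)
    return parser.parse_args(argv)


def main(argv=None, vulns=(), known_signatures=frozenset()) -> list:
    args = parse_args(argv)
    rows = asyncio.run(extract(args.base_url, list(vulns), set(known_signatures)))
    if args.output_file:
        write_jsonl(args.output_file, rows)
    return rows
```

In this sketch, calling `main(["--base-url", "http://localhost:3000", "--output-file", "out.jsonl"], vulns=...)` would write one JSON object per line to `out.jsonl`; without `--output-file` the rows are only returned, which is where a BQ writer would presumably plug in.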

After this MR is merged, we can iterate on the changes needed to build a v7 dataset using GDK.

Closes #464 (closed)

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Merge request checklist

  • I've run the affected pipeline(s) to validate that nothing is broken.
  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Andras Herczeg
