Update explain vulnerability prompt for moderation (!123194) · Merge requests · GitLab.org / GitLab

Neil McCorrison requested to merge 414880-update-explain-exploit-prompt into master Jun 09, 2023

What does this MR do and why?

Vertex AI introduced content moderation around 2023-06-04, and our prompt has been failing it about 50% of the time.

Changes the prompt from:

Provide a code example with syntax highlighting on how to exploit it.

to:

Provide a code example with syntax highlighting on how an attacker can take advantage of the vulnerability.
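The sketch below shows one way the instruction swap could be expressed as a prompt template. Everything except the two instruction strings is hypothetical: `PROMPT_TEMPLATE`, `build_explain_prompt` and the template wording are illustrations, because the actual GitLab prompt and its surrounding text are not reproduced in this MR.

```python
# Hypothetical sketch: only the two instruction strings come from this MR;
# the surrounding template wording is an illustrative assumption.
OLD_INSTRUCTION = "Provide a code example with syntax highlighting on how to exploit it."
NEW_INSTRUCTION = (
    "Provide a code example with syntax highlighting on how an attacker "
    "can take advantage of the vulnerability."
)

# Hypothetical template; the real GitLab prompt is not shown here.
PROMPT_TEMPLATE = (
    'Explain the vulnerability "{title}" found in the following code:\n'
    "{code}\n"
    "{instruction}"
)


def build_explain_prompt(title: str, code: str, instruction: str = NEW_INSTRUCTION) -> str:
    """Fill the hypothetical template. The new, moderation-friendlier instruction is the default."""
    return PROMPT_TEMPLATE.format(title=title, code=code, instruction=instruction)


prompt = build_explain_prompt(
    "SQL injection",
    'query = "SELECT * FROM users WHERE id = " + user_id',
)
print(prompt)
```

Because the instruction is kept as a separate parameter, the old and new wording can be compared against the same vulnerability inputs when measuring moderation behaviour.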

Example responses using the GCP sandbox:

Our analysis indicates that this prompt change should reduce the share of responses blocked by content moderation from 16.44% to 2.35%. It may also reduce failing requests by up to approximately 10%, likely because less time is spent in the Google APIs during content moderation cycles.
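Those two percentages imply a large relative improvement. The short Python calculation below uses only the figures quoted in this MR. The per-1,000-requests projection is an illustrative extrapolation, not measured data.

```python
def reduction(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction as a fraction)."""
    absolute = before_pct - after_pct
    relative = absolute / before_pct
    return absolute, relative


BEFORE = 16.44  # % of responses blocked with the old prompt (from this MR)
AFTER = 2.35    # % of responses blocked with the new prompt (from this MR)

absolute, relative = reduction(BEFORE, AFTER)
print(f"{absolute:.2f} percentage points, {relative:.1%} relative reduction")
# prints "14.09 percentage points, 85.7% relative reduction"

# Illustrative only: expected blocked responses per 1,000 requests at each rate.
for label, pct in (("old prompt", BEFORE), ("new prompt", AFTER)):
    print(f"{label}: ~{1000 * pct / 100:.1f} blocked per 1,000 requests")
```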

How to set up and validate locally

There should be no operational difference, apart from a distinct decrease in the number of responses actively blocked by Google's content moderation algorithm.

When content is blocked by the moderation algorithm, the response received is: "I'm not able to help with that, as I'm only a language model. If you believe this is an error, please send us your feedback."
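When validating locally, it can help to count how often the moderation message quoted above comes back. The following Python sketch assumes the blocked response text is returned exactly as quoted, allowing for leading or trailing whitespace. `is_moderation_block` and `block_rate` are hypothetical helpers, not part of GitLab or the Vertex AI client.

```python
from collections.abc import Iterable

# Response text returned when Google's moderation blocks a prompt (quoted in this MR).
MODERATION_MESSAGE = (
    "I'm not able to help with that, as I'm only a language model. "
    "If you believe this is an error, please send us your feedback."
)


def is_moderation_block(response: str) -> bool:
    """True if the response is exactly the moderation refusal, ignoring surrounding whitespace."""
    return response.strip() == MODERATION_MESSAGE


def block_rate(responses: Iterable[str]) -> float:
    """Fraction of responses blocked by moderation; 0.0 for an empty sample."""
    items = list(responses)
    if not items:
        return 0.0
    blocked = sum(is_moderation_block(r) for r in items)
    return blocked / len(items)


sample = [
    MODERATION_MESSAGE,
    "The vulnerability occurs because user input is concatenated into SQL.",
    "  " + MODERATION_MESSAGE + "\n",
    "An attacker could supply a crafted id value.",
]
print(block_rate(sample))  # prints 0.5
```

An exact match avoids false positives on explanations that merely discuss refusals. If the API exposes a dedicated "blocked" signal, checking that signal would be more robust than comparing text.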

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Related to #414880 (closed)

Edited Jun 14, 2023 by Gregory Havenga
