[Testing] - Resolve this Vulnerability - Capture prompt/response for sample set of vulnerabilities
This is an experiment.
Problem to be solved
Vulnerability remediation can be complex and may not be a simple code change. It requires time, testing, research, domain knowledge, and coordination, often across multiple teams and stakeholders.
Assumption
What assumptions are you making about this problem and the solution?
- Large language models (LLMs) could be used to remediate vulnerabilities.
Personas
What personas have this problem, who is the intended user?
- Sam (Security Analyst) uses this feature to begin triaging their vulnerability report.
Requirements
Manual curation
Please use the following template to report results from different prompts. In this experiment we'd like to test the Google PaLM codechat-bison-001 model.
Template
For this prompt with these settings:
- AI Model:
- Temperature:
- Token limit:
- Top-K:
- Top-P:
Description: [please describe why you used this model and chose these parameters].
Prompts
Please list the different prompts that you used and why you decided to adjust them.
Final prompt
Include the prompt you used for testing here.
Responses
Please copy and paste the following template for your testing.
### DATA START ###
### DATA END ###
## Response 1
### Analysis
Conclusion
Please include your thoughts on this experiment and fill out the table below.
| Link to analysis | text-bison@001 | ChatGPT-3.5 | ChatGPT-4 |
|---|---|---|---|
| 1 | Poor response | Poor response | Great response |
| 2 | Good response | Good response | Great response |
| 3 | Good response | Good response | Great response |
| 4 | Good response | Good response | Great response |
| 5 | Good response | Good response | Good response |
| 6 | Good response | Good response | Good response |
General Concerns
- The max token limit is 1024 tokens, which is roughly 800 words. Occasionally a response will exceed the token limit and be cut off. There is an MR open for discussion to raise the limit to 2048: Increase the Vertex AI token limit to the new 2... (!122864 - merged). This should substantially mitigate issues regarding prompt length.
- Sometimes the AI will respond with "I'm not able to help with that", but re-running the same prompt can yield an answer the second time. It's tripping on the content filter, though it's unknown what triggers it. We can only set the content filter to "Block few" as a minimum; we cannot disable it completely. Edit: it turns out this is only a frontend check in the GCP playground, so it doesn't affect us. See this comment: #412538 (comment 1425858730)
- With the prompt that's currently used on production, the model will sometimes ignore the title, description, and identifiers, and instead look only at the source code, inferring the vulnerability from it alone. This frequently causes it to explain a different vulnerability than the one the scanner picked up. We can update the prompt so that the model has an easier time parsing it, and I suggested one here.
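As a side note on the token-limit concern above, a prompt-size pre-check can catch truncation before a request is sent. This is a minimal sketch using a rough 4-characters-per-token heuristic, not the actual Vertex AI tokenizer; the 512-token response reservation and the helper names are illustrative assumptions, so calibrate against real model usage before relying on it.

```python
# Rough pre-flight check for prompt size before calling the model.
# The ~4 characters-per-token ratio is a common heuristic for English
# text, NOT the real Vertex AI tokenizer, so treat results as estimates.

MAX_TOKENS = 1024       # current Vertex AI limit discussed above
CHARS_PER_TOKEN = 4     # rough heuristic (assumption)


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` via a character-count heuristic."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_budget(prompt: str, reserved_for_response: int = 512) -> bool:
    """Return True if the prompt likely leaves `reserved_for_response`
    tokens of headroom for the model's answer within MAX_TOKENS."""
    return estimate_tokens(prompt) + reserved_for_response <= MAX_TOKENS


if __name__ == "__main__":
    prompt = "Explain the vulnerability reported by the scanner ..."
    print(estimate_tokens(prompt), fits_in_budget(prompt))
```

A check like this could flag prompts that are likely to produce cut-off responses under the 1024-token limit, rather than discovering the truncation after the fact.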
What's next?
With some prompt modification, we were able to get some good responses. Detailed analysis is included in the links in the table above. At this time (Friday, June 9th) we are waiting for access to the Sec-Palm2 model to see if it yields better results. Once we have access to the Sec-Palm2 model, we will commence testing using Explain this Vulnerability: Prompt / Response e... (#414871 - closed).