[Testing] - Resolve this Vulnerability - Capture prompt/response for sample set of vulnerabilities
This is an experiment.
Problem to be solved
Vulnerability remediation can be complex and may not be a simple code change. It requires time, testing, research, domain knowledge, and coordination, often across multiple teams and stakeholders.
Assumption
What assumptions are you making about this problem and the solution?
- Large language models (LLMs) could be used to remediate vulnerabilities.
Personas
What personas have this problem, who is the intended user?
- Sam (Security Analyst) uses this feature to begin triaging their vulnerability report.
Requirements
Manual curation
Please use the following template to report results from different prompts. In this experiment we'd like to test the Google PaLM codechat-bison-001 model.
Template
For this prompt with these settings:
- AI Model:
- Temperature:
- Token limit:
- Top-K:
- Top-P:
Description: [please describe why you used this model and chose these parameters].
Prompts
Please list the different prompts that you used and why you decided to adjust them.
Final prompt
Include the prompt you used for testing here.
Responses
Please copy and paste the following template for your testing.
### DATA START ###
### DATA END ###
## Response 1
### Analysis
Conclusion
Please include your thoughts on this experiment and fill out the table below.
| Link to analysis | text-bison@001 | ChatGPT-3.5 | ChatGPT-4 |
|---|---|---|---|
| 1 | Poor response | Poor response | Great response |
| 2 | Good response | Good response | Great response |
| 3 | Good response | Good response | Great response |
| 4 | Good response | Good response | Great response |
| 5 | Good response | Good response | Good response |
| 6 | Good response | Good response | Good response |
General Concerns
- The max token limit is 1024 tokens, which is roughly 800 words. Occasionally a response will exceed the token limit and be cut off. There is an MR open for discussion to raise the limit to 2048: Increase the Vertex AI token limit to the new 2... (!122864 - merged). This should substantially mitigate issues regarding prompt length.
- Sometimes the AI will respond with "I'm not able to help with that", but re-running the same prompt can yield an answer the second time. It's tripping on the content filter, though it's unknown what triggers it. We can only set the content filter to "Block few" as a minimum; we cannot disable it completely. Edit: it turns out this is only a frontend check in the GCP playground, so it doesn't affect us. See this comment: #412538 (comment 1425858730)
- With the prompt that's currently used on production, the model will sometimes ignore the title, description, and identifiers, and instead look only at the source code, inferring the vulnerability from it alone. This frequently causes it to explain a different vulnerability than the one the scanner picked up. We can update the prompt so that the model has an easier time parsing it, and I suggested one here.
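As a side note on the token-limit concern above, a prompt-size pre-check can catch truncation before a request is sent. This is a minimal sketch using a rough 4-characters-per-token heuristic, not the actual Vertex AI tokenizer; the 512-token response reservation and the helper names are illustrative assumptions, so calibrate against real model usage before relying on it.

```python
# Rough pre-flight check for prompt size before calling the model.
# The ~4 characters-per-token ratio is a common heuristic for English
# text, NOT the real Vertex AI tokenizer, so treat results as estimates.

MAX_TOKENS = 1024       # current Vertex AI limit discussed above
CHARS_PER_TOKEN = 4     # rough heuristic (assumption)


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` via a character-count heuristic."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_budget(prompt: str, reserved_for_response: int = 512) -> bool:
    """Return True if the prompt likely leaves `reserved_for_response`
    tokens of headroom for the model's answer within MAX_TOKENS."""
    return estimate_tokens(prompt) + reserved_for_response <= MAX_TOKENS


if __name__ == "__main__":
    prompt = "Explain the vulnerability reported by the scanner ..."
    print(estimate_tokens(prompt), fits_in_budget(prompt))
```

A check like this could flag prompts that are likely to produce cut-off responses under the 1024-token limit, rather than discovering the truncation after the fact.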
What's next?
With some prompt modification, we were able to get some good responses. Detailed analysis is included in the links in the table above. At this time (Friday, June 9th) we are waiting for access to the Sec-Palm2 model to see if it yields better results. Once we have access to the Sec-Palm2 model, we will commence testing using Explain this Vulnerability: Prompt / Response e... (#414871 - closed).