Optimize the Code Generation prompts

Problem

We're currently constructing a prompt for Code Generation in GitLab-Rails. Here is the source code https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/code_suggestions/prompts/code_generation/anthropic.rb. CodeSuggestions::Prompts::CodeGeneration::Anthropic.

After investigating the actually generated prompt, it looks like this:

curl -X POST -H "Content-Type: application/json" -H "Private-token: [REDACTEd]" -d '{
  "current_file": {
    "file_name": "test.py",
    "content_above_cursor": "def",
    "content_below_cursor": "print"
  },
  "intent": "generation"
}' http://gdk.test:3000/api/v4/code_suggestions/completions

Actual prompt that is passed to Anthropic claude-2.0:

You are a coding autocomplete agent. We want to generate new Python code inside the
file 'test.py' based on instructions from the user.
Here are a few examples of successfully generated code by other autocomplete agents:

<examples>

  <example>
    H: <existing_code>
         class Project:
  def __init__(self, name, public):
    self.name = name
    self.visibility = 'PUBLIC' if public

    # is this project public?
<cursor>

    # print name of this project
       </existing_code>

    A: <new_code>def is_public(self):
  self.visibility == 'PUBLIC'</new_code>
  </example>

  <example>
    H: <existing_code>
         # get the current user's name from the session data
def get_user(session):
<cursor>

# is the current user an admin
       </existing_code>

    A: <new_code>username = None
if 'username' in session:
  username = session['username']
return username</new_code>
  </example>

</examples>

<existing_code>
def<cursor>print
</existing_code>

The existing code is provided in <existing_code></existing_code> tags.
The new code you will generate will start at the position of the cursor, which is currently indicated by the <cursor> XML tag.
In your process, first, review the existing code to understand its logic and format. Then, try to determine the most
likely new code to generate at the cursor position to fulfill the instructions.

When generating the new code, please ensure the following:
1. It is valid Python code.
2. It matches the existing code's variable, parameter and function names.
3. It does not repeat any existing code. Do not repeat code that comes before or after the cursor tags. This includes cases where the cursor is in the middle of a word.
4. If the cursor is in the middle of a word, it finishes the word instead of repeating code before the cursor tag.

Return new code enclosed in <new_code></new_code> tags. We will then insert this at the <cursor> position.
If you are not able to write code based on the given instructions return an empty result like <new_code></new_code>.

Human: Create more new code for this file. If the cursor is inside an empty function,
generate its most likely contents based on the function name and signature.


Assistant: <new_code>

This seems not following the guideline provided by the Anthropic. https://docs.anthropic.com/claude/docs/introduction-to-prompt-design#human--assistant-formatting. Especially, around this line:

If you send the same prompt to the API, it may behave in unexpected ways, like making up answers well beyond what was asked for in the prompt. This is because Claude is trained to fill in text for the Assistant role as part of an ongoing dialogue between a human user (Human:) and an AI assistant (Assistant:). Without this structure, Claude doesn't know what to do or when to stop, so it just keeps on going with the arc that's already present.

Benchmark with the current prompt

I created a benchmark notebook to measure how the prompt is performing, and here is the result:

----------------------------------------------------------------------------------------------------
duration: 6.646654365999893 ret:  Here is the generated code based on the instructions and existing code:

<new_code>
print()
</new_code>

I reviewed the existing code and saw there was an empty print function defined. Based on the function name print, the most likely implementation is to call print() to print output. I generated a print() statement inside the function body to fulfill the instructions.
----------------------------------------------------------------------------------------------------
duration: 2.1401718839997557 ret:  <new_code>
print("Hello World!")
</new_code>
----------------------------------------------------------------------------------------------------
duration: 5.952505108999503 ret:  <existing_code>
def print<cursor>
</existing_code>

Based on the existing code, it looks like the function name is `print`. Since there are no parameters defined, this is likely a simple print function to output some text.

<new_code>
  print("Hello World!")
</new_code>
----------------------------------------------------------------------------------------------------
duration: 5.352312639000047 ret:  <existing_code>
def print<cursor>
</existing_code>

Based on the existing code, it looks like the function name is `print`. Since there are no parameters defined, this is likely a simple print function to output some text.

<new_code>
  print("Hello World!")
</new_code>
----------------------------------------------------------------------------------------------------
duration: 7.16707612799928 ret:  Here is the generated code based on the instructions and existing code:

<new_code>
print('Hello World!')
</new_code>

I generated a simple print statement inside the empty print function to print "Hello World!". This matches common practice for a first print statement and fulfills the request to generate likely function contents based on the name and signature.
----------------------------------------------------------------------------------------------------
duration: 7.296855176999998 ret:  <existing_code>
def print<cursor>
</existing_code>

Based on the existing code, it looks like the function name is `print`. Since there are no parameters defined, this is likely a simple print function to output something.

<new_code>
  print('Hello World!')
</new_code>
----------------------------------------------------------------------------------------------------
duration: 8.583855251999921 ret:  Here is the generated code:

<new_code>
print('Hello World!')
</new_code>

I generated a simple print statement inside the empty print function, printing a hello world message. This matches common practice for empty functions in Python.
----------------------------------------------------------------------------------------------------
duration: 6.204560639999727 ret:  <existing_code>
def print<cursor>
</existing_code>

Based on the existing code, it looks like the function name is `print`. Since there are no parameters defined, this is likely a simple print function to output some text.

<new_code>
  print("Hello World!")
</new_code>
----------------------------------------------------------------------------------------------------
duration: 6.713770831999682 ret:  <existing_code>
def print<cursor>
</existing_code>

Based on the existing code, it looks like the function name is `print`. Since there are no parameters defined, this is likely a simple print function to output something.

<new_code>
  print("Hello World!")
</new_code>

As you can see that the duration is very slow. Most of the requests exceed 5 sec.

Benchmark with the proposed prompt

When we adjust the above prompt with the following way, it produced the following result:

new prompt

Human: You are a coding autocomplete agent. We want to generate new Python code inside the
file 'test.py' based on instructions from the user.
Here are a few examples of successfully generated code by other autocomplete agents:

<examples>

  <example>
    H: <existing_code>
         class Project:
  def __init__(self, name, public):
    self.name = name
    self.visibility = 'PUBLIC' if public

    # is this project public?
<cursor>

    # print name of this project
       </existing_code>

    A: <new_code>def is_public(self):
  self.visibility == 'PUBLIC'</new_code>
  </example>

  <example>
    H: <existing_code>
         # get the current user's name from the session data
def get_user(session):
<cursor>

# is the current user an admin
       </existing_code>

    A: <new_code>username = None
if 'username' in session:
  username = session['username']
return username</new_code>
  </example>

</examples>

<existing_code>
def<cursor>print
</existing_code>

The existing code is provided in <existing_code></existing_code> tags.
The new code you will generate will start at the position of the cursor, which is currently indicated by the <cursor> XML tag.
In your process, first, review the existing code to understand its logic and format. Then, try to determine the most
likely new code to generate at the cursor position to fulfill the instructions.

When generating the new code, please ensure the following:
1. It is valid Python code.
2. It matches the existing code's variable, parameter and function names.
3. It does not repeat any existing code. Do not repeat code that comes before or after the cursor tags. This includes cases where the cursor is in the middle of a word.
4. If the cursor is in the middle of a word, it finishes the word instead of repeating code before the cursor tag.

Return new code enclosed in <new_code></new_code> tags. We will then insert this at the <cursor> position.
If you are not able to write code based on the given instructions return an empty result like <new_code></new_code>.

Create more new code for this file. If the cursor is inside an empty function,
generate its most likely contents based on the function name and signature.

Assistant:


----------------------------------------------------------------------------------------------------
duration: 2.3764954670004954 ret:  <new_code>
print('Hello World!')
</new_code>
----------------------------------------------------------------------------------------------------
duration: 1.7661526849997244 ret:  <new_code>
print()
</new_code>
----------------------------------------------------------------------------------------------------
duration: 2.9891389329995945 ret:  <new_code>
print('Hello World!')
</new_code>
----------------------------------------------------------------------------------------------------
duration: 2.380068671999652 ret:  <new_code>
print()
</new_code>
----------------------------------------------------------------------------------------------------
duration: 3.1554792690003524 ret:  <new_code>
print('Hello World!')
</new_code>
----------------------------------------------------------------------------------------------------
duration: 5.046470937999402 ret:  <new_code>
print()
</new_code>
----------------------------------------------------------------------------------------------------
duration: 2.443901726000149 ret:  <new_code>
print('Hello World!')
</new_code>
----------------------------------------------------------------------------------------------------
duration: 2.0772542719996636 ret:  <new_code>
print()
</new_code>
----------------------------------------------------------------------------------------------------
duration: 2.63458646700019 ret:  <new_code>
print('Hello World!')
</new_code>

As compared to the previous result, most of the duration was reduced into half.

Edited Dec 04, 2023 by Shinya Maeda