feat: AI-Assisted Code Suggestions: prompt collection
Problem to solve
The "data", in this case code, that is sent to the Code Suggestions backend is called the prompt. The prompt is used to make a prediction. Currently the whole document above the cursor is sent to the Code Suggestions backend. This is not desirable: it can be too much data (tokens) for the model, and much of it may not even be relevant. Some smarter logic needs to be applied, e.g. sending only the last N lines along with the top N lines, the language of the file, the file name, etc. The quality of the prompt greatly influences the quality of the predictions.
Proposal
Implement variable logic to send a better prompt to the API. Ideally this would include multivariate testing, but for now simply sending a better prompt than the whole document would already be a great improvement.
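As a starting point, the "last N lines plus top N lines" idea from the problem statement could look something like the sketch below. The function name and the line limit are illustrative, not part of any existing API; the real implementation would likely count tokens rather than lines.

```python
MAX_CONTEXT_LINES = 50  # illustrative limit, not a tuned value


def truncate_prompt(content_above_cursor: str, n: int = MAX_CONTEXT_LINES) -> str:
    """Keep the first n and last n lines of the content above the cursor.

    If the document fits within 2 * n lines, send it unchanged; otherwise
    drop the middle so the model sees the file header (imports, class
    definitions) and the code immediately preceding the cursor.
    """
    lines = content_above_cursor.splitlines()
    if len(lines) <= 2 * n:
        return content_above_cursor
    return "\n".join(lines[:n] + lines[-n:])
```

A token-based variant would apply the same top/bottom split but measure the budget with the model's tokenizer instead of a line count.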
Also implement a new data structure to allow the backend to further process the prompt based on additional logic. Proposed data structure:
Prompt version 0
content_above_cursor
.....
Prompt version 1
{
"prompt_version": 1,
"project_name": "awesome_project",
"project_id": 14022,
"current_file": {
"file_name": "main.py",
"content_above_cursor": "",
"content_below_cursor": ""
}
}
Prompt version 2
{
"prompt_version": 2,
"project_name": "awesome_project",
"project_id": 14022,
"current_file": {
"file_name": "main.py",
"content_above_cursor": "",
"content_below_cursor": ""
},
"classes": ["A", "B"],
"additional_files": [
{
"file_name": "requirements.txt",
"content": ""
},
{
"file_name": "README.md",
"content": ""
}
]
}
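On the client side, assembling the version 2 payload could be a small helper like the sketch below. The field names follow the proposed data structure above; the helper function itself (`build_prompt_v2`) and its defaults are illustrative assumptions, not an existing interface.

```python
import json


def build_prompt_v2(project_name, project_id, file_name,
                    content_above_cursor, content_below_cursor,
                    classes=None, additional_files=None):
    # Assemble the version 2 payload from the proposed data structure.
    # classes and additional_files are optional extras the backend can
    # use for further processing.
    return {
        "prompt_version": 2,
        "project_name": project_name,
        "project_id": project_id,
        "current_file": {
            "file_name": file_name,
            "content_above_cursor": content_above_cursor,
            "content_below_cursor": content_below_cursor,
        },
        "classes": classes or [],
        "additional_files": additional_files or [],
    }


payload = build_prompt_v2("awesome_project", 14022, "main.py",
                          "def main():\n", "")
print(json.dumps(payload, indent=2))
```

Because `prompt_version` is carried in the payload itself, the backend can branch on it and keep accepting older clients while newer prompt logic rolls out.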