
Draft: PoC of PG for XRay embeddings

Mikołaj Wawrzyniak requested to merge mwaw/poc_x_ray_embeddings into master

What does this MR do and why?

This MR contains a PoC of semantic search over Repository X Ray data for code generation requests.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

demo

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

In theory, embeddings should be created when an X Ray report from a CI job is stored. However, to avoid the hassle of setting up the whole X Ray pipeline just to verify this, one can trigger embeddings generation for an existing X Ray report record with:

service = Ai::StoreRepositoryXrayService.new nil
xray_report = Projects::XrayReport.first
service.send(:create_embeddings, xray_report.payload, xray_report.id)

Once that is done, one can trigger a code generation request via curl:

  curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $GDK_API_TOKEN" http://gdk.test:3000/api/v4/code_suggestions/completions --data '{   "current_file": {
      "file_name": "utils.rb",
      "content_above_cursor": "# generate function that fetches most popular app from App Store",
      "content_below_cursor": ""
    },
    "project_path": "gitlab-org/mw_test",
    "intent": "generation"
  }'

It is important that the params are supplied as follows:

  1. file_name points to a file whose extension matches the language of the Projects::XrayReport.first record that was used to generate the embeddings
  2. content_above_cursor contains a code comment with an instruction describing what needs to be generated, ideally something the libraries listed in Projects::XrayReport.first can help with
  3. project_path matches Projects::XrayReport.first.project.full_path
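The three conditions above can be checked before sending the request. The following is a minimal sketch in plain Ruby, not code from this MR: `build_generation_payload` and the `LANGUAGE_EXTENSIONS` mapping are hypothetical helpers introduced for illustration, and the report language and project path are passed in as plain strings instead of being read from `Projects::XrayReport`.

```ruby
require "json"

# Hypothetical mapping from an X Ray report language to file extensions.
# Illustrative only; the real mapping used by GitLab is not shown in this MR.
LANGUAGE_EXTENSIONS = {
  "ruby" => [".rb"],
  "python" => [".py"],
  "go" => [".go"]
}.freeze

# Builds the JSON body for the completions request and raises ArgumentError
# when file_name or project_path would not match the X Ray report.
def build_generation_payload(file_name:, instruction:, project_path:,
                             report_language:, report_project_path:)
  extensions = LANGUAGE_EXTENSIONS.fetch(report_language) do
    raise ArgumentError, "unknown report language: #{report_language}"
  end

  unless extensions.include?(File.extname(file_name))
    raise ArgumentError, "#{file_name} does not match language #{report_language}"
  end

  unless project_path == report_project_path
    raise ArgumentError, "project_path must be #{report_project_path}"
  end

  JSON.generate(
    current_file: {
      file_name: file_name,
      content_above_cursor: instruction, # code comment describing what to generate
      content_below_cursor: ""
    },
    project_path: project_path,
    intent: "generation"
  )
end
```

In a Rails console the last two arguments could presumably be taken from the report record itself (its language and `project.full_path`), and the returned string passed to curl via `--data`.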

If the given conditions are met, one should see a list of up to 5 relevant libraries from Projects::XrayReport.first in the code generation prompt logged on the AI Gateway:

{"prompt": "Human: You are a tremendously .....<libs>\napp_store_connect: Ruby client for the App Store Connect API.\natlassian-jwt (~> 0.2.0): Encode and decode JWTs\nactiverecord-gitlab!: Fork of activerecord gem for GitLab.\narr-pm (~> 0.0.12): Arr helps you manage your Ruby app dependencies.\nattr_encrypted (~> 3.2.4)!: Generates attr_accessors that encrypt and decrypt attributes transparently.\n</libs>\nThe list of libraries relevant to your current task is provided in <libs></libs> tags.\nPrioritise using available libraries over writing your own code.....
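To illustrate how a list like the one in the `<libs>` tags could be produced, here is a minimal in-memory sketch of semantic search in plain Ruby. It is not the implementation in this MR, which stores embeddings in PG: the helpers `cosine_similarity`, `top_libraries`, and `libs_block` are hypothetical, and ranking by cosine similarity is an assumption about the distance measure. Embeddings are passed in as plain arrays of floats, and version constraints are left out of the formatted lines.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Returns up to `limit` libraries, most similar to the query embedding first.
# Each library is a hash with :name, :description and :embedding keys
# (a hypothetical shape chosen for this sketch).
def top_libraries(query_embedding, libraries, limit: 5)
  libraries.max_by(limit) do |lib|
    cosine_similarity(query_embedding, lib[:embedding])
  end
end

# Formats the selected libraries the way they appear in the logged prompt,
# one "name: description" line per library inside <libs></libs> tags.
def libs_block(libraries)
  lines = libraries.map { |lib| "#{lib[:name]}: #{lib[:description]}" }
  "<libs>\n#{lines.join("\n")}\n</libs>"
end
```

The query embedding would be computed from the instruction in content_above_cursor, and the result of `libs_block` inserted into the prompt. With a vector extension in PG, the ranking step would normally be done in the database query and not in Ruby.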
Edited by Mikołaj Wawrzyniak
