Draft: PoC of PG for XRay embeddings

Mikołaj Wawrzyniak requested to merge mwaw/poc_x_ray_embeddings into master

What does this MR do and why?

This MR contains a PoC of semantic search over Repository X Ray data for code generation requests.

How to set up and validate locally

In theory, embeddings should be created when an X Ray report from a CI job is stored. However, to avoid the hassle of setting up the whole X Ray pipeline just to verify this, one can trigger embeddings generation for an existing X Ray report record with:

service = nil
xray_report = Projects::XrayReport.first
service.send(:create_embeddings, xray_report.payload,

Once that is done, one can trigger a code generation request via curl:

  curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $GDK_API_TOKEN" http://gdk.test:3000/api/v4/code_suggestions/completions --data '{
    "current_file": {
      "file_name": "utils.rb",
      "content_above_cursor": "# generate function that fetches most popular app from App Store",
      "content_below_cursor": ""
    },
    "project_path": "gitlab-org/mw_test",
    "intent": "generation"
  }'

It is important that the params are supplied as follows:

  1. file_name points to a file whose extension matches the language of the Projects::XrayReport.first record that was used to generate embeddings
  2. content_above_cursor should contain a code comment with an instruction describing what needs to be generated, preferably aligned with the libraries listed in Projects::XrayReport.first
  3. project_path should match Projects::XrayReport.first.project.full_path
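
The three conditions above can be checked before the request is sent. The following is a minimal sketch in plain Ruby and is not part of this MR: `build_completion_payload` and the `EXTENSIONS` table are hypothetical names, and the language-to-extension mapping is an assumption made for illustration. In a Rails console the language and project path would be read from the report record instead of being passed as literals.

```ruby
require "json"

# Hypothetical mapping from an X Ray report language to a file extension.
# This table is an assumption for illustration, not taken from the MR.
EXTENSIONS = { "ruby" => ".rb", "python" => ".py", "go" => ".go" }.freeze

# Builds the JSON body for the completions request shown above.
# Raises ArgumentError if file_name does not match the report language.
def build_completion_payload(report_lang:, project_path:, file_name:, content_above_cursor:)
  ext = EXTENSIONS.fetch(report_lang) { raise ArgumentError, "unknown language: #{report_lang}" }
  raise ArgumentError, "#{file_name} does not end with #{ext}" unless file_name.end_with?(ext)

  {
    current_file: {
      file_name: file_name,
      content_above_cursor: content_above_cursor,
      content_below_cursor: ""
    },
    project_path: project_path,
    intent: "generation"
  }.to_json
end

payload = build_completion_payload(
  report_lang: "ruby",
  project_path: "gitlab-org/mw_test",
  file_name: "utils.rb",
  content_above_cursor: "# generate function that fetches most popular app from App Store"
)
puts payload
```

The printed string can be passed to curl with `--data`, which avoids hand-editing nested JSON on the command line.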

If the given conditions are met, one should see a list of up to 5 relevant libraries from Projects::XrayReport.first in the code generation prompt logged on AI Gateway:

{"prompt": "Human: You are a tremendously .....<libs>\napp_store_connect: Ruby client for the App Store Connect API.\natlassian-jwt (~> 0.2.0): Encode and decode JWTs\nactiverecord-gitlab!: Fork of activerecord gem for GitLab.\narr-pm (~> 0.0.12): Arr helps you manage your Ruby app dependencies.\nattr_encrypted (~> 3.2.4)!: Generates attr_accessors that encrypt and decrypt attributes transparently.\n</libs>\nThe list of libraries relevant to your current task is provided in <libs></libs> tags.\nPrioritise using available libraries over writing your own code.....
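
To check the logged prompt without reading it by eye, the library list can be pulled out of the `<libs>` tags. This is a minimal sketch, assuming the log line has already been JSON-decoded so that `\n` sequences are real newlines; `libs_in_prompt` is a hypothetical helper and the sample prompt is shortened from the log above.

```ruby
# Returns the library lines found between <libs> and </libs> in a prompt string,
# or an empty array when no such block is present.
def libs_in_prompt(prompt)
  block = prompt[%r{<libs>\n(.*?)\n?</libs>}m, 1]
  block ? block.lines.map(&:strip).reject(&:empty?) : []
end

# Shortened sample based on the logged prompt above.
prompt = "Human: ...<libs>\n" \
         "app_store_connect: Ruby client for the App Store Connect API.\n" \
         "atlassian-jwt (~> 0.2.0): Encode and decode JWTs\n" \
         "</libs>\n" \
         "The list of libraries relevant to your current task is provided in <libs></libs> tags.\n"

libs = libs_in_prompt(prompt)
puts libs
puts "library count within limit" if libs.size <= 5
```

The pattern requires a newline after the opening tag, so the later inline mention of `<libs></libs>` in the prompt text is not picked up as a second list.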
