Implement Dataflow pipeline to get code embeddings

Alexander Chueshev requested to merge embeddings-gitlab-codebase into main

This MR adds a Dataflow pipeline, runnable locally or remotely, that computes embeddings for the code completions obtained in https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/prompt-library/-/merge_requests/11.

README.md describes the steps required to run this pipeline.

  • Input BigQuery dataset (sample) - ref
  • Output BigQuery dataset (sample) - ref

Closes https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/184
