feat: add project deletion for chunk mode

Description

This adds project deletion to the chunk mode indexer.

Main changes:

  • New delete project feature: Added the ability to remove all indexed content for a specific project.
  • Introduce operation types: The indexer now supports different operations (index files or delete project) through a new "operation" parameter. If no operation is specified, it defaults to the existing indexing behavior, which maintains backward compatibility.
  • Improve documentation: Updated the README to explain the available operations and command-line options.
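
To illustrate the backward-compatible default described above, here is a minimal sketch of how the "operation" parameter could be parsed. It is not the code from this MR: the `options` struct, `parseOptions`, and the `"index"` value for the default operation are hypothetical names chosen for the example. Only `"delete_project"` and the JSON field names are taken from the options payload used in testing.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Operation names. "delete_project" is the value used in this MR;
// "index" is a hypothetical name for the default indexing operation.
const (
	opIndex         = "index"
	opDeleteProject = "delete_project"
)

// options mirrors a subset of the -options JSON payload (hypothetical struct).
type options struct {
	ProjectID       int64  `json:"project_id"`
	Operation       string `json:"operation"`
	PartitionName   string `json:"partition_name"`
	PartitionNumber int    `json:"partition_number"`
}

// parseOptions decodes the payload and falls back to indexing when
// "operation" is absent, so callers that never set it keep working.
func parseOptions(raw []byte) (options, error) {
	var o options
	if err := json.Unmarshal(raw, &o); err != nil {
		return o, fmt.Errorf("parse options: %w", err)
	}
	if o.Operation == "" {
		o.Operation = opIndex
	}
	switch o.Operation {
	case opIndex, opDeleteProject:
		return o, nil
	default:
		return o, fmt.Errorf("unknown operation %q", o.Operation)
	}
}
```

Rejecting unknown operation names, rather than silently treating them as indexing, is a design choice of this sketch: a typo in a delete request then fails loudly instead of triggering an index run.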

Testing

I tested this by indexing two projects, deleting one of them, and then checking the indexed data and counts.

Rails console

p = Project.find_by_full_path('gitlab-org/gitlab-test')
p.repository.relative_path
=> "@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git"
p = Project.find_by_full_path('flightjs/Flight')
p.repository.relative_path
=> "@hashed/79/02/7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451.git"

Running the indexer from the command line (index both projects, then delete project 2)

make && GITLAB_INDEXER_MODE=chunk GITLAB_INDEXER_DEBUG_LOGGING=1 ./bin/gitlab-elasticsearch-indexer -adapter "elasticsearch" -connection '{"url": ["http://localhost:9200"]}' -options '{
  "timeout": "30m",
  "chunk_size": 1000,
  "gitaly_batch_size": 1000,
  "from_sha": "",
  "to_sha": "",
  "project_id": 1,
  "partition_name": "gitlab_active_context_code",
  "partition_number": 0,
  "gitaly_config": {
    "address": "unix:/Users/arturo/projects/gdk/praefect.socket",
    "storage": "default",
    "relative_path": "@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git",
    "project_path": "gitlab-org/gitlab-test"
  }
}'
make && GITLAB_INDEXER_MODE=chunk GITLAB_INDEXER_DEBUG_LOGGING=1 ./bin/gitlab-elasticsearch-indexer -adapter "elasticsearch" -connection '{"url": ["http://localhost:9200"]}' -options '{
  "timeout": "30m",
  "chunk_size": 1000,
  "gitaly_batch_size": 1000,
  "from_sha": "",
  "to_sha": "",
  "project_id": 2,
  "partition_name": "gitlab_active_context_code",
  "partition_number": 0,
  "gitaly_config": {
    "address": "unix:/Users/arturo/projects/gdk/praefect.socket",
    "storage": "default",
    "relative_path": "@hashed/79/02/7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451.git",
    "project_path": "flightjs/Flight"
  }
}'
make && GITLAB_INDEXER_MODE=chunk GITLAB_INDEXER_DEBUG_LOGGING=1 ./bin/gitlab-elasticsearch-indexer -adapter "elasticsearch" -connection '{"url": ["http://localhost:9200"]}' -options '{
  "project_id": 2,
  "operation": "delete_project",
  "partition_name": "gitlab_active_context_code",
  "partition_number": 0,
  "timeout": "5m"
}'

Kibana

http://localhost:5601/app/dev_tools#/console

GET gitlab_active_context_code/_count
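
The same count check can be scripted so that the numbers before and after the delete are easy to compare. The following is a minimal sketch that assumes the local Elasticsearch URL from the commands above. `parseCount` and `fetchCount` are hypothetical helpers written for this example, not part of the indexer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// parseCount extracts the "count" field from an Elasticsearch _count response body.
func parseCount(body []byte) (int64, error) {
	var resp struct {
		Count *int64 `json:"count"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, fmt.Errorf("decode _count response: %w", err)
	}
	if resp.Count == nil {
		return 0, fmt.Errorf("response has no count field")
	}
	return *resp.Count, nil
}

// fetchCount issues GET <baseURL>/<index>/_count and returns the document count.
func fetchCount(baseURL, index string) (int64, error) {
	res, err := http.Get(baseURL + "/" + index + "/_count")
	if err != nil {
		return 0, err
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return 0, err
	}
	if res.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %d: %s", res.StatusCode, body)
	}
	return parseCount(body)
}
```

Calling `fetchCount("http://localhost:9200", "gitlab_active_context_code")` once after indexing both projects and once after the delete should show the count dropping by the number of documents that belonged to project 2.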

Reference: https://gitlab.com/gitlab-org/gitlab/-/blob/master/gems/gitlab-active-context/doc/code_embeddings_indexing_pipeline.md#alternative-manual-indexing-with-gitlab-elasticsearch-indexer

Related to #158 (closed)
