Add REST API endpoint for semantic code search (Beta)

Summary

Adds a Beta REST API endpoint for semantic code search, making the feature available to CLI agents, CI/CD pipelines, scripts, and compliance-restricted customers who cannot use the MCP server.

GET /api/v4/projects/:id/search/semantic
GET /api/v4/projects/:id/-/search/semantic

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string/integer | yes | - | Project ID or URL-encoded path |
| q | string | yes | - | Natural language search query |
| directory_path | string | no | nil | Restrict to files under this path |
| knn | integer | no | 10 | Nearest neighbours (1–100) |
| limit | integer | no | 10 | Max results (1–100) |
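
As a rough client-side illustration of the parameters above, the sketch below assembles a request URL and enforces the documented ranges before sending anything. The helper name `semantic_search_url` and the host `gitlab.example.com` are hypothetical and not part of this MR; the server performs its own validation regardless.

```ruby
require "uri"

# Hypothetical helper (not part of the MR): builds the request URL for the
# semantic search endpoint from the documented parameters.
def semantic_search_url(base_url, project, q:, directory_path: nil, knn: 10, limit: 10)
  raise ArgumentError, "q is required" if q.to_s.strip.empty?
  raise ArgumentError, "knn must be within 1..100" unless (1..100).cover?(knn)
  raise ArgumentError, "limit must be within 1..100" unless (1..100).cover?(limit)

  # A project path such as "group/project" must be URL-encoded ("group%2Fproject").
  encoded_project = URI.encode_www_form_component(project.to_s)

  params = { q: q, knn: knn, limit: limit }
  params[:directory_path] = directory_path if directory_path

  "#{base_url}/api/v4/projects/#{encoded_project}/search/semantic?#{URI.encode_www_form(params)}"
end

puts semantic_search_url("https://gitlab.example.com", "group/project",
                         q: "rate limiting", directory_path: "lib/", limit: 5)
```

Checking the ranges locally only spares a round trip for obviously invalid input.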

Response (200):

Results are grouped by file. Each file entry carries a relevance score and its matching snippets, with adjacent line ranges merged. The top-level confidence field summarizes the score distribution across all results.

{
  "confidence": "high",
  "results": [
    {
      "path": "lib/gitlab/rack_attack.rb",
      "blob_id": "a1b2c3d4",
      "score": 0.9231,
      "snippet_ranges": [
        {
          "start_line": 88,
          "end_line": 90,
          "content": "throttle('api', limit: 600, period: 1.minute) do |req|\n  req.ip if req.api_request?\nend",
          "score": 0.9231
        },
        {
          "start_line": 110,
          "end_line": 112,
          "content": "throttle('unauthenticated', limit: 3600, period: 1.hour) do |req|\n  req.ip unless req.authenticated_user_id([:api])\nend",
          "score": 0.8540
        }
      ]
    },
    {
      "path": "lib/gitlab/application_rate_limiter.rb",
      "blob_id": "e5f6a7b8",
      "score": 0.8107,
      "snippet_ranges": [
        {
          "start_line": 42,
          "end_line": 45,
          "content": "def self.throttled?(key, scope:, threshold:, interval:)\n  cache_key = build_key(key, scope)\n  ...\nend",
          "score": 0.8107
        }
      ]
    }
  ]
}
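
For a script or agent consuming this payload, one possible way to read it is to flatten the per-file grouping into one row per snippet. This is only a sketch: `flatten_snippets` is a hypothetical helper, and the sample body is an abbreviated copy of the response above with the snippet content shortened.

```ruby
require "json"

# Hypothetical helper: flattens the grouped response into one row per snippet.
def flatten_snippets(body)
  data = JSON.parse(body)
  rows = data.fetch("results", []).flat_map do |file|
    file.fetch("snippet_ranges", []).map do |range|
      {
        path: file["path"],
        lines: "#{range['start_line']}-#{range['end_line']}",
        score: range["score"]
      }
    end
  end
  { confidence: data["confidence"], snippets: rows }
end

# Abbreviated copy of the documented response; "content" is shortened here.
sample = <<~SAMPLE
  {
    "confidence": "high",
    "results": [
      {
        "path": "lib/gitlab/rack_attack.rb",
        "score": 0.9231,
        "snippet_ranges": [
          { "start_line": 88, "end_line": 90, "content": "...", "score": 0.9231 },
          { "start_line": 110, "end_line": 112, "content": "...", "score": 0.8540 }
        ]
      }
    ]
  }
SAMPLE

summary = flatten_snippets(sample)
puts "confidence: #{summary[:confidence]}"
summary[:snippets].each { |s| puts "#{s[:path]}:#{s[:lines]} (#{s[:score]})" }
```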

Confidence levels: high (strong top score with steep drop-off), medium (reasonable scores, no clear winner), low (weak scores), unknown (no scores available).
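
To make the four levels concrete, here is a purely illustrative classifier. The thresholds (`HIGH_SCORE`, `DROP_OFF`, `LOW_SCORE`) and the method name `sketch_confidence` are assumptions made up for this example; they are not the values or logic used by compute_confidence_level in this MR.

```ruby
# Illustrative only: the thresholds below are assumptions, not the values
# used by compute_confidence_level in the MR.
HIGH_SCORE = 0.85 # assumed "strong" top score
DROP_OFF   = 0.10 # assumed "steep" gap between the first and second score
LOW_SCORE  = 0.50 # assumed floor below which scores count as weak

def sketch_confidence(scores)
  return "unknown" if scores.empty?

  sorted = scores.sort.reverse
  top = sorted.first
  # With a single result, treat the whole score as the gap.
  gap = sorted.length > 1 ? top - sorted[1] : top

  return "low" if top < LOW_SCORE
  return "high" if top >= HIGH_SCORE && gap >= DROP_OFF

  "medium"
end

puts sketch_confidence([0.9231, 0.8107])
puts sketch_confidence([0.70, 0.69])
puts sketch_confidence([])
```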

Error responses:

| Status | Condition |
|---|---|
| 401 | Not authenticated |
| 403 | User lacks read_code on project |
| 404 | Project not found |
| 422 | Project has no embeddings / not eligible |
| 429 | Rate limited |
| 503 | Semantic search service unavailable |
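
A caller such as a CI script might map these statuses to a coarse next step. The mapping below is a hypothetical client-side policy, not behaviour defined by this MR; in particular, treating 429 and 503 as retryable is an assumption.

```ruby
# Hypothetical client-side policy; the chosen actions are assumptions.
def action_for_status(status)
  case status
  when 200 then :ok
  when 401 then :reauthenticate    # token missing or invalid
  when 403 then :check_permissions # user lacks read_code on the project
  when 404 then :check_project     # wrong project ID or path
  when 422 then :skip              # project has no embeddings or is not eligible
  when 429, 503 then :retry_later  # rate limited or service unavailable
  else :fail
  end
end

[200, 422, 429].each { |status| puts "#{status} -> #{action_for_status(status)}" }
```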

Implementation notes:

  • Backed by Ai::ActiveContext::Queries::Code — same service as the MCP tool
  • Post-processing via Ai::ActiveContext::Concerns::CodePostProcessing (shared with MCP tool): filter_excluded_results, group_results_by_file, compute_confidence_level
  • Uses search_rate_limit (consistent with existing search endpoints, includes allowlist support)
  • Marked Beta via route_setting :lifecycle, :beta
  • ai_workflows PAT scope accepted for GET requests
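
The grouping and range-merging step named in the notes above can be pictured roughly as follows. This is an illustrative sketch, not the code in Ai::ActiveContext::Concerns::CodePostProcessing: the method name `group_and_merge`, the input shape (flat hits with `:path`, `:start_line`, `:end_line`, `:score`), and the choice to keep the maximum score when merging are all assumptions.

```ruby
# Illustrative only: not the implementation shipped in this MR.
# Assumed input: flat hits with :path, :start_line, :end_line, :score.
def group_and_merge(hits)
  files = hits.group_by { |h| h[:path] }.map do |path, file_hits|
    merged = file_hits.sort_by { |h| h[:start_line] }.each_with_object([]) do |hit, ranges|
      last = ranges.last
      if last && hit[:start_line] <= last[:end_line] + 1
        # Adjacent or overlapping: extend the previous range, keep the best score.
        last[:end_line] = [last[:end_line], hit[:end_line]].max
        last[:score] = [last[:score], hit[:score]].max
      else
        ranges << { start_line: hit[:start_line], end_line: hit[:end_line], score: hit[:score] }
      end
    end
    # Assumption: a file's score is the best score among its ranges.
    { path: path, score: merged.map { |r| r[:score] }.max, snippet_ranges: merged }
  end
  files.sort_by { |file| -file[:score] }
end

hits = [
  { path: "a.rb", start_line: 1,  end_line: 3,  score: 0.5 },
  { path: "a.rb", start_line: 4,  end_line: 6,  score: 0.7 },
  { path: "a.rb", start_line: 20, end_line: 22, score: 0.4 },
  { path: "b.rb", start_line: 1,  end_line: 2,  score: 0.9 }
]
group_and_merge(hits).each do |file|
  puts "#{file[:path]}: #{file[:snippet_ranges].map { |r| "#{r[:start_line]}-#{r[:end_line]}" }.join(', ')}"
end
```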

How to test

cURL:

curl --header "PRIVATE-TOKEN: $TOKEN" \
  "https://gitlab.com/api/v4/projects/$PROJECT_ID/search/semantic?q=authentication+middleware"

curl --header "PRIVATE-TOKEN: $TOKEN" \
  "https://gitlab.com/api/v4/projects/$PROJECT_ID/search/semantic?q=rate+limiting&directory_path=lib/&limit=5"
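
For callers that prefer a script over cURL, an equivalent request could be built with Ruby's standard library along these lines. The helper name `build_semantic_search_request`, the project ID `123`, and the token placeholder are hypothetical; the sketch only builds the request, and the sending step is left commented out so it runs without network access.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) the authenticated GET request.
def build_semantic_search_request(host, project_id, token, query)
  uri = URI("https://#{host}/api/v4/projects/#{project_id}/search/semantic")
  uri.query = URI.encode_www_form({ q: query })
  request = Net::HTTP::Get.new(uri)
  request["PRIVATE-TOKEN"] = token
  [uri, request]
end

# 123 and "<your-token>" are placeholders, not real values.
uri, request = build_semantic_search_request("gitlab.com", 123, "<your-token>", "authentication middleware")
puts uri

# To actually send the request:
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
# puts response.code
```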

Unit tests:

bundle exec rspec ee/spec/requests/api/search/semantic_spec.rb
Edited by Tian Gao
