Create vulnerabilities ES Index
What does this MR do and why?
Creates the Vulnerabilities ES index along with the document schema and default index settings.
The other MR was merged and then reverted for this reason.
Database
Preloading logic query plans:
- Vulnerabilities::Scanner Load: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37864/commands/115763
- Vulnerability Load: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37864/commands/115764
- Project Load: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37864/commands/115765
- Vulnerabilities::Finding Load: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37864/commands/115766
- Namespace Load: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37864/commands/115767
- Vulnerabilities::FindingIdentifier Load: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37864/commands/115768
- Route Load: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37864/commands/115769
- Vulnerabilities::Identifier Load: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37864/commands/115770
How to test the index settings and the query against test data
- Partial text search test data is available here.
- The partial_text_search_test.rb file has a hash where the custom mapping and query to be tested can be added for the test run. Add your custom index settings here and the query to test here.
- After including your custom index settings and query, run
ruby partial_text_search_test.rb
It runs the tests and lists the best index settings and query. Ideally, the best setting and query should have a 100% success rate without any false positives.
- Follow the same instructions for full_text_search_test.rb.
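The success criterion above (100% success rate, no false positives) can be made concrete with a small scoring helper. This is a minimal sketch and not the logic in partial_text_search_test.rb; the method names score_candidate and ideal?, and the shape of the test cases (expected versus returned document IDs per search term), are assumptions made for illustration.

```ruby
require 'set'

# Hypothetical helper: scores one candidate (index settings + query) against
# a list of test cases. Each case lists the document IDs that should match
# and the IDs the search actually returned.
def score_candidate(cases)
  passed = 0
  false_positives = 0

  cases.each do |c|
    expected = c[:expected].to_set
    returned = c[:returned].to_set

    # A case succeeds only when every expected document is returned.
    passed += 1 if expected.subset?(returned)
    # Anything returned but not expected counts as a false positive.
    false_positives += (returned - expected).size
  end

  success_rate = cases.empty? ? 0.0 : (passed * 100.0 / cases.size)
  { success_rate: success_rate, false_positives: false_positives }
end

# A candidate is ideal when all cases pass and nothing extra is returned.
def ideal?(score)
  score[:success_rate] == 100.0 && score[:false_positives].zero?
end

# Made-up sample data, only to show the call shape.
cases = [
  { term: 'CVE-2021', expected: [1, 2], returned: [1, 2] },
  { term: 'CWE-79',   expected: [3],    returned: [3, 4] }
]
score = score_candidate(cases)
puts score.inspect
puts ideal?(score)
```

Tracking false positives separately from the success rate matters here: in the sample data every expected document is found, yet the candidate is still not ideal because one unexpected document is returned.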
How to set up and validate locally
Setup
Seed vulnerabilities locally:
- Import the project from here into your local instance using the import by URL option.
- In the imported project, run a pipeline on the master branch and let it complete. This seeds the vulnerabilities data.
- To populate the pm_cve_enrichment table with data for the epss_scores field, follow the instructions in the readme.md file of the imported project.
Run the ES migration:
- Run the migration in the Rails console:
Elastic::DataMigrationService[20250408180015].migrate
Backfill ES index with documents manually:
- In the Rails console, run the command below:
Vulnerabilities::Read.all.each { |v| ::Elastic::ProcessBookkeepingService.track!(Search::Elastic::References::Vulnerability.new(v.vulnerability_id, "group_#{v.project.namespace.root_ancestor.id}")) }
- Run the bookkeeping command.
Elastic::ProcessBookkeepingService.new.execute
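One way to check that the backfill produced documents is to query the Elasticsearch _count endpoint for the index. The sketch below only builds the request URI and parses a response body. The helper names count_uri and parse_count are hypothetical, and the host and index name are taken from the curl examples in this description.

```ruby
require 'json'
require 'uri'

# Values taken from the curl examples in this description; adjust for your setup.
ES_URL = 'http://localhost:9200'
INDEX_ALIAS = 'gitlab-development-vulnerabilities'

# Hypothetical helper: builds the URI of the _count endpoint for an index.
def count_uri(base_url, index)
  URI.join("#{base_url}/", "#{index}/_count")
end

# Hypothetical helper: extracts the document count from a _count response body.
def parse_count(body)
  JSON.parse(body).fetch('count')
end

puts count_uri(ES_URL, INDEX_ALIAS)
```

In a Rails console with Elasticsearch running, parse_count(Net::HTTP.get(count_uri(ES_URL, INDEX_ALIAS))) returns the number of indexed documents. Assuming one document per Vulnerabilities::Read record, it should approach Vulnerabilities::Read.count once the bookkeeping run has finished and the index has refreshed.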
Validation steps:
- Run
GET gitlab-development-vulnerabilities/_settings
in the Kibana Dev console, or
curl "http://localhost:9200/gitlab-development-vulnerabilities/_settings"
The response should list the new index created by the migration command above.
- Find the full index name in the response to the above request, for example gitlab-development-vulnerabilities-20250319-2109. Verify that the mappings were created successfully by running
GET gitlab-development-vulnerabilities/_mapping
in Kibana, or
curl "http://localhost:9200/gitlab-development-vulnerabilities/_mapping"
The response should look like this:
{
  "gitlab-development-vulnerabilities-20250407-2006": {
    "mappings": {
      "dynamic": "strict",
      "_meta": {
        "created_by": "17.11.0-pre"
      },
      "properties": {
        "archived": {
          "type": "boolean"
        },
        "auto_resolved": {
          "type": "boolean"
        },
        "casted_cluster_agent_id": {
          "type": "long"
        },
        "cluster_agent_id": {
          "type": "text"
        },
        "created_at": {
          "type": "date"
        },
        "dismissal_reason": {
          "type": "short"
        },
        "epss_scores": {
          "type": "float"
        },
        "has_issues": {
          "type": "boolean"
        },
        "has_merge_request": {
          "type": "boolean"
        },
        "has_remediations": {
          "type": "boolean"
        },
        "has_vulnerability_resolution": {
          "type": "boolean"
        },
        "id": {
          "type": "long"
        },
        "identifier_names": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "normalizer": "lower_case_normalizer"
            },
            "ngram": {
              "type": "text",
              "analyzer": "identifier_ngram_analyzer",
              "search_analyzer": "standard"
            }
          },
          "analyzer": "identifier_pattern_analyzer"
        },
        "location_image": {
          "type": "text"
        },
        "project_id": {
          "type": "long"
        },
        "report_type": {
          "type": "short"
        },
        "resolved_on_default_branch": {
          "type": "boolean"
        },
        "scanner_external_id": {
          "type": "text"
        },
        "scanner_id": {
          "type": "long"
        },
        "schema_version": {
          "type": "short"
        },
        "severity": {
          "type": "short"
        },
        "state": {
          "type": "short"
        },
        "traversal_ids": {
          "type": "keyword"
        },
        "type": {
          "type": "keyword"
        },
        "updated_at": {
          "type": "date"
        },
        "uuid": {
          "type": "binary"
        },
        "vulnerability_id": {
          "type": "long"
        }
      }
    }
  }
}
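Comparing the mapping response by eye is error-prone, so the check can also be scripted. The following is a minimal sketch: the method name mapping_problems is hypothetical, and EXPECTED_TYPES covers only a subset of the fields, with types copied from the response above.

```ruby
require 'json'

# Subset of the expected field types, copied from the mapping response above.
EXPECTED_TYPES = {
  'vulnerability_id' => 'long',
  'project_id' => 'long',
  'traversal_ids' => 'keyword',
  'identifier_names' => 'text',
  'epss_scores' => 'float',
  'uuid' => 'binary',
  'severity' => 'short'
}.freeze

# Hypothetical helper: returns a list of human-readable problems.
# An empty list means the checked parts of the mapping look right.
def mapping_problems(response_body, expected_types = EXPECTED_TYPES)
  response = JSON.parse(response_body)
  # The response is keyed by the concrete index name, which has a timestamp suffix.
  index_name, index = response.first
  return ['response contains no index'] if index.nil?

  mappings = index.fetch('mappings', {})
  properties = mappings.fetch('properties', {})
  problems = []
  problems << "#{index_name}: dynamic is not strict" unless mappings['dynamic'] == 'strict'

  expected_types.each do |field, type|
    actual = properties.dig(field, 'type')
    problems << "#{field}: expected #{type}, got #{actual.inspect}" unless actual == type
  end
  problems
end

# Made-up response with one deliberately wrong type, to show the call shape.
sample = <<~BODY
  {
    "gitlab-development-vulnerabilities-20250407-2006": {
      "mappings": {
        "dynamic": "strict",
        "properties": {
          "vulnerability_id": { "type": "long" },
          "uuid": { "type": "text" }
        }
      }
    }
  }
BODY

puts mapping_problems(sample, { 'vulnerability_id' => 'long', 'uuid' => 'binary' })
```

To run it against a local instance, pass the body returned by the _mapping request shown above as response_body. Reading the first key of the response avoids hard-coding the timestamped index name.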
Related to #515553 (closed)
Edited by Bala Kumar