build: add script for ingestion to vertex ai search
Note: This is a high priority MR for Solution implementation for "users can ask docu... (gitlab-org/gitlab#451215 - closed) in %17.0
What does this merge request do and why?
This MR introduces a make ingest script to ingest and refresh the GitLab documentation served by Vertex AI Search (Agent Builder). This data will be used by the documentation tool of Duo Chat.
See the doc for more information.
How to set up and validate locally
See the "Ingest GitLab documentations locally" and "Test search app in GCP console" sections. We'll add an endpoint in AI Gateway later (example).
Here is a test run result:
make ingest > ingest.log 2>&1
- Execution date: Wed May 1 02:00:24 AM UTC 2024
- Execution SHA: ac69a3c5 (latest feature branch)
Further reading
We plan to address Generalize ingestion process for any public data (#446) in the future. Given the high priority of docs support and the keep-it-simple principle, this MR is tailored to GitLab docs.
We're also working on a daily data refresh in CI/CD pipelines in ci: ingest gitlab docs in pipeline schedules (!774 - merged).
Merge request checklist
- Tests added for new functionality. If not, please raise an issue to follow up. => Add tests for ingestion scripts (#448 - closed) is the follow-up issue.
- Documentation added/updated, if needed.