
Draft: PoC: Use Elasticsearch for DUO vectors

Madelein van Niekerk requested to merge poc/gitlab-duo-on-elastic into master

This MR shows how we can move the storage of embeddings for GitLab DUO chat from the pg_vector database to Elasticsearch.

I generated the full dataset locally and got the switch working within 2 hours.

In this example we use a model hosted within Elasticsearch (ELSER), but it can be swapped for another model, or we can generate embeddings outside of Elasticsearch and use Elasticsearch only as a document store (this would be even less work).
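As a rough illustration of the hosted-model approach, the sketch below builds the body of a search request that asks ELSER to expand the question at query time. It is not taken from this MR: the model ID `.elser_model_2` and the field names `content` and `content_embedding` are assumptions, and whether the `text_expansion` query is available depends on the Elasticsearch version in use.

```ruby
require 'json'

# Assumed values: the MR does not state the model ID or the field names.
ELSER_MODEL_ID = '.elser_model_2'
ELSER_TOKENS_FIELD = 'content_embedding'

# Builds the body of a search request. Elasticsearch runs the question
# through ELSER and matches the resulting weighted tokens against the
# tokens stored in ELSER_TOKENS_FIELD.
def elser_search_body(question, size: 3)
  {
    size: size,
    query: {
      text_expansion: {
        ELSER_TOKENS_FIELD => {
          model_id: ELSER_MODEL_ID,
          model_text: question
        }
      }
    },
    # Return only the original text chunk, not the token weights.
    _source: ['content']
  }
end

puts JSON.pretty_generate(elser_search_body('How do I change my password?'))
```

The printed JSON could be sent as the body of a `_search` request against the index that holds the documentation chunks. If embeddings were generated outside Elasticsearch instead, the query clause would change but the rest of the request would stay much the same.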

The results are promising. For the question "How do I change my password?", the top results are very accurate:

If you do not know your current password, select **I forgot my password**\nand complete the form. A password reset email is sent to the email address you\nenter into this form, provided that the email address is verified. If you enter an\nunverified email address into this form, no email is sent, and you see the following\nmessage:\n\n> \"If your email address exists in our database, you will receive a password recovery link at your email address in a few minutes.\"\n\nNOTE:\nYour account can have more than one verified email address, and any email address\nassociated with your account can be verified.\n\n## Password requirements\n\nYour passwords must meet a set of requirements when:\n\n- You choose a password during registration.\n- You choose a new password using the forgotten password reset flow.\n- You change your password proactively.\n- You change your password after it expires.\n- An an administrator creates your account.\n- An administrator updates your account.\n\nBy default GitLab enforces the following password requirements:\n\n- Minimum and maximum password lengths. For example,\n  see [the settings for GitLab.com](../gitlab_com/index.md#password-requirements).\n- Disallowing [weak passwords](#block-weak-passwords).\n\nSelf-managed installations can configure the following additional password requirements:\n\n- [Password minimum and maximum length limits](../../security/password_length_limits.md).
# User passwords\n\nDETAILS:\n**Tier:** Free, Premium, Ultimate\n**Offering:** SaaS, self-managed\n\nIf you use a password to sign in to GitLab, a strong password is very important. A weak or guessable password makes it\neasier for unauthorized people to log into your account.\n\nSome organizations require you to meet certain requirements when choosing a password.\n\nImprove the security of your account with [two-factor authentication](account/two_factor_authentication.md)\n\n## Choose your password\n\nYou can choose a password when you [create a user account](account/create_accounts.md).\n\nIf you register your account using an external authentication and\nauthorization provider, you do not need to choose a password. GitLab\n[sets a random, unique, and secure password for you](../../security/passwords_for_integrated_authentication_methods.md).\n\n## Change your password\n\n> - Password reset emails sent to any verified email address [introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/16311) in GitLab 16.1.\n\nYou can change your password. GitLab enforces [password requirements](#password-requirements) when you choose your new\npassword.\n\n1. On the left sidebar, select your avatar.\n1. Select **Edit profile**.\n1. On the left sidebar, select **Password**.\n1. In the **Current password** text box, enter your current password.\n1. In the **New password** and **Password confirmation** text box, enter your new password.\n1. Select **Save password**.\n
If you regenerate 2FA recovery codes, save them. You can't use any previously created 2FA codes.\n\n## Sign in with two-factor authentication enabled\n\nSigning in with 2FA enabled is only slightly different than the typical sign-in process. Enter your username and password\nand you're presented with a second prompt, depending on which type of 2FA you've enabled.\n\n### Sign in using a one-time password\n\nWhen asked, enter the pin from your one-time password authenticator's application or a recovery code to sign in.\n\n### Sign in using a WebAuthn device\n\nIn supported browsers, you should be automatically prompted to activate your WebAuthn device (for example, by touching\nor pressing its button) after entering your credentials.\n\nA message displays indicating that your device responded to the authentication request and you're automatically signed\nin.\n\n## Disable two-factor authentication\n\nTo disable 2FA:\n\n1. Access your [**User settings**](../index.md#access-your-user-settings).\n1. Select **Account**.\n1. Select **Manage two-factor authentication**.\n1. Under **Register Two-Factor Authenticator**, enter your current password and select **Disable two-factor\n   authentication**.\n\nThis clears all your 2FA registrations, including mobile applications and WebAuthn devices.\n\n## Recovery options\n\nIf you don't have access to your code generation device, you can recover access to your account:\n\n- [Use a saved recovery code](#use-a-saved-recovery-code), if you saved them when you enabled two-factor
During sign in, use one of the codes above when prompted for your\n   two-factor code. Then, visit your Profile Settings and add a new device\n   so you do not lose access to your account again.\n   ```\n\n1. Go to the GitLab sign-in page and enter your username or email, and password. When prompted for a two-factor code,\n   enter one of the recovery codes obtained from the command-line output.\n\nAfter signing in, immediately set up 2FA with a new device.\n\n### Have two-factor authentication disabled on your account\n\nDETAILS:\n**Tier:** Premium, Ultimate\n**Offering:** SaaS\n\nIf other methods are unavailable, have a GitLab support contact submit a [support ticket](https://support.gitlab.com) to request\na GitLab global administrator disable 2FA for your account:\n\n- This service is only available for accounts that have a GitLab.com subscription. For more information, see our\n  [blog post](https://about.gitlab.com/blog/2020/08/04/gitlab-support-no-longer-processing-mfa-resets-for-free-users/).\n- Disabling this setting temporarily leaves your account in a less secure state. You should sign in and re-enable two-factor\n  authentication as soon as possible.\n\n## Information for GitLab administrators\n\nDETAILS:\n**Tier:** Free, Premium, Ultimate\n**Offering:** Self-managed\n
The user can now sign in with the new username and password, and they are asked\nto change the password you set up for them.\n\nNOTE:\nIf you wanted to create a test user, you could follow the previous steps\nby providing a fake email and using the same password in the final confirmation.\n\n## Create users through authentication integrations\n\nUsers are:\n\n- Automatically created upon first sign in with the [LDAP integration](../../../administration/auth/ldap/index.md).\n- Created when first signing in using an [OmniAuth provider](../../../integration/omniauth.md) if\n  the `allow_single_sign_on` setting is present.\n- Created when first signing with [Group SAML](../../group/saml_sso/index.md).\n- Automatically created by [SCIM](../../group/saml_sso/scim_setup.md) when the user is created in\n  the identity provider.\n\n## Create users through the Rails console\n\nWARNING:\nCommands that change data can cause damage if not run correctly or under the right conditions. Always run commands in a test environment first and have a backup instance ready to restore.\n\nTo create a user through the Rails console:\n\n1. [Start a Rails console session](../../../administration/operations/rails_console.md#starting-a-rails-console-session).\n1. Run the following commands:\n\n   ```ruby\n   u = User.new(username: 'test_user', email: 'test@example.com', name: 'Test User', password: 'password', password_confirmation: 'password')\n   u.assign_personal_namespace

demo

Retrieving the most relevant results for a question takes <50ms, even on a local machine with limited resources.

Getting it production-ready

  • Upgrade ES version
  • Deploy ELSER model on staging and prod after checking ML resource requirements
  • Create a migration to create the index and ingestion pipeline
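
A minimal sketch of what the last step might need to create, assuming an index with a text field and an ELSER token field, plus an ingest pipeline that fills the token field on write. The names `content`, `content_embedding`, and `.elser_model_2` are hypothetical and not taken from this MR, and the `sparse_vector` field type is only available in newer Elasticsearch 8.x releases (older 8.x releases used `rank_features` for ELSER output).

```ruby
require 'json'

# Assumed model ID; the MR does not say which ELSER version is deployed.
PIPELINE_MODEL_ID = '.elser_model_2'

# Body for creating the index: the raw text chunk plus the ELSER tokens.
def duo_index_body
  {
    mappings: {
      properties: {
        content: { type: 'text' },
        content_embedding: { type: 'sparse_vector' }
      }
    }
  }
end

# Body for creating the ingest pipeline: run each document's `content`
# through the model and store the output in `content_embedding`.
def duo_pipeline_body
  {
    description: 'Generate ELSER tokens for documentation chunks',
    processors: [
      {
        inference: {
          model_id: PIPELINE_MODEL_ID,
          input_output: [
            { input_field: 'content', output_field: 'content_embedding' }
          ]
        }
      }
    ]
  }
end

puts JSON.pretty_generate(duo_index_body)
puts JSON.pretty_generate(duo_pipeline_body)
```

Keeping both bodies as plain hashes would let a migration pass them to whichever Elasticsearch client helper the codebase already uses.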
Edited by Madelein van Niekerk
