The AI Gateway is a core component of our AI Architecture. It's a Python application built using Poetry. At present, there's no support in omnibus-gitlab for users to deploy their own local instance of the AI Gateway. This will become an increasingly important requirement as we aim to support air-gapped AI setups and Custom Models.
Google Vertex API (AIGW_VERTEX_TEXT_MODEL__ENDPOINT)
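The double underscore in `AIGW_VERTEX_TEXT_MODEL__ENDPOINT` suggests pydantic-style nested settings. Below is a minimal sketch of how that mapping could look, assuming `pydantic-settings` is in use; the class and field names are illustrative, not the gateway's actual settings code:

```python
# Illustrative sketch only: how an AIGW_*-prefixed variable with a "__" nested
# delimiter could map onto nested settings. The names below are assumptions,
# not the AI Gateway's real settings classes.
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict


class VertexTextModel(BaseModel):
    endpoint: str = "us-central1-aiplatform.googleapis.com"  # placeholder default


class GatewaySettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="AIGW_", env_nested_delimiter="__")

    vertex_text_model: VertexTextModel = VertexTextModel()


# Setting AIGW_VERTEX_TEXT_MODEL__ENDPOINT in the environment overrides
# settings.vertex_text_model.endpoint at startup.
settings = GatewaySettings()
print(settings.vertex_text_model.endpoint)
```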
Data requirements
Working directory
Log directory
No database, cache, or persistent file storage needed
Expected performance characteristics
I'd model this after the GitLab Webservice containers, so e.g. in k8s land the defaults for the Webservice chart should be a good start for this service, since at the moment it basically operates as an HTTP REST API server. In the future, custom models will require adapting this service to more GPU-centric workloads.
When should it be operated (enabled)?
@WarheadsSE as in, in which situations would users enable this service, or what is our timeline for having this in omnibus-gitlab? As for the former, this service should be disabled by default, as most self-managed customers will rely on cloud.gitlab.com. As for the latter, deferring to @sean_carroll.
I'll be preparing a draft MR to incorporate this service into this repo and gathering feedback from the Distribution team.
@WarheadsSE sorry, what's "scratch use" in this context? If you mean how much space is required for the application files, the Docker images are around 200 MB.
Does this need read/write access to the filesystem at all?
We have code that reads from the filesystem but that seems to be unused since this MR. So unless I'm missing something, the answer seems to be "no", /cc @achueshev to double-check. Caching seems to be done all in memory as well (ref).
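For illustration of the in-memory caching point (not the gateway's actual code, which is linked above): an in-process cache like the one below never touches the filesystem, which is consistent with answering "no" here.

```python
# Illustrative only: an in-process cache keeps results in memory, so no
# filesystem read/write access is needed. This is a generic pattern, not a
# copy of the AI Gateway's caching code.
from functools import lru_cache


@lru_cache(maxsize=128)
def resolve_model_metadata(model_name: str) -> dict:
    # Hypothetical expensive lookup, computed once per model name and then
    # served from memory on subsequent calls.
    return {"name": model_name, "provider": "vertex-ai"}
```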
What write permissions, and how much data is written into the working directory?
Ah, I see. @tle_gitlab can you weigh in here? Does the AI Gateway service write anything to the filesystem? My assumption is that it doesn't, but I don't know if tree-sitter does. What about in the future with Custom Models?
@alejandro The Hugging Face tokenizer in the AI Gateway might write to the user's home folder (/root/.cache) if there is no available cache in that location, or if the cache structure has changed (upstream code).
@tle_gitlab, out of curiosity, could you please elaborate on when we run the Hugging Face tokenizer? Is it something we always do, or does it only run under certain conditions?
Calculate remaining tokens when performing pre-processing (when adding imports, function signatures, etc.) - only applicable for Code Completions with Vertex AI models (code).
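A sketch of the two points above, assuming the Hugging Face `transformers` `AutoTokenizer` API for illustration (the gateway may use the lower-level `tokenizers` package instead); the model name, cache path, and token budget are made-up values:

```python
# Sketch only: (1) redirect the Hugging Face cache away from /root/.cache and
# (2) the kind of "remaining tokens" arithmetic done during pre-processing.
# Model name, cache path, and context size are illustrative assumptions.
import os

# Without an explicit cache location the tokenizer falls back to ~/.cache
# (i.e. /root/.cache when running as root), which is the write noted above.
os.environ.setdefault("HF_HOME", "/var/opt/gitlab/ai-gateway/hf-cache")

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")

MAX_CONTEXT_TOKENS = 2048  # hypothetical model context window


def remaining_tokens(prefix: str, imports: list[str]) -> int:
    """Tokens left for the completion after counting the prefix and added imports."""
    used = len(tokenizer.encode(prefix)) + sum(
        len(tokenizer.encode(line)) for line in imports
    )
    return max(MAX_CONTEXT_TOKENS - used, 0)
```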
@sean_carroll The dependencies that @alejandro laid out above would be slightly different from the AI Gateway dependencies required by the Custom Models team, correct? For example, we would not have the dependency on the Anthropic API or the Google Vertex API (AIGW_VERTEX_TEXT_MODEL__ENDPOINT), as we are not expecting customers to connect to those APIs at the outset. As such, the dependencies would be:
Thank you @WarheadsSE @pursultani, and apologies for the incomplete requirements. I will respond to this in the next couple of days: we are working on some group::custom models planning at the moment.
Self-hosted Runway will be the preferred delivery mechanism for deploying the AI Gateway. Future options, in order of preference, are:

- Runway [discussion](https://gitlab.com/gitlab-com/gl-security/security-assurance/fedramp/fedramp-certification/-/issues/452#note_1832261170)
- Kubernetes deployment [issue](https://gitlab.com/gitlab-org/gitlab/-/issues/452490)
- Omnibus packaging [issue](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/8467)
This documents that shipping the AI Gateway through Omnibus is the least preferred approach, because we can't reuse it for the other GitLab environments (GET-hybrid and isolated Dedicated instances). So perhaps it is fair to hold off on this for a while if we provide a way for self-managed users to temporarily handle it themselves (gitlab#452489 (closed)).
I know it's not user friendly, and I don't want to slow anything down, but in my opinion we will need to build something to easily ship Runway services to other environments, and I think it's preferable to make it generic so we can apply the same approach to other services.
This documents that shipping the AI Gateway through Omnibus is the least preferred approach, because we can't reuse it for the other GitLab environments (GET-hybrid and isolated Dedicated instances).
Please note that there are GitLab instances that rely solely on Omnibus Linux packages and do not use Kubernetes to manage their workloads. Even the GitLab reference architecture documents do not explicitly refer to Kubernetes. If this is meant for self-managed instances, then we need to ship it with BOTH Omnibus and CNG.
Using Spamcheck as an example of an auxiliary service could be helpful:
@alejandro @reprazent @sean_carroll @oregand @swiskow @rnienaber: the effort in doing this would be significant, and would be once-off for the AI gateway. It would be better, IMO, to put this effort into supporting Runway-on-Kubernetes for Self-Managed use-cases.
That approach would bring significant advantages over this one:
- For application developers: a single model for AI Gateway deployment across cloud and SaaS.
- For platform teams: a single set of best practices that can be followed by self-managed and cloud deployments.
- For GitLab: a single effort to deploy many Runway services, not just a single service.
This sounds like a great approach, and I am generally a fan of leaning into Kubernetes. @alejandro, what do you think? If we're aligned that this would be a better approach, we can repurpose this issue.
group::custom models is agnostic about how the AI Gateway is deployed and is happy to follow any wider company direction. For the MVP we are moving forward with a Docker install, but as noted in the Blueprint this is a temporary measure.
put this effort into supporting Runway-on-Kubernetes for Self-Managed use-cases.
@andrewn @oregand @alejandro if the effort to implement in Omnibus is low (as Python is already packaged), are there any reasons not to proceed with it?
the effort in doing this would be significant, and would be once-off for the AI gateway. It would be better, IMO, to put this effort into supporting Runway-on-Kubernetes
if the effort to implement in Omnibus is low (as Python is already packaged), are there any reasons not to proceed with it?
This is the part I'm not clear on yet. I considered the scope of this issue to be basically following the Adding a new Service to Omnibus doc, which should be a single MR effort; so it seemed to me that if we were able to quickly ship that we'd unlock at least the customers interested in self-managed AI Gateway that are already running omnibus. Maybe I'm missing some necessary follow-up steps to that?
That's an interesting point; perhaps I had misunderstood too.
If "following the Adding a new Service to Omnibus doc" is the entire scope and it's a single MR, I see no reason not to do it to unlock customers already running Omnibus.
if the effort to implement in Omnibus is low (as Python is already packaged), are there any reasons not to proceed with it?
What about all the upstream C/C++ dependencies and libraries that need to be built and packaged in order to support the Python libraries included in the AI Gateway? Have you included the packaging and support for those in this assessment? https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/poetry.lock. As I understand it, some of these libraries are pretty complicated to build, and they lend themselves much better to being packaged in a container than in Omnibus.
Secondly, once this arrives in Omnibus, it's going to stay there, putting a huge burden on the team going forward. Providing an early-adopter client with a temporary approach as discussed previously gives the teams the space to figure out how to do this properly. To put it bluntly: jamming this into Omnibus feels like a missed opportunity to move certain auxiliary workloads into Kubernetes, which will ultimately - in my opinion - be better for us to manage and for our customers to operate.
What about all the upstream C/C++ dependencies and libraries that need to be built and packaged in order to support the Python libraries included in the AI Gateway?
Aha, I had not properly weighed this. There's a chance we're lucky and the current gcc/libs bundle that Omnibus already relies on is sufficient, which is what I was tacitly assuming, but there's also a chance that it gets quite messy.
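One rough way to gauge this, sketched below under the assumption that the lock file follows the current Poetry format (with `files` entries per package): list the locked packages whose wheels carry platform-specific tags, since those usually indicate compiled C/C++ extensions that Omnibus would have to build or vendor. This is a heuristic, not an authoritative audit.

```python
# Rough heuristic sketch: flag locked packages that ship platform-specific
# wheels (usually a sign of compiled C/C++ extensions). Assumes the modern
# poetry.lock format with per-package "files" entries; not an authoritative
# audit of the AI Gateway's build requirements.
import tomllib  # Python 3.11+

with open("poetry.lock", "rb") as f:
    lock = tomllib.load(f)

native = set()
for package in lock.get("package", []):
    for file_entry in package.get("files", []):
        filename = file_entry["file"]
        # Pure-Python wheels end in "-none-any.whl"; anything else (manylinux,
        # macosx, win_amd64, ...) was built for a specific platform.
        if filename.endswith(".whl") and not filename.endswith("-none-any.whl"):
            native.add(package["name"])
            break

for name in sorted(native):
    print(name)
```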
Providing an early-adopter client with a temporary approach as discussed previously gives the teams the space to figure out how to do this properly. To put it bluntly: jamming this into Omnibus feels like a missed opportunity to move certain auxiliary workloads in Kubernetes
@andrewn I guess what I'm not quite grasping is whether we'll be pushing Runway-on-Kubernetes as the preferred path for all customers. It will definitely be the preferred path for us (GitLab.com, Dedicated), but my approach to this issue was "meet customers where they're at": I assumed that customers running containerized deployments would be unlikely to deploy via Omnibus, and vice versa, so implementing only one of the options would be mostly meaningless for the other segment of customers. I could be wrong in this assumption (see this comment for an opposing perspective). Specifically for the purpose of self-managed, my understanding is that one of the major customers interested is running an Omnibus-based HA deployment, so my push for this issue is based on two assumptions:
1. This would require little effort and thus not detract from our focus on Runway-on-Kubernetes. This could quickly be proven wrong by dependency/library management; maybe whipping up a quick POC would help us establish this.
2. This would be a major win with a large customer that's already running on Omnibus, as it would let them adopt self-managed AI Gateway quickly. It might be helpful to reach out to the customers and validate whether adding a containerized deployment to their setup would be an acceptable path forward or temporary solution.
Do we have a good idea of "where customers are at"? I understand that Omnibus is a popular installation method for GitLab.
However, have we verified that customers who run Omnibus GitLab are unwilling or unable to add an AI Gateway deployment to a Kubernetes cluster they already manage within their business IT infrastructure?
When we do the analysis and talk to customers, we may find that a relatively small number of them cannot commit to a Kubernetes deployment of the AI Gateway alongside the Omnibus deployment. We may also find the contrary, but I think it merits investigation before we commit to something that is closer to a one-way door than a two-way door.
group::custom models is agnostic about how the AI Gateway is deployed for self-managed.
We do have two customers interested in the feature who are on HA Omnibus (if that is the right term), but we will start with the Docker deployment (Document installing AI Gateway via Docker, gitlab#452489 - closed) as a stop-gap. A longer-term deployment for these large customers would depend on their specific needs and on where we are in terms of the other options.
From the Distribution perspective, there are two major issues:
Software definitions, i.e. build instructions, for the required dependencies of AIGW. For example, the extensions of transitive dependencies such as libtorch and/or libtensorflow.
Reopening this issue for further discussion: we have some high-profile customers who have been struggling to get the Docker install of the AI Gateway running. More context is in the Internal Note below.