Implement a query language in Gitaly
Gitaly has over 150 RPCs to cover a wide variety of use cases. Many of them perform the same operations internally, such as updating references, and many of the read-only RPCs are specifically optimized for a given use case. This is not ideal and creates a maintenance burden:
- The Gitaly team has to maintain all of these RPCs. Improvements made in one RPC don't automatically carry over to other RPC implementations.
  - Good examples of this are the changed calling conventions, where some RPCs return proper errors while others don't. Plugging in additional logic, such as the timestamp parameter, required touching every write RPC that may write commits.
- The number of RPCs grows as more use cases arise from the clients. Clients may combine fine-grained RPCs to build complex operations, but that's inefficient because it requires multiple calls. If we don't provide a fine-grained enough RPC, the clients over-fetch data.
  - A good example of this is the authorization checks done by Rails, where Rails calls back to Gitaly tens of times to get the data required for authorizing a given write.
This could be greatly improved by implementing a query language in Gitaly. Gitaly would have a single endpoint for querying data, and a few special-purpose ones for accepting pushes and such. This brings the following benefits:
- Clients can retrieve exactly the data they need. There is no need to implement a bunch of different RPCs to cover different use cases.
- Maintaining Gitaly becomes easier. Instead of maintaining 150 RPCs, we'd maintain a query planner that optimizes data access. Optimizations made in query execution automatically benefit all access, since all queries go through the same optimizer.
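
To make the idea concrete, here is a minimal Go sketch of a single query entry point over an in-memory commit graph. Everything in it is hypothetical and for illustration only: `Repo`, `Query`, `Execute`, and `sampleRepo` are not existing Gitaly types or RPCs, and the struct-based query stands in for whatever query language would eventually be chosen. It only shows how one call could walk the graph and return just the fields the client asked for.

```go
package main

import "fmt"

// Commit is a simplified commit node. Hypothetical type, not Gitaly's.
type Commit struct {
	ID      string
	Author  string
	Subject string
	Parents []string
}

// Repo is an in-memory stand-in for a repository (hypothetical).
type Repo struct {
	Refs    map[string]string // reference name -> commit ID
	Commits map[string]Commit // commit ID -> commit
}

// Query describes what the client wants in a single request (hypothetical).
type Query struct {
	FromRef string   // reference to start the walk from
	Limit   int      // maximum number of commits; <= 0 means no limit
	Fields  []string // fields to return: "id", "author", "subject"
}

// Row holds only the fields requested by the query.
type Row map[string]string

// Execute is the single entry point: it resolves the reference, walks the
// commit graph breadth-first, and projects each commit to the requested fields.
func Execute(r Repo, q Query) ([]Row, error) {
	start, ok := r.Refs[q.FromRef]
	if !ok {
		return nil, fmt.Errorf("unknown reference %q", q.FromRef)
	}

	var rows []Row
	seen := map[string]bool{}
	queue := []string{start}

	for len(queue) > 0 && (q.Limit <= 0 || len(rows) < q.Limit) {
		id := queue[0]
		queue = queue[1:]
		if seen[id] {
			continue // merge commits can make a commit reachable twice
		}
		seen[id] = true

		c, ok := r.Commits[id]
		if !ok {
			return nil, fmt.Errorf("missing commit %q", id)
		}

		row := Row{}
		for _, f := range q.Fields {
			switch f {
			case "id":
				row[f] = c.ID
			case "author":
				row[f] = c.Author
			case "subject":
				row[f] = c.Subject
			default:
				return nil, fmt.Errorf("unknown field %q", f)
			}
		}
		rows = append(rows, row)
		queue = append(queue, c.Parents...)
	}
	return rows, nil
}

// sampleRepo builds a three-commit linear history for the example.
func sampleRepo() Repo {
	return Repo{
		Refs: map[string]string{"refs/heads/main": "c3"},
		Commits: map[string]Commit{
			"c1": {ID: "c1", Author: "bob", Subject: "Initial commit"},
			"c2": {ID: "c2", Author: "bob", Subject: "Add README", Parents: []string{"c1"}},
			"c3": {ID: "c3", Author: "alice", Subject: "Fix typo", Parents: []string{"c2"}},
		},
	}
}

func main() {
	// One request replaces a "resolve ref" call plus one call per commit.
	rows, err := Execute(sampleRepo(), Query{
		FromRef: "refs/heads/main",
		Limit:   2,
		Fields:  []string{"id", "author"},
	})
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		fmt.Println(row["id"], row["author"])
	}
}
```

Because every read goes through `Execute` in this sketch, an optimization to the traversal or the projection would apply to all callers at once, which is the maintainability argument made above.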
Gitaly stores Git repositories, which consist of graph data. We should likely choose an existing language for that use case, such as the Graph Query Language (GQL). GQL is being standardized as a query language for graph databases to co-exist with SQL. The standard may be large, but subsets of it could be implemented iteratively.
/cc @andrashorvath @mjwood this is likely something that we should do in the future. It solves the N+1 query issues impacting Gitaly for good, improves maintainability, and allows us to optimize storage access transparently. As a project, this is decoupled from the other ongoing projects and could be worked on in parallel.