I've noticed that what we do is slightly different than what we say we do, and I wanted to flag this and adjust either our process or the documentation so they match. Here's what is formally set up:
GraphQL API foundational development falls officially to ~"group::project management" of the ~"devops::plan" team (@gweaver DRI) - this does not include individual features being implemented in GraphQL (e.g. ~"devops::secure" vulnerability, ~"devops::monitor" alert management, etc.)
As a part of Product, there is a process for scheduling and tracking GraphQL issues within the ~"group::project management" team
I personally think what we do works well, but it does not match the above. Here are a bunch of thoughts:
Much of the GraphQL foundational work (by "foundational" I mean common to all GraphQL API, such as complexity or lookaheads) is done by a group of GraphQL 'volunteer' experts on a variety of teams.
The experts identify pain points, tag everyone else, begin a conversation, and often come up with a solution (or at least have a good discussion)
GraphQL is complex and while one single person could theoretically understand it all, we have pockets of deeper expertise - e.g. @digitalmoksha knows keyset pagination, @ntepluhina knows heaps about Apollo, @seanarnold knows externalised API integration/pagination, etc.
It's not really acknowledged that the above is a thing. Maybe it should be?
Some possible challenges/tensions:
There is a lot of GraphQL foundational work, and it flies under the radar a bit.
Much of this GraphQL work does not affect customers directly, but quality and developer efficiency would suffer if it weren't addressed. So some of it is "Product" but most falls under the "Engineering" purview.
We sometimes ping @gweaver as the nominal DRI, but sometimes that doesn't happen because bias for action and that.
If there isn't an immediate pain point (or if it's not championed by an engineer), the issue sometimes falls by the wayside (put more succinctly by @tkuah below... we're not really developing towards a vision or a roadmap to that vision at the moment. it's all a bit ad-hoc)
GraphQL experts have a higher review load since we are called as GraphQL domain experts to review both codebase-wide GraphQL implementations (that's where a lot of conversation and improvements come from), as well as our own group's foundational improvements
If this group of volunteers wasn't available, I'm not sure that the Project Management team would be able to manage it (though I could be wrong here, there are only five backend engineers and a very wide product scope).
A couple of suggestions to get the conversation started
Create a GraphQL-foundation-specific label or way to track "foundational" GraphQL API issues? GraphQL and api are often placed on feature implementations that use GraphQL
Create a board for GraphQL issues using the above with checkins, say, during Office Hours?
Figure out a process to make it clear who's the DRI for issues (@gweaver or an Engineering DRI, say, whoever is assigned to the issue or volunteers to take the solution)
Thanks for putting this together @cablett! I think we definitely suffer a fair bit of pain within ~"devops::plan" due to the additional development cost of going GraphQL first - I love that we're pushing this forward and supporting the foundational work of GraphQL - but I agree that we should do a better job acknowledging just how much work it is.
From a visibility standpoint, a board might be helpful, or some other way of just keeping a visible log of just how much time and effort this type of work takes up - because I definitely think (looking at you and @digitalmoksha's MRs for example) we're putting in a ton of work in this domain that isn't necessarily being planned or represented in the scope of work that we're getting done each milestone - which leads to being spread thin and contributes to burnout.
I'm happy to support however possible in documenting or formalizing this process.
better use of labels (the ~"feature::maintenance" and ~"feature::addition" labels make it not quite as easy to determine what is foundational work - backstage [DEPRECATED] was really helpful for that)
We really need a DRI for https://docs.gitlab.com/ee/api/graphql/#vision. It is currently suggesting that GraphQL must be in parity with the REST API and share implementations, but this is leading to confusion as there's no concrete implementation for sharing currently.
Nick started this off with gitlab-foss!24636 (merged), but we're currently not enforcing adding new things like that for the sake of velocity.
Perhaps we should split up that process for new functionality and existing things. The REST counterpart should be added in the same MR, or we should create an issue assigned to the original author to add the REST api.
For functionality that exists in the REST API, but isn't using GraphQL, we could decide to switch the REST implementation once we add it to GraphQL for our own use.
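To make "switch the REST implementation" a bit more concrete, here is a minimal sketch of a REST-style handler that is only a thin wrapper around a GraphQL-style resolver. It is written in Python purely for illustration (it is not GitLab code), and every name in it (`graphql_resolve_issues`, `rest_list_issues`, the dictionary shapes) is a hypothetical assumption.

```python
def graphql_resolve_issues(issues, state=None):
    """GraphQL-style resolver: the single place where filtering logic lives."""
    nodes = [
        {"id": f"gid://gitlab/Issue/{issue['id']}", "state": issue["state"]}
        for issue in issues
        if state is None or issue["state"] == state
    ]
    return {"nodes": nodes}


def rest_list_issues(issues, params):
    """REST-style handler: delegates to the resolver and only reshapes the result."""
    result = graphql_resolve_issues(issues, state=params.get("state"))
    return [
        # Turn the global ID back into the plain integer ID a REST client expects.
        {"id": int(node["id"].rsplit("/", 1)[1]), "state": node["state"]}
        for node in result["nodes"]
    ]


ISSUES = [{"id": 1, "state": "opened"}, {"id": 2, "state": "closed"}]
```

With this shape, a new filter added to the resolver is available to both APIs, and the REST side only owns the response format.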
I agree, a DRI for making this concrete would be needed!
Note: We had a significant discussion about GraphQL first in a Configure group meeting. In the end we concluded that we will make a new API GraphQL only.
@tkuah: We had a significant discussion about GraphQL first in a Configure group meeting. In the end we concluded that we will make a new API GraphQL only.
Sharing as another data point.
In ~"group::threat insights" we have also decided to be GraphQL only. The existing REST vulnerabilities API (and vuln. findings) are in alpha and I have an action agreed with @matt_wilson to write an Epic to remove the REST APIs.
So what is our policy on keeping the two APIs up-to-date? I understand there are customers who use the REST API who might assume that it would continue to be added to in a predictable manner. See #216456 (comment 463642459) too.
There are no plans to deprecate the REST API. To reduce the technical burden of supporting two APIs in parallel, they should share implementations as much as possible.
Our current policy is that we want both. There are advantages and disadvantages for both REST and GraphQL.
Do we want to add new functionality to both APIs? It may be tricky for sales if they refer to the REST API to use new feature X, but the GraphQL to access new feature Y.
I wonder if until #235701 gains some traction if we should at least be creating back-fill issues for the un-implemented side?
@thiagocsf It sounds like from this discussion that deprecating our specific pieces of the RESTv4 API is off the table.
@tkuah Is there a location in our handbook or docs that is more explicit about the requirement for functional parity? I see "preferences" and "recommendations" but not specific guidance/policy that gaps in one versus the other need to be reconciled.
unstable and can cause performance and stability issues
the configuration and dependencies are likely to change
features and functions may be removed
data loss can occur (be that through bugs or updates)
What we're talking about is removing the already-deprecated endpoints.
There is no commitment to maintaining parity from GraphQL to REST (we do have a statement of intent for the opposite). The GraphQL Vision states:
(...) To achieve this, it needs full coverage - anything possible in the REST API should also be possible in the GraphQL API.
So we should have everything in GraphQL but not the opposite, i.e. "anything possible in GraphQL API should be possible in the REST API".
In the same section there's also this, which might be what you're referring to:
There are no plans to deprecate the REST API. To reduce the technical burden of supporting two APIs in parallel, they should share implementations as much as possible.
We're not deprecating the "REST API". We're removing an alpha feature (which is already deprecated, in the sense that it's not recommended for production) from an endpoint. Both endpoints that we want to remove, vulnerabilities and vulnerability_findings, are marked as alpha:
This API is in an alpha stage and considered unstable. The response payload may be subject to change or breakage across GitLab releases.
@thiagocsf Ah OK, I interpreted that differently but reading your points, it sounds like we are fine to deprecate specific REST methods as long as we have GraphQL replacements.
it sounds like we are fine to deprecate specific REST methods as long as we have GraphQL replacements.
Sorry, but not quite.
My interpretation is that we are fine to remove specific methods/endpoints outside a major release because they're in alpha. Whether or not the new recommendation is in REST or GraphQL is up to us (although we should do GraphQL first).
Thank you for writing this down @cablett! On the frontend side it's more or less the same - we're doing some foundational work in the scope of ~"devops::create" and ~"devops::plan", mostly for Apollo Client usage, and sometimes it increases the workload significantly. For me, personally, it's a bit easier as my team allowed me to dedicate a part of my working time to these issues, but I believe making it more formal and visible will make everyone's life easier. I love the suggestions you mentioned and I totally support implementing them.
@.luke I don't think we will ever have a GraphQL team for the same reason why we don't have a REST API team. As I've had it explained to me -- creating a function based team instead of a customer facing team leads to the same problems, just with a slightly different flavor -- local optimization for one specific piece of our stack as opposed to looking globally at the one thing a team could work out that would contribute the most value to the customer and ROI for the business. Sometimes that may be GraphQL, other times it may be something completely unrelated.
A Working Group should have one or more exit criteria and should work doggedly to shut itself down by meeting them (I know you'll appreciate that, @cablett, because I know you've read the John Gall book on systems!)
With that in mind I can't think of an existing precedent for what's being described here. I wonder if we need to state the problem more clearly in order to find the solution. What's broken at the minute? Is foundational work not being scheduled as it should? Is there a bottleneck on @gweaver as the DRI?
The work is not really visible outside the merry band of engineers cutting the MRs
The work is not being tracked in any meaningful capacity beyond "is this technical pain point resolved?"
The work is not scheduled
The work is a bit ad-hoc and there doesn't seem to be a vision for our GraphQL API (beyond "GraphQL first" which doesn't seem to be a thing anymore)
There is no bottleneck wrt process with @gweaver because although he's the DRI, he's not insisting on signing off on every MR (he's trusting the engineers, which is good!) It does mean that we perhaps need an engineering DRI or group. I was imagining it'd look a bit like maintainers, where frontend/backend makes a transparent and sensible decision based on discussion and gathering of viewpoints.
The work is not really visible outside the merry band of engineers cutting the MRs
Honestly there's probably not gonna be a good way to fix this, but I like your idea of creating a group equivalent to maintainers for this. We could ask for this group to be pinged as part of the review process for MRs and they could do a review based on how the endpoint in the MR fits with the rest of our GraphQL ecosystem. This would allow an opportunity to bring up recently completed improvements, gotchas, etc.
The work is not being tracked in any meaningful capacity beyond "is this technical pain point resolved?"
I don't see how else we'd track it, to be honest. We can generate a single epic that tracks all fundamental GraphQL work if what you want is a way to relate issues to each other and have a way to see completed work on a particular "theme" (authorization for example) but that sounds a bit convoluted.
The work is not scheduled
I think this is why it's important to have a group be the DRI - we need to align with Product in order to ensure GraphQL work gets scheduled consistently. That said, since we have many people that are not part of ~"group::project management" that contribute to GraphQL, having a separate grouping would make sense in my opinion
The work is a bit ad-hoc and there doesn't seem to be a vision for our GraphQL API
For visibility, I asked IT to create the group @gitlab-org/graphql-experts which anyone can ping if they would like help with a GraphQL problem (we also have the #f_graphql Slack channel, which is also quite active with questions and "is this known" bug reports).
I'm 100% supportive of going GraphQL first and usually put that as an acceptance criterion on all new features ~"group::project management" builds. As far as being the DRI for the vision of GraphQL, I feel like that shouldn't be my domain as that is a technical implementation decision which ought to be owned by Engineering.
I did notice that this recently merged MR (gitlab-com/www-gitlab-com!56454 (diffs)) reverts the "graphql first" standard previously outlined in our handbook. I think this is not necessarily the best thing for our product and codebase given our values of boring solutions and that they should not be conflated with technical debt. From the linked article:
New technology choices might be purely additive (for example: “we don’t have caching yet, so let’s add memcached”). But they might also overlap or replace things you are already using. If that’s the case, you should set clear expectations about migrating old functionality to the new system. The policy should typically be “we’re committed to migrating,” with a proposed timeline. The intention of this step is to keep wreckage at manageable levels, and to avoid proliferating locally-optimal solutions.
I'm committed to helping us move GraphQL forward at GitLab, but it feels like we need a few things:
Executive sponsorship within the Engineering department to help craft a plan to migrate purely to one solution -- Haml/VueX + REST or Vue Apollo + GraphQL -- and commit to a timeline. One way to help achieve this transition could be by using our quarterly OKRs similar to what we are doing for Pajamas (gitlab-com/www-gitlab-com#8431 (closed), gitlab-com/www-gitlab-com#8363 (closed), gitlab-com&734 (closed))
A clear plan for powering 100% of our REST API from GraphQL -- and a commitment to a timeline. If we are considering a v5 for our REST API, maybe this would be the time to do it.
Unless eliminating overlapping solutions by some future date is a common goal across engineering, I don't think we will ever get the proper recognition and support for all the effort being put into GraphQL. It will also severely limit future contributions from other groups across GitLab as they will continue to default to Rails/HAML or REST + VueX first. All this to say, as a non-engineering outside observer, I see a lot of thrashing and technical debt coming out of this "non-commitment". As a Product Manager, I'm not sure how to prioritize various refactoring initiatives we want to take on because I struggle to see some end point in the future where we don't have overlapping solutions. At the end of the day, y'all are the Engineers and I respect whatever path we go down -- I just hope we pick one ;)
If I were to define a vision for GraphQL, it would be something like:
[insert evidence (not opinion) as to why standardizing on GraphQL + Vue Apollo is a better implementation decision vs. Rails first / HAML and VueX + REST]. Further, these alternative implementation paths are overlapping solutions with GraphQL and Vue Apollo. In line with our Efficiency value of boring solutions, "boring" should not be conflated with technical debt. Maintaining these multiple overlapping architectures is the very definition of technical debt -- which is one of our CEO's identified biggest risks to GitLab being successful in the long run.
To alleviate this risk, we are setting clear expectations about migrating old functionality to the new system so that we avoid proliferating locally-optimal solutions. By [insert date], the GraphQL API will be the default means of interacting programmatically with GitLab. As part of this transition, the REST API will be fully generated from our GraphQL API with no breaking changes to continue to support our wider community that prefers to interact with REST over GraphQL.
To further support this vision and also help deliver a consistent user experience across the product, by [insert date], GitLab's user interface will be powered exclusively by Pajamas, GitLab's design system, and Vue Apollo so we can exclusively dogfood our GraphQL API.
To wrap things up, I'd love to delegate the vision for GraphQL to the maintainers that are doing all of the work. It is an implementation decision, not really a product decision unless there are certain decisions that have the potential to deeply impact our customers and revenue streams -- then I'd like to be the DRI ;)
How can I help?
/cc @jlear @johnhope @donaldcook as I think it's worth surfacing some of the points about overlapping solutions -- and getting clarity on what "boring" solutions really means -- within the Engineering org.
So are we not GraphQL first? I've been telling folks that, but I'm not sure now. I wasn't even aware of this MR. I don't really have enough context on the frontend to understand fully the "why" (but that's probably on me)
Executive sponsorship within the Engineering department to help craft a plan to migrate purely to one solution -- Haml/VueX + REST or Vue Apollo + GraphQL -- and commit to a timeline. One way to help achieve this transition could be by using our quarterly OKRs similar to what we are doing for Pajamas (gitlab-com/www-gitlab-com#8431 (closed), gitlab-com/www-gitlab-com#8363 (closed), gitlab-com&734 (closed))
A clear plan for powering 100% of our REST API from GraphQL -- and a commitment to a timeline. If we are considering a v5 for our REST API, maybe this would be the time to do it.
I think in any case, we need visibility into the work that's happening. Here's a potential thing to get started:
Distinguish between customer-facing GraphQL features (@gweaver's GraphQL purview) and GraphQL foundational work (Engineering DRI's purview). Create a new gitlab-org group label for GraphQL - maybe a scoped label ~"GraphQL::feature" and ~"GraphQL::foundation" or something. The former is individual teams' features being exposed via GraphQL, and the latter is foundational work common to the GraphQL API. (on second thought, we could also use ~"feature::maintenance" and ~"feature::addition" to make this distinction?)
Label all the foundational issues we can think of (maybe recent ones, the last six months or so?) to get an idea of how much effort is going into foundational work.
Cut a handbook MR detailing the process for applying whichever labels we decide on to GraphQL MRs/issues
Use a board and prioritise issues regularly during GraphQL Office Hours, or some other recorded regular checkin
Once we have an idea of what needs doing and how much is done, we can curate a vision and plan for the future.
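As a rough illustration of the labelling step above, here is a small Python sketch of how labelled issues could be tallied to see how much effort goes into foundational work. The label names come from the suggestion above and do not exist yet; the issue dictionaries and the `effort_by_label` helper are hypothetical stand-ins for whatever an issue export would actually return.

```python
from collections import Counter

# Proposed (not yet existing) scoped labels from the suggestion above.
GRAPHQL_LABELS = ("GraphQL::feature", "GraphQL::foundation")


def effort_by_label(issues, labels=GRAPHQL_LABELS):
    """Return (issue counts, summed weights) per label for the given issues."""
    counts, weights = Counter(), Counter()
    for issue in issues:
        for label in labels:
            if label in issue["labels"]:
                counts[label] += 1
                # Unweighted issues count as zero effort rather than failing.
                weights[label] += issue.get("weight") or 0
    return counts, weights


sample = [
    {"labels": ["GraphQL::foundation", "backend"], "weight": 3},
    {"labels": ["GraphQL::foundation"], "weight": None},
    {"labels": ["GraphQL::feature"], "weight": 2},
    {"labels": ["frontend"], "weight": 5},
]
counts, weights = effort_by_label(sample)
```

Even a count like this over the last six months would give the board and the Office Hours checkin something concrete to start from.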
I did notice that this recently merged MR (gitlab-com/www-gitlab-com!56454 (diffs)) reverts the "graphql first" standard previously outlined in our handbook. I think this is not necessarily the best thing for our product and codebase given our values of boring solutions and that they should not be conflated with technical debt.
I have to admit I was a bit...stunned...by this removal (thank you for pointing it out @gweaver). And I didn't see a comment in the MR or description explaining such a big change. I would have hoped that our GitLab over communication mantra would have extended to informing the group currently responsible for GraphQL. Though it's more than possible I just missed it somewhere. The Plan team engineers and others actually made the decision to go GraphQL first, knowing it would be somewhat painful but also that it set up future direction for GitLab.
Here's the removed paragraph:
GraphQL first
When adding new functionality, we should use GraphQL where possible on
the [backend] and the [frontend]. We have a long-term goal to [use
GraphQL everywhere] because it lets us increase development speed,
reduces dependencies between frontend and backend engineers, and gives
us a single source of truth for application data.
Defaulting to GraphQL for new work means that the distance from that
goal doesn't increase over time.
This does not override [the importance of velocity]: if something is
significantly more work to ship using GraphQL, rather than extending an
existing implementation (in a Rails controller or the REST API), we
should not block ourselves on using GraphQL. Instead, we should ship the
feature and create a follow-up issue to move that resource to GraphQL in
future. That follow-up issue can be scheduled by the relevant Product
Manager, in consultation with Engineering Managers, as with any other
[engineering proposed initiative].
I honestly think the only reason we've gotten as far as we have implementing GraphQL is because we've tried to be GraphQL first. Without that push, we wouldn't be so far along.
I'm strongly in favour of establishing a graphql group and labeling work on GraphQL appropriately. It would be good to get the same level of support for this facet of development as we see for database changes.
I have to admit I was a bit...stunned...by this removal
I have to agree, and wonder how I missed this. It is a fairly fundamental change in direction. I think we have already seen some impressive wins by implementing a powerful GraphQL endpoint.
Exactly that, no change in strategy (which would have been widely announced); we were simply asked to make the description clearer about the targeted frontend implementation by separating the FE implementation technology (HAML vs. Vue) from the actual API technology (REST vs. GraphQL), as mixing the two was leading to more confusion.
Perhaps we should look at adding, somewhere else, a separate clarification about public/internal APIs and the technologies used, which is what the previous paragraph applied to. GraphQL first, as a pragmatic approach forward, is still the target.
Moving the public-facing API fully to GraphQL is in my opinion a topic for the Ecosystem team. So again, we should focus there on a clear separation between the topics.
To make it more widely known how impactful GraphQL has been so far at GitLab, can someone collect a couple of positive examples/success stories? I would highlight those in discussions to speed up adoption of GraphQL.
Forgive me, maybe I'm totally looking at this wrong. But this is the main Engineering page. There used to be two second level headers:
## Rails by default, VueJS where it counts followed by ## GraphQL first. These had nothing to do with each other - they are preceded by ## Engineering Proposed Initiatives and followed by ## Demos.
But now, the entire GraphQL first section, which specifically called out backend and frontend, is now completely gone. I'm not aware of it being documented anywhere else.
Rails by default, VueJS where it counts seems to relate to how we develop our frontend, and what leads us to choose when to use HAML or Vue. And while it says "app backed by our API (preferrable GraphQL)", this is much different than a statement of direction for GraphQL first.
The GraphQL section seems on par with that - a set direction (and some reasoning) of why we choose it over the REST API, and when it is appropriate to do something in REST. If it's gone, I don't think there is anywhere else that sets that direction and lays out when to choose REST.
But if this is really what was intended, I won't belabor it anymore.
Do you think we can restore the section "GraphQL First" @timzallmann , or at least put it into a different place, if it doesn't belong in that location?
@cablett Yes, I think in the sense of APIs we definitely should find a new, better place where the reasoning and an explanation of public/internal public etc. can be stated. If you have suggestions, happy to take them!
@gweaver I wonder if it would make sense to use Architecture Practice and Architecture Workflow processes here. The new process assumes that a huge initiative like this would benefit a lot from having an Engineering Leader, Product Manager and an Engineer assigned as DRIs.
I'm suggesting this here because I'm receiving questions from people in Verify about GraphQL first approach / transition. Some questions include:
how to separate internal API from external API (or should we even do that)
how to avoid the inflation of deprecated fields describing volatile data
how to maintain parity with REST API (or should we even do that)
what is the cost of not extending REST API for small projects that value simplicity over GraphQL flexibility
These are examples of questions I'm hearing. What I'm missing is a blueprint / document describing these concerns, DRIs listed somewhere where I can find them and hold them accountable for getting answers in case of doubts like this blocking teams / progress.
@grzesiek great. It will probably be a while before I have capacity to create a blueprint (currently in 3 working groups), but as soon as some of those long running commitments are done, I can shift back to this if no one else has started the process.
@cablett @.luke feel free to drive this forward (or any of the other GraphQL maintainers) as I don't want to slow down momentum here if y'all feel like this would be helpful.
Observability of GraphQL queries is a big problem. Most of our monitoring systems are based on request routes, but with GraphQL there's only one route.
There is an ongoing effort from the scalability team to match each route with a feature category; I think we should think about how we can do this for GraphQL too.
The goal of this epic is to be able to connect resource usage or incident root causes to a feature category. We already have this in place for Sidekiq, and would like to extend this to Web and API requests.
We will first create a framework to be able to categorize actions and endpoints, then we will perform a first-pass categorization, and we will make this information available in the metrics and logs.
The benefit for Stage Groups is that they will have more information available to them about how their feature categories perform on GitLab.com.
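For GraphQL, where everything arrives on one route, the same categorization would have to key off the fields being requested rather than the route. Here is a minimal Python sketch of that idea. The mapping, the category names and the `not_owned` fallback are all assumptions made up for illustration, and the input is assumed to be the already-extracted list of top-level fields of a query.

```python
from collections import Counter

# Hypothetical mapping from top-level GraphQL fields to feature categories.
FEATURE_CATEGORIES = {
    "issue": "issue_tracking",
    "epic": "epics",
    "vulnerabilities": "vulnerability_management",
}


def categorize(top_level_fields, mapping=FEATURE_CATEGORIES):
    """Count how many requested top-level fields fall into each category."""
    return Counter(mapping.get(field, "not_owned") for field in top_level_fields)


def primary_category(top_level_fields, mapping=FEATURE_CATEGORIES):
    """Pick the category that most of the requested fields belong to."""
    counts = categorize(top_level_fields, mapping)
    return counts.most_common(1)[0][0] if counts else "not_owned"
```

A single query can touch several categories, so logging the full `categorize` result is probably more useful than collapsing it to one label; `primary_category` is only there for dashboards that need exactly one.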
With regards to observability, we already have some metrics (an example) for GraphQL that are more fine-grained than the overarching request latencies. But those are not yet consumable in a Grafana dashboard.
I think we're also emitting some logs (into log/graphql_json.log), but I don't think those are ingested in our logging infra yet.
Thanks @grzesiek for reporting those concerns. I'd like to expand a bit on How to make sure that we can accelerate uploads with GraphQL the same way we do with REST API? as I'm more familiar with that topic.
We have extensive documentation on handling uploads in our codebase. Over the course of the years we invested a lot of time and effort in making uploads scalable, k8s compatible, and performant.
With the design management feature, we introduced the apollo library to handle graphql uploads but this bypassed all the workhorse optimization we have in place.
If we want to keep uploading with graphql we should implement direct upload.
The challenge I see is that workhorse cannot efficiently inspect the content of the query, so it would upload every single file to the same bucket. But GitLab is designed with feature-based buckets, so we will likely need to add a new one, and then eventually move the uploaded object to the final destination in rails.
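To illustrate the two-step flow described above (upload everything to one bucket first, then move it once the feature is known), here is a toy Python sketch with in-memory dictionaries standing in for object storage. The bucket names and both function names are hypothetical; this is not how workhorse or rails are actually implemented.

```python
# Hypothetical in-memory stand-ins for object storage buckets.
TEMP_BUCKET = "graphql-tmp"
buckets = {TEMP_BUCKET: {}, "designs": {}, "attachments": {}}


def accelerated_upload(filename, data):
    """Proxy step: store the file without inspecting the GraphQL query."""
    temp_key = f"tmp/{filename}"
    buckets[TEMP_BUCKET][temp_key] = data
    return temp_key


def finalize_upload(temp_key, feature_bucket, final_key):
    """Application step: move the object to the bucket of the owning feature."""
    if feature_bucket == TEMP_BUCKET or feature_bucket not in buckets:
        raise ValueError(f"unknown feature bucket: {feature_bucket}")
    # pop() removes the object from the temporary bucket as part of the move.
    buckets[feature_bucket][final_key] = buckets[TEMP_BUCKET].pop(temp_key)
    return final_key


key = accelerated_upload("mockup.png", b"\x89PNG")
finalize_upload(key, "designs", "designs/1/mockup.png")
```

The extra move is the cost of the proxy not knowing which feature a mutation belongs to; if it could learn that cheaply, the temporary bucket would not be needed.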
It might be worth watching and reading the following materials:
GraphQL evaluation is not monolithic - while there is one route, each field and object we return goes through instances of ::Types::BaseObject, ::Types::BaseField, etc. We can add instrumentation on these classes to report usage and error reporting. Some of this is already done, with each field we resolve being included in metrics.
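As a language-neutral illustration of that kind of per-field instrumentation, here is a small Python sketch: a decorator that counts, times and records errors for each resolver it wraps. It is not the graphql-ruby API or GitLab's implementation; the metric stores, the `instrumented` decorator and the two example resolvers are hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory metric stores; a real system would export these.
field_calls = defaultdict(int)
field_seconds = defaultdict(float)
field_errors = defaultdict(int)


def instrumented(field_name):
    """Wrap a resolver so each call is counted and timed under field_name."""
    def decorator(resolver):
        @wraps(resolver)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return resolver(*args, **kwargs)
            except Exception:
                field_errors[field_name] += 1
                raise
            finally:
                # Runs on success and on failure, so every call is recorded.
                field_calls[field_name] += 1
                field_seconds[field_name] += time.perf_counter() - start
        return wrapper
    return decorator


@instrumented("Issue.title")
def resolve_title(issue):
    return issue["title"]


@instrumented("Issue.weight")
def resolve_weight(issue):
    return issue["weight"]  # raises KeyError when the key is missing
```

Because the wrapping happens at the field level, the single `/api/graphql` route stops being a blind spot: usage and errors can be reported per field even though the route is shared.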
@cablett We accept the responsibility! I will apply labels though so we can refer back to this amazing feedback. This isn't an issue I've come across and it has some great points to keep in mind. Thanks for the ping!