13.12 Manage:Optimize retrospective

changed due date to May 26, 2021

added groupoptimize retrospective labels

What went well this release?

What didn't go well this release?

Group-level DevOps Adoption has suffered from a series of surprises:

We did not enable the feature flag in %13.11 as planned, and we were surprised at the beginning of %13.12 when we realized it had slipped.
We merged the release post MR before the feature flag was enabled by default. As a result, users were told the feature was available but couldn't find it.
We implemented in Starter instead of Ultimate, which blocked us from enabling the feature for SaaS, which blocked us from enabling the feature by default.

Our engineering efforts need to be more detail-oriented and proactive than this. I include myself in that – I didn't mark the feature flag enabling as a blocker on the release post MR, and I didn't prep @wortschi to handle his first release post before I went OOO. Let's all make an effort to double-check our work for the next few weeks.

Bigger picture, this many mistakes feels like more than a coincidence. Why were ALL of us failing at the same time? Normally when one of us misses something, another will notice. Was there something strange going on with us a few weeks ago? Contagious brainwaves? Alien abductions? I suspect the overlap between the instance-level and group-level features is partially to blame, because it was hard to track their statuses separately. But that wouldn't explain everything...

@wortschi @blabuschagne @pshutsin what are your thoughts?

Thanks for adding this @djensen. It does feel like there was some communication missing or some misunderstandings here. Please let me know if there is something I can be doing differently to avoid situations like this in the future.

Thanks for raising this @djensen!

I opened the issue to track the group level feature flag but didn't announce that it wasn't enabled by default by the end of 13.11.

Perhaps going forward we can discuss feature flag status issues and their outstanding requirements as part of our walk the board section in the weekly? This could help ensure that we're all on the same page and know what's still outstanding before opening a release post item?

discuss feature flag status issues and their outstanding requirements as part of our walk the board section in the weekly?

I think that's a good idea. The easiest way to make sure that happens is to make sure that feature flag rollout issues are prioritized as either ~priority::1 or ~priority::2, because we review every single one of those issues. Happily, that makes conceptual sense too - a feature flag enabling is a big deal, and we should have high confidence in it, meaning 1 or 2 is the appropriate label.

@ljlane @wortschi are you on board with this idea? ^

How did it happen?

Feature flag

gitlab-org/gitlab#299606 (closed) states that we want to fully refactor Segments into "enabled Groups" before we enable it by default. The refactoring is scheduled for 14.0 as a breaking change for the API. All DevOps adoption pages use the same API so this is the reason why group-level wasn't enabled by default and was treated as beta too.

License check

Not sure how it happened, but it was Starter from the very beginning of implementation and slipped from there to group level.

What can we do

Literally every feature can be initially behind a feature flag, so we always need to think about it when writing the release post and considering a feature as "done and released for general audience".
Don't merge the release post until you see the feature working in production?
Be more explicit about what is beta and what is not. As it turned out, the fact that the 2 DevOps adoption pages shared the same API wasn't clear.

P.S. The entire "this feature is beta" thingy could be avoided if we agreed with Sid or any other person who has high influence on decisions before introducing the entire Segments abstraction.

@blabuschagne @djensen

discuss feature flag status issues and their outstanding requirements as part of our walk the board section in the weekly?

The easiest way to make sure that happens is to make sure that feature flag rollout issues are prioritized as either ~priority::1 or ~priority::2

That's a good idea. Adding a reminder to our weekly agenda to check the status of feature flags as we walk the board seems not too intrusive and could help us quickly identify similar situations in the future.

With regard to @pshutsin's suggestion, I think we should still allow merging release posts without the feature being deployed to production. By preventing release posts from being merged before a feature is in production, we would either give developers less time to implement a feature or hold off on release posts until the next milestone. I think neither option is ideal.

To be honest, I don't know the exact reason why the rollout of Group-level DevOps Adoption caused so many problems, but I'm assuming it was a combination of all the reasons mentioned above and the fact that probably nobody felt responsible for communicating the actual state of the feature flag. Thus, I'm thinking that we need to either define a DRI for a feature flag (e.g., the BE or FE dev) or keep the rollout issue as the Single Source of Truth, as suggested by @ekigbo in gitlab-org/gitlab!60437 (comment 562260787).

Perhaps going forward we can discuss feature flag status issues and their outstanding requirements as part of our walk the board section in the weekly? This could help ensure that we're all on the same page and know what's still outstanding before opening a release post item?

@blabuschagne We (PMs) are required to have all release post items created by the 10th of each month. We have until 17th to merge them. I think blocking the release post issue with the feature flag issue could help.

The easiest way to make sure that happens is to make sure that feature flag rollout issues are prioritized as either ~priority::1 or ~priority::2

@djensen Yes, I would support making feature flag issues ~priority::1

All DevOps adoption pages use the same API so this is the reason why group-level wasn't enabled by default and was treated as beta too.

I think this is the crux of the mix up. I didn't realize that we were blocking group-level enable by default with the API refactoring because I didn't know that the instance-level API was so intertwined with the group-level feature.

I think we should still allow merging release posts without the feature being deployed to production

@wortschi This is the policy on when a release post can be merged. I'm sorry I wasn't around to better support you on this.

Once all content is reviewed and complete, add the ~"Ready" label and assign this issue to the Engineering Manager (EM). The EM is responsible for merging as soon as the implementing feature is deployed to GitLab.com, after which this content will appear on the GitLab.com Release page and can be included in the next release post. All release post items must be merged on or before the 17th of the month. If a feature is not ready by the 17th due date, the EM should push the release post item to the next milestone.
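The policy quoted above can be read as a small decision rule. Here is a minimal sketch of it, not an official tool: the class, field, and function names are hypothetical, and `feature_live_on_gitlab_com` is assumed to mean the feature is deployed and actually visible to users, which in this team's case also depends on the feature flag state.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ReleasePostItem:
    # Hypothetical fields, chosen only for this illustration.
    has_ready_label: bool             # content reviewed and labeled ~"Ready"
    feature_live_on_gitlab_com: bool  # deployed and visible to users (assumption)


def release_post_action(item: ReleasePostItem, today: date, deadline: date) -> str:
    """Return the next step for a release post item under the quoted policy."""
    if today > deadline:
        # Past the merge deadline: the item moves to the next milestone.
        return "push_to_next_milestone"
    if not item.has_ready_label:
        return "wait_for_review"
    if item.feature_live_on_gitlab_com:
        # The EM merges as soon as the feature is deployed to GitLab.com.
        return "merge"
    return "wait_for_deployment"


if __name__ == "__main__":
    # Illustrative deadline: the 17th of the month.
    deadline = date(2021, 5, 17)
    item = ReleasePostItem(has_ready_label=True, feature_live_on_gitlab_com=False)
    print(release_post_action(item, date(2021, 5, 14), deadline))
```

Under this reading, a reviewed item whose feature is still hidden stays in "wait_for_deployment" until the deadline passes, rather than being merged early.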

The entire "this feature is beta" thingy could be avoided if we agreed with Sid or any other person who has high influence on decisions before introducing the entire Segments abstraction.

@pshutsin The decision to introduce Segments was made before I joined groupoptimize so I don't have much context except I think we were trying to address customer concerns about excluding certain projects or trying to map to their org structure. TBH if I was the PM at the time I don't think I would have predicted the amount of internal pushback we got on this, but after hearing the complexity argument I do get why we wouldn't want to do that. This was one of the motivators for me to set up those vision review sessions with Anoop so we can be more certain we're not introducing unsupported surprises.

@gitlab-org/manage/optimize I really appreciate the open discussion we're having around this and the willingness of team members to add their opinions. This is a sure sign that we can learn and get better together.

Great discussion! I really like the idea of assigning priority labels to feature flag enabling / removal. To me, it definitely helps mentally position feature flag work more like "first class" work, and should help ensure it's something we don't miss or forget in the process of building a new feature.

+1 Also to adding feature flags into the agenda, great suggestion!

@ljlane

This is the policy on when a release post can be merged. I'm sorry I wasn't around to better support you on this.

Thanks for linking to this section. The following sentence caught my attention:

The EM is responsible for merging as soon as the implementing feature is deployed to GitLab.com.

This means we need to be very careful with last-minute merges. Merging into master doesn't guarantee that the change is deployed to production at the same time. Typically, there's some time between a merge into master and a production deployment, and we need to hold off on merging an RP until we have verified that the feature is actually deployed to .com. Just wanted to call this out here as I believe it's an important detail. We recently had a related problem when we merged a feature to let users schedule un-setting of their busy status (see https://gitlab.com/gl-retrospectives/manage/-/issues/77#note_541733446 for details).

Anyway, with our soft cutoff date this shouldn't cause too many issues in the future

Here's an MR to add a release post checklist item for adding a blocking MR: gitlab-com/www-gitlab-com!81350 (merged)

What can we improve?

What praise do you have for the team?

Thank you for supporting me while I was out for surgery, and for the beautiful
We're having a great retro and making an effort to do things better

What should be surfaced in the company retrospective?

I guess any outcomes from the feature flag template split should probably get some amplification around the company

@ekigbo I was disappointed to see that split get completely reverted. Thanks for your support in that subsequent conversation; we're definitely on the same page. I just asked for alternative proposals, because I still want to solve this problem. In the meantime, we can continue using the feature flag removal issue template at will.

changed the description

@ahegyi @djensen @pshutsin @m_frankiewicz (backend)
@blabuschagne @ekigbo @wortschi (frontend)
@npost (ux)
@ljlane (pm)
@msedlakjakubowski (technical_writing)

THANK YOU ALL for the very transparent discussion we had about how we can make fewer mistakes as a group. We've decided on some concrete steps:

Assigning a single engineer as the DRI on each "feature flag rollout" issue, so responsibility is clear.
Applying a P1/P2 priority label on each "feature flag rollout" issue, so they are part of every board-walking.
Setting a blocking code MR on every release post MR.
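
The three steps above could be checked mechanically before a release post goes out. This is a minimal sketch under stated assumptions, not an existing GitLab feature: the `RolloutIssue` and `ReleasePostMR` classes and the `missing_steps` function are hypothetical, and only the `priority::1` / `priority::2` label names come from this discussion.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Priority labels whose issues are reviewed at every board walk (from the thread).
REVIEWED_PRIORITIES = {"priority::1", "priority::2"}


@dataclass
class RolloutIssue:
    # Hypothetical model of a "feature flag rollout" issue.
    dri: Optional[str] = None
    labels: List[str] = field(default_factory=list)


@dataclass
class ReleasePostMR:
    # Hypothetical model of a release post MR and the MRs that block it.
    blocking_mrs: List[str] = field(default_factory=list)


def missing_steps(issue: RolloutIssue, release_post: ReleasePostMR) -> List[str]:
    """List which of the three agreed steps are still outstanding."""
    problems = []
    if not issue.dri:
        problems.append("assign a single engineer as DRI on the rollout issue")
    if not REVIEWED_PRIORITIES.intersection(issue.labels):
        problems.append("apply a priority::1 or priority::2 label to the rollout issue")
    if not release_post.blocking_mrs:
        problems.append("set a blocking code MR on the release post MR")
    return problems


if __name__ == "__main__":
    issue = RolloutIssue(dri=None, labels=["feature flag"])
    for step in missing_steps(issue, ReleasePostMR()):
        print("TODO:", step)
```

An empty list from `missing_steps` would mean all three process steps are in place; it says nothing about whether the flag itself is enabled.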

Those are process improvements. I suspect there are also improvements we can make to our feature refinement and workload balancing. I'd like to send a survey and get feedback from you all - I'll share a link soon.

Closing this because it has reached its due date. Please check out the 14.0 retrospective issue next!

closed

mentioned in issue gitlab-com/www-gitlab-com#11844 (closed)

made the issue visible to everyone

After reviewing this and finding nothing to sanitize, disabling confidentiality so this can be linked in the GitLab 13.12 retrospective.

cc @m_gill

moved to gl-retrospectives/manage-stage/optimize#10 (closed)