Moving the Engineer/SRE Line

A variety of recent interactions and comments have had me considering the distinction between Engineers and SREs.

The Scalability Team has an interesting place in the Engineering department given that we are a hybrid team and a sort of bridge between Development and Infrastructure. Inside the team, we still have a clear distinction between the two types of roles, but I think we can challenge ourselves to move this line.

Now, I'm stopping short of saying that everyone in the team should have production access and be able to deploy changes. I think it's still important that SREs remain the people with write access to production systems. Having this access comes with the responsibilities of being on-call, joining incident calls, and treating the health of the system as a primary focus each day. And I think that is a fair trade-off to make - if you have the ability to bring down production systems, then you should have the responsibility of looking after them. If everyone has this ability, then I am concerned that we become another reliability team that is constantly pulled into each incident, and it becomes harder to move forward with the projects that we'd like to achieve.

The ask here is for people to get involved with tasks that feel like they are "on the other side of the line". What that practically means is:

For @reprazent @smcgivern @oswaldo @jacobvosmaer-gitlab :

  • find coding tasks that are more straightforward, and work with @cmiskell so that he can take them on
  • if you see an SRE-style task that you'd like to try, contact @cmiskell and ask him to work with you on it
  • if you are debugging something in code, consider recording yourself walking through it so that @cmiskell can watch it later (or pair if timezones permit)
  • create your own production change requests

For @cmiskell :

  • find SRE-style tasks that are more straightforward, find an engineer who is interested in learning this, and guide them through the task
  • if you find a coding task you'd like to pick up, contact one of the engineers to work with you on it
  • record yourself performing some of the change requests that have come from the Scalability Team so that the engineers can see the steps that aren't always captured in the text of the issue
  • help guide people through creating production change requests

This is a short list to get the idea started, but I'm sure there are others that we could add.

I'm also interested to hear any thoughts or concerns about this idea. Please let me know what you think.