Waterfall, Agile, DevOps, VSM, and Flows

Initial brainstorming

https://www.youtube.com/watch?v=Oea7nMuJko0

Waterfall vs Agile vs DevOps vs Agile DevOps

In traditional Waterfall software development and delivery, work moves through strict, linear, non-overlapping stages. This does not promote collaboration, and the structure inherently fosters divisions (silos). There are at least three types of silos:

In time: Stages run in a strict order, for example Development, then QA, then UAT. Developers are not actively collaborating with QA folks or with business owners, because each group is encouraged to do its work in its own mutually exclusive time period.
In space: Cross-functional teams are not encouraged. Developers and QA folks and business owners have no notion of sharing a team.
In tools: Each stage often has its own tool, so users are separated by the nature of the work and its artifacts.

Agile is an evolution of Waterfall and focuses on breaking down the barriers and working in cross-functional teams. DevOps is an evolution after Agile where the cross-functional teams extend to Operations roles (not just Development roles). And finally Agile DevOps emphasizes both Agile and DevOps as a coherent and complete software delivery framework.

Agile DevOps addresses the problems above because users in different roles are encouraged to work together in both space and time: they sit in cross-functional teams, and the stages are less strictly linear. For example, quality stakeholders and security stakeholders are encouraged to participate in the design and discussion of features early on, before even a single line of code is written. This increased collaboration means that quality and security risks are mitigated earlier in the process, resulting in less re-work on average. This is often called "shift left".

The tools problem still exists in Agile DevOps as a framework (because the framework itself doesn't specify which tools you use or how many you should use). But GitLab directly addresses the tools problem because with one tool, you have no tool division.

Value Stream Management (VSM) and Flows

Value stream management (or more specifically, value stream mapping) has its roots in lean manufacturing, where physical goods are created on a linear production line. The idea in lean manufacturing (and thus in software delivery based on VSM) is that production is split into linear stages. VSM aims to shorten each individual stage in the end-to-end production of a single unit (i.e. a single feature/change in software delivery), in order to shorten the total end-to-end time. So VSM tries to characterize the waste in each stage, have teams identify that waste, and then remove it.

VSM also emphasizes WIP (work-in-progress) limits for a given stage. The idea is that if there are too many units stuck in a certain stage, it is a reflection of inefficiency: an early part of the flow is going at a faster pace than a later part of the flow can handle. So putting a WIP limit on a given stage is a forcing function that helps teams ensure that units do not get stuck. For example, if you are a developer, and your main task is development, but you see a software change stuck in QA (and there are already lots of issues in QA), you should probably help the QA folks relieve some of the pressure in the QA stage.
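A WIP limit check like the one described above can be sketched in a few lines of Python. This is only an illustration: the limit values, the `WIP_LIMITS` table, and the `stages_over_limit` helper are made up for this note and are not an existing GitLab feature or API.

```python
from collections import Counter

# Hypothetical WIP limits per stage. The stage names follow the examples in
# this note; the numbers are arbitrary.
WIP_LIMITS = {"Development": 5, "QA": 3, "UAT": 2}


def stages_over_limit(work_items, limits=WIP_LIMITS):
    """Return {stage: excess} for every stage holding more units than its limit.

    work_items is an iterable of (change_id, current_stage) pairs.
    """
    counts = Counter(stage for _, stage in work_items)
    return {
        stage: counts[stage] - limit
        for stage, limit in limits.items()
        if counts[stage] > limit
    }


# Made-up snapshot of where each software change currently sits.
items = [
    ("change-1", "Development"),
    ("change-2", "QA"),
    ("change-3", "QA"),
    ("change-4", "QA"),
    ("change-5", "QA"),
    ("change-6", "UAT"),
]

over = stages_over_limit(items)
for stage, excess in over.items():
    print(f"{stage} is {excess} unit(s) over its WIP limit")
```

A stage that shows up in the result is the signal for people upstream (here, developers) to stop starting new work and help clear that stage.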

Flows are important because they bring organization and structure. However, we should be careful that the flow concept doesn't revert back to Waterfall. In particular, there needs to be some healthy tension between structure and free-form collaboration. For example, a flow could characterize Compliance stakeholders reviewing a feature change before it is released to production. Due to industry regulation and/or a company having a low risk appetite, this flow could dictate that the Compliance review step must always occur as a final sign-off. However, since we are in a world of Agile DevOps, the Compliance stakeholders should be actively engaged in the earlier feature discussions in the first place, so that collaboration is maximized and re-work is minimized.

VSM and Cycle Time

Cycle time is paramount. Cycle time is directly correlated to business success. If a team ships software changes with a short cycle time, that means that customers get realized business value more often, leading to more feedback, and leading to a product that converges to what the customer needs, leading to business success. Cycle time is often explained in the context of VSM, because the entire time of a VSM flow is the cycle time.

(Frequency of deploys is also important, but that is usually not directly in the context of VSM. Note that throughput is less important, because high throughput is compatible with infrequent deploys that each carry a high volume of output.)
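The difference between cycle time, deploy frequency, and throughput can be made concrete with a small Python sketch. The two teams and all dates are invented for illustration, and cycle time is assumed here to run from the day work on a change starts to the day it is deployed.

```python
from datetime import date
from statistics import mean


def mean_cycle_time_days(changes):
    """Average days from start of work to deployment, per change."""
    return mean((deployed - started).days for started, deployed in changes)


def deploy_count(changes):
    """Number of distinct days on which something was deployed."""
    return len({deployed for _, deployed in changes})


# Team A (made-up data): four changes, each deployed a week after it started.
team_a = [
    (date(2019, 3, 1), date(2019, 3, 8)),
    (date(2019, 3, 8), date(2019, 3, 15)),
    (date(2019, 3, 15), date(2019, 3, 22)),
    (date(2019, 3, 22), date(2019, 3, 29)),
]

# Team B (made-up data): same start dates, but everything ships in one batch.
team_b = [(started, date(2019, 3, 29)) for started, _ in team_a]

for name, changes in (("A", team_a), ("B", team_b)):
    print(
        f"Team {name}: throughput={len(changes)} changes, "
        f"deploys={deploy_count(changes)}, "
        f"mean cycle time={mean_cycle_time_days(changes)} days"
    )
```

Both teams have the same throughput for the month (four changes), but Team B deploys once and its average change waits much longer to reach customers, which is why throughput alone says little about how quickly value is delivered.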

Stages vs Events in VSM

Ultimately, our goal in the VSM framework is to reduce cycle time. There are loosely two ways to think about this.

(1) A VSM flow can be characterized by multiple stages, such as Development, QA, UAT. Again, the goal here is not to introduce rigorous non-overlapping stages. Instead, the point is to introduce the minimum structure needed to help software changes proceed through a software delivery pipeline. In particular, in this example, developers, QA folks, and business owners should be involved in all stages, just with different degrees of emphasis. This is in stark contrast with Waterfall, where stages are indeed distinct and there is no collaboration. From this perspective, if GitLab can help users identify how long units of production (software changes) spend in a given stage, then teams can identify waste and improve their process. The weakness of this approach, which needs to be mitigated, is over-indexing on the time spent in a given stage. It is also hard to characterize waste and times when we know that in Agile DevOps the stages overlap. So applying this identify-waste framework to stages may oversimplify the problem.

We have some designs that are based on this approach: https://gitlab.com/gitlab-org/gitlab-ee/issues/10432.
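To illustrate the stage-based approach in (1) and its weakness, here is a minimal Python sketch. The stage spans and dates are invented, and the helper names are hypothetical; this is not how the linked designs work, just a way to see what happens to per-stage times when stages overlap.

```python
from datetime import date

# Made-up (stage, start, end) spans for one software change. The stages
# deliberately overlap, as they would in Agile DevOps.
stage_spans = [
    ("Development", date(2019, 3, 1), date(2019, 3, 11)),
    ("QA", date(2019, 3, 8), date(2019, 3, 15)),
    ("UAT", date(2019, 3, 14), date(2019, 3, 18)),
]


def days_per_stage(spans):
    """Days the change spent in each stage, taken in isolation."""
    return {stage: (end - start).days for stage, start, end in spans}


def cycle_time_days(spans):
    """Elapsed days from the earliest stage start to the latest stage end."""
    first_start = min(start for _, start, _ in spans)
    last_end = max(end for _, _, end in spans)
    return (last_end - first_start).days


per_stage = days_per_stage(stage_spans)
total = cycle_time_days(stage_spans)
double_counted = sum(per_stage.values()) - total

print(per_stage)
print(f"cycle time: {total} days, double-counted overlap: {double_counted} days")
```

Because the stages overlap, the per-stage times add up to more than the actual cycle time, so shaving days off one stage does not necessarily shorten the end-to-end time by the same amount.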

(2) Another approach is identifying events based on function or role. A compliance stakeholder may be participating in discussions throughout the VSM flow of a given change. GitLab can log every time a compliance stakeholder says something in GitLab, and each occurrence represents a "compliance stakeholder event". By analyzing events, GitLab can tell you if teams are collaborating well, and correlate that with (hopefully) improved cycle times. For example, suppose in a given month, GitLab tells us that compliance stakeholder events happen very rarely, and usually only during the late stages of the given workflow, shortly before a feature is planned to be released. Cycle times in that month are high. In a subsequent month, GitLab shows that compliance stakeholder events are more spread out, and happen earlier in the workflow on average. Cycle times in that month are lower. GitLab will then aggregate all the statistics together and draw a correlation result: more ongoing and frequent compliance stakeholder collaboration is associated with lower cycle times. This information is pushed to the users/teams of GitLab, who document it as a best practice and continue to measure/monitor this metric over time.
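The event-based analysis in (2) could be prototyped roughly as follows. Everything here is an assumption made for illustration: the data is invented, "how early events happen" is measured as the mean position of events within the change's lifetime (0 = start, 1 = release), and a hand-written Pearson correlation stands in for whatever statistics GitLab would actually use.

```python
from math import sqrt
from statistics import mean

# Made-up data: (cycle_time_days, days after start on which a compliance
# stakeholder event was logged) for four software changes.
changes = [
    (30, [27, 29]),    # few events, all shortly before release
    (24, [20, 22]),
    (12, [2, 6, 10]),  # more events, spread across the workflow
    (10, [1, 4, 8]),
]


def mean_event_position(cycle_time, event_days):
    """Average event position in the workflow: 0 = start, 1 = release."""
    return mean(day / cycle_time for day in event_days)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


positions = [mean_event_position(ct, days) for ct, days in changes]
cycle_times = [ct for ct, _ in changes]
r = pearson(positions, cycle_times)

print(f"correlation between late compliance events and cycle time: {r:.2f}")
```

A strongly positive value, as this toy data produces, would mean that changes whose compliance events cluster late also tend to have longer cycle times. A correlation like this does not by itself show that earlier involvement causes shorter cycle times, so it is better treated as a prompt for teams to investigate than as proof.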

Edited Mar 22, 2019 by Victor Wu