DevOps #10: Deploy Independently with Workspaces and Workloads

Thursday, November 18, 2021

13 minute read

How Can We Deploy Independently Without Chaos?

Continuous and independent deployment sounds great from the team’s perspective. Nothing blocks the flow of value. The team can ship each feature as soon as it is complete, and fix any bugs with an immediate rollback. They can interact quickly and directly with customers through A/B tests and previews, and they can do all of this free of the company bureaucracy around release management, timing, and overhead.

But from the organization’s perspective, continuous and independent deployment promises sheer chaos.

Imagine you have 130 teams, each of which owns 8-12 components. With those thousands of moving parts, the organization needs to:

Market and sell a predictable set of value that customers can understand, budget around, and consume.
Never release a feature too early, or it will cannibalize current sales.
Always have accurate scripts for first-line tech support to handle simple customer issues, no matter where they wander in the product suite.
Provide timely and regular training to customers about how to use both new and existing capabilities.
Prevent failures or resource over-utilization in one component from cascading to others.
Monitor overall stability and responsiveness to observe customer problems across the solution.
Maintain both total and per-item costs, in appropriate ratio to changing revenues.
Manage the different needs of different types of components — libraries consumed across the organization, applications installed and supported at client sites, and services operated by the organization.

Continuous, Independent Delivery works great for teams. It is not enough for organizations.

How can the organization meet its full-system needs with minimal impact to team independence?

Maintaining system-level properties requires some amount of standardization. However, all standardization constrains team independence. And that limits team responsibility, the ability to experiment, and the flow of value.

Therefore, we must standardize the right things. We often worry about over-standardization or under-standardization. However, the usual problem is standardizing the wrong things. And that’s hard because the system pains and the productivity pains are experienced by different people. No one person knows both what would be useful to standardize and what would be painful to standardize.

We need a way to standardize incrementally with contributions from all 100+ development teams and all 50+ support teams, without requiring expertise in our standards. Everything needs to remain simple to understand, even as it changes frequently.

Standardize Around Apps, Purposes, Workloads, and Workspaces

The primary technique will be encapsulation. We use the Hexagonal Architecture again, just at a larger scale than in the last newsletter.

The Standard Hexagons

We define 4 key concepts. Each encapsulates one kind of complexity and provides a standard API. There will be many instances of each concept. Those instances will differ internally but be uniform from the outside.

For example, each application versions and stores builds in the same way. The team working on an application has complete independence for the application, yet the teams working on application deployment can treat all applications as uniform “deployable objects.”

Four concepts seem to provide a good set for most organizations:

Application: a separately-deployable set of code or other work products. Examples include a traditional app, a set of micro-services, a library, a documentation website, or a set of evidence to support regulatory approval. This is the connection point for team-level concerns.
Purpose: a goal the organization is trying to meet. Examples include “verify each build of the Foo app,” “run Bar for all regular customers,” or “Run Foo, Bar, and Quux together for customer X in their own separate environment to meet their special regulatory needs.”
Workload: a Purpose and the set of Apps which work together to accomplish that purpose. This is the actual unit of deployment.
Workspace: a place to host, monitor, and support one or more Workloads. This is the connection point for all system-level concerns.

Both Workspaces and Applications can be arbitrarily complex. Each one often will sub-divide into separate components, each representing a sub-concept. Each sub-concept can have its own team.
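
The relationships between the four concepts can be sketched as plain data types. This is only an illustrative model, not a prescribed implementation: the field names, the `deploy` method, the version strings, and the Workspace name are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Application:
    """A separately deployable set of code or other work products."""
    name: str
    version: str  # assumption: every Application exposes a version for its builds


@dataclass(frozen=True)
class Purpose:
    """A goal the organization is trying to meet."""
    description: str


@dataclass(frozen=True)
class Workload:
    """A Purpose plus the Applications that work together to accomplish it."""
    purpose: Purpose
    applications: tuple[Application, ...]


@dataclass
class Workspace:
    """A place to host, monitor, and support one or more Workloads."""
    name: str
    workloads: list[Workload] = field(default_factory=list)

    def deploy(self, workload: Workload) -> None:
        # The Workload, not the individual Application, is the unit of deployment.
        self.workloads.append(workload)


# Hypothetical instances based on the Foo/Bar/Quux example above.
foo = Application("Foo", "1.4.0")
bar = Application("Bar", "2.0.1")
quux = Application("Quux", "0.9.3")

customer_x = Workload(
    Purpose("Run Foo, Bar, and Quux together for customer X"),
    (foo, bar, quux),
)
workspace = Workspace("customer-x-isolated")
workspace.deploy(customer_x)
```

Each type looks the same from the outside no matter how complex a particular Application or Workspace is internally, which is the point of the encapsulation.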

What is a Workspace?

A Workspace is an abstract “box” that you can install anything into. The “box” provides an API between whatever you install into it and the teams in your organization.

For example, a Workspace at an older organization might consist of:

Interfaces with Installation and Infrastructure:
- 1 resource group in Azure, plus
- 1 AWS workspace, plus
- 1 set of Containers in a private data center, plus
- 1 set of hardware run by the client at their site, plus
- 1 pipeline to execute the application’s installation steps, plus
- 1 pipeline to execute an automated roll-back to the previous version.
Interfaces with Support, Monitoring, and Execution Performance:
- 1 internal website for tech support scripts, plus
- 1 set of operational dashboards, plus
- 1 centralized app config, plus
- 1 internal website to host the documentation for app config settings and operational how-tos, plus
- 1 feature-flag control system that Ops can use to roll back buggy features, plus
- 1 control board to trigger an automated code roll-back (triggered manually or in response to measures).
Interfaces with Accounting:
- 1 set of accounting and budgeting reports, plus
- 1 Slack channel to handle purchasing discussions.
Interfaces with Approvals:
- 1 internal website for specs, plus
- 1 document repository to hold traceability docs, plus
- 1 set of dashboards to display test results, plus
- 1 gate checklist where people each give their approval, plus
- 1 pipeline to execute automated tasks in response to approvals.
Interfaces with Sales and Marketing:
- 1 set of sales reports, plus
- 1 customer-facing announcements and marketing channel, plus
- 1 feature-flag control system that marketing uses to control feature go-live.

Workspaces are the primary focus for system-level properties. They implement these properties by defining a boundary between a Workload and the organization as a whole. This boundary consists of a set of capabilities. Some capabilities are defined by the Workspace and used by the Workload. Others are defined by the Workload and used by the Workspace.

Together, these capabilities make up the standards for your organization.

The expected capabilities can change over time. Any team in the organization can offer a new capability or can ask that others offer a new capability. The organization approves the offer or request by choosing to incorporate it into the Workspace boundary; it is now part of the standard.
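
One way to picture the boundary is as two sets of capabilities on each side: those a side defines and those it uses from the other side. The sketch below checks whether a Workload fits a Workspace. The `Boundary` type, the function name, and the capability names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    """Capabilities seen from one side of the Workspace/Workload boundary."""
    offers: frozenset[str]   # capabilities this side defines
    expects: frozenset[str]  # capabilities this side uses from the other side


def missing_capabilities(workspace: Boundary, workload: Boundary) -> dict[str, set[str]]:
    """Return what each side expects but the other side does not offer."""
    return {
        "workload_needs": set(workload.expects - workspace.offers),
        "workspace_needs": set(workspace.expects - workload.offers),
    }


# Hypothetical capability names.
workspace = Boundary(
    offers=frozenset({"install-pipeline", "rollback-pipeline", "dashboards"}),
    expects=frozenset({"health-endpoint", "versioned-build"}),
)
workload = Boundary(
    offers=frozenset({"versioned-build"}),
    expects=frozenset({"install-pipeline", "feature-flags"}),
)

gaps = missing_capabilities(workspace, workload)
# gaps["workload_needs"] is {"feature-flags"}; gaps["workspace_needs"] is {"health-endpoint"}
```

A gap on either side is a candidate for the process described above: someone offers or requests the capability, and the organization decides whether to make it part of the standard.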

Making It Happen

There are concrete steps to create encapsulation and standardize the right things. Use the recipes to implement these five steps.

Initial Purposes and Workspaces. There are two kinds of Purposes for Workspaces. The recipe explores both and defines the smallest set of Workspaces that could possibly work.
Initial Workspace Standard Capabilities. Each Workspace should start out simple. The simplest Workspace that could possibly work standardizes four specific properties. The recipe details that minimal standard.
Assemble the Hexagons. The core operation is to run a Workload in a Workspace, but supporting that requires many operations performed by many departments. The recipe identifies some of those operations and how to assemble them.
Develop Standards Iteratively. Workspaces are the mechanism for standardization. The recipe shows how to find the correct standards for your organization.
Workspaces to Avoid. Some kinds of Workspaces should generally be avoided but are occasionally needed. The recipe explores these and when to use them.

Most Importantly, Don’t Standardize Everything

Once you get started, everyone in the organization will want to be part of the Workspace. Be careful about what you allow in; each thing you standardize will impair productivity and development effectiveness. This is also your chance to Lean the organization by finding and removing waste. You will typically find four kinds of waste:

Pet Projects. Activities hidden within the organization that do not provide organizational value. Note: these differ from experiments. Experiments are time-boxed, fully visible to the organization, and always stop at the end of their time-box, followed by sharing findings.
Failure Demand. Projects, teams, or departments that exist only to detect and fix problems caused within the organization. This is your opportunity to work upstream and stop creating the problems, which will allow you to eliminate the downstream team and re-allocate the people to useful work.
Sacred Cows. These exist out of inertia. They provide value, but there would be a much more effective and efficient way to get that value today. First replace them with the more effective approach, then evaluate whether that should be part of the Workspace.
Non-core Business. These provide value but are not worth standardizing. The standardization cost, including ongoing development complexity, is greater than the value. This indicates that these are simply not core to your business. Find a way to outsource the activity and free your entire team to work on your core missions.

Interface organizational needs with team independence using Workspaces.

Access the recipe to Standardize Deployment with Workspaces.

Standardize the Right Things

Having a single point of standardization allows you to achieve organizational needs while minimizing impacts on teams. Making clear interfaces helps you identify the organizational needs that are worth the cost of standardization. The company can then automate and standardize the needs that are worth it, while eliminating and outsourcing the needs that aren’t core to the business.

The standard Workspaces also provide value to the dev teams. The sensible defaults minimize the things that dev teams have to think about. Teams can now incrementally add product complexity while still maintaining sufficient organizational capabilities. They can take on each new consideration one at a time, in priority order. This increases dev team release cadence and productivity, especially when starting new projects.

Benefits:

Decrease workload in non-development groups by unifying products.
Simplify to 1-3 common ways to do each thing.
Teams automate deployments at significantly reduced cost.
Easily separate concerns. Allow product teams to worry about capabilities and components, while Ops and Accounting worry about performance and cost.
Decrease TCO for all projects.
Decrease new project start time. Get a first version to production in a matter of weeks.
Eliminate waste in all parts of the business.

Downsides:

No more pet projects. Each team that doesn’t directly produce a product is either part of the standard or not. The organization must explicitly decide whether each is worth slowing down product development on all projects. Those that are worth it will make the cut and be standardized. Those that can’t make the cut will be cut.
Threatens sacred cows. Organizations have many teams, processes, and even departments that exist out of inertia or failure demand. Each of these gets exposed if the organization carefully considers what should and should not be added to the Workspace.

Demo the value to team and management…

This month’s demo recommendation is slightly different because standardization work is done by management and demoed to the team, rather than the other way around. Management likely does not have a regular demo cadence — there are probably various all-hands and roll-out meetings, but not meetings for management to demonstrate the work that they’ve accomplished and get feedback from the teams. However, now it is time to create such a demo.

As a manager, show three things at your demo to the team:

Example: workspace with one standard.
Progress: remaining special-case overheads list.
Impact: overhead hours per project per release and per deployment.

Example: Workspace with One Standard

First show the costs to the whole organization of the current practice. You can do this by showing the following points:

The most important organizational need that impacts teams.
What product teams had to do previously in order for the organization to meet that need.
What the support team had to do to integrate with all of the products.
How it varied per product team, per release, and per purpose.
Describe, or at least outline, all the manual steps required by a product team in order to comply.
Also outline the number of interruptions, rework, and errors that happened in a typical execution of the process.

Now show the new approach. Show the following points:

Sensible defaults that allow most teams to simply ignore the concern and still get an adequate result.
How a team can override the default when they have a specific need.
The reduced effort in both the product teams and the team that owns the now-standard process.
How it all integrates together at the Workspace, so that everyone involved can easily coordinate on a single purpose.
How the owning team can distinguish and manage efforts across the hundreds of purposes present in the organization.
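
The "sensible defaults plus override" idea from the list above can be sketched as a simple merge. The setting names and values here are hypothetical, and a real Workspace would likely keep them in its centralized app config rather than in code.

```python
from typing import Any, Mapping, Optional

# Hypothetical Workspace-wide defaults; keys and values are illustrative only.
WORKSPACE_DEFAULTS: dict[str, Any] = {
    "rollback": "automatic",
    "dashboard": "standard",
    "approval_gate": "checklist",
}


def effective_settings(overrides: Optional[Mapping[str, Any]] = None) -> dict[str, Any]:
    """Merge a team's overrides on top of the Workspace defaults."""
    overrides = overrides or {}
    unknown = set(overrides) - set(WORKSPACE_DEFAULTS)
    if unknown:
        # Only standardized concerns can be overridden; anything else is either
        # a typo or a request for a new capability.
        raise KeyError(f"not part of the Workspace standard: {sorted(unknown)}")
    return {**WORKSPACE_DEFAULTS, **overrides}


# Most teams ignore the concern entirely and get the defaults.
default_team = effective_settings()

# A team with a specific need overrides just that one setting.
special_team = effective_settings({"rollback": "manual"})
```

Rejecting unknown keys is a design choice in this sketch: it keeps overrides inside the current standard, so a genuinely new need surfaces as a request to extend the Workspace rather than as a silent special case.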

Finally, put these side-by-side to clearly show the decreased cost and increased flow in every part of the organization.

Progress: Special-case Overheads List

Create a single chart that shows all known organizational needs per week. This should sort those needs into three categories: ad-hoc, in Workspace, and removed. One good display is a stacked bar chart by week that shows the number of known needs in each category, along with a detail table. Clicking on any week in the bar chart shows the detail table for that week, which lists all the known needs and their current state.

The hardest part of creating this chart is finding all the organizational needs. Lots of people and teams are performing lots of activity, but the underlying needs can be hard to spot. Here is one good approach:

Gather a list of all the various overheads that impact any project teams. Identify anything they have to do that is for a purpose other than directly enhancing product functionality.
Identify each team that does not directly create or sell a product. List the things that they do, focusing first on any interactions they have with the product or the product teams.
Cluster those lists of activities by their nominal objective. Name those objectives. Identify the team that owns the objective and which teams contribute to or depend on that team.

Every week, report on which needs fall into each of the three categories. Highlight weeks in which you discover new needs or shift them between categories, especially any that you manage to remove. It is also important to show that removing a need results in those people getting to do more important and interesting work.

Over time, the list of needs in the ad-hoc category should become empty. Continue demos as long as ad-hoc items exist.
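
The data behind the progress chart can be kept very simple. The sketch below assumes one snapshot per week that maps each known need to its category. The need names, week labels, and function names are invented for illustration, and the charting itself is left to whatever tool you already use.

```python
from collections import Counter

CATEGORIES = ("ad-hoc", "in Workspace", "removed")

# Hypothetical weekly snapshots: need name -> category.
snapshots = {
    "2021-W44": {"release approvals": "ad-hoc", "support scripts": "ad-hoc"},
    "2021-W45": {"release approvals": "in Workspace", "support scripts": "ad-hoc",
                 "status reports": "ad-hoc"},
    "2021-W46": {"release approvals": "in Workspace", "support scripts": "in Workspace",
                 "status reports": "removed"},
}


def weekly_counts(snapshots):
    """Per-week count of needs in each category: the stacked-bar data."""
    rows = {}
    for week, needs in snapshots.items():
        counts = Counter(needs.values())
        rows[week] = {category: counts[category] for category in CATEGORIES}
    return rows


def changes(previous, current):
    """Needs that are new this week or that moved between categories."""
    new = sorted(set(current) - set(previous))
    moved = sorted(n for n in current if n in previous and current[n] != previous[n])
    return {"new": new, "moved": moved}


bars = weekly_counts(snapshots)
highlights = changes(snapshots["2021-W45"], snapshots["2021-W46"])
```

Here `bars` feeds the stacked bar chart, each week's snapshot is the detail table, and `highlights` lists the needs worth calling out in that week's demo.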

Impact: Overhead Hours Per Project

The impact will be decreased overhead time. Most of this will be communication and organization time, spent both with product teams and within the teams that own the various organizational needs. Some will also be activity time spent on low-value needs which you manage to remove.

One good way to show all this is as total person-hours saved per quarter. You can show this with a stacked bar chart by week. Calculate it as follows:

Start with your needs list that you gathered for your progress chart.
For each need, identify the typical amount of time spent by each team per quarter to do the various communication and organizational activities. This includes both owning and contributing teams. Assume typical numbers of releases, deployments, sales campaigns, and so forth.
Total that for each need, across all activities and teams. Show that as “remaining overhead” in your details table in the progress chart.
Each time that your standardization eliminates an activity, move that time from “remaining overhead” to “removed overhead” for that need item.
If you decide to remove a need entirely, then add in the entire owning team’s work time as “remaining overhead” for that need. Move it all to “removed overhead” once you manage to remove the need by outsourcing or dropping it and are able to re-allocate the team.
Graph the total “removed overhead” for each week. You could leave this as a single bar or divide it into categories, such as by department.
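
The calculation above can be sketched in a few lines. All needs, teams, and hour estimates below are made up for the example. The only logic taken from the steps is that each activity's hours count as "remaining overhead" until standardization eliminates the activity, at which point they count as "removed overhead".

```python
from dataclasses import dataclass


@dataclass
class Activity:
    team: str
    hours_per_quarter: float
    eliminated: bool = False  # set once standardization removes the activity


def overhead(activities):
    """Split one need's hours into (remaining, removed) overhead."""
    removed = sum(a.hours_per_quarter for a in activities if a.eliminated)
    remaining = sum(a.hours_per_quarter for a in activities if not a.eliminated)
    return remaining, removed


# Hypothetical needs and hour estimates, for illustration only.
needs = {
    "release approvals": [
        Activity("product team A", 40, eliminated=True),
        Activity("product team B", 30),
        Activity("approvals team", 120, eliminated=True),
    ],
    "support scripts": [
        Activity("product team A", 16),
        Activity("tech support", 60),
    ],
}

# Per-need (remaining, removed) pairs for the details table.
totals = {name: overhead(acts) for name, acts in needs.items()}

# The single number to graph for this week.
total_removed = sum(removed for _, removed in totals.values())
```

Recomputing `total_removed` from each week's state of the needs list gives one bar per week. Grouping the activities by department before summing would give the stacked version.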
