DevOps #8: Find Integration Bugs without Integration Tests

How Do I Keep This Working?

A team that has been following this year’s DevOps series and performing each technique would find itself able to:

  • Plan features without dependencies,
  • Edit code independently,
  • Build and verify independently,
  • Isolate changes, and
  • Isolate its integration complexity from its main collaborator.

For that team, this is great progress, and things seem pretty good for a while. Work is fast, deployment is practical, and integration bugs are rarer than they’ve ever been.

But then integration bugs start climbing.

The Future is Dark

We have all seen the future from here:

  1. We start asking each other to be careful when deploying and working.
  2. Teams start checking in on each other to discover integration issues earlier.
  3. These reviews become blocking commit or deploy requirements.

And for our team, it’s not long before they’ve undone all of their development speed improvements.

But what else can we do? We need a better way to discover integration errors early.

Automated Integration Tests Don’t Work

The obvious answer is to start using automated integration tests. Many, many codebases have followed this path, and they consistently find the same problems:

  • Test oversensitivity. Tests that span components are easily disrupted by changes unrelated to the purpose of that test. So we get many false positives.

  • Combinatorial complexity. We can’t write enough tests to cover the interactions between components for every combination of each component’s behaviors. So bugs slip through.

  • Test duration. Each component encapsulates its database, network, and other resources, and we can’t stub them out. So the tests take too long to run.

  • Complex and fragile setup. Getting the components to a starting state requires modifying the previously mentioned encapsulated resources. So changes break tests, which we assume are false positives.

The ultimate result is that the team stops trusting the integration test suite because it gives slow, unreliable feedback.

It Seems Nothing Will Work

Automated integration tests were our last hope.

We’ve already thrown out manual integration testing and cross-team reviews because they grind productivity to a halt. Now we’re saying automated integration testing gives us false confidence and misses bugs.

So what’s left?

Use Automated Port Tests

Integration tests exist to verify how the edge of my system interacts with an external system. Specifically, we need to verify that the combination of the external system + my adapting code + my calling sequences has known, consistent, and intended behavior.

We can verify correct system integration in pieces if we can establish four verifications:

  • Verification #1: My system uses the Port as intended.
  • Verification #2: Each outside system, when adapted by my Adapter, implements the Port’s intention.
  • Verification #3: The tests for each Adapter in item 2 use the same intention.
  • Verification #4: The intentions in items 1 and 2 are the same as each other.

Let’s start with verifications 2 and 3, because that solution is independent of how we solve the rest of the problem. Restating our goal,

We need to verify that each Adapter behaves exactly the same when viewed through the Port.

We verify that the Adapters behave the same way with Contract Testing.

We define the expected interactions of our Adapters using a contract, just like a company uses legal contracts to define its interactions with suppliers. While the legal industry represents a contract with a document, we represent that contract as a single test suite.

We then run that test suite against each Adapter. The test suite defines the contract that the Adapters need to meet, and each execution ensures that one of them meets it. The only requirement for Contract Testing is that we need to be able to interact with each Adapter in a consistent way — and our Port does that.

The most common implementation is abstract inheritance in the test classes. The base class defines all the tests, each of which is written against a testSubject of type Port. Each child class simply defines the one function _SupplyTestSubject to return a Port bound to the correct Adapter.
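Here is a minimal sketch of that structure in TypeScript, assuming a Jest-style test runner. The Port, the Adapter, and the test cases are invented for illustration; only the shape matters.

    // The Port my system depends on (hypothetical).
    interface DocumentStorePort {
      save(key: string, document: string): Promise<void>;
      load(key: string): Promise<string>; // contract: throws if the key is absent
    }

    // Base class: every contract test is defined once, purely against the Port.
    abstract class DocumentStoreContract {
      // Each child class supplies a Port bound to one concrete Adapter.
      protected abstract supplyTestSubject(): DocumentStorePort;

      registerTests(): void {
        test("round-trips a document", async () => {
          const testSubject = this.supplyTestSubject();
          await testSubject.save("greeting", "hello");
          expect(await testSubject.load("greeting")).toBe("hello");
        });

        test("throws when a document is missing", async () => {
          const testSubject = this.supplyTestSubject();
          await expect(testSubject.load("no-such-key")).rejects.toThrow();
        });
      }
    }

    // One concrete Adapter, simple enough to define inline.
    class InMemoryDocumentStoreAdapter implements DocumentStorePort {
      private docs = new Map<string, string>();
      async save(key: string, document: string): Promise<void> {
        this.docs.set(key, document);
      }
      async load(key: string): Promise<string> {
        const doc = this.docs.get(key);
        if (doc === undefined) throw new Error(`missing document: ${key}`);
        return doc;
      }
    }

    // One child class per Adapter; the tests themselves are inherited unchanged.
    class InMemoryAdapterContract extends DocumentStoreContract {
      protected supplyTestSubject(): DocumentStorePort {
        return new InMemoryDocumentStoreAdapter();
      }
    }

    new InMemoryAdapterContract().registerTests();

A LocalFileSystemAdapter or CouchDbAdapter child class would override only supplyTestSubject; every test defined in the base class then runs unchanged against each Adapter.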

We Probably need to Extend Our Port

The Contract Test must only reference the Port. It can’t reach into the Adapter for any implementation-specific behavior. However, the test commonly needs to do more than the product code does. While the product code only needs to use the Port to do things, the test needs both to set up the initial condition and to query the current state in order to verify results.

That will usually require extending the Port definition. Common extensions include:

  • Support for error conditions, such as objects that are guaranteed to be missing.
  • Global property queries, such as all objects that have ever been stored.
  • Adapter-specific invalid values, such as object keys that are malformed for that Adapter. Note: each Adapter must return a key that is malformed in the same semantic way, but the actual value likely has to be Adapter-specific.

Keep your Port’s metaphor clearly in mind as you make these extensions.
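As a sketch (building on the hypothetical DocumentStorePort above), the extended, test-facing Port might look like this; the method names are assumptions, not part of any standard Port:

    // Test-facing extensions; product code keeps depending on plain DocumentStorePort.
    interface TestableDocumentStorePort extends DocumentStorePort {
      // Error conditions: a key this Adapter guarantees will never hold a document.
      missingDocumentKey(): string;
      // Global property queries: every key ever stored through this Port instance.
      allStoredKeys(): Promise<string[]>;
      // Adapter-specific invalid values: malformed in the same semantic way for every
      // Adapter, but with an Adapter-specific concrete value.
      malformedKey(): string;
    }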

An Example Would be Useful…

Assume I am abstracting a local file system, a remote file system, and a CouchDB instance. My system treats them all as document repositories, each able to hold several collections of documents. Each collection has a template and a bunch of documents created off that template.

I want to write a contract test that verifies that each Adapter gives the same error when I ask for a template that doesn’t exist. I’d like to verify both the case where the collection is missing and where the collection exists but there is no template. In fact, since the distinction between those two cases isn’t important to my system, I want the Adapter to simplify the real world so that the Port responds the same in each case.

I need to extend the Port so that I can set it up in 2 different invalid situations, neither of which I want for my real code.

  • Missing collection.
  • Collection missing its template.

I can accomplish the first by providing a collection name that can never be part of my application’s real keyspace. For example, the local file system Adapter could give a name that includes a colon, and CouchDB might give a name that starts with _design/. Each will cause an error when we try to read or write a document or template, which our Adapter then has to capture and Adapt correctly.
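As a sketch, each Adapter’s contribution to that first case might look like this (the class and method names are illustrative):

    // Each Adapter supplies a collection name that is malformed in the same semantic way --
    // guaranteed to be outside the application's real keyspace -- with its own concrete value.
    class LocalFileSystemIllegalNames {
      missingCollectionName(): string {
        return "no:such:collection"; // contains a colon, which real collection names never do
      }
    }

    class CouchDbIllegalNames {
      missingCollectionName(): string {
        return "_design/no-such-collection"; // the _design/ prefix is reserved by CouchDB
      }
    }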

The second is a little trickier, because we generally don’t want to allow our code to put a production system into this invalid state. In fact, we don’t want our product to even know that it is possible!

I generally collect these kinds of functions into a sub-Port on my main Port. For example, the test code may call badCollection = testSubject.PerformIllegalAction.CreateCollectionWithoutTemplate(). This method is very clearly separated from anything my product may want to use. Adding it does not corrupt my product’s tidy, simple view of the world, but it does allow my test to force the Adapter to respond to real-world complexities.
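A sketch of that shape, continuing the hypothetical names from the earlier sketches (performIllegalAction, the method names, and the error message are illustrative; use whatever your Port’s contract specifies):

    // Test-only sub-Port, grouped so it is visibly separate from the product-facing API.
    interface IllegalActions {
      // Case 1: a collection name guaranteed to be outside the real keyspace.
      missingCollectionName(): string;
      // Case 2: puts the repository into the invalid state and returns the collection's name.
      createCollectionWithoutTemplate(): Promise<string>;
    }

    interface DocumentRepositoryPort {
      loadTemplate(collection: string): Promise<string>;
      performIllegalAction: IllegalActions; // product code never touches this
    }

    // In the Contract Test base class: both invalid setups must produce the same
    // Port-level error, no matter which Adapter sits underneath.
    function registerMissingTemplateTests(supplyTestSubject: () => DocumentRepositoryPort) {
      test("a missing collection reports a missing template", async () => {
        const testSubject = supplyTestSubject();
        const badCollection = testSubject.performIllegalAction.missingCollectionName();
        await expect(testSubject.loadTemplate(badCollection)).rejects.toThrow("template not found");
      });

      test("a collection without a template reports a missing template", async () => {
        const testSubject = supplyTestSubject();
        const badCollection =
          await testSubject.performIllegalAction.createCollectionWithoutTemplate();
        await expect(testSubject.loadTemplate(badCollection)).rejects.toThrow("template not found");
      });
    }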

Are These Unit Tests?

Well, most of them aren’t; they are platform tests. Some of them might be unit tests.

For example, the Loan example in the picture has an InMemoryCacheAdapter. Assume that cache was implemented using a cache library that stores objects in a dictionary. Contract tests against that adapter meet all the requirements for unit tests — they execute in memory, with no network, no shared state, and no crossing of process boundaries. We just instantiate a new cache instance for each test.

How do We Run Them?

We will break our Contract Test executions into several execution suites.

  • Unit Test Suite. Any contract tests that meet the requirements for unit tests, such as those for the InMemoryCacheAdapter.
  • Platform X Suite. A separate platform testing suite for each external system. OtherBankApiAdapter would be tested in a different platform suite from TransactionalDbAdapter, for example.
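With a Jest-style runner, one way to express that split is one project per suite; the directory layout and project names below are assumptions about how you might organize the tests:

    // jest.config.ts -- one project per execution suite.
    import type { Config } from "jest";

    const config: Config = {
      projects: [
        // Fast, in-memory contract tests that run with the main build.
        { displayName: "unit", testMatch: ["<rootDir>/test/unit/**/*.test.ts"] },
        // One platform suite per external system, run only when relevant.
        {
          displayName: "platform-transactional-db",
          testMatch: ["<rootDir>/test/platform/transactional-db/**/*.test.ts"],
        },
        {
          displayName: "platform-other-bank-api",
          testMatch: ["<rootDir>/test/platform/other-bank-api/**/*.test.ts"],
        },
      ],
    };

    export default config;

A single platform suite can then be run on demand (for example, jest --selectProjects platform-other-bank-api) while the unit project stays in the main build.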

We then run each platform suite whenever:

  • We change the contract test,
  • The dependency changes, or
  • We change that dependency’s Adapter code.

These are all infrequent events in practice, so we usually end up running a given platform test suite only a few times a month. It does not need to be part of the main build & test.

We could put even the unit-test-like tests into a platform suite. However, they are well-behaved and fast, so it doesn’t hurt to run them more often — just like we do with other unit tests.

What About Verification #1?

Everything above demonstrated that my Adapter behaves as intended. But what about Verification #1 — checking that my code uses the Port as intended? The most common solution is to Mock out the Port when testing my system.

Unfortunately, using Mocks makes it impossible to meet Verification #4 because it duplicates the definition of intention. Each Mock’s setup defines the intended behavior for the Port, and the assertions in each test for the adapter in Verification #2 also define intended behavior for the Port. Nothing keeps the multiple definitions in sync, so they naturally drift over time and lock in your integration bugs.
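A tiny sketch of that drift, using a hypothetical CachePort; nothing in the toolchain connects the two definitions of intention:

    // The Port, as the contract intends it: get() rejects for an absent key.
    interface CachePort {
      get(key: string): Promise<string>;
    }

    // Verification #2: the Contract Test pins that intention for every Adapter.
    function registerMissingKeyContract(supplyTestSubject: () => CachePort) {
      test("an absent key is an error", async () => {
        await expect(supplyTestSubject().get("no-such-key")).rejects.toThrow();
      });
    }

    // Verification #1: the Mock used while testing my system re-states the intention by
    // hand -- and has already drifted: it silently returns an empty string instead.
    const cachePortMock: CachePort = {
      get: async () => "",
    };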

The solution for this problem, Simulator Testing, is complex enough that it needs its own newsletter. Check Issue #9 to see how we will ensure that our component uses its dependencies correctly!

What Have We Gained?

We specified how our application wants the Port to behave, and verified that each external system works with its Adapter to behave that way. We know our dependencies all work as our code expects. Additionally, whenever an external dependency changes, it will cause a test to fail so we can update the Adapter. Our test suite tells us when we have an integration problem without running an integration test on every build.

Implement Contract Tests to ensure that each external dependency behaves identically when viewed through the Port.

Access the recipe to verify integrations without integration tests.

Know When Integrations Will Fail

Your team will know when an upstream change will cause an integration bug with your component. Your tests will help you either update your Adapter to encapsulate the change or update your Port, calling code, and other Adapters to allow your system to leverage the change. These tests cover only your component and check only the simplified behaviors your component needs, so they run fast and fail only when they should.

You can also leverage others’ Contract Tests when you provide a component. Run their test suite to measure the real impact of any proposed breaking changes. Notify your clients proactively and allow them to launch their update with you.

Benefits:

  • Catch more integration bugs.
  • Stop waiting for your integration test suite to complete.
  • No more false positives when you change a component.

Downsides:

  • Requires that your component use Ports and Adapters.

Demo the value to the team and management…

Show three things at your sprint demo:

  1. Example: Show false positives.
  2. Progress: Count integration tests.
  3. Impact: Count integration test updates.

Example: Show False Positives

Pick one integration test that has been updated many times in the past. Verify that most of those updates were false positives — the test failed because it depended on internals or called code unrelated to its primary goal.

Demonstrate a Contract Test that verifies the same intent, but is much simpler and less likely to have a false positive. Show how the last several changes that caused false positives with the integration test would not have disrupted the Contract Test.

Progress: Number of Integration Tests

Each week, count the total number of integration tests. You want to drive this to 0. If that doesn’t show initial progress clearly enough, count the number of lines executed by the integration suite as a whole. Depending on your language, you might calculate that by running a profiler or a source analysis. You can count only test lines or total lines.

Whichever measure you use, measure it weekly and chart the results as a line chart. Set a goal to initially reduce it by 25%, and eventually drive it to near-0.

Impact: Integration Test Updates

The largest impact comes from having to update integration tests in response to false positives. However, even test updates in response to real failures carry a cost. So just count the number of updates to the integration test suite per week. The best measure is probably the number of lines changed in commits for those files. Ignore changes that happen while moving stuff from an integration test to a Contract Test.

Count changes in the Contract Tests in the same way.

Chart both on the same line chart and show how the total number decreases as you move code from integration tests to Contract Tests.
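As a sketch of that measurement, assuming the integration and Contract Tests live in separate directories (the paths are placeholders), a small script can pull weekly churn from git:

    // Count lines changed over the last week under a given test directory, using
    // `git log --numstat`. Run with ts-node from the repository root.
    import { execSync } from "node:child_process";

    function linesChangedLastWeek(path: string): number {
      const numstat = execSync(
        `git log --since="1 week ago" --numstat --pretty=format: -- ${path}`,
        { encoding: "utf8" },
      );
      return numstat
        .split("\n")
        .map((line) => line.trim().split("\t"))
        .filter(([added, deleted]) => added !== "" && added !== "-" && deleted !== "-")
        .reduce((sum, [added, deleted]) => sum + Number(added) + Number(deleted), 0);
    }

    console.log("integration test churn:", linesChangedLastWeek("test/integration"));
    console.log("contract test churn:", linesChangedLastWeek("test/contract"));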

Describe how each of these updates is not only a waste of time but also erodes trust in the tests. As trust erodes, devs stop fully debugging each failure. They assume the problem is in the test, and simply update the test to match the new behavior. Estimate a threshold for when that behavior changes; it is likely at a very low number, on the order of 1-2 changes per week, with 1-2 failing tests per change. Add that line to the chart as a goal. Show that Contract Tests remain below that threshold, and drive integration test updates down.