上QQ阅读APP看书，第一时间看更新

Composed of bounded, isolated components

Here are two scenarios I bet we all can relate to. You arrive at work in the morning only to find a firestorm. An important customer encountered a critical bug and it has to be fixed forthwith. The system as a whole is fine, but this specific scenario is a showstopper for this one client. So your team puts everything else on hold, knuckles down, and gets to work on resolving the issue. It turns out to be a one-line code change and a dozen or more lines of test code. By the end of the day, you are confident that you have properly resolved the problem and report to management that you are ready to do a patch release.

However, management understands that this means redeploying the whole monolith, which requires involvement from every team and inevitably something completely unrelated will break as a result of the deployment. So the decision is made to wait a week or so and batch up multiple critical bugs until the logistics of the deployment can be worked out. Meanwhile, your team has fallen one more day behind schedule.

That scenario is bad enough, but I'm sure we have all experienced worse. For example, a bug that leads to a runaway memory leak, which cripples the monolith for every customer. The system is unusable until a patch is deployed. You have to work faster than you want to and hope you don't miss something important. Management is forced to organize an emergency deployment. The system is stabilized and everyone hopes there weren't any unintended side effects.

The first scenario shows how a monolithic system itself can become the bottleneck to its own advancement, while the second scenario shows how the system can be its own Achilles heel. In cloud-native systems, we avoid problems such as these by decomposing the system into bounded isolated components. Bounded components are focused. They follow the single responsibility principle. As a result, these components are easier for teams to reason about. In the first scenario, the team and everyone else could be confident that the fix to the problem did not cause a side effect to another unrelated piece of code in the deployment unit because there is no unrelated code in the deployment unit. This confidence, in turn, eliminates the system as its own bottleneck. Teams can quickly and continuously deploy patches and innovations. This enables teams to perform experiments with small changes because they know they can quickly roll forward with another patch. This ability to experiment and gain insights further enables teams to rapidly deliver innovation.

So long as humans build systems, there will be human error. Automation and disposable infrastructure help minimize the potential for these errors and they allow us to rapidly recover from such errors, but they cannot eliminate these errors. Thus, cloud-native systems must be resilient to human error. To be resilient, we need to isolate the components from each other to avoid the second scenario, where a problem in one piece affects the whole. Isolation allows errors to be contained within a single component and not ripple across components. Other components can operate unabated while the broken component is quickly repaired.

Isolation further instills confidence to innovate, because the blast radius of any unforeseen error is controlled. Bounded and isolated components achieve resilience through data replication. This, in turn, facilitates responsiveness, because components do not need to rely on synchronous inter-component communication. Instead, requests are serviced from local materialized views. Replication also facilitates scale, as load is spread across many independent data sources. In Chapter 2, The Anatomy of a Cloud Native Systems, we will dive into these topics of bounded contexts, isolation and bulkheads, reactive architecture, and turning the database inside out.