Substrate leans heavily on three concepts to improve the reliability of the overall system by minimizing the blast radius of changes.

They're usually referred to in alphabetical order - domain, environment, and quality - but are presented here in a progression more suitable to readers new to Substrate.

Please use inline comments, Slack, or [email protected] to ask questions.

Environments

Environments identify a set of data and the infrastructure that stores and processes it. (After all, what distinguishes your production environment from development, staging, or another? Production data; your customers' real, business-critical data.) An environment's primary purpose is to protect its data against access from other environments.

Use multiple environments to protect your customers’ data from code that hasn’t been tested thoroughly in pre-production environments.

An AWS account in your organization is a member of exactly one environment and can only access the networks assigned to that environment.
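
The membership rule above can be pictured with a minimal sketch. It is not Substrate's implementation or API: the `Account` type, the `NETWORK_ENVIRONMENT` table, the network names, and `can_access_network` are all hypothetical names chosen for illustration. The only thing taken from the text is the rule itself: one environment per account, and access only to that environment's networks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical stand-in for an AWS account; it has exactly one environment."""
    name: str
    environment: str


# Hypothetical table assigning each network to a single environment.
NETWORK_ENVIRONMENT = {
    "staging-network": "staging",
    "production-network": "production",
}


def can_access_network(account: Account, network: str) -> bool:
    """Allow access only when the network is assigned to the account's environment."""
    return NETWORK_ENVIRONMENT.get(network) == account.environment


production_account = Account(name="example-production", environment="production")
staging_account = Account(name="example-staging", environment="staging")
```

Under these assumptions, `can_access_network(staging_account, "production-network")` is false, which is the protection the environment boundary is meant to provide.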

Example

Organizations typically define environments like development, staging, and production, though the names and number are entirely up to them. Add more environments to support more kinds of testing with greater parallelism.

Qualities

Highly reliable services almost always implement changes gradually to give their operators a chance to detect and mitigate failures when the impact is small. Qualities help make gradual change possible for many AWS resources like load balancers and security groups, even within a single service.

Use multiple qualities to protect any one service from changes that affect that whole service immediately.

An AWS account in your organization is associated with exactly one quality but can access and use resources in any AWS account that shares its environment.

Example

Suppose your organization defined the qualities alpha, beta, and gamma (which are what Substrate recommends). You could run 1% of your production environment in your alpha accounts, 9% in your beta accounts, and the remaining 90% in your gamma accounts. This isn't as smooth as routing a slowly increasing percentage of traffic to your new software as it's being deployed (and you should strongly consider doing that, too) but this strategy works even for AWS resources like load balancers and security groups.
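
As a rough illustration of the 1% / 9% / 90% split, the sketch below assigns each request to a quality in proportion to those weights. It is an assumption-laden example, not how Substrate or AWS distributes traffic: in practice the split would more likely live in weighted DNS records or load balancer configuration. The names `QUALITY_WEIGHTS` and `quality_for` are hypothetical.

```python
import hashlib

# Weights from the example above: percent of production served by each quality.
QUALITY_WEIGHTS = {"alpha": 1, "beta": 9, "gamma": 90}


def quality_for(request_key: str) -> str:
    """Deterministically map a request key to a quality, in proportion to the weights."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Reduce the hash to a bucket in [0, total weight).
    bucket = int.from_bytes(digest[:8], "big") % sum(QUALITY_WEIGHTS.values())
    upper = 0
    for quality, weight in QUALITY_WEIGHTS.items():
        upper += weight
        if bucket < upper:
            return quality
    raise AssertionError("unreachable: bucket is always below the total weight")
```

Hashing a stable key (a hypothetical customer or session identifier, for instance) keeps a given caller on the same quality between requests, so a bad change in alpha affects a small and consistent slice of traffic.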

You could also decide to name your qualities blue and green and swing traffic back and forth between them. The slight disadvantage of this architecture is that the quality not receiving traffic is not, at that moment, proving that its configuration is functional. The first trickle of traffic it receives when you start to swing back is therefore slightly higher risk.

Domains

Domains are collections of one or more software services that form an isolated failure domain (pun very much intended). The software may be code you've written yourself, hosted in any serverless or serverful manner, or an AWS-managed service.

Use multiple domains to protect services in any one domain from changes in all other domains.

An AWS account in your organization is associated with exactly one domain but can access and use resources in any AWS account that shares its environment. There may be multiple AWS accounts within a domain, each of a different quality.
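
To tie the three concepts together, here is a minimal sketch that models an account as a (domain, environment, quality) triple. It is illustrative only and does not reflect Substrate's data model or commands: the `Account` type, the functions, and the domain names `checkout` and `search` are hypothetical. The access rule encoded is the one stated above, that accounts may use resources in any account sharing their environment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Account:
    """Hypothetical account record: exactly one domain, environment, and quality."""
    domain: str
    environment: str
    quality: str


def may_access(source: Account, target: Account) -> bool:
    """Access is allowed across domains and qualities, but only within one environment."""
    return source.environment == target.environment


def qualities_in_domain(accounts: Iterable[Account], domain: str, environment: str) -> List[str]:
    """List the qualities a domain has accounts for in the given environment."""
    return sorted(
        {a.quality for a in accounts if a.domain == domain and a.environment == environment}
    )


# Hypothetical organization: two domains, two environments, two qualities.
ACCOUNTS = [
    Account("checkout", "production", "alpha"),
    Account("checkout", "production", "gamma"),
    Account("search", "production", "gamma"),
    Account("checkout", "staging", "alpha"),
]
```

In this sketch the `checkout` domain has one production account per quality, the production `search` account can reach the production `checkout` accounts, and nothing in staging can reach production.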