An Ideal CI/CD System

Dec 10, 2022

A good CI/CD system means developer productivity. What an ideal CI/CD system looks like today.

Serverless – A CI/CD system should be serverless. There's no reason to be managing Jenkins or individual nodes. Managing CI/CD machines makes it really easy to introduce configuration and environment drift, which causes hard to trace down bugs. Container-native is usually a good choice, especially if you often deploy software inside containers, but anything that provides a real ephemeral environment is good.

On-prem (Cloud-prem) – For small projects, it's fine to use hosted CI/CD services, but for anything bigger than a toy project, you should probably be spinning up your CI/CD runners inside your own cloud account.

Minimal permissions / native IAM – Workers should authenticate in a cloud-native way (such as OIDC) and be deployed with a minimal set of permissions. While it might seem like a hassle to expand the scope of workers while projects deploy a wider variety and number of services, limiting the blast radius and separating out environments will save you from a whole class of bugs and mistakes.

Easy to debug – Easy to debug means that it's possible to run the pipeline in a meaningful way locally. It also means that logs and artifacts are easily accessible from runs.

Triggers, but not complicated ones – GitHub Actions provides a good model for running workflows automatically (from pull requests or merge events) as well as manual triggers. However, don't build a complicated business process around the manual triggers.

Code, not YAML – Following my TypeScript for Infrastructure post, I believe it's much easier to design pipelines in a full programming language rather than a configuration one. You get reusability, control flow, strong typing, and other properties that you probably want when you're describing a dependency graph.

DAG – There's been some experimentation with having a fully event-driven CI/CD system. Generally, I think it's best to mostly have a static DAG that encodes the dependency graph. Events might trigger the initial workflow and might be emitted at the competition (or failure) of the DAG, but don't fire in between.

Matt Rickard

Discussion about this post