Workflow Engine Paradigms

Jan 24, 2023

All happy workflow engines are alike; every unhappy workflow engine is unhappy in its own way. – Tolstoy, on workflow engines.

Workflow engines automate a series of tasks. These tasks are usually related to CI/CD, infrastructure automation, ETL, or some other data or batch processing.

Execution environment – Modern workflow engines have mostly converged on either container-native or serverless execution environments. This is done for idempotence and reproducibility, testability, and cost savings. Argo is one of the best examples of a Kubernetes and container-native workflow engine.

AWS Step Functions uses AWS Lambda to stitch together a serverless workflow engine.

DAG – Most workflow engines, like Airflow, operate on a static graph. Each job defines its dependencies and downstream tasks.
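As a rough illustration of the static-graph idea, here is a minimal sketch using Python's standard-library `graphlib` (3.9+). The task names and the `run_order` helper are hypothetical and not taken from Airflow or any other engine; the point is only that the full topology is known before anything runs.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

def run_order(graph):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())

order = run_order(dag)
print(order)
```

Because the graph is declared up front, an engine can validate it (for example, reject cycles) and plan the whole run before executing a single task.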

Another variation on the DAG-as-ground-truth workflow engine is the event-based engine. The DAG is defined implicitly: workflows emit or trigger events that are consumed by certain services. Those services know little about the workflow topology besides the event they are listening for. Brigade is an example of an event-driven workflow engine for Kubernetes.

Configuration – Workflow tasks are defined in a variety of ways. Argo uses Kubernetes resource definitions (YAML). GitHub Actions uses its own YAML definition. Prefect, Airflow, Dagster, Luigi, and other data-centric workflow engines define jobs through a Python API.

Long-running or fault-tolerant workflows – Retry logic is often the hardest part to get right. For many workflows, it doesn't matter: CI/CD workflows that fail are annoying to re-run but never impact the customer directly. Dealing with production-critical workflows is a different story. Temporal treats this problem as the basis of its engine (as do Cadence (Uber) and Conductor (Netflix)).
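To make the retry problem concrete, here is a minimal sketch of retrying a task with exponential backoff. The `retry` helper, its parameters, and the `flaky` task are all hypothetical; this is plain in-process retrying, not how Temporal, Cadence, or Conductor implement fault tolerance.

```python
import time

def retry(task, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**(attempt - 1) seconds between attempts.
    The sleep function is injectable so the example runs instantly.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** (attempt - 1))

calls = {"n": 0}

def flaky():
    # Hypothetical task that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

delays = []  # Record the backoff delays instead of actually sleeping.
result = retry(flaky, max_attempts=5, base_delay=0.1, sleep=delays.append)
print(result, calls["n"], delays)
```

Even this small version leaves the hard questions open: whether the task is safe to run twice, and what happens if the process running `retry` itself dies partway through.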

Matt Rickard
