ðŸŠķ Apache Airflow + Makoto Integration Concept

DAG-driven orchestration — emit DBOMs at every task boundary.

Note: This page explores how Makoto Levels could be implemented on Apache Airflow. It is a conceptual integration proposal — illustrative, not a shipped library. The patterns shown use real Apache Airflow APIs; the Makoto pieces are sketches you (or we) could build out.

What is Apache Airflow?

Apache Airflow is the most widely deployed workflow orchestrator. DAGs describe task dependencies; the scheduler runs them on schedule or signal. Because every task already produces typed input/output context, Airflow is a natural surface for emitting per-task attestations.

XCom ContextPre/post-execute hooks carry the data your DBOM needs
Provider PatternDrop-in MakotoOperator subclasses native Airflow patterns
On-Success CallbackSign + persist attestation when a task succeeds
Lineage BackendOpenLineage backend → DBOM enrichment for free

Integration Approach

Primary pattern: Operator + on_success_callback. Below are the integration options ordered by lift required.

How Makoto attaches to Apache Airflow

  • MakotoOperator base class — Subclass of BaseOperator that wraps your existing operator. Computes content hash of XCom output and emits an attestation to your DBOM store.
  • on_success_callback hook — Drop-in callback you attach to existing DAGs — no operator changes needed.
  • OpenLineage bridge — If you already emit OpenLineage events, a small adapter translates them into Makoto attestations.
  • Sensors as origin attestors — S3KeySensor, KafkaSensor, etc. fire an Origin attestation the moment new data is observed.

Conceptual Code Example

Concept: MakotoOperator wraps any task

Per-task attestation emitted on success, signed via cosign

# dags/sales_pipeline.py
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.s3 import S3CopyObjectOperator
from makoto_airflow import MakotoOperator, makoto_callback

with DAG(
    "sales_etl",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    default_args={"on_success_callback": makoto_callback(level=2)},
) as dag:

    # MakotoOperator emits Origin attestation for raw extract
    extract = MakotoOperator(
        task_id="extract_orders",
        attestation_type="origin/v1",
        level=2,
        sql="SELECT * FROM orders WHERE updated_at > {{ ds }}",
        conn_id="warehouse",
    )

    # Transform step gets a Transform attestation automatically
    transform = MakotoOperator(
        task_id="hash_pii",
        attestation_type="transform/v1",
        level=2,
        upstream_task_ids=["extract_orders"],
        python_callable=lambda df: df.assign(email=df.email.apply(sha256)),
    )

    # Any existing operator gets attestation via the callback
    copy = S3CopyObjectOperator(
        task_id="publish",
        source_bucket_key="warehouse/sales/{{ ds }}.parquet",
        dest_bucket_key="public/sales/{{ ds }}.parquet",
    )

    extract >> transform >> copy

Potential Use Cases

Daily ETL

Every ETL DAG run emits a DBOM proving which inputs produced which outputs.

Compliance Audit

Auditors replay any historical DAG run from its DBOM, no archeology required.

Multi-team Handoff

Downstream teams refuse to consume parquet files that don't ship with a valid DBOM.

Data Contract Enforcement

Schema and hash mismatches surface as task failures, not silent corruption.

Interested in Apache Airflow + Makoto?

This is a conceptual integration. If you're shipping Apache Airflow pipelines and want to add Makoto attestations, open an issue or reach out — we'd love to scope a real implementation.

Learn about Apache Airflow Read Makoto Spec All Integrations