OpenLineage vs Makoto

OpenLineage and Makoto address different aspects of data pipeline management. OpenLineage provides observational metadata for tracking job executions and dataset dependencies. Makoto provides cryptographic attestations that prove data provenance with verifiable guarantees.

Complementary tools: OpenLineage tells you what happened in your pipelines. Makoto proves what happened with cryptographic evidence. Many organizations will benefit from using both: OpenLineage for operational visibility, Makoto for compliance and trust.

Different Problems, Different Solutions

OpenLineage

Metadata collection for operational visibility

  • Track job executions across platforms
  • Discover dataset dependencies automatically
  • Debug pipeline failures
  • Power data catalogs and discovery tools
  • Enable impact analysis for schema changes

Makoto

Cryptographic proof for data integrity

  • Prove data origin with signed attestations
  • Verify transformation chain integrity
  • Meet regulatory compliance requirements
  • Enable trustless data sharing
  • Document AI training data provenance

Concept Mapping

| OpenLineage | Makoto | Key Difference |
|---|---|---|
| Job | Transform | Makoto transforms include cryptographic hashes of code and inputs |
| Dataset | Subject | Makoto subjects have cryptographic digests, not just identifiers |
| Run | Attestation | Makoto attestations are signed; OpenLineage runs are metadata events |
| Facets | Predicate fields | Both support extensibility; Makoto predicates follow the in-toto format |
| Input/Output Datasets | inputs[] / subject[] | Makoto inputs include attestation references for chain verification |
| — | Origin attestation | Makoto has an explicit origin type; OpenLineage infers origin from the first dataset |
| — | Stream Window | Makoto has dedicated streaming support with Merkle trees |
| — | DBOM | Makoto provides a complete lineage documentation format |

Trust Model Comparison

The fundamental difference lies in the trust model. OpenLineage assumes you trust your infrastructure to report accurate metadata. Makoto provides cryptographic proof that can be verified independently.

| Aspect | OpenLineage | Makoto |
|---|---|---|
| Trust basis | Trust the emitting system | Cryptographic verification |
| Signing | Not built-in | Required at L2+ |
| Tamper detection | Not provided | Signature verification |
| Data binding | Reference by name | Cryptographic hash binding |
| Chain verification | Graph traversal | Hash chain verification |
| Non-repudiation | No | Yes (at L2+) |

When trust matters: If you're sharing data externally, meeting regulatory compliance requirements, or training AI models where provenance affects liability, cryptographic verification isn't optional. OpenLineage metadata can be falsified silently; Makoto attestations cannot be altered without invalidating their signatures (at L2+).
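The difference in data binding can be illustrated in a few lines of Python. This is a minimal sketch using only the standard library; the payload bytes are invented for illustration.

```python
import hashlib

def digest(data: bytes) -> str:
    """Content-addressed identity: any change to the bytes changes the digest."""
    return hashlib.sha256(data).hexdigest()

# A name-based reference (e.g. a dataset called "anonymized_customers") still
# resolves after the data changes; a hash recorded in an attestation does not.
original = b'{"customer_id": 1, "region": "EU"}'
tampered = b'{"customer_id": 1, "region": "US"}'

attested = digest(original)          # digest recorded in the attestation subject
print(attested == digest(original))  # True: bytes match the attestation
print(attested == digest(tampered))  # False: tampering is detected
```

This shows tamper evidence via hashing only; at L2+ the digest itself is also covered by a signature, which is what rules out silently rewriting the attestation along with the data.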

Technical Comparison

Event Structure

OpenLineage RunEvent

```json
{
  "eventType": "COMPLETE",
  "eventTime": "2025-01-15T10:30:00Z",
  "run": {
    "runId": "uuid-here"
  },
  "job": {
    "namespace": "airflow",
    "name": "etl_pipeline"
  },
  "inputs": [...],
  "outputs": [...],
  "producer": "https://airflow.example.com"
}
```

Makoto Transform Attestation

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "digest": {"sha256": "abc..."}
  }],
  "predicateType": "https://makoto.dev/transform/v1",
  "predicate": {
    "inputs": [{
      "digest": {"sha256": "def..."}
    }],
    "transform": {...}
  }
}
```
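To make the hash binding concrete, a statement of this shape can be assembled programmatically. This is a sketch only: `transform_attestation`, `sha256_hex`, and the sample payloads are hypothetical helpers, not part of any official Makoto SDK; only the field layout follows the in-toto Statement format.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def transform_attestation(output_bytes, input_bytes, transform_name):
    """Assemble a Makoto-style transform statement with computed digests.
    (Illustrative helper; field layout follows the in-toto Statement.)"""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"digest": {"sha256": sha256_hex(output_bytes)}}],
        "predicateType": "https://makoto.dev/transform/v1",
        "predicate": {
            "inputs": [{"digest": {"sha256": sha256_hex(input_bytes)}}],
            "transform": {"name": transform_name},
        },
    }

raw = b"customer_id,region\n1,EU\n"
clean = b"customer_id,region\n1,***\n"
stmt = transform_attestation(clean, raw, "anonymize_regions")

# The subject digest is bound to the exact output bytes:
print(stmt["subject"][0]["digest"]["sha256"] == sha256_hex(clean))  # True
```

Because the subject and inputs carry digests rather than names, a verifier can recompute the hashes from the actual bytes and confirm the chain without trusting the system that emitted the statement.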

Specification Foundations

| Aspect | OpenLineage | Makoto |
|---|---|---|
| Schema format | OpenAPI + JSON Schema | in-toto Statement + JSON Schema |
| Transport | HTTP API, Kafka, custom | Any (attestations are documents) |
| Extensibility | Custom facets | Custom predicates, additional fields |
| Governance | LF AI & Data Foundation | Open specification (in-toto aligned) |
| Reference implementation | Marquez | Expanso |

Streaming Data Support

Both frameworks support streaming data, but with different approaches:

OpenLineage Streaming

  • Events emitted per micro-batch or checkpoint
  • Same Job/Dataset/Run model as batch
  • Integrations for Spark Streaming, Flink
  • Metadata about streaming job state

Makoto Stream Windows

  • Dedicated stream-window/v1 predicate type
  • Merkle tree root for all records in window
  • Chain linking between consecutive windows
  • Cryptographic proof of window integrity
  • Supports millions of events/second

Makoto's streaming support is designed for cryptographic verification at scale. The Merkle tree approach allows individual record verification without storing all records, and chain linking enables detection of missing or modified windows.
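The window mechanics described above can be sketched with a toy Merkle construction. This is an illustration only: it duplicates the last node on odd-length levels (one common convention), and Makoto's actual tree layout and window format may differ.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records):
    """Reduce a window's records to a single root, pairing nodes level by
    level and duplicating the last node when a level has odd length."""
    level = [h(r) for r in records]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

window1 = [b"evt-1", b"evt-2", b"evt-3"]
root1 = merkle_root(window1)

# Chain linking: each window commits to the previous window's root, so a
# dropped, reordered, or rewritten window breaks the chain.
root2 = h(merkle_root([b"evt-4", b"evt-5"]) + root1)

# One modified record changes the whole root:
print(root1 != merkle_root([b"evt-1", b"evt-2", b"evt-X"]))  # True
```

Verifying one record then needs only that record, its Merkle path, and the signed root, not the full window, which is why per-record verification stays tractable at streaming rates.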

When to Use Each

Use OpenLineage when...

  • Building a data catalog or discovery platform
  • Debugging pipeline failures and data quality issues
  • Performing impact analysis for schema changes
  • Tracking job performance and SLAs
  • All consumers are internal and trusted

Use Makoto when...

  • Sharing data with external partners who need verification
  • Meeting regulatory compliance (GDPR, CCPA, EU AI Act)
  • Documenting AI/ML training data provenance
  • Building data marketplaces with trust requirements
  • Needing tamper-evident audit trails

Use Both when...

  • You need operational visibility and compliance guarantees
  • Internal teams use OpenLineage for debugging; external data uses Makoto for trust
  • Building comprehensive data governance with different trust levels

Example: ML Training Data Pipeline

Use OpenLineage to track feature engineering jobs in Airflow, debug failures, and understand dataset dependencies. Use Makoto to create signed attestations for the final training dataset, proving its provenance for model governance and regulatory review.

Integration Possibilities

OpenLineage and Makoto can work together. OpenLineage events can trigger Makoto attestation generation, creating cryptographic proof from operational metadata:

```python
# Conceptual integration flow
# 1. OpenLineage captures job completion
openlineage_event = {
    "eventType": "COMPLETE",
    "job": {"name": "anonymization_pipeline"},
    "outputs": [{"name": "anonymized_customers"}]
}

# 2. Generate a Makoto attestation from the OpenLineage metadata
makoto_attestation = {
    "_type": "https://in-toto.io/Statement/v1",
    "predicateType": "https://makoto.dev/transform/v1",
    "predicate": {
        "transform": {
            "name": openlineage_event["job"]["name"],
            # Add cryptographic hashes not present in OpenLineage
            "codeRef": {"digest": {"sha256": "..."}}
        }
    }
}

# 3. Sign the attestation for L2 compliance
# (sign_with_sigstore is a placeholder for your signing integration,
# e.g. a Sigstore client or a KMS-backed signer)
signed_attestation = sign_with_sigstore(makoto_attestation)
```

This approach lets organizations leverage existing OpenLineage integrations while adding cryptographic guarantees where needed.

Summary

| Dimension | OpenLineage | Makoto |
|---|---|---|
| Primary purpose | Operational metadata | Cryptographic provenance |
| Trust model | Trust the infrastructure | Verify cryptographically |
| Core entities | Job, Dataset, Run | Origin, Transform, Stream Window |
| Data binding | Name-based references | Cryptographic hashes |
| Signing | Not built-in | Required at L2+ |
| Best for | Data catalogs, debugging | Compliance, external trust |
| Output artifact | Lineage graph | DBOM (Data Bill of Materials) |

Bottom line: OpenLineage is excellent for understanding your data pipelines internally. Makoto is essential when you need to prove data provenance to external parties or regulators.

Learn about Makoto Levels → | Compare with SLSA → | OpenLineage Documentation →