OpenLineage vs Makoto

OpenLineage and Makoto address different aspects of data pipeline management. OpenLineage provides observational metadata for tracking job executions and dataset dependencies. Makoto provides cryptographic attestations that prove data provenance with verifiable guarantees.

Complementary tools: OpenLineage tells you what happened in your pipelines. Makoto proves what happened with cryptographic evidence. Many organizations will benefit from using both: OpenLineage for operational visibility, Makoto for compliance and trust.

Different Problems, Different Solutions

OpenLineage

Metadata collection for operational visibility

  • Track job executions across platforms
  • Discover dataset dependencies automatically
  • Debug pipeline failures
  • Power data catalogs and discovery tools
  • Enable impact analysis for schema changes

Makoto

Cryptographic proof for data integrity

  • Prove data origin with signed attestations
  • Verify transformation chain integrity
  • Meet regulatory compliance requirements
  • Enable trustless data sharing
  • Document AI training data provenance

Concept Mapping

| OpenLineage | Makoto | Key Difference |
|---|---|---|
| Job | Transform | Makoto transforms include cryptographic hashes of code and inputs |
| Dataset | Subject | Makoto subjects have cryptographic digests, not just identifiers |
| Run | Attestation | Makoto attestations are signed; OpenLineage runs are metadata events |
| Facets | Predicate fields | Both support extensibility; Makoto predicates follow the in-toto format |
| Input/Output Datasets | inputs[] / subject[] | Makoto inputs include attestation references for chain verification |
| — | Origin attestation | Makoto has an explicit origin type; OpenLineage infers origin from the first dataset |
| — | Stream Window | Makoto has dedicated streaming support with Merkle trees |
| — | DBOM | Makoto provides a complete lineage documentation format |

Trust Model Comparison

The fundamental difference lies in the trust model. OpenLineage assumes you trust your infrastructure to report accurate metadata. Makoto provides cryptographic proof that can be verified independently.

| Aspect | OpenLineage | Makoto |
|---|---|---|
| Trust basis | Trust the emitting system | Cryptographic verification |
| Signing | Not built-in | Required at L2+ |
| Tamper detection | Not provided | Signature verification |
| Data binding | Reference by name | Cryptographic hash binding |
| Chain verification | Graph traversal | Hash chain verification |
| Non-repudiation | No | Yes (at L2+) |

When trust matters: If you're sharing data externally, meeting regulatory compliance requirements, or training AI models where provenance affects liability, cryptographic verification isn't optional. OpenLineage metadata can be falsified silently; Makoto attestations cannot be altered without invalidating their signatures (at L2+).
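The difference in data binding can be illustrated in a few lines of Python. This is a minimal sketch using only the standard library; the payload bytes are invented for illustration.

```python
import hashlib

def digest(data: bytes) -> str:
    """Content-addressed identity: any change to the bytes changes the digest."""
    return hashlib.sha256(data).hexdigest()

# A name-based reference (e.g. a dataset called "anonymized_customers") still
# resolves after the data changes; a hash recorded in an attestation does not.
original = b'{"customer_id": 1, "region": "EU"}'
tampered = b'{"customer_id": 1, "region": "US"}'

attested = digest(original)          # digest recorded in the attestation subject
print(attested == digest(original))  # True: bytes match the attestation
print(attested == digest(tampered))  # False: tampering is detected
```

This shows tamper evidence via hashing only; at L2+ the digest itself is also covered by a signature, which is what rules out silently rewriting the attestation along with the data.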

Technical Comparison

Event Structure

OpenLineage RunEvent

```json
{
  "eventType": "COMPLETE",
  "eventTime": "2025-01-15T10:30:00Z",
  "run": {
    "runId": "uuid-here"
  },
  "job": {
    "namespace": "airflow",
    "name": "etl_pipeline"
  },
  "inputs": [...],
  "outputs": [...],
  "producer": "https://airflow.example.com"
}
```

Makoto Transform Attestation

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "digest": {"sha256": "abc..."}
  }],
  "predicateType": "https://makoto.dev/transform/v1",
  "predicate": {
    "inputs": [{
      "digest": {"sha256": "def..."}
    }],
    "transform": {...}
  }
}
```
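To make the hash binding concrete, a statement of this shape can be assembled programmatically. This is a sketch only: `transform_attestation`, `sha256_hex`, and the sample payloads are hypothetical helpers, not part of any official Makoto SDK; only the field layout follows the in-toto Statement format.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def transform_attestation(output_bytes, input_bytes, transform_name):
    """Assemble a Makoto-style transform statement with computed digests.
    (Illustrative helper; field layout follows the in-toto Statement.)"""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"digest": {"sha256": sha256_hex(output_bytes)}}],
        "predicateType": "https://makoto.dev/transform/v1",
        "predicate": {
            "inputs": [{"digest": {"sha256": sha256_hex(input_bytes)}}],
            "transform": {"name": transform_name},
        },
    }

raw = b"customer_id,region\n1,EU\n"
clean = b"customer_id,region\n1,***\n"
stmt = transform_attestation(clean, raw, "anonymize_regions")

# The subject digest is bound to the exact output bytes:
print(stmt["subject"][0]["digest"]["sha256"] == sha256_hex(clean))  # True
```

Because the subject and inputs carry digests rather than names, a verifier can recompute the hashes from the actual bytes and confirm the chain without trusting the system that emitted the statement.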

Specification Foundations

| Aspect | OpenLineage | Makoto |
|---|---|---|
| Schema format | OpenAPI + JSON Schema | in-toto Statement + JSON Schema |
| Transport | HTTP API, Kafka, custom | Any (attestations are documents) |
| Extensibility | Custom facets | Custom predicates, additional fields |
| Governance | LF AI & Data Foundation | Open specification (in-toto aligned) |
| Reference implementation | Marquez | Expanso |

Streaming Data Support

Both frameworks support streaming data, but with different approaches:

OpenLineage Streaming

  • Events emitted per micro-batch or checkpoint
  • Same Job/Dataset/Run model as batch
  • Integrations for Spark Streaming, Flink
  • Metadata about streaming job state

Makoto Stream Windows

  • Dedicated stream-window/v1 predicate type
  • Merkle tree root for all records in window
  • Chain linking between consecutive windows
  • Cryptographic proof of window integrity
  • Supports millions of events/second

Makoto's streaming support is designed for cryptographic verification at scale. The Merkle tree approach allows individual record verification without storing all records, and chain linking enables detection of missing or modified windows.
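The window mechanics described above can be sketched with a toy Merkle construction. This is an illustration only: it duplicates the last node on odd-length levels (one common convention), and Makoto's actual tree layout and window format may differ.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records):
    """Reduce a window's records to a single root, pairing nodes level by
    level and duplicating the last node when a level has odd length."""
    level = [h(r) for r in records]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

window1 = [b"evt-1", b"evt-2", b"evt-3"]
root1 = merkle_root(window1)

# Chain linking: each window commits to the previous window's root, so a
# dropped, reordered, or rewritten window breaks the chain.
root2 = h(merkle_root([b"evt-4", b"evt-5"]) + root1)

# One modified record changes the whole root:
print(root1 != merkle_root([b"evt-1", b"evt-2", b"evt-X"]))  # True
```

Verifying one record then needs only that record, its Merkle path, and the signed root, not the full window, which is why per-record verification stays tractable at streaming rates.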

When to Use Each

Use OpenLineage when...

  • Building a data catalog or discovery platform
  • Debugging pipeline failures and data quality issues
  • Performing impact analysis for schema changes
  • Tracking job performance and SLAs
  • All consumers are internal and trusted

Use Makoto when...

  • Sharing data with external partners who need verification
  • Meeting regulatory compliance (GDPR, CCPA, EU AI Act)
  • Documenting AI/ML training data provenance
  • Building data marketplaces with trust requirements
  • Needing tamper-evident audit trails

Use Both when...

  • You need operational visibility and compliance guarantees
  • Internal teams use OpenLineage for debugging; external data uses Makoto for trust
  • Building comprehensive data governance with different trust levels

Example: ML Training Data Pipeline

Use OpenLineage to track feature engineering jobs in Airflow, debug failures, and understand dataset dependencies. Use Makoto to create signed attestations for the final training dataset, proving its provenance for model governance and regulatory review.

Integration Possibilities

OpenLineage and Makoto can work together. OpenLineage events can trigger Makoto attestation generation, creating cryptographic proof from operational metadata:

```python
# Conceptual integration flow
# 1. OpenLineage captures job completion
openlineage_event = {
    "eventType": "COMPLETE",
    "job": {"name": "anonymization_pipeline"},
    "outputs": [{"name": "anonymized_customers"}]
}

# 2. Generate a Makoto attestation from the OpenLineage metadata
makoto_attestation = {
    "_type": "https://in-toto.io/Statement/v1",
    "predicateType": "https://makoto.dev/transform/v1",
    "predicate": {
        "transform": {
            "name": openlineage_event["job"]["name"],
            # Add cryptographic hashes not present in OpenLineage
            "codeRef": {"digest": {"sha256": "..."}}
        }
    }
}

# 3. Sign the attestation for L2 compliance
# (sign_with_sigstore is a placeholder for your signing integration,
# e.g. a Sigstore client or a KMS-backed signer)
signed_attestation = sign_with_sigstore(makoto_attestation)
```

This approach lets organizations leverage existing OpenLineage integrations while adding cryptographic guarantees where needed.

Summary

| Dimension | OpenLineage | Makoto |
|---|---|---|
| Primary purpose | Operational metadata | Cryptographic provenance |
| Trust model | Trust the infrastructure | Verify cryptographically |
| Core entities | Job, Dataset, Run | Origin, Transform, Stream Window |
| Data binding | Name-based references | Cryptographic hashes |
| Signing | Not built-in | Required at L2+ |
| Best for | Data catalogs, debugging | Compliance, external trust |
| Output artifact | Lineage graph | DBOM (Data Bill of Materials) |

Bottom line: OpenLineage is excellent for understanding your data pipelines internally. Makoto is essential when you need to prove data provenance to external parties or regulators.

Learn about Makoto Levels → | Compare with SLSA → | OpenLineage Documentation →