# OpenLineage vs Makoto
OpenLineage and Makoto address different aspects of data pipeline management. OpenLineage provides observational metadata for tracking job executions and dataset dependencies. Makoto provides cryptographic attestations that prove data provenance with verifiable guarantees.
**Complementary tools:** OpenLineage tells you what happened in your pipelines. Makoto proves what happened with cryptographic evidence. Many organizations will benefit from using both: OpenLineage for operational visibility, Makoto for compliance and trust.
## Different Problems, Different Solutions

### OpenLineage

*Metadata collection for operational visibility*
- Track job executions across platforms
- Discover dataset dependencies automatically
- Debug pipeline failures
- Power data catalogs and discovery tools
- Enable impact analysis for schema changes
### Makoto

*Cryptographic proof for data integrity*
- Prove data origin with signed attestations
- Verify transformation chain integrity
- Meet regulatory compliance requirements
- Enable trustless data sharing
- Document AI training data provenance
## Concept Mapping
| OpenLineage | Makoto | Key Difference |
|---|---|---|
| Job | Transform | Makoto transforms include cryptographic hashes of code and inputs |
| Dataset | Subject | Makoto subjects have cryptographic digests, not just identifiers |
| Run | Attestation | Makoto attestations are signed; OpenLineage runs are metadata events |
| Facets | Predicate fields | Both support extensibility; Makoto predicates follow in-toto format |
| Input/Output Datasets | inputs[] / subject[] | Makoto inputs include attestation references for chain verification |
| — | Origin attestation | Makoto has explicit origin type; OpenLineage infers from first dataset |
| — | Stream Window | Makoto has dedicated streaming support with Merkle trees |
| — | DBOM | Makoto provides complete lineage documentation format |
## Trust Model Comparison
The fundamental difference lies in the trust model. OpenLineage assumes you trust your infrastructure to report accurate metadata. Makoto provides cryptographic proof that can be verified independently.
| Aspect | OpenLineage | Makoto |
|---|---|---|
| Trust basis | Trust the emitting system | Cryptographic verification |
| Signing | Not built-in | Required at L2+ |
| Tamper detection | Not provided | Signature verification |
| Data binding | Reference by name | Cryptographic hash binding |
| Chain verification | Graph traversal | Hash chain verification |
| Non-repudiation | No | Yes (at L2+) |
**When trust matters:** If you're sharing data externally, meeting regulatory compliance requirements, or training AI models where provenance affects liability, cryptographic verification isn't optional. OpenLineage metadata can be falsified by a compromised or misconfigured emitter; Makoto attestations at L2+ cannot be altered without invalidating their signatures.
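The difference between name-based references and digest binding can be shown in a few lines. The sketch below (standard library only) uses a hypothetical minimal attestation dict, not the full in-toto Statement: any change to the underlying bytes makes verification fail, which a name-based reference would never catch.

```python
import hashlib


def verify_subject_binding(data: bytes, attestation: dict) -> bool:
    """Check that the attestation's subject digest matches the actual data.

    A name-based reference (OpenLineage style) can silently drift from the
    data it describes; a digest binding fails loudly on any modification.
    """
    actual = hashlib.sha256(data).hexdigest()
    return any(
        subj.get("digest", {}).get("sha256") == actual
        for subj in attestation.get("subject", [])
    )


# Hypothetical minimal attestation for a dataset export
data = b"customer export v1"
attestation = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"digest": {"sha256": hashlib.sha256(data).hexdigest()}}],
}
```

Verifying `data` against `attestation` succeeds, while verifying any modified bytes fails, regardless of what the dataset is named.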
## Technical Comparison

### Event Structure

**OpenLineage RunEvent**
```json
{
  "eventType": "COMPLETE",
  "eventTime": "2025-01-15T10:30:00Z",
  "run": {
    "runId": "uuid-here"
  },
  "job": {
    "namespace": "airflow",
    "name": "etl_pipeline"
  },
  "inputs": [...],
  "outputs": [...],
  "producer": "https://airflow.example.com"
}
```
**Makoto Transform Attestation**

```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "digest": {"sha256": "abc..."}
  }],
  "predicateType": "https://makoto.dev/transform/v1",
  "predicate": {
    "inputs": [{
      "digest": {"sha256": "def..."}
    }],
    "transform": {...}
  }
}
```
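Given statements shaped like the attestation above, chain verification reduces to checking that every declared input digest was produced as a subject earlier in the chain. A minimal sketch, assuming simplified attestation dicts and skipping the signature checks a real verifier would also perform:

```python
def verify_chain(attestations: list[dict]) -> bool:
    """Walk attestations in order, confirming each declared input digest
    was attested as a subject by an earlier attestation in the chain.
    """
    produced = set()  # digests attested so far
    for att in attestations:
        for inp in att["predicate"].get("inputs", []):
            if inp["digest"]["sha256"] not in produced:
                return False  # dangling input: no upstream attestation
        for subj in att["subject"]:
            produced.add(subj["digest"]["sha256"])
    return True


def make_attestation(input_digests, subject_digests):
    """Build a hypothetical attestation skeleton for the example."""
    return {
        "subject": [{"digest": {"sha256": d}} for d in subject_digests],
        "predicate": {"inputs": [{"digest": {"sha256": d}} for d in input_digests]},
    }


# Origin attestation (no inputs) followed by a transform consuming its output
chain = [make_attestation([], ["aaa"]), make_attestation(["aaa"], ["bbb"])]
```

A transform whose input digest has no upstream attestation breaks the chain, which is exactly the "hash chain verification" row in the table above.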
## Specification Foundations
| Aspect | OpenLineage | Makoto |
|---|---|---|
| Schema format | OpenAPI + JSON Schema | in-toto Statement + JSON Schema |
| Transport | HTTP API, Kafka, custom | Any (attestations are documents) |
| Extensibility | Custom Facets | Custom predicates, additional fields |
| Governance | LF AI & Data Foundation | Open specification (in-toto aligned) |
| Reference impl | Marquez | Expanso |
## Streaming Data Support
Both frameworks support streaming data, but with different approaches:
### OpenLineage Streaming
- Events emitted per micro-batch or checkpoint
- Same Job/Dataset/Run model as batch
- Integrations for Spark Streaming, Flink
- Metadata about streaming job state
### Makoto Stream Windows

- Dedicated `stream-window/v1` predicate type
- Merkle tree root for all records in window
- Chain linking between consecutive windows
- Cryptographic proof of window integrity
- Supports millions of events/second
Makoto's streaming support is designed for cryptographic verification at scale. The Merkle tree approach allows individual record verification without storing all records, and chain linking enables detection of missing or modified windows.
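A minimal sketch of the window construction (helper names are assumptions; the actual `stream-window/v1` predicate layout is not shown, and production Merkle trees typically add leaf/node domain separation that this sketch omits):

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: list[bytes]) -> bytes:
    """Compute the Merkle root for one stream window.

    Changing any record, in any position, changes the root, so the root
    alone commits to the entire window without storing the records.
    """
    if not records:
        raise ValueError("empty window")
    level = [_h(r) for r in records]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd-sized levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def link_windows(prev_root: bytes, root: bytes) -> bytes:
    """Chain consecutive windows so a missing or altered window is detectable."""
    return _h(prev_root + root)


window = [b"event-1", b"event-2", b"event-3"]
root = merkle_root(window)
```

Attesting the root of each window, chained to its predecessor, gives tamper evidence over the stream without attesting every record individually.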
## When to Use Each

### Use OpenLineage when...
- Building a data catalog or discovery platform
- Debugging pipeline failures and data quality issues
- Performing impact analysis for schema changes
- Tracking job performance and SLAs
- All consumers are internal and trusted
### Use Makoto when...
- Sharing data with external partners who need verification
- Meeting regulatory compliance (GDPR, CCPA, EU AI Act)
- Documenting AI/ML training data provenance
- Building data marketplaces with trust requirements
- Needing tamper-evident audit trails
### Use Both when...
- You need operational visibility and compliance guarantees
- Internal teams use OpenLineage for debugging; external data uses Makoto for trust
- Building comprehensive data governance with different trust levels
### Example: ML Training Data Pipeline
Use OpenLineage to track feature engineering jobs in Airflow, debug failures, and understand dataset dependencies. Use Makoto to create signed attestations for the final training dataset, proving its provenance for model governance and regulatory review.
## Integration Possibilities
OpenLineage and Makoto can work together. OpenLineage events can trigger Makoto attestation generation, creating cryptographic proof from operational metadata:
```python
# Conceptual integration flow

# 1. OpenLineage captures job completion
openlineage_event = {
    "eventType": "COMPLETE",
    "job": {"name": "anonymization_pipeline"},
    "outputs": [{"name": "anonymized_customers"}],
}

# 2. Generate Makoto attestation from OpenLineage metadata
makoto_attestation = {
    "_type": "https://in-toto.io/Statement/v1",
    "predicateType": "https://makoto.dev/transform/v1",
    "predicate": {
        "transform": {
            "name": openlineage_event["job"]["name"],
            # Add cryptographic hashes not in OpenLineage
            "codeRef": {"digest": {"sha256": "..."}},
        }
    },
}

# 3. Sign the attestation for L2 compliance
signed_attestation = sign_with_sigstore(makoto_attestation)
```
This approach lets organizations leverage existing OpenLineage integrations while adding cryptographic guarantees where needed.
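The signing step in the flow above is a placeholder. The sketch below shows only the canonicalize-then-sign shape, using stdlib HMAC purely as a stand-in: actual L2 signing would use an asymmetric scheme (such as Sigstore keyless signing) so that verifiers never hold the signing secret.

```python
import hashlib
import hmac
import json


def sign_attestation(attestation: dict, key: bytes) -> str:
    """Sign the canonical JSON encoding of an attestation.

    HMAC-SHA256 is a stand-in for illustration only; the key point is
    that signing covers a canonical serialization, so any semantic
    change to the attestation invalidates the signature.
    """
    payload = json.dumps(
        attestation, sort_keys=True, separators=(",", ":")
    ).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_attestation(attestation: dict, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_attestation(attestation, key), signature)


key = b"demo-secret"  # hypothetical key for the example
att = {"predicateType": "https://makoto.dev/transform/v1", "subject": []}
sig = sign_attestation(att, key)
```

Canonical serialization (sorted keys, fixed separators) matters: two semantically identical attestations must produce identical bytes, or verification becomes dependent on incidental formatting.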
## Summary
| Dimension | OpenLineage | Makoto |
|---|---|---|
| Primary purpose | Operational metadata | Cryptographic provenance |
| Trust model | Trust the infrastructure | Verify cryptographically |
| Core entities | Job, Dataset, Run | Origin, Transform, Stream Window |
| Data binding | Name-based references | Cryptographic hashes |
| Signing | Not built-in | Required at L2+ |
| Best for | Data catalogs, debugging | Compliance, external trust |
| Output artifact | Lineage graph | DBOM (Data Bill of Materials) |
**Bottom line:** OpenLineage is excellent for understanding your data pipelines internally. Makoto is essential when you need to prove data provenance to external parties or regulators.
Learn about Makoto Levels → | Compare with SLSA → | OpenLineage Documentation →