Which Standard Should I Use?
Multiple standards address provenance and lineage in different ways. This guide helps you choose the right standard—or combination of standards—for your use case. The good news: these standards are often complementary, not competing.
TL;DR: Use SLSA for software builds, Makoto for data integrity, OpenLineage for pipeline observability, and W3C PROV for semantic interoperability. Many organizations use multiple standards together.
Quick Decision Matrix
Find your primary need in the left column to identify the best starting point:
| Primary Need | Recommended Standard | Why |
|---|---|---|
| Prove software wasn't tampered with | SLSA | Purpose-built for software supply chain security |
| Prove data origin and transformations | Makoto | Cryptographic attestations for data integrity |
| Track pipeline runs and dependencies | OpenLineage | Runtime metadata for observability and debugging |
| Semantic provenance interchange | W3C PROV | Standard ontology for cross-system interoperability |
| ML model training data governance | Makoto + SLSA | Makoto for data, SLSA for model artifacts |
| Data catalog with lineage visibility | OpenLineage + Makoto | OpenLineage for discovery, Makoto for verification |
Standard Profiles
SLSA — Supply-chain Levels for Software Artifacts
Purpose: Secure the software build process with verifiable provenance.
Focus: Software artifacts (binaries, packages, containers)
Strengths
- Mature ecosystem (GitHub, GitLab support)
- Clear level progression (L1-L3)
- Strong tooling (Sigstore, cosign)
- Industry adoption (OpenSSF)
Limitations
- Software-focused (not data)
- Single build step model
- No streaming support
- No privacy considerations
Best for: CI/CD pipelines, container builds, package publishing, infrastructure-as-code
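Concretely, SLSA provenance is an in-toto Statement whose predicate describes how an artifact was built. A minimal sketch of the structure (field names follow the SLSA v1.0 provenance schema; the artifact, `buildType` URI, and builder id are illustrative placeholders):

```python
import hashlib
import json

# Illustrative digest — in practice you hash the real binary or container image.
artifact_digest = hashlib.sha256(b"example artifact bytes").hexdigest()

# Minimal in-toto Statement carrying a SLSA provenance predicate.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {"name": "myapp-1.0.0.tar.gz", "digest": {"sha256": artifact_digest}}
    ],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        "buildDefinition": {
            "buildType": "https://example.com/ci-build",  # hypothetical buildType URI
            "externalParameters": {"ref": "refs/tags/v1.0.0"},
        },
        "runDetails": {
            "builder": {"id": "https://example.com/builders/ci"},  # hypothetical builder id
        },
    },
}

serialized = json.dumps(statement, indent=2)
```

At L2 and above, this statement would be wrapped in a signed DSSE envelope rather than stored as bare JSON.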
Makoto — Data Integrity Framework
Purpose: Cryptographic attestations for data origin, transformation, and lineage.
Focus: Data artifacts (datasets, streams, ETL outputs)
Strengths
- Multi-stage transform chains
- Streaming data support (Merkle windows)
- Privacy-preserving options
- DBOM output format
- in-toto compatible
Limitations
- Newer standard (less tooling)
- Requires implementation effort
- L3 requires platform support
Best for: ETL pipelines, ML training data, data marketplaces, regulatory compliance
OpenLineage — Open Standard for Data Lineage
Purpose: Runtime metadata collection for pipeline observability and lineage tracking.
Focus: Job/task execution metadata and dataset dependencies
Strengths
- Broad tool integration (Airflow, Spark, dbt)
- Rich runtime metadata
- Active community (LF AI & Data)
- Great for data catalogs
Limitations
- No cryptographic verification
- Observability focus (not security)
- No integrity guarantees
- Metadata only (no data hashing)
Best for: Data catalogs, pipeline debugging, impact analysis, data discovery
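An OpenLineage integration emits JSON run events at job start and completion. A sketch of a minimal COMPLETE event (field names follow the OpenLineage event schema; the namespaces, job name, and producer URI are hypothetical):

```python
import json
import uuid
from datetime import datetime, timezone

# Minimal OpenLineage RunEvent: one job run completing, reading one dataset
# and writing another. Note there are no hashes or signatures — only metadata.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "analytics", "name": "daily_orders_etl"},  # hypothetical job
    "inputs": [{"namespace": "warehouse", "name": "raw.orders"}],
    "outputs": [{"namespace": "warehouse", "name": "marts.daily_orders"}],
    "producer": "https://example.com/etl/v1",  # identifies the emitting integration
}

payload = json.dumps(event)
```

The absence of content digests in this event is the point of the "metadata only" limitation above: lineage is recorded, but nothing proves the datasets themselves are unmodified.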
W3C PROV — Provenance Data Model
Purpose: Standard ontology for representing provenance information across systems.
Focus: Semantic interoperability and provenance interchange
Strengths
- W3C recommendation (stable)
- Rich semantic model
- Cross-domain applicability
- RDF/OWL support
Limitations
- Generic (not data-specific)
- No security levels
- Complex for simple use cases
- Limited modern tooling
Best for: Research data, cross-organization sharing, semantic web applications, archives
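PROV's core model relates entities (data), activities (processes), and agents (people or systems). A minimal sketch in the PROV-JSON serialization, expressing "a cleaned dataset was generated by a transform run by a pipeline agent" (the `ex:` prefix and record names are illustrative):

```python
import json

# PROV-JSON groups records by relation type; keys are qualified names.
prov_doc = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {"ex:cleaned_dataset": {}},
    "activity": {"ex:clean_transform": {}},
    "agent": {"ex:etl_pipeline": {}},
    "wasGeneratedBy": {
        "_:gen1": {
            "prov:entity": "ex:cleaned_dataset",
            "prov:activity": "ex:clean_transform",
        }
    },
    "wasAssociatedWith": {
        "_:assoc1": {
            "prov:activity": "ex:clean_transform",
            "prov:agent": "ex:etl_pipeline",
        }
    },
}

serialized = json.dumps(prov_doc)
```

The same graph can be serialized as PROV-O (RDF) for triple stores, which is where the semantic-web strengths listed above come into play.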
Feature Comparison
| Feature | SLSA | Makoto | OpenLineage | W3C PROV |
|---|---|---|---|---|
| Primary domain | Software | Data | Data | General |
| Cryptographic signing | Yes (L2+) | Yes (L2+) | No | No |
| Content hashing | Yes | Yes | No | Optional |
| Security levels | L1-L3 | L1-L3 | No | No |
| Streaming support | No | Yes (windows) | Limited | No |
| Multi-stage lineage | No | Yes | Yes | Yes |
| Privacy features | No | Yes | No | No |
| Runtime metadata | Basic | Basic | Rich | Basic |
| Tool integrations | Many | Growing | Many | Limited |
| Output format | SBOM | DBOM | Events | RDF/JSON |
| Attestation format | in-toto/DSSE | in-toto/DSSE | JSON | PROV-O/PROV-JSON |
Use Case Scenarios
Scenario 1: ML Training Pipeline
"I'm training ML models and need to prove the provenance of both my training data and model artifacts."
Recommendation: Use Makoto for training data attestations and SLSA for model artifact provenance.
- Makoto tracks data origin, transformations, and feature engineering
- SLSA tracks model training code and resulting artifacts
- Both use in-toto format, enabling unified verification policies
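The shared in-toto format means one policy check can tie the two attestations together: the model attestation's claimed training-data digest should match a subject digest in the data attestation. A hedged sketch (statement shapes are simplified and the `trainingData` predicate field is hypothetical; real verification would also check the DSSE signatures):

```python
import hashlib

def subject_digests(statement: dict) -> set:
    """Collect the sha256 digests claimed for a statement's subjects."""
    return {s["digest"]["sha256"] for s in statement["subject"]}

# Hypothetical attestations: one for the training dataset, one for the model.
data_hash = hashlib.sha256(b"training data").hexdigest()
model_hash = hashlib.sha256(b"model weights").hexdigest()

data_attestation = {
    "subject": [{"name": "train.parquet", "digest": {"sha256": data_hash}}],
}
model_attestation = {
    "subject": [{"name": "model.bin", "digest": {"sha256": model_hash}}],
    # The model's provenance records which dataset digest it was trained on.
    "predicate": {"trainingData": {"sha256": data_hash}},
}

# Policy: the model's claimed training-data digest must appear among the
# dataset attestation's subjects.
claimed = model_attestation["predicate"]["trainingData"]["sha256"]
lineage_verified = claimed in subject_digests(data_attestation)
```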
Scenario 2: Data Warehouse Governance
"I need to understand data lineage for impact analysis and also prove data hasn't been tampered with."
Recommendation: Use OpenLineage for lineage visibility and Makoto for integrity verification.
- OpenLineage integrates with dbt, Airflow, and data catalogs for discovery
- Makoto provides cryptographic proof of data integrity for high-value tables
- Use OpenLineage broadly, Makoto for regulated or sensitive data
Scenario 3: Cross-Organization Data Sharing
"I'm sharing data with external partners and need to provide verifiable provenance claims."
Recommendation: Use Makoto L2 with signed attestations. Consider W3C PROV if partners use semantic web systems.
- Makoto L2 provides signed, tamper-evident attestations
- Partners can independently verify data hasn't been modified
- DBOM documents complete chain of custody
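For the partner, independent verification reduces to recomputing the content hash of the received data and comparing it against the digest claimed in the signed attestation. A minimal sketch (signature checking omitted; only the hash comparison is shown):

```python
import hashlib

def verify_dataset(file_bytes: bytes, attested_sha256: str) -> bool:
    """Recompute the dataset hash and compare it to the attestation's claim."""
    return hashlib.sha256(file_bytes).hexdigest() == attested_sha256

# The partner receives the data and the attested digest separately.
received = b"shared dataset contents"
attested = hashlib.sha256(b"shared dataset contents").hexdigest()  # from the attestation

ok = verify_dataset(received, attested)
tampered = verify_dataset(received + b" extra byte", attested)
```

Because the digest travels inside a signed attestation, the partner needs only the producer's public key — no trust in the transport channel.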
Scenario 4: Real-Time Streaming Analytics
"I'm processing millions of events per second and need to attest to data integrity without impacting throughput."
Recommendation: Use Makoto stream-window attestations.
- Window-based Merkle trees enable high-throughput attestation
- Single signature per window (not per record)
- Hash chaining provides tamper-evident stream history
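The window approach can be illustrated with a plain Merkle tree over a batch of records, with each window root chained to the previous one. This is a generic sketch of the technique, not the Makoto wire format:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Merkle root over record hashes, duplicating the last node on odd levels."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# One window of streaming records (illustrative payloads).
window = [f"event-{i}".encode() for i in range(1000)]
root = merkle_root(window)

# Chain to the previous window's root so the stream history is tamper-evident.
prev_chained = sha256(b"genesis")  # first window chains to a fixed genesis value
chained_root = sha256(prev_chained + root)
# Sign `chained_root` once per window instead of signing every record.
```

Altering any single record changes its leaf hash, the window root, and every chained root after it, which is what makes per-window signatures sufficient.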
Scenario 5: Research Data Publication
"I'm publishing research datasets and need to document provenance for reproducibility."
Recommendation: Use W3C PROV for semantic richness, optionally with Makoto L1 for structured attestations.
- W3C PROV provides rich semantic descriptions understood by research tools
- PROV-O enables integration with research data repositories
- Makoto adds structured, machine-verifiable attestations
Using Multiple Standards Together
These standards are designed to complement each other. Here's how they fit together:
The Full Stack Approach
| Layer | Standard | Purpose |
|---|---|---|
| Observability | OpenLineage | Runtime lineage events for data catalogs and debugging |
| Data Integrity | Makoto | Cryptographic attestations for data provenance |
| Code Integrity | SLSA | Build provenance for pipeline code and tools |
| Interchange | W3C PROV | Semantic export for cross-system interoperability |
Integration Points
- Makoto + SLSA: Both use in-toto/DSSE attestation format—store together, verify together
- Makoto + OpenLineage: OpenLineage events can reference Makoto attestation URIs
- Makoto + W3C PROV: Makoto attestations can be exported to PROV-JSON for semantic queries
- OpenLineage + W3C PROV: OpenLineage lineage can be mapped to PROV entities/activities
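The OpenLineage-to-PROV mapping in the last point is largely mechanical: a run maps to a PROV activity, its input and output datasets to entities, connected by `used` and `wasGeneratedBy` relations. A hedged sketch (the `ol:` prefix and id scheme are illustrative, not a standardized mapping):

```python
def openlineage_to_prov(event: dict) -> dict:
    """Map a minimal OpenLineage RunEvent to PROV-JSON-style records."""
    activity_id = f"ol:run/{event['run']['runId']}"
    prov = {"activity": {activity_id: {}}, "entity": {}, "used": {}, "wasGeneratedBy": {}}
    for i, ds in enumerate(event.get("inputs", [])):
        eid = f"ol:{ds['namespace']}/{ds['name']}"
        prov["entity"][eid] = {}
        prov["used"][f"_:u{i}"] = {"prov:activity": activity_id, "prov:entity": eid}
    for i, ds in enumerate(event.get("outputs", [])):
        eid = f"ol:{ds['namespace']}/{ds['name']}"
        prov["entity"][eid] = {}
        prov["wasGeneratedBy"][f"_:g{i}"] = {"prov:entity": eid, "prov:activity": activity_id}
    return prov

# Illustrative event with one input and one output dataset.
event = {
    "run": {"runId": "abc-123"},
    "inputs": [{"namespace": "warehouse", "name": "raw.orders"}],
    "outputs": [{"namespace": "warehouse", "name": "marts.daily_orders"}],
}
prov = openlineage_to_prov(event)
```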
Decision Flowchart
Answer these questions to find your starting point:
1. What type of artifact are you attesting?
- Software (binaries, packages, containers) → SLSA
- Data (datasets, streams, ETL outputs) → Continue to question 2
2. Do you need cryptographic integrity verification?
- Yes, I need to prove data wasn't tampered with → Makoto
- No, I just need lineage visibility for observability → OpenLineage
- Both → Makoto + OpenLineage
3. Do you need to share provenance across different systems/organizations?
- Yes, with semantic web or research systems → Consider W3C PROV export
- Yes, with modern data systems → Makoto/OpenLineage native formats work well
- No, internal use only → Use native format of your chosen standard
4. What security level do you need?
- Documentation only → Makoto L1 or OpenLineage
- Signed, tamper-evident records → Makoto L2
- Hardware-backed, unforgeable attestations → Makoto L3
Summary
| Standard | Use When You Need | Don't Use When |
|---|---|---|
| SLSA | Software build provenance, supply chain security | Data provenance, streaming, multi-stage transforms |
| Makoto | Data integrity, cryptographic verification, streaming, privacy | Software builds, basic lineage without security needs |
| OpenLineage | Pipeline observability, data catalogs, impact analysis | Cryptographic integrity, security requirements |
| W3C PROV | Semantic interoperability, research data, cross-system sharing | Simple use cases, when modern tooling is needed |
Bottom line: Start with the standard that matches your primary need. Add complementary standards as your requirements grow. Most organizations benefit from using 2-3 standards together for comprehensive provenance coverage.
Compare SLSA and Makoto in depth → | Learn about Makoto Levels →