Which Standard Should I Use?

Multiple standards address provenance and lineage in different ways. This guide helps you choose the right standard—or combination of standards—for your use case. The good news: these standards are often complementary, not competing.

TL;DR: Use SLSA for software builds, Makoto for data integrity, OpenLineage for pipeline observability, and W3C PROV for semantic interoperability. Many organizations use multiple standards together.

Quick Decision Matrix

Find your primary need in the left column to identify the best starting point:

Primary need | Recommended standard | Why
--- | --- | ---
Prove software wasn't tampered with | SLSA | Purpose-built for software supply chain security
Prove data origin and transformations | Makoto | Cryptographic attestations for data integrity
Track pipeline runs and dependencies | OpenLineage | Runtime metadata for observability and debugging
Semantic provenance interchange | W3C PROV | Standard ontology for cross-system interoperability
ML model training data governance | Makoto + SLSA | Makoto for data, SLSA for model artifacts
Data catalog with lineage visibility | OpenLineage + Makoto | OpenLineage for discovery, Makoto for verification

Standard Profiles

SLSA — Supply-chain Levels for Software Artifacts

Purpose: Secure the software build process with verifiable provenance.

Focus: Software artifacts (binaries, packages, containers)

Strengths

  • Mature ecosystem (GitHub, GitLab support)
  • Clear level progression (L1-L3)
  • Strong tooling (Sigstore, cosign)
  • Industry adoption (OpenSSF)

Limitations

  • Software-focused (not data)
  • Single build step model
  • No streaming support
  • No privacy considerations

Best for: CI/CD pipelines, container builds, package publishing, infrastructure-as-code

Makoto — Data Integrity Framework

Purpose: Cryptographic attestations for data origin, transformation, and lineage.

Focus: Data artifacts (datasets, streams, ETL outputs)

Strengths

  • Multi-stage transform chains
  • Streaming data support (Merkle windows)
  • Privacy-preserving options
  • DBOM (Data Bill of Materials) output format
  • in-toto compatible

Limitations

  • Newer standard (less tooling)
  • Requires implementation effort
  • L3 requires platform support

Best for: ETL pipelines, ML training data, data marketplaces, regulatory compliance

OpenLineage — Open Standard for Data Lineage

Purpose: Runtime metadata collection for pipeline observability and lineage tracking.

Focus: Job/task execution metadata and dataset dependencies

Strengths

  • Broad tool integration (Airflow, Spark, dbt)
  • Rich runtime metadata
  • Active community (LF AI & Data)
  • Great for data catalogs

Limitations

  • No cryptographic verification
  • Observability focus (not security)
  • No integrity guarantees
  • Metadata only (no data hashing)

Best for: Data catalogs, pipeline debugging, impact analysis, data discovery

W3C PROV — Provenance Data Model

Purpose: Standard ontology for representing provenance information across systems.

Focus: Semantic interoperability and provenance interchange

Strengths

  • W3C recommendation (stable)
  • Rich semantic model
  • Cross-domain applicability
  • RDF/OWL support

Limitations

  • Generic (not data-specific)
  • No security levels
  • Complex for simple use cases
  • Limited modern tooling

Best for: Research data, cross-organization sharing, semantic web applications, archives

Feature Comparison

Feature | SLSA | Makoto | OpenLineage | W3C PROV
--- | --- | --- | --- | ---
Primary domain | Software | Data | Data | General
Cryptographic signing | Yes (L2+) | Yes (L2+) | No | No
Content hashing | Yes | Yes | No | Optional
Security levels | L1-L3 | L1-L3 | No | No
Streaming support | No | Yes (windows) | Limited | No
Multi-stage lineage | No | Yes | Yes | Yes
Privacy features | No | Yes | No | No
Runtime metadata | Basic | Basic | Rich | Basic
Tool integrations | Many | Growing | Many | Limited
Output format | Build provenance | DBOM | Events | RDF/JSON
Attestation format | in-toto/DSSE | in-toto/DSSE | JSON | PROV-O/PROV-JSON

Use Case Scenarios

Scenario 1: ML Training Pipeline

"I'm training ML models and need to prove the provenance of both my training data and model artifacts."

Recommendation: Use Makoto for training data attestations and SLSA for model artifact provenance.

  • Makoto tracks data origin, transformations, and feature engineering
  • SLSA tracks model training code and resulting artifacts
  • Both use in-toto format, enabling unified verification policies
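Because both kinds of evidence arrive as in-toto Statements, one policy function can check them the same way. The sketch below assumes unwrapped Statements (DSSE signature checking is out of scope here) and uses a placeholder URI for the Makoto predicate type, since the real value is not given in this guide; the SLSA v1 provenance URI and the in-toto Statement v1 type are the published ones. The function and variable names are hypothetical.

```python
import hashlib

IN_TOTO_STATEMENT_V1 = "https://in-toto.io/Statement/v1"
SLSA_PROVENANCE_V1 = "https://slsa.dev/provenance/v1"
# Placeholder: substitute the predicate type URI that Makoto actually defines.
MAKOTO_PREDICATE = "https://example.com/makoto/predicate/v1"  # hypothetical

ALLOWED_PREDICATES = {SLSA_PROVENANCE_V1, MAKOTO_PREDICATE}


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_statement(statement: dict, name: str, content: bytes) -> bool:
    """Apply one policy to both kinds of evidence: a Makoto statement about
    training data and an SLSA statement about a model artifact.
    (Verifying the signature on the enclosing DSSE envelope is not shown.)"""
    if statement.get("_type") != IN_TOTO_STATEMENT_V1:
        return False
    if statement.get("predicateType") not in ALLOWED_PREDICATES:
        return False
    expected = sha256_hex(content)
    # The artifact must appear as a subject with a matching SHA-256 digest.
    return any(
        s.get("name") == name and s.get("digest", {}).get("sha256") == expected
        for s in statement.get("subject", [])
    )


# Usage with toy artifacts
train_csv = b"age,label\n42,1\n"
model_bin = b"\x00\x01weights"
data_stmt = {
    "_type": IN_TOTO_STATEMENT_V1,
    "predicateType": MAKOTO_PREDICATE,
    "subject": [{"name": "train.csv", "digest": {"sha256": sha256_hex(train_csv)}}],
    "predicate": {},
}
model_stmt = {
    "_type": IN_TOTO_STATEMENT_V1,
    "predicateType": SLSA_PROVENANCE_V1,
    "subject": [{"name": "model.bin", "digest": {"sha256": sha256_hex(model_bin)}}],
    "predicate": {},
}
print(check_statement(data_stmt, "train.csv", train_csv))     # True
print(check_statement(model_stmt, "model.bin", b"tampered"))  # False
```

Only the allowed predicate set distinguishes the two standards here; everything else in the policy is shared.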

Scenario 2: Data Warehouse Governance

"I need to understand data lineage for impact analysis and also prove data hasn't been tampered with."

Recommendation: Use OpenLineage for lineage visibility and Makoto for integrity verification.

  • OpenLineage integrates with dbt, Airflow, and data catalogs for discovery
  • Makoto provides cryptographic proof of data integrity for high-value tables
  • Use OpenLineage broadly, Makoto for regulated or sensitive data
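A minimal sketch of the "OpenLineage broadly, Makoto selectively" split: every run emits an OpenLineage-style RunEvent, and only outputs on a governance list are queued for a Makoto attestation. The event is built as a plain dict rather than through the OpenLineage client; the namespace, producer URI, and REGULATED_TABLES set are hypothetical, and the schemaURL is an example value to match against the spec version you target.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical set of tables that governance marks as regulated.
REGULATED_TABLES = {"finance.ledger_daily"}


def make_run_event(job_name, inputs, outputs, event_type="COMPLETE"):
    """Build a minimal OpenLineage-style RunEvent as a plain dict."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "warehouse", "name": job_name},
        "inputs": [{"namespace": "warehouse", "name": n} for n in inputs],
        "outputs": [{"namespace": "warehouse", "name": n} for n in outputs],
        "producer": "https://example.com/my-pipeline",  # hypothetical producer URI
        # Example value; check it against the OpenLineage spec version in use.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


def outputs_needing_attestation(event):
    """Lineage is emitted for every output; only regulated ones also get Makoto."""
    return [o["name"] for o in event["outputs"] if o["name"] in REGULATED_TABLES]


event = make_run_event(
    "build_ledger",
    inputs=["raw.transactions"],
    outputs=["finance.ledger_daily", "analytics.daily_summary"],
)
print(json.dumps(event, indent=2))
print(outputs_needing_attestation(event))  # ['finance.ledger_daily']
```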

Scenario 3: Cross-Organization Data Sharing

"I'm sharing data with external partners and need to provide verifiable provenance claims."

Recommendation: Use Makoto L2 with signed attestations. Consider W3C PROV if partners use semantic web systems.

  • Makoto L2 provides signed, tamper-evident attestations
  • Partners can independently verify data hasn't been modified
  • DBOM documents complete chain of custody
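Since the feature comparison lists in-toto/DSSE as Makoto's attestation format, a signed L2 attestation shared with a partner would be a DSSE envelope around an in-toto Statement. The sketch implements DSSE's pre-authentication encoding (PAE) and the envelope layout; the sign/verify functions are pluggable. The HMAC signer is a stand-in for the demo only: it needs a shared secret, so external partners could not verify it independently, and a real deployment would use an asymmetric key (for example Ed25519 or ECDSA). All function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable

IN_TOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def make_envelope(statement: dict, sign: Callable[[bytes], bytes], keyid: str) -> dict:
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    sig = sign(pae(IN_TOTO_PAYLOAD_TYPE, payload))
    return {
        "payloadType": IN_TOTO_PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, verify: Callable[[bytes, bytes], bool]) -> dict:
    """Return the Statement only if some signature checks out over the PAE."""
    payload = base64.b64decode(envelope["payload"])
    message = pae(envelope["payloadType"], payload)
    for s in envelope["signatures"]:
        if verify(message, base64.b64decode(s["sig"])):
            return json.loads(payload)
    raise ValueError("no valid signature")


# Demo-only stand-in signer (shared-secret MAC, NOT partner-verifiable).
DEMO_KEY = b"demo-secret"


def demo_sign(msg: bytes) -> bytes:
    return hmac.new(DEMO_KEY, msg, hashlib.sha256).digest()


def demo_verify(msg: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(demo_sign(msg), sig)


stmt = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "partner_export.parquet",
                 "digest": {"sha256": hashlib.sha256(b"example").hexdigest()}}],
    "predicateType": "https://example.com/makoto/predicate/v1",  # hypothetical
    "predicate": {},
}
env = make_envelope(stmt, demo_sign, keyid="demo-key")
print(verify_envelope(env, demo_verify) == stmt)  # True
```

Signing the PAE rather than the raw JSON binds the payload type to the signature, so a verifier cannot be tricked into reading the same bytes as a different kind of document.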

Scenario 4: Real-Time Streaming Analytics

"I'm processing millions of events per second and need to attest to data integrity without impacting throughput."

Recommendation: Use Makoto stream-window attestations.

  • Window-based Merkle trees enable high-throughput attestation
  • Single signature per window (not per record)
  • Hash chaining provides tamper-evident stream history
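The three mechanisms above can be sketched in a few lines: a Merkle root per window, then a digest that chains each window to its predecessor, so only one 32-byte value per window would need signing. The leaf/node prefixes follow the RFC 6962 convention; the chaining layout (previous digest, 8-byte window index, root) and all function names are assumptions for illustration, not Makoto's specified format.

```python
import hashlib
from typing import List, Optional


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: List[bytes]) -> bytes:
    """Merkle root of one window. Leaf and interior hashes use distinct
    0x00 / 0x01 prefixes so a leaf can never be confused with a node."""
    if not records:
        return _sha256(b"")
    level = [_sha256(b"\x00" + r) for r in records]
    while len(level) > 1:
        nxt = [_sha256(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired last node is promoted unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def window_digest(records: List[bytes], window_id: int, prev: Optional[bytes]) -> bytes:
    """Chain this window to the previous one; this value is what gets signed."""
    prev = prev if prev is not None else b"\x00" * 32
    return _sha256(prev + window_id.to_bytes(8, "big") + merkle_root(records))


def chain(windows: List[List[bytes]]) -> List[str]:
    digests, prev = [], None
    for i, recs in enumerate(windows):
        prev = window_digest(recs, i, prev)
        digests.append(prev.hex())
    return digests


windows = [[b"evt-1", b"evt-2", b"evt-3"], [b"evt-4", b"evt-5"]]
original = chain(windows)
# Altering one record in window 0 changes window 0 and every later digest.
tampered = chain([[b"evt-1", b"evt-X", b"evt-3"], windows[1]])
print(original[0] != tampered[0], original[1] != tampered[1])  # True True
```

Per-record cost is one hash; signing cost is one signature per window regardless of how many records the window holds.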

Scenario 5: Research Data Publication

"I'm publishing research datasets and need to document provenance for reproducibility."

Recommendation: Use W3C PROV for semantic richness, optionally with Makoto L1 for structured attestations.

  • W3C PROV provides rich semantic descriptions understood by research tools
  • PROV-O enables integration with research data repositories
  • Makoto adds structured, machine-verifiable attestations
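For a sense of what the W3C PROV side looks like, here is a minimal PROV-JSON document for a published research dataset: a cleaning activity used a raw dataset, generated the published one, and is associated with a researcher. The `ex:` namespace, identifiers, and labels are hypothetical; the relation keys (`used`, `wasGeneratedBy`, `wasDerivedFrom`, `wasAssociatedWith`, `wasAttributedTo`) are the PROV-JSON names for the corresponding PROV relations.

```python
import json


def research_dataset_provenance(raw_id, clean_id, activity_id, researcher_id):
    """Build a PROV-JSON document describing how a published dataset was made."""
    return {
        "prefix": {"ex": "https://example.org/study/"},  # hypothetical namespace
        "entity": {
            raw_id: {"prov:label": "Raw survey responses"},
            clean_id: {"prov:label": "Published cleaned dataset"},
        },
        "activity": {activity_id: {"prov:label": "Cleaning and anonymization"}},
        "agent": {researcher_id: {"prov:label": "Principal investigator"}},
        "used": {"_:u1": {"prov:activity": activity_id, "prov:entity": raw_id}},
        "wasGeneratedBy": {"_:g1": {"prov:entity": clean_id, "prov:activity": activity_id}},
        "wasDerivedFrom": {"_:d1": {"prov:generatedEntity": clean_id, "prov:usedEntity": raw_id}},
        "wasAssociatedWith": {"_:a1": {"prov:activity": activity_id, "prov:agent": researcher_id}},
        "wasAttributedTo": {"_:t1": {"prov:entity": clean_id, "prov:agent": researcher_id}},
    }


doc = research_dataset_provenance("ex:raw_v1", "ex:clean_v1", "ex:clean_run", "ex:alice")
print(json.dumps(doc, indent=2))
```

The same graph could be serialized as PROV-O (RDF) for repositories that expect it; a Makoto attestation would sit alongside it, carrying the content digests that PROV itself treats as optional.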

Using Multiple Standards Together

These standards cover different layers, so they can be combined rather than chosen between. Here's how they fit together:

The Full Stack Approach

Layer | Standard | Purpose
--- | --- | ---
Observability | OpenLineage | Runtime lineage events for data catalogs and debugging
Data Integrity | Makoto | Cryptographic attestations for data provenance
Code Integrity | SLSA | Build provenance for pipeline code and tools
Interchange | W3C PROV | Semantic export for cross-system interoperability

Integration Points
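One possible integration point between the observability and data-integrity layers is a shared run identifier: the OpenLineage event says which datasets a run produced, and an in-toto Statement records their content digests. Because OpenLineage carries no hashes, the caller supplies the output bytes. The sketch below is illustrative only; the Makoto predicate type URI and the predicate fields (`openlineageRunId`, `job`, `inputs`) are hypothetical, not part of either specification.

```python
import hashlib

IN_TOTO_STATEMENT_V1 = "https://in-toto.io/Statement/v1"
MAKOTO_PREDICATE = "https://example.com/makoto/predicate/v1"  # hypothetical


def statement_from_run_event(event: dict, output_bytes: dict) -> dict:
    """Turn the outputs of an OpenLineage RunEvent into in-toto subjects,
    keeping the runId as the join key back to the lineage catalog."""
    subjects = []
    for out in event["outputs"]:
        key = f'{out["namespace"]}/{out["name"]}'
        digest = hashlib.sha256(output_bytes[key]).hexdigest()
        subjects.append({"name": key, "digest": {"sha256": digest}})
    return {
        "_type": IN_TOTO_STATEMENT_V1,
        "subject": subjects,
        "predicateType": MAKOTO_PREDICATE,
        "predicate": {
            # Hypothetical fields linking the attestation to the lineage event.
            "openlineageRunId": event["run"]["runId"],
            "job": f'{event["job"]["namespace"]}/{event["job"]["name"]}',
            "inputs": [f'{i["namespace"]}/{i["name"]}' for i in event["inputs"]],
        },
    }


event = {
    "eventType": "COMPLETE",
    "run": {"runId": "3f1c9a52-0000-4000-8000-000000000001"},
    "job": {"namespace": "warehouse", "name": "build_ledger"},
    "inputs": [{"namespace": "warehouse", "name": "raw.transactions"}],
    "outputs": [{"namespace": "warehouse", "name": "finance.ledger_daily"}],
}
stmt = statement_from_run_event(event, {"warehouse/finance.ledger_daily": b"id,amount\n1,10\n"})
print(stmt["subject"][0]["name"])  # warehouse/finance.ledger_daily
```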

Decision Flowchart

Answer these questions to find your starting point:

1. What type of artifact are you attesting?

  • Software (binaries, packages, containers) → SLSA
  • Data (datasets, streams, ETL outputs) → Continue to question 2

2. Do you need cryptographic integrity verification?

  • Yes, I need to prove data wasn't tampered with → Makoto
  • No, I just need lineage visibility for observability → OpenLineage
  • Both → Makoto + OpenLineage

3. Do you need to share provenance across different systems/organizations?

  • Yes, with semantic web or research systems → Consider W3C PROV export
  • Yes, with modern data systems → Makoto/OpenLineage native formats work well
  • No, internal use only → Use native format of your chosen standard

4. What security level do you need?

  • Documentation only → Makoto L1 or OpenLineage
  • Signed, tamper-evident records → Makoto L2
  • Hardware-backed, unforgeable attestations → Makoto L3
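The four questions above can be encoded as a small helper, which is handy for documenting the choice in a platform team's tooling. This is a sketch of the flowchart's logic only; the function names and parameter names are hypothetical, and mixed cases (for example an ML pipeline with both data and model artifacts) still need the combinations from the decision matrix.

```python
from typing import List


def recommend(artifact: str, needs_integrity: bool, needs_observability: bool,
              share_with_semantic_systems: bool = False) -> List[str]:
    """Questions 1-3: artifact type, integrity vs. observability, sharing."""
    if artifact == "software":
        return ["SLSA"]
    if artifact != "data":
        raise ValueError("artifact must be 'software' or 'data'")
    picks = []
    if needs_integrity:
        picks.append("Makoto")
    # Lineage-only needs (or no integrity need at all) point to OpenLineage.
    if needs_observability or not needs_integrity:
        picks.append("OpenLineage")
    if share_with_semantic_systems:
        picks.append("W3C PROV export")
    return picks


def security_level(requirement: str) -> str:
    """Question 4: map the required assurance to a level."""
    return {
        "documentation": "Makoto L1 or OpenLineage",
        "signed": "Makoto L2",
        "hardware": "Makoto L3",
    }[requirement]


print(recommend("data", needs_integrity=True, needs_observability=True))
print(security_level("signed"))
```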

Summary

Standard | Use when you need | Don't use when
--- | --- | ---
SLSA | Software build provenance, supply chain security | Data provenance, streaming, multi-stage transforms
Makoto | Data integrity, cryptographic verification, streaming, privacy | Software builds, basic lineage without security needs
OpenLineage | Pipeline observability, data catalogs, impact analysis | Cryptographic integrity, security requirements
W3C PROV | Semantic interoperability, research data, cross-system sharing | Simple use cases, when modern tooling is needed

Bottom line: Start with the standard that matches your primary need, then add complementary standards as your requirements grow. Many organizations combine 2-3 standards to cover observability, integrity, and interchange together.
