W3C PROV vs Makoto

W3C PROV is a family of specifications for representing provenance information on the Web. While both PROV and Makoto address data provenance, they serve different purposes and operate at different abstraction levels.

Complementary standards: W3C PROV provides a semantic data model for describing provenance relationships. Makoto provides an operational framework with cryptographic attestations and security levels. Makoto's lineage graphs can be informed by PROV concepts.

Different Goals, Different Approaches

W3C PROV

Goal: Enable interoperable provenance interchange

  • Semantic data model (RDF/OWL)
  • Describes "what happened"
  • Focuses on interoperability
  • No security requirements
  • Human and machine readable

Makoto

Goal: Cryptographically verifiable data integrity

  • Attestation framework (in-toto/DSSE)
  • Proves "this is trustworthy"
  • Focuses on verification
  • Progressive security levels
  • Machine verifiable signatures

Think of it this way: PROV tells you "Entity X was generated by Activity Y, which was attributed to Agent Z." Makoto tells you "Here's cryptographic proof that this data came from where it claims, hasn't been tampered with, and meets security level L2."

Core Concept Mapping

W3C PROV defines three core concepts. Here's how they relate to Makoto's model:

PROV Concept | Definition | Makoto Equivalent
Entity | A physical, digital, or conceptual thing | Subject - The data artifact being attested (with digest)
Activity | Something that occurs over time and acts upon entities | Transform - A data processing operation with attestation
Agent | Something that bears responsibility for an activity | Collector/Executor - The system performing the operation

PROV Relations in Makoto

PROV Relation | Meaning | Makoto Implementation
wasGeneratedBy | Entity was created by Activity | Transform attestation's subject field
used | Activity consumed Entity | Transform attestation's inputs array
wasAttributedTo | Entity is attributed to Agent | Origin attestation's collector field
wasDerivedFrom | Entity derived from another Entity | Hash-chained attestations via attestationRef
wasAssociatedWith | Activity was associated with Agent | Transform attestation's executor field
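
To make the mapping concrete, here is a minimal Python sketch that reads a transform attestation and lists the PROV relations it implies. The function name `prov_relations`, the `sha256:` prefix used for entity identifiers, and the caller-supplied `activity_id` are assumptions for illustration. They are not part of Makoto or PROV. The sketch covers only the three relations that a single transform attestation carries (wasGeneratedBy, used, wasAssociatedWith).

```python
from typing import Any

# (relation name, first argument, second argument)
Relation = tuple[str, str, str]


def prov_relations(statement: dict[str, Any], activity_id: str) -> list[Relation]:
    """List the PROV relations implied by one transform attestation.

    Assumes the statement has the shape used in this document:
    subject[].digest.sha256, predicate.inputs[].digest.sha256,
    and predicate.executor.id.
    """
    predicate = statement["predicate"]
    agent = predicate["executor"]["id"]

    # Executor field -> wasAssociatedWith(activity, agent)
    relations: list[Relation] = [("wasAssociatedWith", activity_id, agent)]

    # Subject field -> wasGeneratedBy(entity, activity)
    for subject in statement["subject"]:
        entity = "sha256:" + subject["digest"]["sha256"]
        relations.append(("wasGeneratedBy", entity, activity_id))

    # Inputs array -> used(activity, entity)
    for item in predicate.get("inputs", []):
        entity = "sha256:" + item["digest"]["sha256"]
        relations.append(("used", activity_id, entity))

    return relations
```

Calling `prov_relations(statement, "transform:anonymize")` on an attestation with one subject and one input yields three tuples, one per row of the table that applies to transform attestations.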

Data Model Comparison

PROV and Makoto represent provenance information very differently:

PROV-N Notation

// PROV expresses relationships semantically
entity(dataset:transactions_2025)
entity(dataset:transactions_anonymized)

activity(transform:anonymize,
  2025-12-20T10:05:00,
  2025-12-20T10:15:00)

agent(expanso:pipeline-prod-1)

wasGeneratedBy(
  dataset:transactions_anonymized,
  transform:anonymize)

used(transform:anonymize,
  dataset:transactions_2025)

wasAssociatedWith(
  transform:anonymize,
  expanso:pipeline-prod-1)

Makoto Attestation

// Makoto provides signed, verifiable proof
{
  "_type": "https://in-toto.io/Statement/v1",
  "predicateType": "https://makoto.dev/transform/v1",
  "subject": [{
    "digest": {"sha256": "xyz..."}
  }],
  "predicate": {
    "inputs": [{
      "digest": {"sha256": "abc..."}
    }],
    "executor": {
      "id": "expanso.io/pipelines/prod-1"
    }
  }
}
// + DSSE signature envelope
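
The DSSE envelope mentioned above can be sketched in Python as follows. The pre-authentication encoding (PAE) and the envelope fields (`payload`, `payloadType`, `signatures`) follow the DSSE specification, and `application/vnd.in-toto+json` is the payload type in-toto uses. Everything else is an assumption for illustration: the helper names `wrap` and `verify`, the demo key, and above all the signer. HMAC-SHA256 is used here only as a standard-library stand-in; a real deployment would use an asymmetric signature scheme such as Ed25519 or ECDSA.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable

PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def wrap(statement: dict, keyid: str, sign: Callable[[bytes], bytes]) -> dict:
    """Serialize a statement and place it in a signed DSSE envelope."""
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    signature = sign(pae(PAYLOAD_TYPE, payload))
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": PAYLOAD_TYPE,
        "signatures": [
            {"keyid": keyid, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }


def verify(envelope: dict, check: Callable[[bytes, bytes], bool]) -> dict:
    """Return the decoded statement if at least one signature is valid."""
    payload = base64.b64decode(envelope["payload"])
    message = pae(envelope["payloadType"], payload)
    for entry in envelope["signatures"]:
        if check(message, base64.b64decode(entry["sig"])):
            return json.loads(payload)
    raise ValueError("no valid signature on envelope")


# Stand-in signer for the demo only (symmetric HMAC, NOT a real signature).
DEMO_KEY = b"demo-key-not-for-production"


def demo_sign(message: bytes) -> bytes:
    return hmac.new(DEMO_KEY, message, hashlib.sha256).digest()


def demo_check(message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(demo_sign(message), signature)
```

Because the signature covers the PAE of the payload type and the payload bytes, changing a single character of the attestation after signing makes `verify` raise instead of returning the statement.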

Key Differences

Aspect | W3C PROV | Makoto
Serialization | RDF, PROV-N, PROV-JSON, PROV-XML | JSON with DSSE signature envelope
Identity | URIs (semantic web style) | Cryptographic digests (sha256, Merkle roots)
Integrity | None built-in | Hash binding + digital signatures
Validation | PROV-CONSTRAINTS (logical) | Signature verification + hash verification
Querying | SPARQL (semantic queries) | Attestation chain traversal
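
Hash binding and attestation chain traversal can be illustrated with a short Python sketch. It assumes attestations shaped like the example above (`subject[].digest.sha256` and `predicate.inputs[].digest.sha256`) and assumes their signatures have already been verified. The function name `trace_lineage` and its return shape are hypothetical, and a real Makoto verifier may link attestations differently.

```python
import hashlib
from typing import Any


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def trace_lineage(
    data: bytes, attestations: list[dict[str, Any]]
) -> tuple[list[str], list[str]]:
    """Walk upstream from `data` through attested inputs.

    Returns (attested digests in visit order, input digests with no attestation).
    Raises ValueError if `data` itself does not match any attested subject.
    """
    # Index attestations by the digest of each subject they describe.
    by_subject: dict[str, dict[str, Any]] = {}
    for statement in attestations:
        for subject in statement["subject"]:
            by_subject[subject["digest"]["sha256"]] = statement

    # Hash binding: the bytes in hand must hash to an attested digest.
    start = sha256_hex(data)
    if start not in by_subject:
        raise ValueError("data does not match any attested subject digest")

    lineage: list[str] = []
    unattested: list[str] = []
    pending, seen = [start], set()
    while pending:
        digest = pending.pop()
        if digest in seen:
            continue
        seen.add(digest)
        statement = by_subject.get(digest)
        if statement is None:
            unattested.append(digest)  # gap in the chain
            continue
        lineage.append(digest)
        for item in statement["predicate"].get("inputs", []):
            pending.append(item["digest"]["sha256"])
    return lineage, unattested
```

Where PROV would answer a lineage question with a SPARQL query over named resources, this traversal answers it by following digests, so any modified artifact simply fails to match its attestation.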

What Each Does Best

W3C PROV Excels At

  • Rich semantics - Express complex provenance relationships
  • Interoperability - Standard vocabularies, multiple serializations
  • Querying - SPARQL enables complex provenance queries
  • Human readability - PROV-N is designed to be readable
  • Extensibility - Easy to add domain-specific terms via OWL

Makoto Excels At

  • Verification - Cryptographic proof of integrity
  • Trust levels - Progressive security requirements (L1-L3)
  • Streaming - Window-based attestation for high-throughput
  • Tamper evidence - Hash chaining detects modifications
  • Automation - Machine-verifiable without human interpretation
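
One way to attest a whole window of streaming records with a single digest is a Merkle root. The Python sketch below is an assumption about how such a window digest could be computed, not Makoto's defined algorithm: the leaf and node prefixes (`0x00`, `0x01`) and the rule of promoting an unpaired node to the next level are illustrative choices.

```python
import hashlib


def merkle_root(records: list[bytes]) -> str:
    """Compute a hex Merkle root over an ordered window of records."""
    if not records:
        raise ValueError("window must contain at least one record")

    # Leaf hashes use a different prefix from interior nodes so that a
    # leaf can never be confused with an interior node.
    level = [hashlib.sha256(b"\x00" + record).digest() for record in records]

    while len(level) > 1:
        next_level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            next_level.append(level[-1])  # promote the unpaired node unchanged
        level = next_level

    return level[0].hex()
```

A producer would sign one attestation per window carrying this root rather than one attestation per record. Changing, reordering, or dropping any record in the window changes the root, which is what makes the window tamper-evident.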

Use Case Guidance

Use Case | Better Fit | Why
Scientific reproducibility | W3C PROV | Rich description of methodology and dependencies
Supply chain security | Makoto | Cryptographic verification, tamper detection
Regulatory compliance (GDPR) | Both | PROV for documentation, Makoto for audit proof
ML training data lineage | Makoto + PROV | Makoto for integrity, PROV for rich metadata
Real-time streaming | Makoto | Window-based attestation handles scale
Provenance visualization | W3C PROV | Semantic structure maps well to graphs
Zero-trust environments | Makoto | Signatures are verified instead of trusting the source implicitly

Using PROV and Makoto Together

These standards are complementary. A sophisticated data governance system might use both:

Hybrid Architecture Pattern

Makoto Layer (Verification)

  • Cryptographic attestations at each pipeline stage
  • Hash-chained lineage for tamper detection
  • Security level enforcement (L1/L2/L3)
  • Automated verification at data consumption

PROV Layer (Documentation)

  • Rich semantic provenance metadata
  • SPARQL-queryable provenance store
  • Human-readable lineage documentation
  • Integration with knowledge graphs

Integration Example

A Makoto attestation could embed PROV concepts or reference a PROV document. In this example, the provenance object references a detailed PROV document by location and digest, and the metadata object carries PROV-inspired fields:

{
  "_type": "https://in-toto.io/Statement/v1",
  "predicateType": "https://makoto.dev/transform/v1",
  "subject": [{
    "name": "dataset:customer_analysis_v3",
    "digest": {"sha256": "abc123..."}
  }],
  "predicate": {
    "transform": {
      "type": "aggregation",
      "name": "Customer Behavior Analysis"
    },
    "provenance": {
      "type": "prov:Bundle",
      "location": "https://prov.example.com/bundles/abc123",
      "digest": {"sha256": "def456..."}
    },
    "metadata": {
      "prov:wasAssociatedWith": "https://expanso.io/agents/etl-prod",
      "prov:generatedAtTime": "2025-12-20T10:15:00Z"
    }
  }
}
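
The digest inside the provenance reference is what ties the unsigned PROV document to the signed attestation. A minimal Python sketch of that check follows. It assumes the `predicate.provenance.digest.sha256` layout from the example above and that the bundle bytes have already been fetched from the referenced location. The function name `prov_bundle_matches` is hypothetical.

```python
import hashlib
import hmac
from typing import Any


def prov_bundle_matches(statement: dict[str, Any], bundle_bytes: bytes) -> bool:
    """Check fetched PROV bundle bytes against the digest in the attestation."""
    reference = statement["predicate"]["provenance"]
    expected = reference["digest"]["sha256"].lower()
    actual = hashlib.sha256(bundle_bytes).hexdigest()
    # compare_digest avoids leaking where the two strings first differ
    return hmac.compare_digest(expected, actual)
```

If the check passes and the attestation's signature is valid, a consumer can treat the PROV bundle as the exact document the signer referred to, even though the bundle itself carries no signature.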

W3C PROV Family Overview

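The PROV family comprises 12 documents: four W3C Recommendations (PROV-DM, PROV-O, PROV-N, and PROV-CONSTRAINTS) and eight Working Group Notes. Here's how the most relevant ones relate to Makoto's concerns: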

PROV Spec | Purpose | Makoto Relevance
PROV-DM | Core data model | Conceptual foundation for lineage graphs
PROV-O | OWL2 ontology | Could extend for semantic Makoto queries
PROV-N | Human-readable notation | Useful for documentation, not verification
PROV-CONSTRAINTS | Validity rules | Similar goal to Makoto schema validation
PROV-AQ | Access and query | Could inform attestation registry APIs
PROV-LINKS | Bundle connections | Similar to attestation references/chaining

Summary Comparison

Aspect | W3C PROV | Makoto
Primary purpose | Interoperable provenance representation | Verifiable data integrity
Data model | Entity-Activity-Agent (semantic) | Attestation statements (cryptographic)
Serialization | RDF, JSON, XML, PROV-N | JSON + DSSE signature
Security | None built-in | Digital signatures, security levels
Verification | Logical constraints (PROV-CONSTRAINTS) | Cryptographic signature + hash verification
Streaming support | Not designed for streaming | Window-based attestation
Trust model | Implicit (trust the source) | Explicit (verify signatures)
Standards body | W3C (Recommendation since 2013) | Building on in-toto (CNCF)

Bottom Line

Use W3C PROV when you need rich semantic provenance documentation, interoperability with semantic web systems, or human-readable provenance descriptions. Use Makoto when you need cryptographically verifiable data integrity, tamper detection, or security level compliance. Use both when you need comprehensive data governance with both documentation and verification.
