W3C PROV vs Makoto
W3C PROV is a family of specifications for representing provenance information on the Web. While both PROV and Makoto address data provenance, they serve different purposes and operate at different abstraction levels.
Complementary standards: W3C PROV provides a semantic data model for describing provenance relationships. Makoto provides an operational framework with cryptographic attestations and security levels. Makoto's lineage graphs can be informed by PROV concepts.
Different Goals, Different Approaches
W3C PROV
Goal: Enable interoperable provenance interchange
- Semantic data model (RDF/OWL)
- Describes "what happened"
- Focuses on interoperability
- No security requirements
- Human and machine readable
Makoto
Goal: Cryptographically verifiable data integrity
- Attestation framework (in-toto/DSSE)
- Proves "this is trustworthy"
- Focuses on verification
- Progressive security levels
- Machine verifiable signatures
Think of it this way: PROV tells you "Entity X was generated by Activity Y, which was associated with Agent Z." Makoto tells you "Here's cryptographic proof that this data came from where it claims, hasn't been tampered with, and meets security level L2."
Core Concept Mapping
W3C PROV defines three core concepts. Here's how they relate to Makoto's model:
| PROV Concept | Definition | Makoto Equivalent |
|---|---|---|
| Entity | A physical, digital, or conceptual thing | Subject - The data artifact being attested (with digest) |
| Activity | Something that occurs over time and acts upon entities | Transform - A data processing operation with attestation |
| Agent | Something that bears responsibility for an activity | Collector/Executor - The system performing the operation |
PROV Relations in Makoto
| PROV Relation | Meaning | Makoto Implementation |
|---|---|---|
| wasGeneratedBy | Entity was created by Activity | Transform attestation's subject field |
| used | Activity consumed Entity | Transform attestation's inputs array |
| wasAttributedTo | Entity is attributed to Agent | Origin attestation's collector field |
| wasDerivedFrom | Entity derived from another Entity | Hash-chained attestations via attestationRef |
| wasAssociatedWith | Activity was associated with Agent | Transform attestation's executor field |
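The mapping in the table above can be sketched in a few lines of Python. This is an illustrative sketch, not part of either specification: the function name `prov_relations`, the `sha256:` identifier prefix, and the caller-supplied `activity_id` are assumptions made for the example. Only the `subject`, `inputs`, and `executor` fields come from the attestation shape shown in this document.

```python
from typing import Any


def prov_relations(statement: dict[str, Any], activity_id: str) -> list[tuple[str, str, str]]:
    """Derive PROV-style (relation, subject, object) triples from a transform attestation.

    `activity_id` is supplied by the caller because the attestation shape shown
    in this document carries no explicit activity identifier (assumption).
    """
    predicate = statement["predicate"]
    agent = predicate["executor"]["id"]
    relations: list[tuple[str, str, str]] = []

    # subject field -> wasGeneratedBy(entity, activity)
    for subject in statement["subject"]:
        entity = "sha256:" + subject["digest"]["sha256"]  # hypothetical identifier scheme
        relations.append(("wasGeneratedBy", entity, activity_id))

    # inputs array -> used(activity, entity)
    for item in predicate.get("inputs", []):
        entity = "sha256:" + item["digest"]["sha256"]
        relations.append(("used", activity_id, entity))

    # executor field -> wasAssociatedWith(activity, agent)
    relations.append(("wasAssociatedWith", activity_id, agent))
    return relations
```

The `wasDerivedFrom` and `wasAttributedTo` rows are left out on purpose: per the table they depend on `attestationRef` chaining and on origin attestations, which a single transform statement does not contain.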
Data Model Comparison
PROV and Makoto represent provenance information very differently:
PROV-N Notation
// PROV expresses relationships semantically
entity(dataset:transactions_2025)
entity(dataset:transactions_anonymized)
activity(transform:anonymize, 2025-12-20T10:05:00, 2025-12-20T10:15:00)
agent(expanso:pipeline-prod-1)
wasGeneratedBy(dataset:transactions_anonymized, transform:anonymize, -)
used(transform:anonymize, dataset:transactions_2025, -)
wasAssociatedWith(transform:anonymize, expanso:pipeline-prod-1, -)
Makoto Attestation
// Makoto provides signed, verifiable proof
{
  "_type": "https://in-toto.io/Statement/v1",
  "predicateType": "https://makoto.dev/transform/v1",
  "subject": [{
    "digest": {"sha256": "xyz..."}
  }],
  "predicate": {
    "inputs": [{
      "digest": {"sha256": "abc..."}
    }],
    "executor": {
      "id": "expanso.io/pipelines/prod-1"
    }
  }
}
// + DSSE signature envelope
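To make the "DSSE signature envelope" step concrete, here is a minimal Python sketch of wrapping a statement in a DSSE envelope and checking it again. The pre-authentication encoding (`pae`) and the envelope fields follow the DSSE specification. The signing step is a loudly labeled stand-in: it uses HMAC-SHA256 from the standard library so the example runs without dependencies, whereas a real deployment would use an asymmetric signature scheme. Which scheme Makoto requires is not stated here, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"  # media type used for in-toto statements


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def sign_envelope(statement: dict, key: bytes, keyid: str) -> dict:
    """Wrap a statement in a DSSE envelope (HMAC stands in for a real signature)."""
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": PAYLOAD_TYPE,
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> dict:
    """Return the decoded statement if any signature verifies, else raise ValueError."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    for entry in envelope["signatures"]:
        if hmac.compare_digest(base64.b64decode(entry["sig"]), expected):
            return json.loads(payload)
    raise ValueError("no valid signature on envelope")
```

Because the signature covers the payload type as well as the payload bytes, a consumer can verify the envelope before parsing the JSON inside it.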
Key Differences
| Aspect | W3C PROV | Makoto |
|---|---|---|
| Serialization | RDF, PROV-N, PROV-JSON, PROV-XML | JSON with DSSE signature envelope |
| Identity | URIs (semantic web style) | Cryptographic digests (SHA-256, Merkle roots) |
| Integrity | None built-in | Hash binding + digital signatures |
| Validation | PROV-CONSTRAINTS (logical) | Signature verification + hash verification |
| Querying | SPARQL (semantic queries) | Attestation chain traversal |
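Where PROV is queried with SPARQL, a Makoto-style lineage is followed by walking attestation references and re-checking hashes along the way. The sketch below illustrates that idea under stated assumptions: each attestation is a plain dict, `attestationRef` holds the SHA-256 hex digest of the parent attestation's canonical JSON, and `store` is a hypothetical in-memory lookup keyed by that digest. The actual shape of `attestationRef` is not given in this document.

```python
import hashlib
import json


def attestation_digest(attestation: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(attestation, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def walk_chain(head: dict, store: dict[str, dict]) -> list[dict]:
    """Follow attestationRef links from `head` back to the origin, verifying each hop."""
    chain = [head]
    current = head
    while (ref := current.get("attestationRef")) is not None:
        parent = store.get(ref)
        if parent is None:
            raise LookupError(f"missing attestation {ref}")
        # The stored parent must hash to the reference that points at it.
        if attestation_digest(parent) != ref:
            raise ValueError(f"attestation {ref} was modified")
        chain.append(parent)
        current = parent
    return chain
```

A modified parent changes its digest, so the reference held by its child no longer matches and the walk stops with an error. That is the tamper evidence the "Integrity" and "Validation" rows refer to.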
What Each Does Best
W3C PROV Excels At
- Rich semantics - Express complex provenance relationships
- Interoperability - Standard vocabularies, multiple serializations
- Querying - SPARQL enables complex provenance queries
- Human readability - PROV-N is designed to be readable
- Extensibility - Easy to add domain-specific terms via OWL
Makoto Excels At
- Verification - Cryptographic proof of integrity
- Trust levels - Progressive security requirements (L1-L3)
- Streaming - Window-based attestation for high-throughput
- Tamper evidence - Hash chaining detects modifications
- Automation - Machine-verifiable without human interpretation
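As an illustration of window-based attestation, the sketch below groups a record stream into fixed-size windows and computes one Merkle root per window, so a single digest can stand for many records. It makes several assumptions: windows are counted by records rather than by time, leaf and interior nodes use `0x00`/`0x01` prefixes for domain separation, an unpaired node is promoted to the next level unchanged, and the `merkleRoot` field name is hypothetical. Makoto's actual window and tree rules may differ.

```python
import hashlib
from typing import Iterable, Iterator


def merkle_root(records: list[bytes]) -> str:
    """Merkle root (hex) over a non-empty list of records."""
    if not records:
        raise ValueError("cannot attest an empty window")
    level = [hashlib.sha256(b"\x00" + r).digest() for r in records]
    while len(level) > 1:
        paired = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            paired.append(level[-1])  # unpaired node moves up unchanged
        level = paired
    return level[0].hex()


def window_roots(stream: Iterable[bytes], window_size: int) -> Iterator[dict]:
    """Yield one summary per window of `window_size` records (last window may be short)."""
    window: list[bytes] = []
    index = 0
    for record in stream:
        window.append(record)
        if len(window) == window_size:
            yield {"window": index, "count": len(window), "merkleRoot": merkle_root(window)}
            window, index = [], index + 1
    if window:
        yield {"window": index, "count": len(window), "merkleRoot": merkle_root(window)}
```

Each yielded summary could then become the subject of one signed attestation, which keeps signing cost proportional to the number of windows rather than the number of records.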
Use Case Guidance
| Use Case | Better Fit | Why |
|---|---|---|
| Scientific reproducibility | W3C PROV | Rich description of methodology and dependencies |
| Supply chain security | Makoto | Cryptographic verification, tamper detection |
| Regulatory compliance (GDPR) | Both | PROV for documentation, Makoto for audit proof |
| ML training data lineage | Makoto + PROV | Makoto for integrity, PROV for rich metadata |
| Real-time streaming | Makoto | Window-based attestation handles scale |
| Provenance visualization | W3C PROV | Semantic structure maps well to graphs |
| Zero-trust environments | Makoto | Consumers verify signatures instead of trusting intermediaries |
Using PROV and Makoto Together
These standards are complementary. A sophisticated data governance system might use both:
Hybrid Architecture Pattern
Makoto Layer (Verification)
- Cryptographic attestations at each pipeline stage
- Hash-chained lineage for tamper detection
- Security level enforcement (L1/L2/L3)
- Automated verification at data consumption
PROV Layer (Documentation)
- Rich semantic provenance metadata
- SPARQL-queryable provenance store
- Human-readable lineage documentation
- Integration with knowledge graphs
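The Makoto layer's security level enforcement at data consumption can be pictured as a simple gate. This is a hedged sketch: the level names L1 to L3 come from this document, but the `admit` function and the idea that a verified attestation yields a level string are assumptions for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Progressive security levels; a higher value means stronger guarantees."""
    L1 = 1
    L2 = 2
    L3 = 3


def admit(claimed: str, required: str) -> bool:
    """Return True when the attested level meets or exceeds the consumer's requirement."""
    return Level[claimed] >= Level[required]
```

A consumer would run this check only after the attestation's signature and digests have been verified, since an unverified level claim carries no weight.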
Integration Example
A Makoto attestation could embed PROV concepts or reference a PROV document:
{
  "_type": "https://in-toto.io/Statement/v1",
  "predicateType": "https://makoto.dev/transform/v1",
  "subject": [{
    "name": "dataset:customer_analysis_v3",
    "digest": {"sha256": "abc123..."}
  }],
  "predicate": {
    "transform": {
      "type": "aggregation",
      "name": "Customer Behavior Analysis"
    },
    // Reference to detailed PROV document
    "provenance": {
      "type": "prov:Bundle",
      "location": "https://prov.example.com/bundles/abc123",
      "digest": {"sha256": "def456..."}
    },
    "metadata": {
      // PROV-inspired fields
      "prov:wasAssociatedWith": "https://expanso.io/agents/etl-prod",
      "prov:generatedAtTime": "2025-12-20T10:15:00Z"
    }
  }
}
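Because the attestation records a digest for the referenced PROV document, a consumer can check that the bundle it retrieved is the one the signer pointed at. The sketch below shows that check under the field layout of the example above. The function name `prov_reference_matches` is hypothetical, and fetching the bundle from `location` is left to the caller.

```python
import hashlib


def prov_reference_matches(statement: dict, prov_bytes: bytes) -> bool:
    """Compare the retrieved PROV document's SHA-256 with the digest in the attestation."""
    expected = statement["predicate"]["provenance"]["digest"]["sha256"]
    actual = hashlib.sha256(prov_bytes).hexdigest()
    return actual == expected.lower()
```

With this binding in place, the rich PROV description inherits tamper evidence from the signed attestation even though the PROV document itself is unsigned.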
W3C PROV Family Overview
The PROV family includes 12 documents. The six most relevant to Makoto's concerns are:
| PROV Spec | Purpose | Makoto Relevance |
|---|---|---|
| PROV-DM | Core data model | Conceptual foundation for lineage graphs |
| PROV-O | OWL2 ontology | Could extend for semantic Makoto queries |
| PROV-N | Human-readable notation | Useful for documentation, not verification |
| PROV-CONSTRAINTS | Validity rules | Similar goal to Makoto schema validation |
| PROV-AQ | Access and query | Could inform attestation registry APIs |
| PROV-LINKS | Bundle connections | Similar to attestation references/chaining |
Summary Comparison
| Aspect | W3C PROV | Makoto |
|---|---|---|
| Primary purpose | Interoperable provenance representation | Verifiable data integrity |
| Data model | Entity-Activity-Agent (semantic) | Attestation statements (cryptographic) |
| Serialization | RDF, JSON, XML, PROV-N | JSON + DSSE signature |
| Security | None built-in | Digital signatures, security levels |
| Verification | Logical constraints (PROV-CONSTRAINTS) | Cryptographic signature + hash verification |
| Streaming support | Not designed for streaming | Window-based attestation |
| Trust model | Implicit (trust the source) | Explicit (verify signatures) |
| Standards body | W3C (Recommendation since 2013) | Building on in-toto (CNCF) |
Bottom Line
Use W3C PROV when you need rich semantic provenance documentation, interoperability with semantic web systems, or human-readable provenance descriptions. Use Makoto when you need cryptographically verifiable data integrity, tamper detection, or security level compliance. Use both when you need comprehensive data governance with both documentation and verification.