Level 1: Provenance Exists

This document specifies the formal requirements for achieving Makoto Level 1 (L1). L1 establishes the foundation for data provenance by requiring machine-readable documentation of data origin and processing. At this level, attestations may be self-reported—the goal is ensuring provenance information exists in a structured, parseable format.

"The data pipeline produces attestations that document data origin and processing."

Key Principle: L1 is intentionally achievable with minimal tooling. Any organization can reach L1 by generating structured JSON attestations—no cryptographic infrastructure required. This enables progressive adoption where teams start documenting provenance immediately.

Requirements Summary

| Category | Requirement | Description |
| --- | --- | --- |
| Producer | Attestation Generation | Generate machine-readable attestation for each data artifact |
| Producer | Origin Documentation | Document data source with sufficient detail for identification |
| Producer | Transform Documentation | Document processing steps applied to the data |
| Platform | Format Support | Support generation and storage of Makoto attestation format |
| Platform | Attestation Delivery | Deliver attestations alongside or linked to data artifacts |
| Verification | Schema Validation | Validate attestation conforms to Makoto schema |
| Verification | Completeness Check | Verify required fields are present and non-empty |

Producer Requirements

Data producers are entities that collect, generate, or transform data. At L1, producers must generate attestations documenting their data artifacts.

P1: Attestation Generation

Requirement: The producer MUST generate a machine-readable attestation for each data artifact that claims L1 compliance.

| Aspect | Specification |
| --- | --- |
| Format | JSON document conforming to in-toto Statement v1 |
| Predicate Type | One of: https://makoto.dev/origin/v1, https://makoto.dev/transform/v1, or https://makoto.dev/stream-window/v1 |
| Encoding | UTF-8 encoded JSON |
| Timing | Attestation SHOULD be generated at the time of data production or as close as practical |

P2: Origin Documentation

Requirement: For data collection/ingestion, the producer MUST document the data source using the origin/v1 predicate type.

| Field | Required | Description |
| --- | --- | --- |
| origin.source | REQUIRED | URI or identifier of the data source (API endpoint, database, file path, etc.) |
| origin.sourceType | RECOMMENDED | Type of source: api, database, file, stream, manual |
| origin.collectionTimestamp | REQUIRED | ISO 8601 timestamp when data was collected |
| origin.collectionMethod | RECOMMENDED | How data was obtained: pull, push, export, manual |
| collector.id | REQUIRED | Identifier for the collecting system or process |

Example Origin Attestation

{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "dataset:customer_orders_2025q1",
    "digest": {
      "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    }
  }],
  "predicateType": "https://makoto.dev/origin/v1",
  "predicate": {
    "origin": {
      "source": "https://erp.example.com/api/v2/orders",
      "sourceType": "api",
      "collectionTimestamp": "2025-01-15T08:00:00Z",
      "collectionMethod": "pull"
    },
    "collector": {
      "id": "data-platform/ingestion-service-v3"
    }
  }
}
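
A producer could generate a statement like the one above with a short script. The following is a minimal sketch, not a reference implementation: the function name `build_origin_attestation` and its parameters are hypothetical, and only the field names and URLs come from this specification.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_origin_attestation(name, data, source, collector_id,
                             source_type=None, collection_method=None):
    """Build an origin/v1 statement for in-memory data (hypothetical helper)."""
    origin = {
        "source": source,
        # ISO 8601 UTC timestamp taken when the attestation is generated.
        "collectionTimestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # RECOMMENDED fields are included only when the caller supplies them.
    if source_type is not None:
        origin["sourceType"] = source_type
    if collection_method is not None:
        origin["collectionMethod"] = collection_method
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{
            "name": name,
            "digest": {"sha256": hashlib.sha256(data).hexdigest()},
        }],
        "predicateType": "https://makoto.dev/origin/v1",
        "predicate": {"origin": origin, "collector": {"id": collector_id}},
    }


statement = build_origin_attestation(
    name="dataset:customer_orders_2025q1",
    data=b"",  # placeholder payload; a real pipeline would pass the artifact bytes
    source="https://erp.example.com/api/v2/orders",
    collector_id="data-platform/ingestion-service-v3",
    source_type="api",
    collection_method="pull",
)
# Serialize as UTF-8 encoded JSON, as P1 requires.
encoded = json.dumps(statement, indent=2).encode("utf-8")
print(encoded.decode("utf-8"))
```

With the empty placeholder payload, the computed digest equals the value in the example above, which is the SHA-256 of empty input. A real artifact would produce a different digest.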

P3: Transform Documentation

Requirement: For data transformations (ETL, filtering, aggregation, etc.), the producer MUST document each processing step using the transform/v1 predicate type.

| Field | Required | Description |
| --- | --- | --- |
| inputs | REQUIRED | Array of input data artifacts with names and digests |
| transform.type | REQUIRED | URI identifying the type of transformation |
| transform.name | RECOMMENDED | Human-readable name for the transformation |
| transform.parameters | RECOMMENDED | Configuration parameters used (sanitized of secrets) |
| executor.id | REQUIRED | Identifier for the system executing the transform |
| metadata.startedOn | RECOMMENDED | ISO 8601 timestamp when processing started |
| metadata.finishedOn | RECOMMENDED | ISO 8601 timestamp when processing completed |

Example Transform Attestation

{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "dataset:customer_orders_anonymized",
    "digest": {
      "sha256": "a1b2c3d4e5f6..."
    }
  }],
  "predicateType": "https://makoto.dev/transform/v1",
  "predicate": {
    "inputs": [{
      "name": "dataset:customer_orders_2025q1",
      "digest": { "sha256": "e3b0c44298fc1c149..." }
    }],
    "transform": {
      "type": "https://makoto.dev/transforms/anonymization",
      "name": "PII Removal",
      "parameters": {
        "fields_removed": ["email", "phone"],
        "fields_hashed": ["customer_id"]
      }
    },
    "executor": {
      "id": "data-platform/etl-pipeline-v2"
    },
    "metadata": {
      "startedOn": "2025-01-15T09:00:00Z",
      "finishedOn": "2025-01-15T09:15:00Z"
    }
  }
}
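
To show how a transform step might emit its own attestation, here is a minimal sketch that drops fields from a list of records and links the input and output digests. The helper names (`canonical_bytes`, `utc_now`, `remove_fields`) are hypothetical, and hashing a canonical JSON serialization is an assumption of this sketch; the specification does not prescribe how records are serialized before hashing.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_bytes(records):
    """Serialize records deterministically so digests are reproducible (assumption)."""
    return json.dumps(records, sort_keys=True, separators=(",", ":")).encode("utf-8")


def utc_now():
    """Return the current time as an ISO 8601 UTC timestamp."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def remove_fields(records, fields_removed, input_name, output_name, executor_id):
    """Drop fields from every record and return (output, transform/v1 statement)."""
    started_on = utc_now()
    output = [
        {key: value for key, value in record.items() if key not in fields_removed}
        for record in records
    ]
    finished_on = utc_now()
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{
            "name": output_name,
            "digest": {"sha256": hashlib.sha256(canonical_bytes(output)).hexdigest()},
        }],
        "predicateType": "https://makoto.dev/transform/v1",
        "predicate": {
            "inputs": [{
                "name": input_name,
                "digest": {"sha256": hashlib.sha256(canonical_bytes(records)).hexdigest()},
            }],
            "transform": {
                "type": "https://makoto.dev/transforms/anonymization",
                "name": "PII Removal",
                "parameters": {"fields_removed": sorted(fields_removed)},
            },
            "executor": {"id": executor_id},
            "metadata": {"startedOn": started_on, "finishedOn": finished_on},
        },
    }
    return output, statement


orders = [{"order_id": 1, "email": "a@example.com", "phone": "555-0100", "total": 42.5}]
anonymized, statement = remove_fields(
    orders,
    fields_removed={"email", "phone"},
    input_name="dataset:customer_orders_2025q1",
    output_name="dataset:customer_orders_anonymized",
    executor_id="data-platform/etl-pipeline-v2",
)
print(json.dumps(statement, indent=2))
```

Recording the input digest in `inputs` is what lets a consumer chain this statement back to the origin attestation for the same artifact.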

P4: Subject Binding

Requirement: Each attestation MUST identify the data artifact it describes via the subject field.

| Field | Required | Description |
| --- | --- | --- |
| subject[].name | REQUIRED | Identifier for the data artifact (dataset name, file path, stream ID) |
| subject[].digest.sha256 | REQUIRED | SHA-256 hash of the data artifact content |
| subject[].digest.recordCount | RECOMMENDED | Number of records in the dataset (for tabular data) |

Note: The digest provides content binding—if data changes, the digest changes. This allows verification that attestations match their referenced data.
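
The content binding described in the note can be illustrated with a small sketch that builds a subject entry by hashing a file in chunks, then shows that appending a record changes the digest. The function name `subject_for_file` and the file contents are hypothetical.

```python
import hashlib
import os
import tempfile


def subject_for_file(path, name, record_count=None, chunk_size=1024 * 1024):
    """Build one subject entry by hashing the file in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    entry = {"name": name, "digest": {"sha256": digest.hexdigest()}}
    if record_count is not None:
        # Placement under "digest" follows the field table above; the value
        # type is not stated there, so this sketch stores it as given.
        entry["digest"]["recordCount"] = record_count
    return entry


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "orders.csv")
    with open(path, "wb") as handle:
        handle.write(b"order_id,total\n1,42.50\n2,17.00\n")
    subject = subject_for_file(path, name="dataset:orders", record_count=2)

    with open(path, "ab") as handle:
        handle.write(b"3,99.00\n")  # any change to the content...
    changed = subject_for_file(path, name="dataset:orders", record_count=3)

# ...yields a different digest, so the earlier attestation no longer matches.
print(subject["digest"]["sha256"] != changed["digest"]["sha256"])
```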

Platform Requirements

Platforms are systems that host or execute data processing (ETL tools, data warehouses, streaming platforms, etc.). At L1, platforms must support attestation generation and delivery.

PL1: Attestation Format Support

Requirement: The platform MUST support generation of attestations in the Makoto format.

  • Generate valid JSON conforming to in-toto Statement v1 schema
  • Support all three Makoto predicate types (origin, transform, stream-window)
  • Compute SHA-256 digests of data artifacts
  • Generate ISO 8601 timestamps

Implementation Note: Platforms may implement attestation generation natively, via plugins, or through external tooling. The mechanism is not specified—only the output format.

PL2: Attestation Storage

Requirement: The platform MUST provide a mechanism to store attestations persistently.

| Option | Description | Example |
| --- | --- | --- |
| Sidecar File | Store attestation alongside data with predictable naming | data.json → data.json.attestation |
| Attestation Registry | Centralized service for attestation storage and retrieval | REST API at /attestations/{digest} |
| Metadata Store | Store attestations in data catalog or metadata system | DataHub, Amundsen, or similar |
| Embedded | Include attestation within data package | ZIP archive with ATTESTATION.json |
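
As an illustration of the sidecar option, the sketch below writes an attestation next to its data file using the `.attestation` suffix from the table. The helper names `write_sidecar` and `read_sidecar` are hypothetical, and writing through a temporary file is a design choice of this sketch rather than a Makoto requirement.

```python
import json
import os
import tempfile


def write_sidecar(data_path, statement):
    """Store the attestation next to the data file as <data_path>.attestation."""
    sidecar_path = data_path + ".attestation"
    temp_path = sidecar_path + ".tmp"
    with open(temp_path, "w", encoding="utf-8") as handle:
        json.dump(statement, handle, indent=2)
    # Rename into place so readers never observe a half-written attestation.
    os.replace(temp_path, sidecar_path)
    return sidecar_path


def read_sidecar(data_path):
    """Load the attestation stored next to the given data file."""
    with open(data_path + ".attestation", "r", encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "data.json")
    with open(data_path, "w", encoding="utf-8") as handle:
        handle.write("[]")
    # Truncated placeholder; a real statement carries subject and predicate too.
    statement = {"_type": "https://in-toto.io/Statement/v1"}
    sidecar_name = os.path.basename(write_sidecar(data_path, statement))
    round_trip = read_sidecar(data_path)

print(sidecar_name)
```

A predictable suffix means consumers can derive the attestation path from the data path alone, which also serves the discovery goal in PL3.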

PL3: Attestation Delivery

Requirement: The platform MUST provide a mechanism for consumers to retrieve attestations for data they receive.

  • Attestations SHOULD be discoverable from the data artifact (via naming convention, metadata, or registry lookup)
  • Attestation retrieval SHOULD NOT require special permissions beyond data access
  • The platform SHOULD document the attestation discovery mechanism

Verification Requirements

Verification describes what consumers can check to confirm L1 compliance. At L1, verification focuses on attestation presence and format—not cryptographic authenticity.

V1: Attestation Presence

Check: Verify that a machine-readable attestation exists for the data artifact.

Locate Attestation → Parse JSON → Validate Schema
  • PASS: Attestation file exists and is valid JSON
  • FAIL: No attestation found, or attestation is not valid JSON
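
The presence check can be sketched as follows, assuming the sidecar naming convention (`<data file>.attestation`) from PL2; other storage options would need a different lookup. The function name `check_presence` and its return shape are hypothetical.

```python
import json
import os
import tempfile


def check_presence(data_path):
    """Return ("PASS", statement) or ("FAIL", reason) for the V1 check."""
    sidecar_path = data_path + ".attestation"
    if not os.path.isfile(sidecar_path):
        return "FAIL", "no attestation found"
    try:
        with open(sidecar_path, "r", encoding="utf-8") as handle:
            statement = json.load(handle)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        return "FAIL", f"attestation is not valid JSON: {exc}"
    return "PASS", statement


with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "data.json")

    # Case 1: no sidecar file exists yet.
    missing = check_presence(data_path)

    # Case 2: the sidecar exists but does not parse as JSON.
    with open(data_path + ".attestation", "w", encoding="utf-8") as handle:
        handle.write("{not json")
    malformed = check_presence(data_path)

    # Case 3: the sidecar exists and parses.
    with open(data_path + ".attestation", "w", encoding="utf-8") as handle:
        handle.write('{"_type": "https://in-toto.io/Statement/v1"}')
    present = check_presence(data_path)

print(missing[0], malformed[0], present[0])
```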

V2: Schema Compliance

Check: Verify that the attestation conforms to the Makoto schema.

| Field | Validation |
| --- | --- |
| _type | MUST equal "https://in-toto.io/Statement/v1" |
| subject | MUST be non-empty array with at least one entry |
| subject[].name | MUST be non-empty string |
| subject[].digest.sha256 | MUST be valid SHA-256 hex string (64 characters) |
| predicateType | MUST be valid Makoto predicate type URL |
| predicate | MUST be object conforming to predicate type schema |

JSON Schema files for validation are available in the schemas directory.
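
Where the schema files are not at hand, the envelope rules in the table can be approximated by hand. The sketch below uses only the Python standard library and is not a substitute for validating against the published schemas: it checks the outer shape only, not the per-type predicate schema. The function name `envelope_errors` is hypothetical, and accepting both upper- and lowercase hex is an assumption.

```python
import re

MAKOTO_PREDICATE_TYPES = {
    "https://makoto.dev/origin/v1",
    "https://makoto.dev/transform/v1",
    "https://makoto.dev/stream-window/v1",
}
# Assumption: either hex case is accepted; the table only says "64 characters".
SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def envelope_errors(statement):
    """Return a list of V2 violations; an empty list means the envelope passes."""
    if not isinstance(statement, dict):
        return ["statement MUST be a JSON object"]
    errors = []
    if statement.get("_type") != "https://in-toto.io/Statement/v1":
        errors.append('_type MUST equal "https://in-toto.io/Statement/v1"')
    subjects = statement.get("subject")
    if not isinstance(subjects, list) or not subjects:
        errors.append("subject MUST be a non-empty array")
    else:
        for index, subject in enumerate(subjects):
            if not isinstance(subject, dict):
                errors.append(f"subject[{index}] MUST be an object")
                continue
            name = subject.get("name")
            if not isinstance(name, str) or not name:
                errors.append(f"subject[{index}].name MUST be a non-empty string")
            digest = subject.get("digest")
            sha256 = digest.get("sha256") if isinstance(digest, dict) else None
            if not isinstance(sha256, str) or not SHA256_HEX.fullmatch(sha256):
                errors.append(f"subject[{index}].digest.sha256 MUST be 64 hex characters")
    predicate_type = statement.get("predicateType")
    if not isinstance(predicate_type, str) or predicate_type not in MAKOTO_PREDICATE_TYPES:
        errors.append("predicateType MUST be a Makoto predicate type URL")
    # Only the outer shape is checked; per-type predicate rules need the schemas.
    if not isinstance(statement.get("predicate"), dict):
        errors.append("predicate MUST be an object")
    return errors


valid = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "dataset:orders", "digest": {"sha256": "ab" * 32}}],
    "predicateType": "https://makoto.dev/origin/v1",
    "predicate": {},
}
# A shortened digest fails the 64-character rule.
truncated = dict(valid, subject=[{"name": "dataset:orders", "digest": {"sha256": "a1b2c3"}}])

print(envelope_errors(valid))
print(envelope_errors(truncated))
```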

V3: Subject Binding Verification

Check: Verify that the attestation subject matches the actual data.

  1. Compute SHA-256 hash of the data artifact
  2. Compare computed hash to subject[].digest.sha256 in attestation
  3. PASS: Hashes match
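  4. FAIL: Hashes do not match (data may have been modified)

# Example verification (bash)
# Expected digest: read from the first subject entry of the attestation.
EXPECTED=$(jq -r '.subject[0].digest.sha256' attestation.json)
# Actual digest: computed from the data artifact itself.
ACTUAL=$(sha256sum data.json | cut -d' ' -f1)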

if [ "$EXPECTED" = "$ACTUAL" ]; then
  echo "PASS: Data matches attestation"
else
  echo "FAIL: Data hash mismatch"
fi

V4: Required Field Completeness

Check: Verify that all required fields for the predicate type are present and non-empty.

For origin/v1:

  • predicate.origin.source — non-empty string
  • predicate.origin.collectionTimestamp — valid ISO 8601 timestamp
  • predicate.collector.id — non-empty string

For transform/v1:

  • predicate.inputs — non-empty array
  • predicate.transform.type — non-empty string (URI)
  • predicate.executor.id — non-empty string
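
The two field lists above can be turned into a table-driven check. This is a minimal sketch with hypothetical helper names; it covers only origin/v1 and transform/v1 because required fields for stream-window/v1 are not listed in this section. The timestamp test relies on `datetime.fromisoformat`, which accepts common ISO 8601 forms rather than the full standard.

```python
from datetime import datetime


def lookup(obj, dotted_path):
    """Follow a dotted path through nested dicts; return None if a step is missing."""
    for key in dotted_path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def is_non_empty_string(value):
    return isinstance(value, str) and value.strip() != ""


def is_non_empty_array(value):
    return isinstance(value, list) and len(value) > 0


def is_iso8601(value):
    if not isinstance(value, str):
        return False
    try:
        # Older Python versions reject the "Z" suffix, so rewrite it as an offset.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True


REQUIRED_FIELDS = {
    "https://makoto.dev/origin/v1": [
        ("predicate.origin.source", is_non_empty_string),
        ("predicate.origin.collectionTimestamp", is_iso8601),
        ("predicate.collector.id", is_non_empty_string),
    ],
    "https://makoto.dev/transform/v1": [
        ("predicate.inputs", is_non_empty_array),
        ("predicate.transform.type", is_non_empty_string),
        ("predicate.executor.id", is_non_empty_string),
    ],
}


def incomplete_fields(statement):
    """Return the required field paths that are missing, empty, or malformed."""
    predicate_type = statement.get("predicateType")
    if predicate_type not in REQUIRED_FIELDS:
        raise ValueError(f"no required-field list for {predicate_type!r}")
    return [path for path, is_valid in REQUIRED_FIELDS[predicate_type]
            if not is_valid(lookup(statement, path))]


origin_ok = {
    "predicateType": "https://makoto.dev/origin/v1",
    "predicate": {
        "origin": {"source": "https://erp.example.com/api/v2/orders",
                   "collectionTimestamp": "2025-01-15T08:00:00Z"},
        "collector": {"id": "data-platform/ingestion-service-v3"},
    },
}
transform_bad = {
    "predicateType": "https://makoto.dev/transform/v1",
    "predicate": {"inputs": [],
                  "transform": {"type": "https://makoto.dev/transforms/anonymization"},
                  "executor": {"id": ""}},
}
print(incomplete_fields(origin_ok))
print(incomplete_fields(transform_bad))
```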

Threat Mitigations

L1 provides partial mitigation against the following data supply chain threats. For full mitigation, see L2 and L3 requirements.

| Threat | Description | L1 Mitigation |
| --- | --- | --- |
| D1 | Source Falsification: claiming data came from an authoritative source when it didn't | Partial. Origin is documented, but claims are self-attested (not cryptographically verified) |
| D3 | Transform Opacity: hiding what transformations were applied to data | Partial. Transforms are documented, enabling audit trails and lineage visibility |

Important: L1 does NOT mitigate threats requiring cryptographic verification: D2 (Collection Tampering), D4 (Lineage Forgery), D5 (Stream Injection), D6 (Aggregation Manipulation), D8 (Time Manipulation). These require L2 or higher.

Conformance Checklist

Use this checklist to verify L1 compliance. All items must pass for a data artifact to claim Makoto L1.

Producer Checklist

  • ☐ Attestation is generated for each data artifact
  • ☐ Attestation uses in-toto Statement v1 format
  • ☐ Predicate type is a valid Makoto type URL
  • ☐ Subject includes artifact name and SHA-256 digest
  • ☐ Origin attestations include source and collection timestamp
  • ☐ Transform attestations include inputs and transform type
  • ☐ All required fields are present and non-empty

Platform Checklist

  • ☐ Platform can generate Makoto-compliant attestations
  • ☐ Platform stores attestations persistently
  • ☐ Consumers can discover and retrieve attestations
  • ☐ Attestation discovery mechanism is documented

Verification Checklist

  • ☐ Attestation exists and parses as valid JSON
  • ☐ Attestation validates against Makoto JSON Schema
  • ☐ Data artifact hash matches attestation subject digest
  • ☐ All required fields for predicate type are present

Next Steps

View Examples

See complete L1 attestation examples for origin, transform, and streaming.

Upgrade to L2

Add cryptographic signatures for tamper-evident provenance.

Get Started

Implement L1 in your pipeline with Expanso or other tools.