Attestation Examples

Complete examples of Makoto Levels (Makoto) attestations with field-by-field explanations.

Origin Attestation

makoto.dev/origin/v1

Documents where data came from, how it was collected, consent status, and geographic origin. This is the starting point for any data lineage chain.

origin-attestation.json
Makoto L2
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "dataset:customer_transactions_2025q4",
    "digest": {
      "sha256": "a1b2c3d4e5f6789012345678901234567890abcdef...",
      "recordCount": "1847293",
      "merkleRoot": "def456789012345678901234567890abcdef..."
    }
  }],
  "predicateType": "https://makoto.dev/origin/v1",
  "predicate": {
    "origin": {
      "source": "https://api.partner-bank.com/v2/transactions",
      "sourceType": "api",
      "collectionMethod": "scheduled-pull",
      "collectionTimestamp": "2025-12-20T08:00:00Z",
      "geography": "US-WEST-2",
      "consent": {
        "type": "contractual",
        "reference": "https://legal.example.com/dpa/2025"
      }
    },
    "collector": {
      "id": "https://expanso.io/collectors/prod-west-01",
      "version": {
        "expanso-cli": "1.4.2"
      }
    },
    "schema": {
      "format": "json-lines",
      "schemaRef": "https://schemas.example.com/transactions/v2"
    }
  }
}

Key Fields Explained

origin.source: URI identifying the data source (API endpoint, database, etc.)
origin.geography: Geographic region where data was collected (for compliance)
origin.consent: Consent or legal basis for data collection
collector.id: Identifier of the system that collected the data
digest.merkleRoot: Merkle tree root for efficient partial verification
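
As an illustration of how a consumer might use the subject digest, the sketch below recomputes a SHA-256 over a dataset's bytes and compares it with the value recorded in the statement. The helper names (`sha256_hex`, `subject_matches`) and the two-record dataset are hypothetical, and hashing the raw export bytes is an assumption; the source does not say how the dataset is serialized before hashing.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def subject_matches(statement: dict, name: str, data: bytes) -> bool:
    """True if `data` hashes to the sha256 recorded for subject `name`."""
    for subject in statement.get("subject", []):
        if subject.get("name") == name:
            expected = subject.get("digest", {}).get("sha256")
            return expected == sha256_hex(data)
    return False  # no subject with that name


# Hypothetical two-record dataset standing in for the real export.
dataset = b'{"txn_id": 1}\n{"txn_id": 2}\n'

statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{
        "name": "dataset:customer_transactions_2025q4",
        "digest": {"sha256": sha256_hex(dataset), "recordCount": "2"},
    }],
    "predicateType": "https://makoto.dev/origin/v1",
    "predicate": {},  # origin, collector and schema omitted for brevity
}

print(subject_matches(statement, "dataset:customer_transactions_2025q4", dataset))
```

A single changed byte in the dataset makes the comparison fail, which is why the digest is the anchor for everything else in the attestation.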

Transform Attestation

makoto.dev/transform/v1

Documents what processing was applied to data, linking to input attestations to form a complete lineage chain.

transform-attestation.json
Makoto L2
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "dataset:customer_transactions_anonymized",
    "digest": {
      "sha256": "xyz789012345678901234567890abcdef...",
      "recordCount": "1847293"
    }
  }],
  "predicateType": "https://makoto.dev/transform/v1",
  "predicate": {
    "inputs": [{
      "name": "dataset:customer_transactions_2025q4",
      "digest": { "sha256": "a1b2c3d4..." },
      "attestationRef": "https://attestations.example.com/origin/a1b2c3d4"
    }],
    "transform": {
      "type": "https://makoto.dev/transforms/anonymization",
      "name": "PII Anonymization Pipeline",
      "version": "1.2.0",
      "parameters": {
        "fieldsRemoved": ["email", "phone", "address"],
        "fieldsHashed": ["customer_id"],
        "kAnonymity": 5
      },
      "codeRef": {
        "uri": "git+https://github.com/example/[email protected]",
        "digest": { "sha256": "pipeline_hash..." }
      }
    },
    "executor": {
      "id": "https://expanso.io/pipelines/prod-cluster-01",
      "platform": "expanso"
    },
    "metadata": {
      "startedOn": "2025-12-20T08:00:45Z",
      "finishedOn": "2025-12-20T08:15:32Z",
      "recordsInput": 1847293,
      "recordsOutput": 1847293
    }
  }
}

Key Fields Explained

inputs[].attestationRef: Reference to input data's attestation (creates lineage chain)
transform.parameters: Configuration used for this transformation
transform.codeRef: Git reference to the transformation code for reproducibility
executor.platform: Platform that executed the transformation
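
The lineage link can be checked mechanically: each input of a transform attestation should appear, with the same digest, as a subject of the attestation named by its attestationRef. The sketch below is one way to express that check. The function `lineage_errors`, the in-memory `store` lookup, and the full-length digest values are hypothetical; a real verifier would fetch the referenced attestation and verify its signature first.

```python
def lineage_errors(transform_stmt: dict, attestations_by_ref: dict) -> list:
    """Return human-readable problems found while linking inputs upstream."""
    errors = []
    for inp in transform_stmt["predicate"]["inputs"]:
        ref = inp.get("attestationRef")
        upstream = attestations_by_ref.get(ref)
        if upstream is None:
            errors.append(f"{inp['name']}: no attestation found for {ref}")
            continue
        # The input must be a subject of the upstream attestation
        # and carry the same sha256 digest.
        digests = {s["name"]: s["digest"].get("sha256") for s in upstream["subject"]}
        if digests.get(inp["name"]) != inp["digest"]["sha256"]:
            errors.append(f"{inp['name']}: digest does not match upstream subject")
    return errors


# Hypothetical digests; the examples in this page truncate theirs.
origin = {
    "subject": [{
        "name": "dataset:customer_transactions_2025q4",
        "digest": {"sha256": "aa" * 32},
    }],
}
transform = {
    "predicate": {"inputs": [{
        "name": "dataset:customer_transactions_2025q4",
        "digest": {"sha256": "aa" * 32},
        "attestationRef": "https://attestations.example.com/origin/a1b2c3d4",
    }]},
}
store = {"https://attestations.example.com/origin/a1b2c3d4": origin}

print(lineage_errors(transform, store))  # prints []
```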

Stream Window Attestation

makoto.dev/stream-window/v1

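For high-throughput streams, this attestation covers a time-bounded window of records rather than each record individually. A Merkle tree commits to every record in the window through a single root hash, which keeps attestation practical at millions of events per second.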

stream-window-attestation.json
Makoto L2
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "stream:iot_sensors:window_20251220_100000",
    "digest": {
      "merkleRoot": "abc123def456...",
      "windowStart": "2025-12-20T10:00:00Z",
      "windowEnd": "2025-12-20T10:01:00Z",
      "recordCount": "847293"
    }
  }],
  "predicateType": "https://makoto.dev/stream-window/v1",
  "predicate": {
    "stream": {
      "id": "iot_sensors",
      "source": "mqtt://sensors.factory.example.com:1883",
      "partitions": ["temp", "pressure", "vibration"]
    },
    "window": {
      "type": "tumbling",
      "duration": "PT1M",
      "watermark": "2025-12-20T09:59:55Z"
    },
    "integrity": {
      "merkleTree": {
        "algorithm": "sha256",
        "leafCount": 847293,
        "treeHeight": 20
      },
      "chain": {
        "previousWindowId": "window_20251220_095900",
        "previousMerkleRoot": "xyz789..."
      }
    }
  }
}
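
The integrity.chain block ties each window to the one before it, so a missing or altered window shows up as a broken link. The sketch below walks a time-ordered list of window statements and reports where the link fails. The `window` builder, the `chain_breaks` function, and the root values are hypothetical, and it assumes the window id is the last colon-separated segment of the subject name, as in the example above.

```python
def chain_breaks(windows: list) -> list:
    """Names of windows whose 'previous' pointers don't match the prior window."""
    breaks = []
    for prev, cur in zip(windows, windows[1:]):
        chain = cur["predicate"]["integrity"]["chain"]
        prev_subject = prev["subject"][0]
        prev_id = prev_subject["name"].rsplit(":", 1)[-1]
        prev_root = prev_subject["digest"]["merkleRoot"]
        if (chain["previousWindowId"] != prev_id
                or chain["previousMerkleRoot"] != prev_root):
            breaks.append(cur["subject"][0]["name"])
    return breaks


def window(window_id, root, prev_id=None, prev_root=None):
    """Build a minimal stream-window statement (only the fields used here)."""
    return {
        "subject": [{
            "name": f"stream:iot_sensors:{window_id}",
            "digest": {"merkleRoot": root},
        }],
        "predicate": {"integrity": {"chain": {
            "previousWindowId": prev_id,
            "previousMerkleRoot": prev_root,
        }}},
    }


w1 = window("window_20251220_095900", "root-0959")
w2 = window("window_20251220_100000", "root-1000",
            "window_20251220_095900", "root-0959")
# Third window points at the right id but the wrong root.
w3 = window("window_20251220_100100", "root-1001",
            "window_20251220_100000", "tampered")

print(chain_breaks([w1, w2, w3]))
# prints ['stream:iot_sensors:window_20251220_100100']
```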

Merkle Tree Structure

Root Hash  ← Signed
├─ Hash(L+R)
│  ├─ Record 1-N
│  └─ Record N+1...
└─ Hash(L+R)
   ├─ Record ...
   └─ Record M

Only the root hash is signed. Any individual record can be verified against that root with an inclusion proof of O(log n) sibling hashes, without access to the rest of the window.
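
To make the O(log n) claim concrete, here is a minimal sketch of building a Merkle tree over a window's records, producing an inclusion proof for one record, and verifying it against the root. Several details are assumptions, since the source does not specify them: the 0x00/0x01 prefixes that separate leaf and interior hashes, promoting an unpaired node unchanged to the next level, and all function names. It also assumes at least one record.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(record: bytes) -> bytes:
    return _h(b"\x00" + record)  # assumed leaf prefix


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)  # assumed interior-node prefix


def build_levels(records):
    """Return all tree levels, leaves first, root level last."""
    level = [leaf_hash(r) for r in records]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # unpaired node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Sibling hashes from leaf to root as (sibling, sibling_is_left) pairs."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(record, proof, root):
    """Recompute the root from one record and its proof."""
    acc = leaf_hash(record)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


records = [f"reading-{i}".encode() for i in range(5)]  # hypothetical records
levels = build_levels(records)
root = levels[-1][0]
proof = inclusion_proof(levels, 2)
print(verify(records[2], proof, root))       # True
print(verify(b"forged-reading", proof, root))  # False
```

With this construction a proof holds at most ceil(log2(n)) hashes. For the 847293 leaves in the example window that bound is 20, which matches the treeHeight field.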

Data Bill of Materials

Complete lineage document

A DBOM aggregates all attestations for a dataset into a single document, similar to an SBOM for software.

dbom.json
{
  "dbomVersion": "1.0.0",
  "dataset": {
    "name": "ml_training_dataset_v3",
    "version": "3.0.0",
    "created": "2025-12-20T12:00:00Z",
    "makotoLevel": "L2"
  },
  "sources": [
    {
      "name": "customer_transactions",
      "attestationRef": "https://attestations.example.com/origin/abc",
      "makotoLevel": "L2",
      "geography": "US"
    },
    {
      "name": "public_weather_data",
      "attestationRef": "https://attestations.example.com/origin/def",
      "makotoLevel": "L1",
      "license": "CC-BY-4.0"
    }
  ],
  "transformations": [
    { "order": 1, "name": "Join datasets", "attestationRef": "..." },
    { "order": 2, "name": "Anonymize PII", "attestationRef": "..." },
    { "order": 3, "name": "Feature engineering", "attestationRef": "..." }
  ]
}
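
Because a DBOM gathers every attestation reference in one place, it is easy to scan for the ones that are absent. The sketch below lists sources and transformation steps whose attestationRef is missing or still a "..." placeholder. The function name `provenance_gaps` and the rule that "..." counts as a gap are assumptions for illustration, not part of the DBOM format.

```python
def provenance_gaps(dbom: dict) -> list:
    """List entries whose attestationRef is missing or a '...' placeholder."""
    def is_gap(entry):
        ref = entry.get("attestationRef", "")
        return not ref or ref == "..."

    gaps = []
    for source in dbom.get("sources", []):
        if is_gap(source):
            gaps.append(f"source {source['name']}: no attestationRef")
    # Report transformation steps in pipeline order.
    for step in sorted(dbom.get("transformations", []), key=lambda s: s["order"]):
        if is_gap(step):
            gaps.append(f"step {step['order']} ({step['name']}): no attestationRef")
    return gaps


# Trimmed copy of the dbom.json example above.
dbom = {
    "dbomVersion": "1.0.0",
    "sources": [
        {"name": "customer_transactions",
         "attestationRef": "https://attestations.example.com/origin/abc"},
        {"name": "public_weather_data",
         "attestationRef": "https://attestations.example.com/origin/def"},
    ],
    "transformations": [
        {"order": 1, "name": "Join datasets", "attestationRef": "..."},
        {"order": 2, "name": "Anonymize PII", "attestationRef": "..."},
        {"order": 3, "name": "Feature engineering", "attestationRef": "..."},
    ],
}

for gap in provenance_gaps(dbom):
    print(gap)
```

Run against the example, both sources pass and the three transformation steps are reported, since their references are elided.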

Signed Envelope (L2+)

DSSE format

At L2 and above, attestations are wrapped in a DSSE (Dead Simple Signing Envelope) with cryptographic signatures.

signed-attestation.json
Makoto L2
{
  "payloadType": "application/vnd.in-toto+json",
  "payload": "eyJfdHlwZSI6Imh0dHBzOi8vaW4tdG90by5pby9TdGF0ZW1lbnQvdjEi...",
  "signatures": [{
    "keyid": "https://expanso.io/keys/prod-signer-01",
    "sig": "MEUCIQD2qN3..."
  }]
}

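Note: The payload is the base64-encoded attestation JSON. Signers do not sign the parsed JSON; they sign DSSE's pre-authentication encoding (PAE) of the payload type and the raw payload bytes, so a verifier can check the signature before parsing the payload and without canonicalizing it.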
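
In DSSE, the bytes that are actually signed are the pre-authentication encoding (PAE) of the payload type and the decoded payload. The sketch below computes those bytes for an envelope using only the standard library. Signature verification itself is left out because it depends on the key type and crypto library in use. The helper names and the placeholder signature are hypothetical, and standard (not URL-safe) base64 is assumed.

```python
import base64
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding:
    'DSSEv1' SP len(type) SP type SP len(payload) SP payload
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def signed_bytes(envelope: dict) -> bytes:
    """Bytes a verifier checks each signature against."""
    payload = base64.b64decode(envelope["payload"])
    return pae(envelope["payloadType"], payload)


# Minimal statement standing in for a full attestation.
statement = {"_type": "https://in-toto.io/Statement/v1"}
payload = json.dumps(statement).encode("utf-8")

envelope = {
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(payload).decode("ascii"),
    "signatures": [{
        "keyid": "https://expanso.io/keys/prod-signer-01",
        "sig": "placeholder-not-a-real-signature",
    }],
}

to_verify = signed_bytes(envelope)
print(to_verify[:40])

# Only after the signature checks out should the payload be parsed.
decoded = json.loads(base64.b64decode(envelope["payload"]))
print(decoded["_type"])
```

Binding the payload type into the signed bytes means a signature over one kind of document cannot be replayed as a signature over another.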

A Real DBOM

rajpurkar/squad · Hugging Face

The fastest way to understand what a DBOM looks like is to see one applied to a dataset you probably already know.

We generated a DBOM for rajpurkar/squad, one of the most widely used NLP benchmarks. SQuAD is a useful example because its provenance is publicly documented: the source corpus is English Wikipedia, annotation was collected via Mechanical Turk, and the processing scripts are on GitHub. That gives three distinct transformations, each with inputs and outputs that can be hashed.

View squad-dbom.json →

The intermediate step digests are placeholders — Rajpurkar never published hashes for the intermediate artifacts, only the final dataset. That gap is visible in the DBOM. A well-formed DBOM doesn't just document what exists. It surfaces what's missing.

The attestations array is empty. No third party has certified this dataset for bias, quality, or regulatory compliance. In a world where that field is populated — where a bias audit firm signs a claim against a specific digest, or a regulator attests to CRA or EU AI Act compliance — that signature travels with the data permanently.

Want to generate a DBOM for your own dataset? Use this prompt with any LLM. Point it at a Hugging Face dataset card and a paper. It will produce a draft DBOM and a list of the provenance gaps it couldn't fill. Those gaps are signal.