Attestation Examples
Complete examples of Makoto attestations, with field-by-field explanations.
Origin Attestation
Documents where data came from, how it was collected, consent status, and geographic origin. This is the starting point for any data lineage chain.
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "dataset:customer_transactions_2025q4",
    "digest": {
      "sha256": "a1b2c3d4e5f6789012345678901234567890abcdef...",
      "recordCount": "1847293",
      "merkleRoot": "def456789012345678901234567890abcdef..."
    }
  }],
  "predicateType": "https://makoto.dev/origin/v1",
  "predicate": {
    "origin": {
      "source": "https://api.partner-bank.com/v2/transactions",
      "sourceType": "api",
      "collectionMethod": "scheduled-pull",
      "collectionTimestamp": "2025-12-20T08:00:00Z",
      "geography": "US-WEST-2",
      "consent": {
        "type": "contractual",
        "reference": "https://legal.example.com/dpa/2025"
      }
    },
    "collector": {
      "id": "https://expanso.io/collectors/prod-west-01",
      "version": {
        "expanso-cli": "1.4.2"
      }
    },
    "schema": {
      "format": "json-lines",
      "schemaRef": "https://schemas.example.com/transactions/v2"
    }
  }
}
Key Fields Explained
origin.source
URI identifying the data source (API endpoint, database, etc.)
origin.geography
Geographic region where data was collected (for compliance)
origin.consent
Consent or legal basis for data collection
collector.id
Identifier of the system that collected the data
digest.merkleRoot
Merkle tree root for efficient partial verification
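The subject digest fields above (sha256, recordCount, merkleRoot) can be produced with standard hashing. A minimal Python sketch, assuming a simple canonicalization (sorted keys, one compact JSON object per line) and plain sha256 pairing for the Merkle tree; the Makoto spec's actual encoding rules may differ:

```python
import hashlib
import json

def dataset_digest(records):
    """Compute subject digest fields for a list of JSON-lines records.

    Illustrative only: canonicalization (sorted keys, UTF-8, newline-
    joined) and the Merkle node encoding are assumptions, not normative.
    """
    lines = [json.dumps(r, sort_keys=True, separators=(",", ":"))
             for r in records]
    blob = ("\n".join(lines) + "\n").encode("utf-8")

    # sha256 over the full serialized dataset
    sha256 = hashlib.sha256(blob).hexdigest()

    # Merkle root: hash each record, then pair-wise hash up the tree
    level = [hashlib.sha256(l.encode("utf-8")).digest() for l in lines]
    while len(level) > 1:
        if len(level) % 2:              # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]

    return {
        "sha256": sha256,
        "recordCount": str(len(records)),
        "merkleRoot": level[0].hex(),
    }
```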
Transform Attestation
Documents what processing was applied to data, linking to input attestations to form a complete lineage chain.
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "dataset:customer_transactions_anonymized",
    "digest": {
      "sha256": "xyz789012345678901234567890abcdef...",
      "recordCount": "1847293"
    }
  }],
  "predicateType": "https://makoto.dev/transform/v1",
  "predicate": {
    "inputs": [{
      "name": "dataset:customer_transactions_2025q4",
      "digest": { "sha256": "a1b2c3d4..." },
      "attestationRef": "https://attestations.example.com/origin/a1b2c3d4"
    }],
    "transform": {
      "type": "https://makoto.dev/transforms/anonymization",
      "name": "PII Anonymization Pipeline",
      "version": "1.2.0",
      "parameters": {
        "fieldsRemoved": ["email", "phone", "address"],
        "fieldsHashed": ["customer_id"],
        "kAnonymity": 5
      },
      "codeRef": {
        "uri": "git+https://github.com/example/[email protected]",
        "digest": { "sha256": "pipeline_hash..." }
      }
    },
    "executor": {
      "id": "https://expanso.io/pipelines/prod-cluster-01",
      "platform": "expanso"
    },
    "metadata": {
      "startedOn": "2025-12-20T08:00:45Z",
      "finishedOn": "2025-12-20T08:15:32Z",
      "recordsInput": 1847293,
      "recordsOutput": 1847293
    }
  }
}
Key Fields Explained
inputs[].attestationRef
Reference to input data's attestation (creates lineage chain)
transform.parameters
Configuration used for this transformation
transform.codeRef
Git reference to the transformation code for reproducibility
executor.platform
Platform that executed the transformation
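The attestationRef link is what makes the chain verifiable: a transform's declared input digest must match the subject digest of the origin attestation it references. A hypothetical checker, using only the field names from the examples above:

```python
def verify_lineage_link(transform_predicate, origin_statement):
    """Check that every input a transform declares matches a subject of
    the referenced origin attestation, by name and sha256 digest.

    Hypothetical helper: field names follow the examples in this page,
    not a normative API.
    """
    # Map each subject of the origin attestation to its sha256 digest
    subjects = {s["name"]: s["digest"].get("sha256")
                for s in origin_statement["subject"]}
    for inp in transform_predicate["inputs"]:
        if subjects.get(inp["name"]) != inp["digest"].get("sha256"):
            return False
    return True
```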
Stream Window Attestation
For high-throughput streaming, a window attestation covers a time-bounded batch of events, using a Merkle tree so that millions of events per second can be attested without signing each record individually.
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "stream:iot_sensors:window_20251220_100000",
    "digest": {
      "merkleRoot": "abc123def456...",
      "windowStart": "2025-12-20T10:00:00Z",
      "windowEnd": "2025-12-20T10:01:00Z",
      "recordCount": "847293"
    }
  }],
  "predicateType": "https://makoto.dev/stream-window/v1",
  "predicate": {
    "stream": {
      "id": "iot_sensors",
      "source": "mqtt://sensors.factory.example.com:1883",
      "partitions": ["temp", "pressure", "vibration"]
    },
    "window": {
      "type": "tumbling",
      "duration": "PT1M",
      "watermark": "2025-12-20T09:59:55Z"
    },
    "integrity": {
      "merkleTree": {
        "algorithm": "sha256",
        "leafCount": 847293,
        "treeHeight": 20
      },
      "chain": {
        "previousWindowId": "window_20251220_095900",
        "previousMerkleRoot": "xyz789..."
      }
    }
  }
}
Merkle Tree Structure
Only the root hash is signed. Any individual record can be verified against it with an O(log n) inclusion proof.
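A sketch of that verification, assuming plain sha256 concatenation for interior nodes (the real leaf and node encodings are defined by the attestation format):

```python
import hashlib

def verify_merkle_proof(leaf, proof, root):
    """Verify one record against a signed Merkle root in O(log n).

    `proof` is a list of (sibling_hash, sibling_is_left) pairs, ordered
    from the leaf up to the root. Sketch only: node encoding is assumed
    to be bare sha256 over concatenated child hashes.
    """
    node = hashlib.sha256(leaf).digest()
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = hashlib.sha256(pair).digest()
    return node == root
```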
Data Bill of Materials
A DBOM aggregates all attestations for a dataset into a single document, similar to an SBOM for software.
{
  "dbomVersion": "1.0.0",
  "dataset": {
    "name": "ml_training_dataset_v3",
    "version": "3.0.0",
    "created": "2025-12-20T12:00:00Z",
    "makotoLevel": "L2"
  },
  "sources": [
    {
      "name": "customer_transactions",
      "attestationRef": "https://attestations.example.com/origin/abc",
      "makotoLevel": "L2",
      "geography": "US"
    },
    {
      "name": "public_weather_data",
      "attestationRef": "https://attestations.example.com/origin/def",
      "makotoLevel": "L1",
      "license": "CC-BY-4.0"
    }
  ],
  "transformations": [
    { "order": 1, "name": "Join datasets", "attestationRef": "..." },
    { "order": 2, "name": "Anonymize PII", "attestationRef": "..." },
    { "order": 3, "name": "Feature engineering", "attestationRef": "..." }
  ]
}
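Before trusting a DBOM, a consumer can run basic structural checks, for example that every source carries an attestationRef and that transformations form a contiguous 1..n sequence. A hypothetical sketch:

```python
def check_dbom(dbom):
    """Minimal structural checks on a DBOM document.

    Hypothetical helper: the checks mirror the example layout on this
    page, not a published validation spec.
    """
    problems = []
    # Every source should point at an attestation
    for src in dbom.get("sources", []):
        if not src.get("attestationRef"):
            problems.append(f"source {src.get('name')!r} missing attestationRef")
    # Transformation order should be a contiguous 1..n sequence
    orders = sorted(t["order"] for t in dbom.get("transformations", []))
    if orders != list(range(1, len(orders) + 1)):
        problems.append("transformation order is not contiguous from 1")
    return problems
```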
Signed Envelope (L2+)
At L2 and above, attestations are wrapped in a DSSE (Dead Simple Signing Envelope) with cryptographic signatures.
{
  "payloadType": "application/vnd.in-toto+json",
  "payload": "eyJfdHlwZSI6Imh0dHBzOi8vaW4tdG90by5pby9TdGF0ZW1lbnQvdjEi...",
  "signatures": [{
    "keyid": "https://expanso.io/keys/prod-signer-01",
    "sig": "MEUCIQD2qN3..."
  }]
}
Note: The payload is the base64-encoded attestation JSON. Signers sign the DSSE pre-authentication encoding of the payload type and payload bytes, so signatures can be verified without ever parsing the JSON.
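Per the DSSE specification, the bytes that are actually signed are the pre-authentication encoding (PAE) of the payload type and the decoded payload. A minimal sketch of building that signing input:

```python
import base64

def dsse_pae(payload_type: str, payload_b64: str) -> bytes:
    """Build the DSSE pre-authentication encoding that signers sign:
    "DSSEv1 <len(type)> <type> <len(body)> <body>", where lengths are
    decimal byte counts and body is the base64-decoded payload."""
    body = base64.b64decode(payload_b64)
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(body), body)
```

Because the PAE binds both the payload type and the raw bytes, a verifier checks the signature over this encoding first and only then decodes and parses the attestation.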
A Real DBOM
The fastest way to understand what a DBOM looks like is to see one applied to a dataset you probably already know.
We generated a DBOM for rajpurkar/squad — one of the most widely used NLP benchmarks. SQuAD is a useful example because its provenance is publicly documented: the source corpus is English Wikipedia, annotation was collected via Mechanical Turk, and the processing scripts are on GitHub. Three distinct transformations, each with inputs and outputs that can be hashed.
The intermediate step digests are placeholders — Rajpurkar never published hashes for the intermediate artifacts, only the final dataset. That gap is visible in the DBOM. A well-formed DBOM doesn't just document what exists. It surfaces what's missing.
The attestations array is empty. No third party has certified this dataset for bias, quality, or regulatory compliance. In a world where that field is populated — where a bias audit firm signs a claim against a specific digest, or a regulator attests to CRA or EU AI Act compliance — that signature travels with the data permanently.
Want to generate a DBOM for your own dataset? Use this prompt with any LLM. Point it at a Hugging Face dataset card and a paper. It will produce a draft DBOM and a list of the provenance gaps it couldn't fill. Those gaps are signal.