Level 1: Provenance Exists
This document specifies the formal requirements for achieving Makoto Level 1 (L1). L1 establishes the foundation for data provenance by requiring machine-readable documentation of data origin and processing. At this level, attestations may be self-reported—the goal is ensuring provenance information exists in a structured, parseable format.
"The data pipeline produces attestations that document data origin and processing."
Key Principle: L1 is intentionally achievable with minimal tooling. Any organization can reach L1 by generating structured JSON attestations—no cryptographic infrastructure required. This enables progressive adoption where teams start documenting provenance immediately.
Requirements Summary
| Category | Requirement | Description |
|---|---|---|
| Producer | Attestation Generation | Generate machine-readable attestation for each data artifact |
| Producer | Origin Documentation | Document data source with sufficient detail for identification |
| Producer | Transform Documentation | Document processing steps applied to the data |
| Platform | Format Support | Support generation and storage of Makoto attestation format |
| Platform | Attestation Delivery | Deliver attestations alongside or linked to data artifacts |
| Verification | Schema Validation | Validate attestation conforms to Makoto schema |
| Verification | Completeness Check | Verify required fields are present and non-empty |
Producer Requirements
Data producers are entities that collect, generate, or transform data. At L1, producers must generate attestations documenting their data artifacts.
P1: Attestation Generation
Requirement: The producer MUST generate a machine-readable attestation for each data artifact that claims L1 compliance.
| Aspect | Specification |
|---|---|
| Format | JSON document conforming to in-toto Statement v1 |
| Predicate Type | One of: https://makoto.dev/origin/v1, https://makoto.dev/transform/v1, or https://makoto.dev/stream-window/v1 |
| Encoding | UTF-8 encoded JSON |
| Timing | Attestation SHOULD be generated at the time of data production or as close as practical |
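The table above can be sketched as a small helper that wraps a predicate in an in-toto Statement v1 envelope and serializes it as UTF-8 JSON. This is a minimal illustration, not a reference implementation: the function name `build_statement` and its parameters are hypothetical, and the artifact is assumed to fit in memory as bytes.

```python
import hashlib
import json

# The three predicate type URLs listed in P1.
MAKOTO_PREDICATE_TYPES = {
    "https://makoto.dev/origin/v1",
    "https://makoto.dev/transform/v1",
    "https://makoto.dev/stream-window/v1",
}


def build_statement(name: str, content: bytes, predicate_type: str, predicate: dict) -> bytes:
    """Return a UTF-8 encoded in-toto Statement v1 for one data artifact (hypothetical helper)."""
    if predicate_type not in MAKOTO_PREDICATE_TYPES:
        raise ValueError(f"not a Makoto predicate type: {predicate_type}")
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{
            "name": name,
            # Content binding: SHA-256 of the artifact bytes, as lowercase hex.
            "digest": {"sha256": hashlib.sha256(content).hexdigest()},
        }],
        "predicateType": predicate_type,
        "predicate": predicate,
    }
    return json.dumps(statement, indent=2).encode("utf-8")
```

Calling this helper at the point where the pipeline writes the artifact is one way to satisfy the timing recommendation.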
P2: Origin Documentation
Requirement: For data collection/ingestion, the producer MUST document the data source using the origin/v1 predicate type.
| Field | Required | Description |
|---|---|---|
| `origin.source` | REQUIRED | URI or identifier of the data source (API endpoint, database, file path, etc.) |
| `origin.sourceType` | RECOMMENDED | Type of source: api, database, file, stream, manual |
| `origin.collectionTimestamp` | REQUIRED | ISO 8601 timestamp when data was collected |
| `origin.collectionMethod` | RECOMMENDED | How data was obtained: pull, push, export, manual |
| `collector.id` | REQUIRED | Identifier for the collecting system or process |
Example Origin Attestation
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "dataset:customer_orders_2025q1",
    "digest": {
      "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    }
  }],
  "predicateType": "https://makoto.dev/origin/v1",
  "predicate": {
    "origin": {
      "source": "https://erp.example.com/api/v2/orders",
      "sourceType": "api",
      "collectionTimestamp": "2025-01-15T08:00:00Z",
      "collectionMethod": "pull"
    },
    "collector": {
      "id": "data-platform/ingestion-service-v3"
    }
  }
}
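A producer could assemble the `predicate` portion of such an attestation with a helper along these lines. This is a sketch under assumptions: the name `build_origin_predicate` and its signature are illustrative, and the timestamp is normalized to UTC with a `Z` suffix, which is one valid ISO 8601 form rather than something the spec mandates.

```python
from datetime import datetime, timezone


def build_origin_predicate(source, collector_id, collected_at=None,
                           source_type=None, collection_method=None):
    """Build an origin/v1 predicate body (hypothetical helper)."""
    if not source or not collector_id:
        raise ValueError("origin.source and collector.id are REQUIRED")
    ts = collected_at or datetime.now(timezone.utc)
    if ts.tzinfo is None:
        raise ValueError("collected_at must be timezone-aware")
    origin = {
        "source": source,
        # Normalize to UTC so the timestamp is unambiguous.
        "collectionTimestamp": ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # RECOMMENDED fields are emitted only when the caller supplies them.
    if source_type:
        origin["sourceType"] = source_type
    if collection_method:
        origin["collectionMethod"] = collection_method
    return {"origin": origin, "collector": {"id": collector_id}}
```

Omitting the recommended fields when they are unknown avoids emitting empty strings, which a completeness check would otherwise have to treat as a special case.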
P3: Transform Documentation
Requirement: For data transformations (ETL, filtering, aggregation, etc.), the producer MUST document each processing step using the transform/v1 predicate type.
| Field | Required | Description |
|---|---|---|
| `inputs` | REQUIRED | Array of input data artifacts with names and digests |
| `transform.type` | REQUIRED | URI identifying the type of transformation |
| `transform.name` | RECOMMENDED | Human-readable name for the transformation |
| `transform.parameters` | RECOMMENDED | Configuration parameters used (sanitized of secrets) |
| `executor.id` | REQUIRED | Identifier for the system executing the transform |
| `metadata.startedOn` | RECOMMENDED | ISO 8601 timestamp when processing started |
| `metadata.finishedOn` | RECOMMENDED | ISO 8601 timestamp when processing completed |
Example Transform Attestation
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{
    "name": "dataset:customer_orders_anonymized",
    "digest": {
      "sha256": "a1b2c3d4e5f6..."
    }
  }],
  "predicateType": "https://makoto.dev/transform/v1",
  "predicate": {
    "inputs": [{
      "name": "dataset:customer_orders_2025q1",
      "digest": { "sha256": "e3b0c44298fc1c149..." }
    }],
    "transform": {
      "type": "https://makoto.dev/transforms/anonymization",
      "name": "PII Removal",
      "parameters": {
        "fields_removed": ["email", "phone"],
        "fields_hashed": ["customer_id"]
      }
    },
    "executor": {
      "id": "data-platform/etl-pipeline-v2"
    },
    "metadata": {
      "startedOn": "2025-01-15T09:00:00Z",
      "finishedOn": "2025-01-15T09:15:00Z"
    }
  }
}
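The requirement that `transform.parameters` be sanitized of secrets can be illustrated with a small builder. Everything named here is an assumption for illustration: `build_transform_predicate`, `sanitize_parameters`, and the key-name heuristic in `SECRET_HINTS` are not defined by Makoto, and a real pipeline would likely rely on an explicit allowlist instead of name matching.

```python
# Hypothetical heuristic: parameter names containing these substrings are redacted.
SECRET_HINTS = ("password", "secret", "token", "api_key")


def sanitize_parameters(params: dict) -> dict:
    """Return a copy of params with likely secrets replaced by a marker."""
    clean = {}
    for key, value in params.items():
        if any(hint in key.lower() for hint in SECRET_HINTS):
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = sanitize_parameters(value)  # recurse into nested config
        else:
            clean[key] = value
    return clean


def build_transform_predicate(inputs, transform_type, executor_id, name=None,
                              parameters=None, started_on=None, finished_on=None):
    """Build a transform/v1 predicate body (hypothetical helper)."""
    if not inputs:
        raise ValueError("inputs must be a non-empty list")
    if not transform_type or not executor_id:
        raise ValueError("transform.type and executor.id are REQUIRED")
    transform = {"type": transform_type}
    if name:
        transform["name"] = name
    if parameters:
        transform["parameters"] = sanitize_parameters(parameters)
    predicate = {
        "inputs": list(inputs),
        "transform": transform,
        "executor": {"id": executor_id},
    }
    # RECOMMENDED timing metadata, included only when provided.
    metadata = {}
    if started_on:
        metadata["startedOn"] = started_on
    if finished_on:
        metadata["finishedOn"] = finished_on
    if metadata:
        predicate["metadata"] = metadata
    return predicate
```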
P4: Subject Binding
Requirement: Each attestation MUST identify the data artifact it describes via the subject field.
| Field | Required | Description |
|---|---|---|
| `subject[].name` | REQUIRED | Identifier for the data artifact (dataset name, file path, stream ID) |
| `subject[].digest.sha256` | REQUIRED | SHA-256 hash of the data artifact content |
| `subject[].digest.recordCount` | RECOMMENDED | Number of records in the dataset (for tabular data) |
Note: The digest provides content binding—if data changes, the digest changes. This allows verification that attestations match their referenced data.
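For file-based artifacts, the digest can be computed without loading the whole file into memory. The sketch below assumes the artifact is a single file on disk; `sha256_file` and `subject_for` are hypothetical names, and how to hash multi-file datasets or streams is not covered here.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def subject_for(path, name=None) -> dict:
    """Build one subject entry; the name defaults to the file name (an assumption)."""
    return {
        "name": name or Path(path).name,
        "digest": {"sha256": sha256_file(path)},
    }
```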
Platform Requirements
Platforms are systems that host or execute data processing (ETL tools, data warehouses, streaming platforms, etc.). At L1, platforms must support attestation generation and delivery.
PL1: Attestation Format Support
Requirement: The platform MUST support generation of attestations in the Makoto format.
- Generate valid JSON conforming to in-toto Statement v1 schema
- Support all three Makoto predicate types (origin, transform, stream-window)
- Compute SHA-256 digests of data artifacts
- Generate ISO 8601 timestamps
Implementation Note: Platforms may implement attestation generation natively, via plugins, or through external tooling. The mechanism is not specified—only the output format.
PL2: Attestation Storage
Requirement: The platform MUST provide a mechanism to store attestations persistently.
| Option | Description | Example |
|---|---|---|
| Sidecar File | Store attestation alongside data with predictable naming | data.json → data.json.attestation |
| Attestation Registry | Centralized service for attestation storage and retrieval | REST API at /attestations/{digest} |
| Metadata Store | Store attestations in data catalog or metadata system | DataHub, Amundsen, or similar |
| Embedded | Include attestation within data package | ZIP archive with ATTESTATION.json |
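As an illustration of the sidecar option, the following sketch writes an attestation next to its data file using the `data.json` → `data.json.attestation` naming from the table. The helper names are hypothetical, and the write-then-rename step is a design choice of this sketch, not a Makoto requirement.

```python
import json
from pathlib import Path


def sidecar_path(data_path) -> Path:
    """Map a data file path to its sidecar attestation path."""
    p = Path(data_path)
    return p.with_name(p.name + ".attestation")


def write_sidecar(data_path, statement: dict) -> Path:
    """Persist the attestation alongside the data file and return its path."""
    target = sidecar_path(data_path)
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(statement, indent=2), encoding="utf-8")
    # Rename into place so readers never observe a half-written attestation.
    tmp.replace(target)
    return target
```

Because the sidecar name is derived purely from the data file name, consumers can locate the attestation without any registry lookup.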
PL3: Attestation Delivery
Requirement: The platform MUST provide a mechanism for consumers to retrieve attestations for data they receive.
- Attestations SHOULD be discoverable from the data artifact (via naming convention, metadata, or registry lookup)
- Attestation retrieval SHOULD NOT require special permissions beyond data access
- The platform SHOULD document the attestation discovery mechanism
Verification Requirements
Verification describes what consumers can check to confirm L1 compliance. At L1, verification focuses on attestation presence and format—not cryptographic authenticity.
V1: Attestation Presence
Check: Verify that a machine-readable attestation exists for the data artifact.
- PASS: Attestation file exists and is valid JSON
- FAIL: No attestation found, or attestation is not valid JSON
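The presence check could look like the sketch below, assuming the attestation is delivered as a file whose path the consumer already knows (for example, a sidecar). The function name `check_presence` and the result messages are illustrative.

```python
import json
from pathlib import Path


def check_presence(attestation_path) -> tuple[bool, str]:
    """V1: the attestation file must exist and parse as JSON."""
    p = Path(attestation_path)
    if not p.is_file():
        return False, "FAIL: no attestation found"
    try:
        json.loads(p.read_text(encoding="utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False, "FAIL: attestation is not valid JSON"
    return True, "PASS"
```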
V2: Schema Compliance
Check: Verify that the attestation conforms to the Makoto schema.
| Field | Validation |
|---|---|
| `_type` | MUST equal "https://in-toto.io/Statement/v1" |
| `subject` | MUST be non-empty array with at least one entry |
| `subject[].name` | MUST be non-empty string |
| `subject[].digest.sha256` | MUST be valid SHA-256 hex string (64 characters) |
| `predicateType` | MUST be valid Makoto predicate type URL |
| `predicate` | MUST be object conforming to predicate type schema |
JSON Schema files for validation are available in the schemas directory.
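Where a JSON Schema validator is not available, the envelope rules in the table can be approximated by hand. The sketch below is not a substitute for the published schemas: it checks only the Statement envelope, verifies just that `predicate` is an object, and accepts both upper- and lowercase hex digits (an assumption, since the table does not specify case). The name `schema_errors` is hypothetical.

```python
import re

STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
PREDICATE_TYPES = {
    "https://makoto.dev/origin/v1",
    "https://makoto.dev/transform/v1",
    "https://makoto.dev/stream-window/v1",
}
SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def schema_errors(att) -> list[str]:
    """Return a list of envelope violations; an empty list means the checks passed."""
    if not isinstance(att, dict):
        return ["attestation must be a JSON object"]
    errors = []
    if att.get("_type") != STATEMENT_TYPE:
        errors.append("_type must equal " + STATEMENT_TYPE)
    subjects = att.get("subject")
    if not isinstance(subjects, list) or not subjects:
        errors.append("subject must be a non-empty array")
    else:
        for i, subj in enumerate(subjects):
            if not isinstance(subj, dict):
                errors.append(f"subject[{i}] must be an object")
                continue
            name = subj.get("name")
            if not isinstance(name, str) or not name:
                errors.append(f"subject[{i}].name must be a non-empty string")
            digest = subj.get("digest")
            sha = digest.get("sha256") if isinstance(digest, dict) else None
            if not isinstance(sha, str) or not SHA256_HEX.fullmatch(sha):
                errors.append(f"subject[{i}].digest.sha256 must be 64 hex characters")
    ptype = att.get("predicateType")
    if not isinstance(ptype, str) or ptype not in PREDICATE_TYPES:
        errors.append("predicateType must be a Makoto predicate type URL")
    if not isinstance(att.get("predicate"), dict):
        errors.append("predicate must be an object")
    return errors
```

Returning every violation, instead of stopping at the first one, gives producers a complete list to fix in a single pass.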
V3: Subject Binding Verification
Check: Verify that the attestation subject matches the actual data.
- Compute SHA-256 hash of the data artifact
- Compare the computed hash to `subject[].digest.sha256` in the attestation
- PASS: Hashes match
- FAIL: Hashes do not match (data may have been modified)
# Example verification (bash)
EXPECTED=$(jq -r '.subject[0].digest.sha256' attestation.json)
ACTUAL=$(sha256sum data.json | cut -d' ' -f1)
if [ "$EXPECTED" = "$ACTUAL" ]; then
echo "PASS: Data matches attestation"
else
echo "FAIL: Data hash mismatch"
fi
V4: Required Field Completeness
Check: Verify that all required fields for the predicate type are present and non-empty.
For origin/v1:
- `predicate.origin.source`: non-empty string
- `predicate.origin.collectionTimestamp`: valid ISO 8601 timestamp
- `predicate.collector.id`: non-empty string
For transform/v1:
- `predicate.inputs`: non-empty array
- `predicate.transform.type`: non-empty string (URI)
- `predicate.executor.id`: non-empty string
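The completeness rules above can be expressed as a lookup from predicate type to required paths. This sketch makes several assumptions: the attestation has already passed the schema check, the helper names are hypothetical, and the timestamp test uses `datetime.fromisoformat`, which accepts only a subset of ISO 8601. Required fields for stream-window/v1 are not listed in this section, so that type is left without checks here.

```python
from datetime import datetime


def _get(obj, dotted_path):
    """Walk a dotted path through nested dicts; return None if any step is missing."""
    for key in dotted_path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def _non_empty_string(value):
    return isinstance(value, str) and value.strip() != ""


def _non_empty_array(value):
    return isinstance(value, list) and len(value) > 0


def _iso8601(value):
    if not isinstance(value, str):
        return False
    if value.endswith("Z"):  # fromisoformat rejects "Z" before Python 3.11
        value = value[:-1] + "+00:00"
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


REQUIRED = {
    "https://makoto.dev/origin/v1": [
        ("origin.source", _non_empty_string),
        ("origin.collectionTimestamp", _iso8601),
        ("collector.id", _non_empty_string),
    ],
    "https://makoto.dev/transform/v1": [
        ("inputs", _non_empty_array),
        ("transform.type", _non_empty_string),
        ("executor.id", _non_empty_string),
    ],
}


def missing_fields(att: dict) -> list[str]:
    """Return the required predicate paths that are absent, empty, or malformed."""
    checks = REQUIRED.get(att.get("predicateType"), [])
    predicate = att.get("predicate")
    return ["predicate." + path for path, ok in checks if not ok(_get(predicate, path))]
```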
Threat Mitigations
L1 provides partial mitigation against the following data supply chain threats. For full mitigation, see L2 and L3 requirements.
| Threat | Description | L1 Mitigation |
|---|---|---|
| D1 | Source Falsification: claiming data came from an authoritative source when it didn't | Partial: origin is documented, but claims are self-attested (not cryptographically verified) |
| D3 | Transform Opacity: hiding what transformations were applied to data | Partial: transforms are documented, enabling audit trails and lineage visibility |
Important: L1 does NOT mitigate threats requiring cryptographic verification: D2 (Collection Tampering), D4 (Lineage Forgery), D5 (Stream Injection), D6 (Aggregation Manipulation), D8 (Time Manipulation). These require L2 or higher.
Conformance Checklist
Use this checklist to verify L1 compliance. All items must pass for a data artifact to claim Makoto L1.
Producer Checklist
- ☐ Attestation is generated for each data artifact
- ☐ Attestation uses in-toto Statement v1 format
- ☐ Predicate type is a valid Makoto type URL
- ☐ Subject includes artifact name and SHA-256 digest
- ☐ Origin attestations include source and collection timestamp
- ☐ Transform attestations include inputs and transform type
- ☐ All required fields are present and non-empty
Platform Checklist
- ☐ Platform can generate Makoto-compliant attestations
- ☐ Platform stores attestations persistently
- ☐ Consumers can discover and retrieve attestations
- ☐ Attestation discovery mechanism is documented
Verification Checklist
- ☐ Attestation exists and parses as valid JSON
- ☐ Attestation validates against Makoto JSON Schema
- ☐ Data artifact hash matches attestation subject digest
- ☐ All required fields for predicate type are present
Next Steps
View Examples
See complete L1 attestation examples for origin, transform, and streaming.