Demo 03 ~2 min

GitHub Action — Provenance on Every Release

Generate a DBOM for any data file with a single command — then automate it in CI.

Manual, forgotten provenance → Automatic DBOM in CI

The Problem

Your team releases data files every sprint — updated models, refreshed training sets, new configuration bundles. Each release should include provenance metadata: where the data came from, who signed off, what hash it had at release time. But provenance is a manual step, and manual steps get skipped. By the third sprint, nobody remembers to generate the attestation file. By the sixth sprint, the compliance team notices.

The real cost isn't the audit finding — it's the scramble to retroactively reconstruct provenance for files that have already shipped. You're reverse-engineering hash values from old CI logs and guessing at signer identities from git blame. A five-second automation problem has become a five-day archaeology project.

What You Will See

Generated DBOM Output
$ uv run generate_dbom.py ../01-poisoned-pipeline/data/sensors_clean.csv
Generated DBOM: sensors_clean.csv.dbom.json
{
  "schema_version": "0.1",
  "id": "dbom-f3a7b2c1-9d4e-4f5a-8b6c-2e1d0f9a8b7c",
  "created_at": "2025-01-15T10:30:00Z",
  "source": {
    "uri": "local:sensors_clean.csv@HEAD",
    "hash": {
      "algorithm": "sha256",
      "value": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2"
    },
    "format": "csv"
  },
  "signature": {
    "algorithm": "sha256",
    "value": "a1b2c3d4...",
    "signer": "github:unknown"
  },
  "lineage": [
    {
      "step": 1,
      "description": "Auto-generated DBOM for sensors_clean.csv",
      "tool": "makoto-action v0.1",
      "input_hash": "n/a",
      "output_hash": "a1b2c3d4..."
    }
  ]
}
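The source of generate_dbom.py is not shown on this page, so the following is only a rough sketch of how a document shaped like the output above could be produced. The function names (sha256_of, build_dbom, write_dbom) are hypothetical, and reusing the content hash in the signature field is an assumption that simply mirrors the sample output; a real signature would come from a signing key.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large data files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_dbom(path: Path, signer: str = "github:unknown") -> dict:
    """Assemble a DBOM dict with the same fields as the sample output (hypothetical helper)."""
    file_hash = sha256_of(path)
    return {
        "schema_version": "0.1",
        "id": f"dbom-{uuid.uuid4()}",
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source": {
            "uri": f"local:{path.name}@HEAD",
            "hash": {"algorithm": "sha256", "value": file_hash},
            "format": path.suffix.lstrip(".") or "unknown",
        },
        # Assumption: placeholder that mirrors the sample output, not a real signature.
        "signature": {"algorithm": "sha256", "value": file_hash, "signer": signer},
        "lineage": [
            {
                "step": 1,
                "description": f"Auto-generated DBOM for {path.name}",
                "tool": "makoto-action v0.1",
                "input_hash": "n/a",
                "output_hash": file_hash,
            }
        ],
    }


def write_dbom(path: Path) -> Path:
    """Write <file>.dbom.json next to the data file and return its path."""
    out = path.with_name(path.name + ".dbom.json")
    out.write_text(json.dumps(build_dbom(path), indent=2) + "\n", encoding="utf-8")
    return out
```

Calling write_dbom(Path("sensors_clean.csv")) would write sensors_clean.csv.dbom.json alongside the input, matching the file name in the transcript above.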

Run It

$ git clone https://github.com/makoto-project/makoto
$ cd makoto/demos/03-github-action
$ uv run generate_dbom.py path/to/your/data.csv
Key Insight: Provenance that depends on humans remembering will always fail — make it automatic and it becomes invisible infrastructure.
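One way to make that automation enforceable is a CI check that fails when a data file has no DBOM or no longer matches its recorded hash. The sketch below is an illustration under assumptions, not part of the makoto repository: verify_dbom, find_unverified, and main are hypothetical names, and it assumes each DBOM sits next to its data file as <file>.dbom.json with the source.hash.value field shown in the sample output.

```python
import hashlib
import json
from pathlib import Path


def verify_dbom(data_path: Path) -> bool:
    """Return True only if a sibling DBOM exists and its recorded hash matches the file."""
    dbom_path = data_path.with_name(data_path.name + ".dbom.json")
    if not dbom_path.exists():
        return False
    dbom = json.loads(dbom_path.read_text(encoding="utf-8"))
    # Assumed layout: source.hash.value holds the hex SHA-256 of the data file.
    recorded = dbom["source"]["hash"]["value"]
    actual = hashlib.sha256(data_path.read_bytes()).hexdigest()
    return recorded == actual


def find_unverified(paths) -> list:
    """List the data files whose DBOM is missing or stale."""
    return [str(p) for p in paths if not verify_dbom(Path(p))]


def main(argv) -> int:
    """Return a process exit code: 0 if every file verifies, 1 otherwise."""
    missing = find_unverified(argv)
    for name in missing:
        print(f"missing or stale DBOM: {name}")
    return 1 if missing else 0
```

A CI step could wire this up with sys.exit(main(sys.argv[1:])) so that a release without matching provenance fails the build instead of shipping.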

What Else This Handles