🧱 Databricks + Makoto Integration Concept
Unity Catalog + Job event hooks — DBOMs for every notebook and job run.
What is Databricks?
Databricks combines Spark, Delta Lake, and Unity Catalog. Unity Catalog already captures lineage down to the column level, and Jobs already emit run lifecycle events (start, success, failure). Makoto sits on top of both, turning Unity Catalog lineage into Origin attestations and Job run events into Transform attestations.
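To make that mapping concrete, here is a minimal sketch that turns a job-run event plus its lineage inputs into a Transform attestation. It assumes Makoto attestations use an in-toto Statement v1 envelope. The predicate type URI, the event keys (`job_id`, `run_id`, `result_state`), the table names, and the function `transform_statement` are all hypothetical illustrations, not Makoto's or Databricks' actual schema.

```python
from typing import Any, Dict

# Hypothetical predicate type; Makoto's actual URI and fields may differ.
TRANSFORM_PREDICATE = "https://makoto.dev/transform/v1"


def transform_statement(run_event: Dict[str, Any], inputs: Dict[str, str],
                        output_name: str, output_sha256: str) -> Dict[str, Any]:
    """Build an in-toto-style Statement for one job run (hypothetical shape).

    run_event: job-run event; assumed keys 'job_id', 'run_id', 'result_state'.
    inputs:    upstream table name -> sha256 hex digest (e.g. resolved via lineage).
    """
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": output_name, "digest": {"sha256": output_sha256}}],
        "predicateType": TRANSFORM_PREDICATE,
        "predicate": {
            # Sorted so the same inputs always serialize identically.
            "inputs": [{"name": n, "digest": {"sha256": d}}
                       for n, d in sorted(inputs.items())],
            "run": {
                "job_id": run_event["job_id"],
                "run_id": run_event["run_id"],
                "result": run_event.get("result_state", "UNKNOWN"),
            },
        },
    }


stmt = transform_statement(
    {"job_id": 42, "run_id": 1001, "result_state": "SUCCESS"},
    {"main.sales.raw_orders": "ab" * 32},
    "main.sales.curated_orders",
    "cd" * 32,
)
```

Sorting the inputs keeps the serialized statement deterministic, which matters once it is hashed and signed.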
Integration Approach
Primary pattern: Unity Catalog hook + Job webhook + DLT expectation expressions. The integration options below are ordered by the effort they require.
How Makoto attaches to Databricks
- Databricks App — Install the Makoto app; it subscribes to Job events and reads Unity lineage on completion.
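- DLT expectation library — DLT pipelines declare `@dlt.expect_or_fail("has_dbom", makoto.attestation_valid(input_table))`, where the helper returns a SQL constraint expression, so an update that reads unattested data fails instead of proceeding. A plain `dlt.expect` would only record the violation.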
- Notebook helper — after `%pip install makoto-databricks`, wrap any cell's logic in `with makoto.transform(level=2): ...`.
- Workflow on_failure / on_success — Job-level webhook notifications trigger Makoto to sign the run's DBOM and post it to the configured store.
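The notebook helper's `with makoto.transform(level=2): ...` pattern maps naturally onto a Python context manager. Below is a minimal sketch, assuming a hypothetical `transform` helper that only collects a Transform record in memory: the level, timing, caller-supplied input/output digests, and a succeeded/failed status mirroring the job's on_success / on_failure outcome. Resolving lineage, signing, and posting the record are deliberately left out.

```python
import hashlib
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator


def digest(data: bytes) -> str:
    """Hex SHA-256 of a byte payload."""
    return hashlib.sha256(data).hexdigest()


@contextmanager
def transform(level: int) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for makoto.transform: collects a Transform record.

    The caller fills record['inputs'] / record['outputs'] inside the block.
    Exceptions still propagate; the record is marked 'failed' first.
    """
    record: Dict[str, Any] = {"level": level, "inputs": {}, "outputs": {},
                              "started": time.time(), "status": "running"}
    try:
        yield record
        record["status"] = "succeeded"
    except BaseException:
        record["status"] = "failed"
        raise
    finally:
        record["finished"] = time.time()


with transform(level=2) as rec:
    rec["inputs"]["raw_orders"] = digest(b"order-1,alice@example.com,paid\n")
    rec["outputs"]["curated_orders"] = digest(b"order-1,<hash>,paid\n")

print(rec["status"], rec["level"])
```

Re-raising after marking the record keeps the helper transparent to notebook error handling: the cell still fails, but the failure is captured in the record.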
Conceptual Code Example
Concept: Delta Live Tables with Makoto attestation
Per-table attestation and refusal of unattested upstreams, declared with a few decorators
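```python
import dlt
from pyspark.sql.functions import col, from_json, sha2
from pyspark.sql.types import StringType, StructField, StructType

from makoto_databricks import attest, expect_attestation  # conceptual package

# Assumed layout of each CDC message payload (illustrative only).
ORDER_SCHEMA = StructType([
    StructField("order_id", StringType()),
    StructField("email", StringType()),
    StructField("status", StringType()),
])


@dlt.table(name="raw_orders")
@attest.origin(level=2, signing_key="kms://aws/key/...")
def raw_orders():
    """Pulled from operational Postgres via CDC."""
    return (
        spark.readStream  # `spark` is predefined in DLT notebooks
        .format("kafka")
        .option("kafka.bootstrap.servers", "<broker-host:9092>")  # placeholder
        .option("subscribe", "orders.cdc")
        .load()
        # Kafka delivers the payload as binary `value`; parse it into columns.
        .select(from_json(col("value").cast("string"), ORDER_SCHEMA).alias("rec"))
        .select("rec.*")
    )


@dlt.table(name="curated_orders")
@attest.transform(level=2)
# expect_or_fail stops the update on a violation; plain dlt.expect only logs it.
# expect_attestation is assumed to return a SQL boolean expression string.
@dlt.expect_or_fail("must_have_dbom", expect_attestation(upstream="raw_orders"))
def curated_orders():
    """Hash PII, retain only paid orders."""
    return (
        dlt.read_stream("raw_orders")
        .withColumn("email_hash", sha2(col("email"), 256))
        .filter(col("status") == "paid")
        .drop("email")
    )

# On materialization, the @attest decorators (conceptually):
# - resolve upstream tables via Unity Catalog lineage and look up their DBOM digests
# - compute the output table content digest
# - sign with the configured key (the KMS key above, or a cosign key kept in Databricks Secrets)
# - write the DBOM to system.makoto.attestations
```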
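The "compute the output table content digest" step leaves the method open. One plausible approach, assumed here rather than taken from Makoto, is an order-independent digest: hash each row's canonical JSON encoding, sort the row hashes, and hash the result, so Spark partitioning and row order cannot change the digest. The sketch works on a plain list of dicts; `row_hash` and `table_digest` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def row_hash(row: Dict[str, Any]) -> bytes:
    """SHA-256 of a row's canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def table_digest(rows: Iterable[Dict[str, Any]]) -> str:
    """Order-independent digest: SHA-256 over the sorted per-row hashes.

    Rows are sorted, not de-duplicated, so duplicate rows change the digest.
    """
    h = hashlib.sha256()
    for rh in sorted(row_hash(r) for r in rows):
        h.update(rh)
    return h.hexdigest()


a = [{"order_id": "1", "status": "paid"}, {"order_id": "2", "status": "paid"}]
b = list(reversed(a))
same = table_digest(a) == table_digest(b)  # True: row order does not matter
```

For a streaming table this would have to be applied to a well-defined snapshot, such as a specific Delta table version; which snapshot to attest is a separate design decision.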
Potential Use Cases
Regulated Lakehouses
FedRAMP and HIPAA workloads: every Delta table mutation carries a signed receipt.
Cross-workspace Shares
Delta Sharing recipients verify DBOMs before consuming.
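On the recipient side, verification reduces to two checks: the signature over the DBOM is valid, and the digest of the data actually received matches the DBOM's subject digest. Below is a minimal sketch. It swaps in stdlib HMAC-SHA256 as a stand-in for the asymmetric signature (for example cosign/ECDSA) a real deployment would use, since HMAC would require sharing the secret with every recipient. The envelope layout and the names `sign_dbom` and `verify_dbom` are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def canonical(doc: Dict[str, Any]) -> bytes:
    """Deterministic JSON bytes, so signer and verifier MAC the same input."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_dbom(statement: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Wrap a statement with an HMAC-SHA256 tag (stand-in for a real signature)."""
    tag = hmac.new(key, canonical(statement), hashlib.sha256).hexdigest()
    return {"statement": statement, "signature": tag}


def verify_dbom(envelope: Dict[str, Any], key: bytes, received_sha256: str) -> bool:
    """True only if the tag verifies AND the received data matches a subject digest."""
    expected = hmac.new(key, canonical(envelope["statement"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["signature"]):
        return False
    subjects = envelope["statement"].get("subject", [])
    return any(s.get("digest", {}).get("sha256") == received_sha256 for s in subjects)


key = b"demo-key-not-for-production"
data_digest = hashlib.sha256(b"order-1,paid\n").hexdigest()
env = sign_dbom(
    {"subject": [{"name": "curated_orders", "digest": {"sha256": data_digest}}]}, key
)
print(verify_dbom(env, key, data_digest))
```

Checking the signature before the digest means a tampered envelope is rejected even if its subject digest happens to match the data.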
Feature Store Provenance
Every feature in the feature store carries a DBOM linked to the source data it was computed from.
MLflow Model Cards
Model cards embed DBOMs that attest to the lineage of the training data.
Interested in Databricks + Makoto?
This is a conceptual integration. If you're shipping Databricks pipelines and want to add Makoto attestations, open an issue or reach out — we'd love to scope a real implementation.