🧱 Databricks + Makoto Integration Concept

Unity Catalog + Job event hooks — DBOMs for every notebook and job run.

Note: This page explores how Makoto Levels could be implemented on Databricks. It is a conceptual integration proposal — illustrative, not a shipped library. The patterns shown use real Databricks APIs; the Makoto pieces are sketches you (or we) could build out.

What is Databricks?

Databricks is a lakehouse platform that combines Apache Spark, Delta Lake, and Unity Catalog. Unity Catalog already tracks lineage at the column level, and Jobs already emit lifecycle events. Makoto would sit on top of both, turning Unity Catalog lineage into Origin attestations and Job run events into Transform attestations.

Unity Catalog Lineage: Column-level lineage → attestation subject set
Job Run Hooks: Job-completed event → publish DBOM
Delta Live Tables: DLT expectations become attestation predicates
System Tables: `system.access.audit` enriches the issuer field
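
As a rough illustration of the first mapping, the sketch below folds column-level lineage rows into a deduplicated subject set. The row fields (`source_table_full_name`, `source_column_name`, `target_table_full_name`) are modeled on Unity Catalog's column lineage records but should be treated as assumptions here, and `subject_set` is a hypothetical helper, not part of any shipped Makoto library.

```python
from collections import defaultdict


def subject_set(lineage_rows, target_table):
    """Group the upstream columns feeding one target table by source table."""
    columns_by_table = defaultdict(set)
    for row in lineage_rows:
        if row["target_table_full_name"] != target_table:
            continue  # lineage for some other table
        columns_by_table[row["source_table_full_name"]].add(row["source_column_name"])
    # Sort tables and columns so the subject set is deterministic.
    return [
        {"name": table, "columns": sorted(columns)}
        for table, columns in sorted(columns_by_table.items())
    ]


# Illustrative rows; in practice these would come from a lineage query.
rows = [
    {"source_table_full_name": "main.sales.raw_orders", "source_column_name": "email",
     "target_table_full_name": "main.sales.curated_orders"},
    {"source_table_full_name": "main.sales.raw_orders", "source_column_name": "status",
     "target_table_full_name": "main.sales.curated_orders"},
    {"source_table_full_name": "main.sales.raw_orders", "source_column_name": "email",
     "target_table_full_name": "main.sales.curated_orders"},
    {"source_table_full_name": "main.crm.customers", "source_column_name": "id",
     "target_table_full_name": "main.crm.segments"},
]
subjects = subject_set(rows, "main.sales.curated_orders")
```

Duplicate lineage edges collapse into one entry per source table, which keeps the attestation stable when the same column pair is reported more than once.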

Integration Approach

Primary pattern: Unity Catalog lineage lookup + Job webhook + DLT expectation expressions. The integration options below are ordered by the effort each requires.

How Makoto attaches to Databricks

  • Databricks App: install the Makoto app; it subscribes to Job events and reads Unity Catalog lineage on completion.
  • DLT expectation library: DLT pipelines decorate a table with `@dlt.expect_or_fail("must_have_dbom", makoto.attestation_valid(input_table))` to refuse unattested data. Plain `dlt.expect` only records violations and lets the rows through.
  • Notebook helper: run `%pip install makoto-databricks`, then wrap any cell body in `with makoto.transform(level=2): ...`.
  • Workflow on_failure / on_success: a Job-level webhook posts the signed DBOM to the configured store.
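
To make the notebook-helper option more concrete, here is a minimal sketch of how a `transform(level=...)` context manager could behave. Everything in it is an assumption for illustration: the record fields, the `sink` parameter, and the function itself are hypothetical stand-ins, since `makoto-databricks` is not a published package.

```python
import contextlib
import time


@contextlib.contextmanager
def transform(level, sink):
    """Hypothetical stand-in for makoto.transform: emit one record per wrapped block."""
    record = {
        "predicate": "transform",
        "level": level,
        "started_at": time.time(),
        "status": "running",
    }
    try:
        yield record                   # the cell body runs here and may add fields
        record["status"] = "succeeded"
    except Exception:
        record["status"] = "failed"    # still emit a record, then surface the error
        raise
    finally:
        record["finished_at"] = time.time()
        sink.append(record)            # a real helper would sign and publish instead


emitted = []
with transform(level=2, sink=emitted) as rec:
    rec["outputs"] = ["main.sales.curated_orders"]

failed = []
try:
    with transform(level=1, sink=failed):
        raise ValueError("simulated cell failure")
except ValueError:
    pass
```

Emitting a record on failure as well as success is a design choice in this sketch; it mirrors the on_failure / on_success webhook pairing in the list above.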

Conceptual Code Example

Concept: Delta Live Tables with Makoto attestation

Per-table attestation and refusal of unattested upstreams, all expressed as decorators

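import dlt
from pyspark.sql.functions import col, from_json, sha2
# Hypothetical package: makoto_databricks is a sketch, not a published library.
from makoto_databricks import attest, expect_attestation

# Assumed flat shape of the CDC payload; adjust to the real topic schema.
ORDER_SCHEMA = "order_id STRING, email STRING, status STRING"

@dlt.table(name="raw_orders")
@attest.origin(level=2, signing_key="kms://aws/key/...")
def raw_orders():
    """Pulled from operational Postgres via CDC."""
    return (
        spark.readStream
             .format("kafka")
             # The Kafka source requires bootstrap servers; the config key name is a placeholder.
             .option("kafka.bootstrap.servers", spark.conf.get("orders.kafka.bootstrap_servers"))
             .option("subscribe", "orders.cdc")
             .load()
             # Kafka delivers `value` as bytes; parse it so `email` and `status` exist downstream.
             .select(from_json(col("value").cast("string"), ORDER_SCHEMA).alias("o"))
             .select("o.*")
    )

@dlt.table(name="curated_orders")
@attest.transform(level=2)
# expect_or_fail stops the update on a violation; plain dlt.expect would only record it.
# expect_attestation() is assumed to return a SQL boolean expression string.
@dlt.expect_or_fail("must_have_dbom",
    expect_attestation(upstream="raw_orders"))
def curated_orders():
    """Hash PII, retain only paid orders."""
    return (
        dlt.read_stream("raw_orders")
           .withColumn("email_hash", sha2(col("email"), 256))
           .filter(col("status") == "paid")
           .drop("email")
    )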

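# On materialization, the @attest decorators would:
#   - read Unity Catalog lineage to find the upstream tables, then look up their DBOM digests
#   - compute the output table content digest
#   - sign with the configured key (the KMS key passed above, or a cosign key held in Databricks Secrets)
#   - write the DBOM to an attestations table such as <catalog>.makoto.attestations
#     (a customer-owned catalog, since the Databricks-managed `system` catalog is read-only)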
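
The digest and signing steps listed in those comments can be sketched in plain Python. This is an illustration under stated assumptions, not the Makoto format: the DBOM field names are invented for the example, rows are hashed in memory rather than read from Delta, and HMAC-SHA256 stands in for cosign or KMS signing so the example stays self-contained.

```python
import hashlib
import hmac
import json


def content_digest(rows):
    """Order-independent SHA-256 over canonical JSON encodings of the rows."""
    row_hashes = sorted(
        hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(row_hashes).encode()).hexdigest()


def build_dbom(table, rows, upstream_digests, level):
    """Assemble a DBOM-like dict; the field names are assumptions for this sketch."""
    return {
        "subject": {"name": table, "digest": {"sha256": content_digest(rows)}},
        "level": level,
        "inputs": [
            {"name": name, "digest": {"sha256": digest}}
            for name, digest in sorted(upstream_digests.items())
        ],
    }


def sign(dbom, key):
    """HMAC stand-in for a real signature over the canonical DBOM bytes."""
    payload = json.dumps(dbom, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


rows = [{"order_id": "1", "status": "paid"}, {"order_id": "2", "status": "paid"}]
dbom = build_dbom(
    "main.sales.curated_orders",
    rows,
    {"main.sales.raw_orders": "ab" * 32},  # placeholder upstream digest
    level=2,
)
signature = sign(dbom, key=b"demo-key-not-for-production")
```

Hashing each row and sorting the hashes makes the digest independent of row order, which matters because Spark does not guarantee ordering across partitions.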

Potential Use Cases

Regulated Lakehouses

FedRAMP, HIPAA workloads — every Delta table mutation carries a signed receipt.

Cross-workspace Shares

Delta Sharing recipients verify DBOMs before consuming.

Feature Store Provenance

Every feature in the feature store carries a DBOM linked to its training data.

MLflow Model Cards

Model cards embed DBOMs proving the training data lineage.
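
For the Delta Sharing case, a recipient-side check might look like the sketch below: verify the signature first, then recompute the content digest over the received rows and compare it with the DBOM. The DBOM layout, the digest scheme, and the shared HMAC key are all assumptions made to keep the example runnable; a real deployment would more likely verify an asymmetric signature.

```python
import hashlib
import hmac
import json


def table_digest(rows):
    """SHA-256 over sorted canonical JSON rows (assumed digest scheme)."""
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical).encode()).hexdigest()


def verify_before_consuming(rows, dbom, signature, key):
    """Return (accepted, reason); refuse on a bad signature or changed content."""
    payload = json.dumps(dbom, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False, "signature mismatch"
    if dbom["subject"]["digest"]["sha256"] != table_digest(rows):
        return False, "content digest mismatch"
    return True, "ok"


# Provider side, simulated here so the example is self-contained.
key = b"shared-demo-key"
shared_rows = [{"order_id": "1", "status": "paid"}, {"order_id": "2", "status": "paid"}]
dbom = {
    "subject": {
        "name": "share.sales.curated_orders",
        "digest": {"sha256": table_digest(shared_rows)},
    },
    "level": 2,
}
signature = hmac.new(key, json.dumps(dbom, sort_keys=True).encode(), hashlib.sha256).hexdigest()

# Recipient side.
accepted = verify_before_consuming(shared_rows, dbom, signature, key)
tampered = verify_before_consuming(
    shared_rows + [{"order_id": "3", "status": "paid"}], dbom, signature, key
)
```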

Interested in Databricks + Makoto?

This is a conceptual integration. If you're shipping Databricks pipelines and want to add Makoto attestations, open an issue or reach out — we'd love to scope a real implementation.
