<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Data Today: Databricks</title>
  <subtitle>Field notes for teams building on the Databricks Data Intelligence Platform.</subtitle>
  <link href="https://data-today.net/databricks/feed.xml" rel="self" />
  <link href="https://data-today.net/" />
  <updated>2026-06-07T00:00:00Z</updated>
  <id>https://data-today.net/</id>
  <author>
    <name>Data Today Newsroom</name>
  </author>
  <entry>
    <title>Databricks Data Intelligence Platform, explained for builders</title>
    <link href="https://data-today.net/databricks/databricks-data-intelligence-platform/" />
    <updated>2026-06-07T00:00:00Z</updated>
    <id>https://data-today.net/databricks/databricks-data-intelligence-platform/</id>
    <content type="html">&lt;p&gt;If you last touched Databricks as &amp;quot;the Spark notebooks company&amp;quot;, the platform has grown into something much wider, and the bill grows with it. The &lt;strong&gt;Databricks Data Intelligence Platform&lt;/strong&gt; is the lakehouse stack that sits on top of your own cloud storage and tries to be the one place a data team ingests, governs, queries, and runs AI on its data. This guide is the starting point for our Databricks studio: what the platform actually is, how a workload executes, and where the money goes.&lt;/p&gt;
&lt;p&gt;The pitch is that a single platform on open Delta tables can replace the old split between a data lake for raw files and a warehouse for SQL. The interesting question for a builder is not the marketing but the plumbing: what runs where, what a DBU really is, and where the platform earns its keep versus where it quietly drains the budget.&lt;/p&gt;
&lt;h2 id=&quot;what-is-the-data-intelligence-platform-concretely&quot; tabindex=&quot;-1&quot;&gt;&lt;a class=&quot;header-anchor&quot; href=&quot;https://data-today.net/databricks/databricks-data-intelligence-platform/#what-is-the-data-intelligence-platform-concretely&quot;&gt;&lt;span&gt;What is the Data Intelligence Platform, concretely?&lt;/span&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The platform is a managed control plane that Databricks hosts, connected to storage and classic compute that run in &lt;strong&gt;your own cloud account&lt;/strong&gt; on AWS, Azure, or GCP (serverless compute is the exception: it runs in Databricks&#39; account). Your data sits in your cloud object storage as &lt;strong&gt;Delta Lake&lt;/strong&gt; tables, an open format built on Parquet with a transaction log that adds ACID transactions, time travel, and schema enforcement to plain files.&lt;/p&gt;
&lt;p&gt;Four layers stack on top of that storage, and it helps to hold them separate in your head:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Storage:&lt;/strong&gt; Delta tables in your object store, the open foundation everything else reads and writes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Governance:&lt;/strong&gt; Unity Catalog, whose metastore owns permissions, lineage, and discovery across every workspace attached to it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute:&lt;/strong&gt; clusters, SQL warehouses, and serverless, the engines that actually do work and burn money.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Intelligence:&lt;/strong&gt; Mosaic AI, the model serving, AI functions, and agent tooling layered across the rest.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The architectural fact worth internalizing is that &lt;strong&gt;storage and compute are fully decoupled&lt;/strong&gt;. Your tables persist whether or not anything is running, and you pay for compute only while an engine is on. That single design choice explains most of the platform&#39;s cost behaviour, which we will come back to.&lt;/p&gt;
&lt;h2 id=&quot;how-does-a-workload-actually-run&quot; tabindex=&quot;-1&quot;&gt;&lt;a class=&quot;header-anchor&quot; href=&quot;https://data-today.net/databricks/databricks-data-intelligence-platform/#how-does-a-workload-actually-run&quot;&gt;&lt;span&gt;How does a workload actually run?&lt;/span&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Every piece of work runs on compute you choose, and picking the wrong type is the most common and most expensive beginner mistake. There are three broad families.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Compute type&lt;/th&gt;
&lt;th&gt;What it is for&lt;/th&gt;
&lt;th&gt;Billing shape&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All-purpose clusters&lt;/td&gt;
&lt;td&gt;Interactive notebooks and ad hoc exploration&lt;/td&gt;
&lt;td&gt;Highest DBU rate, easy to leave running&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Job clusters&lt;/td&gt;
&lt;td&gt;Scheduled pipelines and production jobs&lt;/td&gt;
&lt;td&gt;Lower DBU rate, spun up and torn down per run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL warehouses&lt;/td&gt;
&lt;td&gt;BI and SQL queries on Databricks SQL&lt;/td&gt;
&lt;td&gt;Sized in T-shirt sizes, autoscale on concurrency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The unit on the Databricks side of the bill is the &lt;strong&gt;DBU&lt;/strong&gt;, a Databricks Unit, which meters processing per second at a rate that depends on the compute type and tier. Crucially, on classic compute you pay that DBU charge &lt;strong&gt;on top of&lt;/strong&gt; the underlying cloud VM cost, which your cloud provider bills you separately; serverless folds the VM cost into the DBU rate instead. So a classic cluster left running overnight costs you twice: once in idle DBUs and once in idle EC2 or equivalent.&lt;/p&gt;
&lt;p&gt;The single most common cost surprise is an &lt;strong&gt;all-purpose cluster used for a job that should have run on a cheaper job cluster&lt;/strong&gt;, often with auto-termination disabled. Moving that workload to a job cluster, which shuts down when the run finishes, and setting a short auto-termination window on the all-purpose clusters that remain is usually the highest-return change a new team can make.&lt;/p&gt;
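&lt;p&gt;One way to find the offenders is to list interactive clusters that have no idle timeout. The query below is a minimal sketch, assuming the &lt;code&gt;system.compute.clusters&lt;/code&gt; system table is enabled in your account, that it keeps one row per configuration change, and that &lt;code&gt;cluster_source&lt;/code&gt; values of &lt;code&gt;UI&lt;/code&gt; and &lt;code&gt;API&lt;/code&gt; mark interactively created clusters. Check the schema in your own workspace before relying on it.&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Keep only the most recent configuration row for each cluster.
WITH latest AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY workspace_id, cluster_id
      ORDER BY change_time DESC
    ) AS rn
  FROM system.compute.clusters
)
SELECT
  workspace_id,
  cluster_id,
  cluster_name,
  owned_by,
  auto_termination_minutes
FROM latest
WHERE rn = 1
  AND delete_time IS NULL                       -- cluster still exists
  AND cluster_source IN (&#39;UI&#39;, &#39;API&#39;)           -- assumption: interactive clusters
  AND COALESCE(auto_termination_minutes, 0) = 0 -- no idle timeout configured
ORDER BY cluster_name;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Anything this returns is a candidate for either an auto-termination setting or a move to a job cluster.&lt;/p&gt;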
&lt;h2 id=&quot;where-does-the-cost-actually-go&quot; tabindex=&quot;-1&quot;&gt;&lt;a class=&quot;header-anchor&quot; href=&quot;https://data-today.net/databricks/databricks-data-intelligence-platform/#where-does-the-cost-actually-go&quot;&gt;&lt;span&gt;Where does the cost actually go?&lt;/span&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This is the question that decides whether a Databricks workspace stays affordable, and the answer has layers you pay for separately. The chart below shows the rough shape of a typical monthly spend: compute DBUs dominate at around &lt;strong&gt;65 percent&lt;/strong&gt;, the underlying cloud VMs are roughly &lt;strong&gt;25 percent&lt;/strong&gt;, and storage and egress are about &lt;strong&gt;10 percent&lt;/strong&gt;. The exact mix moves with your workload, but the lesson holds: compute, not storage, is where the bill lives.&lt;/p&gt;
&lt;figure class=&quot;figure&quot;&gt;&lt;img src=&quot;https://data-today.net/posts/databricks-data-intelligence-platform-fig.png&quot; alt=&quot;Horizontal bars showing compute DBUs taking about 65 percent of a Databricks bill, underlying cloud VMs 25 percent, and storage 10 percent.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot;&gt;&lt;figcaption&gt;Illustrative: a typical split of monthly Databricks spend across compute DBUs, underlying cloud VMs, and storage. Data Today.&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The good news is that the platform now writes its own usage to system tables you can query directly. The &lt;code&gt;system.billing.usage&lt;/code&gt; table, exposed through Unity Catalog, records DBU consumption per workload, so you can attribute spend instead of guessing.&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt;
  usage_metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;job_id&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  sku_name&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;SUM&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;usage_quantity&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; dbus
&lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; system&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;billing&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;usage&lt;/span&gt;
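&lt;span class=&quot;token keyword&quot;&gt;WHERE&lt;/span&gt; usage_date &lt;span class=&quot;token operator&quot;&gt;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;current_date&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;30&lt;/span&gt; DAYS
  &lt;span class=&quot;token operator&quot;&gt;AND&lt;/span&gt; usage_unit &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&#39;DBU&#39;&lt;/span&gt;
  &lt;span class=&quot;token operator&quot;&gt;AND&lt;/span&gt; usage_metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;job_id &lt;span class=&quot;token operator&quot;&gt;IS&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;NULL&lt;/span&gt;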
&lt;span class=&quot;token keyword&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;BY&lt;/span&gt; usage_metadata&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;job_id&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; sku_name
&lt;span class=&quot;token keyword&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;BY&lt;/span&gt; dbus &lt;span class=&quot;token keyword&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That one query tells you which jobs are actually expensive, which is where any FinOps effort should start. We treat cost as its own recurring topic for exactly this reason.&lt;/p&gt;
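&lt;p&gt;DBUs are not dollars, and different SKUs carry different rates, so a cost view needs prices as well. The query below is a hedged sketch, assuming the &lt;code&gt;system.billing.list_prices&lt;/code&gt; table is available to you and that list price is a close enough proxy; it ignores any negotiated discount and does not include the cloud provider&#39;s VM charges.&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Estimate list-price cost per job by matching each usage record
-- to the price that was in effect when the usage started.
SELECT
  u.usage_metadata.job_id,
  u.sku_name,
  SUM(u.usage_quantity * lp.pricing.default) AS list_cost
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS lp
  ON u.sku_name = lp.sku_name
  AND u.cloud = lp.cloud
  AND u.usage_unit = lp.usage_unit
  AND u.usage_start_time &amp;gt;= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_start_time &amp;lt; lp.price_end_time)
WHERE u.usage_date &amp;gt;= current_date() - INTERVAL 30 DAYS
  AND u.usage_metadata.job_id IS NOT NULL
GROUP BY u.usage_metadata.job_id, u.sku_name
ORDER BY list_cost DESC
LIMIT 20;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result is in the currency of the price list, so treat it as a ranking tool rather than a reconciliation of the invoice.&lt;/p&gt;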
&lt;h2 id=&quot;how-does-unity-catalog-change-governance&quot; tabindex=&quot;-1&quot;&gt;&lt;a class=&quot;header-anchor&quot; href=&quot;https://data-today.net/databricks/databricks-data-intelligence-platform/#how-does-unity-catalog-change-governance&quot;&gt;&lt;span&gt;How does Unity Catalog change governance?&lt;/span&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Unity Catalog is the platform&#39;s governance layer, and it is the piece that turns a pile of workspaces into one governed estate. It uses a &lt;strong&gt;three-level namespace&lt;/strong&gt;, &lt;code&gt;catalog.schema.table&lt;/code&gt;, so a fully qualified name like &lt;code&gt;prod.sales.orders&lt;/code&gt; means the same thing in every workspace attached to the metastore.&lt;/p&gt;
&lt;p&gt;Grants are standard SQL, which makes access reviews legible:&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;GRANT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; prod&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;sales&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;orders &lt;span class=&quot;token keyword&quot;&gt;TO&lt;/span&gt; &lt;span class=&quot;token identifier&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;`&lt;/span&gt;analysts&lt;span class=&quot;token punctuation&quot;&gt;`&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
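&lt;p&gt;One detail that trips up first-time setups: a table-level &lt;code&gt;SELECT&lt;/code&gt; is not enough on its own, because the principal also needs &lt;code&gt;USE CATALOG&lt;/code&gt; and &lt;code&gt;USE SCHEMA&lt;/code&gt; on the parent objects. A minimal sketch, reusing the illustrative &lt;code&gt;prod.sales.orders&lt;/code&gt; table and &lt;code&gt;analysts&lt;/code&gt; group from the example above:&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Let the group see into the catalog and the schema.
GRANT USE CATALOG ON CATALOG prod TO `analysts`;
GRANT USE SCHEMA ON SCHEMA prod.sales TO `analysts`;

-- Then grant read access to the table itself.
GRANT SELECT ON TABLE prod.sales.orders TO `analysts`;

-- Review what is currently granted on the table.
SHOW GRANTS ON TABLE prod.sales.orders;&lt;/code&gt;&lt;/pre&gt;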
&lt;p&gt;Beyond plain grants, Unity Catalog tracks &lt;strong&gt;column and table lineage&lt;/strong&gt; automatically, enforces row filters and column masks, and powers Delta Sharing for handing governed data to another organization without copying it. If you run more than one workspace, getting the catalog design right early is the decision that ages best, well before you tune a single query.&lt;/p&gt;
&lt;h2 id=&quot;where-does-mosaic-ai-fit&quot; tabindex=&quot;-1&quot;&gt;&lt;a class=&quot;header-anchor&quot; href=&quot;https://data-today.net/databricks/databricks-data-intelligence-platform/#where-does-mosaic-ai-fit&quot;&gt;&lt;span&gt;Where does Mosaic AI fit?&lt;/span&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Mosaic AI is Databricks&#39; name for the AI features layered across the platform: &lt;strong&gt;Model Serving&lt;/strong&gt; for hosting models behind an endpoint, &lt;strong&gt;Vector Search&lt;/strong&gt; for retrieval, agent tooling like Agent Bricks, and SQL-native AI functions. The last of these is the easiest on-ramp, because you can call a model straight from a query:&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;SELECT&lt;/span&gt; ai_query&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;token string&quot;&gt;&#39;databricks-meta-llama-3-3-70b-instruct&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;token string&quot;&gt;&#39;Summarize this ticket: &#39;&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;||&lt;/span&gt; ticket_body
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;AS&lt;/span&gt; summary
&lt;span class=&quot;token keyword&quot;&gt;FROM&lt;/span&gt; support&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;tickets
&lt;span class=&quot;token keyword&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The honest read is that this is the fastest-moving and least settled part of the platform, which is precisely why it deserves sceptical coverage rather than hype. The practical stance for a builder is to let these functions accelerate well-bounded tasks such as summarization, extraction, and classification, while keeping a human in the loop to review anything that touches a customer or a production write. We will track each Mosaic AI capability as it ships and judge whether it is genuinely production-ready or still a demo.&lt;/p&gt;
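&lt;p&gt;For a well-bounded task like classification, a task-specific AI function can be a tighter fit than a free-form prompt. A minimal sketch, assuming the same &lt;code&gt;support.tickets&lt;/code&gt; table as above, a hypothetical &lt;code&gt;ticket_id&lt;/code&gt; column, and an illustrative label set; the output still warrants spot checks before it drives anything downstream.&lt;/p&gt;
&lt;pre class=&quot;language-sql&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Assign each ticket to one of a fixed set of labels.
-- ticket_id and the label names are illustrative assumptions.
SELECT
  ticket_id,
  ai_classify(ticket_body, ARRAY(&#39;billing&#39;, &#39;outage&#39;, &#39;how-to&#39;)) AS category
FROM support.tickets
LIMIT 100;&lt;/code&gt;&lt;/pre&gt;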
&lt;h2 id=&quot;what-should-you-do-with-this&quot; tabindex=&quot;-1&quot;&gt;&lt;a class=&quot;header-anchor&quot; href=&quot;https://data-today.net/databricks/databricks-data-intelligence-platform/#what-should-you-do-with-this&quot;&gt;&lt;span&gt;What should you do with this?&lt;/span&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you are evaluating or running the Data Intelligence Platform, a few principles travel well:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Match compute to the workload.&lt;/strong&gt; Job clusters for jobs, SQL warehouses for BI, serverless where the startup latency hurts. Never run production on an all-purpose cluster by default.&lt;/li&gt;
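&lt;li&gt;&lt;strong&gt;Set auto-termination everywhere.&lt;/strong&gt; That means an idle timeout on every all-purpose cluster and auto-stop on every SQL warehouse. Idle compute is the most common line of waste, and on classic compute it bills twice.&lt;/li&gt;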
&lt;li&gt;&lt;strong&gt;Query &lt;code&gt;system.billing.usage&lt;/code&gt;.&lt;/strong&gt; Attribute DBUs to jobs before you try to optimize anything.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Design Unity Catalog early.&lt;/strong&gt; The three-level namespace and lineage pay off most when set before the estate sprawls.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adopt Mosaic AI deliberately.&lt;/strong&gt; Use it where review is cheap; gate it where mistakes are expensive.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This guide is the foundation. From here the studio goes deeper on each piece: Lakeflow pipelines and ingestion, Databricks SQL warehouses and materialized views, Unity Catalog governance patterns, the compute and FinOps habits that keep DBUs in check, and the Mosaic AI features as they land. The platform is moving quickly, and the goal here is the same as everywhere on Data Today: tell you what actually changed and what it means for the thing you are building.&lt;/p&gt;
&lt;h2 id=&quot;sources&quot; tabindex=&quot;-1&quot;&gt;&lt;a class=&quot;header-anchor&quot; href=&quot;https://data-today.net/databricks/databricks-data-intelligence-platform/#sources&quot;&gt;&lt;span&gt;Sources&lt;/span&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.databricks.com/&quot;&gt;Databricks Data Intelligence Platform documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.databricks.com/aws/en/admin/system-tables/billing&quot;&gt;Databricks system tables for billing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.databricks.com/aws/en/feed.xml&quot;&gt;Databricks release notes feed&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content>
  </entry>
</feed>