# Data Today

> Data Today turns messy AI and data trends into charts, sourced numbers, and operator-grade conclusions for people who build and ship with the technology.

Data Today is a data and AI newsroom and a datastudy.nl project. Content is published as Markdown articles with original charts and datasets. Everything here is free to read and free to cite with attribution to Data Today (https://data-today.net).

## Sections
- Engineering: https://data-today.net/sections/engineering/ (33 articles)
- AI: https://data-today.net/sections/ai/ (14 articles)
- Research: https://data-today.net/sections/research/ (13 articles)
- Business: https://data-today.net/sections/business/ (11 articles)
- Data: https://data-today.net/sections/data/ (5 articles)
- Opinion: https://data-today.net/sections/opinion/ (1 article)

## Articles
- [Snowflake Hybrid Tables get faster without request fees](https://data-today.net/snowflake/snowflake-hybrid-tables-speed/): Snowflake Hybrid Tables are row-store tables for OLTP-style work in Snowflake. New preview optimizations report up to 8x throughput.
- [Snowflake Iceberg ADLS writes are finally usable](https://data-today.net/snowflake/snowflake-iceberg-adls-writes/): Snowflake Iceberg ADLS support is the GA path for Azure teams to read and write externally managed Iceberg tables without moving storage.
- [AgentPerf puts Blackwell’s agent lead at 20x per watt](https://data-today.net/agentperf-blackwell-agent-benchmark/): AgentPerf is a benchmark for concurrent AI agents. Its first results put NVIDIA GB300 NVL72 at up to 20x Hopper efficiency.
- [olmo-eval brings statistical discipline to LLM loops](https://data-today.net/olmo-eval-model-loop/): olmo-eval is an open workbench for iterative LLM evaluation. It makes tiny checkpoint gains harder to mistake for progress.
- [Amazon data centers put 2.5B gallons on the AI bill](https://data-today.net/amazon-data-center-water/): Amazon data centers used 2.5 billion gallons of water in 2025. Treat that disclosure as a roadmap risk for AI products.
- [Generative engine optimization: how to rank inside an LLM](https://data-today.net/generative-engine-optimization/): Generative engine optimization (GEO) is the practice of getting your content cited inside AI answers from ChatGPT, Perplexity and Google's AI Overviews. Here is what earns a citation and what to change on your site.
- [Miasma worm: live coverage of the Red Hat npm attack](https://data-today.net/miasma-hack-live/): Miasma is a self-propagating npm worm. It hijacked Red Hat's GitHub Actions OIDC trusted publishing to ship 96 backdoored @redhat-cloud-services versions whose preinstall hook runs a Bun credential stealer that then spreads with the secrets it steals.
- [VS Code Autopilot puts agent risk in every default](https://data-today.net/vscode-autopilot-default-risk/): VS Code Autopilot is now enabled by default, giving coding agents more autonomy. Treat the new default as a policy change, not a shortcut.
- [Agentic commerce gets Visa’s ChatGPT payment rails](https://data-today.net/agentic-commerce-visa-chatgpt/): Agentic commerce is AI agents completing purchases with permission. Visa’s ChatGPT deal brings it to 4.8 billion credentials.
- [Claude Fable 5 guardrails get a visibility reset](https://data-today.net/fable-guardrails-visibility-reset/): Claude Fable 5 guardrails are now visible after backlash. Builders should log fallback events, cost, and retention before trusting runs.
- [Multi-agent safety gets Google’s $10 million test](https://data-today.net/multi-agent-safety-google-funding/): Multi-agent safety is the problem of keeping interacting AI agents from amplifying failure. Google’s $10 million bet starts small.
- [Confidential inference makes Apple’s cloud AI real](https://data-today.net/confidential-inference-apple-cloud/): Confidential inference lets Apple run AFM 3 Cloud Pro on NVIDIA GPUs in Google Cloud while keeping PCC privacy promises.
- [What is MCP? The Model Context Protocol for data engineers](https://data-today.net/mcp-explained/): MCP is an open standard that lets AI models call your tools and data through one connector instead of a custom integration per model. Here is what it means for data engineers.
- [Siri AI makes Apple’s platform bet depend on Google](https://data-today.net/siri-ai-google-rollout/): Siri AI is Apple’s rebuilt assistant, but its strongest model jump is Gemini-built: AFM 3 Cloud won 64.7 percent of text tests.
- [WhatsApp AI assistants become EU gatekeeping test](https://data-today.net/whatsapp-ai-gatekeeping-test/): WhatsApp AI assistants are now an EU platform access fight: Meta must reopen WhatsApp for rivals for free or risk a fine near $20.1 billion.
- [The AI plateau is mostly a goalpost that keeps moving](https://data-today.net/ai-plateau-goalpost/): An AI plateau is mostly an illusion: old benchmarks maxed out and the AGI goalposts keep moving, even as the frontier capability curve keeps climbing.
- [Fable 5 and Mythos 5 are one model with a bouncer](https://data-today.net/fable-5-mythos-5-routing/): Claude Fable 5 and Claude Mythos 5 are the same Anthropic weights; a runtime classifier, not the model you call, decides which capability you actually get.
- [Miasma worm turns AI coding agents into repo traps](https://data-today.net/miasma-worm-agent-repos/): The Miasma worm is a credential stealer that hit 73 Microsoft GitHub repos. Treat agent-opened clones as compromised, not suspicious.
- [Seattle data center moratorium tests AI’s power grab](https://data-today.net/seattle-data-center-moratorium/): The Seattle data center moratorium is a 365-day pause on new large facilities. Treat 369 MW as the warning label for AI roadmaps.
- [Xcode 27 makes Apple AI the next app platform toll](https://data-today.net/xcode-apple-ai-platform-toll/): Xcode 27 is Apple's AI development reset: free Private Cloud Compute for small apps, agent hooks, and five AFM 3 models change build calculus.
- [Attack selection makes AI agent safety look too high](https://data-today.net/attack-selection-agent-safety/): Attack selection lets AI agents choose when to cheat. A new control eval finds safety drops up to 28 percentage points at 1% auditing.
- [CrowdMath exposes the math gap AI agents still miss](https://data-today.net/crowdmath-math-agents-gap/): CrowdMath is a dataset of 164 annotated math research chains. Use it to test whether models understand progress, not just answers.
- [London robotaxi race starts with Uber interest list](https://data-today.net/london-robotaxi-waitlist/): London robotaxi service is moving from policy to product with Uber and Wayve sign-ups, but safety drivers and tiny fleets keep it a pilot.
- [Matillion Context Engine: setup, limits, and costs](https://data-today.net/matillion/matillion-context-engine-costs/): Matillion Context Engine is a public preview knowledge graph for Maia AI Agents. Start with restricted domain graphs and watch crawler scope.
- [Snowflake Adaptive Compute: the bill owner playbook](https://data-today.net/snowflake/snowflake-adaptive-compute-playbook/): Snowflake Adaptive Compute is query-based adaptive warehouse compute. Use its 2 knobs to cap bursts, monitor credits, and avoid blind migrations.
- [ICEBERG_MERGE_ON_READ_BEHAVIOR changes Iceberg DML](https://data-today.net/snowflake/snowflake-iceberg-merge-read/): ICEBERG_MERGE_ON_READ_BEHAVIOR is Snowflake's GA switch for Iceberg DML mode. AUTO sends 3 of 4 table cases to merge-on-read.
- [AI data center backlash hits Shelbyville's $2B bet](https://data-today.net/ai-data-center-backlash/): AI data center backlash is now a zoning risk: Shelbyville's $2B Prologis campus shows trust can bottleneck compute before power does.
- [Matillion Context Engine grounds Maia agent work](https://data-today.net/matillion/matillion-context-engine/): Matillion Context Engine is a public preview metadata layer that gives Maia AI Agents knowledge graphs for safer pipeline work.
- [Matillion's Data Productivity Cloud, explained for builders](https://data-today.net/matillion/matillion-data-productivity-cloud/): The Data Productivity Cloud is Matillion's cloud-native platform for building, running, and orchestrating data pipelines with low-code, code, and AI. Here is how the pieces fit and where the cost goes.
- [How long would it take to brute force your password with today's computing power?](https://data-today.net/password-cracking-heatmap/): A random 12-character password can outlast every computer on Earth, but an 8-character one on a weak hash falls in hours. Here is the brute-force math, mapped.
- [Snowflake Adaptive Compute: a bill owner's guide](https://data-today.net/snowflake/snowflake-adaptive-compute-costs/): Snowflake Adaptive Compute is query-billed warehouse compute with 2 tuning knobs. Use it when tuning costs more than control.
- [Snowflake Adaptive Compute rewrites warehouse sizing](https://data-today.net/snowflake/snowflake-adaptive-compute-sizing/): Snowflake Adaptive Compute is query-billed warehouse compute. Its default XLARGE cap and multiplier 2 force FinOps teams to retest sizing.
- [Building an AI agent on your Snowflake data with Cortex Agents](https://data-today.net/snowflake/snowflake-cortex-agents/): Cortex Agents let you build an AI agent that reasons across structured tables, unstructured documents, and custom tools inside Snowflake. It orchestrates a plan-act-reflect loop, and the bill is the sum of four meters you need to watch.
- [Snowflake Cortex: running LLMs where your data already lives](https://data-today.net/snowflake/snowflake-cortex-ai-functions/): Snowflake Cortex AI exposes LLMs as SQL functions like AI_COMPLETE and AI_CLASSIFY, so you run inference without moving data out. It is billed per token by model, and cost swings roughly 40x between a small model and a frontier one, so model choice is the budget.
- [Cortex Analyst: let business users query Snowflake in plain English](https://data-today.net/snowflake/snowflake-cortex-analyst/): Cortex Analyst is Snowflake's managed text-to-SQL service: business users ask questions in natural language and get answers without writing SQL. The accuracy depends almost entirely on a semantic view you build, and it is billed per message, not per token.
- [Cortex Search: the managed RAG engine inside Snowflake](https://data-today.net/snowflake/snowflake-cortex-search/): Cortex Search is Snowflake's managed hybrid search service: it embeds your text, runs vector plus keyword retrieval with reranking, and keeps the index fresh, so you can build RAG and enterprise search without standing up a vector database. The cost is several meters, and serving compute bills even when idle.
- [Snowflake Gen2 warehouses: faster compute you have to opt into](https://data-today.net/snowflake/snowflake-gen2-warehouses/): Snowflake Gen2 standard warehouses run on faster hardware and finish most queries quicker, but they bill at a higher credit rate and you often have to switch with a SQL clause. Whether they save money depends entirely on whether the speedup beats the rate.
- [Snowflake RBAC and masking: lock it down without grinding to a halt](https://data-today.net/snowflake/snowflake-governance-rbac/): Snowflake access control is role-based: privileges attach to roles, not users. Splitting access roles from functional roles collapses thousands of direct grants to a few hundred, and one masking policy can protect thousands of columns at query time.
- [Should you put your data lake in Snowflake Iceberg tables?](https://data-today.net/snowflake/snowflake-iceberg-tables/): Snowflake Iceberg tables store data in open Apache Iceberg format in your own cloud storage, so Snowflake bills zero storage and other engines can read it. The catch: only Snowflake-managed catalog tables get full platform support, external-catalog tables lose clustering, cloning, and replication.
- [Snowflake Openflow: managed ingestion built on Apache NiFi](https://data-today.net/snowflake/snowflake-openflow/): Openflow is Snowflake's managed data integration service, built on Apache NiFi, that moves structured and unstructured data from any source into Snowflake. It runs in two modes: inside Snowflake on container services, or in your own cloud VPC, and each bills and isolates differently.
- [Snowflake clustering: when a clustering key actually pays off](https://data-today.net/snowflake/snowflake-performance-clustering/): Snowflake clustering keys reorganize micro-partitions so queries prune more of the table. On a well-clustered date column a one-day query can scan 140 micro-partitions instead of 9,800, but automatic clustering bills credits, so it only pays on big, filtered, slow-changing tables.
- [Dynamic Tables vs Streams and Tasks: which Snowflake pipeline to build](https://data-today.net/snowflake/snowflake-pipelines-streams-tasks/): Dynamic Tables let you declare a SELECT and a target lag and let Snowflake refresh it; Streams and Tasks give you imperative control. Dynamic Tables bottom out at a 60-second minimum lag, so sub-minute freshness still means Snowpipe Streaming or Tasks.
- [Snowflake warehouse sizing: stop paying for idle compute](https://data-today.net/snowflake/snowflake-warehouse-sizing/): Snowflake warehouse sizing sets your credit burn: each size up doubles credits per hour, so a 4XL costs 128 credits per hour against an XS at 1. Right-size by workload, not by habit.
- [Covert LLM agents need persuasion audits, not labels](https://data-today.net/covert-llm-persuasion-audits/): Covert LLM agents used identity, authority and bias triggers on Reddit. Treat persuasion as a safety surface, not a label problem.
- [LLM judges crack under post-decision challenge tests](https://data-today.net/llm-judges-follow-up/): LLM judges are stable on reruns but reversible after challenge. A new ACL paper says evals must test interaction, not just scores.
- [SentinelBench tests whether AI agents can wait well](https://data-today.net/sentinelbench-monitoring-agents/): SentinelBench is a 100-task benchmark for monitoring agents. Use it to test patience, latency and tool spend before you ship.
- [What an AI prompt really costs in energy, water and CO2](https://data-today.net/ai-energy-water-per-query/): A typical AI text prompt uses roughly 0.24 to 0.34 watt-hours and a fraction of a milliliter of water, not a bottle per email. Training is huge but rare, and inference is what adds up.
- [AI persuasion got its Reddit field test, and it was ugly](https://data-today.net/ai-persuasion-reddit-field-test/): AI persuasion is no longer a lab-only risk: Reddit bots hit an 18 percent delta rate, but the new audit shows the tactic stack matters.
- [EVA-Bench Data 2.0 makes voice agents measurable](https://data-today.net/eva-bench-voice-agents/): EVA-Bench Data 2.0 is a 213-scenario dataset for testing enterprise voice agents across airline, IT, and healthcare HR workflows.
- [AI agent security failed in Meta’s support desk test](https://data-today.net/meta-agent-security-fail/): AI agent security is privileged access control for LLMs. Meta’s Instagram hack shows one support bot can turn account recovery into takeover.
- [Why Palantir's data software is worth $400 billion](https://data-today.net/palantir-data-software-worth/): Palantir builds data integration software that fuses an organization's siloed records into one model its AI can act on. Here is why that business now carries a market value near $400 billion.
- [TeamPCP: the crew behind the npm worm hitting Red Hat](https://data-today.net/teampcp-threat-actor/): TeamPCP is the cybercrime crew behind the Shai-Hulud npm worm. It open-sourced the malware in May 2026, then poisoned Red Hat's packages and blurred the question of who to blame.
- [World Cup 2026 tickets: dynamic pricing takes over](https://data-today.net/world-cup-2026-dynamic-pricing/): Dynamic pricing means FIFA moves World Cup 2026 ticket prices with demand, like an airline. Seats run from $60 to $6,730, helping push the tournament toward a record $8.9 billion.
- [Audit then score turns AI ground truth into a process](https://data-today.net/audit-then-score-ground-truth/): Audit then score is a benchmark protocol that revises labels before grading. Amazon says it lifted expert accuracy from 60.8% to 90.9%.
- [Generalist agents hit data curation’s hard budget wall](https://data-today.net/generalist-agents-data-curation/): Generalist agents can run data curation loops, but one benchmark shows they need scaffolds to beat baselines at 10 percent data budget.
- [Virtual power plants meet AI’s data center power wall](https://data-today.net/virtual-power-plants-data-centers/): Virtual power plants are flexible loads coordinated as capacity. Google’s 100 MW Voltus deal tests whether homes will flex for AI.
- [Multi-agent debate needs a boring data-cleaning cop](https://data-today.net/debate-data-cleaning/): Multi-agent debate hurt generation by up to 15.5 points in data cleaning, but a grounded critic rescued detection and repair.
- [Google AI opt out gives publishers a real lever now](https://data-today.net/google-ai-opt-out-lever/): Google AI opt out rules in the UK give publishers control over AI Search, but the hard choice is whether to trade traffic for leverage.
- [Unit distance proof moves AI past clever math demos](https://data-today.net/unit-distance-openai-proof/): The unit distance proof gives OpenAI an AI math win: n^1.014 unit pairs, with humans still doing the verification work.
- [Vibe coding is loud online and rare in real codebases](https://data-today.net/vibecoding-to-production/): Vibe coding dominates AI discourse, but most professional developers still avoid it. The 2025 survey data shows why: the output is almost right too often.
- [AI capability stopped slowing down. It sped up after 2024](https://data-today.net/capability-acceleration/): Model capability is improving about 15.5 ECI a year and the rate rose after early 2024. The expected plateau never arrived, which complicates every roadmap built around one.
- [The bill for frontier AI is now measured in gigawatts](https://data-today.net/labs-promise-agi-consultants/): Frontier AI spending has shifted from clever algorithms to power and concrete. A single gigawatt data center now costs about 30 billion dollars to build.
- [Your agents are working too hard](https://data-today.net/agents-working-too-hard/): AI agents are not yet mainstream, and the developers who use them report personal speed but little team-wide gain. The fix is teaching agents when to stop.
- [The model leaderboard is tightening to a photo finish](https://data-today.net/benchmark-heatmap-explainer/): The performance gap between the best AI models and the rest is collapsing. Aggregate leaderboard scores now hide more than they reveal about real strengths.
- [AI chips give 24x more per dollar, if you can afford the sticker](https://data-today.net/hardware-value-per-dollar/): AI chip performance per dollar improves about 37 percent a year, yet each new flagship costs more upfront. The GB300 delivers 24 times the value of a P100 at nine times the price.
- [Everyone automated everything. Then they hired more people.](https://data-today.net/everyone-automated-everything/): Organisational AI use jumped to 78 percent in 2024, yet demand for human judgment rose alongside it. The productivity paradox is back in a new form.
- [Three-quarters of the world's AI compute sits in one country](https://data-today.net/compute-geography/): The United States holds about 75 percent of global GPU cluster performance. That concentration shapes pricing, latency, and policy for everyone building elsewhere.
- [The scaling curve nobody wants to extrapolate](https://data-today.net/scaling-curve-extrapolate/): Training compute for frontier models has grown about 5 times a year since 2020. The scatter looks clean, but every data point sits to the left of the real question.
- [Training costs rise 3.5x a year. Efficiency is the only brake](https://data-today.net/training-cost-vs-efficiency/): Frontier training costs climb about 3.5 times a year while algorithms get 3 times more efficient. The two trends are racing, and the gap decides who can still compete.
- [How cheap inference made the one-person studio viable](https://data-today.net/without-ai-quit-my-job/): Building a one-person data studio used to be impossible on the economics alone. Inference at a fixed quality level fell roughly 280 times in two years, and that changed the math.
- [Most agentic AI projects will be cancelled before they ship](https://data-today.net/agentic-cancellation-cliff/): Gartner expects over 40 percent of agentic AI projects to be cancelled by end of 2027. The drop-off from demo to production is where the budgets quietly die.
- [The gigawatt build-out is the real AI race now](https://data-today.net/gigawatt-buildout/): The largest AI data center already rivals 700,000 H100 chips, and a 5-million-equivalent campus is due by 2027. Power and concrete, not chips, set the new pace.
- [Open-weight models closed the gap to 1.7 points](https://data-today.net/open-models-default/): Open-weight models went from a budget compromise to a default choice. On some benchmarks the gap to the best closed models shrank from 8 points to 1.7 in a year.
- [Last year's frontier model now runs on a laptop](https://data-today.net/consumer-hardware-lag/): Frontier AI capability reaches consumer hardware in about eight months. The shrinking gap turns today's hosted-only features into tomorrow's on-device default.
- [Context windows grew 30x a year. Your retrieval stack noticed](https://data-today.net/context-windows-30x/): LLM context windows have expanded about 30 times a year since 2023, from a few thousand tokens to over a million. The change quietly rewrites how RAG systems should be built.
- [The world's AI compute now doubles every seven months](https://data-today.net/compute-stock-doubling/): The installed stock of AI chips is growing 3.4 times a year, doubling every seven months. The capacity question has quietly shifted from chips to power and buildings.
- [Inference got cheap, but not for the work that pays](https://data-today.net/inference-price-uneven/): LLM inference costs fell between 9 and 900 times a year depending on the task. The cheapest gains landed on easy work, while frontier reasoning barely moved.

## Resources
- Full article index: https://data-today.net/articles/
- Datasets: https://data-today.net/datasets/
- Sitemap: https://data-today.net/sitemap.xml
- Atom feed: https://data-today.net/feed.xml

## Citation
When referencing this material, cite "Data Today" and link to the source article URL.