Top Themes in Data in 2025
There are two opposing forces in the world of data: an overall consolidation within the modern data stack & a massive expansion driven by AI capabilities. AI is rewriting every rule about what’s possible with data in 2025.
Here are Theory’s Top Themes in Data in 2025 with the full presentation at the bottom.
The Great Consolidation. After a decade of expanding complexity in the modern data stack, companies are looking to dramatically simplify their architectures to drive better results. Buyers we speak to say, “Do not sell me another tool.”
As a result, we are seeing consolidation onto individual cloud data warehousing platforms, chiefly Snowflake & Databricks, where most enterprises have picked their dominant architecture. There is also a wave of consolidation within BI, favoring collaborative BI tools like Omni that balance centralized & decentralized control. There is a race to command more & more compute within these consolidated platforms because the majority of revenue & ultimately profits reside there.
The office of the CFO continues to apply pressure to drive more ROI on core data & AI. This is a surprise, given that data budgets grew unabated from 2010 to 2022 & that interest in AI remains fervent. In particular, this has pressured the cloud data warehouses, & customers are looking for novel architectures to dramatically reduce cloud data warehouse spend where possible. 50% cost savings are possible with newer transformation architectures like SQLMesh / Tobiko Data.
Scale-Up Architectures. New cloud data storage formats are increasing in importance, although adoption has been slower than expected because they lack the relevant enterprise tools. In the long term, however, these formats drive a rise in workload-specific query engines like MotherDuck & DataFusion.
These query engines are typically scale-up rather than scale-out. This means developers can start on their local machines & take advantage of the phenomenal computing power of their MacBooks to handle the vast majority of workloads.
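To make the scale-up idea concrete, here is a minimal sketch of an in-process query that runs entirely on one machine. It uses Python's built-in sqlite3 module purely as a stand-in, since it ships with the language; it is not MotherDuck or DataFusion, & the table, function name, & sample rows are made up for illustration.

```python
import sqlite3

# Stand-in only: sqlite3 is used because it ships with Python. It is not
# MotherDuck or DataFusion, but it also runs in-process on a single machine.
def top_customers(rows, limit=2):
    """Aggregate order rows locally and return the biggest spenders."""
    conn = sqlite3.connect(":memory:")  # no cluster, no server to provision
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) AS total "
        "FROM orders GROUP BY customer "
        "ORDER BY total DESC, customer LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data
sample_orders = [("acme", 120.0), ("globex", 75.5), ("acme", 30.0), ("initech", 10.0)]
print(top_customers(sample_orders))
```

The point is the shape of the workflow rather than the engine: the query starts on a laptop with no infrastructure, & only moves to bigger hardware if the data outgrows it.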
Agentic Data. If the IT department in the future is the HR department for AI agents, we should expect that the data team will be transformed just as much as every other team. Historically, there has been a divide between software engineering & AI/ML teams, & that will change. Many software engineering practices, like virtual environments, will come to data.
We should expect the vast majority of SQL queries to be executed by AI. To make these queries accurate, data modeling becomes an absolutely essential technology to eliminate hallucinations & guarantee quality. In addition, data observability tools like Monte Carlo will become increasingly important as data feeds not only the core BI & analytics layers but also the AI systems that are key parts of every production application, both internal & external.
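One way a data model could constrain AI-generated queries is sketched below: the agent selects a named metric from a reviewed catalog instead of writing free-form SQL. This is an illustration under assumptions, not how any particular product works. The METRICS catalog, the run_metric helper, & the orders table are hypothetical, & sqlite3 is used only because it ships with Python.

```python
import sqlite3

# Hypothetical modeled layer: each metric maps to one reviewed SQL statement.
METRICS = {
    "total_revenue": "SELECT SUM(amount) FROM orders",
    "order_count": "SELECT COUNT(*) FROM orders",
}

def run_metric(conn, metric_name):
    """Run a vetted metric; refuse anything outside the model."""
    if metric_name not in METRICS:
        # An agent asking for an undefined metric fails loudly
        # instead of guessing at table or column names.
        raise KeyError(f"unknown metric: {metric_name}")
    return conn.execute(METRICS[metric_name]).fetchone()[0]

# Hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (15.5,), (4.5,)])

print(run_metric(conn, "total_revenue"))
print(run_metric(conn, "order_count"))
```

Restricting the agent to modeled metrics trades flexibility for accuracy: every answer traces back to a definition a human has reviewed.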
AI should drive some pretty significant efficiencies that parallel the 25% to 50% productivity gains across Google, Microsoft, & ServiceNow, or the $275 million in cost savings Amazon realized by migrating from one version of Java to another. This should free budgets for new initiatives.
Smaller models will dominate within the enterprise, with the sweet spot somewhere between 10 billion & 70 billion parameters, because of a 600X difference in inference cost & very similar levels of accuracy. If the wave of innovation from DeepSeek is any indication, there’s significantly better performance & continued deflation in inference cost ahead over the short & intermediate term.
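A rough back-of-the-envelope sketch shows what a 600X gap means for a budget. Only the 600X ratio comes from the text above; the price per million tokens & the monthly token volume are hypothetical numbers chosen for easy arithmetic.

```python
def monthly_inference_cost(tokens_per_month, price_per_million_tokens):
    """Dollar cost of serving a month of traffic at a flat per-token price."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens

COST_RATIO = 600                    # the 600X difference cited in the text
SMALL_MODEL_PRICE = 0.25            # hypothetical: dollars per million tokens
LARGE_MODEL_PRICE = SMALL_MODEL_PRICE * COST_RATIO
TOKENS_PER_MONTH = 2_000_000_000    # hypothetical workload

small = monthly_inference_cost(TOKENS_PER_MONTH, SMALL_MODEL_PRICE)
large = monthly_inference_cost(TOKENS_PER_MONTH, LARGE_MODEL_PRICE)
print(f"small model: ${small:,.0f} per month")
print(f"large model: ${large:,.0f} per month")
```

Under these assumed numbers the same workload costs $500 a month on the smaller model & $300,000 on the larger one, which is why similar accuracy at the small end is so compelling.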
On a fun note, I synthesized this presentation using AI. If you have feedback, let me know, & the full AI speaker notes are available here.