Small but Mighty AI
77% of enterprise AI usage relies on small models, those with fewer than 13b parameters.
Databricks, in its annual State of Data + AI report, published this survey, which among other interesting findings indicated that large models, those with 100 billion parameters or more, now represent about 15% of implementations.
In August, we asked enterprise buyers What Has Your GPU Done for You Today? They expressed concern about the ROI of some of the larger models, particularly in production applications.
Pricing from a popular inference provider shows the geometric increase in price as a function of a model's parameter count.¹
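To illustrate what a geometric price curve implies, here is a minimal sketch. Every price in it is a hypothetical placeholder for illustration, not a quote from any provider:

```python
# A sketch of "geometric" price scaling: each jump in model size multiplies
# the per-token price by a roughly constant factor rather than adding to it.
# All prices below are hypothetical placeholders, illustrative only.

hypothetical_prices = {  # $ per 1M tokens, assumed for illustration
    "7b": 0.20,
    "13b": 0.40,
    "70b": 1.60,
    "405b": 6.40,
}

sizes = list(hypothetical_prices)
for smaller, larger in zip(sizes, sizes[1:]):
    ratio = hypothetical_prices[larger] / hypothetical_prices[smaller]
    print(f"{smaller} -> {larger}: {ratio:.1f}x the price")
```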
But there are other reasons aside from cost to use smaller models.
First, their performance has improved markedly, with some of the smaller models approaching their big brothers' success. The cost delta means a smaller model can be run several times to verify its output, like an AI Mechanical Turk, as sketched below.
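A minimal sketch of that verification pattern: sample a cheap model several times and keep the majority answer. Here `small_model` is a hypothetical stand-in for a call to any sub-13b model, simulated as answering correctly 70% of the time:

```python
import random
from collections import Counter

def small_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a small (<13b) model;
    simulated here as a model that answers correctly 70% of the time."""
    return "42" if random.random() < 0.7 else "41"

def majority_vote(prompt: str, n_samples: int = 5) -> str:
    # Run the cheap model several times and keep the most common answer,
    # trading a few inexpensive extra calls for higher reliability.
    answers = [small_model(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))  # usually "42"
```

Because the per-token price gap between the smallest and largest models spans orders of magnitude, even five passes of a small model can cost far less than a single large-model call.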
Second, the latencies of smaller models are less than half those of the medium-sized models and roughly 70% lower than the mega models'.
| Llama Model | Observed Latency per Token² |
|-------------|-----------------------------|
| 7b          | 18 ms                       |
| 13b         | 21 ms                       |
| 70b         | 47 ms                       |
| 405b        | 70-750 ms                   |
Higher latency is an inferior user experience. Users don’t like to wait.
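To make the wait concrete, here is a back-of-the-envelope sketch converting the per-token latencies above into full-response times. The 250-token response length is my assumption for illustration, and 405b uses the low end of its observed range:

```python
# End-to-end streaming time implied by the observed inter-token latencies.
# The 250-token response length is an assumed figure; 405b uses its best case.

latency_ms_per_token = {"7b": 18, "13b": 21, "70b": 47, "405b": 70}
response_tokens = 250

for model, ms in latency_ms_per_token.items():
    total_s = ms * response_tokens / 1000
    print(f"Llama {model}: ~{total_s:.1f} s to stream the full answer")
```

At these rates, the 7b model streams a full answer in about 4.5 seconds, while the 405b model takes 17.5 seconds even at its best-case latency.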
Smaller models represent a significant innovation for enterprises, which can attain similar performance at two orders of magnitude less expense and half the latency.
No wonder builders view them as small but mighty.
¹ Note: I've abstracted away the additional dimension of mixture-of-experts models to make the point clearer.
² There are different ways of measuring latency, whether it's time to first token or inter-token latency.