The Premise of a New S-Curve in AI


Since July, have you noticed how much better your AI model has become? Measuring that improvement is hard. All we can do is quantify the vibe : is this one better than that one?
Elo is a score that measures how often one model wins against another, as judged by a human. Which model answers the prompt : “Describe the differences in texture between a Pink Lady and a Macoun apple” better? More often than not, the one with the higher Elo score.¹
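
To make the scoring concrete, here is a minimal sketch of how a single human vote could move two models' ratings. It assumes the classic Elo update with a 400-point scale and a K-factor of 32; the leaderboard behind the numbers in this post may fit ratings differently, and the names `expected_score`, `update`, `model_x`, and `model_y` are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win rate of A against B under the classic Elo curve (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one human judgment. K=32 is an assumed step size."""
    surprise = (1.0 if a_won else 0.0) - expected_score(rating_a, rating_b)
    delta = k * surprise
    # Whatever the winner gains, the loser gives up.
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level; a judge prefers model_x's answer.
model_x, model_y = update(1000.0, 1000.0, a_won=True)
print(model_x, model_y)
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected result does, which is why the score settles toward how often each model actually wins.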

In the last four months, the top 100 models have improved their Elo by about 60 points, with the top models now at 1339 vs 1287 in July.
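
As a rough illustration of what that gap means head to head, an Elo difference can be converted into an expected win rate. This sketch assumes the standard logistic Elo formula with a 400-point scale, which may not match the leaderboard's exact method; `win_probability` is a hypothetical helper.

```python
def win_probability(elo_gap: float) -> float:
    """Expected win rate of the higher-rated model, given its Elo lead (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))


# Top-model scores quoted in the post: 1339 now vs 1287 in July.
gap = 1339 - 1287
p = win_probability(gap)
print(f"{gap}-point gap -> {p:.1%} expected win rate")  # roughly 57%
```

Under these assumptions, today's top model would be preferred over July's in roughly 57 of 100 head-to-head comparisons.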

The biggest performance gains occurred in the middle of the distribution. Researchers have driven significantly more performance with innovations in algorithms.

| Model Size | Win Probability Increase (%) | Definition |
|---|---|---|
| Small | 32.0% | < 10b parameters |
| Medium | 22.4% | 10b – 100b parameters |
| Large | 29.6% | 100b – 200b parameters |
| Mega | 25.9% | 200b+ parameters |

The smallest models have improved the most : in October, they win nearly a third more often than they did four months ago. Every size class has improved its competitive win rate by more than 20%.

In July, we posed the question : what happens when model performance asymptotes? Progress in small, medium, & large models is linear in Elo terms.
But the mega models show more points of inflection in the data, suggesting the recent innovations in reasoning & scale (the biggest models have grown from 200b parameters to more than 400b) have produced the beginning of a new high-growth S-curve.

1 See the Bradley-Terry model.