A Series of Unfortunate Decisions


When a person asks an LLM a question, the LLM responds. But there’s a good chance the answer contains an error. Depending on the model or the question, that chance could be 10%, 20%, or much higher.
The inaccuracy could be a hallucination (a fabricated answer) or a wrong answer or a partially correct answer.

So a person can enter many different types of questions & receive many different types of answers, some of which are correct & some of which are not.
In this chart, the arrow out of the LLM represents a correct answer. Askew arrows represent errors.
Today, when we use LLMs, a human usually checks the output after every step. But startups are pushing the limits of these models by asking them to chain tasks together.
Imagine I ask an LLM-chain to make a presentation about the best cars to buy for a family of 5 people. First, I ask for a list of those cars, then I ask for a slide on the cost, another on fuel economy, yet another on color selection.
The AI must plan what to do at each step. It starts by finding the car names, then searches the web (or its memory) for the necessary data, then creates each slide.

As AI chains these calls together, the universe of potential outcomes explodes.
If the LLM errs at the first step (it finds 4 cars that exist, 1 hallucinated car, & a boat), the remaining effort is wasted. The error compounds from the first step & the deck is useless.
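To put rough numbers on the compounding: if each step is correct with some probability and steps fail independently, the chance the whole deck comes out right is the product of the per-step accuracies. This is a minimal sketch under two simplifying assumptions that are mine, not the article’s: errors are independent, and any single error spoils the final output. The function name `chain_success_probability` is hypothetical.

```python
import math


def chain_success_probability(step_accuracies):
    """Probability that every step in a chain is correct.

    Assumes (hypothetically) that step errors are independent and that
    one wrong step ruins the final result, so the probabilities multiply.
    """
    return math.prod(step_accuracies)


# Four steps from the car example: car list, cost, fuel economy, color.
ten_percent_errors = [0.9] * 4
twenty_percent_errors = [0.8] * 4

print(f"{chain_success_probability(ten_percent_errors):.4f}")     # 0.6561
print(f"{chain_success_probability(twenty_percent_errors):.4f}")  # 0.4096
```

With a 10% error rate per step, the four-step deck is fully correct only about 66% of the time; at 20%, about 41%. Each added step multiplies in another factor below 1, which is why longer chains degrade quickly.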
As we build more complex workloads, managing errors will become a critical part of building products.
Design patterns for this are early. I imagine it this way:

At the end of every step, another model validates the output of the AI. Perhaps this is a classical ML classifier that checks the output of the LLM. It could also be an adversarial model, like the discriminator in a GAN, that tries to find errors in the output.
The effectiveness of the overall chained AI system depends on minimizing the error rate at each step. Otherwise, AI systems will make a series of unfortunate decisions & their work won’t be very useful.
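One way to picture the validate-after-every-step pattern is a pipeline in which each step’s output must pass a validator before the next step consumes it, with a bounded number of retries. This is a minimal sketch, not a described implementation: `run_chain`, `run_validated_step`, `StepRejected`, and the retry policy are hypothetical, and the fake “LLM” step and catalog-lookup validator stand in for a real model call and a real classifier.

```python
from typing import Any, Callable, List, Tuple

Step = Callable[[Any], Any]        # takes the previous output, returns a new one
Validator = Callable[[Any], bool]  # True when the output is accepted


class StepRejected(Exception):
    """Raised when a step fails validation on every attempt."""


def run_validated_step(name: str, step: Step, validate: Validator,
                       previous: Any, max_attempts: int = 3) -> Any:
    # Retry the step until the validator accepts its output.
    for _ in range(max_attempts):
        output = step(previous)
        if validate(output):
            return output
    raise StepRejected(f"step {name!r} failed validation {max_attempts} times")


def run_chain(steps: List[Tuple[str, Step, Validator]],
              max_attempts: int = 3) -> Any:
    # A bad output is caught here instead of flowing into later steps.
    result: Any = None
    for name, step, validate in steps:
        result = run_validated_step(name, step, validate, result, max_attempts)
    return result


# Hypothetical stand-in for an LLM call: the first attempt includes a
# hallucinated car and a boat; the retry returns four real minivans.
attempts = []


def list_family_cars(_previous):
    attempts.append(1)
    if len(attempts) == 1:
        return ["Honda Odyssey", "Toyota Sienna", "Zephyr X9", "Pontoon Boat"]
    return ["Honda Odyssey", "Toyota Sienna", "Kia Carnival", "Chrysler Pacifica"]


KNOWN_CARS = {"Honda Odyssey", "Toyota Sienna", "Kia Carnival", "Chrysler Pacifica"}


def all_known_cars(cars):
    # Placeholder for a classifier: accept only names in a known catalog.
    return all(car in KNOWN_CARS for car in cars)


def cost_slide(cars):
    return {"title": "Cost", "cars": cars}


def slide_has_four_cars(slide):
    return len(slide["cars"]) == 4


deck = run_chain([
    ("car list", list_family_cars, all_known_cars),
    ("cost slide", cost_slide, slide_has_four_cars),
])
print(len(attempts), deck["cars"])
# 2 ['Honda Odyssey', 'Toyota Sienna', 'Kia Carnival', 'Chrysler Pacifica']
```

Failing loudly with `StepRejected` after a few attempts is one design choice among several; a system could instead escalate to a human, which mirrors how people check LLM output today.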