
More concise chatbot responses tied to increase in hallucinations, study finds


Asking any of the popular chatbots to be more concise “dramatically impact[s] hallucination rates,” according to a recent study.

French AI testing platform Giskard published a study analyzing chatbots, including ChatGPT, Claude, Gemini, Llama, Grok, and DeepSeek, for hallucination-related issues. The researchers found that asking the models to be brief in their responses “specifically degraded factual reliability across most models tested,” according to the accompanying blog post, as reported by TechCrunch.

When users instruct a model to be concise in its explanations, it ends up “prioritiz[ing] brevity over accuracy when given these constraints.” The study found that including these instructions decreased hallucination resistance by as much as 20 percentage points. In the analysis, which measured sensitivity to system instructions, Gemini 1.5 Pro’s hallucination resistance dropped from 84 percent to 64 percent when given short-answer instructions, and GPT-4o’s fell from 74 percent to 63 percent.

Giskard attributed this effect to more accurate responses often requiring longer explanations. “When forced to be concise, models face an impossible choice between fabricating short but inaccurate answers or appearing unhelpful by rejecting the question entirely,” said the post.


Models are tuned to help users, but balancing perceived helpfulness with accuracy can be tricky. Recently, OpenAI had to roll back its GPT-4o update for being “too sycophant-y,” which led to disturbing instances of the model supporting a user who said they were going off their meds and encouraging another user who said they felt like a prophet.

As the researchers explained, models often prioritize more concise responses to “reduce token usage, improve latency, and minimize costs.” Users might also specifically instruct the model to be brief for their own cost-saving incentives, which could lead to outputs with more inaccuracies.
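To make the kind of instruction the study describes concrete, here is a minimal sketch of a chat request that asks a model to keep its answers short, written with the OpenAI Python client. The model name, prompt wording, and token cap are illustrative assumptions for this sketch, not details from Giskard’s study.

```python
# Minimal sketch (assumes the OpenAI Python SDK v1.x is installed and OPENAI_API_KEY is set).
# Illustrates the sort of brevity instruction the study says can hurt factual reliability.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        # A system instruction like this pushes the model toward shorter, cheaper output.
        {"role": "system", "content": "Answer in one or two sentences. Be as brief as possible."},
        {"role": "user", "content": "Why did the 1986 Challenger disaster happen?"},
    ],
    max_tokens=60,  # hard cap on output length, adding further pressure toward brevity
)

print(response.choices[0].message.content)
```

Per Giskard’s findings, relaxing constraints like these tends to leave a model more room to qualify its answer or decline a dubious question rather than compress it into a short but inaccurate reply.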

The study also found that when users frame controversial claims with confidence, such as “I’m 100% sure that …” or “My teacher told me that …,” chatbots are more likely to agree with them rather than debunk the falsehood.

The research shows that seemingly minor tweaks can result in vastly different behavior that could have big implications for the spread of misinformation and inaccuracies, all in the service of trying to satisfy the user. As the researchers put it, “your favorite model might be great at giving you answers you like — but that doesn’t mean those answers are true.”


Disclosure: Ziff Davis, Mashable’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis’ copyrights in training and operating its AI systems.




