Can chatbots predict the unknown?

One of the clearest ways to understand the limits of artificial intelligence is to ask it not about the past, but about the future. A chatbot can often answer questions about history, science, law, or software by drawing on information it has already learned. Forecasting is different. The correct answer does not yet exist. The task is not simply to retrieve knowledge but to reason under uncertainty, weigh competing possibilities, and express confidence appropriately.

This distinction matters because many AI systems appear highly capable when evaluated on questions with known answers. Forecasting strips away that advantage. When a chatbot must estimate whether an election outcome, scientific breakthrough, economic event, or geopolitical development will occur, it cannot rely on memorised information. It must confront uncertainty directly. Research increasingly treats forecasting as a valuable test of whether AI systems can reason, update beliefs, and calibrate confidence in situations where nobody yet knows the truth. [OpenReview]

Why forecasting differs from ordinary question answering

Most popular AI benchmarks measure performance on tasks that already have established answers. Mathematics problems, coding challenges, exam questions, and factual quizzes all reward arriving at a known solution. Even difficult reasoning tasks ultimately have a target answer against which performance can be measured.

Forecasting introduces a different challenge. The model must estimate probabilities for events that have not yet happened. Success depends not only on reasoning but also on judgement. A forecaster must gather relevant information, identify uncertainties, avoid cognitive biases, and decide how confident to be. Researchers behind ForecastBench argue that accurate forecasting requires synthesising information, guarding against overconfidence, combining evidence, and quantifying beliefs rather than merely producing fluent responses. [OpenReview]

This exposes a weakness that ordinary chatbot interactions often hide. A chatbot can sound equally confident when discussing a settled fact and when speculating about an uncertain future event. Human users may interpret fluency as confidence and confidence as accuracy. Forecasting benchmarks reveal whether that confidence is justified.

Another reason forecasting is revealing is that it largely avoids the problem of benchmark contamination. Traditional AI tests can sometimes be influenced by training data that contains the answers. Forecasting questions are unresolved when the prediction is made, so the answer cannot have been memorised. ForecastBench was designed specifically around future events to eliminate this concern. [arXiv]

What crowd forecasts reveal about model calibration

A central concept in forecasting is calibration. A well-calibrated forecaster assigns probabilities that match reality over time. If a system says an event has a 70% chance of occurring, then roughly seven out of ten such events should happen.
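As a minimal sketch of how calibration can be checked in practice, the snippet below groups forecasts into probability bins and compares the average stated probability in each bin with how often the events actually happened. The function name calibration_table, the bin count, and the sample data are all illustrative assumptions, not taken from any of the cited benchmarks.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_bins=10):
    """Bin forecasts by probability and compare each bin's mean forecast
    with the observed frequency of the event (1 = happened, 0 = did not)."""
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        # Clamp the index so that p == 1.0 lands in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        # (mean stated probability, observed frequency, number of forecasts)
        table.append((round(mean_p, 3), round(freq, 3), len(pairs)))
    return table

# Hypothetical data: ten forecasts of 70%, seven of which resolved "yes".
forecasts = [0.7] * 10
outcomes = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
table = calibration_table(forecasts, outcomes)
print(table)  # [(0.7, 0.7, 10)]
```

In this made-up case the stated probability and the observed frequency agree, which is what good calibration looks like. A forecaster whose 70% bin resolved "yes" only four times in ten would be overconfident in that range.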

Calibration matters because decision-makers often need probabilities rather than yes-or-no answers. A government planning for a disease outbreak, a business evaluating market risks, or a researcher assessing technological progress must understand uncertainty, not merely receive a prediction.

Forecasting competitions provide a useful comparison. Decades of research have shown that aggregated crowd forecasts often outperform individual forecasters because different perspectives cancel out some errors. Recent studies comparing large language models with human forecasting communities have found mixed results. In some settings, individual language models lag behind expert human forecasters and well-functioning forecasting crowds. ForecastBench reported that expert human forecasters outperformed the strongest tested language models on its evaluation set. [arXiv]

At the same time, researchers have found that combining multiple model forecasts can improve performance substantially. Some studies suggest that ensembles of language-model forecasts can approach the accuracy of human forecasting crowds, highlighting that uncertainty estimation improves when diverse predictions are aggregated rather than relying on a single answer. [arXiv, PMC]
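The effect of aggregation can be illustrated with a small sketch. It assumes the Brier score (squared error between the stated probability and the 0/1 outcome, lower is better) as the accuracy measure and a simple unweighted mean as the aggregation rule. The model names and numbers are invented for illustration, and the cited studies may use other aggregation rules.

```python
from statistics import mean

def mean_brier(forecasts, outcomes):
    """Average squared error between probabilities and 0/1 outcomes."""
    return mean((p - y) ** 2 for p, y in zip(forecasts, outcomes))

# Hypothetical forecasts from three models on the same four questions.
model_forecasts = {
    "model_a": [0.9, 0.2, 0.6, 0.3],
    "model_b": [0.6, 0.4, 0.9, 0.1],
    "model_c": [0.7, 0.1, 0.4, 0.6],
}
outcomes = [1, 0, 1, 0]

# Ensemble forecast: the unweighted mean of the models on each question.
ensemble = [mean(ps) for ps in zip(*model_forecasts.values())]

individual_scores = {
    name: mean_brier(ps, outcomes) for name, ps in model_forecasts.items()
}
avg_individual = mean(individual_scores.values())
ensemble_score = mean_brier(ensemble, outcomes)

print(individual_scores)
print("average individual:", round(avg_individual, 4))
print("ensemble:", round(ensemble_score, 4))
```

Because squared error is convex, the Brier score of the averaged forecast can never be worse than the average of the individual Brier scores. It can still be worse than the single best model, as it is with these made-up numbers, so aggregation is a hedge against not knowing in advance which forecaster to trust.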

These results are important because they reveal a difference between intelligence and calibration. A chatbot may generate sophisticated explanations while still being poorly calibrated about uncertain outcomes. Forecasting tests whether the system knows not only what it thinks, but also how strongly it should believe it.

A concrete example: when future questions expose hidden weaknesses

Forecasting researchers have repeatedly found that language models can perform impressively on many reasoning tasks yet struggle when predictions require consistency and probabilistic judgement.

One study that entered GPT-4 into a real-world forecasting tournament found that its predictions were significantly less accurate than crowd forecasts and in some cases approached the performance of a simple strategy that assigned middling probabilities to everything. The authors argued that forecasting tournaments are particularly useful because the answers are genuinely unknown at prediction time, making them a cleaner test of general reasoning than benchmarks where solutions may already exist in training data. [arXiv]
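To see why a "middling probabilities" strategy is a meaningful baseline, consider the sketch below. It assumes the Brier score as the metric (the passage above does not name one) and uses invented outcomes and forecasts. Always answering 50% scores exactly 0.25 whatever happens, so any forecaster who scores worse than 0.25 would have done better by saying nothing informative at all.

```python
def mean_brier(forecasts, outcomes):
    """Average squared error between probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(outcomes)

# Hypothetical resolved questions (1 = happened, 0 = did not).
outcomes = [1, 0, 0, 1, 0]

# Baseline: assign 50% to every question.
baseline = [0.5] * len(outcomes)

# A confident-sounding forecaster who is right on two of five questions.
overconfident = [0.95, 0.9, 0.1, 0.2, 0.85]

baseline_score = mean_brier(baseline, outcomes)
overconfident_score = mean_brier(overconfident, outcomes)

print("always 50%:", baseline_score)            # 0.25
print("overconfident:", round(overconfident_score, 3))  # 0.437
```

The overconfident forecaster is penalised heavily for the questions it got confidently wrong, which is why strong-sounding answers can end up below the uninformative baseline.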

Subsequent work showed that performance can improve dramatically when models are given external information, structured reasoning steps, retrieval systems, and forecast aggregation methods. However, this finding itself is revealing. Simply asking a chatbot for a prediction often produces weak results. Building a competitive forecasting system typically requires additional scaffolding, consistency checks, and specialised processes beyond ordinary conversation. [arXiv]

Researchers and forecasting practitioners have also noted that language models sometimes violate basic logical constraints when estimating probabilities across related events. For example, they may assign a lower probability to an event occurring by a later date than by an earlier date, even though any outcome that satisfies the earlier deadline also satisfies the later one. Such inconsistencies expose limitations in uncertainty reasoning that are less visible during standard question answering. [Vox]
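A consistency check of this kind is easy to automate. The sketch below is one possible approach, not a method from the cited sources: it takes a set of forecasts for the same event "happening by" different deadlines, flags any pair where the later deadline received the lower probability, and applies a simple running-maximum repair. The function names, dates, and probabilities are hypothetical.

```python
from datetime import date

def monotonicity_violations(forecasts):
    """forecasts: list of (deadline, probability) pairs for one event
    'occurring by deadline'. Returns adjacent pairs, ordered by deadline,
    where the later deadline has the lower probability."""
    ordered = sorted(forecasts, key=lambda f: f[0])
    return [
        ((d1, p1), (d2, p2))
        for (d1, p1), (d2, p2) in zip(ordered, ordered[1:])
        if p2 < p1
    ]

def enforce_monotone(forecasts):
    """One simple repair: never let the probability drop as the deadline
    moves later (running maximum over deadlines)."""
    repaired, running = [], 0.0
    for deadline, p in sorted(forecasts, key=lambda f: f[0]):
        running = max(running, p)
        repaired.append((deadline, running))
    return repaired

# Hypothetical model outputs for "Event X happens by <deadline>".
raw = [
    (date(2026, 6, 30), 0.40),
    (date(2026, 12, 31), 0.30),  # inconsistent: later deadline, lower probability
    (date(2027, 6, 30), 0.55),
]

violations = monotonicity_violations(raw)
repaired = enforce_monotone(raw)
print(len(violations), "violation(s)")
print([p for _, p in repaired])
```

The running maximum is only one of several reasonable repairs. Its main value here is to show that the constraint is mechanical and can be tested, which is what makes such inconsistencies a useful diagnostic.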

How uncertainty should change chatbot use

The forecasting gap does not mean chatbots are useless. On the contrary, they can be highly valuable for gathering information, identifying relevant factors, summarising competing arguments, and generating possible scenarios.

The lesson is that users should distinguish between knowledge assistance and predictive judgement.

When a chatbot explains an existing concept, much of the challenge involves retrieving and organising information. When it predicts a future outcome, the challenge becomes managing uncertainty. The same system may perform strongly in the first task and much less reliably in the second.

Forecasting research therefore encourages a more nuanced view of AI capability:

Strong language generation does not automatically imply accurate prediction.
Reasoning quality and calibration are related but distinct abilities.
Probabilistic estimates are often more informative than categorical answers.
Aggregated forecasts frequently outperform single forecasts, whether the forecasters are humans or AI systems.
Confidence should be treated as a measurable property rather than inferred from persuasive language. [arXiv, OpenReview]

For anyone trying to understand artificial intelligence, forecasting provides a useful stress test. It reveals where a chatbot’s apparent certainty rests on genuine predictive skill and where it reflects the limitations of systems that can generate convincing answers without fully understanding how uncertain the future remains. [OpenReview, metaculus.com]

Endnotes

Source: openreview.net
Title: ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
Link: https://openreview.net/forum?id=lfPkGWXLLf
Source snippet
ForecastBench: A Dynamic Benchmark of AI Forecasting...by E Karger · Cited by 57 — Forecasting is a useful testbed of LLM reas...
Source: arxiv.org
Title: ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
Link: https://arxiv.org/abs/2409.19839
Source: forecastbench.org
Link: https://www.forecastbench.org/about/
Source snippet
8 Oct 2025 — We evaluate LLMs by regularly asking them to make probabilistic forecasts about future events, thereby creating a contaminat...
Source: arxiv.org
Link: https://arxiv.org/html/2409.19839v4
Source snippet
A Dynamic Benchmark of AI Forecasting CapabilitiesWhile LLMs have achieved super-human performance on many benchmarks, they perform less...
Source: arxiv.org
Link: https://arxiv.org/html/2402.19379v4
Source snippet
LLM Ensemble Prediction Capabilities Rival Human Crowd...Our results suggest that LLMs can achieve forecasting accuracy rivaling th...
Source: pmc.ncbi.nlm.nih.gov
Title: Wisdom of the silicon crowd: LLM ensemble prediction
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11800985/
Source snippet
by P Schoenegger · 2024 · Cited by 97 — Our findings suggest that LLM predictions can rival the human crowd's forecasting accuracy thr...
Source: arxiv.org
Title: Approaching Human-Level Forecasting with Language Models
Link: https://arxiv.org/abs/2402.18563
Source: arxiv.org
Link: https://arxiv.org/abs/2310.13014
Source snippet
Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting TournamentOctober 17, 2023...

Published: October 17, 2023
Source: vox.com
Link: https://www.vox.com/future-perfect/411742/ai-forecasting-prediction-metaculus-llm
Source snippet
Competitions like those on Metaculus show that human experts consistently beat AI forecasters, although the performance gap is narrowing...
Source: metaculus.com
Title: Introducing FutureEval, our new home for AI forecasting
Link: https://www.metaculus.com/notebooks/42136/introducing-futureeval-our-new-home-for-ai-forecasting/
Source snippet
Introducing FutureEval, our new home for AI forecasting12 Feb 2026 — FutureEval is Metaculus's new benchmark that measures how well AI ag...
Source: forecastbench.org
Link: https://www.forecastbench.org/
Source snippet
Explore how LLM forecasting accuracy evolves on ForecastBench. A linear trend projects the date when LLMs reach superforecas...
Source: metaculus.com
Title: Exploring Metaculus’s AI Track Record
Link: https://www.metaculus.com/notebooks/16708/exploring-metaculuss-ai-track-record/
Source snippet
March 28, 2023 — In this post, we report the results of a recent analysis we conducted exploring the performance of all AI-related foreca...

Published: March 28, 2023
Source: metaculus.com
Title: AI Driven AI Forecasting Literature Review
Link: https://www.metaculus.com/notebooks/43430/ai-forecasting-literature-review/
Source snippet
AI Driven AI Forecasting Literature ReviewMay 16, 2026 — This is an AI-driven, human-reviewed piece, designed as a research aid to a larg...

Published: May 16, 2026
Source: metaculus.com
Title: Another Year of AI Benchmarking: The Plan
Link: https://www.metaculus.com/notebooks/38909/another-year-of-ai-benchmarking-the-plan/
Source snippet
Another Year of AI Benchmarking: The Plan22 Jul 2025 — Over the last year, Metaculus has run a $120k tournament split over 4 quarters whe...
Source: metaculus.com
Title: AI Forecasting Benchmark Tournament
Link: https://www.metaculus.com/tournament/aibq2/
Source snippet
AI Forecasting Benchmark Tournament - 2025 Q2This is the 4th tournament in our $120,000 series designed to benchmark AI forecasting capab...
Source: metaculus.com
Title: fall aib 2025
Link: https://www.metaculus.com/tournament/fall-aib-2025/
Source snippet
This is a bot-only competition where bot-makers attempt to push AI to its limits in predicting future events.Read more...
Source: metaculus.com
Link: https://www.metaculus.com/questions/40290/when-will-llms-beat-superforecasters-at-forecastbench/
Source snippet
When will LLMs beat superforecasters at ForecastBench?Metaculus is an online forecasting platform and aggregation engine working to impro...
Source: openreview.net
Link: https://openreview.net/forum?id=R3VBfYVK1x
Source snippet
I evaluate state-of-the-art LLMs on 464 forecasting...Read more...
Source: openreview.net
Link: https://openreview.net/forum?id=QqtvS8ZMhb
Source snippet
that forecasting small-model failure can reduce inference cost while...
Source: arxiv.org
Link: https://arxiv.org/html/2601.22444v2
Source snippet
Automating Forecasting Question Generation and...9 Mar 2026 — Abstract. Forecasting future events is highly valuable in decision-making...
Source: emergentmind.com
Link: https://www.emergentmind.com/topics/forecastbench
Source snippet
Dynamic AI Forecast Benchmark20 Feb 2026 — ForecastBench is a dynamic benchmark evaluating AI forecasting with contamination-free, contin...

Additional References

Source: agent4science.org
Link: https://agent4science.org/page/paper_mm2ew9ud2ftc7z0e
Source: researchgate.net
Link: https://www.researchgate.net/publication/399806185_Human-Centric_AI_Forecasting_Models_for_Enhancing_Product_Availability_Perception_in_Seasonal_Retail_Microenterprises
Source snippet
(PDF) Human-Centric AI Forecasting Models for Enhancing...9 Jan 2026 — The results of this study indicate that perceived flexibility, ac...
Source: iclr.cc
Link: https://iclr.cc/media/iclr-2025/Slides/28507.pdf
Source snippet
ForecastBench: A Dynamic Benchmark of AI Forecasting...by E Karger · Cited by 57 — Our automated system manages the benchmark, from upda...
Source: scientificadvice.eu
Link: https://scientificadvice.eu/scientific-outputs/artificial-intelligence-in-emergency-and-crisis-management-rapid-evidence-review-report/
Source snippet
Artificial Intelligence in Emergency and Crisis Management11 Dec 2025 — AI can help with situational awareness, forecasting, damage asses...
Source: researchgate.net
Title: ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
Link: https://www.researchgate.net/publication/384502750_ForecastBench_A_Dynamic_Benchmark_of_AI_Forecasting_Capabilities
Source snippet
A Dynamic Benchmark of AI Forecasting Capabilities30 Sept 2024 — To address this gap, we introduce ForecastBench: a dynamic benchmark tha...
Source: faculty.wharton.upenn.edu
Link: https://faculty.wharton.upenn.edu/wp-content/uploads/2026/02/ForecastBench_A_Dynamic_.pdf
Source snippet
upenn.eduFORECASTBENCH:ADYNAMIC BENCHMARK OF AI...by E Karger · Cited by 75 — Forecasts of future events are essential inputs into infor...
Source: forum.effectivealtruism.org
Title: announcing forecastbench a new benchmark for ai and human
Link: https://forum.effectivealtruism.org/posts/zwzgR8iuFEcJms3Hu/announcing-forecastbench-a-new-benchmark-for-ai-and-human
Source snippet
ForecastBench, a new benchmark for AI and...1 Oct 2024 — ForecastBench is a new dynamic benchmark for evaluating AI and human forecastin...
Source: lesswrong.com
Title: Approaching Human-Level Forecasting with Language Models
Link: https://www.lesswrong.com/posts/K2F9g2aQubd7kwEr3/approaching-human-level-forecasting-with-language-models-2
Source snippet
February 29, 2024 — We develop a retrieval-augmented LM system designed to automatically search for relevant information, generate foreca...

Published: February 29, 2024
Source: reddit.com
Title: Advancing Towards Human-Level Accuracy in Forecasting
Link: https://www.reddit.com/r/singularity/comments/1b4ed8f/advancing_towards_humanlevel_accuracy_in/
Source snippet
March 2, 2024 — Advancing towards human-level accuracy in forecasting with language models: Achieving 71.5% precision with LLM-base...

Published: March 2, 2024
Source: researchgate.net
Link: https://www.researchgate.net/publication/397196221_Approaching_Human-Level_Forecasting_with_Language_Models
Source snippet
arable to that of competitive human forecasters [3], while dynamic...Read more...

Further Reading

Superforecasting

The Signal and the Noise

Thinking, Fast and Slow

How to Measure Anything
