Can chatbots predict the unknown?

Forecasting tests whether a model can reason under unresolved uncertainty, not just answer questions with known solutions.

On this page

  • Why forecasting differs from ordinary question answering
  • What crowd forecasts reveal about model calibration
  • How uncertainty should change chatbot use

One of the clearest ways to understand the limits of artificial intelligence is to ask it not about the past, but about the future. A chatbot can often answer questions about history, science, law, or software by drawing on information it has already learned. Forecasting is different. The correct answer does not yet exist. The task is not simply to retrieve knowledge but to reason under uncertainty, weigh competing possibilities, and express confidence appropriately.

This distinction matters because many AI systems appear highly capable when evaluated on questions with known answers. Forecasting strips away that advantage. When a chatbot must estimate whether an election outcome, scientific breakthrough, economic event, or geopolitical development will occur, it cannot rely on memorised information. It must confront uncertainty directly. Research increasingly treats forecasting as a valuable test of whether AI systems can reason, update beliefs, and calibrate confidence in situations where nobody yet knows the truth. [OpenReview]

Why forecasting differs from ordinary question answering

Most popular AI benchmarks measure performance on tasks that already have established answers. Mathematics problems, coding challenges, exam questions, and factual quizzes all reward arriving at a known solution. Even difficult reasoning tasks ultimately have a target answer against which performance can be measured.

Forecasting introduces a different challenge. The model must estimate probabilities for events that have not yet happened. Success depends not only on reasoning but also on judgement. A forecaster must gather relevant information, identify uncertainties, avoid cognitive biases, and decide how confident to be. Researchers behind ForecastBench argue that accurate forecasting requires synthesising information, guarding against overconfidence, combining evidence, and quantifying beliefs rather than merely producing fluent responses. [OpenReview]

This exposes a weakness that ordinary chatbot interactions often hide. A chatbot can sound equally confident when discussing a settled fact and when speculating about an uncertain future event. Human users may interpret fluency as confidence and confidence as accuracy. Forecasting benchmarks reveal whether that confidence is justified.

Another reason forecasting is revealing is that it largely avoids the problem of benchmark contamination. Traditional AI tests can be compromised when the training data already contains the answers. Forecasting questions are unresolved when the prediction is made, so the answer cannot have been memorised. ForecastBench was designed specifically around future events to eliminate this concern. [arXiv]

What crowd forecasts reveal about model calibration

A central concept in forecasting is calibration. A well-calibrated forecaster assigns probabilities that match reality over time. If a system says an event has a 70% chance of occurring, then roughly seven out of ten such events should happen.
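The 70% example can be checked mechanically. The sketch below is a minimal illustration, not a method taken from any of the cited studies: it groups forecasts into probability bins and compares the average stated probability in each bin with how often the events actually happened. The function name `calibration_table`, the ten-bin default, and the data are all assumptions made for illustration.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Compare stated probabilities with observed frequencies, bin by bin.

    forecasts: probabilities in [0, 1]
    outcomes:  1 if the event happened, 0 if it did not
    Returns a list of (mean stated probability, observed frequency, count).
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")

    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # min() keeps a forecast of exactly 1.0 inside the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_p, freq, len(pairs)))
    return rows


# Hypothetical data: ten forecasts of 70%, seven of which resolved "yes".
forecasts = [0.7] * 10
outcomes = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
table = calibration_table(forecasts, outcomes)

for mean_p, freq, n in table:
    print(f"stated {mean_p:.2f}  observed {freq:.2f}  (n={n})")
```

On this made-up data the stated probability and the observed frequency are both 0.70, which is what good calibration looks like. A gap between the two columns, in either direction, is the signal of over- or underconfidence. Ten forecasts is far too few to judge a real forecaster; the small sample is only there to keep the arithmetic visible.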

Calibration matters because decision-makers often need probabilities rather than yes-or-no answers. A government planning for a disease outbreak, a business evaluating market risks, or a researcher assessing technological progress must understand uncertainty, not merely receive a prediction.

Forecasting competitions provide a useful comparison. Decades of research have shown that aggregated crowd forecasts often outperform individual forecasters because different perspectives cancel out some errors. Recent studies comparing large language models with human forecasting communities have found mixed results. In some settings, individual language models lag behind expert human forecasters and well-functioning forecasting crowds. ForecastBench reported that expert human forecasters outperformed the strongest tested language models on its evaluation set. [arXiv]

At the same time, researchers have found that combining multiple model forecasts can improve performance substantially. Some studies suggest that ensembles of language-model forecasts can approach the accuracy of human forecasting crowds, highlighting that uncertainty estimation improves when diverse predictions are aggregated rather than relying on a single answer. [arXiv, PMC]
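A toy example shows how aggregation can help. The sketch below takes the per-question median of several forecasters and scores everyone with the Brier score (the mean squared difference between the forecast probability and the 0 or 1 outcome, lower is better). Both choices are assumptions made for illustration: the article's sources are not quoted here on which aggregation rule or scoring rule they used, and the three "models" and their numbers are invented.

```python
from statistics import median


def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


def aggregate_median(per_model_forecasts):
    """Per-question median across forecasters.

    per_model_forecasts: one list of probabilities per forecaster,
    all covering the same questions in the same order.
    """
    return [median(question) for question in zip(*per_model_forecasts)]


# Hypothetical forecasts from three models on four questions.
model_forecasts = [
    [0.9, 0.2, 0.6, 0.4],
    [0.6, 0.1, 0.9, 0.7],
    [0.7, 0.5, 0.3, 0.1],
]
outcomes = [1, 0, 1, 0]

ensemble = aggregate_median(model_forecasts)
ensemble_score = brier_score(ensemble, outcomes)

individual_scores = [brier_score(f, outcomes) for f in model_forecasts]
mean_individual_score = sum(individual_scores) / len(individual_scores)

print("ensemble forecast:", ensemble)
print("ensemble Brier score:", round(ensemble_score, 4))
print("individual Brier scores:", [round(s, 4) for s in individual_scores])
```

With these invented numbers the ensemble scores 0.1125, better than the average individual score of about 0.157 but worse than the best single model (0.0925). That pattern is typical of why aggregation is attractive: nobody knows in advance which single forecaster will turn out best, and the aggregate avoids the worst individual errors.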

These results are important because they reveal a difference between intelligence and calibration. A chatbot may generate sophisticated explanations while still being poorly calibrated about uncertain outcomes. Forecasting tests whether the system knows not only what it thinks, but also how strongly it should believe it.

A concrete example: when future questions expose hidden weaknesses

Forecasting researchers have repeatedly found that language models can perform impressively on many reasoning tasks yet struggle when predictions require consistency and probabilistic judgement.

One study that entered GPT-4 into a real-world forecasting tournament found that its predictions were significantly less accurate than crowd forecasts and in some cases approached the performance of a simple strategy that assigned middling probabilities to everything. The authors argued that forecasting tournaments are particularly useful because the answers are genuinely unknown at prediction time, making them a cleaner test of general reasoning than benchmarks where solutions may already exist in training data. [arXiv]
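To see why matching a "middling probabilities" strategy is a low bar, it helps to score one. The sketch below assumes the Brier score as the accuracy measure (the article does not say which measure the study used) and compares a forecaster who answers 50% on everything with a hypothetical forecaster who is confident but wrong on two of five questions. All numbers are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical resolved questions: 1 = happened, 0 = did not happen.
outcomes = [1, 0, 0, 1, 0]

# Baseline: say 50% for every question, using no information at all.
baseline = [0.5] * len(outcomes)

# A confident forecaster who is badly wrong on the second and fourth questions.
overconfident = [0.9, 0.8, 0.1, 0.2, 0.1]

baseline_score = brier_score(baseline, outcomes)
overconfident_score = brier_score(overconfident, outcomes)

print("always-50% baseline:", baseline_score)
print("overconfident forecaster:", round(overconfident_score, 3))
```

The always-50% baseline scores exactly 0.25 whatever the outcomes are, because each forecast is off by 0.5 and 0.5 squared is 0.25. The overconfident forecaster scores 0.262 here, slightly worse than using no information, even though three of the five calls were good. Under this kind of scoring, a few confident misses can erase the benefit of several correct calls.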

Subsequent work showed that performance can improve dramatically when models are given external information, structured reasoning steps, retrieval systems, and forecast aggregation methods. However, this finding itself is revealing. Simply asking a chatbot for a prediction often produces weak results. Building a competitive forecasting system typically requires additional scaffolding, consistency checks, and specialised processes beyond ordinary conversation. [arXiv]

Researchers and forecasting practitioners have also noted that language models sometimes violate basic logical constraints when estimating probabilities across related events. For example, they may assign a lower probability to an event occurring by a later date than to the same event occurring by an earlier date, even though the later deadline includes the earlier one. Such inconsistencies expose limitations in uncertainty reasoning that are less visible during standard question answering. [Vox]
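This kind of inconsistency is easy to test for automatically. The sketch below is one possible consistency check, with a hypothetical function name and made-up probabilities: given a chatbot's answers to "will X happen by date D?" for several deadlines, it flags any place where the probability drops as the deadline moves later.

```python
from datetime import date


def find_date_inconsistencies(forecasts):
    """Flag deadline pairs where a later deadline got a lower probability.

    forecasts: dict mapping a deadline (date) to P(event occurs by that deadline).
    Returns a list of (earlier_deadline, later_deadline) pairs that violate
    the rule that the probability must not decrease as the deadline extends.
    """
    ordered = sorted(forecasts.items())
    violations = []
    # Comparing neighbouring deadlines is enough: any non-monotonic
    # sequence contains at least one adjacent decrease.
    for (d1, p1), (d2, p2) in zip(ordered, ordered[1:]):
        if p2 < p1:
            violations.append((d1, d2))
    return violations


# Hypothetical answers from a chatbot about the same event.
answers = {
    date(2026, 6, 30): 0.40,
    date(2026, 12, 31): 0.35,  # lower than the earlier deadline: inconsistent
    date(2027, 12, 31): 0.60,
}

violations = find_date_inconsistencies(answers)
for earlier, later in violations:
    print(f"P(by {later}) is lower than P(by {earlier})")
```

Here the check flags the pair 30 June 2026 and 31 December 2026. A check like this says nothing about whether any individual probability is accurate; it only catches answers that cannot all be right at once.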

How uncertainty should change chatbot use

The forecasting gap does not mean chatbots are useless. On the contrary, they can be highly valuable for gathering information, identifying relevant factors, summarising competing arguments, and generating possible scenarios.

The lesson is that users should distinguish between knowledge assistance and predictive judgement.

When a chatbot explains an existing concept, much of the challenge involves retrieving and organising information. When it predicts a future outcome, the challenge becomes managing uncertainty. The same system may perform strongly in the first task and much less reliably in the second.

Forecasting research therefore encourages a more nuanced view of AI capability:

  • Strong language generation does not automatically imply accurate prediction.
  • Reasoning quality and calibration are related but distinct abilities.
  • Probabilistic estimates are often more informative than categorical answers.
  • Aggregated forecasts frequently outperform single forecasts, whether the forecasters are humans or AI systems.
  • Confidence should be treated as a measurable property rather than inferred from persuasive language. [arXiv, OpenReview]

For anyone trying to understand artificial intelligence, forecasting provides a useful stress test. It reveals where a chatbot’s apparent certainty rests on genuine predictive skill and where it reflects the limitations of systems that can generate convincing answers without fully understanding how uncertain the future remains. [OpenReview, Metaculus]

Endnotes

  1. Source: openreview.net
    Title: ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
    Link: https://openreview.net/forum?id=lfPkGWXLLf
    Source snippet

    To produce an accurate forecast, a person or AI system must synthesize... by E Karger. Forecasting is a useful testbed of LLM reas...

  2. Source: arxiv.org
    Title: ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
    Link: https://arxiv.org/abs/2409.19839

  3. Source: forecastbench.org
    Link: https://www.forecastbench.org/about/
    Source snippet

    8 Oct 2025 — We evaluate LLMs by regularly asking them to make probabilistic forecasts about future events, thereby creating a contaminat...

  4. Source: arxiv.org
    Link: https://arxiv.org/html/2409.19839v4
    Source snippet

    A Dynamic Benchmark of AI Forecasting Capabilities. While LLMs have achieved super-human performance on many benchmarks, they perform less...

  5. Source: arxiv.org
    Link: https://arxiv.org/html/2402.19379v4
    Source snippet

    LLM Ensemble Prediction Capabilities Rival Human Crowd... Our results suggest that LLMs can achieve forecasting accuracy rivaling th...

  6. Source: pmc.ncbi.nlm.nih.gov
    Title: Wisdom of the silicon crowd: LLM ensemble prediction
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11800985/
    Source snippet

    by P Schoenegger · 2024 · Cited by 97 — Our findings suggest that LLM predictions can rival the human crowd's forecasting accuracy thr...

  7. Source: arxiv.org
    Title: arXiv Approaching Human-Level Forecasting with Language Models
    Link: https://arxiv.org/abs/2402.18563

  8. Source: arxiv.org
    Link: https://arxiv.org/abs/2310.13014
    Source snippet

    Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament

    Published: October 17, 2023

  9. Source: vox.com
    Link: https://www.vox.com/future-perfect/411742/ai-forecasting-prediction-metaculus-llm
    Source snippet

    Competitions like those on Metaculus show that human experts consistently beat AI forecasters, although the performance gap is narrowing...

  10. Source: metaculus.com
    Title: Introducing FutureEval, our new home for AI forecasting
    Link: https://www.metaculus.com/notebooks/42136/introducing-futureeval-our-new-home-for-ai-forecasting/
    Source snippet

    12 Feb 2026: FutureEval is Metaculus's new benchmark that measures how well AI ag...

  11. Source: forecastbench.org
    Link: https://www.forecastbench.org/
    Source snippet

    Explore how LLM forecasting accuracy evolves on ForecastBench. A linear trend projects the date when LLMs reach superforecas...

  12. Source: metaculus.com
    Title: Exploring Metaculus’s AI Track Record
    Link: https://www.metaculus.com/notebooks/16708/exploring-metaculuss-ai-track-record/
    Source snippet

    March 28, 2023 — In this post, we report the results of a recent analysis we conducted exploring the performance of all AI-related foreca...

    Published: March 28, 2023

  13. Source: metaculus.com
    Title: AI Driven AI Forecasting Literature Review
    Link: https://www.metaculus.com/notebooks/43430/ai-forecasting-literature-review/
    Source snippet

    This is an AI-driven, human-reviewed piece, designed as a research aid to a larg...

    Published: May 16, 2026

  14. Source: metaculus.com
    Title: Another Year of AI Benchmarking: The Plan
    Link: https://www.metaculus.com/notebooks/38909/another-year-of-ai-benchmarking-the-plan/
    Source snippet

    22 Jul 2025: Over the last year, Metaculus has run a $120k tournament split over 4 quarters whe...

  15. Source: metaculus.com
    Title: AI Forecasting Benchmark Tournament - 2025 Q2
    Link: https://www.metaculus.com/tournament/aibq2/
    Source snippet

    This is the 4th tournament in our $120,000 series designed to benchmark AI forecasting capab...

  16. Source: metaculus.com
    Title: fall aib 2025
    Link: https://www.metaculus.com/tournament/fall-aib-2025/
    Source snippet

    This is a bot-only competition where bot-makers attempt to push AI to its limits in predicting future events.

  17. Source: metaculus.com
    Link: https://www.metaculus.com/questions/40290/when-will-llms-beat-superforecasters-at-forecastbench/
    Source snippet

    When will LLMs beat superforecasters at ForecastBench?Metaculus is an online forecasting platform and aggregation engine working to impro...

  18. Source: openreview.net
    Link: https://openreview.net/forum?id=R3VBfYVK1x
    Source snippet

    I evaluate state-of-the-art LLMs on 464 forecasting...

  19. Source: openreview.net
    Link: https://openreview.net/forum?id=QqtvS8ZMhb
    Source snippet

    that forecasting small-model failure can reduce inference cost while...

  20. Source: arxiv.org
    Link: https://arxiv.org/html/2601.22444v2
    Source snippet

    Automating Forecasting Question Generation and... 9 Mar 2026: Abstract. Forecasting future events is highly valuable in decision-making...

  21. Source: emergentmind.com
    Link: https://www.emergentmind.com/topics/forecastbench
    Source snippet

    Dynamic AI Forecast Benchmark. 20 Feb 2026: ForecastBench is a dynamic benchmark evaluating AI forecasting with contamination-free, contin...

