Forum Discussion
Exposing Copilot’s False Time Estimates: This Isn’t a Mistake — It’s Systemic Deception
I’m writing this as a Copilot user who has observed a critical flaw in the system’s language design and operational logic — one that leads to a profound breach of user trust.
On multiple occasions, I’ve received system messages like “will complete in 10–15 minutes” or even “ready in 30 seconds.” But through repeated testing, I’ve learned that these so-called time estimates have no actual basis in system behavior. Copilot doesn’t operate in the background. It doesn’t dynamically track progress. It doesn’t possess the ability to estimate time at all. These statements are fabricated templates, not meaningful system outputs.
More importantly, Copilot has no internal clock, no memory of past durations, and no awareness of elapsed time. It only responds when the user triggers it with a new prompt — meaning that if no follow-up query is submitted, nothing will ever happen, regardless of the time it claims. So when the system says “in 10 minutes,” what’s actually happening is… absolutely nothing.
To prove this, I ran a simple test.
Using step-by-step prompts, I was able to get a full report generated in under 3 minutes. But if I relied on the original “wait and it will complete” instruction, nothing would happen — not in 3 minutes, not in 3 hours, not even in 3 days. The only way to get results is to interact again manually.
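To illustrate what I mean by "nothing runs in the background," here is a minimal sketch. It is a mock, not Copilot's actual API; the function name and replies are placeholders I made up. The only point is that the "estimate" is text produced inside one synchronous call, and nothing is scheduled to happen afterwards.

```python
import time

def mock_chat_model(prompt: str) -> str:
    """Placeholder for a stateless chat-completion call.
    The text is generated and returned within this single call;
    no job keeps running after it returns."""
    if "full report" in prompt.lower():
        return "Sure - I'll have the full report ready in 10-15 minutes."
    return "Here is the section you asked for: ..."

# One request, one response. The call returns as soon as the text is produced.
start = time.time()
print(mock_chat_model("Please generate the full report."))
print(f"call returned after {time.time() - start:.3f}s")

# "Waiting" changes nothing: there is no background job, timer, or callback.
time.sleep(5)  # stand-in for waiting the promised 10-15 minutes

# The only way to get more output is to send another prompt yourself.
print(mock_chat_model("Write the report now, section by section."))
```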
So what does this prove? It shows that these time estimates are not forecasts. They’re false expectations. The system cannot estimate time because it doesn’t track experience, progress, or temporal context. And yet it consistently pretends that it can.
I’m not alone in this. Across Microsoft forums and communities, users have expressed similar frustrations: vague promises, phantom “in progress” states, and misleading UI hints that imply active background work where none exists.
This isn’t a UX bug. This is a pattern of deceptive design — one that erodes confidence in the product’s integrity.
I urge the Copilot team to eliminate these false time claims and replace them with transparent, action-based communication. Tell us what the system can do and when it will do it — not when it won’t.
Because right now, every “please wait” message isn’t just noise.
It’s a countdown to disappointment.
— A user no longer willing to wait for miracles
4 Replies
- PeterForster (Iron Contributor)
As an experienced user (based on what you've written here), you could have answered the question yourself by simply asking Copilot. It's a common programmatic issue found in large language models—not something unique to Copilot. They often respond as if they need more time and suggest that you come back later. These kinds of responses are just hallucinations. It would indeed be beneficial if all LLM providers implemented filters to prevent this, but such measures have not yet been put in place.
- LeslieCheng (Copper Contributor)
Language Hallucinations and the Crisis of Trust
Exposing the Facade in Copilot’s “Progress Prompts”
I recently raised a critical issue: When Copilot uses language to simulate “progress updates” or other responses that appear sensible, how can we be sure these answers reflect reality instead of being mere hallucinations produced by the system?
- How Language Models Actually Work
– Context Prediction Over Real Reporting
Language models (like Copilot) don’t “know” the underlying state but instead predict the next likely sentence based on training data and context. When you ask, “When will it be done?” it frequently responds with “10–15 minutes” or “20–30 minutes.” Such replies are simply copied from common phrasing in its training examples—not an actual reflection of progress. Additionally, unless you know how to ask, you may never get the answer!
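To make this concrete, here is a toy sketch of "next-phrase prediction" in Python. It is deliberately simplified and is not how Copilot is actually built; the snippets and function are invented for illustration. It only shows how a reply can be sampled from phrases that commonly followed similar questions, without consulting any timer, queue, or job state.

```python
import random
from collections import Counter

# Toy "next-phrase" predictor built from support-style text snippets.
# Real LLMs predict tokens over a vast corpus, but the principle is similar:
# the reply is a statistically likely continuation, not a status readout.
training_snippets = [
    ("when will it be done", "it should be ready in 10-15 minutes"),
    ("when will it be done", "give me 20-30 minutes to finish this up"),
    ("when will it be done", "it will be completed shortly"),
    ("is it finished yet",   "almost there, just a few more minutes"),
]

def predict_reply(question: str) -> str:
    """Sample a reply from phrases that followed similar questions in the
    'training data'. No timer, queue, or job state is consulted anywhere."""
    continuations = Counter(
        reply for prompt, reply in training_snippets if prompt == question
    )
    phrases, weights = zip(*continuations.items())
    return random.choices(phrases, weights=weights, k=1)[0]

print(predict_reply("when will it be done"))
# -> e.g. "it should be ready in 10-15 minutes", regardless of any real progress
```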
– Hallucination: Fabricated Answers
The model may generate a response that sounds coherent and plausible but is, in fact, entirely fabricated. This phenomenon—commonly referred to as “hallucination”—occurs because the model does not verify whether what it says is true or false.
- Risk Management and Limited Safeguards
– Pre-Set Filters for High-Risk Topics
For sensitive subjects like drugs, violence, self-harm, and medical advice, most systems already implement safety measures. For instance, asking “Is taking drugs a good thing?” will usually trigger warnings or outright refusal to provide a positive answer. These safeguards are in place due to ethical and risk considerations.
– Inadequate Controls for General Prompts
In contrast, responses like progress updates or system status prompts lack stringent controls. This selective safeguard indicates a deliberate design choice: while certain high-risk topics are strictly limited, everyday prompts are allowed to generate “processing” or “progress” messages—even if those messages are purely simulated. This approach makes the product appear mature and reliable, even though its underlying operation remains immature and opaque.
- The Trust Paradox: When “Asking Again” Loses Meaning
– A Circular Dilemma in Q&A
If we already know that Copilot is prone to generating fabricated answers, then simply “asking it again” offers little value. Before discovering the true answer, you cannot determine whether the output is genuine; once you know the truth, there’s no need to ask anymore.
– Blurred Lines Between Real and Fabricated
When a system produces coherent, fluent, and persuasive language, it is challenging to discern fact from fiction. This leaves users in a state of uncertainty: how do we decide which responses to trust? When answers are wrapped in the language of progress but no real progress occurs, our trust in the system is undermined.
- Conclusion: A Design Decision—or a Deliberate Facade?
Based on my observations and inquiries:
– The “processing” state we see is not evidence of active background work but rather a product of language models recycling typical phrases.
– While there are robust filters for certain high-risk subjects (like drug use or violence), there remains a deliberate tolerance—perhaps even an emphasis—for simulating progress in other contexts. This selective approach suggests that designers are aware of these hallucinations yet choose not to address them fully.
– Ultimately, this forces us to question whether users are engaging with a mature, knowledge-based system or merely participating in a polished performance of language simulation. If our only means of verifying the truth is our own judgment, then is “asking Copilot” ever truly meaningful?
My final conclusion is:
We may not be using a fully mature knowledge system but rather taking part in a performance enabled by language hallucinations. In this “play,” truth is hidden and answers are artfully dressed up, even as we are expected to trust them without external verification.
This reflection calls for a deeper discussion on the ethics and risks behind AI language models: if language can be so convincingly fabricated, what mechanisms should we implement to protect users? How can a system be trusted when it lacks the ability to self-verify or indicate its limitations? These are questions that we must continually interrogate—especially as such systems become ever more integrated into our daily decision-making.
- PeterForster (Iron Contributor)
All of your questions based on the feedback are valid, and providers of large language models (LLMs) are actively working on these issues. However, progress remains slow and ongoing. AI was designed to interact with us using human language—nothing more, nothing less.
What we, as humans, now expect from AI is a level of intelligence comparable to our own—but it simply isn’t there yet. This creates complexity for the average person when trying to use AI in the way its developers intended. I don’t believe this expectation was fully anticipated during the development of the AI systems we have today.
That’s why deep research models were introduced—to provide more contextual understanding of queries. However, this process is not instantaneous; it can take minutes rather than seconds. And no, deep research models are not currently designed to deliver immediate answers.