Strategy

AI isn't hallucinating. It's mirroring how precisely you can phrase what's in your head

Ing. Lukáš Dolejský · Published 7 June 2026 · 9 min read

TL;DR

When AI gives you generic or off-target answers, the output usually isn't broken — the input is. You hold the full context in your head (defect type, batch, customer, history), then you send AI one vague sentence. AI can't guess what a senior colleague over coffee would read from your face. This isn't an AI problem. It's a problem of how precisely you can shape a thought before it leaves your head. AI is a mirror and the best trainer of precision you can have — because it won't excuse a vague question the way a human colleague will.

A few months ago I noticed a pattern. I'd ask the AI something about quality — the answer would be generic. I'd try again, more precisely — better, but still off. On the third attempt, when I described exactly what I was solving — the AI went straight to the point.

My first thought was: the model is hallucinating, or it has weak context for automotive. But when I broke it down step by step later, I noticed something different. The AI wasn't responding badly. It was responding precisely to what I had written — and what I had written wasn't what I had in my head.

This is a gap every quality engineer hits over their years in automotive. I just saw it sharply for the first time, because I had it in front of me in a text dialogue.

What's in your head vs what you write

In your head you have the full context. You know which part. You know which customer. You know which process. You know what changed. You know what you've already tried and ruled out. You know which audit finding triggered it. You know that your senior already said „this is going to be the same as Q3". You know everything.

You hand the AI: „I have a surface defect problem."

That sentence has nothing about defect type, material, customer, when it started, or what you've ruled out. The AI sees six words and has two options. Either it generates a general answer from the 5 most common causes of surface defects in automotive (which is exactly the generic output that frustrates you), or it asks you about context (which is exactly the flow you see with a guided AI).

In both cases the problem isn't the AI. The problem is that your external phrasing doesn't carry what you have internally in your head.

And this is exactly where a senior colleague next to you over coffee can help without you having to explain. They look you in the eye, they see your expression, they know you're working on the same case you mentioned last week. They fill in context from non-verbal cues, from the history of your conversations, from what they can see on the noticeboard behind you.

The AI doesn't have those channels. The AI sees only text. And only this specific text in this specific moment. If you don't bring context, context doesn't fill itself.

I first thought the AI was hallucinating. Then I realised it was doing exactly what I asked.

The first reaction when you get a weak answer is ego defence. The model is dumb. AI in automotive isn't there yet. This isn't for our segment. Something's wrong with the model.

This is the comfortable interpretation. The second interpretation, slightly less comfortable but usually true — the AI gave a precise answer. Your question just wasn't precise.

You can run the test yourself. Take your last question that „the AI didn't get" and give it to a senior quality engineer outside your company. Someone who doesn't know your process, your customers, your history. Count how many follow-up questions they ask before giving you a meaningful answer.

If they ask 5 or more — your AI needed them too. It just gave you a generic output instead, because it doesn't have a way to ask you back without you explicitly giving it permission to ask. A guided AI (like QualityOS) does this. A generic AI doesn't — it generates the most probable answer in the space of your vague question.

That isn't a hallucination. That's the optimal response under insufficient context.
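
One way to give a general-purpose assistant that permission is to state it in the prompt itself. A minimal sketch, assuming a plain-text prompt; the function name `allow_follow_ups`, the wording of the instruction, and the default limit of 5 questions are illustrative choices, not a feature of any particular AI tool:

```python
def allow_follow_ups(question: str, max_questions: int = 5) -> str:
    """Wrap a question with explicit permission to ask before answering.

    The instruction wording is an assumption; adapt it to your assistant.
    """
    instruction = (
        f"Before answering, ask me up to {max_questions} clarifying questions "
        "if any context you need is missing. Do not guess."
    )
    return f"{question.strip()}\n\n{instruction}"


prompt = allow_follow_ups("I have a surface defect problem.")
print(prompt)
```

The vague question is still vague, but the assistant is now allowed to do what the senior colleague would do: ask first.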

This isn't an AI problem. This is a universal communication problem.

After a few weeks I noticed that this gap between the internal model and the external phrasing isn't specific to AI. It's everywhere.

Customer communication. You send the customer an email „we have a small problem with a dimensional characteristic, but it's under control". The customer comes back with three follow-up questions: which characteristic, how many parts, and whether any are in their warehouse. In your head you knew it was 0.05 mm over the upper tolerance limit, on 12 parts from batch 4521, and that none went to the customer. But the email only says „small problem". The result is four days of email ping-pong.

Audit response. The auditor asks „describe your reaction plan for non-conformity". You answer „we have a reaction plan, it's in the documentation". The auditor: „which documentation, where is the RACI matrix, who is the decision authority on escalation". Another 30 minutes of cross-examination because you didn't bring concreteness up front.

8D D2 — problem description. The weakest point of most 8D documents. „Customer reported defect on housings." No quantity, batch, scorecard position, when detected, where detected. The D4 root cause then drifts, because the D2 didn't define scope.

PFMEA Failure Mode. „Surface defect." Fine as a tag, but as a Failure Mode in PFMEA it's a disaster: Severity can't be defended, Occurrence is a guess, and Detection is speculation, because the Failure Mode isn't concrete enough for the three ratings to produce a meaningful risk priority number.
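
To see why a vague Failure Mode undermines the number, recall that the classic risk priority number is the product of the three ratings, RPN = S × O × D, each on a 1 to 10 scale. The sketch below compares a vague failure mode, where each rating can only be given as a range, with a concrete one. All ratings are made-up illustrative values, and `rpn_range` is a hypothetical helper:

```python
def rpn_range(severity, occurrence, detection):
    """Return (lowest, highest) RPN for ratings given as (min, max) ranges.

    Classic RPN = Severity x Occurrence x Detection, each rated 1 to 10.
    """
    lowest = severity[0] * occurrence[0] * detection[0]
    highest = severity[1] * occurrence[1] * detection[1]
    return lowest, highest


# "Surface defect": nobody can pin the ratings down (illustrative ranges).
vague = rpn_range(severity=(4, 8), occurrence=(2, 7), detection=(3, 8))

# A concretely named failure mode: single defensible ratings (illustrative values).
concrete = rpn_range(severity=(7, 7), occurrence=(3, 3), detection=(4, 4))

print(vague)     # (24, 448)
print(concrete)  # (84, 84)
```

With the vague failure mode, the same PFMEA line could be argued as low risk or high risk. The concrete one gives a single number you can defend.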

In all four cases it's the same mechanism. The internal model is rich. The external phrasing is poor. And the other side — auditor, customer, colleague, or AI — can only work with the external phrasing.
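
The D2 case can be turned into a simple completeness check before the text leaves your desk. A minimal sketch, assuming the fields named above (quantity, batch, when and where detected); the class `D2Statement` and its field names are hypothetical, not part of any standard 8D template, and the example values are invented:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class D2Statement:
    """Problem description for 8D step D2 (hypothetical field set)."""
    defect: Optional[str] = None
    quantity: Optional[int] = None
    batch: Optional[str] = None
    detected_when: Optional[str] = None
    detected_where: Optional[str] = None

    def missing(self) -> list[str]:
        """Names of the fields a reader would have to ask about."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


vague = D2Statement(defect="defect on housings")
print(vague.missing())
# ['quantity', 'batch', 'detected_when', 'detected_where']

scoped = D2Statement(
    defect="dimension over upper tolerance on housings",  # invented example values
    quantity=18,
    batch="B-001",
    detected_when="2026-04-14",
    detected_where="final inspection",
)
print(scoped.missing())  # []
```

Every name returned by `missing()` is a follow-up question the other side would otherwise have to ask.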

AI as a mirror, not a messenger

The interesting thing about AI is that it surfaces this mistake fast. People tolerate vague phrasing for a long time. The customer comes back with questions but doesn't say outright that your input was deficient. The auditor gives you credible feedback but rarely tells you „your answer was too general"; they just take a silent note. A colleague helps because they know you, and lets the vague phrasing pass without comment.

The AI doesn't have this discretion. You get some answer and either it helps or it doesn't. If it doesn't, that's not just lost time — it's feedback that says „your input wasn't precise enough".

This is where the key distinction sits. AI isn't a messenger. It's a mirror. When the answer doesn't match, the first thing to do isn't to correct the AI with another question. Go back to the original question and read it from the perspective of someone who doesn't see what's in your head. What question would they have? What would you want to add?

This exercise is called output-input reflection. The output tells you what was OK with the input — or wasn't. Use it.

Concrete techniques for better phrasing

In practical use of QualityOS I found 4 patterns that improve answer quality dramatically. They're simple and add 30 seconds. That's time you save on 3 follow-up iterations.

1. Name the object concretely. Not „the part", but „injection-moulded polycarbonate headlamp cover for VW MQB platform". Not „the audit finding", but „VDA 6.3 P6 audit finding on traceability on line 03". Not „the complaint", but „Renault complaint, 240 parts, month 2026-04, dimensional characteristic over upper tolerance". Concreteness makes 60 % of the difference.

2. Name the change concretely. Quality problems almost always have a trigger in time. New mould, new operator, new revision, new supplier, new contractual temperature. „It started after the mould revision on 2026-05-12" is 100× more valuable than „we've had a problem since May". The time anchor is half the analysis.

3. Name what you've already ruled out. This goes unsaid because we think it's redundant. But it's gold. „Operator ruled out — all three produce the same defect. Batch ruled out — defect across three batches. Paint booth filtration ruled out — measurements in range. Suspect electrostatics between press and paint booth." That's 3 hypotheses ruled out and 1 remaining. The AI with this context won't repeat „check the filters". It will go straight to the electrostatic hypothesis.

4. Name the customer or the standard. Automotive quality is customer-specific. „What's the PPAP requirement?" gets you a general, generic answer at the AIAG-VDA level. „What's the PPAP requirement for a BMW production part, Level 3, current Q1 revision?" gets you BMW Customer Specifics directly. Names of customers and standards are anchors for concrete answers.

These 4 patterns affect answer quality more than any model improvement that will land in the next year. Not by using a different AI — by bringing a different question.
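
The four patterns can be captured as a small template that you fill in before asking. A minimal sketch in Python; the class `QuestionContext`, its field names, and the output layout are assumptions for illustration, not a QualityOS feature. The example values are mixed from the patterns above purely to show the shape:

```python
from dataclasses import dataclass, field


@dataclass
class QuestionContext:
    """The four context anchors from the patterns above (hypothetical names)."""
    obj: str                    # 1. the object, named concretely
    change: str                 # 2. the change and its time anchor
    customer_or_standard: str   # 4. the customer or the standard
    ruled_out: list[str] = field(default_factory=list)  # 3. what is excluded

    def to_prompt(self, question: str) -> str:
        """Put the context in front of the question as labelled lines."""
        lines = [
            f"Object: {self.obj}",
            f"Change: {self.change}",
            f"Customer/standard: {self.customer_or_standard}",
        ]
        lines += [f"Ruled out: {item}" for item in self.ruled_out]
        lines.append(f"Question: {question}")
        return "\n".join(lines)


ctx = QuestionContext(
    obj="injection-moulded polycarbonate headlamp cover",
    change="started after the mould revision on 2026-05-12",
    customer_or_standard="VW MQB platform",
    ruled_out=[
        "operator: all three produce the same defect",
        "batch: defect across three batches",
        "paint booth filtration: measurements in range",
    ],
)
prompt = ctx.to_prompt("Could electrostatics between press and paint booth cause the defect?")
print(prompt)
```

Making the three required fields mandatory is a deliberate choice: the object cannot even be created without an object, a change, and a customer or standard, which is the 30 seconds of discipline the patterns ask for.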

The side effect — a generally better engineer

After a few months of this discipline I noticed something unexpected. I started writing better emails to customers. Audit responses shortened, because the auditor didn't need so many follow-ups. 8D D2 statements got sharper. Team meeting agendas had clearer points.

It wasn't a planned side effect. It was a consequence of the habit of asking myself „if someone who can't see my process were reading this — would they understand what I'm solving?" — because that's exactly the question I needed to ask before every AI interaction. And once you ask it every day, it naturally bleeds into the other channels.

This is the unexpected benefit that AI-tool marketing never mentions. The pitch is speed, automation, efficiency. The real value for a senior quality engineer is often the discipline of phrasing, picked up as a side effect of daily use.

At HELLA-Forvia I had a 24-person team. Half were juniors. The weakest thing in their work wasn't a knowledge gap — it was a lack of precision in communication. „We have a problem" instead of „we have a non-conformity on characteristic X in process Y since date Z". AI trained this discipline in them faster than any mentor programme could.

The strategic thesis — AI as a trainer, not just a tool

The standard narrative of AI at work is „AI saves me time, does it for me faster". This is useful but poor.

The richer narrative I've seen work with quality engineers is „AI trains me to phrase things more clearly, because it won't excuse me when I can't". A human colleague excuses you. The auditor excuses you (and takes a note). The customer excuses you (and sends follow-up emails). The AI doesn't excuse you — you get a generic output and you see it in real time.

This feedback system is faster than any mentor role you could have. A senior colleague can correct you once a week. AI corrects you 30 times a day, calmly and without ego dynamics.

And after six months of this, your verbal discipline improves across every channel.

In QualityOS we've built this approach into the AI Assistant explicitly — it asks for containment before root cause, scope before hypothesis, evidence before conclusion. That means at the very first question it shapes a more precise answer by forcing a more precise input. But the principle holds in general, even with a different AI.

Next time you get a weak answer, don't ask „what's wrong with the AI". Ask „what was wrong with my question".

In automotive quality, that difference decides whether you dive into the problem in an hour, or in a week.

In my view AI is the best phrasing trainer you can have. And once you realise that, you stop fighting the AI and start using it as the mirror it is.

FAQ

Why does AI give me generic answers even when my problem is very specific?
Because the AI only sees what you wrote. It doesn't see your production process, doesn't see which customer you're handling, doesn't see what you've already tried. When you ask „I have a surface defect problem”, the AI has two options: either it answers generically (the 5 most common causes of surface defects), or it asks. A guided AI (like QualityOS) asks. A generic AI generates the average answer. The fix: give yourself 30 seconds before the question and add context — what exactly, on which process, since when, what you've already tried, which customer.
Is the AI really hallucinating or just reacting to a weak input?
Both happen, but in automotive quality practice 80 % of „hallucinations” are actually a correct response to an incorrect input. The AI fills in missing context with a guess — and that guess is off for your specific case. Example: you ask „What's the customer-specific PPAP for BMW?” and the AI assumes BMW Production. You meant BMW Prototype, where the level and content are completely different. The AI didn't hallucinate a fact — it filled a gap in the direction you didn't specify in your input.
How do I learn to phrase questions better for AI?
Three rules that help in 90 % of cases. (1) Name the object concretely — not „the part” but „injection-moulded polycarbonate headlamp cover for VW MQB”. (2) Name the change concretely — when it started, what changed before it (new mould, new operator, new revision, etc.). (3) Name what you've already ruled out — operator ruled out, batch ruled out, paint booth ruled out. These three steps take 30 seconds and the quality of the answer improves dramatically. The same rule applies to a senior colleague you bring a problem to — except the colleague excuses you when you skip it.
Does this discipline work outside AI — in customer communication, audits, 8D?
Yes, and that's where the biggest value sits. A quality engineer who uses AI for six months with the discipline to phrase concretely becomes better at written communication in general. Better 8D D2 problem statements, better customer emails, better audit responses. Because they get used to asking in their head „if someone who can't see my process were reading this — would they understand what I'm solving?” A classic quality engineer doesn't ask themselves this question. AI forces them to, because otherwise they don't get a usable answer.
Am I doing it wrong, or is the AI doing it wrong?
The most natural reaction is to hand responsibility to the AI — „the model is dumb”. But in 80 % of cases the answer is as precise as the question was. Take the test: take your question to a senior quality engineer who doesn't know your process, and count how many follow-up questions they ask. If they ask you 5 or more, the AI needed them too — except it gave you a generic output instead. This isn't an accusation. It's a diagnosis. And the diagnosis is treatable in 30 seconds with better phrasing.
Updated 7 June 2026 · 9 min read · Ing. Lukáš Dolejský, Production Quality Leader · founder of QualityOS