Intermittent Scrap, 5 Why and AI — Finding Root Cause of a Chronic Problem in 15 Questions
TL;DR
Chronic intermittent problem on a paint line — most scrap from one failure mode, no correlation with operator, batch or air contamination. AI walked the engineer through 5 Why, asked the right questions, accepted a correction from the expert and the root cause was found in 15 exchanges: missing ionization between the press and the tempering step.
Chronic intermittent problem on a paint line. Most of the scrap comes from a single failure mode — sub-1-micron contamination trapped under the coat, which the paint then enlarges optically into a visible defect. We've been chasing it for months.
The pattern doesn't fit. It's not steady, it's explosive — one batch fine, second fine, third unusable. Two more OK, sixth ruined again.
We tested every correlation we could think of. Operator — nothing. Paint batch — nothing. Shift — nothing. Air contamination in the booth — nothing. At the press — nothing. Ionization calibration — nothing.
This is the classic point where a quality engineer either falls back on „human factor" in the audit response and closes it as „operators need retraining", or starts looking in places no standard process is monitoring.
This article is about the second case. And about how we found the root cause in 15 exchanges in a chat with a quality AI. No googling, no generic „check the filters, calibrate the nozzles" answers.
First — why this was hard
Standard RCA methods work on problems with a clear signal. If your scrap rises Wednesday afternoon on shift B when a new operator started, you have a lead. Correlation → hypothesis → test → fix.
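To make the correlation step concrete, here is a minimal Python sketch of how such a check might look. The record layout, the field names (`operator`, `shift`, `parts`, `scrap`) and all the numbers are hypothetical, invented only for illustration; they are not data from our line.

```python
from collections import defaultdict

def scrap_rate_by(records, factor):
    """Return {level: scrap rate} for one factor, e.g. 'operator'."""
    totals = defaultdict(int)
    scrapped = defaultdict(int)
    for rec in records:
        level = rec[factor]
        totals[level] += rec["parts"]
        scrapped[level] += rec["scrap"]
    return {level: scrapped[level] / totals[level] for level in totals}

def rate_spread(rates):
    """Difference between the worst and the best level of a factor."""
    return max(rates.values()) - min(rates.values())

# Hypothetical batch records: two batches fine, two ruined.
batches = [
    {"operator": "A", "shift": "day", "parts": 100, "scrap": 2},
    {"operator": "B", "shift": "day", "parts": 100, "scrap": 30},
    {"operator": "A", "shift": "night", "parts": 100, "scrap": 30},
    {"operator": "B", "shift": "night", "parts": 100, "scrap": 2},
]

for factor in ("operator", "shift"):
    rates = scrap_rate_by(batches, factor)
    print(factor, rates, "spread:", rate_spread(rates))
```

In this made-up data half the batches are ruined, yet every operator and every shift ends up with the same scrap rate. A factor with a real signal would show a clear spread between its levels.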
For intermittent defects with no correlation, the standard workflow breaks down, because:
- You checked operators — didn't help.
- You separated batches — didn't help.
- You measured the air — didn't help.
- You verified equipment calibration — didn't help.
At that point most teams say „it's human factor" or „it's a complex chemical process, it lives its own life". Both are excuses, not diagnoses.
A real root cause always exists. The question is which part of the process nobody is scanning, because nobody considers it relevant.
Why I used a domain-specific AI instead of ChatGPT or Google
Ask ChatGPT „why do I have intermittent contamination in paint" and it writes a two-page essay with 12 possible causes in bullet points. That's not RCA, that's the outline of a textbook chapter. None of the 12 causes takes your context into account.
Google sends you to a forum thread from 11 years ago describing a completely different process.
A domain-specific AI for quality does something else. It knows that:
- Containment must be the first question. No RCA without isolation.
- After containment comes scope. What percent? How is the defect characterized?
- Only then root cause. And there it asks first what kind of problem this is (new vs chronic, acute vs intermittent), because that drives a different approach.
These are shortcuts a quality engineer learns over 10 years. The AI has them because it was built for this purpose.
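That ordering can be sketched as a simple gate. This is only an illustration of the idea, not how the AI is implemented; the stage names and the `next_stage` function are my own hypothetical labels.

```python
# Stages in the order described above; names are illustrative.
TRIAGE_ORDER = ["containment", "scope", "problem_type", "root_cause"]

def next_stage(answered):
    """Return the first stage that has not been answered yet.

    Later stages stay blocked until every earlier one is done,
    so root cause is never discussed before containment.
    """
    for stage in TRIAGE_ORDER:
        if stage not in answered:
            return stage
    return None  # everything has been covered

print(next_stage(set()))
print(next_stage({"containment", "scope"}))
```

Note that answering „scope" out of order does not unlock anything: `next_stage({"scope"})` still returns `"containment"`.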
First exchanges — containment and scope
In its very first reply the AI didn't jump to „it could be this, this, this". It started with:
„Are all the parts produced with this defect isolated in quarantine? They can't go to the customer or to the next process?"
This is a trivial question for a quality engineer. But for a project manager, production lead or anyone pulled into RCA „because it has to be done" it's the critical one. Without containment there's no point discussing root cause — defective product is shipping.
Then it went to scope:
„What percentage of the total scrap is from this defect? Is the contamination dust/fibres (white dots), or oily/greasy marks, or metallic particles?"
These two questions in one exchange narrowed the search space considerably. From the answers it was already clear that this was critical scrap (one failure mode = most of the problem) and that the particles were tiny (below the threshold of visual detection before paint).
The key moment — „new or chronic?"
Here the AI behaved differently than most tools:
„When did the scrap rise? Is this a new problem (last days/weeks)? Or is it chronic — something you've had long-term? If new — we look for what changed. If chronic — the process was never under control."
This is a very precise RCA distinction. For a new problem you look for a change (an ECN, a supplier swap, a new operator). For a chronic problem you look for a systemic gap that was always there but nobody noticed.
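The fork can be written down as a small lookup. This is a sketch of the reasoning only; the function name `rca_strategy` and its string labels are hypothetical.

```python
def rca_strategy(onset, pattern):
    """Map the problem type to the search strategy described above.

    onset:   'new' or 'chronic'
    pattern: 'steady' or 'intermittent'
    """
    if onset not in ("new", "chronic"):
        raise ValueError(f"unknown onset: {onset!r}")
    if pattern not in ("steady", "intermittent"):
        raise ValueError(f"unknown pattern: {pattern!r}")

    if onset == "new":
        focus = "look for what changed (ECN, supplier swap, new operator)"
    else:
        focus = "look for a systemic gap that was always there"

    return {
        "focus": focus,
        # Chronic plus intermittent is the combination the AI called hardest.
        "hardest_type": onset == "chronic" and pattern == "intermittent",
    }

print(rca_strategy("chronic", "intermittent"))
```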
The answer was clear — chronic intermittent. The AI confirmed it as „the hardest type" and immediately went to typical sources that aren't tracked by standard inspection in paint shops:
- Contamination already ON THE PART before the booth (from pressing, handling, storage) — paint just „freezes" it.
- Contamination from the paint equipment — booth, nozzle, filtration with an unstable leak.
- Electrostatics attracting dust from the surroundings — not from the air directly, but from clothing, walls, transport.
These are three classic sources for intermittent paint-shop problems. Standard inspection doesn't cover them, because it doesn't measure things it doesn't think it should measure.
5 Why — step by step
After these questions we worked down through the chain methodically. Briefly, this is how it went:
Why #1: Why do we see contamination after paint? Because it was there before paint — the paint only enlarged it optically. Sub-1-micron before, visible after.
Why #2: Why don't the existing ionization and IPA take it off before the booth? Because it isn't a loose particle that ionization could shake off and IPA could wipe away. It sits on the surface firmly — something is holding it there.
Why #3: Why is it stuck so firmly? Because tempering at 140 °C has literally fused it onto the surface. This is not regular adhesion — it is thermal fusion. Micro-particle + temperature = permanent bond to the part surface.
Why #4: Why does this happen specifically during tempering? During the thermal cycle the glass develops a surface electrostatic charge, which pulls free dust from the surroundings. And the high temperature simultaneously „bonds" it onto the surface the moment it lands. Tempering is not only a thermal treatment; in parallel it unintentionally acts as a deposition step.
Why #5: Why doesn't the existing (post-tempering) ionization catch this? Because it works post factum. By the time the part reaches ionization, the particles are already fused. Ionization neutralizes the electrostatic charge but cannot remove what is already thermally bonded to the surface.
Root cause found. It's not that ionization is missing as a technology — we have it. We found a blind spot in the process — the step between the press and the tempering oven that nobody was monitoring. That's where ionization should run: to remove the electrostatic charge and free dust before the thermal fusion, not after.
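The blind spot becomes easier to see when the process is written down as a sequence. The sketch below is a simplified model of my reading of the line, not an export from any real system; the `Step` class, its two flags and the step names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    removes_loose_dust: bool = False  # e.g. ionization, IPA cleaning
    fuses_particles: bool = False     # e.g. a high-temperature step

def unprotected_fusion_steps(process):
    """Return (previous step, fusing step) pairs with no dust removal
    directly before the fusing step. A fusing step that comes first
    in the process is not checked, since it has no predecessor."""
    gaps = []
    for prev, step in zip(process, process[1:]):
        if step.fuses_particles and not prev.removes_loose_dust:
            gaps.append((prev.name, step.name))
    return gaps

current = [
    Step("press"),
    Step("tempering", fuses_particles=True),
    Step("ionization (existing)", removes_loose_dust=True),
    Step("IPA cleaning", removes_loose_dust=True),
    Step("paint booth"),
]

# Same line with ionization added before the thermal step.
proposed = current[:1] + [Step("ionization (new)", removes_loose_dust=True)] + current[1:]

print(unprotected_fusion_steps(current))
print(unprotected_fusion_steps(proposed))
```

For the current layout the check flags the press-to-tempering transition; for the proposed layout it returns an empty list.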
And now the less obvious part — how to fix this without breaking another process
Finding the root cause is half the work. The other half is designing a corrective action that doesn't create a new problem.
In this case, if we'd put a regular cold-air ionizer between the press and the tempering oven, we would have solved one problem and created another — a cold blast would change the part's thermal profile, break temperature homogeneity before tempering and, at the extreme, affect the dimensional stability of the glass. For the quality engineer, that would mean swapping one chronic problem for another.
The AI accounted for this. Instead of regular cold-air ionization it proposed ionization with warm air at 50–60 °C. The ionization itself comes from the electrical discharge in the bar, not from the air temperature, so it works just as well. With warm air blown through it instead of cold:
- We neutralize the electrostatic charge and carry off the free surface dust.
- We don't disturb the part's temperature profile before tempering.
- We don't trigger a secondary quality problem in downstream steps.
This is exactly the kind of perspective that separates a senior quality engineer from a junior. A junior says „let's add ionization". A senior says „let's add ionization in a way that…". And in this case the AI was thinking about the „in a way that".
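If the warm-air temperature is logged, holding the 50–60 °C window can be checked with a few lines. This is a minimal sketch; the readings are hypothetical and the function name `out_of_window` is my own.

```python
WARM_AIR_MIN_C = 50.0  # lower bound of the proposed warm-air window
WARM_AIR_MAX_C = 60.0  # upper bound of the proposed warm-air window

def out_of_window(readings_c, low=WARM_AIR_MIN_C, high=WARM_AIR_MAX_C):
    """Return (index, temperature) for every reading outside [low, high]."""
    return [(i, t) for i, t in enumerate(readings_c) if not (low <= t <= high)]

# Hypothetical log of air temperatures at the ionization bar, in °C.
readings = [52.0, 49.5, 60.0, 61.2]
violations = out_of_window(readings)
print(violations)
```

Both limits are treated as inclusive here, which is an assumption; the real tolerance would come from the control plan.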
Then something important happened
When I got the first analysis summary from the AI, it had a small inaccuracy in it — it suggested adding a buffer station between tempering and IPA cleaning. From my technical knowledge I knew that was the wrong place, because the existing post-tempering ionization is already there and working. The real gap is elsewhere, between the press and the tempering step.
I wrote:
„Slightly different — after tempering there's already ionization, but the gap as we defined it is between the press and tempering."
The AI rebuilt the whole analysis. It didn't just correct the one point; it completely restructured the flow diagram, the action plan and the root-cause explanation. Five lines from the engineer and it had the correct map of the process.
This is the difference between a tool that helps you think and a generator that throws answers in your face. A generator wouldn't correct itself. A thinking tool does.
The action plan that came out
D5 (permanent corrective actions):
- Layout change — buffer station between the press and tempering, ~20–30 seconds of natural cooling.
- Install an ionization bar with warm air (50–60 °C, so the part is not cooled below its minimum required temperature).
- Trial run — verification batch, scrap-rate measurement with the new process.
D6 (verification):
- One week of production with monitoring.
- Target scrap value defined against pre-trial baseline.
D7 (prevention):
- Add electrostatic-charge consideration to PFMEA for this type of process.
- Update the control plan with temperature monitoring at the ionization step.
This is the whole action plan. From 15 questions in a chat. With a quality engineer who brought nothing to the AI except real numbers from his own process.
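For the D6 comparison against the pre-trial baseline, one possible check is a one-sided two-proportion z-test. This is a sketch under stated assumptions, not the method we were prescribed: the counts are hypothetical, the function name `scrap_drop_z_test` is my own, and the test treats parts as independent, which a defect that arrives in bursts does not fully satisfy.

```python
import math

def scrap_drop_z_test(base_scrap, base_parts, trial_scrap, trial_parts):
    """One-sided two-proportion z-test: is the trial scrap rate lower?

    Returns (z, p_value). Assumes independent parts, so with bursty
    batch-level defects the result is only a rough screen.
    """
    p_base = base_scrap / base_parts
    p_trial = trial_scrap / trial_parts
    pooled = (base_scrap + trial_scrap) / (base_parts + trial_parts)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_parts + 1 / trial_parts))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_base - p_trial) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) if nothing changed
    return z, p_value

# Hypothetical counts, for illustration only.
z, p = scrap_drop_z_test(base_scrap=120, base_parts=1000,
                         trial_scrap=60, trial_parts=1000)
print(f"z = {z:.2f}, one-sided p = {p:.2g}")
```

A small p-value suggests the drop is unlikely to be noise. Because the defect comes in bursts, comparing whole batches over the monitoring week is a sensible cross-check.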
Why this worked
Because the AI had no motivation to write a longer answer. No motivation to compliment me on my question. No motivation to add bonus tips beyond my situation.
It had the motivation to ask the right questions: containment before root cause, scope before hypothesis, evidence before conclusion. And when I corrected it, it accepted the correction without excuses and reworked the whole chain of reasoning.
That's exactly what a senior quality engineer next to you over coffee would do. Without the appendix of a 12-page PDF listing every theoretical possibility.
The key lesson for your situation
If you have a chronic intermittent problem nobody in the team has cracked in months, the two most likely scenarios are:
- You're looking in the wrong place. The correlations you tested were logical, but the real cause sits in a process step you don't consider relevant. AI expands your hypothesis space to things like electrostatics, micro-particles below the detection threshold, the timing of cleaning operations, the impact of thermal cycles.
- You're missing structured guidance. You know everything in theory. You know what 5 Why is. You know containment goes before root cause. But when you sit three hours in a meeting with the team, the structure gets lost in discussion. AI holds structure for you — one question at a time, evidence before conclusion, no jumps.
Neither of these problems is solved by Google. Neither by ChatGPT. A domain-specific AI for quality solves them.
I don't think AI replaces the quality engineer. I think it gives you the same access you'd have to a senior colleague sitting next to you — except whenever you need them, without having to track them down.
And for intermittent problems that escape you for months, that's often exactly what you need.
FAQ
- When does AI in root cause analysis help the most?
- On intermittent problems and chronic defects with no obvious correlation. For an acute problem with a clear cause, AI won't add much value — you already know what to do. For a problem nobody in the team has cracked in months, AI asks the questions nobody asked themselves and opens paths the standard workflow has closed.
- Will AI replace the quality engineer?
- No. AI doesn't have hands on the shop floor, can't read a gauge, can't judge whether a part is „clean to the eye". What AI does is ask the right questions in the right order, hold containment before root cause, and refuse to jump to „human factor" as a conclusion. The quality engineer remains the decision-maker and the one who knows what's actually feasible.
- What's the difference between ChatGPT and a domain-specific AI for quality?
- Asked the same question, ChatGPT writes a two-page essay listing every possible cause. A domain-specific AI for quality asks one question, waits for the answer and builds the next one on top of it. It knows that containment goes before root cause, that a dominant failure-mode share is not random, that intermittency has typical sources. These shortcuts come from specialization, not from model size.
- Does AI work on intermittent defects with no correlation to operator or batch?
- Yes — and often that's where it helps the most. The absence of correlation doesn't mean the cause doesn't exist; it means you're looking in the wrong place. AI expands the hypothesis space to sources that aren't tracked by standard inspection (electrostatics during temperature cycles, sub-detection particles, timing of cleaning operations in the process sequence).
- Will AI accept a correction when an expert says it's wrong?
- A well-designed one will. In this case study I corrected the AI's first summary (it placed an intermediate station in the wrong process step) and the AI rebuilt the entire analysis around the correction. That's the difference between a tool that helps you think and a generator that throws answers in your face.