Ford AI quality fix puts veterans back in the loop

A factory can automate the wrong thing very efficiently. That is the uncomfortable lesson inside Ford's new quality win.

Ford's AI quality push moved from boardroom slogan to operating repair job after the automaker hired, promoted, or brought back more than 350 experienced engineers to rebuild the judgment layer its automated systems had failed to capture, according to The Verge's June 25 report. The timing matters because Ford just took the top mass market spot in the 2026 JD Power U.S. Initial Quality Study, where the brand scored 152 problems per 100 vehicles (PP100) and the industry average improved to 175.

That is a real improvement. It is also a warning for anyone building with AI inside a complex product organization. Ford did not merely add more models, more tests, or more automation. It had to reattach old hands to new systems.

Charles Poon, Ford's vice president of vehicle hardware engineering, gave the cleanest version of the mistake when he said the company thought adding AI and changing design requirements would produce high quality, according to The Verge's account of the reporter briefing. That assumption should sting if your roadmap has a box labeled "replace expert review with AI".

What did Ford's quality rebound actually show?

JD Power measures initial quality by counting problems reported per 100 vehicles during the first 90 days of ownership, and the 2026 study is based on 78,514 purchasers and lessees of new 2026 model year vehicles surveyed after 90 days, according to JD Power's 2026 methodology note. Lower is better, which makes Ford's 152 PP100 mass market result meaningful because it beat Nissan at 156 and Buick at 162.
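As a rough illustration of the metric, PP100 is reported problems scaled to a base of 100 vehicles. The sketch below uses a hypothetical function name and made-up sample counts; it is not JD Power's actual computation, which the study description does not spell out here.

```python
def pp100(total_problems: int, vehicles_surveyed: int) -> float:
    """Problems per 100 vehicles. Lower is better."""
    if vehicles_surveyed <= 0:
        raise ValueError("vehicles_surveyed must be positive")
    return 100.0 * total_problems / vehicles_surveyed


# Hypothetical sample: 1,520 problems reported across 1,000 surveyed vehicles.
score = pp100(1520, 1000)
print(score)  # 152.0
```

A score of 152 therefore means roughly one and a half reported problems per vehicle, on average, in the first 90 days.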

The broader industry also improved. Overall new vehicle quality moved from 192 PP100 in 2025 to 175 PP100 in 2026, which JD Power called the best year over year improvement since 1997 in its 2026 U.S. Initial Quality Study. Premium brands improved from 203 to 169 PP100, while mass market brands improved from 187 to 177.

The chart below shows the shape of the rebound: Ford's win sits inside a wider industry improvement, but premium brands made the steepest one year move, dropping 34 problems per 100 vehicles.

Chart: JD Power initial quality scores (PP100, lower is better). Overall improved from 192 in 2025 to 175 in 2026, premium from 203 to 169, and mass market from 187 to 177. Source: JD Power. Data Today benchmark.
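The year over year arithmetic is easy to check. A minimal sketch, using only the PP100 figures quoted above; the variable and function names are illustrative.

```python
# PP100 scores quoted from JD Power's 2026 study: (2025, 2026). Lower is better.
scores = {
    "overall": (192, 175),
    "premium": (203, 169),
    "mass market": (187, 177),
}


def improvement(before: int, after: int) -> int:
    """Positive result means fewer problems per 100 vehicles."""
    return before - after


deltas = {segment: improvement(b, a) for segment, (b, a) in scores.items()}

# Ford's 152 PP100 compared with the 2026 mass market average.
ford_gap = scores["mass market"][1] - 152

for segment, delta in deltas.items():
    print(f"{segment}: {delta} PP100 better")
print(f"Ford vs mass market average: {ford_gap} PP100 better")
```

The deltas come out to 17 for the overall industry, 34 for premium, and 10 for mass market, with Ford sitting 25 PP100 below the mass market average.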

The catch is infotainment. JD Power said infotainment was the only category that worsened in 2026, with 44.4 PP100 in the mass market segment and 38.3 PP100 in the premium segment, according to the same 2026 study. Android Auto and Apple CarPlay connectivity added 1.4 PP100 to the deterioration inside infotainment, a small number that matters because it names the failure mode: software integration at the customer boundary.

That is where Ford's story stops being a car story and starts looking like a systems story. The company had automated parts of design and production, but Poon said it underestimated the knowledge held by engineers who had been through multiple vehicle development cycles, according to The Verge's briefing report. Institutional knowledge is often invisible until the system that quietly depended on it starts returning weird results.

Ford's fix was unusually blunt. The company hired, promoted, or brought back more than 350 experienced engineers, and it created a 40 person software quality assurance team, according to The Verge's report on Ford executives' comments. It also says it added more than 100,000 AI powered tests to stress software systems and catch edge cases.

More tests sound like the obvious answer. The sharper lesson is that Ford paired those tests with people who knew what a bad edge case looked like before a dashboard produced a red icon.

Why should builders care about a car company's AI mistake?

Because Ford ran into a pattern software teams recognize: automation debt. A team takes an expensive human process, translates part of it into tooling, loses the undocumented judgment, then has to hire the judgment back at a premium.

Ford's version is easy to see because the stakes are physical. A vehicle leaves the factory, customers discover problems, dealers process repairs, regulators watch recalls, and warranty accruals hit the income statement. In Ford's 2025 annual report, the company disclosed $5.733 billion in payments made during the period for warranty and field service actions and an ending warranty and field service action balance of $17.190 billion, according to Note 24 of Ford's 2025 annual report.

That is the business case for keeping experts in the loop. If your AI system ships bad requirements, weak test coverage, or plausible but wrong design recommendations, the cost usually shows up later in support tickets, churn, remediation work, and slower releases. Cars make the bill legible. SaaS teams just call it a rough quarter.

The recall backdrop makes the win less tidy. NHTSA's 2025 annual recall report lists 997 total recalls across the industry affecting 31,268,058 units, with 727 vehicle recalls affecting 23,427,040 vehicles, according to NHTSA's 2025 annual report. The same report's large recall table includes three 2025 Ford vehicle recalls affecting 1,076,138, 1,456,417, and 1,448,655 vehicles respectively.

Ford also entered a three year consent order with NHTSA in November 2024 over rearview camera recall failures, and the agency said the order included a $165 million civil penalty, according to NHTSA's announcement. That was before the latest JD Power win, but it explains why a 90 day quality metric cannot be treated as a full reset.

For AI builders, the practical consequences are these:

Do not automate undocumented expertise first. If the senior reviewer cannot explain the checklist, the model will learn the shadows of the checklist.
Treat expert review as training data creation. Ford's experienced engineers were tasked with improving data collection and AI training, according to The Verge's report.
Measure downstream defects, not tool usage. JD Power's PP100 metric counts owner reported problems after 90 days, while your equivalent may be failed deployments, escalations, refunds, or blocked sales.
Keep software speed away from safety theater. A vehicle cannot adopt the consumer app habit of pushing a fix after customers trip over the bug.

This is close to the lesson in Data Today's earlier look at why AI coding costs bite inside the workflow: the cheap part is generating more work. The expensive part is validating which work belongs in production.

Did AI fail here, or did Ford use it the wrong way?

The useful read is that AI exposed a management gap. Ford's executives are still expanding AI powered testing, which suggests the company has not concluded automation itself was the villain. The failure was assuming that the system could preserve engineering taste, historical memory, and cross functional context without a deliberate transfer plan.

Automotive development is a brutal test bed for that mistake. A modern vehicle is a mechanical product, a distributed software platform, a supply chain artifact, and a regulated safety object. JD Power's 2026 dataset uses 227 voice of customer questions plus repair data across 10 categories, including infotainment, driving assistance, powertrain, seats, climate, and exterior, according to the study description. That is a lot of surface area for an automated system to misunderstand.

Kumar Galhotra, Ford's chief operating officer, said the company is moving from a "find and fix" mentality toward preventing issues before they occur, according to The Verge's report. That line matters because most AI quality programs quietly optimize for find and fix: generate code, run tests, open bugs, patch, repeat. The stronger program changes inputs: requirements, design reviews, interface contracts, supplier data, and test oracles.

The 100,000 test figure is impressive only if those tests encode useful reality. Otherwise, the suite becomes a very large confidence machine. Anyone who has maintained flaky integration tests knows the feeling: the dashboard is green, the product is brittle, and nobody wants to touch the ancient helper that makes the build pass.

The better model is a loop:

Veteran engineers identify failure patterns from prior launches.
Those patterns become structured requirements, assertions, simulations, and review criteria.
Automated tests revalidate late changes quickly.
Field data and repair data update the test suite.
Humans audit the blind spots, especially when the product changes shape.
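The loop above can be sketched in a few lines. This is a minimal illustration, not Ford's tooling: the `FailurePattern` type, the pattern names, the thresholds, and the build measurements are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FailurePattern:
    name: str
    source: str  # where the knowledge came from, e.g. "veteran review"
    check: Callable[[Dict[str, float]], bool]  # True means the build passes


def run_suite(patterns: List[FailurePattern], build: Dict[str, float]) -> List[str]:
    """Return the names of patterns whose check fails for this build."""
    return [p.name for p in patterns if not p.check(build)]


# Steps 1 and 2: a veteran's failure pattern becomes a structured assertion.
suite = [
    FailurePattern(
        "phone pairing timeout",
        "veteran review",
        lambda b: b["pairing_seconds"] <= 5.0,
    ),
]

# Step 4: field and repair data add a new pattern to the suite.
suite.append(
    FailurePattern(
        "cold start screen lag",
        "field data",
        lambda b: b["boot_seconds"] <= 8.0,
    )
)

# Step 3: a late change is revalidated against the whole suite.
late_change = {"pairing_seconds": 3.2, "boot_seconds": 9.5}
failures = run_suite(suite, late_change)
print(failures)  # ['cold start screen lag']
```

Recording a `source` on each pattern keeps the origin of every check visible, which is what makes the human audit in the last step possible: a suite with no entries from veteran review is a signal in itself.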

That loop is slower than a demo. It is also how you avoid turning AI into a faster route to rework.

What should your team change before copying Ford's playbook?

Start by finding the people your AI plan quietly assumes away. In a software company, they may be the staff engineer who knows why the billing migration still has a feature flag, the support lead who can predict which customers will break a release, or the security reviewer who remembers the third party OAuth incident from 2021.

Then convert that knowledge into assets without pretending the conversion is complete. A good first month could produce 20 high value regression tests, 10 annotated design review examples, and a short failure taxonomy. That beats a thousand generated test cases nobody trusts.

Use Ford's 350 engineer move as a budget signal. If a company with Ford's process maturity had to rebuild its expertise layer, a 12 person product team should be careful about replacing review with a chat box. The smaller your team, the more dangerous it is to let one senior person's context disappear into a vague prompt called "best practices".

Your operating plan should include four checks:

Map expertise before automation. List the decisions that currently require senior approval, then write down the evidence those people use.
Create named owners for AI training data. Data quality is a job, not a side effect of tool adoption.
Define a customer facing defect metric. Pick one number that hurts when it rises, such as incidents per 1,000 sessions or refund tickets per release.
Run late change drills. Ford says automated validation helps recheck late software changes, according to The Verge's report, and your team should know how long that loop takes before launch week.
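For the third check, a customer facing defect metric can be as simple as a normalized rate plus a release gate. The sketch below is one possible shape, with hypothetical function names, a made-up 10 percent tolerance, and invented incident counts.

```python
def incidents_per_1000_sessions(incidents: int, sessions: int) -> float:
    """Normalize incident counts so releases with different traffic compare fairly."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return 1000.0 * incidents / sessions


def regressed(current: float, baseline: float, tolerance: float = 0.10) -> bool:
    """True if the current rate exceeds the baseline by more than the tolerance."""
    return current > baseline * (1.0 + tolerance)


# Hypothetical numbers for the previous and the candidate release.
baseline = incidents_per_1000_sessions(42, 120_000)
current = incidents_per_1000_sessions(66, 150_000)

print(f"baseline={baseline:.2f} current={current:.2f}")
print("block release" if regressed(current, baseline) else "ship")
```

With these sample numbers the rate rises from about 0.35 to about 0.44 incidents per 1,000 sessions, which exceeds the tolerance and blocks the release. The tolerance is a judgment call that belongs to a named owner, not to the tool.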

There is a hiring lesson too. AI does not remove the need for senior engineers. It changes the work they should do. Less time should go to routine inspection. More time should go to defining the tests, constraints, and examples that make automated inspection worth trusting.

The human hand stays on the torque wrench

Ford's quality win is encouraging because it shows the mess can improve. It is uncomfortable because the improvement required admitting that automation had been asked to carry knowledge it did not contain.

The next wave of AI adoption will produce more Ford shaped stories. Some will happen in hospitals, banks, logistics networks, and developer platforms rather than assembly plants. The teams that come out ahead will treat experts as the source of the system's judgment, not as legacy cost centers waiting for deletion.

A model can help you move faster. Someone still has to know where the bolts shear.
