The COBOL Cliff Is Not a Talent Problem. It's an Institutional Knowledge Problem

Matt Deaton

May 18, 2026

The conversation about legacy engineering talent at most US financial institutions follows a familiar shape.

Open requisitions for COBOL or RPG roles sit unfilled for six to nine months. Salary bands creep upward each renewal cycle. The recruiting team reports back that the pipeline has dried up. A senior engineer announces retirement, and the institution scrambles to find a replacement — or, more commonly, persuades the retiring engineer to come back as a contractor at a substantial premium.

This is the version of the problem that shows up on the executive dashboard. It is also the wrong version.

The actual risk inside most banks, credit unions, and insurance carriers running legacy infrastructure in 2026 is not that COBOL programmers are scarce. It is that the institution's critical systems have quietly become dependent on individual people: specific engineers who remember how specific batch processes work, who can read specific undocumented modules, and who hold specific business logic in their heads that was never written down. When those people leave, the systems do not stop working. They simply become unreadable.

That is concentration risk. And it is the cost most institutions track least carefully, and the cost that compounds most reliably.

Key takeaways

The average COBOL programmer in the US is now 58 years old, with roughly 10% retiring each year, a pattern documented by Phil Teplitzky's widely cited 2019 demographic study and reaffirmed in subsequent industry analysis.
COBOL still runs roughly 43% of US banking systems, handles 95% of ATM transactions, and powers $3 trillion in daily commerce, none of which is being decommissioned on a timeline that matches the retirement curve.
The labor market problem is real (an estimated 24,000 COBOL programmers in the US against rising demand), but it understates the institutional risk. The deeper problem is that legacy systems with two or fewer maintainers become functionally undocumented when those maintainers retire.
Concentration risk is not a hiring metric. It is the percentage of business-critical systems that depend on a single individual's working knowledge, and it is rising at every institution that defers modernization.

The talent data, briefly, before we move past it

The labor-market shortage is real and the numbers are roughly what every CTO already knows.

The average age of a COBOL programmer is 58, and 10% are retiring each year. An estimated 84,000 mainframe COBOL programming positions were projected to be unfilled by 2020, with no demographic mechanism in place to reverse the trend. Subsequent industry tracking has not improved the picture; if anything, the average age has crept higher as the cohort that was 58 in 2019 either retired or moved further toward retirement.

The demand side has not softened. COBOL processes $3 trillion in daily financial transactions and handles 95% of US ATM swipes. Over 85% of universities have dropped COBOL from their curricula since the 1990s, leaving a critical skills gap that 60% of organizations say is their biggest challenge right now. The pipeline is not refilling at meaningful scale.

The compensation data reflects the scarcity. Mainframe COBOL developers now earn an average of $125,525 per year, with demand projected to grow 15% over the next decade as organizations scramble to maintain and modernize legacy infrastructure. RPG specialists (the AS/400 / IBM i cohort that runs significant portions of credit union and community bank infrastructure) are in similar or worse demographic shape.

These numbers explain why open reqs don't get filled. They do not, by themselves, explain why some institutions handle the transition smoothly and others stumble into existential crises when a single engineer retires. That second question is the one that matters, and the answer has very little to do with the labor market.

Why two metrics, and one is the wrong one

Most institutions track legacy talent risk with a single metric: open requisitions, time-to-fill, or some variant of recruiter pipeline health.

That metric measures the flow into the workforce. The metric that matters is the concentration of knowledge within the workforce already there.

The distinction is best illustrated with a thought experiment. Consider two community banks, each with 30 engineers on legacy maintenance.

Bank A has 30 engineers, each of whom can read and modify any of the bank's 50 business-critical legacy applications. Documentation is comprehensive. Knowledge is distributed. Two engineers retire next month.
Bank B has 30 engineers, but each of the bank's 50 business-critical applications has been "owned" by one or two specific engineers for the past 15 years. Documentation exists in fragments. Most of the working knowledge lives in those engineers' heads. Two engineers retire next month.

Both banks face an identical hiring market. Both face the same recruiter response times. Both will pay similar salary premiums to replace the retired engineers.

Bank A loses some velocity and absorbs the loss. Bank B loses the ability to safely modify roughly seven business-critical applications, possibly forever, because the institutional knowledge required to understand the existing code has walked out the door, and the new engineers cannot acquire it from documentation that does not exist.

Bank A has a hiring problem. Bank B has a concentration risk problem. Only one of them is currently being measured.
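
The article does not show how it arrives at "roughly seven" applications for Bank B. One back-of-envelope reading, sketched below, assumes every application has two owners (the upper end of "one or two"), that ownership is spread evenly across the 30 engineers, and that the two retirees do not co-own any application. These are illustrative assumptions, not figures from the thought experiment itself.

```python
# Back-of-envelope estimate for Bank B. Inputs marked "assumed" are
# illustrations, not figures stated in the article.
applications = 50      # from the thought experiment
engineers = 30         # from the thought experiment
retirees = 2           # from the thought experiment
owners_per_app = 2     # assumed: upper end of "one or two" owners per application

# Total owner assignments, spread evenly across the engineering team (assumed).
ownership_slots = applications * owners_per_app       # 100 assignments
apps_per_engineer = ownership_slots / engineers       # about 3.3 per engineer

# Applications that lose at least one owner, assuming the retirees share none.
apps_touched = retirees * apps_per_engineer

print(f"Applications losing an owner: about {apps_touched:.1f}")
```

Under these assumptions the estimate lands just under seven. Whether an affected application becomes unmodifiable depends on whether its remaining owner, if any, holds the same working knowledge.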

A simple definition, because it gets used loosely

Concentration risk, in the legacy systems context, is the share of an institution's business-critical applications whose continued safe modification depends on the working knowledge of two or fewer named individuals.

The formula, in plain form:

Concentration risk = (Number of business-critical systems where ≤2 internal staff have working knowledge) ÷ (Total business-critical systems)

A 30% concentration risk score means roughly one in three of the institution's critical systems is one or two retirements away from becoming functionally opaque. The cost of one critical retirement event, meaning an event in which the only engineers who could safely modify a critical system exit the institution, is not a hiring cost. It is a remediation cost, and it typically runs 12 to 24 months of forensic dependency mapping, often without the people who built the system to consult.
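
As a minimal sketch, the formula above can be computed from a simple inventory that maps each business-critical system to the number of internal staff with working knowledge of it. The function name `concentration_risk` and the system names in the inventory are hypothetical, chosen only for illustration.

```python
def concentration_risk(maintainers_by_system: dict[str, int], threshold: int = 2) -> float:
    """Share of business-critical systems with `threshold` or fewer maintainers."""
    if not maintainers_by_system:
        raise ValueError("need at least one business-critical system")
    concentrated = sum(1 for n in maintainers_by_system.values() if n <= threshold)
    return concentrated / len(maintainers_by_system)


# Hypothetical inventory: system name -> internal staff with working knowledge.
inventory = {
    "nightly-gl-batch": 1,
    "loan-servicing": 2,
    "card-settlement": 2,
    "statement-rendering": 4,
    "wire-transfer-gateway": 3,
    "branch-teller-core": 6,
    "fraud-rules-engine": 5,
    "regulatory-reporting": 3,
    "customer-master-file": 7,
    "interest-accrual": 4,
}

score = concentration_risk(inventory)
print(f"Concentration risk: {score:.0%}")
```

With three of the ten hypothetical systems at two or fewer maintainers, this inventory produces the 30% score discussed above.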

The reason this is not on most executive dashboards is mechanical: nobody asks for the number. CFOs ask about open headcount. CROs ask about cyber and operational risk. CIOs ask about uptime and project delivery. Nobody asks, "For how many of our critical systems is the only working knowledge concentrated in fewer than three people?"

What the experienced engineers already know

The institutional knowledge problem is widely understood inside the engineering community that actually does the work; it just rarely makes it to the executive level in the right form.

Consider how one bank IT leader described the problem that really keeps him up at night: the fear that he may not be able to transfer the deep understanding of the business logic embedded in the bank's programs before that understanding walks out the door with retiring employees.

That sentence is about the asymmetry between the speed of retirement and the speed of knowledge transfer. The same source notes that the trick is to develop a curriculum that teaches not just COBOL, but the business rules behind the code that runs the company. The COBOL syntax can be learned. The business rules (why this batch process runs at 2:14 AM and not 2:15, why this field is padded with a specific character, why this exception handler exists) cannot be learned from a textbook. They have to be transferred from people who remember.

When those people leave without transferring that knowledge, what remains is the worst version of a legacy system: code that runs in production, that nobody fully understands, that cannot be modified without significant risk, and that cannot be modernized without a forensic discovery effort that may take six months to a year and produce only partial answers.

A widely cited example, told often in modernization circles, makes the point: a major US retailer discovered that its 2,500 store locations all ran on COBOL programs totaling approximately 20 million lines of code, maintained by two engineers, one of whom was retiring. The retailer was not facing a hiring problem. It was facing the imminent disappearance of the only living understanding of a system its entire operation depended on.

A 12-question concentration risk audit

The good news is that concentration risk is measurable, and it is measurable without expensive consulting engagements. A reasonable first audit at most institutions can be completed in a working week by the CTO's office.

The questions worth asking, in roughly this order:

How many business-critical applications run on COBOL, RPG, or pre-modern .NET frameworks?
For each, how many internal engineers can read and safely modify the codebase without external assistance?
For each application, what is the age of the most senior maintainer?
For each application, when was the last comprehensive documentation refresh?
For each application, has there been a knowledge-transfer event (pairing, code walkthrough, structured handoff) in the last 24 months?
For each application, is the business logic captured anywhere outside the code itself?
What is the cost of one week of downtime for each application?
For each application, is there a contracted external party who could maintain it if all internal maintainers left?
What is the average retirement-event probability across the engineering cohort over the next 36 months?
For applications with ≤2 maintainers, what is the dependency-mapping discovery time the institution would face if all maintainers left tomorrow?
How much of the institution's regulatory reporting depends on systems in the high-concentration cohort?
What is the institution's current spending on "knowledge insurance" (formal pairing, documentation refresh, structured handoff) versus replacement hiring?
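
Questions 9 and 10 can be combined into a rough per-application estimate. The sketch below assumes each maintainer exits independently with the same annual probability, which is a simplification; the 10% figure is borrowed from the industry-wide retirement rate cited earlier and will not match any particular team. The function name `prob_all_leave` is hypothetical.

```python
def prob_all_leave(annual_exit_prob: float, maintainers: int, years: int = 3) -> float:
    """Chance that every maintainer exits within `years`, assuming independent exits."""
    if not 0.0 <= annual_exit_prob <= 1.0:
        raise ValueError("annual_exit_prob must be between 0 and 1")
    if maintainers < 1:
        raise ValueError("maintainers must be at least 1")
    # Probability that one person exits at some point during the window.
    per_person = 1.0 - (1.0 - annual_exit_prob) ** years
    # All maintainers must exit for the application to be orphaned.
    return per_person ** maintainers


# Assumed 10% annual exit probability over a 36-month window.
for count in (1, 2, 3):
    print(f"{count} maintainer(s): {prob_all_leave(0.10, count):.1%}")
```

Under these assumptions, an application with a single maintainer has roughly a 27% chance of being orphaned within 36 months, against roughly 7% with two maintainers and 2% with three. Real exits are rarely independent (colleagues of similar age tend to retire in clusters), so these figures are better read as a floor than as a forecast.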

The scoring is straightforward. Applications where ≤2 engineers have working knowledge are Critical. Applications where 3 to 5 engineers have working knowledge are Elevated. Applications with 6 or more are Manageable. The institution's overall audit score is the percentage of business-critical applications in Critical or Elevated status; the Critical share alone corresponds to the concentration risk formula defined earlier.
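
The tiers and the overall score described above can be sketched as follows. The function names `tier` and `audit_score` and the sample maintainer counts are hypothetical; only the thresholds come from the scoring rules.

```python
from collections import Counter


def tier(maintainers: int) -> str:
    """Map a maintainer count to the audit's three status tiers."""
    if maintainers <= 2:
        return "Critical"
    if maintainers <= 5:
        return "Elevated"
    return "Manageable"


def audit_score(maintainer_counts: list[int]) -> float:
    """Share of applications in Critical or Elevated status."""
    if not maintainer_counts:
        raise ValueError("need at least one application")
    flagged = sum(1 for n in maintainer_counts if tier(n) != "Manageable")
    return flagged / len(maintainer_counts)


# Hypothetical audit result: one maintainer count per business-critical application.
counts = [1, 2, 2, 3, 4, 6, 7, 8, 9, 10]

breakdown = Counter(tier(n) for n in counts)
print(dict(breakdown))
print(f"Overall score: {audit_score(counts):.0%}")
```

Keeping the per-tier breakdown alongside the single percentage is a deliberate choice here: two institutions with the same overall score can have very different numbers of Critical applications.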

In our work with mid-size US financial institutions, that score commonly comes back between 25% and 50%, which is uncomfortable, and the institutions running the audit for the first time are almost always surprised by it.

Why this changes how modernization gets scoped

The reframe from "talent problem" to "concentration risk problem" matters for one specific reason: it changes the order in which modernization work should happen.

A talent framing pushes the institution toward broad hiring and broad training programs: useful but slow, and not closely tied to which systems matter most. A concentration risk framing pushes the institution toward a different sequence:

First, audit. Identify which specific systems are in Critical status before any modernization budget is allocated.
Second, harvest. Where critical-status systems still have living maintainers, run structured knowledge-extraction sessions, paired with AI-assisted discovery tooling, to capture the business logic in modernization-ready form while the people who know it are still there.
Third, modernize the critical-status systems first. Not the most ambitious system, not the most visible one, but the one whose loss of working knowledge would cause the largest unrecoverable damage.
Fourth, hire and train against a documented system, rather than asking new engineers to acquire institutional knowledge from people who have already retired.

This sequence is not faster or cheaper than the talent-pipeline approach. It is more defensible. It uses the institution's surviving expertise while that expertise still exists, rather than hoping the labor market will eventually back-fill what is being lost.
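
One way to turn the third step into a work queue is sketched below. It filters to Critical-status systems and orders them by the cost of a week of downtime (audit question 7), used here as an assumed proxy for "largest unrecoverable damage"; an institution might reasonably weight regulatory exposure or discovery time instead. The `LegacySystem` record, the system names, and the dollar figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacySystem:
    name: str
    maintainers: int             # internal staff with working knowledge
    weekly_downtime_cost: float  # audit question 7, in dollars (assumed proxy for damage)


def modernization_order(systems: list[LegacySystem]) -> list[LegacySystem]:
    """Critical-status systems only, most costly outage first."""
    critical = [s for s in systems if s.maintainers <= 2]
    return sorted(critical, key=lambda s: s.weekly_downtime_cost, reverse=True)


# Hypothetical portfolio for illustration.
portfolio = [
    LegacySystem("statement-rendering", maintainers=4, weekly_downtime_cost=900_000),
    LegacySystem("nightly-gl-batch", maintainers=1, weekly_downtime_cost=2_500_000),
    LegacySystem("loan-servicing", maintainers=2, weekly_downtime_cost=4_000_000),
    LegacySystem("archive-retrieval", maintainers=1, weekly_downtime_cost=150_000),
]

for system in modernization_order(portfolio):
    print(f"{system.name}: {system.maintainers} maintainer(s), "
          f"${system.weekly_downtime_cost:,.0f} per week of downtime")
```

Note that the most expensive system overall is not automatically first in line: a costly system with broad working knowledge stays out of the queue until the Critical ones are addressed.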

The implication is uncomfortable but worth stating directly: the optimal time to modernize a critical-status legacy system is when the engineers who built it are still in the building, not when they no longer are.

For most US financial institutions, that window is open right now. For some, it is closing in 18 months. For a few, it has already closed.

Knowing which category an institution is in starts with the audit, not with the recruiter.
