Why Your Best Engineers Write the Most Bugs

A vice-president of engineering at a mid-sized technology firm recently conducted an analysis that surprised her. She extracted two years of data from the company's Git repositories and bug tracker, covering fifty engineers. The hypothesis was straightforward: the engineers who delivered the most business value would have the fewest bugs. Technical excellence and code quality should correlate positively. The data suggested otherwise.

The five engineers who had delivered the most significant business impact, measured by features shipped, revenue enabled, and user problems solved, had introduced the highest absolute number of bugs. Not merely a few more bugs than their peers. Dramatically more. One engineer, responsible for the year's most successful product launch, had introduced more bugs than the bottom ten performers combined.

Before concluding that the company had rewarded incompetence, the vice-president examined the data more carefully. The high-bug engineers had also committed five times more code than average. They worked on substantially harder problems: authentication systems, payment processing, real-time data synchronization. They took risks on features that might not work but could be valuable. When their code failed, it failed visibly because it was deployed to production quickly and used immediately by customers.

The low-bug engineers, by contrast, committed less code. They worked on well-understood problems with established solutions. They took fewer risks and waited longer before deploying. When their code did have bugs, those bugs often went undiscovered because the code saw limited use. Some maintained legacy systems that were barely touched. A few had spent months on projects that never shipped at all, producing precisely zero bugs because they produced nothing.

What the analysis revealed was not that excellent engineers write buggy code, but that the metric everyone had been optimizing for, bugs per engineer, measured the wrong thing. The relevant metric was not bugs per person or even bugs per line of code. It was bugs per unit of value delivered. On that measure, the supposedly bug-prone engineers were the most effective in the company.

The Analysis That Surprised Everyone

The data, once examined properly, told a consistent story. Across fifty engineers over twenty-four months, the patterns held. Engineers who shipped more code introduced more bugs. Engineers who worked on harder problems introduced more bugs. Engineers who deployed frequently introduced more bugs, but also fixed them faster, learned from them more effectively, and delivered more value to the business.

Consider the numbers specifically. The top performer had committed 3,247 changes over two years. She had 127 bugs assigned to her code. The bottom performer had committed 203 changes over the same period. He had 8 bugs assigned. On the surface, the bottom performer appeared more careful. But the top performer's code was live in production, used by tens of thousands of customers daily, generating millions in revenue. The bottom performer's code existed largely in feature branches that had never been merged or in internal tools that few people used.
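The raw counts can be normalized by volume in a few lines of Python. This sketch uses only the two figures given in the example. It assumes that every committed change is comparable in size, and it ignores difficulty and business value, the other two factors the analysis weighed.

```python
# Figures from the example above: (changes committed, bugs assigned)
engineers = {
    "top performer": (3247, 127),
    "bottom performer": (203, 8),
}

def bugs_per_100_changes(changes: int, bugs: int) -> float:
    """Normalize a raw bug count by the volume of changes shipped."""
    return 100 * bugs / changes

for name, (changes, bugs) in engineers.items():
    rate = bugs_per_100_changes(changes, bugs)
    print(f"{name}: {bugs} bugs, {rate:.2f} per 100 changes")
```

On these numbers the two engineers come out almost identical: 3.91 bugs per 100 changes for the top performer and 3.94 for the bottom performer. Volume alone accounts for the roughly sixteen-fold gap in raw counts, before difficulty or value is considered.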

The top performer's bugs were also different in character. They were edge cases in newly built systems: race conditions in concurrent code, unexpected behavior when processing malformed data, performance problems under load. These are bugs that emerge from doing something novel. They are discoverable only through actual use. They suggest code that is stretching the system's capabilities.

The bottom performer's bugs, when they appeared, were of another kind. They were failures to check for null values in straightforward code. They were off-by-one errors in loops. They were misunderstandings of well-documented APIs. These bugs suggest something closer to carelessness or lack of understanding. The difference matters.

What the company had been measuring was raw bug count. What it should have been measuring was bug count relative to three factors: volume of code shipped, difficulty of problems attempted, and business value delivered. When normalized for those factors, the rankings inverted. The supposedly low-quality engineers were revealed as high performers working on hard problems. The supposedly careful engineers were revealed as low-output engineers working on simple things.

The Difference Between Bugs and Defects

Understanding why high-performing engineers write more bugs requires distinguishing between different types of software failures. The industry uses "bug" to describe everything from catastrophic security vulnerabilities to minor display glitches. Treating all bugs as equivalent obscures important differences in their causes and implications.

Consider bugs in novel code versus defects in well-understood code. When an engineer builds a new real-time notification system and discovers that messages sometimes arrive out of order, that is a bug. It is also a sign that the engineer is working on a problem involving concurrency, network latency, and distributed state. All of these are intrinsically difficult. The bug represents incomplete understanding of a complex problem space.

When an engineer modifies a well-established user interface and introduces a bug that crashes the application on startup, that is a defect. It suggests insufficient care in a well-understood domain. These failures look similar in a bug tracker (both are tickets to be fixed), but they indicate different things about the engineer and the work.

Bugs in new code are often discoveries. The engineer learns that the problem has an aspect that was not initially apparent. A payment system might work perfectly in testing yet fail in production when encountering transaction data from a legacy system that formats currencies differently. An authentication system might handle standard cases flawlessly yet break when a user's name contains Unicode characters that the database was not configured to accept. These failures reveal hidden complexity and drive understanding forward.

Defects in maintenance work, by contrast, often indicate insufficient attention. An engineer changes a well-tested function without running the existing tests. An engineer copies code from one place to another without understanding what it does. An engineer makes an assumption about data that would have been contradicted by examining the database. These failures suggest process problems or skill gaps.

The difference extends to how bugs should be interpreted. When an engineer building a new machine learning pipeline encounters numerous bugs related to data quality and model performance, that is signal. It suggests that the problem is harder than initially understood, that requirements need refinement, or that more research is needed. High bug rates in genuinely novel work should prompt investigation, not criticism.

When an engineer maintaining a stable system introduces defects at a high rate, that is also signal, but of a different kind. It might suggest that the engineer lacks knowledge of the system, that testing is inadequate, or that changes are being made too hastily. The appropriate response is different: training, process improvement, or reassignment.

Many organizations fail to make this distinction. A single bug count combines edge cases in novel distributed systems with null pointer exceptions in straightforward business logic. The resulting metric rewards engineers who avoid difficult problems and punishes those who attempt them. This is not an accident of measurement. It is a systematic incentive toward conservative, low-value work.

Why Productivity and Bug Rate Correlate

The correlation between productivity and bug count is not mysterious. It follows directly from several mechanisms, each of which is independently sufficient to produce the observed pattern.

Most obviously, more code means more opportunities for bugs. An engineer who commits one thousand lines of code will, on average, introduce more bugs than an engineer who commits one hundred lines, even if both engineers have identical defect rates per line. The relationship is linear and mechanical. Organizations that measure bug count without normalizing for code volume are effectively penalizing output.

Less obviously but more importantly, harder problems generate more bugs. An engineer building a new database replication system will encounter more bugs than one adding a form field to a web page. The replication system involves distributed state, network failures, consistency models, and performance trade-offs; the form field involves updating HTML and perhaps writing a simple validation function. The complexity is not comparable, and neither is the bug rate.

Organizations that assign their best engineers to their hardest problems (the rational thing to do) will observe those engineers having high bug counts. This does not mean the assignments are wrong. It means the bugs reflect problem difficulty rather than engineer quality. The alternative, assigning hard problems to weaker engineers, produces even more bugs, slower progress, and greater risk that the problem will not get solved.

Risk-taking also drives bug rates upward. Engineers who experiment with new approaches, who try solutions that might not work, and who push the boundaries of what the system can do inevitably introduce more bugs than those who stick to proven patterns. Successful experiments become new capabilities that competitors cannot match. Failed experiments become bugs. Both outcomes are necessary for innovation to occur.

Speed contributes as well. Engineers who ship code quickly get feedback quickly, discovering bugs rapidly. Engineers who hold code for extensive review before deploying discover bugs more slowly. When they finally deploy, they often introduce bugs that could have been found and fixed weeks earlier. The fast engineer's bugs are visible; the slow engineer's bugs are latent. Bug trackers count only the visible ones.

There is also a selection effect. Code that is deployed to production and used by customers generates bug reports. Code that is not deployed generates no bug reports, regardless of how many bugs it contains. An engineer who deploys continuously will have bugs found and reported. An engineer who works on long-lived feature branches will have their bugs discovered much later, if at all. The engineer who ships nothing has a perfect bug record.

These mechanisms compound. The engineer who ships lots of code, works on hard problems, takes risks, and deploys frequently will have a dramatically higher bug count than one who ships little code, works on simple problems, avoids risks, and deploys slowly. This does not make the first engineer less valuable. It makes them more valuable. Naive bug metrics, however, will mislead.

The Zero-Bug Culture That Killed Innovation

Organizations that optimize for low bug counts provide instructive case studies in perverse incentives. One software company, facing customer complaints about quality, implemented a "zero defects" policy. Every bug was treated as a failure. Bug counts by engineer were published on internal dashboards. Bonuses were tied to staying below bug quotas. Management believed this would drive quality improvements. It drove something else entirely.

Within six months, engineering velocity had dropped by eighty percent. Engineers were not spending more time on testing or quality assurance. Instead, they had stopped working on anything difficult. Projects involving architectural changes, new infrastructure, or novel features were avoided. Engineers focused on safe, simple modifications unlikely to generate bugs: updating documentation, adjusting styling, adding configuration options.

The bug count did decline, which management initially celebrated as success. But revenue growth stalled. Product launches were delayed. Competitors began shipping features the company had discussed but not attempted. Customer satisfaction, which the quality push was meant to improve, actually declined because fewer new features were being delivered and old problems were not being addressed.

The policy's effects on collaboration were particularly destructive. Engineers stopped volunteering for difficult projects because those projects carried a high risk of exceeding bug quotas. The hardest problems either ended up assigned to junior engineers who could not refuse them or simply did not get solved. Knowledge sharing declined because helping a colleague meant looking at their code and potentially discovering bugs that would count against them.

Code review became adversarial. Rather than focusing on whether code solved the problem correctly, reviewers focused on finding potential bugs that would vindicate their criticism. Engineers began writing defensive code optimized for passing review rather than for maintainability or performance. Clever solutions that carried any risk were rejected in favor of verbose, conservative approaches that were less likely to generate scrutiny.

The company's best engineers began leaving. In exit interviews, they cited frustration with the inability to work on meaningful problems. They described an environment that punished taking risks and rewarded avoiding work. Several joined competitors, where they promptly shipped the ambitious features that their former employer had been too cautious to attempt.

The policy was eventually abandoned, but not before the company had lost significant market share and several of its strongest engineers. The replacement policy measured different things: customer-impacting incidents, time to resolve issues, and business value delivered. Bug counts were still tracked but were not treated as the primary quality metric. Velocity recovered, though the organizational trust damaged by the zero-defects period took years to rebuild.

This pattern repeats across companies with surprising frequency. The specifics vary (different metrics, different incentive structures), but the core dynamic remains. Optimize for low bug counts, and engineers will optimize their behavior to minimize bugs. That optimization rarely takes the form of writing better code. It usually takes the form of writing less code, working on easier problems, and avoiding risks.

The Optimal Bug Rate Is Not Zero

If zero bugs is the wrong target, what is the right one? The answer depends on what the organization is trying to achieve and varies substantially across domains. A medical device company faces different trade-offs than a social media startup. But some principles apply broadly.

Too few bugs indicates insufficient risk-taking. When an engineering team's bug rate is extremely low, the most likely explanation is not exceptional skill but rather that they are working on insufficiently ambitious problems. They may be capable of more but are holding back to maintain their quality metrics. This is locally rational behavior that is globally suboptimal.

Consider a team that consistently ships features with zero bugs. This sounds admirable until you ask what features they are shipping. If they are adding simple UI improvements and minor configuration options while declining to work on the complex backend refactoring the system needs, their zero-bug record is masking a failure to address important problems. The organization would be better served by higher bug rates and more ambitious technical work.

Too many bugs, conversely, indicates inadequate care or preparation. An extremely high bug rate (particularly if the bugs are in well-understood domains or involve basic programming errors) suggests process problems. Perhaps testing is inadequate, code review is insufficient, or engineers lack necessary knowledge. These problems should be addressed through improved practices, not by celebrating the high output.

The optimal rate lies between these extremes and varies by context. For a team building a new product in an emerging market, a high bug rate is acceptable because speed matters more than perfection and the cost of bugs is relatively low. For a team maintaining a financial transaction system, a low bug rate is appropriate because the cost of bugs is high and the requirements are well understood. Neither team should try to match the other's bug rate.

More precisely, the optimal bug rate should be calibrated to the cost of bugs in the specific context. This cost includes both the direct cost of fixing bugs and the indirect cost of customer impact. For software controlling industrial machinery, a bug might mean physical damage and safety risks; a low bug rate is worth a substantial reduction in development speed. For a consumer mobile app, a bug might mean poor user experience and a one-star review; speed often matters more than perfection.

The economic calculation is straightforward in principle. Reducing bugs requires time: more testing, more review, more defensive programming, more cautious deployment. That time has opportunity cost in the form of features not shipped, experiments not run, and problems not solved. The organization should accept the level of bugs at which the marginal cost of further reduction exceeds the marginal benefit.
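The marginal rule can be made concrete with a short sketch. Every figure below is hypothetical and chosen only to show the shape of the calculation; the increment names, costs, and benefits do not come from any real project. The function ranks candidate quality work by benefit per unit cost and stops at the first increment that costs more than it saves.

```python
# Hypothetical increments of quality work on one release. "cost" is the
# opportunity cost of the engineering time; "benefit" is the expected cost
# of the bugs that increment would prevent. Both are in the same units.
increments = [
    {"name": "unit tests for core logic", "cost": 5_000, "benefit": 40_000},
    {"name": "integration tests", "cost": 8_000, "benefit": 20_000},
    {"name": "load testing", "cost": 10_000, "benefit": 12_000},
    {"name": "exhaustive edge-case review", "cost": 15_000, "benefit": 4_000},
    {"name": "second full manual QA pass", "cost": 60_000, "benefit": 3_000},
]

def worthwhile_increments(options):
    """Take the best-value work first and stop at the first increment whose
    marginal cost exceeds its marginal benefit."""
    ranked = sorted(options, key=lambda s: s["benefit"] / s["cost"], reverse=True)
    chosen = []
    for step in ranked:
        # The list is sorted by benefit/cost, so once one increment costs
        # more than it saves, every later one does too.
        if step["cost"] > step["benefit"]:
            break
        chosen.append(step["name"])
    return chosen

print(worthwhile_increments(increments))
```

With these made-up numbers the sketch keeps unit tests, integration tests, and load testing, and declines the last two increments: the bugs they would prevent are worth less than the features that time could otherwise ship.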

This calculation is rarely performed explicitly. Instead, organizations tend to anchor on "as few bugs as possible" without considering the cost. The result is over-investment in bug reduction relative to the actual business value it provides. Engineering time that could have been spent building features customers want is instead spent eliminating obscure edge cases that will never be encountered.

A more sophisticated approach recognizes that not all bugs are equally important. Bugs that crash the application for all users are critical; bugs that cause minor display issues for a small percentage of users in unusual circumstances are not. Rather than optimizing for total bug count, organizations should optimize for weighted bug count, where weights reflect business impact.
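A weighted count is a small change from a raw count. In this sketch the four severity levels and their weights (100, 20, 5, 1) are hypothetical placeholders; an organization would set them from its own estimates of business impact.

```python
from collections import Counter

# Hypothetical weights: the relative business impact of one bug at each level.
SEVERITY_WEIGHTS = {"critical": 100, "high": 20, "medium": 5, "low": 1}

def weighted_bug_count(severities):
    """Sum each bug's severity weight instead of counting every bug as 1."""
    counts = Counter(severities)
    return sum(SEVERITY_WEIGHTS[level] * n for level, n in counts.items())

team_a = ["low"] * 40                       # many minor display issues
team_b = ["critical"] * 2 + ["medium"] * 3  # few bugs, two of them serious

for name, bugs in [("A", team_a), ("B", team_b)]:
    print(f"team {name}: raw={len(bugs)}, weighted={weighted_bug_count(bugs)}")
```

Team A has eight times as many bugs as Team B by raw count (40 versus 5) but less than a fifth of its weighted count (40 versus 215), which reverses the conclusion a raw count would suggest.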

This leads to a different optimization strategy. Instead of trying to prevent all bugs equally, focus effort on preventing high-impact bugs. Instead of treating all bugs as urgent, triage them by severity. Instead of measuring engineers by total bug count, measure them by customer-impacting incidents. These approaches align engineering effort with business value rather than with an arbitrary quality metric.

Bugs as Information About Risk

Beyond their direct impact on users and systems, bugs serve a valuable signaling function. They provide information about the difficulty of work, the effectiveness of processes, and the distribution of engineering effort. Organizations that treat bugs only as failures to be minimized miss this informational value.

A project with an unusually high bug rate might indicate that the problem is harder than initially estimated. Perhaps the requirements were unclear, the technology is less mature than assumed, or the team lacks necessary expertise. Learning these things early (while the project is small) is more valuable than learning them late.

Organizations that punish high bug rates suppress this signal. Engineers become reluctant to report that a project is encountering unexpected difficulties because doing so will reflect poorly on their quality metrics. The result is that problems get hidden until they become crises. A project that should have been reconsidered or resourced differently proceeds until it is too large to fail, at which point its accumulated problems become organizational emergencies.

Conversely, a project with an unusually low bug rate might be worth examining. It might indicate excellent engineering, but it might also indicate that the project is too easy, that engineers are spending excessive time on problems that do not require it, or that the project is not being tested adequately. A zero-bug record is not always something to celebrate.

Bug patterns across an organization reveal where technical investment is needed. If bugs cluster around particular parts of the system, those parts might need refactoring, better documentation, or improved testing infrastructure. If bugs cluster around particular types of problems (say, database performance or authentication edge cases), that might indicate knowledge gaps that training could address.

Temporal patterns in bug rates also carry information. A sudden increase in bugs might indicate that testing practices have degraded, that the team is under excessive time pressure, or that new engineers lack adequate onboarding. A gradual increase might indicate accumulating technical debt making the codebase harder to modify safely. These are problems that can be addressed if recognized, but they must first be visible.
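A sudden increase can be flagged automatically by comparing each period with a trailing baseline. This is a minimal sketch; the four-week window, the 1.5× threshold, and the weekly counts are all hypothetical choices, not recommendations.

```python
from statistics import mean

def flag_spikes(weekly_counts, window=4, factor=1.5):
    """Return the indices of weeks whose bug count exceeds `factor` times the
    mean of the preceding `window` weeks. This catches sudden jumps; a slow
    upward drift raises the baseline along with the counts and is not flagged."""
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = mean(weekly_counts[i - window:i])
        if baseline > 0 and weekly_counts[i] > factor * baseline:
            flagged.append(i)
    return flagged

# Hypothetical weekly counts of newly reported bugs.
weekly = [10, 12, 11, 9, 10, 25, 11, 12]
print(flag_spikes(weekly))  # [5]
```

Gradual increases, the technical-debt pattern described above, need a different check, such as comparing the average over one quarter with the average over the previous quarter.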

The diagnostic value of bugs depends on having accurate categorization. A bug in a new machine learning system that occurs because training data had unexpected characteristics differs from a bug in the same system that occurs because someone forgot to validate inputs. The first teaches something about the data; the second indicates a process failure. Both might be recorded identically in a bug tracker, but they warrant different organizational responses.

Organizations that aggregate all bugs into a single count lose this diagnostic information. They know they have bugs but not what those bugs mean. Richer categorization (by severity, by root cause, by component, by newness of code) provides context that makes bug data actionable. A count of 200 bugs means little. A report showing that 150 are minor UI issues, 40 are edge cases in a new system still being tuned, and 10 are serious defects in core functionality means considerably more.
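Producing the richer report takes little more than tagging each bug and grouping by the tag. This sketch uses hypothetical records that mirror the 200-bug example; the field name `root_cause` is an assumption, and real records would also carry severity, component, and whether the code was new.

```python
from collections import Counter

# Hypothetical bug records mirroring the 200-bug example above.
bugs = (
    [{"root_cause": "minor UI issue"}] * 150
    + [{"root_cause": "edge case in new system still being tuned"}] * 40
    + [{"root_cause": "serious defect in core functionality"}] * 10
)

# Group by category and list the largest groups first.
report = Counter(bug["root_cause"] for bug in bugs)
for cause, n in report.most_common():
    print(f"{n:>4}  {cause}")
```

The same grouping applied to two fields at once, for example `(bug["component"], bug["root_cause"])`, shows where bugs cluster in the system as well as why.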

The Incentive Problem

The difficulty with measuring bug counts is not primarily technical but political. Once bug count becomes a metric by which engineers are evaluated, it ceases to be a useful measure of quality. This is an instance of Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. Bugs are particularly vulnerable to this dynamic because gaming the metric is straightforward.

The most direct form of gaming is simply avoiding work that might generate bugs. Engineers decline to work on difficult projects. They focus on safe, simple tasks. They stop taking risks. This behavior is rational if bug count affects compensation or career progression. The engineer who works on the authentication system will have more bugs than the engineer who updates documentation. Why accept that disadvantage?

More subtly, engineers learn to structure their work to minimize measured bugs. Rather than building a new feature in a single, coherent pull request that might have bugs discovered during review or testing, they split it into many small pull requests that are individually less likely to be scrutinized carefully. Rather than deploying code quickly where bugs will be discovered, they keep it in development branches longer. Rather than working in areas with good test coverage where bugs will be caught, they work in areas with poor coverage where bugs might go undetected.

When bug counts are public, even more perverse incentives emerge. Engineers become reluctant to find bugs in others' code because doing so might damage relationships. Code review becomes superficial because thorough review might uncover bugs that would make the reviewer seem critical. Collaboration suffers because working together increases the chance of one engineer's bugs being discovered by another.

The categorization of bugs becomes political. Is this a bug or a feature request? Is it a new bug or a regression of an old bug? Should it be assigned to the engineer who wrote the original code or the engineer who wrote the recent change that exposed the problem? These classifications matter if they affect metrics, and so engineers spend time arguing about classifications rather than fixing bugs.

Severity ratings face similar pressures. An engineer with a high bug count might argue that their bugs are all minor; an engineer with a low bug count might highlight that their few bugs are severe. Whether these claims are accurate becomes difficult to assess because the person with the strongest incentive to categorize bugs is the person whose metrics they affect. Organizations can establish independent triage processes, but this adds overhead and often produces its own biases.

The problem extends to how bugs are discovered and reported. If engineers are evaluated by bug count, they have incentive to ensure bugs in their code are not discovered, or at least not recorded. Testing might become less thorough. Bug reports might be classified as "working as intended." Bugs might be fixed quietly without creating tickets. All of these behaviors make bug counts meaningless as quality measures while simultaneously degrading actual quality.

The solution is not to stop measuring bugs. It is to recognize that bug counts are primarily useful as diagnostic information, not as evaluation metrics. Bugs should be tracked to understand where problems are occurring and what patterns exist. They should inform decisions about where to invest in testing, documentation, or refactoring. They should not directly affect individual performance evaluations.

This is difficult to implement because managers naturally want metrics they can use to differentiate performance. Telling managers they should not use bug counts leaves them asking what they should use instead. Evaluating engineering performance requires judgment based on multiple factors: business impact, code quality, collaboration, mentorship, and technical growth. Reducing this to a single number is attractive but misleading.

What to Measure Instead

If bug count is a misleading metric, what should organizations measure to understand engineering effectiveness and software quality? The answer is not a single replacement metric but a portfolio of measurements that together provide a more complete picture.

Start with customer-impacting incidents rather than all bugs. Most bugs never affect users; they are caught in code review, testing, or staging environments and fixed before deployment. Counting them conflates quality problems with healthy development practices. A team that finds many bugs before deployment might be doing excellent testing, not poor development.

Customer-impacting incidents are failures that reached production and affected user experience. They are the bugs that actually matter. Measuring these focuses attention on what the organization should care about: delivering reliable service to customers. It also provides clearer signal about quality. A team with few customer incidents but many pre-deployment bugs has good quality. A team with few pre-deployment bugs but many customer incidents has a testing problem.

Within customer-impacting incidents, severity matters enormously. An incident that takes down the service for all users is not comparable to an incident that causes a minor display issue for users of a particular browser. Organizations should weight incidents by their business impact: user-hours affected, revenue lost, customer relationships damaged. This captures what actually matters.

Time-to-detection and time-to-fix are more informative than incident count. A system that has occasional incidents but detects and resolves them within minutes may be more reliable in practice than a system that has fewer incidents but takes hours to detect and fix them. The total downtime or degradation is what affects customers, not the number of distinct incidents.
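The comparison can be sketched by splitting each incident into detection and repair time and weighting by users affected, the user-hours idea from the previous paragraph expressed here in user-minutes. The `Incident` type and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    detect_min: int      # minutes from onset until someone noticed
    fix_min: int         # minutes from detection until resolved
    users_affected: int

    @property
    def duration_min(self) -> int:
        return self.detect_min + self.fix_min

    @property
    def user_minutes(self) -> int:
        return self.duration_min * self.users_affected

def total_impact(incidents) -> int:
    """Customer impact is the sum of degraded user-minutes, not the incident count."""
    return sum(i.user_minutes for i in incidents)

# Hypothetical systems with the same user base but different detection and response.
well_monitored = [Incident(detect_min=2, fix_min=3, users_affected=1_000) for _ in range(6)]
poorly_monitored = [Incident(detect_min=90, fix_min=60, users_affected=1_000) for _ in range(2)]

for name, incidents in [("well monitored", well_monitored), ("poorly monitored", poorly_monitored)]:
    print(f"{name}: {len(incidents)} incidents, {total_impact(incidents)} user-minutes")
```

Here the system with three times as many incidents causes one tenth of the impact (30,000 versus 300,000 user-minutes), because each incident is caught and fixed in minutes rather than hours.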

These metrics encourage different behavior than bug counts. Rather than avoiding risky work, engineers focus on making systems observable so problems are detected quickly. Rather than hiding bugs, engineers focus on making systems recoverable so problems can be fixed rapidly. Rather than preventing all failures, engineers focus on minimizing blast radius when failures occur.

Mean time to recovery (MTTR) and mean time between failures (MTBF) capture reliability from a user perspective. A system might have frequent small failures (a high failure rate, and therefore a low MTBF) but recover quickly (low MTTR) and thus provide a good user experience. Another system might have rare failures (high MTBF) but require extensive recovery time (high MTTR) and thus provide a worse user experience despite fewer failures.
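Both measures, and the steady-state availability they imply, can be computed from a list of repair times over an observation window. This sketch assumes MTBF means mean uptime between failures (some definitions include repair time), uses availability = MTBF / (MTBF + MTTR), and uses hypothetical failure data.

```python
WINDOW_MIN = 30 * 24 * 60  # a 30-day observation window, in minutes

def reliability(repair_minutes, window_min=WINDOW_MIN):
    """Return (MTTR, MTBF, availability) for one observation window.

    repair_minutes: how long each failure lasted, from onset to recovery.
    MTBF here is mean uptime between failures.
    """
    failures = len(repair_minutes)
    if failures == 0:
        raise ValueError("no failures in window; MTTR and MTBF are undefined")
    downtime = sum(repair_minutes)
    mttr = downtime / failures
    mtbf = (window_min - downtime) / failures
    availability = mtbf / (mtbf + mttr)  # equals uptime / window
    return mttr, mtbf, availability

# Hypothetical systems over the same 30 days.
frequent_small = [6] * 10   # ten failures, six minutes each
rare_long = [240, 240]      # two failures, four hours each

for name, repairs in [("frequent, small", frequent_small), ("rare, long", rare_long)]:
    mttr, mtbf, avail = reliability(repairs)
    print(f"{name}: MTTR={mttr:.0f} min, MTBF={mtbf:.0f} min, availability={avail:.4%}")
```

The first system fails five times as often (MTBF of 4,314 minutes versus 21,360) yet is available about 99.86% of the time, against about 98.89% for the system with rarer but longer failures.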

Business value delivered provides essential context for interpreting quality metrics. An engineer who ships a feature that increases revenue by ten percent while introducing three customer incidents has probably delivered more value than one who ships nothing while introducing zero incidents. The optimal balance depends on the organization's priorities and risk tolerance, but ignoring value means treating quality as an end in itself rather than as a means to business success.

Learning from failures matters as much as preventing them. When incidents occur, did the team conduct a thorough postmortem? Were systemic improvements identified and implemented? Did the organization learn something that will prevent entire classes of future problems? An incident that teaches important lessons might be more valuable than avoiding the incident would have been.

Code review effectiveness can be measured by what percentage of bugs are caught before merging versus after deployment. A team that catches ninety percent of bugs in review has good practices; a team that catches ten percent has a review problem. This metric encourages thorough review without penalizing engineers who wrote code that had reviewable bugs.
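The catch rate is a simple ratio once each bug records where it was found. In this sketch the stage labels ("review", "ci", "production") are hypothetical; bugs found at intermediate stages are excluded so the ratio matches the before-merge versus after-deployment comparison.

```python
from collections import Counter

def review_catch_rate(discovery_stages):
    """Fraction of bugs caught in code review, out of those caught either in
    review or after deployment. Bugs found at other stages are left out."""
    counts = Counter(discovery_stages)
    caught, escaped = counts["review"], counts["production"]
    if caught + escaped == 0:
        raise ValueError("no bugs recorded at either stage")
    return caught / (caught + escaped)

# Hypothetical team: 45 bugs caught in review, 12 in CI, 5 reached production.
stages = ["review"] * 45 + ["ci"] * 12 + ["production"] * 5
print(f"{review_catch_rate(stages):.0%}")  # 90%
```

Because the ratio depends only on where bugs were found, not on whose code contained them, it rewards reviewers for finding problems without counting against the authors.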

Test coverage and test effectiveness provide information about quality practices. Not all test coverage is equal; tests that thoroughly validate behavior are more valuable than those that execute code without checking outputs. High coverage combined with low incident rates suggests good testing practices, while low coverage combined with low incident rates suggests either very good code or inadequate observability to detect bugs.

These metrics are more complex to track than a simple bug count. That is appropriate. Engineering performance is complex. Reducing it to a single number produces gaming and misaligned incentives. A richer set of metrics, interpreted with judgment rather than mechanically, provides better information about what is actually happening.

How Top Teams Handle Bugs

Organizations that maintain both quality and velocity approach bugs differently from those that struggle with one or both. Their practices are neither mysterious nor revolutionary. They are the consistent application of principles that balance learning from failures with moving quickly.

Blameless postmortems are foundational. When an incident occurs, the focus is on understanding what happened and how to prevent similar incidents, not on identifying who made a mistake. This is not about avoiding accountability (engineers are still responsible for their work), but about creating an environment where problems are discussed openly rather than hidden.

Blameless does not mean causeless. Postmortems identify what went wrong and what changes would have prevented the problem. The framing, however, is about systemic improvements: better testing, clearer documentation, more defensive code, additional monitoring. The question is not "why did engineer X make this mistake" but "why did our processes allow this class of mistake to reach production."

Rapid iteration is valued over perfection. Code is deployed frequently, often multiple times per day, meaning bugs are discovered quickly while context is fresh and fixes are straightforward. Each deployment contains relatively few changes, making bugs easier to isolate and resolve. Large, infrequent deployments mean bugs are discovered long after the code was written, when fixes are more difficult.

This approach requires infrastructure that supports rapid deployment safely. Feature flags allow code to be deployed without immediately exposing it to all users. Monitoring and alerting detect problems quickly. Automated rollback mechanisms limit the blast radius of failures. These investments enable velocity by reducing the risk of each individual deployment.
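A percentage rollout, one common way feature flags expose code gradually, can be sketched with a stable hash. The function `in_rollout` and the flag name are hypothetical and do not correspond to any particular feature-flag product's API; only Python's standard `hashlib` is used.

```python
import hashlib

def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    """Return True if this user should see the flagged code path.

    Each (flag, user) pair hashes to a stable bucket from 0 to 99, so a user
    gets the same answer on every request, and raising `percent` only adds
    users: nobody who already has the feature loses it.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# Deploy the code dark, then expose it in stages: 1%, 10%, 50%, 100%.
users = [f"user-{n}" for n in range(10_000)]
for pct in (1, 10, 50, 100):
    exposed = sum(in_rollout("new-checkout", u, pct) for u in users)
    print(f"{pct:>3}% rollout -> {exposed} of {len(users)} users")
```

If monitoring shows a problem at 1%, setting the percentage back to 0 switches the path off for everyone without another deployment, which is what limits the blast radius.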

Clear severity categorization focuses effort where it matters. Not all bugs warrant immediate attention. Severity levels (typically ranging from critical to minor) guide response. Critical bugs get immediate attention from whoever can fix them fastest; minor bugs are scheduled in the normal workflow. This prevents bug fixing from becoming all-consuming while ensuring serious problems get addressed.

The categorization must be meaningful and consistently applied. Too many severity levels create confusion about how to classify bugs. Too few levels result in everything being marked urgent. Most organizations find that three to five levels work: critical, high, medium, low, and possibly cosmetic. The definitions must be specific enough that categorization is usually unambiguous.
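One way to keep severity definitions specific enough to be unambiguous is to write them as explicit rules. The Python sketch below uses the five levels named above. The `Bug` fields, the 10% threshold, and the response policies are hypothetical choices made for illustration, not a standard.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Lower value = more urgent. Example definitions are assumptions.
    CRITICAL = 1   # data loss, security breach, core flow down
    HIGH = 2       # many users blocked, no workaround
    MEDIUM = 3     # many users affected with a workaround, or few with none
    LOW = 4        # few users affected, workaround exists
    COSMETIC = 5   # visual or wording issue, no functional impact


@dataclass
class Bug:
    title: str
    data_loss_or_security: bool
    users_affected_pct: float   # share of active users hitting the bug
    has_workaround: bool
    functional_impact: bool


def classify(bug: Bug) -> Severity:
    """Hypothetical rules: checkable questions instead of gut feel."""
    if bug.data_loss_or_security:
        return Severity.CRITICAL
    if not bug.functional_impact:
        return Severity.COSMETIC
    widespread = bug.users_affected_pct >= 10
    if widespread and not bug.has_workaround:
        return Severity.HIGH
    if widespread or not bug.has_workaround:
        return Severity.MEDIUM
    return Severity.LOW


def response(sev: Severity) -> str:
    # Only critical bugs interrupt current work; the rest are scheduled.
    if sev == Severity.CRITICAL:
        return "page on-call now"
    if sev == Severity.HIGH:
        return "fix in current sprint"
    return "add to backlog"
```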

Investment in recovery mechanisms is valued as much as prevention. Systems are designed to degrade gracefully when components fail: caching reduces dependency on backend services, retry logic with exponential backoff handles transient failures, and circuit breakers prevent cascading failures. These patterns do not prevent bugs but limit their impact when they occur.
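Retries with exponential backoff and circuit breakers are well-known patterns, but implementations vary. The Python sketch below shows one simple version of each. The retry uses "full jitter", which picks a random delay up to the doubled cap. The breaker opens after a run of consecutive failures and lets one probe call through after a cooldown. `TransientError` and the default thresholds are hypothetical placeholders.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for errors worth retrying (timeouts, 503s, dropped connections)."""


def retry_with_backoff(fn: Callable[[], T], attempts: int = 5,
                       base_delay: float = 0.1, max_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn on TransientError, doubling the delay cap after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter avoids synchronized retry storms
    raise AssertionError("unreachable")


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency that is known to be failing."""


class CircuitBreaker:
    """Open after N consecutive failures; allow a probe call after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cooldown elapsed ("half-open"): let this call through as a probe.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

A caller that catches `CircuitOpenError` could serve a cached value instead of failing outright. Combined with caching in this way, the breaker supports the graceful degradation described above.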

This mindset shift is significant. Rather than trying to eliminate all bugs (which is impossible), the focus is on ensuring bugs have limited impact. A bug in the recommendation system should not take down the entire site; a bug in payment processing should not affect users who are not making purchases. Isolation and containment matter as much as correctness.

Testing practices balance thoroughness with efficiency. Critical paths (authentication, payment, data integrity) receive extensive testing including edge cases and failure modes; less critical features receive lighter testing focused on common cases. This allocation reflects risk: bugs in critical systems have high impact and warrant high investment in prevention.

The testing portfolio includes different types of tests for different purposes. Unit tests validate individual components. Integration tests verify that components work together. End-to-end tests confirm that complete workflows function. Performance tests ensure the system handles load. Security tests check for vulnerabilities. No single type of test is sufficient, but neither is every type needed for every component.

Documentation emphasizes what is surprising or non-obvious. Code that follows standard patterns needs minimal documentation. Code that makes unusual choices, handles edge cases specially, or works around known issues needs explanation. This focuses documentation effort where it provides most value: helping future engineers understand why code is written as it is.

Knowledge sharing is systematic rather than ad hoc. Engineers rotate through different parts of the codebase to distribute expertise; code review is seen as a teaching opportunity as much as quality control; pairing or mobbing on difficult problems spreads understanding. These practices ensure that knowledge of how systems work and why they are designed as they are exists in multiple people, not just the original authors.

The Broader Pattern: Risk and Reward

The relationship between bugs and productivity is an instance of a more general pattern: risk and reward are correlated. Organizations that avoid risk avoid reward. Organizations that embrace calculated risk generate returns that more conservative competitors cannot match. This pattern appears throughout business but is particularly visible in software engineering.

Consider product development. A company that only ships features certain to succeed will have a portfolio of unimaginative products that match what competitors already offer. A company that experiments with features that might fail will have some failures but also successes that competitors lack. The portfolio with higher variance produces better outcomes on average, despite containing more individual failures.

The same logic applies to technical decisions. Using a new database technology might introduce bugs that a mature, well-understood database would not have. But it might also enable capabilities that were impossible with older technology. The organization that always chooses the safe, mature option will lag behind competitors willing to occasionally choose newer, riskier options.

This does not mean risk should be embraced indiscriminately; it means risk should be managed. In software terms, this means taking risks where the potential reward is high and the cost of failure is manageable. Experimenting with a new caching strategy that might have bugs but could improve performance tenfold is a good risk. Rewriting the authentication system with unproven technology is probably not, because the cost of failure is too high.

Organizations often systematically undervalue risk-taking because the costs are visible and immediate while the benefits are uncertain and delayed. A bug introduced by trying a new approach shows up in this quarter's metrics. The competitive advantage gained from the approach might not be evident for a year. Quarterly reviews and short-term metrics bias toward conservative choices.

This bias can turn a safety culture into an anti-innovation culture. Every incident triggers a push for more process, more review, more caution. Each addition makes sense in isolation (of course we should prevent similar incidents), but the accumulated effect is that everything becomes slower and more difficult. Teams spend so much energy avoiding failure that they have little left for achieving success.

Companies that thrive tend to have explicitly articulated risk tolerances. They decide which domains require conservative approaches and which benefit from experimentation. Payment processing might have low risk tolerance (bugs are costly and capabilities are well understood); a new social feature might have high risk tolerance (bugs are not catastrophic and the best approach is unclear). Different standards apply to different contexts.

The economic value of acceptable risk is substantial. A company that deploys twice as fast as competitors because it accepts higher bug rates might have twice as many bugs per line of code yet ship three times as much value, because it learns faster, iterates more, and reaches market sooner. The competitor with fewer bugs but slower deployment sees this as a quality advantage. The market sees it as being behind.

This dynamic is particularly visible in competition between startups and established companies. Startups can move fast and break things because they have little to lose. Established companies must move carefully because they have users, revenue, and reputation at stake. The startups' speed advantage compounds: they learn faster, iterate more, and sometimes disrupt markets that established companies were certain they understood. The established companies' quality advantage becomes irrelevant if they are solving the wrong problems.

The optimal risk posture depends on competitive position and market dynamics. A dominant company in a stable market can and should be conservative, defending its position through reliability and scale. A challenger in a dynamic market must be aggressive, accepting higher risk to move faster than the incumbent. The bug rate that is optimal for one is disastrous for the other.

The Diagnostic Question

How should organizations think about engineering quality and bug rates given these complexities? A single question clarifies most situations: are your best engineers producing more or fewer bugs than average?

If your best engineers produce fewer bugs than average, you likely have a functional quality culture. Engineers who have more experience, better judgment, and deeper expertise are writing more reliable code. This is how things should work. You might still want to examine whether the absolute bug rate is appropriate for your context, but the distribution suggests healthy fundamentals.

If your best engineers produce more bugs than average, investigate why. There are two possibilities. First, you might be in the situation described at the start of this article: your best engineers work on harder problems, ship more code, and take more risks. The higher bug count reflects higher productivity and ambition. This is not a problem to fix.

Second, you might have a process or cultural problem. Perhaps your best engineers are overworked and cutting corners. Perhaps they have become careless because they believe their reputation protects them. Perhaps they face inadequate code review because reviewers assume their code is fine. These are problems that should be addressed.

Distinguishing between these cases requires examining the nature of bugs, not just their count. Are the bugs edge cases in novel systems or basic errors in straightforward code? Do they teach something about the problem domain or suggest insufficient care? Are they discovered quickly through good monitoring or slowly through customer reports? The answers determine whether high bug rates reflect productivity or problems.

A related diagnostic: are engineers volunteering for your hardest technical problems or avoiding them? If the hardest problems consistently get assigned rather than chosen, that suggests engineers perceive a penalty for taking risks. If engineers compete to work on difficult projects, that suggests they trust that their work will be evaluated fairly, with difficulty taken into account.

Another signal: how do engineers respond to incidents? Do they openly discuss what went wrong and what they learned, or do they become defensive and minimize? Defensive behavior suggests fear of blame. Open discussion suggests focus on learning. The former culture generates hidden problems. The latter generates improvement.

Organizations should also examine whether bug rates differ systematically across types of work. If new feature development has substantially higher bug rates than maintenance work, that is expected: writing new code is harder than modifying existing code. If maintenance work has high bug rates, that suggests problems: either the codebase is very difficult to modify safely, or engineers lack necessary understanding, or testing is inadequate.

The most important diagnostic might be velocity over time. Is engineering productivity increasing, stable, or declining? Declining velocity despite constant or falling bug rates suggests over-investment in quality at the expense of shipping. Increasing velocity with rising bug rates suggests under-investment in quality. Increasing velocity with stable bug rates suggests you have found a workable balance.
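The velocity-over-time diagnostic can be written as a small rule table. In the Python sketch below, each per-period series is fitted with a least-squares slope. The slope is scaled by the series mean and labelled rising, falling, or stable, and the three cases from the paragraph are then applied. What counts as "velocity" (features shipped, story points) and the 2% tolerance are assumptions. Combinations the paragraph does not address fall through to a "needs judgment" result.

```python
def trend(series: list[float], tolerance: float = 0.02) -> str:
    """Label a per-period series 'rising', 'falling', or 'stable' by its slope."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two periods")
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    den = sum((x - x_mean) ** 2 for x in range(n))
    # Relative change per period, so the tolerance works across units.
    relative_slope = (num / den) / (abs(y_mean) or 1.0)
    if relative_slope > tolerance:
        return "rising"
    if relative_slope < -tolerance:
        return "falling"
    return "stable"


def diagnose(velocity: list[float], bug_rate: list[float]) -> str:
    v, b = trend(velocity), trend(bug_rate)
    if v == "falling" and b in ("stable", "falling"):
        return "over-investment in quality at the expense of shipping"
    if v == "rising" and b == "rising":
        return "under-investment in quality"
    if v == "rising" and b == "stable":
        return "workable balance"
    return "no rule matches; needs human judgment"
```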

The Uncomfortable Truth

The conclusion that high-performing engineers often have high bug counts is uncomfortable because it contradicts the appealing story that excellence means perfection. It requires accepting that valuable work involves failures, that learning requires mistakes, and that moving fast means occasionally breaking things. These truths are easier to acknowledge in the abstract than to apply when evaluating specific engineers and specific bugs.

The difficulty is partly psychological. Humans are wired to notice failures more than successes. An engineer who ships ten features, nine of which work perfectly and one of which has bugs, is often remembered for the bugs. The nine successes become the expected baseline. The one failure becomes the story. This bias toward remembering failures makes it hard to evaluate productivity fairly.

It is also partly political. Executives understand bug counts and can report them simply: "We had 47 bugs last quarter, down from 53 the previous quarter." Business value is harder to quantify: "We shipped features that we believe will increase revenue but the impact will not be clear for months." The simple metric drives out the important one.

Organizations that overcome this tendency share certain characteristics. They have leadership that understands software development well enough to avoid naive metrics. They have engineering practices that surface nuance: categorizing bugs by severity, distinguishing bugs in new code from defects in old code, measuring customer impact rather than just bug count. They have evaluation processes that rely on judgment informed by data rather than mechanical application of metrics.

Most importantly, they have cultures that treat bugs as information rather than as failures. When a bug is discovered, the first question is "what did we learn," not "who is responsible." Postmortems focus on systemic improvements: what can we change to prevent entire classes of bugs? Incident reviews focus on detection and recovery: how did we find this problem and how quickly did we fix it?

This mindset does not excuse carelessness. Engineers are still expected to test their code, think through edge cases, and follow good practices. But it acknowledges that perfect code is impossible, that novel work involves discovering unknown complexities, and that the alternative to accepting some bugs is accepting less ambition. For most organizations in most contexts, less ambition is the worse choice.

Implications for Management

What should engineering managers do with the understanding that productivity and bug rates correlate positively? Several practices follow from this insight.

First, stop using bug count as a performance metric. Track bugs for diagnostic purposes (to understand where problems cluster and what improvements might help), but do not tie individual bug counts to compensation or advancement. Doing so teaches engineers to optimize for low bug counts rather than high value delivery.

Second, normalize for context when evaluating work quality. An engineer working on a new distributed system should not be compared directly to an engineer maintaining a mature web interface. The former will have more bugs because the work is harder. Comparing their bug counts implies they are doing comparable work, which they are not.

Third, measure customer impact rather than internal bug counts. The bugs that matter are the ones that affect users. An engineer who introduces many bugs but catches them all in testing before deployment has demonstrated good practices, not poor quality. An engineer who introduces few bugs but misses the few that reach customers has a different problem.
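Separating bugs caught internally from bugs that reached users only requires recording where each bug was found. The Python sketch below computes a per-engineer escape rate from such records. The `BugRecord` fields and the `found_in` labels are hypothetical and would need to match whatever a real bug tracker records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BugRecord:
    author: str
    found_in: str              # assumed labels: "review", "test", "staging", "production"
    customer_impacting: bool   # did any user actually notice?


def impact_by_engineer(bugs: list[BugRecord]) -> dict[str, dict[str, float]]:
    """Per author: bugs introduced, bugs that escaped to production, escape rate."""
    stats: dict[str, dict[str, float]] = defaultdict(
        lambda: {"introduced": 0, "escaped": 0, "customer_impacting": 0})
    for bug in bugs:
        s = stats[bug.author]
        s["introduced"] += 1
        if bug.found_in == "production":
            s["escaped"] += 1
        if bug.customer_impacting:
            s["customer_impacting"] += 1
    for s in stats.values():
        # Share of this author's bugs that got past review and testing.
        s["escape_rate"] = s["escaped"] / s["introduced"]
    return dict(stats)
```

Suppose one engineer introduces eight bugs and all are caught in testing, while another introduces two and both reach customers. A raw count favors the second engineer. The escape rate (0.0 versus 1.0) reverses that ranking, which is the distinction the paragraph draws.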

Fourth, celebrate learning from failures. When an incident occurs, treat it as an opportunity to improve systems and processes. The postmortem should identify concrete actions: adding tests, improving monitoring, updating documentation, and refactoring fragile code. If incidents consistently lead to improvements, they become valuable. If they only lead to blame, they become hidden.

Fifth, invest in making bugs less costly. Rather than only trying to prevent bugs, build systems that detect them quickly, recover from them automatically, and contain their impact. This approach acknowledges that bugs are inevitable and focuses on minimizing their consequences rather than eliminating their occurrence.

Sixth, align incentives with business outcomes. Engineers should be evaluated on value delivered, not just absence of bugs. This is harder to measure (value often becomes clear only months after code ships), but it is what actually matters. Metrics that are easy to measure but do not correlate with success are worse than useless because they drive wrong behavior.

Seventh, distinguish between bugs that teach and bugs that indicate problems. A bug that reveals unexpected complexity in a problem domain is valuable information; a bug that results from not running tests is a process failure. These look similar in a bug tracker but warrant different responses. The former might justify revisiting requirements or estimates; the latter should prompt process improvements.

Eighth, maintain appropriate risk tolerance for the context. Teams working on critical infrastructure should have lower bug tolerance than teams building experimental features. Make these expectations explicit so engineers understand what trade-offs are appropriate for their work. Without explicit guidance, engineers tend toward excessive caution because that minimizes personal risk.

Conclusion

The data that surprised the vice-president of engineering reveals an uncomfortable truth: the engineers who introduce the most bugs are often the most valuable. Not because bugs themselves are valuable, but because they correlate with productivity, ambition, and risk-taking: qualities that drive organizational success.

This correlation exists for straightforward reasons. More code means more bugs. Harder problems mean more bugs. Faster deployment means bugs are discovered sooner. Risk-taking means more failures alongside more successes. The alternative to accepting these bugs is accepting less output, easier problems, slower deployment, and more conservative approaches. For most organizations, that is a poor trade.

The optimal bug rate is not zero. It is the rate that emerges when engineers work on appropriately ambitious problems at a sustainable pace with good practices. That rate is higher than most organizations expect. It varies by context: financial systems should have lower bug rates than experimental features. But in almost all contexts, zero bugs indicates insufficient ambition.

Organizations that understand this measure different things. They focus on customer-impacting incidents rather than all bugs. They measure detection time and recovery time, not just incident count. They evaluate engineers on value delivered relative to bugs introduced, not on bug count in isolation. These approaches require more judgment and provide less satisfying simplicity, but they align metrics with actual business goals.
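Measuring detection and recovery time requires three timestamps per incident. The Python sketch below computes median time-to-detect and time-to-recover over customer-impacting incidents only. The `Incident` fields are assumptions. Definitions also vary between organizations: some measure recovery from onset rather than from detection. The median is used here because it resists a single outlier, while the familiar "MTTR" is usually a mean.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Incident:
    started: datetime          # when faulty behavior began (often found in logs later)
    detected: datetime         # when an alert or report surfaced it
    resolved: datetime         # when customer impact ended
    customer_impacting: bool


def response_metrics(incidents: list[Incident]) -> dict[str, timedelta]:
    """Median detection and recovery times for customer-impacting incidents."""
    relevant = [i for i in incidents if i.customer_impacting]
    if not relevant:
        return {}
    return {
        "median_time_to_detect": median(i.detected - i.started for i in relevant),
        "median_time_to_recover": median(i.resolved - i.detected for i in relevant),
    }
```

These numbers can fall even while the incident count stays flat, which is the kind of improvement a raw bug count would not show.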

The deeper insight is that bugs are information, not just failures. They reveal problem difficulty, process effectiveness, and risk distribution. High bug rates in new code suggest challenges that might warrant additional resources. High bug rates in maintenance work suggest technical debt or knowledge gaps. Zero bug rates might indicate insufficient ambition. These signals are valuable if interpreted correctly.

Managing bug rates effectively requires culture as much as metrics. That means blameless postmortems that focus on systemic improvements rather than individual responsibility. It means recognizing that novel work involves discovering unexpected complexities, setting explicit risk tolerances that vary by context, and investing in recovery mechanisms as much as in prevention. These practices distinguish organizations that balance quality and velocity from those that sacrifice one for the other.

The uncomfortable conclusion is that organizations should want their best engineers to have high bug counts, not low ones. Not because bugs are desirable but because they are a side effect of working on hard problems, shipping code quickly, and taking necessary risks. The engineers with pristine bug records might be excellent, or they might be avoiding difficult work, shipping slowly, or working in areas with poor observability where bugs go undetected. The metric alone cannot distinguish these cases.

What matters is not the raw bug count but the value created relative to bugs introduced. An engineer who increases revenue by twenty percent while introducing thirty bugs has delivered more value than an engineer who maintains zero bugs while shipping nothing. The optimal trade-off depends on competitive context and business priorities, but for most organizations, accepting more bugs in exchange for more value is the right choice.

This thesis can be misunderstood as advocating for carelessness or suggesting that quality does not matter. Neither is true. Quality matters enormously, but quality means building what customers need, delivering it reliably, and recovering quickly when problems occur. It does not mean preventing every possible bug. That standard is impossibly expensive and produces organizations that ship little while perfecting much.

The real question for any organization is not "how do we reduce our bug count" but "how do we deliver maximum value while maintaining acceptable reliability." The answer almost certainly involves accepting more bugs than a naive quality-first approach would tolerate, particularly in areas where velocity matters and the cost of failure is low. Organizations that recognize this outperform those that do not, not because they have more bugs, but because they ship more value.

The vice-president's analysis taught an important lesson. The correlation she expected (the best engineers have the fewest bugs) was backwards. The actual correlation (the best engineers have the most bugs) reflected how software is built and how value is created. Organizations that understand this can design better metrics, create better incentives, and ultimately build better products. Those that do not will continue to optimize for bug counts while their competitors optimize for customer value. The market will determine which approach works.