In part one, I introduced Glyph Lefkowitz’s “Futzing Fraction,” which attempts to measure whether vibe coding actually saves time or burns money on expensive procrastination. The results weren’t encouraging; even expert developers showed efficiency losses, with futzing fractions consistently above 1.
TL;DR of Part 1: Glyph’s formula FF = (Inference + Writing + Checking) / (Human Baseline × Success Probability) revealed that vibe coding is inefficient across all skill levels. The “slot machine” psychology of intermittent reinforcement makes us remember the occasional wins while forgetting the frequent losses, leading to poor self-assessment of AI productivity.
Glyph’s original formula assumes an idealized world where all developers have the same skills, projects are equally complex, and all errors cost the same to fix. After building my AI assistant, and with over two decades of working in large enterprises, I realized we need to account for messier realities.
The Extended Formula
While Glyph’s formula captures the core economics of vibe coding, it doesn’t account for the realities I encountered while building my AI assistant. Not all development work is identical: some tasks are trivial and others are very complex; some developers can spot hallucinations instantly and others can’t; some bugs are harmless while others are catastrophic.
I extended the formula with four new variables that capture what the original missed:
- S (Skill Factor): How good you are at evaluating and fixing AI output. This doesn’t just affect how fast you can debug AI mistakes; it changes your success rate. Skilled developers write better prompts and spot problems faster.
- X (Complexity Multiplier): Simple CSS tweaks are different from implementing OAuth flows. As I learned while wrestling with tokenization complexities, AI confidence doesn’t scale with actual difficulty.
- E (Error Cost Multiplier): A broken button is annoying; a security vulnerability is catastrophic. When working on authentication features, even small AI mistakes had massive downstream costs.
- L (Learning Factor): The overhead of figuring out how to use AI effectively in the first place. Sometimes this pays dividends, sometimes it’s just extra work.
I linked skill (S) to success probability (P) because better developers run into fewer AI mistakes to begin with: they write better prompts and catch problems earlier. I cap the success rate at 95% because of AI hallucinations; even the best developers can’t eliminate the model’s tendency to generate broken code.
Oh, and for you stats nerds out there, here’s the formula I used to correlate skill and success rate:
P = clamp(P₀ × (1 + α × (S – 1)), 0, 0.95),
where P₀ is the baseline model accuracy and α controls how much skill moves the needle.
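In code, the clamp is a one-liner. Here’s a minimal Python sketch; the default values for P₀ and α are illustrative placeholders, not numbers from my calculations:

```python
def success_probability(skill: float, p0: float = 0.08, alpha: float = 0.5) -> float:
    """P = clamp(P0 * (1 + alpha * (S - 1)), 0, 0.95).

    p0 is the baseline model accuracy, alpha controls how much skill
    moves the needle, and the 0.95 ceiling reflects that even experts
    can't eliminate hallucinations. Defaults are illustrative only.
    """
    p = p0 * (1 + alpha * (skill - 1))
    return max(0.0, min(p, 0.95))
```

At S = 1 the formula returns the baseline accuracy unchanged; extreme skill values hit the 0.95 ceiling rather than promising perfection.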
Let me run the numbers for a 40-minute coding task with somewhat realistic parameters. Comparing the original formula (FF) with the improved version (FF’):
- Citizen Developer: FF ≈ 1.61 vs FF’ ≈ 3.44
- Competent Developer: FF ≈ 1.42 vs FF’ ≈ 2.43
- Expert Developer: FF ≈ 1.18 vs FF’ ≈ 1.44
Both formulas agree that vibe coding isn’t worth it in these examples (all results > 1). But the improved formula leads to more depressing results. The original FF is more forgiving as it ignores hidden costs: complexity inflation, error impact penalties, and the learning overhead of figuring out AI workflows.
The improved FF’ is stricter and more realistic for production work. The X and E multipliers surface the costs I experienced, where a “simple” AI-generated authentication flow turned into hours of gaslighting, arguing back and forth, and manual debugging of security issues. Skill (S) helps a bit, but not enough to cancel out the complexity and error penalties in higher-risk contexts.
The extended formula shows that while skill narrows the gap, it doesn’t eliminate the fundamental inefficiency. The formula confirms what I felt intuitively: I was paying a premium to introduce risk.
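To make the moving parts concrete, here’s a small Python sketch of one plausible way the new variables could combine with Glyph’s original terms. The exact weighting in my calculations may differ, so treat this composition as an illustrative assumption, not the canonical formula:

```python
def futzing_fraction(inference: float, writing: float, checking: float,
                     baseline: float, p: float,
                     s: float = 1.0, x: float = 1.0,
                     e: float = 1.0, l: float = 1.0) -> float:
    """Illustrative extended FF'.

    One plausible composition (an assumption, not the article's exact
    weighting): skill (S) discounts checking time, while complexity (X),
    error cost (E) and learning overhead (L) inflate the cost side.
    With S = X = E = L = 1 this reduces to Glyph's original
    FF = (I + W + C) / (H * P).
    """
    cost = (inference + writing + checking / s) * x * e * l
    return cost / (baseline * p)
```

The useful property of any composition like this is directional: raising X or E pushes the fraction up, raising S pulls it down, and the neutral case collapses back to the original formula.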
Real-World Scenarios: Where the Formula Meets Reality
Let me walk through a few scenarios that capture different ways people use AI for coding, using the improved futzing fraction to see what it tells us.
Scenario 1: The “No Code” Citizen Developer
Meet Sarah, a marketing manager who’s been told she can build her web apps with AI. She has basic computer skills but no programming background. She’s trying to build a customer feedback portal for her team.
- S = 0.6 (below novice, can’t evaluate code quality)
- P ≈ 0.05 (success rate tanks when you can’t spot errors)
- X = 2.0 (seems simple, but involves forms, validation, data storage)
- E = 3.0 (customer data, potential security issues)
- L = 1.5 (steep learning curve with no foundation)
FF’ ≈ 6.2
Sarah is spending over six times as long as it would take a developer to build it properly. Worse, she doesn’t know when the AI gets security wrong, so she’s accumulating technical debt and potential vulnerabilities. The whole “democratization of software development” story falls apart unless Sarah’s time is free and quality and security don’t matter.
Scenario 2: The Competent Developer
I’d put myself in this category: okay-ish programming skills, familiar with the stack, building features where I could afford some mistakes while learning.
- S = 1.2 (competent, good at spotting obvious errors)
- P ≈ 0.09 (slightly better success rate due to experience)
- X = 1.3 (moderate complexity, some novel patterns)
- E = 1.5 (prototype stage, mistakes were educational, not catastrophic)
- L = 0.9 (learning benefits from exploring new approaches)
FF’ ≈ 1.8
Still inefficient, but not catastrophically so. I got something working and learned useful patterns, but I definitely could have coded many features faster by hand. The learning factor (L < 1) helped, but not enough to make it worthwhile purely from a time perspective.
Scenario 2a: Competent Developer Building the “Citizen Coder” Project
What if someone with my skill level had tackled Sarah’s customer feedback portal instead?
- S = 1.2 (competent, can spot errors)
- P ≈ 0.09 (better chance than Sarah)
- X = 2.0 (web app complexity doesn’t change)
- E = 3.0 (still customer data risk)
- L = 0.9 (learning from practice)
FF’ ≈ 3.6
Less than Sarah’s 6.2, but still more than 3x the time it would take to code directly. AI helps when you can evaluate its output, but the overhead is still significant for anything involving real complexity and risk.
Scenario 3: Security-Critical Production Code
Now imagine an expert developer working on authentication flows for a financial application, the kind of code where mistakes have real consequences.
- S = 1.5 (expert level, excellent at code review)
- P ≈ 0.12 (best possible success rate, but still limited by AI capabilities)
- X = 2.5 (complex security requirements, edge cases)
- E = 5.0 (security vulnerabilities could be catastrophic)
- L = 1.0 (neutral, expert already knows the patterns)
FF’ ≈ 5.8
Even with expert skills, the combination of complexity and error cost makes vibe coding wildly inefficient for critical systems. This aligns with what I learned working on security features for the assistant; the stakes were too high for AI’s misplaced confidence.
Scenario 3a: Expert Developer Building the AI Assistant Project
So, what happens when an expert tackles my AI assistant project with its orchestration complexity?
- S = 1.5 (expert level)
- P ≈ 0.12 (best possible, but not magic)
- X = 3.0 (AI assistant orchestration is more complex than simple CRUD apps)
- E = 4.0 (security/privacy risk of handling sensitive workflows)
- L = 1.0 (neutral, an expert already knows the patterns)
FF’ ≈ 4.2
Even experts eventually hit diminishing returns. Despite the developer’s higher skill, the higher complexity (X) and risk (E) keep the futzing fraction well above 1. This shows why even the most capable developers struggle with vibe coding on non-trivial projects.
The Pattern
Vibe coding works best (though still inefficiently) when you have high skill, low stakes, and learning opportunities. The moment you add complexity or raise the error cost, the futzing fraction explodes. The “AI replaces developers” narrative completely ignores these realities; it assumes all code is equally simple and all mistakes are relatively cheap.
What’s Next
The extended futzing fraction reveals that vibe coding is even less efficient than the original analysis suggested. But what do you actually do with this information? In part three, I’ll give you a practical framework for deciding when to use AI and when to skip it, based on back-of-the-napkin math rather than vendor promises or gut feelings.
