In part one of the blog series, I introduced Glyph Lefkowitz’s “Futzing Fraction” and discussed how vibe coding is likely inefficient across all skill levels and development tasks. In part two, I extended the formula to account for real-world factors like developer skill, task complexity, and error costs. The results make vibe coding look even less appealing than with the original formula.
The Story So Far: Both the original and extended futzing fractions consistently show vibe coding as inefficient (FF > 1) versus traditional development. The extended formula reveals a harsher reality: citizen developers can waste up to 6x their time, competent developers can still burn 80% to 240% extra time, and even experts struggle to get AI to produce acceptable code without coaxing, especially on complex or high-stakes tasks. The formula casts a negative light on the “AI replaces developers” narrative by showing that skill, complexity, and error costs matter enormously.
Quick Formula Reference:
Glyph’s Original Futzing Fraction:

FF = (I + W + C) / (P × H)

My Extended Futzing Fraction:

FF’ = ((I + W + C) × X × E × L) / (P × S × H)
Where:
- I = Inference time (waiting for AI)
- W = Writing prompts
- C = Checking/fixing AI output
- H = Human baseline (time to code manually)
- P = Probability AI gets it right
- S = Skill factor (your ability to evaluate/fix AI output)
- L = Learning factor (overhead of figuring out AI workflows)
- X = Complexity multiplier
- E = Error cost multiplier
Rule of thumb: FF > 1 means you’re wasting time. FF < 1 means AI is actually helping.
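If you want to play with the numbers yourself, here’s a small sketch of both fractions in code. The multiplicative form of the extended version is an assumption I’m making from the variable definitions above (complexity, error cost, and learning overhead inflate the futzing time; skill and success probability scale the payoff), so treat it as illustrative rather than gospel.

```python
def futzing_fraction(i, w, c, h, p):
    """Glyph's original fraction: expected futzing time vs. coding it yourself.

    i: inference time, w: prompt-writing time, c: checking/fixing time,
    h: human baseline time, p: probability the AI gets it right (0-1).
    """
    return (i + w + c) / (p * h)


def extended_futzing_fraction(i, w, c, h, p, s=1.0, l=1.0, x=1.0, e=1.0):
    """Extended fraction (an assumed multiplicative form): complexity (x),
    error cost (e), and learning overhead (l) inflate futzing time, while
    skill (s) and success probability (p) shrink it.
    """
    return ((i + w + c) * x * e * l) / (p * s * h)


# A 30-minute human task: 2 min inference, 5 min prompting, 10 min checking,
# 40% success rate, moderate complexity and error cost (all numbers invented).
ff = extended_futzing_fraction(i=2, w=5, c=10, h=30, p=0.4, x=1.5, e=1.5)
print(f"FF' = {ff:.2f} -> {'write it yourself' if ff > 1 else 'let the AI try'}")
```

Even with a generous 40% success rate, moderate complexity and error costs push this example well past the break-even point.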
Now comes the crucial question: what to do with this information?
Practical Framework: How to Stop Futzing Around
After playing with the numbers and vibe coding my way through building an AI assistant, here’s what the futzing fraction taught me about using AI effectively.
The “AI Replaces Developers” Story Is Mathematical Nonsense
The standard vendor narrative assumes that coding is coding, that building a to-do app and implementing OAuth have the same complexity, error tolerance, and skill requirements. The improved futzing fraction shows this is, at best, wishful thinking. Even expert developers struggle to break even on moderately complex tasks, and citizen developers are consistently burning 3-6x the time they would save by simply hiring a competent developer.
Set Futzing Budgets, Not Futzing Goals
Based on my experience, here’s what I wish I’d done from the start:
Time-box vibe sessions. Set a hard limit: “I’ll spend 30 minutes trying to get AI to solve this. If FF’ > 1 by then, I code it myself.” I wasted hours on features where I knew by attempt #3 that I should have sucked it up and written the code.
Track your actual success rates. Stop trusting vendor benchmarks and start measuring your P for different types of tasks. My success rate for UI work was maybe 15%, but for authentication flows it was closer to 5%.
Apply the formula as a decision filter. Before using Copilot or Windsurf, estimate your variables. High complexity (X > 2)? High error cost (E > 2)? Low skill for this specific task (S < 1)? Go ahead and code it yourself and save some frustration.
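That pre-flight filter can be as simple as a few guard clauses. The thresholds below are the ones from this section; the function name, the 20% success-rate floor, and the wording of the reasons are just illustrative.

```python
def should_vibe_code(x, e, s, p):
    """Pre-flight filter before reaching for an AI assistant.

    Returns (decision, reason) using the rule-of-thumb thresholds from this
    post: high complexity (x > 2), high error cost (e > 2), or low
    task-specific skill (s < 1) all point toward coding it yourself.
    """
    if x > 2:
        return (False, "complexity too high -- the AI will thrash")
    if e > 2:
        return (False, "errors too expensive -- review cost eats the savings")
    if s < 1:
        return (False, "you can't reliably evaluate the output for this task")
    if p < 0.2:
        return (False, "your measured success rate is too low")
    return (True, "cheap to try, cheap to fix -- time-box it and go")


print(should_vibe_code(x=1.2, e=1.0, s=1.5, p=0.4))   # boilerplate-style task
print(should_vibe_code(x=2.5, e=3.0, s=0.8, p=0.05))  # auth-flow-style task
```

The point isn’t the exact cutoffs; it’s forcing yourself to estimate the variables before you start prompting, not after you’ve burned an hour.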
Team Applications: Who Should Futz and When
Junior developers: Focus on high-L, low-E tasks where learning matters more than efficiency. Use vibe coding for exploring patterns and understanding concepts, but not for shipping features. Always have a senior developer review the work, because your ability to spot errors may be lower than you think (I know mine was at times).
Senior developers: Use vibe coding selectively for prototyping and exploration where mistakes are cheap. However, when the stakes are high or the complexity is difficult to suss out, trust your skills over those of a chatbot.
Critical features: FF’ < 0.5 or write it yourself. Authentication, payment processing, data handling, anything where E > 3 probably isn’t worth the risk. AI’s tendency to hallucinate APIs and skip error handling makes it fundamentally unsuitable for high-stakes code.
Measure, Don’t Just Feel
The most significant insight from formalizing this in a simple formula is that most of us are terrible at estimating our productivity. The intermittent reinforcement of occasional big wins (when vibing saves you 2 hours) makes us forget the frequent small losses (when it wastes 20 minutes 10 times in a day).
Start tracking your C (checking time), as that’s where the hidden costs live. If you’re spending more than five minutes going back and forth with an AI, debugging errors caused by its previous output, you should probably just write the code yourself.
Track your actual P by task type; you might discover that vibe coding is great for CSS or inserting logging statements but terrible for async logic, or helpful for boilerplate but dangerous for edge cases. Use that data to inform your futzing decisions.
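One low-ceremony way to gather that data is to log every attempt and let the numbers accumulate. This is a minimal sketch (the task-type names and numbers are made up); a spreadsheet works just as well.

```python
from collections import defaultdict

# One record per AI attempt: did it succeed, and how long did checking take?
attempts = defaultdict(list)

def log_attempt(task_type, succeeded, checking_minutes):
    attempts[task_type].append((succeeded, checking_minutes))

def report():
    """Per-task-type success rate (your P) and average checking time (your C)."""
    for task_type, records in attempts.items():
        p = sum(ok for ok, _ in records) / len(records)
        c = sum(minutes for _, minutes in records) / len(records)
        print(f"{task_type:12s} P = {p:.0%}  avg C = {c:.1f} min")

# Example: a day of vibe coding, logged by task type (illustrative data).
log_attempt("css", True, 3)
log_attempt("css", True, 5)
log_attempt("css", False, 12)
log_attempt("async-logic", False, 25)
log_attempt("async-logic", False, 18)
report()
```

A week of honest logging will tell you more about your real P and C than any vendor benchmark.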
The formula is a reality check for when vibe coding stops being productive and turns into water-treading.
The Formula for Honest AI Adoption
Building a project via vibe coding taught me that the futzing fraction – both Glyph’s original and my extended version – aligns perfectly with what I experienced. The formula validates the intuitive sense that something was off, that I was treading water and wasting time for hours arguing with a Markov chain, even when I occasionally hit those satisfying moments where it generated exactly what I needed.
Looking back, there were times when the futzing fraction was under one, meaning it was the right choice.
AI coding assistants genuinely helped with specific tasks, such as generating boilerplate code, adding consistent logging patterns throughout a codebase, scaffolding new modules with standard structures, and refactoring repetitive patterns. These tasks typically had low complexity multipliers (X), minimal error costs (E), and low learning overhead (L), since the same workflow was reusable across the codebase.
The sweet spot seems to be tasks where:
- The scope is well-defined and narrow
- Mistakes are obvious and easily fixed
- The output serves as a starting point rather than the final destination
- You’re working in familiar territory where your skill factor (S) is high
If you spend your time writing small utility scripts, generating test data, creating configuration templates, or handling routine code transformations, you may well be better off using AI, as those tasks tend to yield futzing fractions at or below 1.
Some vendors sell the myth that coding is about to become as easy as writing an email (which, let’s be honest, many still struggle with). Vibe coding can indeed be genuinely helpful for specific, well-defined tasks where the cost of errors is low and the scope is well defined. But for complex features, security-critical code, or anything requiring deep system understanding, good software still requires skill, judgment, and experience that can’t be prompt-engineered away.
I’ve shifted to a more surgical approach based on these calculations. I vibe code selectively for the tasks where my measured success rates are high and the futzing fraction consistently stays below 1: boilerplate generation, adding logging, refactoring patterns, and exploring new libraries in low-stakes environments. But for anything complex or security-related, where I can’t afford to waste time debugging hallucinated APIs and phantom imports, I crack my knuckles and reach for my keyboard.
If you made it this far, thanks for sticking with me through this slog of a blog series. In other news, I recently took a road trip to North Dakota and captured the following picture of the night sky.


