Estimation is one of the most common sources of friction between engineering teams and the rest of the business. A feature gets estimated at two weeks and ends up taking six. Stakeholders lose confidence in timelines, engineers feel pressure to commit to numbers they don't believe, and nobody can explain where things went wrong.
The problem isn't that accurate estimation is impossible. More often, teams estimate the wrong things using the wrong methods. At OAK'S LAB, estimation occurs across multiple stages for different purposes, and accuracy improves by separating discovery estimation from delivery estimation.
Key Takeaways
- Most estimation problems come from conflating discovery estimation (rough sizing for prioritization) with delivery estimation (detailed sizing for sprint commitment). They serve different purposes and need different methods.
- Story points measure relative effort and complexity, not hours. They include both development and testing work combined, so the estimate reflects what it actually takes to get a feature to "done," not just "code complete."
- Velocity baselines (average story points completed per sprint) are the most reliable predictor of future capacity. Ignoring your own historical data in favor of stakeholder optimism is how teams consistently miss deadlines.
- Good estimation turns stakeholder conversations from opinions and best wishes into informed trade-off discussions backed by real data.
Why Estimation Matters
You can't plan without estimates, and if you can't plan, it becomes really hard to run a successful business. Stakeholders need to know when features will ship, especially when those features are tied to business outcomes like customer commitments or a sales cycle. Engineers need to know how much they can realistically accomplish in a sprint. Product Leads need to understand the effort required so they can prioritize intelligently.
Estimation isn't about exact precision. Nobody needs to know a feature will take exactly 43.5 hours. What matters is whether it's a one-week effort or a two-month effort, because those two answers lead to very different planning decisions.
The Two Types of Estimation
Discovery estimation: Rough sizing of potential features before refinement. The goal is prioritization, not precision. You need to know if solving this problem is a small, medium, or large effort so you can weigh it against business impact.
Delivery estimation: Detailed sizing of refined work before it enters a sprint. The goal is commitment. Engineers need accurate estimates so they can commit to a sprint with confidence rather than crossing their fingers.
Teams that conflate these two types end up with unreliable estimates for both. Trying to get delivery-level precision during discovery wastes time on features that might not make the roadmap. Using discovery-level rough sizing for sprint commitment is how you end up consistently overcommitting.
How We Estimate in Discovery
When discovery identifies a potential feature for the roadmap, the team needs a rough sense of effort before prioritizing it. You're asking one question: is this a quick win or a multi-month project?
The typical approach uses relative “t-shirt” sizing:
- Small: Less than a week
- Medium: One to two weeks
- Large: Three to four weeks
- Extra Large: A month or more
This rough sizing, combined with business-impact scoring, creates a priority score. High impact, small effort? Build it soon. Low impact, large effort? Icebox it. The math isn't complicated, but the discipline to follow it is.
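As a minimal sketch of that scoring, the snippet below ranks features by impact per unit of effort. The text doesn't give a formula, so the 1-to-5 impact scale, the effort weights, and the feature names are all assumptions for illustration; only the idea of combining rough size with business impact comes from the process described above.

```python
# Hypothetical effort weights per t-shirt size (assumed, not from the text).
EFFORT_WEIGHT = {"S": 1, "M": 2, "L": 4, "XL": 8}


def priority_score(impact: int, size: str) -> float:
    """Return impact per unit of effort; a higher score means build sooner."""
    if not 1 <= impact <= 5:
        raise ValueError("impact must be between 1 and 5")
    return impact / EFFORT_WEIGHT[size]


# Hypothetical features: (name, business impact 1-5, t-shirt size).
features = [
    ("CSV export", 4, "S"),
    ("Custom reporting engine", 2, "XL"),
    ("Onboarding checklist", 3, "M"),
]

# Sort so the high-impact, small-effort items come first.
ranked = sorted(features, key=lambda f: priority_score(f[1], f[2]), reverse=True)

for name, impact, size in ranked:
    print(f"{name}: {priority_score(impact, size):.2f}")
```

Whatever weights you pick, the point is the ordering, not the absolute numbers: the low-impact, extra-large item lands at the bottom of the list.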
Discovery estimation also benefits from a lean mindset. Most features should be scoped as lean and minimalistic so the team can invest real complexity budget in the handful that actually differentiate your product. That mindset forces honest conversations early about where complexity should live and where it shouldn't.
How We Estimate in Delivery
Once discovery refines a feature into detailed specifications, engineers estimate it in story points during backlog refinement. Story points represent effort and complexity relative to other work the team has completed. At OAK'S LAB, estimates include both development and testing effort combined, because a feature isn't done when the code is written. It's done when it's tested and working.
The scale follows a modified Fibonacci sequence: 1 (trivial), 2 (simple), 3 (moderate), 5 (complex), 8 (significant effort approaching the sprint limit), and 10 (upper boundary). The Fibonacci spacing is deliberate: as complexity increases, so does estimation uncertainty, and the widening gaps between values reflect that increasing uncertainty rather than pretending you can distinguish between a 6 and a 7 on a complex feature. Anything estimated higher than 10 needs to be broken into smaller pieces.
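One way to picture the scale is as a lookup that never returns a 6 or a 7. The sketch below uses the scale values from the text; rounding a raw guess up to the next value is an assumption for illustration (a team could equally round to the nearest value), and `snap_to_scale` is a hypothetical helper name.

```python
from bisect import bisect_left
from typing import Optional

SCALE = (1, 2, 3, 5, 8, 10)  # story point values listed in the text


def snap_to_scale(raw: float) -> Optional[int]:
    """Round a raw guess up to the next scale value.

    Returns None when the guess exceeds the top of the scale,
    which signals that the story should be split.
    """
    if raw <= 0:
        raise ValueError("estimate must be positive")
    index = bisect_left(SCALE, raw)  # first scale value >= raw
    return SCALE[index] if index < len(SCALE) else None


for guess in (3, 6, 7, 10, 13):
    snapped = snap_to_scale(guess)
    print(guess, "->", snapped if snapped is not None else "split the story")
```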
Story points aren't literal hours. They represent relative effort. A 3-point story takes the same cognitive effort regardless of who picks it up, even if one engineer completes it faster than another. That's what makes points useful for team planning: complexity remains constant even as individual speeds vary.
Teams estimate through refinement sessions. The Product Lead and Design Lead present the scope, and the Tech Lead plus engineers estimate. Everyone reviews the spec, discusses clarifying questions, and discloses their estimate simultaneously. If estimates vary widely, the team discusses why until consensus emerges. That discussion is often where the real value lives, because it surfaces assumptions and edge cases nobody had considered. And everyone has equal power in these sessions. If a junior engineer spots a simplification or a reusable component, they have the full mandate to raise it. Some of the best estimation conversations start with, "Why are we building this from scratch?"
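The "vary widely" check can be made concrete. The sketch below flags a round for discussion when the highest and lowest votes are more than one step apart on the scale. The one-step threshold and the engineer names are assumptions; the text only says that wide variation should trigger a conversation.

```python
SCALE = (1, 2, 3, 5, 8, 10)  # story point values listed in the text


def needs_discussion(votes: dict[str, int], max_step_gap: int = 1) -> bool:
    """Return True when simultaneous votes are spread too far apart.

    The gap is measured in steps on the scale rather than raw points,
    so 8 vs 10 counts as one step, the same as 1 vs 2.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    positions = [SCALE.index(v) for v in votes.values()]  # ValueError if off-scale
    return max(positions) - min(positions) > max_step_gap


# Hypothetical refinement rounds.
close_round = {"Ana": 3, "Ben": 5, "Chi": 3}
wide_round = {"Ana": 2, "Ben": 8, "Chi": 3}

print(needs_discussion(close_round))
print(needs_discussion(wide_round))
```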
The Estimation Matrix
One practice that helps keep story point calibration consistent is an estimation matrix: a shared reference that maps story point values to concrete examples from your project. Early on, you might start with generic benchmarks, but as the project progresses, you replace those with actual completed stories from your codebase. "A 3-point story" means nothing in the abstract. It means something when the whole team agrees, "it's about the same effort as the user profile screen we built last sprint." The matrix keeps you from reiterating what the numbers mean every refinement session.
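A matrix like this can live in a spreadsheet or a wiki page, but as a data structure it is just a mapping from point values to reference stories. In the sketch below, the generic benchmarks and the `add_completed_story` helper are hypothetical; the user profile screen is the example from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    title: str
    from_project: bool  # False for a generic starting benchmark


# Hypothetical starting point: generic benchmarks only.
matrix: dict[int, list[Reference]] = {
    3: [Reference("Simple form with validation", from_project=False)],
    5: [Reference("Third-party API integration", from_project=False)],
}


def add_completed_story(points: int, title: str) -> None:
    """Record a real completed story and retire generic benchmarks for that value."""
    kept = [ref for ref in matrix.get(points, []) if ref.from_project]
    matrix[points] = kept + [Reference(title, from_project=True)]


add_completed_story(3, "User profile screen")
print(matrix[3])
```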
Estimation Accuracy Improves Over Time
Estimation happens differently depending on where the project stands. Early estimates during scoping or a Foundation Phase are necessarily rougher because the team is working from less information. As delivery begins and engineers gain familiarity with the codebase and specifications, estimates become progressively more accurate.
The critical principle: when later, more accurate estimates diverge from earlier ones, you can't quietly absorb the difference. That discrepancy needs to surface through scope management so stakeholders can make an informed trade-off. Silently accepting an overrun is how budgets blow up.
The Velocity Baseline
Once a team completes several sprints, they establish their delivery velocity: the average number of story points completed per sprint. This baseline becomes the commitment guide. If a team consistently completes 25 points per sprint, they commit to roughly 25 points in the next sprint. Velocity fluctuates sprint to sprint, but over time the average stabilizes. That stable velocity lets stakeholders predict timelines with confidence rather than hope.
This connects to the Dual-Track Agile process: consistent velocity is only possible when the work entering sprints has been validated and refined through discovery. Bad data in, bad data out applies to sprint planning too. And when planning capacity, teams need to account for reality: upcoming vacations, known unknowns like integration complexity or team members ramping up, and any technical debt that needs addressing.
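The baseline and the capacity adjustment are simple arithmetic. The sketch below averages recent sprints and then scales the result by the share of working days actually available. The five-sprint window, the proportional scaling, and the sample numbers are assumptions; the text only specifies that velocity is the average points completed per sprint and that capacity should account for things like vacations.

```python
import math
from statistics import mean


def velocity_baseline(completed_points: list[int], window: int = 5) -> float:
    """Average story points completed over the most recent sprints."""
    recent = completed_points[-window:]
    if not recent:
        raise ValueError("need at least one completed sprint")
    return mean(recent)


def sprint_capacity(baseline: float, available_days: int, nominal_days: int) -> int:
    """Scale the baseline by availability, rounding down to stay conservative."""
    if nominal_days <= 0 or not 0 <= available_days <= nominal_days:
        raise ValueError("available_days must be between 0 and nominal_days")
    return math.floor(baseline * available_days / nominal_days)


# Hypothetical history: points completed in the last five sprints.
history = [22, 27, 24, 26, 26]
baseline = velocity_baseline(history)

# Four engineers x ten working days = 40 person-days; four days of vacation planned.
capacity = sprint_capacity(baseline, available_days=36, nominal_days=40)

print(f"baseline: {baseline}, commit to about {capacity} points")
```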
Common Estimation Mistakes
Estimating before refinement. You can't accurately estimate vague requirements. Discovery needs to refine the feature into detailed specifications before delivery can estimate with confidence. Otherwise, your "two-week estimate" turns into six weeks because nobody accounted for complexity that wasn't in the original spec.
Confusing story points with hours. Story points measure relative complexity, not time. The moment you start converting points to hours, the system breaks because different engineers work at different speeds while complexity remains constant. A manager saying "one point equals four hours" doesn't understand the methodology.
Ignoring velocity data. Teams commit to more than their velocity supports because stakeholders want faster progress. Historical velocity is the most reliable predictor of future capacity. Wishing your team were faster doesn't make them faster; it just makes your estimates wrong.
Not breaking down large stories. Stories above 10 points are too big to estimate accurately and too complex to complete in a single sprint. If an engineer says "that's more than a 10," the next question should be "how do we split it?"
Not updating estimates when reality changes. Undiscovered complexities appear as development progresses. When the real effort diverges from the plan, the team needs to adjust the estimation, and that adjustment needs to be reflected in the roadmap and communicated to stakeholders. Treating your original estimate as gospel when reality has shifted means your roadmap is fiction.
How Estimation Improves Priority Decisions
Good estimation turns opinion-driven arguments into informed trade-off discussions. When stakeholders request a feature, estimation reveals its actual cost:
Stakeholder: "We need this dashboard for the board meeting next month."
Product: "That's an eight-week effort based on our velocity."
Stakeholder: "What if we simplified it?"
Product: "If we cut these three metrics, it's two weeks."
Without estimation, that conversation is just opinions. With estimation, it's a real trade-off discussion. The Product Lead can present options backed by data rather than defend a position based on gut feeling.
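The numbers in that exchange fall out of a simple calculation: remaining points divided by velocity, rounded up to whole sprints. The sketch below assumes two-week sprints, a velocity of 25 points, and hypothetical point totals for the full and simplified dashboard, chosen so the results match the dialogue.

```python
import math


def forecast_weeks(points: int, velocity: float, sprint_weeks: int = 2) -> int:
    """Weeks needed to deliver `points` at the given velocity, in whole sprints."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    sprints = math.ceil(points / velocity)
    return sprints * sprint_weeks


VELOCITY = 25           # assumed team baseline, points per sprint
FULL_DASHBOARD = 100    # hypothetical total for the full scope
SIMPLIFIED = 25         # hypothetical total after cutting three metrics

print(forecast_weeks(FULL_DASHBOARD, VELOCITY))  # 8
print(forecast_weeks(SIMPLIFIED, VELOCITY))      # 2
```

Rounding up to whole sprints is a deliberate choice here: a feature that needs 1.2 sprints of work does not ship until the second sprint ends.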
We saw this play out with a pre-seed client building a deal management platform. Mid-project, the strategic direction shifted from a marketplace model to an internal tool. Because the team separated discovery estimation from delivery estimation and maintained a calibrated story point scale throughout, they could quickly size the new direction against the remaining budget. The Product Lead presented concrete options: here's what the pivot costs, here's what fits, here's what gets cut. Stakeholders chose which features to prioritize for the new direction based on real effort data rather than gut feel. The product went from concept to launch in four months, on budget, and the client's founder noted they had never worked with a team that delivered without budget surprises. That outcome is only possible when estimation discipline gives you the data to make trade-off decisions under pressure.
Why Estimation Enables Predictability
Estimation isn't just an engineering concern. It enables business planning across the organization. Fundraising depends on demonstrating progress toward milestones. Sales depends on knowing when features will ship, because your sales team is making promises based on your roadmap. Marketing depends on coordinating launches, and they can't plan a campaign if "it'll be done when it's done" is the best timeline they're getting from engineering. All of these require reliable timelines, which only exist when teams estimate accurately and honor their velocity.
The burnup chart is one of the most effective tools for making estimation discipline visible. A burnup chart tracks cumulative completed work against the total planned scope, giving stakeholders a clear picture of where the team stands at any point in the sprint or engagement. When estimates are accurate and velocity is predictable, the burnup line trends steadily toward the scope line. When it doesn't, the gap tells you something concrete: either scope grew, velocity dropped, or estimates were off. That diagnostic clarity is only possible when the underlying estimates are disciplined.
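The data behind a burnup chart is two series: cumulative completed points and total scope at the end of each sprint. The sketch below builds those rows and applies two of the diagnostics described above. The sample numbers and the 80% tolerance for "velocity dropped" are assumptions; detecting that estimates were off needs per-story data that these two series don't carry.

```python
from itertools import accumulate


def burnup_rows(completed: list[int], scope: list[int]) -> list[tuple[int, int, int]]:
    """Return (cumulative done, total scope, remaining gap) for each sprint."""
    if len(completed) != len(scope):
        raise ValueError("need one scope value per sprint")
    done = accumulate(completed)  # running total of completed points
    return [(d, s, s - d) for d, s in zip(done, scope)]


def diagnose(completed: list[int], scope: list[int],
             baseline: float, tolerance: float = 0.8) -> list[str]:
    """Name the likely causes when the gap is not closing as expected."""
    findings = []
    if scope[-1] > scope[0]:
        findings.append("scope grew")          # the scope line moved up
    if completed[-1] < tolerance * baseline:
        findings.append("velocity dropped")    # the completed line flattened
    return findings


# Hypothetical engagement: points completed per sprint and total scope per sprint.
completed = [25, 24, 15]
scope = [100, 100, 120]

rows = burnup_rows(completed, scope)
findings = diagnose(completed, scope, baseline=25)

for done, total, gap in rows:
    print(f"done {done} of {total}, gap {gap}")
print(findings)
```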
What This Means in Practice
Before improving your team's estimation practices:
1. Review the accuracy of your recent sprint estimates
Compare what your team committed to at the sprint start versus what actually shipped. For each feature, note the original estimate versus actual effort. Were the variances random or systematically biased in one direction?
Validate by checking whether misses trace back to poor estimation, unclear requirements, or mid-sprint scope changes. Each root cause has a different fix. If estimates were off because requirements changed, that's a discovery problem, not an estimation problem.
Red flag: Your team consistently overcommits by a significant margin. That usually means engineers estimate based on best-case scenarios, or stakeholder pressure is inflating commitments beyond what velocity supports.
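To tell random variance from systematic bias in step 1, compare committed and completed points for each sprint and look at the direction of the misses. In the sketch below, the 10% threshold and the sample sprints are assumptions; "significant margin" is left for each team to define.

```python
from statistics import mean


def overcommit_ratios(sprints: list[tuple[int, int]]) -> list[float]:
    """Share of each sprint's commitment that did not ship.

    Each sprint is (committed points, completed points).
    Positive values mean the team committed to more than it delivered.
    """
    if any(committed <= 0 for committed, _ in sprints):
        raise ValueError("committed points must be positive")
    return [(committed - completed) / committed for committed, completed in sprints]


def is_systematic(ratios: list[float], threshold: float = 0.1) -> bool:
    """True when every sprint missed in the same direction by a meaningful average."""
    return all(r > 0 for r in ratios) and mean(ratios) > threshold


# Hypothetical teams: (committed, completed) per sprint.
biased_team = [(30, 24), (28, 25), (32, 24)]
noisy_team = [(30, 31), (28, 25), (30, 30)]

print(is_systematic(overcommit_ratios(biased_team)))
print(is_systematic(overcommit_ratios(noisy_team)))
```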
2. Check whether you're separating discovery estimation from delivery estimation
Look at how your team estimates work when it first appears on the roadmap versus when it enters a sprint. Are these two different conversations using different methods? Or does the team give precise estimates for unrefined features, only to get blamed when those early guesses turn out wrong?
Validate by checking whether roadmap prioritization uses rough sizing (t-shirt sizes, relative effort bands) while sprint commitment uses detailed story points. If both use the same method, the process conflates two activities and produces unreliable results for both.
Red flag: Stakeholders reference early discovery estimates as delivery commitments. "You said this was a medium effort, which means two weeks" is the sound of someone treating a rough prioritization guess as a sprint commitment.
3. Establish and respect your velocity baseline
Calculate your team's average story points completed over recent sprints. That's your velocity baseline. Compare it to what you've been committing to. If commitments consistently exceed velocity, your estimation process is being overridden by wishful thinking.
Validate by sharing velocity data with stakeholders before sprint planning. When everyone sees actual capacity, planning becomes more realistic. "We can fit 25 points this sprint, which means features A, B, and C. Feature D moves to next sprint."
Red flag: Your team's velocity varies wildly from sprint to sprint. High variance usually indicates inconsistent story sizing, scope creep within sprints, or work entering without sufficient refinement.
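"Varies wildly" can be put on a number by dividing the standard deviation of recent sprints by their average (the coefficient of variation). The sketch below is one way to do that; the sample histories are hypothetical, and any cutoff between stable and erratic is a judgment call that the text doesn't specify.

```python
from statistics import mean, pstdev


def velocity_cv(points_per_sprint: list[int]) -> float:
    """Coefficient of variation: spread of sprint velocity relative to its average."""
    if len(points_per_sprint) < 2:
        raise ValueError("need at least two sprints")
    average = mean(points_per_sprint)
    if average <= 0:
        raise ValueError("average velocity must be positive")
    return pstdev(points_per_sprint) / average


# Two hypothetical teams with the same 25-point average.
stable_team = [24, 26, 25, 25]
erratic_team = [10, 35, 18, 37]

print(f"stable:  {velocity_cv(stable_team):.2f}")
print(f"erratic: {velocity_cv(erratic_team):.2f}")
```

Both teams average 25 points, which is why the average alone is not enough: only the first team's baseline is safe to plan against.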
Frequently Asked Questions
How does estimation work when an external partner like OAK'S LAB is doing the engineering alongside our internal team?
The same two-stage process applies. What changes is that both teams need a shared estimation matrix so "a 5-point story" means the same thing to everyone. At OAK'S LAB, the Tech Lead establishes the estimation framework during the Foundation Phase, calibrating story points against real examples from the codebase so internal and external engineers estimate on the same scale. Without that shared calibration, you end up with two teams that nominally use the same numbers but mean completely different things by them.
How do you handle estimation when the project scope shifts significantly mid-engagement?
The team re-estimates affected items against the new direction rather than layering work on top of existing commitments. If a feature was 5 points under the old scope but now requires 8, that delta surfaces through the scope management process so stakeholders can make informed trade-offs. At OAK'S LAB, if delivery estimates exceed those agreed during the Foundation Phase, we address the issue directly with the stakeholder. Budget surprises come from teams that avoid these conversations, not from teams that have them early.
How do you track whether the team's velocity is on pace throughout an engagement?
The burnup chart is the primary tool. It tracks cumulative completed story points against the total planned scope over time, giving leadership a clear visual of whether the team is on pace, falling behind, or absorbing scope changes. When the completed-work line trends steadily toward the scope line, the team is healthy. When a gap opens, you can diagnose whether scope grew (the scope line moved up), velocity dropped (the completed line flattened), or both. At OAK'S LAB, the Product Lead reviews burnup data in every steering committee and sprint review so stakeholders always have an accurate picture of progress. The burnup chart only works when estimates are accurate and velocity is predictable, which is why estimation discipline matters so much: it's the foundation that makes progress tracking meaningful rather than decorative.
How do you prevent estimation from being used as a tool for micromanaging engineering teams?
By keeping the primary focus on team velocity. Overall team velocity is the most important metric because it reflects how the team delivers as a unit. But when team velocity is inconsistent or the team is consistently underdelivering, you need to go deeper. At OAK'S LAB, we do track individual engineer output alongside team velocity because understanding where capacity is going is essential to diagnosing problems. If the team is underperforming, individual data helps identify whether the issue is a specific bottleneck, a knowledge gap, or a workload imbalance. The goal isn't surveillance. It's accountability: holding people to being strong contributors to the team. In estimation sessions, the Tech Lead facilitates, but any engineer can challenge an estimate. That collaborative dynamic only works when the team trusts that estimation data is being used to improve delivery, not to punish individuals for honest estimates.
How do you maintain estimation accuracy when the team composition changes mid-project?
Velocity naturally dips when team members join or leave. The estimation matrix helps continuity by documenting what each story point value looks like with real project examples, giving new members a concrete reference. Expect velocity to fluctuate for a couple of sprints and plan commitments conservatively during that window. The worst thing you can do is maintain the same velocity expectations during onboarding, because the knowledge-transfer overhead temporarily reduces capacity even as headcount increases.