Ayhan Sipahi

Compressing Time to Production

How high-performing teams shrink the lead time from code-complete to live in production, without trading away security or code quality. A guide for tech leads.

In many organizations a feature can be code-complete for days before it is live, stuck behind a review queue, a release train, or a deploy slot it missed. The metric that captures this is lead time for changes: the span from a change being committed to running in production. Most of that clock is process, not programming. The most useful move a tech lead can make is to treat lead time as the primary delivery metric and attack its biggest wait-state structurally. The default order is: shrink batch size, decouple deploy from release, then turn the security gate and the cross-team gate into self-service guardrails.

This is a decision guide, not a tutorial. It favors frameworks and trade-offs over code, and every number here is a cited finding from published research, not a measurement of mine.

The Lead-Time Clock

Lead time for changes is “the amount of time it takes for a change to go from committed to version control to deployed in production.” The clock starts at code-complete, not at the idea; it excludes the time to decide what to build. That framing matters because it isolates the part a delivery process owns: the wait between “the code is done” and “users have it.”

Most of that span is wait-state, not work. A change sits in a build/test queue, then a review queue, then a security gate, then it is batched into a release train, then it waits for a deploy window, and sometimes it waits on another team to deploy first. The diagram below names those waits; the sections that follow attack them one at a time.

Diagram (lead-time clock): Code-complete (commit) → Build / test wait → Review queue → Security gate → Release-train batching → Deploy window → Waiting on other teams → Live in production
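A minimal sketch of instrumenting that clock, assuming each change carries one timestamp per stage. The stage names and times below are hypothetical, not taken from any real pipeline; the point is only that lead time is the sum of the waits, and the largest wait is the first target.

```python
from datetime import datetime

# Hypothetical stage timestamps for one change, in pipeline order.
# The names mirror the waits named above; the times are invented.
EVENTS = [
    ("commit", datetime(2024, 1, 8, 9, 0)),
    ("build_test_done", datetime(2024, 1, 8, 9, 30)),
    ("review_approved", datetime(2024, 1, 9, 15, 0)),
    ("security_approved", datetime(2024, 1, 11, 10, 0)),
    ("deployed", datetime(2024, 1, 12, 14, 0)),
]


def wait_states(events):
    """Return the hours spent between each consecutive pair of events."""
    waits = {}
    for (prev_name, prev_at), (name, at) in zip(events, events[1:]):
        waits[f"{prev_name} -> {name}"] = (at - prev_at).total_seconds() / 3600
    return waits


def lead_time_hours(events):
    """Lead time for changes: first event (commit) to last (deployed)."""
    return (events[-1][1] - events[0][1]).total_seconds() / 3600


waits = wait_states(EVENTS)
largest = max(waits, key=waits.get)
print(f"lead time: {lead_time_hours(EVENTS):.1f} h, largest wait: {largest}")
```

In this invented example the security approval is the longest single wait, so that is where the first lever would be aimed.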

The permission slip for compressing this clock is the most durable finding in the DORA research program: speed and stability are not a trade-off. For over a decade, the data has shown “there is no trade-off between improving performance and achieving higher levels of stability and quality. Rather, high performers do better at both.” That result has held since the 2015 analysis, so optimizing for lead time does not mean accepting more outages; the same structural choices tend to improve both.

It helps to ground the targets without treating them as timeless law. In the 2024 State of DevOps report, the elite cluster deploys on demand, with lead time under one day, a change failure rate around 5%, and failed deployment recovery under one hour; elite was roughly 19% of respondents, with lead time about 127 times faster than the low cluster. Two cautions apply. First, these bands are self-reported survey clusters that DORA re-derives each report year, so they are not fixed thresholds (the 2022 report had only three clusters and no elite tier). Bind any band to its report year. Second, the under-one-day figure is the lead-time target; the under-one-hour figure is recovery. They are different metrics and are easy to confuse.

One naming note, because it appears in older material: the well-known “four key metrics” are now five. The fourth key was renamed in 2023 from mean time to restore to failed deployment recovery time, and the model expanded in 2024. The ideas are stable; the labels move.

Small Batches and the Boring Deploy

The first lever is batch size, because it sets a ceiling on everything downstream. A small change clears the build/test queue faster, gets reviewed faster, and carries a smaller blast radius if it fails. DORA states the underlying principle directly: “make each change as small as possible to make the delivery process fast and stable,” and treats deployment frequency as a proxy for batch size. Therefore the goal is not a heroic release; it is a deploy so routine that it is a non-event.

Trunk-based development is the workflow that makes small batches the default. Developers collaborate on a single branch, the trunk, and resist long-lived feature branches; release branches are cut just in time so the trunk stays deployable. The hard case is a large refactor that cannot land in one small commit. Branch by Abstraction handles it: introduce an abstraction layer, migrate callers behind it incrementally, and keep the trunk shippable throughout, rather than parking the work on a branch that drifts for weeks.
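Branch by Abstraction can be sketched in a few lines. This is an illustration under assumptions, not a prescribed implementation: `PriceCalculator`, both calculator classes, and the `use_new` switch are hypothetical names. Callers depend only on the abstraction, so the new implementation can land on trunk in small commits while the old one keeps serving.

```python
from typing import Protocol


class PriceCalculator(Protocol):
    """The abstraction layer that callers depend on (hypothetical)."""

    def total(self, cents: list[int]) -> int: ...


class LegacyCalculator:
    """Existing behavior, left untouched while the migration runs."""

    def total(self, cents: list[int]) -> int:
        result = 0
        for c in cents:
            result += c
        return result


class NewCalculator:
    """Replacement implementation, merged to trunk incrementally."""

    def total(self, cents: list[int]) -> int:
        return sum(cents)


def make_calculator(use_new: bool) -> PriceCalculator:
    # Callers are switched over one at a time by changing this selection.
    return NewCalculator() if use_new else LegacyCalculator()
```

Once every caller runs on the new implementation, the legacy class and the switch are deleted; the abstraction itself can stay or go.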

The table below contrasts the two regimes across the dimensions a tech lead actually weighs.

| Dimension | Small batches (trunk-based) | Big-bang release train |
| --- | --- | --- |
| Lead time | Short; each change flows on its own | Long; a change waits for the train |
| Blast radius | Small; one change per deploy | Large; a week of changes ship together |
| Deploy frequency | High; deploys become routine | Low; deploys stay rare and high-stakes |
| Rollback cost | Cheap; revert one change | Expensive; unwind a bundle to find the culprit |

One honest nuance: the 2024 report found, for the first time, that the medium cluster had a lower change failure rate than the high cluster, which DORA itself flagged as “unusual.” The gap between elite and low still holds, but the gradient between adjacent clusters is not clean, so do not over-claim that every step up in speed reduces change failure rate across every cluster pair. The defensible claim is the durable one: small batches make the delivery process faster and more stable, and they make the next lever possible.

The Release Switch, Not the Release Train

Even with small batches, a deploy stays risky as long as deploying code is the same act as exposing it to users. The second lever separates those two events. Code ships to production dark, behind a flag; a flag flip, not a release window, turns the behavior on. This is “the most common way to implement the Continuous Delivery principle of separating release from deployment,” in Pete Hodgson’s account: release toggles let you ship “incomplete and un-tested codepaths … as latent code.”

Diagram (deploy versus release): Merge to trunk → Deploy dark to prod → Flag flip (release) → Progressive rollout (canary / percentage) → Flag cleanup
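The flag flip and percentage rollout can be sketched with deterministic bucketing. This is a hedged illustration, not any vendor's API: `bucket`, `is_enabled`, and the flag name used in the usage note are hypothetical. Hashing the flag name together with the user id keeps each user in the same bucket, so raising the percentage only ever adds users.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """True when the user's bucket falls inside the current rollout slice.

    rollout_percent is runtime configuration: 0 means deployed dark,
    100 means fully released. Changing it requires no deploy.
    """
    return bucket(flag, user_id) < rollout_percent
```

A call such as `is_enabled("new-checkout", "user-42", 10)` would gate the latent code path for roughly a tenth of users.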

Once deploy and release are separate, the release itself can be progressive instead of all-at-once. Dark launching calls new back-end behavior for existing users “without the users being able to tell,” which lets you load-test in production before anyone sees a UI change. Canary and blue-green releases bound the blast radius by exposing a slice of traffic first. This is productizable: Argo Rollouts runs an AnalysisRun against an AnalysisTemplate and auto-promotes or auto-aborts a canary against metric thresholds, so the rollout gate is data, not a human watching a dashboard.
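The idea of a rollout gate that is data rather than a person can be sketched as a small decision function. This is not Argo Rollouts' API or configuration format; the function name, the sample shape, and both thresholds are assumptions chosen for illustration.

```python
def canary_verdict(samples, max_error_rate=0.01, min_requests=100):
    """Decide what to do with a canary from observed traffic.

    samples: list of dicts with "requests" and "errors" counts
             (a hypothetical shape, e.g. one dict per scrape interval).
    Returns "inconclusive", "promote", or "abort".
    """
    total = sum(s["requests"] for s in samples)
    errors = sum(s["errors"] for s in samples)
    if total < min_requests:
        # Too little traffic to judge; keep the canary at its current weight.
        return "inconclusive"
    return "promote" if errors / total <= max_error_rate else "abort"


print(canary_verdict([{"requests": 500, "errors": 2}]))
```

The design choice worth noting is the third outcome: a gate that promotes on too little data is as risky as no gate.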

Flags are not free, which is the part teams skip. Hodgson is explicit that “savvy teams view the Feature Toggles in their codebase as inventory which comes with a carrying cost,” and that different toggle types need different handling. The four categories sort on two axes, longevity and dynamism, and the management implication differs for each.

| Toggle type | Longevity | Dynamism | Management implication |
| --- | --- | --- | --- |
| Release | Short | Static (deploy-time) | Retire as soon as the feature is fully on |
| Experiment | Short | Dynamic (per-request) | Tie lifetime to the experiment; clean up after the result |
| Ops | Long | Dynamic | Owned by operators as a kill switch; review periodically |
| Permissioning | Long | Dynamic (per-user) | Often a permanent product capability, not debt |

The carrying cost is real. Knight Capital’s roughly USD 460M trading loss is, among other things, the canonical case of a deploy gone wrong compounded by a reused, un-retired flag. It is a warning about flags as un-managed inventory, not proof that flags are dangerous; the fix is an expiration and retirement process, which the table above is meant to support.
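One way to make the retirement process concrete is to treat flags as records with an owner and an expiry date, and to fail a scheduled check when any flag is overdue. The sketch below assumes a hypothetical in-code registry; the `Flag` fields and the example flags are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Flag:
    name: str
    kind: str                # "release", "experiment", "ops", "permissioning"
    owner: str
    expires: Optional[date]  # None for long-lived ops/permissioning flags


def expired_flags(flags, today):
    """Return the names of flags whose expiry date has passed."""
    return [f.name for f in flags if f.expires is not None and f.expires < today]


# Hypothetical inventory.
INVENTORY = [
    Flag("new-checkout", "release", "payments", date(2024, 3, 1)),
    Flag("search-kill-switch", "ops", "sre", None),
]

overdue = expired_flags(INVENTORY, date(2024, 4, 1))
print("overdue flags:", overdue)
```

Running a check like this in CI turns flag cleanup from a good intention into a failing build with a named owner.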

Security Review Without the Queue

A mandatory manual security gate is the most common hidden tax on lead time, and the most defensible to remove, because the fix makes security better, not weaker. DORA treats security-review time as a lead-time metric to drive down on purpose: measure “how much time the review adds,” and it “should go down until it reaches an agreed-to minimum,” while “the security review process doesn’t slow down development.” The underlying principle is Deming’s: “cease dependence on inspection … build quality into the product,” and “generate evidence on demand” for auditors rather than gating every change behind a person.

Structurally, that means three things working together: automated controls shifted left into the pipeline, risk-based routing instead of a uniform checklist, and a paved road that makes the secure path the default. The routing is the key insight, because applying the high-risk manual gate to every change regardless of actual risk is what creates the backlog.

Diagram (risk-based routing): Change merged → Risk classifier, which routes low risk → Auto-gate: SAST / SCA / DAST pass; standard → Paved-road self-serve; high risk → White-glove human review; every route ends at Ship
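Risk-based routing can be sketched as a small classifier. Everything specific here is an assumption: the `Change` fields, the 400-line cutoff, and the rule that auth or payment code is high risk are placeholders that a real team would replace with its own signals.

```python
from dataclasses import dataclass


@dataclass
class Change:
    # Hypothetical risk signals; real classifiers would use team-specific ones.
    touches_auth: bool = False
    touches_payments: bool = False
    new_dependency: bool = False
    lines_changed: int = 0


def route(change: Change) -> str:
    """Pick a review path by risk instead of applying one gate to everything."""
    if change.touches_auth or change.touches_payments:
        return "white-glove human review"  # high risk
    if change.new_dependency or change.lines_changed > 400:
        return "paved-road self-serve"     # standard
    return "auto-gate"                     # low risk: automated scans only


print(route(Change(lines_changed=12)))
```

The value is in the shape, not the thresholds: only the first branch puts a person on the critical path.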

Netflix frames this as guardrails over gates: build “secure by default central platforms” because “per-app security assessments … do not scale,” and keep “white glove” human review for the genuinely high-risk teams rather than for everyone. NIST’s Secure Software Development Framework backs the risk-based stance from the standards side: its practices are “outcome-based,” and “the intention of the SSDF is not to create a checklist,” organized into four groups (PO, PS, PW, RV) rather than a fixed gate sequence. The automation has a maturity ladder too. OWASP’s DevSecOps Maturity Model puts SAST, SCA, and DAST in CI with automated reporting at level 2, and configures “pipelines … to fail based on severity thresholds” at level 3, so the gate becomes a threshold the build enforces, not a meeting. Supply-chain assurance is incrementally adoptable the same way: SLSA v1.2 ladders from build level 1 (provenance) to level 2 (signed provenance from a hosted platform) to level 3 (a hardened platform), so a team can start without rebuilding everything.
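A severity-threshold gate is simple enough to sketch. The finding format and the severity names below are assumptions, not the output of any particular scanner; the function returns a process-style exit code so a pipeline step could use it directly.

```python
# Assumed severity scale; real scanners may use different labels.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def gate(findings, fail_at="high"):
    """Return (exit_code, blocking_findings) for a list of scanner findings.

    findings: list of dicts with "id" and "severity" keys (hypothetical shape).
    exit_code is 0 when nothing meets the threshold and 1 otherwise.
    """
    threshold = SEVERITY_ORDER[fail_at]
    blocking = [f for f in findings if SEVERITY_ORDER[f["severity"]] >= threshold]
    return (1 if blocking else 0), blocking


code, blocking = gate(
    [{"id": "DEP-1", "severity": "medium"}, {"id": "SAST-7", "severity": "critical"}]
)
print(code, [f["id"] for f in blocking])
```

Lowering `fail_at` over time is one way to tighten the gate without a step change that blocks every build on day one.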

There is an honest counterweight here, and it is the whole reason this lever has a boundary. The 2024 DORA report found that adopting an internal developer platform was associated, in the short term, with reduced throughput (directionally around 8%) and reduced stability (around 14%). DORA attributed the dip to added handoffs, not to security steps specifically, so treat the security-step reading as a plausible mechanism rather than DORA’s finding. The lesson is precise: a guardrail compresses lead time only when the paved road is genuinely self-service. A half-built platform that inserts a new handoff can make lead time worse. (One widely repeated statistic, that dependency automation yields “around 40% fewer vulnerabilities,” is unsourced; do not use it. If dependency cadence matters to your case, GitHub’s Octoverse 2025 reports the weaker, real figures: critical fixes 30% faster, from 37 to 26 days, and 26% fewer repositories with critical alerts.) This lever gets a full treatment in the companion piece on keeping security review off the critical path.

Contracts That Let You Ship Alone

When a change to one service forces a coordinated deploy with everyone who consumes it, the cross-team wait becomes the dominant term in lead time. Independent deployability is the cure, and it is fundamentally a contract problem: if you can change and deploy a service without coordinating with its consumers, you have removed the largest cross-team source of delay. Sam Newman’s test in Building Microservices is a useful target; paraphrased, it asks whether you can change a service and deploy it to production without having to change anything else.

Three mechanisms get you there. The first is a versioning contract. Semantic versioning encodes compatibility in the number itself: MAJOR signals an incompatible change, MINOR a backward-compatible addition, PATCH a backward-compatible fix, all relative to a declared public API. The second is Parallel Change, also called expand and contract, which runs in three phases: expand, migrate, and contract. You expand the interface to support both old and new, migrate consumers at their own pace, then contract away the old path; the producer deploys without waiting on consumer migration. The same discipline applies to databases, where “every migration must be backward compatible with the currently running application code.” The cost is real: during the migrate phase you maintain both versions, and abandoning the contract phase leaves you permanently worse off.

Diagram (Parallel Change, two lanes): Producer: Expand: add new alongside old → Both paths live → Contract: remove old. Consumer: Still on old path → Migrate at own pace → On new path
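The three phases of Parallel Change can be sketched on a single response field. The field names (`name` as the old path, `full_name` as the new one) and the consumer count are hypothetical; the sketch only shows that the producer can ship the expand phase alone and that the contract phase is gated on consumers having migrated.

```python
def render_user(user: dict, phase: str) -> dict:
    """Producer side. phase is "expand" or "contract"."""
    payload = {"full_name": f"{user['first']} {user['last']}"}  # new path
    if phase == "expand":
        # Old field stays alive so unmigrated consumers keep working.
        payload["name"] = payload["full_name"]
    return payload


def old_consumer(payload: dict) -> str:
    return payload["name"]        # still on the old path


def new_consumer(payload: dict) -> str:
    return payload["full_name"]   # migrated at its own pace


def safe_to_contract(consumers_on_old_path: int) -> bool:
    """Only remove the old field once nobody reads it."""
    return consumers_on_old_path == 0


expanded = render_user({"first": "Ada", "last": "Lovelace"}, "expand")
print(old_consumer(expanded), "|", new_consumer(expanded))
```

In practice the count of consumers on the old path would come from contract tests or access logs rather than a hand-maintained number.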

The third mechanism makes the contract executable. Consumer-driven contract testing removes hidden coupling: “any provider behaviour not used by current consumers is free to change without breaking tests.” Pact (the open-source project, with its Pact Broker) records what each consumer actually needs; PactFlow is SmartBear’s commercial hosted broker, and its bi-directional mode is a distinct, weaker check than full consumer-driven testing. The payoff is the deploy gate. Instead of what the Pact docs describe as the old-fashioned bottleneck of “deploying sets of pre-tested applications together,” a can-i-deploy check inspects the contract matrix and answers whether this version is safe to release into a given environment:

pact-broker can-i-deploy --pacticipant Orders --version 1.4.2 --to-environment production

It exits zero to deploy and non-zero to block, so the cross-team coordination becomes an automated check rather than a calendar event. Two adjacent practices reinforce the contract. The Tolerant Reader applies Postel’s Law on the consumer side, “be conservative in what you do, be liberal in what you accept,” so consumers ignore unknown fields instead of breaking on additions. For event-driven systems, a schema registry sets deploy order through compatibility modes; in Confluent’s scheme, BACKWARD (its default) means upgrade consumers first, FORWARD means producers first, with FULL and TRANSITIVE as stricter options, and a field delete is safe only when the field is optional or defaulted. Stripe’s public API is a clean example of additive, date-based evolution: major releases carry breaking changes, while the monthly releases are backward-compatible only and “safe to upgrade … without breaking any existing code” (the current version string rolls forward over time, so treat any specific one as an example). The monorepo-versus-polyrepo question that often rides alongside this has no authoritative answer; it is a genuine trade-off, not a number.
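The Tolerant Reader is easy to show in code. In this sketch the `Order` type and its two fields are hypothetical; the consumer extracts only what it needs and ignores everything else, so an additive change on the producer side does not break it.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: str
    total_cents: int


def parse_order(payload: dict) -> Order:
    """Read only the fields this consumer uses; ignore unknown ones."""
    return Order(id=str(payload["id"]), total_cents=int(payload["total_cents"]))


# The producer later adds "currency" and "loyalty_points"; parsing still works.
order = parse_order(
    {"id": "o-1", "total_cents": 1250, "currency": "EUR", "loyalty_points": 12}
)
print(order)
```

The contrast is a strict reader that rejects unknown keys, which turns every producer addition into a coordinated deploy.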

The Client You Cannot Roll Back

Mobile breaks the symmetry that the previous sections assume. A server deploy is reversible in seconds; a shipped binary is not. Store review sits outside the team’s control, an iPhone cannot be downgraded by the user, and old client versions linger in the wild for weeks to months after a release, so any bug in a binary is live until users update. Apple’s own figure sets the baseline expectation for the gate the team does not own: “on average, 90% of submissions are reviewed in less than 24 hours,” with a multi-day tail for the rest. Even months after a major mobile OS ships, it reaches only about two-thirds of its install base, which is the directional truth behind why old clients do not simply disappear.

The lever is to move as much product decision-making server-side as the binary will allow, so that the slow client release is not on the critical path for everyday changes. The Backends For Frontends pattern is the seam. A BFF is one backend per user experience, owned by the same team that owns the UI, which simplifies “lining up release of both client and server components”; the rule of thumb is “one experience, one BFF.” Because the BFF deploys at server cadence, the team can change behavior daily behind it while the binary updates on the store’s slower clock.

Diagram (two cadences): Mobile binary [client cadence: store review + adoption] → thin requests → BFF (owned by UI team) [server cadence: reversible, daily] → Downstream services; the BFF returns decisions, content, layout to the mobile binary
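A BFF handler that keeps decisions on the server side might look like the sketch below. The function name, the section names, the flag, and the country rule are all hypothetical; the client sends a thin request and renders whatever layout comes back, so changing the rule is a server deploy rather than a store release.

```python
def home_screen(user: dict, flags: dict) -> dict:
    """BFF handler: take thin request context, return decisions and content."""
    sections = ["search", "recent_orders"]
    # The product decision lives here, at server cadence, not in the binary.
    if flags.get("promo_banner", False) and user.get("country") == "DE":
        sections.insert(0, "promo_banner")
    return {
        "layout": sections,
        "copy": {"title": f"Welcome back, {user['name']}"},
    }


print(home_screen({"name": "Ada", "country": "DE"}, {"promo_banner": True}))
```

The binary still has to know how to render each section type, which is the boundary between this and full server-driven UI.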

The aggressive end of this spectrum is server-driven UI, which moves layout and actions, not just data, to server cadence. Airbnb’s Ghost Platform is “a unified, opinionated, server-driven UI system” that orchestrates layout and actions across iOS, Android, and web with backward-compatible sections. The cost is real and worth stating plainly, since this is vendor experience and not a universal recommendation: server-driven UI carries a permanent backward-compatibility obligation plus added complexity in the codebase, performance, and debugging. Spotify built HubFramework, a component-driven UI framework for backend-driven layouts, and later deprecated it. Minimum-version gating and kill switches are common practice for forcing the worst clients to update, but treat forced-update as a product and policy decision, not a guarantee the platform grants. The defensible default is narrower: own the BFF, push decisions server-side, and reach for full server-driven UI only when the variation rate justifies its tax. The mobile case gets its own companion piece on shipping when the client cannot be rolled back.
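Minimum-version gating and a kill switch can be sketched as a server-side policy check. The function names, the return values, and the plain dotted numeric version format are assumptions; whether to actually force an update remains the product decision described above.

```python
def parse_version(version: str) -> tuple:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def client_policy(client_version: str, min_supported: str, killed=frozenset()) -> str:
    """Return "blocked", "update_required", or "ok" for a client version.

    killed: exact versions switched off server-side (e.g. a known-bad build).
    """
    if client_version in killed:
        return "blocked"
    if parse_version(client_version) < parse_version(min_supported):
        return "update_required"
    return "ok"


print(client_policy("1.9.0", "1.10.0"))
```

Comparing parsed tuples matters: as plain strings, "1.9.0" sorts after "1.10.0" and the gate would let the older client through.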

When the Default Holds and When to Override

The through-line is Team Topologies: every wait-state above is a handoff, and the work is to turn handoffs into self-service flow. A platform team turns the security-review queue, the deploy step, and the contract matrix into things stream-aligned teams operate themselves. The original statement of this is Werner Vogels’ “you build it, you run it,” which is foundational lore worth citing as the origin rather than a recent finding. That continuous deployment scales is no longer in question: Google’s monorepo is “used by 95% of [its] software developers,” absorbing roughly 16,000 human and 24,000 automated changes per workday. Meta’s heritage shows the same arc, with the caveat that its two pipelines should not be conflated: its mobile release cadence compressed from four-week cycles toward weekly, while the gradual employees-to-2%-to-100% ramp belongs to the web flow.

So the recommendation stands, with a sequence: measure the lead-time clock first, then attack the biggest wait-state, in the order batch size, then deploy-release decoupling, then the security and cross-team gates. The boundary is the carrying cost. Flags become inventory and need a retirement process. Contracts add dual-version maintenance during the migrate phase. A paved road compresses lead time only when it is genuinely self-service; a half-built platform adds a handoff and can reduce throughput. Server-driven UI carries a permanent backward-compatibility tax. None of these is free, and the override cases are concrete: the mobile asymmetry changes which lever leads, and the boundary check, asking whether the next flag or contract layer is genuinely self-service, decides when to stop adding structure rather than start.

The single next step is the measurement, not the tooling: instrument the clock from commit to production, find the largest wait-state, and aim the first lever there.
