Netflix deploys thousands of times per day. Google pushes code to production continuously. Spotify releases features to millions of users without a single "launch day." None of these companies ship features by flipping a switch from 0% to 100%. They use progressive delivery -- and the mechanism that makes it work is feature flags.
Progressive delivery has moved from a niche practice to an industry expectation. The DORA research program has consistently shown that elite-performing teams invest in deployment practices that reduce risk, and progressive delivery is a natural extension of those principles. Yet most teams adopting progressive delivery focus on the rollout mechanics while ignoring a critical downstream consequence: progressive delivery teams create significantly more feature flags than traditional deployment teams, and those flags have shorter intended lifespans, which means they accumulate faster when cleanup is neglected.
This guide covers the practical implementation of progressive delivery with feature flags -- the rollout strategies, the monitoring requirements, the decision frameworks for promoting versus rolling back -- and the lifecycle implications that most guides overlook.
What progressive delivery actually means
Progressive delivery is a deployment methodology where new features are released incrementally to expanding subsets of users, with monitoring and decision gates at each stage. It builds on continuous delivery by adding controlled exposure as a layer between deployment and release.
The distinction matters: deployment is putting code in production; release is exposing that code to users. Progressive delivery separates these two events, using feature flags as the control mechanism.
| Concept | Traditional Deployment | Progressive Delivery |
|---|---|---|
| Release scope | All users simultaneously | Subsets of users, expanding over time |
| Rollback mechanism | Redeploy previous version | Toggle flag off (seconds, not minutes) |
| Risk exposure | 100% of users from minute one | 1-5% initially, expanding with confidence |
| Monitoring | Post-deployment observation | Active decision gates between stages |
| Time to full release | Instant (or never, if it breaks) | Hours to weeks, depending on risk |
| Feature flags required | Optional | Essential |
Progressive delivery is not a single technique. It encompasses several rollout strategies, each suited to different risk profiles and use cases.
The four progressive delivery strategies
1. Canary releases
A canary release exposes a new feature to a small, representative subset of users before broader rollout. The name comes from the coal mining practice of carrying a canary into the mine to detect toxic gases -- if the canary stayed healthy, the air was safe; if it showed distress, the miners got out.
How it works with feature flags:
// LaunchDarkly canary release
const showNewCheckout = ldClient.variation(
  'release_new_checkout_flow',
  user,
  false // default: old checkout
);

if (showNewCheckout) {
  return <NewCheckoutFlow cart={cart} />;
}
return <LegacyCheckoutFlow cart={cart} />;
In the flag management platform, you configure the flag to target a small percentage of traffic:
Stage 1: 2% of users (canary population)
→ Monitor for 24-48 hours
→ Check: error rates, latency, conversion
→ Decision: advance or rollback
When to use canary releases:
- High-risk changes to critical user flows (checkout, authentication, payments)
- Infrastructure changes that affect performance (new database, new API version)
- Features where incorrect behavior has immediate business impact
Canary population sizing:
| Risk Level | Initial Canary % | Monitoring Duration | Escalation Step |
|---|---|---|---|
| Low (UI cosmetic) | 5-10% | 4-8 hours | Jump to 50% |
| Medium (new feature) | 2-5% | 24-48 hours | Increment by 10% |
| High (payment flow) | 0.5-1% | 48-72 hours | Increment by 5% |
| Critical (auth change) | Internal only first | 72+ hours | 1% external canary |
2. Percentage rollouts
Percentage rollouts gradually increase the proportion of users who see the new feature. Unlike canary releases, which focus on a small initial group, percentage rollouts define the full progression from 0% to 100%.
Implementation with Unleash:
from UnleashClient import UnleashClient

client = UnleashClient(
    url="https://unleash.example.com/api",
    app_name="checkout-service",
    instance_id="checkout-1"
)
client.initialize_client()

# Unleash handles percentage rollout via gradualRolloutUserId strategy
if client.is_enabled("new-search-algorithm", {"userId": user.id}):
    return execute_new_search(query)
return execute_legacy_search(query)
Typical percentage rollout schedule:
Day 0: 1% rollout → Smoke test, basic health metrics
Day 1: 5% rollout → Error rate comparison, p99 latency
Day 3: 10% rollout → Business metrics baseline
Day 5: 25% rollout → Statistical significance for A/B metrics
Day 7: 50% rollout → Load testing at scale
Day 10: 75% rollout → Final edge case monitoring
Day 14: 100% rollout → Full release, cleanup clock starts
The critical detail: each percentage increase is a decision, not a schedule. Advancing from 10% to 25% should happen because monitoring data at 10% confirms stability, not because the calendar says it is day 5. Teams that treat rollout schedules as fixed timelines defeat the purpose of progressive delivery.
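One way to make each promotion an explicit decision is to encode the gate criteria next to each stage, so a stage advances only when its gate passes. A minimal TypeScript sketch -- the types, field names, and thresholds here are illustrative assumptions, not part of any flag provider's SDK:

// Hypothetical rollout plan: each stage carries the gate it must pass before promotion.
interface StageGate {
  maxErrorRateDeltaPct: number;   // allowed increase over baseline, in percentage points
  maxP99LatencyDeltaMs: number;   // allowed p99 latency increase over baseline
  minObservationHours: number;    // minimum time at this stage before deciding
}

interface RolloutStage {
  percentage: number;
  gate: StageGate;
}

const rolloutPlan: RolloutStage[] = [
  { percentage: 1,   gate: { maxErrorRateDeltaPct: 0.5, maxP99LatencyDeltaMs: 200, minObservationHours: 24 } },
  { percentage: 5,   gate: { maxErrorRateDeltaPct: 0.5, maxP99LatencyDeltaMs: 200, minObservationHours: 24 } },
  { percentage: 25,  gate: { maxErrorRateDeltaPct: 0.5, maxP99LatencyDeltaMs: 200, minObservationHours: 48 } },
  { percentage: 100, gate: { maxErrorRateDeltaPct: 0.5, maxP99LatencyDeltaMs: 200, minObservationHours: 48 } },
];

// Promote to the next stage only when observed deltas satisfy the current gate.
function canAdvance(
  observed: { errorRateDeltaPct: number; p99LatencyDeltaMs: number; hoursObserved: number },
  gate: StageGate
): boolean {
  return (
    observed.errorRateDeltaPct <= gate.maxErrorRateDeltaPct &&
    observed.p99LatencyDeltaMs <= gate.maxP99LatencyDeltaMs &&
    observed.hoursObserved >= gate.minObservationHours
  );
}

With this framing, the calendar in the schedule above is an expectation, not a trigger: if the gate does not pass on day 5, the stage simply holds.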
3. Ring deployments
Ring deployments organize users into concentric rings, with each ring representing a broader audience. Features move outward through the rings as confidence increases.
Ring structure:
| Ring | Audience | Size | Purpose |
|---|---|---|---|
| Ring 0 | Development team | 10-50 users | Dogfooding, functional verification |
| Ring 1 | Internal company | 100-1,000 users | Broader internal validation |
| Ring 2 | Beta users / power users | 1,000-10,000 users | Real-world usage patterns |
| Ring 3 | 10% of production | 10% of total users | Production-grade monitoring |
| Ring 4 | Full production | All users | General availability |
Implementation with Split.io:
// Split.io ring deployment using attributes
factory, err := client.NewSplitFactory("API_KEY", cfg)
if err != nil {
    // fall back to the legacy experience if the SDK cannot initialize
    return renderLegacyDashboard(user)
}
splitClient := factory.Client()

treatment := splitClient.Treatment(
    user.ID,
    "new_dashboard_layout",
    map[string]interface{}{
        "ring":        getUserRing(user),
        "employee":    user.IsEmployee,
        "beta_tester": user.IsBetaTester,
    },
)

switch treatment {
case "on":
    return renderNewDashboard(user)
case "off":
    return renderLegacyDashboard(user)
default:
    return renderLegacyDashboard(user) // control behavior
}
Ring deployments are especially effective for B2B SaaS products where you can define rings by customer tier, contract type, or relationship. Your most strategic accounts stay in later rings, protected from early issues.
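The getUserRing helper referenced in the example above is application code, not part of the Split SDK. A rough sketch of that mapping in TypeScript -- the attribute names are assumptions, and rings 3 and 4 are typically handled by the flag's own percentage rules rather than by this function:

// Hypothetical ring assignment: map a user to the innermost ring they qualify for.
type Ring = 0 | 1 | 2 | 3;

interface RingUser {
  isDeveloper: boolean;
  isEmployee: boolean;
  isBetaTester: boolean;
}

function getUserRing(user: RingUser): Ring {
  if (user.isDeveloper) return 0;   // Ring 0: development team
  if (user.isEmployee) return 1;    // Ring 1: internal company
  if (user.isBetaTester) return 2;  // Ring 2: beta users / power users
  return 3;                         // Rings 3-4: general production population
}

The ring value is passed as a targeting attribute, and the flag's targeting rules decide which rings receive the new treatment.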
4. Feature gates with targeting rules
Feature gates use attribute-based targeting to control which users see a feature. Rather than random percentage-based selection, features are exposed to users matching specific criteria.
Common targeting dimensions:
| Dimension | Example | Use Case |
|---|---|---|
| Geography | Country, region, timezone | Regulatory compliance, regional features |
| Account tier | Free, Pro, Enterprise | Tiered feature access |
| User role | Admin, editor, viewer | Role-specific capabilities |
| Platform | iOS, Android, web | Platform-specific rollouts |
| Cohort | Signup date, usage level | Behavioral targeting |
| Organization | Specific company IDs | Customer-specific enablement |
// LaunchDarkly multi-rule targeting
const context = {
  kind: 'multi',
  user: {
    key: user.id,
    country: user.country,
    plan: user.plan,
  },
  organization: {
    key: org.id,
    tier: org.tier,
    employeeCount: org.size,
  },
};

const showAdvancedAnalytics = ldClient.variation(
  'release_advanced_analytics_v2',
  context,
  false
);
Progressive delivery rarely uses a single strategy in isolation. A typical rollout might start with ring deployment (internal team first), then switch to percentage rollout (1% to 100%), with feature gates layered on top for geographic or regulatory constraints.
Monitoring and observability during rollouts
Progressive delivery without monitoring is just slow deployment. The entire value proposition depends on observing behavior at each stage and making data-driven decisions about whether to advance, hold, or roll back.
The three monitoring pillars
1. Technical health metrics
These are the non-negotiable metrics that every rollout stage must track:
| Metric | Measurement | Rollback Trigger |
|---|---|---|
| Error rate | Percentage of requests returning 5xx | > 0.5% increase over baseline |
| Latency (p50) | Median response time | > 50ms increase |
| Latency (p99) | Tail response time | > 200ms increase |
| CPU utilization | Average across instances | > 80% sustained |
| Memory utilization | Average across instances | > 85% sustained |
| Downstream error rate | Errors in dependent services | Any increase correlated with rollout |
2. Business metrics
Technical health is necessary but not sufficient. A feature can be technically flawless and still damage business outcomes:
| Metric | Measurement | Rollback Trigger |
|---|---|---|
| Conversion rate | Percentage completing target action | > 2% decrease |
| Revenue per session | Average revenue per user session | Any statistically significant decrease |
| User engagement | Session duration, pages per session | > 10% decrease |
| Support ticket volume | New tickets per hour | > 25% increase |
| Funnel completion | Step-by-step completion rate | Drop at any step > 5% |
3. User experience metrics
| Metric | Measurement | Rollback Trigger |
|---|---|---|
| Core Web Vitals (LCP) | Largest Contentful Paint | > 2.5s |
| Core Web Vitals (CLS) | Cumulative Layout Shift | > 0.1 |
| Client-side errors | JavaScript exceptions per session | > 2x baseline |
| Rage clicks | Repeated clicks on same element | > 3x baseline |
Building a rollout decision framework
Ad hoc decisions during rollouts lead to inconsistency. One team rolls back on a 0.1% error increase; another pushes to 100% despite a 2% latency regression. A decision framework eliminates this variance.
The promote/hold/rollback matrix:
| Signal | Promote | Hold | Rollback |
|---|---|---|---|
| Error rate | At or below baseline | Up to 0.5% above baseline | > 0.5% above baseline |
| Latency p99 | At or below baseline | Up to 200ms above baseline | > 200ms above baseline |
| Business metrics | Neutral or positive | Inconclusive (need more data) | Statistically significant negative |
| User complaints | None | Isolated reports (< 3) | Pattern of related complaints |
| Downstream health | All services healthy | Minor warnings, no impact | Any downstream degradation |
Decision rules:
- Promote: All signals are in the "Promote" column, or at worst "Hold" with a clear explanation
- Hold: Any signal is in the "Hold" column. Extend monitoring at current percentage for 24-48 hours
- Rollback: Any single signal is in the "Rollback" column. Do not wait for multiple signals
The rollback threshold should be unambiguous and pre-agreed. When in doubt, roll back. The cost of a delayed rollout is almost always lower than the cost of a degraded experience for users in the treatment group.
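Codifying the matrix keeps those calls consistent across teams and rollouts. A minimal TypeScript sketch -- the signal names and types are illustrative assumptions, with thresholds mirroring the matrix above:

// Hypothetical decision helper implementing the promote/hold/rollback matrix.
type Decision = 'promote' | 'hold' | 'rollback';

interface RolloutSignals {
  errorRateDeltaPct: number;        // percentage points above baseline
  p99LatencyDeltaMs: number;        // milliseconds above baseline
  businessMetricsNegative: boolean; // statistically significant negative result
  relatedComplaints: number;        // complaints plausibly tied to the rollout
  downstreamDegraded: boolean;      // any degradation in dependent services
}

function decide(s: RolloutSignals): Decision {
  // Rollback: any single signal in the rollback column is enough.
  if (
    s.errorRateDeltaPct > 0.5 ||
    s.p99LatencyDeltaMs > 200 ||
    s.businessMetricsNegative ||
    s.relatedComplaints >= 3 ||
    s.downstreamDegraded
  ) {
    return 'rollback';
  }
  // Hold: anything elevated but still below the rollback thresholds.
  if (s.errorRateDeltaPct > 0 || s.p99LatencyDeltaMs > 0 || s.relatedComplaints > 0) {
    return 'hold';
  }
  return 'promote';
}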
Automated rollback triggers
Manual monitoring does not scale. For critical rollouts, connect your monitoring system to your flag management platform for automated rollback:
# Datadog monitor → LaunchDarkly trigger example
# Monitor: error rate for new-checkout-flow users
monitors:
  - name: "New Checkout Error Rate"
    type: metric alert
    query: |
      sum(last_5m):sum:checkout.errors{feature_flag:new_checkout_flow}.as_count()
        / sum:checkout.requests{feature_flag:new_checkout_flow}.as_count() > 0.05
    thresholds:
      critical: 0.05  # 5% error rate
      warning: 0.02   # 2% error rate
    notify:
      - "@webhook-launchdarkly-rollback"  # triggers flag disable
      - "@slack-checkout-team"
      - "@pagerduty-checkout-oncall"
This creates a safety net: if the feature causes a 5% error rate spike, the flag is automatically disabled -- in seconds, not the minutes it would take for a human to notice, diagnose, and act.
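When the flag platform supports triggers, the monitoring webhook can call the trigger URL directly and no glue code is needed. If you want extra logic in between -- say, only rolling back on critical alerts and recording an audit entry -- a small relay service can sit in the middle. A rough TypeScript sketch; the payload fields, endpoint, and trigger URL are assumptions to adapt to your Datadog webhook template and flag platform:

// Hypothetical relay: receive a Datadog webhook, fire the flag platform's rollback trigger.
import express from 'express';

const app = express();
app.use(express.json());

// Unique trigger URL created for the flag (treat it as a secret).
const FLAG_TRIGGER_URL = process.env.FLAG_TRIGGER_URL!;

app.post('/alerts/datadog', async (req, res) => {
  const alert = req.body; // field names depend on your Datadog webhook payload template
  // Only act on critical, newly triggered alerts for the rollout monitor.
  if (alert.transition === 'Triggered' && alert.priority === 'P1') {
    await fetch(FLAG_TRIGGER_URL, { method: 'POST' }); // executes the configured action, e.g. targeting off
    console.log(`Rollback trigger fired by monitor ${alert.monitorId}`);
  }
  res.sendStatus(202);
});

app.listen(8080);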
The progressive delivery flag lifecycle
Here is the part most progressive delivery guides omit: what happens to all those flags after rollout completes?
Progressive delivery teams create a high volume of short-lived flags. Each feature rollout introduces at least one flag, often more. A team shipping 3-4 features per sprint generates 3-4 new rollout flags every two weeks. Over a quarter, that is 18-24 flags. Over a year, that is 72-96 flags -- per team.
Why progressive delivery flags are different
| Characteristic | Traditional Release Flag | Progressive Delivery Flag |
|---|---|---|
| Expected lifespan | 2-4 weeks | 1-3 weeks (rollout) + cleanup time |
| Complexity | Simple boolean | Percentage rules, targeting, segments |
| Monitoring integration | Optional | Required (decision gates) |
| Rollback frequency | Rare | Regular (5-10% of rollouts) |
| Volume created per quarter | 5-10 per team | 18-24 per team |
| Cleanup urgency | Moderate | High (targeting rules add complexity) |
The targeting rules and percentage configurations that make progressive delivery powerful also make stale flags more dangerous. A progressive delivery flag is not a simple if/else. It carries targeting rules, segment definitions, percentage allocations, and monitoring hooks. When this flag becomes stale, it is not just dead code -- it is dead code wrapped in configuration complexity that no one understands six months later.
The cleanup timeline for progressive delivery flags
Day 0: Flag created, rollout begins
Day 1-14: Progressive rollout (1% → 100%)
Day 14: Full rollout achieved
Day 14-28: Stabilization monitoring (2-week window)
Day 28: CLEANUP DEADLINE
Day 28-31: Flag removal PR created, reviewed, merged
Day 31: Flag fully removed from codebase
Total lifecycle: ~1 month
Any progressive delivery flag older than 45 days is overdue for cleanup. The entire methodology is predicated on short-lived, focused rollouts. A flag that lingers for months contradicts the philosophy and adds the exact kind of complexity that progressive delivery was designed to control.
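That 45-day budget is easy to enforce with a scheduled check that compares each flag's creation date against today. A minimal sketch, assuming you can export flag metadata (key, creation date, archived status) from your flag platform's API:

// Hypothetical staleness check against a 45-day progressive delivery budget.
interface FlagRecord {
  key: string;
  createdAt: Date;
  archived: boolean;
}

const MAX_AGE_DAYS = 45;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function overdueFlags(flags: FlagRecord[], now: Date = new Date()): FlagRecord[] {
  return flags.filter(
    (f) => !f.archived && (now.getTime() - f.createdAt.getTime()) / MS_PER_DAY > MAX_AGE_DAYS
  );
}

Feeding the result into a Slack reminder or a cleanup-ticket generator turns the deadline from a convention into a mechanism.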
The accumulation problem
Progressive delivery amplifies the flag accumulation problem because of the volume of flags created. Consider a typical engineering organization with 5 product teams, each shipping 3 features per sprint:
| Time Period | Flags Created | Flags Cleaned (80% cleanup rate) | Net Flag Growth |
|---|---|---|---|
| Month 1 | 30 | 24 | +6 |
| Month 3 | 90 | 72 | +18 |
| Month 6 | 180 | 144 | +36 |
| Month 12 | 360 | 288 | +72 |
Even with an 80% cleanup rate -- which is better than what most teams achieve -- the organization accumulates 72 stale flags per year. At 70% cleanup rate, that number jumps to 108. If cleanup drops below 50%, the accumulation quickly becomes unmanageable.
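The arithmetic behind the table is worth sanity-checking against your own numbers: net stale-flag growth is simply flags created times the fraction that never gets cleaned up. A one-function sketch:

// Net stale-flag growth over a period: created flags that were never cleaned up.
function netFlagGrowth(flagsCreatedPerMonth: number, cleanupRate: number, months: number): number {
  return Math.round(flagsCreatedPerMonth * months * (1 - cleanupRate));
}

netFlagGrowth(30, 0.8, 12); // 72
netFlagGrowth(30, 0.7, 12); // 108
netFlagGrowth(30, 0.5, 12); // 180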
Progressive delivery demands a higher cleanup rate than traditional deployment because the flag creation rate is higher. Teams practicing progressive delivery need automated cleanup tooling not as a nice-to-have but as a prerequisite for the methodology to be sustainable.
Implementing progressive delivery with major providers
LaunchDarkly
LaunchDarkly's percentage rollout is configured through targeting rules:
// LaunchDarkly percentage rollout with fallthrough
// Configure in dashboard or via API:
// - Rule 1: Internal users → serve "on"
// - Fallthrough: 10% → "on", 90% → "off"
import * as ld from '@launchdarkly/node-server-sdk';

const client = ld.init('sdk-key');
await client.waitForInitialization();

// With kind-based contexts, custom attributes sit at the top level --
// the legacy `custom` wrapper applies only to the old user format.
const context: ld.LDContext = {
  kind: 'user',
  key: user.id,
  email: user.email,
  plan: user.plan,
  region: user.region,
};

const useNewPricing = await client.variation(
  'release_new_pricing_engine',
  context,
  false
);
LaunchDarkly-specific progressive delivery features:
- Experimentation: Built-in A/B testing with statistical significance calculations
- Approval workflows: Require approvals for production flag changes
- Scheduled flag changes: Pre-schedule percentage increases
- Flag triggers: External systems can modify flag state (useful for automated rollback)
Unleash
Unleash uses activation strategies for progressive delivery:
// Unleash gradual rollout strategy
import io.getunleash.Unleash;
import io.getunleash.DefaultUnleash;
import io.getunleash.UnleashContext;
import io.getunleash.util.UnleashConfig;

UnleashConfig config = UnleashConfig.builder()
    .appName("payment-service")
    .instanceId("payment-1")
    .unleashAPI("https://unleash.example.com/api")
    .build();

Unleash unleash = new DefaultUnleash(config);

// Unleash handles percentage via gradualRolloutUserId strategy
// Configure in Unleash dashboard:
//   Strategy: gradualRolloutUserId
//   Percentage: 25
//   GroupId: "new-payment-flow"
UnleashContext context = UnleashContext.builder()
    .userId(user.getId())
    .sessionId(session.getId())
    .addProperty("region", user.getRegion())
    .build();

if (unleash.isEnabled("new-payment-flow", context)) {
    return processPaymentV2(order);
}
return processPaymentV1(order);
Unleash-specific progressive delivery features:
- Strategy constraints: Combine percentage rollout with attribute-based targeting
- Variants: Multi-variant flags for A/B/C testing
- Custom strategies: Define organization-specific rollout logic
- Environment-based configuration: Different rollout percentages per environment
Split.io
Split.io is purpose-built for progressive delivery with built-in experimentation:
from splitio import get_factory

factory = get_factory('API_KEY')
factory.block_until_ready(5)  # wait for the SDK to be ready before evaluating
client = factory.client()

# Split.io treatments support multi-variant progressive delivery
treatment = client.get_treatment(
    user.id,
    'new_recommendation_engine',
    {
        'plan': user.plan,
        'country': user.country,
        'signup_date': user.created_at.isoformat(),
    }
)

if treatment == 'v2_algorithm':
    recommendations = get_recommendations_v2(user)
elif treatment == 'v2_algorithm_with_personalization':
    recommendations = get_recommendations_v2_personalized(user)
else:  # 'control' or 'off'
    recommendations = get_recommendations_v1(user)

# Split.io track call for experimentation metrics
client.track(user.id, 'user', 'recommendation_click', recommendation.id)
Split.io-specific progressive delivery features:
- Kill switch: Instant rollback built into every split
- Traffic allocation: Control what percentage of traffic participates in the split
- Multi-treatment: Beyond boolean on/off -- test multiple variants simultaneously
- Impression listeners: Real-time tracking of which treatment each user received
Cross-cutting concerns
Sticky assignment
Users must receive consistent treatment throughout a rollout. A user who sees the new checkout flow on Monday must see it on Tuesday. Random reassignment on every request creates a disorienting experience and invalidates any metric analysis.
All major flag providers handle this through deterministic hashing of the user identifier and the flag key. The hash produces a consistent bucket assignment, so the same user always gets the same treatment for a given flag:
hash(userId + flagKey) % 100 → bucket number
If bucket < rollout percentage → treatment "on"
Else → treatment "off"
This means increasing the rollout percentage from 10% to 25% does not reassign existing users. The users in buckets 0-9 stay on the new treatment; users in buckets 10-24 join them.
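A toy version of that bucketing makes the stickiness concrete. This is illustrative only -- production SDKs use well-distributed hashes, often salted per flag, rather than this simplistic one:

// Deterministic bucketing: the same user and flag always land in the same bucket.
function bucketFor(userId: string, flagKey: string): number {
  const input = `${userId}:${flagKey}`;
  let hash = 0;
  for (const ch of input) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return hash % 100; // bucket in [0, 100)
}

function isEnabled(userId: string, flagKey: string, rolloutPercent: number): boolean {
  return bucketFor(userId, flagKey) < rolloutPercent;
}

// Raising the rollout from 10% to 25% adds buckets 10-24; users in buckets 0-9 keep the treatment.
isEnabled('user-123', 'release_new_checkout', 10);
isEnabled('user-123', 'release_new_checkout', 25);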
Multi-service rollouts
Modern architectures rarely have features contained in a single service. A new checkout flow might span the frontend, the cart service, the payment service, and the notification service. Progressive delivery across multiple services requires coordination:
Option 1: Shared flag, independent evaluation
Each service evaluates the same flag key independently. Because assignment is deterministic by user ID, the same user gets consistent treatment across services.
User ID: "user-123"
Flag: "release_new_checkout"
→ Frontend evaluates: ON (hash bucket 7, rollout at 10%)
→ Cart service evaluates: ON (same hash, same bucket)
→ Payment service evaluates: ON (same hash, same bucket)
Option 2: Flag context propagation
The entry-point service evaluates the flag and propagates the decision via request headers or context:
Frontend evaluates flag → ON
→ Sets header: X-Feature-New-Checkout: true
→ Cart service reads header: new checkout path
→ Payment service reads header: new payment path
Option 1 is simpler but requires all services to have access to the flag management SDK. Option 2 centralizes the decision but requires header propagation infrastructure.
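A sketch of Option 2 in TypeScript -- the header name, endpoint, and handler functions are hypothetical; the point is that only the entry-point service evaluates the flag, and downstream services trust the propagated decision:

// Hypothetical flag-decision propagation via a request header.
import * as ld from '@launchdarkly/node-server-sdk';

declare function processCheckoutV1(order: object): unknown;
declare function processCheckoutV2(order: object): unknown;

const ldClient = ld.init('sdk-key');
const FLAG_HEADER = 'x-feature-new-checkout';

// Entry-point service: evaluate the flag once, attach the decision to outgoing calls.
async function callCartService(userId: string, cart: object) {
  const enabled = await ldClient.variation(
    'release_new_checkout',
    { kind: 'user', key: userId },
    false
  );
  return fetch('https://cart.internal/checkout', {
    method: 'POST',
    headers: { [FLAG_HEADER]: String(enabled), 'content-type': 'application/json' },
    body: JSON.stringify(cart),
  });
}

// Downstream service: read the header instead of re-evaluating the flag.
function handleCheckout(headers: Record<string, string | undefined>, order: object) {
  return headers[FLAG_HEADER] === 'true' ? processCheckoutV2(order) : processCheckoutV1(order);
}

Propagation also guarantees that a single request never mixes old and new code paths across services, even if the rollout percentage changes mid-request.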
Database migrations and progressive delivery
Progressive delivery works well for stateless features but requires careful handling when database schema changes are involved. You cannot progressively roll out a database migration to 10% of users.
Pattern: Expand-Contract with feature flags
Phase 1 (Expand): Deploy new schema alongside old
→ Flag OFF: Read/write old schema
→ Flag ON: Write to both, read from new
Phase 2 (Progressive rollout): Gradually shift reads
→ 10% of users read from new schema
→ Monitor data consistency
→ Increase to 100%
Phase 3 (Contract): Remove old schema
→ Flag cleanup: remove the migration flag
→ Drop old columns/tables
This pattern ensures data integrity while still enabling progressive rollout. The feature flag controls which schema is authoritative for reads, while writes go to both schemas during the transition.
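In application code, the expand and progressive-rollout phases look roughly like this. A hedged TypeScript sketch -- the repository objects and the flag key are assumptions:

// Hypothetical expand-contract data access guarded by a migration flag.
import * as ld from '@launchdarkly/node-server-sdk';

declare const legacyOrdersRepo: { save(o: object): Promise<void>; load(id: string): Promise<object> };
declare const newOrdersRepo: { save(o: object): Promise<void>; load(id: string): Promise<object> };

const ldClient = ld.init('sdk-key');

// Expand phase: every write goes to both schemas so either one can serve reads.
async function saveOrder(order: object): Promise<void> {
  await legacyOrdersRepo.save(order);
  await newOrdersRepo.save(order); // dual-write keeps the new schema complete during the transition
}

// Progressive rollout phase: the flag's percentage decides which schema serves each user's reads.
async function loadOrder(userId: string, orderId: string): Promise<object> {
  const readFromNew = await ldClient.variation(
    'migrate_orders_schema',
    { kind: 'user', key: userId },
    false
  );
  return readFromNew ? newOrdersRepo.load(orderId) : legacyOrdersRepo.load(orderId);
}

Once the flag reaches 100% and the stabilization window passes, the contract phase removes both the dual-write and the flag itself.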
Building the progressive delivery practice
Start small: The first rollout
For teams new to progressive delivery, start with a low-risk feature and a simple rollout plan:
- Choose a non-critical feature -- a UI change, a new informational widget, a cosmetic update
- Define three stages: 10% for 48 hours, 50% for 48 hours, 100%
- Define one rollback trigger: Error rate increase > 1%
- Assign one person to monitor and make promote/hold/rollback decisions
- Set a cleanup date for 14 days after reaching 100%
This minimal approach teaches the mechanics without the pressure of a critical feature rollout.
Scaling: The progressive delivery checklist
As the practice matures, standardize with a checklist for every rollout:
Pre-rollout:
- Flag created with clear naming convention
- Rollout plan documented (stages, percentages, durations)
- Monitoring dashboards configured for treatment vs. control comparison
- Rollback triggers defined with specific thresholds
- Sticky assignment verified (same user, same treatment)
- Multi-service coordination confirmed (if applicable)
- Cleanup ticket created with date 14 days after expected 100% rollout
During rollout:
- Each stage monitored for defined duration before advancing
- Promote/hold/rollback decision documented at each stage
- Anomalies investigated before advancing (even if within thresholds)
- Stakeholders notified at 50% and 100% milestones
Post-rollout:
- Stabilization monitoring for 14 days at 100%
- Business metrics confirmed stable
- Flag cleanup PR created or generated
- Cleanup PR reviewed and merged within 7 days
- Flag archived in management platform
- Monitoring dashboards cleaned up
The cleanup imperative
Progressive delivery and flag cleanup are two sides of the same coin. You cannot practice progressive delivery sustainably without a disciplined cleanup process. The math is unforgiving: if your team creates 20 progressive delivery flags per quarter and cleans up 15, you accumulate 20 stale flags per year. In three years, that is 60 stale flags -- each carrying the targeting rules, percentage configurations, and monitoring hooks from rollouts that ended long ago.
Automated cleanup tooling is not optional for progressive delivery teams. Tools like FlagShark detect when progressive delivery flags have completed their rollout and stabilization phases, then generate cleanup PRs that safely remove the flag code using tree-sitter-based AST analysis. This closes the lifecycle loop automatically: the flag is created for the rollout, it serves its purpose during progressive delivery, and it is removed when that purpose is fulfilled.
Metrics that matter
Track these metrics to evaluate the health of your progressive delivery practice:
| Metric | Healthy | Warning | Action Required |
|---|---|---|---|
| Average rollout duration (0% to 100%) | 5-14 days | 14-30 days | > 30 days |
| Rollback rate | 5-10% | 10-20% | > 20% |
| Time from 100% to flag cleanup | < 30 days | 30-60 days | > 60 days |
| Active progressive delivery flags | < 10 per team | 10-20 per team | > 20 per team |
| Flag cleanup rate | > 90% | 70-90% | < 70% |
| Incidents caused by stale rollout flags | 0 per quarter | 1 per quarter | > 1 per quarter |
Progressive delivery is the most effective methodology for reducing deployment risk while maintaining development velocity. Feature flags are the mechanism that makes it possible. But the methodology only works when the flags that enable it are treated as temporary infrastructure with a defined lifecycle. Create the flag, execute the rollout, monitor for stability, clean up the flag, repeat. Teams that nail this cycle ship faster, break less, and maintain the codebase clarity that allows them to keep shipping faster. Teams that skip the cleanup step eventually drown in the very flags that were supposed to make them faster.