Your team has a thorough CI/CD pipeline. Every pull request is linted, tested, type-checked, and scanned for security vulnerabilities before it can merge. But feature flags? They enter the codebase unchecked and pile up without cleanup, completely invisible to the pipeline that guards everything else.
This is the blind spot in modern CI/CD. Organizations invest heavily in automated quality gates for code style, test coverage, dependency vulnerabilities, and even accessibility compliance. Yet one of the most persistent sources of long-lived technical debt -- abandoned feature flags -- passes through every gate without triggering a single warning.
Based on what we have seen, most engineering teams accumulate a steady stream of stale flags every quarter. Over a year, that can easily mean dozens of abandoned flags adding conditional complexity, inflating test matrices, and slowing down every developer who encounters them. The fix is not more discipline or more spreadsheets. The fix is treating flag hygiene the same way you treat code quality: as an automated, enforceable part of your CI/CD pipeline.
Why flags need CI/CD integration
Feature flag management platforms like LaunchDarkly, Split, and Unleash excel at controlling flag evaluation at runtime. They tell you which flags exist, who they target, and whether they are on or off. What they cannot tell you is where a flag lives in your code, how deeply it is embedded, or whether removing it is safe.
That gap between the management platform and the codebase is where technical debt breeds. A flag can be archived in LaunchDarkly while still controlling critical code paths in production. A flag can be 100% enabled for six months with no one realizing it should have been hardcoded and cleaned up weeks after rollout.
CI/CD is the natural enforcement point for flag hygiene because it already sits at the intersection of code changes and deployment decisions. Every relevant event -- flag creation, flag aging, flag removal -- manifests as a code change that passes through your pipeline.
The flag lifecycle mapped to CI/CD events
| Flag Lifecycle Stage | CI/CD Event | Automated Action |
|---|---|---|
| Flag created | New flag reference detected in PR | Log flag creation, start age tracking |
| Flag aging | Time passes, flag remains in code | Warn at 30/60/90 days in PR checks |
| Flag fully rolled out | Flag at 100% for N days | Generate cleanup PR automatically |
| Flag being removed | Removal PR opened | Validate complete removal, run tests |
| Flag removed | Removal PR merged | Confirm no remaining references |
| Flag re-introduced | Old flag key reappears | Block merge, require justification |
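The last row of the table is easy to overlook, so here is a minimal sketch of that check. It assumes the pipeline keeps its own record of flag keys that were already removed; the `removed_keys` set and the function name are hypothetical and not part of any flag SDK.

```python
def find_reintroduced_flags(keys_added_in_pr, removed_keys):
    """Return flag keys that this PR adds even though they were removed earlier."""
    return sorted(set(keys_added_in_pr) & set(removed_keys))


# Hypothetical data: keys found on added lines, and keys retired by earlier cleanup PRs.
added = ["new-checkout", "legacy-search", "dark-mode"]
retired = {"legacy-search", "old-onboarding"}

reintroduced = find_reintroduced_flags(added, retired)
if reintroduced:
    print("Blocked: re-introduced flag keys:", ", ".join(reintroduced))
```

Where the record of retired keys lives (a file in the repository, the flag platform, a database) is a design choice this sketch leaves open.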
Building the flag-aware CI pipeline
Integrating flag cleanup into CI/CD involves four layers, each building on the previous one. Teams can adopt these incrementally, starting with detection and progressing to full enforcement.
Layer 1: Flag detection on every PR
The foundation is knowing when flags enter or leave your codebase. A PR check that scans for flag SDK method calls gives you visibility into every flag change.
GitHub Actions example -- flag detection check:
name: Flag Detection
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  detect-flags:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the base branch is available to diff against
      - name: Detect flag changes in PR
        run: |
          BASE="origin/${{ github.base_ref }}"
          # Patterns for common flag SDK methods
          FLAG_PATTERNS=(
            "variation\("
            "BoolVariation\("
            "StringVariation\("
            "isEnabled\("
            "is_enabled\("
            "useFeatureFlag\("
            "getFlag\("
          )
          NEW_FLAGS=()
          REMOVED_FLAGS=()
          # Read changed files line by line so paths with spaces survive.
          # Deleted files are included: their flag references count as removals.
          while IFS= read -r file; do
            DIFF=$(git diff "$BASE"...HEAD -- "$file")
            for pattern in "${FLAG_PATTERNS[@]}"; do
              # New flag references: added lines, minus the "+++" file header
              ADDED=$(printf '%s\n' "$DIFF" | grep '^+' | grep -v '^+++' \
                | grep -E "$pattern" || true)
              if [ -n "$ADDED" ]; then
                NEW_FLAGS+=("$file: $ADDED")
              fi
              # Removed flag references: deleted lines, minus the "---" file header
              REMOVED=$(printf '%s\n' "$DIFF" | grep '^-' | grep -v '^---' \
                | grep -E "$pattern" || true)
              if [ -n "$REMOVED" ]; then
                REMOVED_FLAGS+=("$file: $REMOVED")
              fi
            done
          done < <(git diff --name-only "$BASE"...HEAD)
          # Report findings
          if [ ${#NEW_FLAGS[@]} -gt 0 ]; then
            echo "::notice::New flag references detected in this PR"
            printf '%s\n' "${NEW_FLAGS[@]}"
          fi
          if [ ${#REMOVED_FLAGS[@]} -gt 0 ]; then
            echo "::notice::Flag references removed in this PR"
            printf '%s\n' "${REMOVED_FLAGS[@]}"
          fi
This basic approach uses regex matching and works as a starting point. For production use, AST-based parsing (using tools like tree-sitter) provides significantly more accurate detection. Regex will miss dynamic flag key construction and generate false positives on comments and strings that happen to match the pattern.
What this gives you: Visibility. Every PR that adds or removes a flag reference is annotated, creating a searchable history of flag changes in your repository.
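To make the difference between regex and syntax-tree detection concrete, here is a small sketch. It uses Python's built-in `ast` module rather than tree-sitter, so it only understands Python source, and the `FLAG_METHODS` names are assumptions that loosely mirror the regex list above. Unlike grep, it ignores comments and string contents, and it marks calls whose flag key is built dynamically.

```python
import ast

# Method names treated as flag evaluations (an assumption; adjust to your SDK).
FLAG_METHODS = {"variation", "is_enabled", "get_flag"}


def find_flag_calls(source):
    """Return (line, method, key) for each flag call found in Python source.

    key is None when the first argument is not a plain string literal,
    for example when the flag key is concatenated at runtime.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # obj.method(...) has an Attribute; a bare function(...) has a Name.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name not in FLAG_METHODS or not node.args:
            continue
        first = node.args[0]
        is_literal = isinstance(first, ast.Constant) and isinstance(first.value, str)
        findings.append((node.lineno, name, first.value if is_literal else None))
    return sorted(findings, key=lambda finding: finding[0])


# The sample is only parsed, never executed, so undefined names are fine.
sample = '''
# is_enabled("commented-out") should be ignored
doc = "call is_enabled('in-a-string') to check"
if client.is_enabled("new-checkout"):
    pass
value = client.variation("price-" + region, user, False)
'''

for finding in find_flag_calls(sample):
    print(finding)
```

The comment and the string on lines 2 and 3 of the sample would both match a grep pattern, but neither is a call node in the syntax tree, so only the two real evaluations are reported.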
Layer 2: Age-based warnings and enforcement
Once you can detect flags, the next step is tracking their age and surfacing warnings when flags exceed their expected lifespan.
GitHub Actions example -- flag age warnings:
name: Flag Age Check
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  check-flag-age:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # age tracking needs the full commit history
      - name: Check for stale flag references
        run: |
          # Define age thresholds (in days)
          WARN_THRESHOLD=60
          ERROR_THRESHOLD=90
          BLOCK_THRESHOLD=180
          WARNINGS=0
          ERRORS=0
          BLOCKERS=0
          NOW_EPOCH=$(date +%s)
          # Get all flag references in changed files
          CHANGED_FILES=$(git diff --name-only origin/${{ github.base_ref }}...HEAD)
          for file in $CHANGED_FILES; do
            if [ -f "$file" ]; then
              # Heuristic: any quoted lowercase identifier is treated as a candidate flag key.
              # Narrow this pattern to your own key naming convention to cut noise.
              FLAG_KEYS=$(grep -oP '["'"'"']([a-z][a-z0-9_-]+\.?[a-z0-9_-]+)["'"'"']' "$file" \
                | sort -u || true)
              for key in $FLAG_KEYS; do
                # Oldest commit that added or removed this key in the file, as a Unix timestamp.
                # No --diff-filter=A here: it would only match commits that created the file.
                FIRST_EPOCH=$(git log --all -S "$key" --format="%at" -- "$file" | tail -1)
                if [ -n "$FIRST_EPOCH" ]; then
                  AGE_DAYS=$(( (NOW_EPOCH - FIRST_EPOCH) / 86400 ))
                  if [ "$AGE_DAYS" -gt "$BLOCK_THRESHOLD" ]; then
                    echo "::error file=$file::Flag $key is $AGE_DAYS days old (threshold: $BLOCK_THRESHOLD days). This flag must be cleaned up before adding new references."
                    BLOCKERS=$((BLOCKERS + 1))
                  elif [ "$AGE_DAYS" -gt "$ERROR_THRESHOLD" ]; then
                    echo "::error file=$file::Flag $key is $AGE_DAYS days old (threshold: $ERROR_THRESHOLD days). Schedule cleanup immediately."
                    ERRORS=$((ERRORS + 1))
                  elif [ "$AGE_DAYS" -gt "$WARN_THRESHOLD" ]; then
                    echo "::warning file=$file::Flag $key is $AGE_DAYS days old (threshold: $WARN_THRESHOLD days). Consider scheduling cleanup."
                    WARNINGS=$((WARNINGS + 1))
                  fi
                fi
              done
            fi
          done
          echo "Flag age check complete: $WARNINGS warnings, $ERRORS errors, $BLOCKERS blockers"
          # Fail only past the block threshold; errors below it annotate the PR without blocking
          if [ "$BLOCKERS" -gt 0 ]; then
            echo "::error::$BLOCKERS flag(s) exceed the maximum age threshold. Clean up stale flags before modifying them."
            exit 1
          fi
Recommended age thresholds:
| Flag Age | CI Action | Rationale |
|---|---|---|
| 0-30 days | No action | Normal development lifecycle |
| 30-60 days | Info annotation | Gentle reminder to plan cleanup |
| 60-90 days | Warning | Flag likely stale, should be scheduled |
| 90-120 days | Error (non-blocking) | Flag is overdue for cleanup |
| 120-180 days | Error (blocking on new references) | No new code should touch this flag |
| 180+ days | Error (blocking all changes) | Flag must be removed before any related changes |
The key principle is progressive enforcement. Early warnings are informational. Later stages become blocking. This gives teams time to schedule cleanup without creating a sudden enforcement cliff that breaks existing workflows.
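The threshold table can be expressed as a small lookup. This is a sketch only: the action names are hypothetical labels, and because the table's ranges share their endpoints, the lower bound of each range is treated as inclusive here.

```python
# (minimum age in days, action), ordered from strictest to most lenient.
AGE_TIERS = [
    (180, "block_all_changes"),
    (120, "block_new_references"),
    (90, "error"),
    (60, "warning"),
    (30, "info"),
]


def action_for_age(age_days):
    """Map a flag's age to the CI action from the recommended thresholds table."""
    for minimum_age, action in AGE_TIERS:
        if age_days >= minimum_age:
            return action
    return "none"


for age in (10, 45, 75, 100, 150, 400):
    print(age, "->", action_for_age(age))
```

Keeping the tiers in data rather than in a chain of conditionals makes it straightforward to load them from a policy file later.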
Layer 3: Flag management platform integration
Connecting your CI pipeline to your flag management platform (LaunchDarkly, Split, Unleash, etc.) enables much richer checks. You can verify whether a flag is still active, what percentage it is rolled out to, and whether it has been archived.
GitHub Actions example -- LaunchDarkly integration:
name: Flag Platform Sync
on:
  pull_request:
    types: [opened, synchronize]
  schedule:
    - cron: '0 9 * * 1'  # Weekly Monday check
jobs:
  sync-flag-status:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check flag status against LaunchDarkly
        env:
          LD_API_KEY: ${{ secrets.LAUNCHDARKLY_API_KEY }}
          LD_PROJECT: ${{ vars.LD_PROJECT_KEY }}
        run: |
          # Fetch archived flags from LaunchDarkly. The list endpoint returns only
          # unarchived flags unless archived ones are requested explicitly; confirm the
          # filter syntax against the API version you use. Responses may also be
          # paginated, so large projects need a loop over pages.
          FLAGS_RESPONSE=$(curl -sf -H "Authorization: $LD_API_KEY" \
            "https://app.launchdarkly.com/api/v2/flags/$LD_PROJECT?summary=true&filter=archived:true")
          # Parse flag statuses
          echo "$FLAGS_RESPONSE" | jq -r '.items[] | "\(.key) \(.archived) \(.environments.production.on)"' \
            > /tmp/ld_flags.txt
          STALE_IN_CODE=()
          # Find flag references in code that are archived in LaunchDarkly
          while IFS=' ' read -r flag_key archived prod_on; do
            if [ "$archived" = "true" ]; then
              # Flag is archived in LD but might still be in code.
              # -F matches the key literally; file names are joined onto one line.
              REFERENCES=$(grep -rlF --include="*.go" --include="*.ts" \
                --include="*.py" --include="*.java" -e "$flag_key" . | tr '\n' ' ' || true)
              if [ -n "$REFERENCES" ]; then
                STALE_IN_CODE+=("$flag_key (archived in LD, still in: $REFERENCES)")
              fi
            fi
          done < /tmp/ld_flags.txt
          if [ ${#STALE_IN_CODE[@]} -gt 0 ]; then
            echo "## Stale Flags Found" >> "$GITHUB_STEP_SUMMARY"
            echo "The following flags are archived in LaunchDarkly but still referenced in code:" >> "$GITHUB_STEP_SUMMARY"
            echo "" >> "$GITHUB_STEP_SUMMARY"
            for flag in "${STALE_IN_CODE[@]}"; do
              echo "- $flag" >> "$GITHUB_STEP_SUMMARY"
            done
          fi
Platform integration checks to implement:
| Check | What It Catches | Severity |
|---|---|---|
| Archived flag in code | Flag removed from platform but code still branches on it | Error |
| 100% rollout for 30+ days | Flag fully enabled but not cleaned up | Warning |
| Code reference with no platform flag | Typo in flag key or flag deleted from platform | Error |
| Platform flag with no code references | Flag exists in platform but is not used anywhere | Info |
| Flag targeting "everyone" | Flag that should be hardcoded | Warning |
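Three of the checks in this table reduce to set comparisons between the keys found in code and the flags known to the platform. The sketch below assumes both have already been collected; the shape of `platform_flags` (a dict of key to `{"archived": bool}`) is a simplification chosen for illustration, not a real API response.

```python
def reconcile(code_keys, platform_flags):
    """Compare flag keys found in code with flags known to the platform."""
    in_code = set(code_keys)
    known = set(platform_flags)
    archived = {key for key, flag in platform_flags.items() if flag.get("archived")}
    return {
        "archived_in_code": sorted(in_code & archived),          # Error
        "unknown_in_code": sorted(in_code - known),              # Error
        "unused_in_platform": sorted(known - archived - in_code),  # Info
    }


# Hypothetical inputs.
code_keys = ["new-checkout", "legacy-search", "chekout-v2"]
platform_flags = {
    "new-checkout": {"archived": False},
    "legacy-search": {"archived": True},
    "dark-mode": {"archived": False},
}

for check, keys in reconcile(code_keys, platform_flags).items():
    print(check, keys)
```

The rollout-based checks need targeting data from the platform and are not covered here.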
Layer 4: Automated cleanup PR generation
The most impactful CI/CD integration is automated generation of cleanup pull requests. When a flag has been fully rolled out for a defined period, the pipeline automatically creates a PR that removes the flag from code, replacing conditional logic with the winning code path.
The automated cleanup workflow:
Flag reaches 100% rollout
↓
Grace period elapses (e.g., 14 days at 100%)
↓
CI detects flag is a cleanup candidate
↓
Automated PR is generated:
├── Removes flag evaluation calls
├── Removes dead code branches (the "off" path)
├── Removes flag imports if no longer needed
└── Adds PR description with context and verification steps
↓
PR is assigned to flag owner for review
↓
Standard review and merge process
↓
Follow-up check confirms all references removed
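The second and third steps of this workflow, the grace period and the candidate check, can be sketched as a filter over flag records. The field names `rollout_percent` and `fully_rolled_out_at` are hypothetical; a real integration would derive them from the platform's targeting rules and audit history.

```python
from datetime import datetime, timedelta, timezone


def cleanup_candidates(flags, now, grace_days=14):
    """Return keys of flags that have sat at 100% rollout for at least grace_days."""
    cutoff = now - timedelta(days=grace_days)
    return [
        flag["key"]
        for flag in flags
        if flag["rollout_percent"] == 100
        and flag["fully_rolled_out_at"] is not None
        and flag["fully_rolled_out_at"] <= cutoff
    ]


# Hypothetical flag records; "now" is fixed so the example is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
flags = [
    {"key": "new-checkout", "rollout_percent": 100,
     "fully_rolled_out_at": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"key": "dark-mode", "rollout_percent": 100,
     "fully_rolled_out_at": datetime(2024, 5, 25, tzinfo=timezone.utc)},
    {"key": "beta-search", "rollout_percent": 50, "fully_rolled_out_at": None},
]

print(cleanup_candidates(flags, now))
```

Passing `now` in as a parameter, instead of reading the clock inside the function, keeps the check easy to test.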
GitHub Actions example -- scheduled cleanup PR generation:
name: Flag Cleanup Generator
on:
  schedule:
    - cron: '0 10 * * 1'  # Every Monday at 10 AM
  workflow_dispatch:  # Allow manual trigger
jobs:
  generate-cleanup-prs:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - name: Identify cleanup candidates
        id: candidates
        env:
          LD_API_KEY: ${{ secrets.LAUNCHDARKLY_API_KEY }}
          LD_PROJECT: ${{ vars.LD_PROJECT_KEY }}
        run: |
          # Fetch flags that have been ON in production, unchanged, for 14+ days.
          # "on" alone does not prove a 100% rollout: inspect the fallthrough and
          # targeting rules as well before trusting a candidate.
          FLAGS=$(curl -sf -H "Authorization: $LD_API_KEY" \
            "https://app.launchdarkly.com/api/v2/flags/$LD_PROJECT?env=production")
          # lastModified is expected to be a Unix timestamp in milliseconds; 1209600 s = 14 days
          CLEANUP_CANDIDATES=$(echo "$FLAGS" | jq -r '
            [.items[]
              | select(.environments.production.on == true)
              | select(.environments.production.lastModified / 1000 < (now - 1209600))
              | .key]
            | join(" ")
          ')
          # Space-joined so the value fits the single-line name=value output format
          echo "candidates=$CLEANUP_CANDIDATES" >> "$GITHUB_OUTPUT"
      - name: Generate removal PRs
        if: steps.candidates.outputs.candidates != ''
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Passed through env rather than interpolated into the script
          CANDIDATES: ${{ steps.candidates.outputs.candidates }}
        run: |
          # Commits need an author identity on a fresh runner
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          for flag_key in $CANDIDATES; do
            BRANCH="cleanup/remove-flag-${flag_key}"
            # Check if cleanup PR already exists
            EXISTING=$(gh pr list --head "$BRANCH" --json number --jq length)
            if [ "$EXISTING" -gt 0 ]; then
              echo "Cleanup PR already exists for $flag_key, skipping"
              continue
            fi
            # Create cleanup branch
            git checkout -b "$BRANCH"
            # Remove flag references (simplified -- production would use AST parsing)
            grep -rlF --include="*.go" --include="*.ts" --include="*.py" -e "$flag_key" . \
              | while IFS= read -r file; do
                echo "Would clean $flag_key from $file"
                # AST-based removal would happen here
              done
            # Create PR if there are changes
            if [ -n "$(git status --porcelain)" ]; then
              git add -A
              git commit -m "chore: remove stale flag '$flag_key'"
              git push origin "$BRANCH"
              gh pr create \
                --title "chore: remove stale flag '$flag_key'" \
                --body "## Automated Flag Cleanup
          This PR removes the feature flag \`$flag_key\` which has been 100% enabled for 14+ days.
          ### Verification
          - [ ] Flag is confirmed 100% ON in all environments
          - [ ] No active experiments depend on this flag
          - [ ] Tests pass with flag code removed
          - [ ] No other flags depend on this flag's value

          *Generated automatically by the Flag Cleanup pipeline.*" \
                --label "flag-cleanup,automated"
            fi
            git checkout main
          done
This example demonstrates the pattern, but production-grade flag removal requires AST-based code transformation to safely remove conditional branches. Regex-based removal is fragile and dangerous for anything beyond trivial flag usage. Tools like FlagShark handle this complexity by using tree-sitter parsing to understand the syntax tree and generate safe, accurate removal PRs across multiple programming languages.
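To show what a syntax-tree transformation looks like in miniature, the sketch below removes one flag from Python source using the standard library `ast` module (not tree-sitter, and not how any particular tool implements it). It handles only the simplest shape, `if <client>.is_enabled("<key>"): ... else: ...`, keeps the "on" branch, and requires Python 3.9+ for `ast.unparse`. Negated checks, flags stored in variables, and comment preservation are all out of scope.

```python
import ast


class FlagRemover(ast.NodeTransformer):
    """Replace `if x.is_enabled("<key>"): A else: B` with A for a single flag key."""

    def __init__(self, flag_key):
        self.flag_key = flag_key

    def _is_flag_check(self, test):
        return (
            isinstance(test, ast.Call)
            and isinstance(test.func, ast.Attribute)
            and test.func.attr == "is_enabled"
            and len(test.args) >= 1
            and isinstance(test.args[0], ast.Constant)
            and test.args[0].value == self.flag_key
        )

    def visit_If(self, node):
        self.generic_visit(node)  # rewrite nested statements first
        if self._is_flag_check(node.test):
            return node.body  # splice in the "on" branch, drop the else branch
        return node


def remove_flag(source, flag_key):
    """Return source with the given flag's conditionals collapsed to the enabled path."""
    tree = FlagRemover(flag_key).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


sample = '''
def checkout(client, cart):
    if client.is_enabled("new-checkout"):
        total = cart.total_v2()
    else:
        total = cart.total()
    return total
'''

print(remove_flag(sample, "new-checkout"))
```

Because the rewrite operates on `If` nodes rather than on text, it cannot leave behind a dangling `else:` or an unbalanced block, which is the usual failure mode of regex-based removal.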
The flag cleanup gate pattern
The most effective CI/CD integration pattern is the "flag cleanup gate," a structured approach that combines all four layers into a unified policy enforcement system.
How the flag cleanup gate works
Developer opens PR
↓
┌─────────────────────────────────────────────┐
│ FLAG CLEANUP GATE │
│ │
│ 1. DETECT: Scan for flag references │
│ → New flags? Log creation event │
│ → Removed flags? Verify complete removal │
│ │
│ 2. AGE CHECK: Evaluate flag ages │
│ → Under 60 days? Pass │
│ → 60-90 days? Warning annotation │
│ → 90+ days? Require cleanup plan │
│ │
│ 3. PLATFORM SYNC: Check flag status │
│ → Archived in platform? Block merge │
│ → 100% enabled 30+ days? Warn │
│ → Unknown flag key? Error │
│ │
│ 4. POLICY: Enforce team standards │
│ → Max flags per service? Check │
│ → Required flag documentation? Check │
│ → Expiration date set? Check │
│ │
│ RESULT: Pass / Warn / Block │
└─────────────────────────────────────────────┘
↓
Standard review and merge flow
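The final RESULT line of the gate can be read as "the worst outcome of any stage wins". A minimal sketch of that aggregation, with hypothetical stage names:

```python
SEVERITY = {"pass": 0, "warn": 1, "block": 2}


def gate_result(stage_results):
    """Combine per-stage outcomes into one gate outcome: the most severe wins."""
    return max(stage_results.values(), key=SEVERITY.__getitem__, default="pass")


# Hypothetical outcomes from the four stages of the gate.
stages = {"detect": "pass", "age_check": "warn", "platform_sync": "block", "policy": "pass"}
print(gate_result(stages))
```

An unknown outcome string raises `KeyError`, which is deliberate: a typo in a stage result should fail loudly rather than be treated as a pass.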
Policy configuration example
Define your flag cleanup policies in a configuration file that lives in your repository:
# .flag-policy.yaml
flag_cleanup:
  detection:
    enabled: true
    languages: [go, typescript, python, java]
    scan_config_files: true
  age_thresholds:
    info: 30  # days
    warning: 60
    error: 90
    block: 180
  platform_integration:
    provider: launchdarkly
    project_key: my-project
    check_archived: true
    check_fully_rolled_out: true
    fully_rolled_out_grace_days: 14
  enforcement:
    block_new_references_to_stale_flags: true
    require_expiration_date: true
    max_flags_per_file: 5
    require_flag_documentation: false
  cleanup_prs:
    enabled: true
    schedule: weekly
    auto_assign_to: flag_owner
    labels: [flag-cleanup, automated]
    require_approval: true
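A policy file like this is worth validating before the pipeline acts on it. The sketch below checks only the `age_thresholds` section and assumes the file has already been parsed into a dict (parsing YAML needs a third-party library, so it is left out). The function name and messages are illustrative.

```python
AGE_LEVELS = ["info", "warning", "error", "block"]


def validate_age_thresholds(thresholds):
    """Return a list of problems; an empty list means the thresholds are usable."""
    problems = [f"missing threshold: {name}" for name in AGE_LEVELS if name not in thresholds]
    if problems:
        return problems
    # Each level must trigger strictly later than the one before it.
    for earlier, later in zip(AGE_LEVELS, AGE_LEVELS[1:]):
        if thresholds[earlier] >= thresholds[later]:
            problems.append(
                f"{earlier} ({thresholds[earlier]}) must be lower than {later} ({thresholds[later]})"
            )
    return problems


print(validate_age_thresholds({"info": 30, "warning": 60, "error": 90, "block": 180}))
print(validate_age_thresholds({"info": 30, "warning": 90, "error": 60, "block": 180}))
```

Running this check on the policy file itself in CI means a bad threshold edit is caught in the PR that introduces it.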
Enforcement levels for different teams
Not every team is ready for full enforcement on day one. Adopt the flag cleanup gate progressively:
| Level | Detection | Age Warnings | Platform Sync | Blocking | Cleanup PRs |
|---|---|---|---|---|---|
| Level 1: Observability | On | Info only | Off | None | Off |
| Level 2: Awareness | On | Warnings | On | None | Off |
| Level 3: Guidance | On | Warnings + Errors | On | Stale flags only | Manual trigger |
| Level 4: Enforcement | On | All thresholds | On | Full policy | Automated weekly |
| Level 5: Zero-debt | On | All thresholds | On | Full policy + max flag limits | Automated daily |
Most teams should start at Level 2 and progress to Level 4 over one quarter. Level 5 is aspirational and appropriate for teams with mature flag management practices.
Measuring CI/CD flag integration effectiveness
Track these metrics to evaluate whether your CI/CD flag integration is working:
Leading indicators (process health)
| Metric | Target | How to Measure |
|---|---|---|
| Flag detection coverage | 100% of PRs scanned | CI job success rate |
| Age warning response rate | 80%+ warnings addressed within 2 weeks | Time from warning to flag removal |
| Cleanup PR merge time | < 1 week from generation | PR open duration |
| False positive rate | < 5% of detections | Manual review of flagged items |
Lagging indicators (outcomes)
The outcomes you should expect to see after integrating flag cleanup into CI/CD include:
- Average flag age drops significantly. Flags that used to linger for months get cleaned up within weeks.
- Stale flag percentage declines. The proportion of flags older than 90 days should decrease steadily.
- Cleanup velocity increases. Automated detection and cleanup PR generation means more flags get removed per month with less manual effort.
- Flag-related incidents decrease. Fewer stale flags means fewer unexpected interactions and dead code paths in production.
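The first two outcomes are simple to compute once each flag's age is known. A sketch, assuming ages in days have already been collected (for example by the age check) and using the 90-day staleness cutoff mentioned above:

```python
def flag_debt_metrics(ages_in_days, stale_after_days=90):
    """Summarize flag ages: count, average age, and share older than the cutoff."""
    count = len(ages_in_days)
    if count == 0:
        return {"count": 0, "average_age_days": 0.0, "stale_percent": 0.0}
    stale = sum(1 for age in ages_in_days if age > stale_after_days)
    return {
        "count": count,
        "average_age_days": sum(ages_in_days) / count,
        "stale_percent": 100.0 * stale / count,
    }


# Hypothetical ages for four flags.
print(flag_debt_metrics([10, 20, 100, 200]))
```

Recording these numbers on each scheduled run gives the trend line that the lagging indicators describe.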
In our experience, the key driver is not the blocking enforcement -- it is the visibility. When developers see age warnings on every PR, they internalize flag lifecycle management as a natural part of development, not a separate chore.
Common pitfalls and how to avoid them
Pitfall 1: Starting with blocking enforcement
Teams that immediately block merges for stale flags face developer revolt. Engineers with deadlines will work around blocking checks rather than pause to clean up flags they did not create. Start with visibility and warnings. Build the cultural expectation before adding enforcement.
Pitfall 2: Regex-based detection in production
Simple grep patterns catch obvious cases but miss dynamic flag evaluation, generate false positives on comments and documentation, and cannot distinguish between flag creation and flag removal. AST-based parsing is essential for accurate detection at scale.
Pitfall 3: Ignoring configuration files
Flags are not only referenced in application code. They appear in configuration files, environment variables, Terraform modules, Kubernetes manifests, and CI/CD pipeline definitions. Your scanning must cover these non-code locations.
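A plain-text scan is a reasonable starting point for these locations, since most of them have no flag SDK calls to parse. The sketch below walks a directory and reports lines in configuration-style files that mention a flag key; the suffix list is an assumption to extend for your repository, and dotfiles such as `.env` have no suffix and would need a name-based rule.

```python
from pathlib import Path

# File types treated as configuration (an assumption; extend as needed).
CONFIG_SUFFIXES = {".yaml", ".yml", ".json", ".tf", ".toml"}


def find_key_in_config(root, flag_key):
    """Return (relative path, line number) for each config line mentioning flag_key."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in CONFIG_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if flag_key in line:
                hits.append((path.relative_to(root).as_posix(), number))
    return hits
```

Called as `find_key_in_config(".", "new-checkout")` from the repository root, it returns the files and line numbers a cleanup PR would also need to touch.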
Pitfall 4: No grace period for newly created flags
A flag created yesterday should not trigger a 90-day warning. Ensure your age checks use the date the flag key first appeared, not the creation date of the file that contains it. This is a common bug in naive implementations that generates noisy false positives.
Pitfall 5: Treating all flags the same
Kill switches, experiment flags, and release flags have different expected lifespans. Your policy should differentiate between flag types:
| Flag Type | Expected Lifespan | Warning Threshold | Block Threshold |
|---|---|---|---|
| Release flag | 2-4 weeks | 60 days | 120 days |
| Experiment flag | 1-2 weeks | 30 days | 60 days |
| Ops/kill switch | Indefinite | Annual review | Never |
| Permission flag | Varies | 90 days | 180 days |
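The table translates directly into a per-type lookup. In this sketch the type names are hypothetical labels, "annual review" is interpreted as a 365-day warning, and `None` stands for a block threshold of "Never". How a flag's type is determined (a tag in the platform, a key prefix) is left open.

```python
# flag type -> (warning threshold in days, block threshold in days or None for "never")
TYPE_THRESHOLDS = {
    "release": (60, 120),
    "experiment": (30, 60),
    "kill_switch": (365, None),
    "permission": (90, 180),
}


def action_for_flag(flag_type, age_days):
    """Return "none", "warn", or "block" for a flag of the given type and age."""
    warn_after, block_after = TYPE_THRESHOLDS[flag_type]
    if block_after is not None and age_days >= block_after:
        return "block"
    if age_days >= warn_after:
        return "warn"
    return "none"


print(action_for_flag("experiment", 45))
print(action_for_flag("kill_switch", 1000))
```

With this shape, a kill switch can never block a merge no matter how old it gets, while a forgotten experiment flag blocks after two months.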
Getting started: A 30-day adoption plan
Week 1: Detection only
- Add a flag detection job to your CI pipeline (Layer 1)
- Configure it to annotate PRs with flag changes -- no blocking
- Run it for a week to calibrate detection accuracy
- Fix false positives by tuning patterns or switching to AST-based detection
Week 2: Age tracking and warnings
- Implement age-based warnings (Layer 2)
- Start with generous thresholds (90-day warning, 180-day error)
- Review the first week's warnings with the team
- Adjust thresholds based on team feedback
Week 3: Platform integration
- Connect your CI pipeline to your flag management platform (Layer 3)
- Implement the "archived in platform, still in code" check
- Run the weekly platform sync report
- Review results with engineering leads
Week 4: Cleanup automation
- Enable automated cleanup PR generation (Layer 4)
- Start with manual trigger only (no scheduled generation)
- Generate cleanup PRs for the 5 oldest stale flags
- Refine the PR template and review process based on feedback
The CI/CD pipeline is the most under-utilized tool in the fight against feature flag debt. Every organization already has the infrastructure to enforce flag hygiene -- they just have not connected the pieces. By treating flag cleanup as a first-class CI/CD concern, you transform flag management from an afterthought into an automated, measurable, and enforceable part of your development workflow. The flags that used to haunt your codebase for years will now have a clear path from creation to cleanup, tracked and enforced at every step.