Somewhere in your codebase right now, there are feature flags that nobody remembers creating. Flags controlling code paths that haven't been touched in months. Flags referencing experiments that ended two quarters ago. Flags whose owners left the company a year before you joined.
You know they're there. You've stumbled across them during late-night debugging sessions, muttered something about cleaning them up "next sprint," and moved on. The backlog is full, the roadmap is packed, and flag cleanup never quite makes it to the top.
Here is the uncomfortable truth: most enterprise codebases contain far more feature flags than anyone expects, and the majority of them are stale. Every one of those stale flags adds cognitive load, increases testing complexity, and creates potential failure modes. But most teams never audit their flags because they assume it will take days of painstaking work.
It does not have to. You can perform a meaningful feature flag audit in 30 minutes flat. Not a comprehensive deep-clean -- that comes later -- but a rapid triage that identifies your biggest risks, builds a removal backlog, and gives you the data to justify dedicated cleanup time to your engineering lead.
Set a timer. Let's go.
Minute 0-5: Build your flag inventory with grep
Before you can assess anything, you need a list. The fastest way to build a flag inventory is to search your codebase for calls to your feature flag SDK.
Finding flags by SDK method calls
Start with your primary flag provider. If you use LaunchDarkly, Unleash, Split, or any other provider, you already know the method names. Search for those directly.
LaunchDarkly (Go):
grep -rn "BoolVariation\|StringVariation\|IntVariation\|Float64Variation\|JSONVariation" \
--include="*.go" \
--exclude-dir=vendor \
--exclude-dir=node_modules \
. | sort -t: -k1,1 > /tmp/flag-audit.txt
LaunchDarkly (TypeScript/JavaScript):
grep -rn "variation\|boolVariation\|stringVariation\|intVariation\|jsonVariation" \
--include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" \
--exclude-dir=node_modules \
--exclude-dir=dist \
. | sort -t: -k1,1 >> /tmp/flag-audit.txt
LaunchDarkly (Python):
grep -rn "variation\|variation_detail\|bool_variation" \
--include="*.py" \
--exclude-dir=venv \
--exclude-dir=.venv \
. | sort -t: -k1,1 >> /tmp/flag-audit.txt
Unleash (any language):
grep -rn "isEnabled\|is_enabled\|IsEnabled\|getVariant\|get_variant\|GetVariant" \
--include="*.go" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.py" \
--exclude-dir=vendor --exclude-dir=node_modules --exclude-dir=venv \
. | sort -t: -k1,1 >> /tmp/flag-audit.txt
Extracting unique flag keys
Now extract just the flag key strings from those results:
# -i lets one pattern cover BoolVariation, variation, isEnabled, is_enabled, etc.
grep -ohiP '(?:variation|is_?enabled|get_?variant)\w*\s*\(\s*"[^"]+"' \
  /tmp/flag-audit.txt | \
  grep -oP '"[^"]+"' | \
  sort -u > /tmp/flag-keys.txt
wc -l /tmp/flag-keys.txt
That wc -l output is your total flag count. Write it down -- this is your baseline metric.
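To sanity-check the extraction before a real run, the same approach works on a fabricated sample (the file paths and flag names below are invented):

```shell
# Simulated grep output in file:line:code form, as the inventory step produces
cat > /tmp/demo-audit.txt <<'EOF'
checkout/handler.go:42:  enabled := client.BoolVariation("new-checkout", ctx, false)
auth/middleware.py:17:  if client.variation("legacy-auth-bypass", user, False):
search/index.ts:88:  const on = unleash.isEnabled("search-reindex-v2");
checkout/handler.go:97:  enabled := client.BoolVariation("new-checkout", ctx, false)
EOF

# Case-insensitive match covers the different SDK spellings; keep the quotes
grep -ohiP '(?:variation|is_?enabled|get_?variant)\w*\s*\(\s*"[^"]+"' /tmp/demo-audit.txt \
  | grep -oP '"[^"]+"' \
  | sort -u > /tmp/demo-keys.txt

cat /tmp/demo-keys.txt   # three unique keys despite four call sites
```

Note that duplicates collapse: the two `new-checkout` call sites become one inventory entry.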
What you should have after 5 minutes:
- A file listing every line in your codebase that evaluates a flag
- A deduplicated list of unique flag keys
- A total flag count
If your total is under 20, you are in decent shape. Between 20 and 50, you have work to do. Between 50 and 100, flag debt is actively slowing you down. Over 100, you have a serious flag debt problem. Over 200, you are not alone -- but you need to act soon.
Minute 5-10: Cross-reference with your flag management platform
Open your flag management platform (LaunchDarkly, Unleash, Split, ConfigCat, or whatever you use) in another browser tab. You are looking for two things: flags that exist in code but not in the platform, and flags that exist in the platform but not in code.
Flags in code but not in the platform
These are orphaned flags -- code references to flags that have been deleted from your management platform. They are evaluating against default values, which means they are dead code paths that still add complexity.
Export your platform's flag list (most platforms have a CSV or API export) and compare:
# Export your platform flags to a file (method varies by provider)
# For LaunchDarkly CLI:
# ld flags list --project default --environment production > /tmp/platform-flags.txt
# Compare: flags in code but NOT in platform
# (comm requires both inputs sorted; strip quotes so the key formats match)
comm -23 <(tr -d '"' < /tmp/flag-keys.txt | sort) <(sort /tmp/platform-flags.txt) > /tmp/orphaned-flags.txt
echo "Orphaned flags (in code, not in platform):"
cat /tmp/orphaned-flags.txt
Flags in the platform but not in code
These are phantom flags -- configured in your management platform but never referenced in code. They might be remnants of deleted features, or they might indicate that flag cleanup happened in code but nobody removed the platform configuration.
# Compare: flags in platform but NOT in code
comm -13 <(tr -d '"' < /tmp/flag-keys.txt | sort) <(sort /tmp/platform-flags.txt) > /tmp/phantom-flags.txt
echo "Phantom flags (in platform, not in code):"
cat /tmp/phantom-flags.txt
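On a toy pair of lists, the two comparisons look like this (keys are invented; `comm` needs both inputs sorted and in the same bare-key format):

```shell
# Toy key lists standing in for the code and platform exports
printf '%s\n' checkout-v2 legacy-auth search-boost | sort > /tmp/code-keys.txt
printf '%s\n' checkout-v2 pricing-exp search-boost | sort > /tmp/platform-keys.txt

# -23: lines only in file 1 (code)     -> orphaned
# -13: lines only in file 2 (platform) -> phantom
comm -23 /tmp/code-keys.txt /tmp/platform-keys.txt   # prints: legacy-auth
comm -13 /tmp/code-keys.txt /tmp/platform-keys.txt   # prints: pricing-exp
```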
Quick platform checks
While you are in your platform, note these details for each flag:
| Check | What to Look For |
|---|---|
| Targeting rules | Flags with no targeting rules (serving default to everyone) |
| Percentage rollout | Flags at 100% or 0% for more than 30 days |
| Last evaluated | Flags with zero evaluations in the past 30 days |
| Environments | Flags with different states across environments |
| Tags/descriptions | Flags with no tags, descriptions, or ownership metadata |
What you should have after 10 minutes:
- A list of orphaned flags (in code, not in platform)
- A list of phantom flags (in platform, not in code)
- Notes on flags with suspicious platform configurations
Minute 10-15: Check git blame for age and ownership
Now it gets interesting. For each flag in your inventory, you want to know two things: how old is it, and who created it?
Finding flag creation dates
Use git log to find when each flag key first appeared in the codebase:
while IFS= read -r flag; do
# Remove quotes from flag key
clean_flag=$(echo "$flag" | tr -d '"')
  # Find the first commit that introduced this flag (pickaxe search, oldest first;
  # --diff-filter=A and -p would miss flags added to existing files and pollute output)
  first_commit=$(git log --all --reverse -S "$clean_flag" \
    --format="%H %ai %an" -- '*.go' '*.ts' '*.tsx' '*.js' '*.py' | head -1)
if [ -n "$first_commit" ]; then
echo "$clean_flag | $first_commit"
else
echo "$clean_flag | UNKNOWN ORIGIN"
fi
done < /tmp/flag-keys.txt > /tmp/flag-ages.txt
This command searches the entire git history for the first commit that added each flag key. It outputs the flag name, commit hash, date, and author.
Quick age analysis
# Count flags older than a given cutoff (GNU date; skips UNKNOWN ORIGIN lines,
# which would otherwise break the numeric comparison)
count_older_than() {
  cutoff=$(date -d "$1 days ago" +%s)
  awk -F'|' '{print $2}' /tmp/flag-ages.txt | \
  awk '{print $2}' | \
  while read -r d; do
    ts=$(date -d "$d" +%s 2>/dev/null) || continue
    [ "$ts" -lt "$cutoff" ] && echo "$d"
  done | wc -l
}

echo "=== Flag Age Distribution ==="
echo "Flags older than 90 days:  $(count_older_than 90)"
echo "Flags older than 180 days: $(count_older_than 180)"
Ownership check
For each flag, identify whether the original author is still on the team:
# List unique flag authors (the author name starts at field 5: hash, date, time, tz, name...)
echo "=== Flag Authors ==="
awk -F'|' '{print $2}' /tmp/flag-ages.txt | awk '{for(i=5;i<=NF;i++) printf "%s ",$i; print ""}' | sort | uniq -c | sort -rn
Flags whose authors have left the company are higher risk -- there is no institutional knowledge about why they were created or what edge cases they handle.
What you should have after 15 minutes:
- The creation date for each flag
- The original author for each flag
- A count of flags by age bucket (30/60/90/180+ days)
- A list of flags with no identifiable owner
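The same age data can also be bucketed in a single pass. This sketch runs against fabricated inventory lines in the `flag | hash date ...` format (GNU date assumed; for a real run, read /tmp/flag-ages.txt instead of the demo file):

```shell
# Demo inventory in the "flag | hash date time tz author" format
printf '%s\n' \
  'ancient-flag | abc123 2020-01-15 10:00:00 +0000 Jane Doe' \
  'legacy-auth | def456 2019-06-01 09:00:00 +0000 Bob Smith' \
  'orphan-flag | UNKNOWN ORIGIN' > /tmp/demo-ages.txt

now=$(date +%s)
over180=0; newer=0; unknown=0
while IFS='|' read -r flag meta; do
  created=$(echo "$meta" | awk '{print $2}')   # ISO date, or junk on UNKNOWN lines
  if ! ts=$(date -d "$created" +%s 2>/dev/null); then
    unknown=$((unknown + 1))
  elif [ $(( (now - ts) / 86400 )) -ge 180 ]; then
    over180=$((over180 + 1))
  else
    newer=$((newer + 1))
  fi
done < /tmp/demo-ages.txt
echo "180d+: $over180  newer: $newer  unknown: $unknown"   # -> 180d+: 2  newer: 0  unknown: 1
```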
Minute 15-20: Categorize your flags
Now categorize every flag into one of five buckets. This is the most important step because it determines your action plan.
The five flag categories
| Category | Definition | Criteria | Action |
|---|---|---|---|
| Active | Currently in rollout or experiment | Created < 30 days ago, targeting rules active, not at 100% | Leave alone |
| Completed | Rollout finished, flag at 100% | At 100% for 30+ days, no targeting rules | Remove (safe) |
| Stale | No recent activity, unclear purpose | 90+ days old, no recent evaluations, no documentation | Investigate, then remove |
| Orphaned | In code but not in platform | Flag key not found in management platform | Remove (safe) |
| Risky | Complex dependencies or interactions | Nested with other flags, shared across services, kill switch | Careful removal with testing |
Scoring rubric
Assign each flag a risk score from 1-10 based on these factors:
| Factor | Low Risk (1-3) | Medium Risk (4-6) | High Risk (7-10) |
|---|---|---|---|
| Age | < 30 days | 30-90 days | 90+ days |
| References | 1-2 files | 3-5 files | 6+ files |
| Nesting | No nesting | Inside one conditional | Nested with other flags |
| Owner | Active team member | Team member, different team | Left company / unknown |
| Test coverage | Flag-specific tests exist | General tests cover path | No test coverage |
| Documentation | Documented with purpose | Partial documentation | No documentation |
| Service scope | Single service | 2-3 services | Cross-service dependency |
Total the scores. Flags scoring 25+ out of 70 should be prioritized for immediate investigation. Flags scoring 40+ are ticking time bombs.
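To keep the arithmetic honest, the rubric total can be scripted. This helper is a sketch: the seven positional arguments are the factor scores you assign by hand, and the thresholds follow the rubric above.

```shell
# Sum seven factor scores (each 1-10) and classify the total:
# 40+ = critical, 25+ = investigate, below that = ok
score_flag() {
  total=0
  for s in "$@"; do
    total=$((total + s))
  done
  if [ "$total" -ge 40 ]; then verdict="critical"
  elif [ "$total" -ge 25 ]; then verdict="investigate"
  else verdict="ok"
  fi
  echo "$total $verdict"
}

# age refs nesting owner tests docs scope
score_flag 8 7 9 10 6 8 7   # -> 55 critical: stale, unowned, cross-service
score_flag 2 1 1 1 2 1 1    # -> 9 ok: fresh, scoped, tested
```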
Building the audit spreadsheet
Create a spreadsheet (or CSV) with this template:
Flag Key,Category,Age (Days),Owner,Files Referenced,Risk Score,Platform Status,Last Evaluated,Action,Priority,Ticket
enable-new-checkout,Completed,127,jane.doe,3,18,100% ON,2025-08-15,Remove,High,
legacy-auth-bypass,Stale,340,UNKNOWN,7,42,Not Found,Never,Investigate,Critical,
experiment-pricing-v2,Active,12,bob.smith,2,8,50% rollout,2025-09-10,Monitor,Low,
temp-fix-api-timeout,Orphaned,95,sarah.jones,1,22,Not Found,N/A,Remove,High,
Fill this out as fast as you can -- you do not need perfect data for every column. Estimates are fine. The goal is a complete picture, not a perfect one.
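One shortcut: generate the skeleton rows from your key list instead of typing them. This sketch uses a stand-in keys file; point it at /tmp/flag-keys.txt for the real run.

```shell
# Stand-in for /tmp/flag-keys.txt (keys keep their quotes in that file)
printf '%s\n' '"checkout-v2"' '"legacy-auth"' > /tmp/sample-keys.txt

# Header plus one stub row per flag key; fill the remaining columns by hand
echo 'Flag Key,Category,Age (Days),Owner,Files Referenced,Risk Score,Platform Status,Last Evaluated,Action,Priority,Ticket' > /tmp/flag-audit.csv
tr -d '"' < /tmp/sample-keys.txt | while IFS= read -r key; do
  printf '%s,,,,,,,,,,\n' "$key"
done >> /tmp/flag-audit.csv

cat /tmp/flag-audit.csv
```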
What you should have after 20 minutes:
- Every flag categorized into one of five buckets
- A risk score for each flag
- A spreadsheet tracking all flags with key metadata
Minute 20-25: Prioritize removal candidates
With your categorized inventory, sort by priority. The goal is to identify the flags you can remove with the least risk and the most impact.
The removal priority matrix
| Priority | Category | Criteria | Estimated Effort |
|---|---|---|---|
| P0 - Immediate | Orphaned | In code, not in platform, evaluating defaults | 15-30 min per flag |
| P1 - This Sprint | Completed | 100% on/off for 90+ days, clear purpose | 30-60 min per flag |
| P2 - Next Sprint | Stale | 90+ days, low reference count, owner available | 1-2 hours per flag |
| P3 - Scheduled | Stale | High reference count, complex dependencies | 2-4 hours per flag |
| P4 - Investigate | Risky | Nested flags, cross-service, unknown purpose | 4+ hours per flag |
Quick wins: flags you can remove today
Orphaned flags are almost always safe to remove immediately. They reference flag keys that no longer exist in your management platform, which means they are already evaluating to their default values in production. Removing them changes nothing about runtime behavior -- it just removes dead code.
Completed flags at 100% are the next easiest. If a flag has been serving 100% of traffic for 90+ days with no issues, the feature is stable. The flag is no longer providing value; it is only adding complexity.
Estimating cleanup impact
Calculate the total hours needed to clear your backlog:
P0 flags: ___ flags x 0.5 hours = ___ hours
P1 flags: ___ flags x 0.75 hours = ___ hours
P2 flags: ___ flags x 1.5 hours = ___ hours
P3 flags: ___ flags x 3 hours = ___ hours
P4 flags: ___ flags x 5 hours = ___ hours
---
Total estimated cleanup: ___ hours
This number is critical for step 6. It tells your engineering lead exactly how much investment is needed and lets you plan cleanup across multiple sprints rather than trying to do everything at once.
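The worksheet arithmetic is easy to script; the per-tier counts below are placeholders for your own numbers.

```shell
# Placeholder per-tier flag counts; replace with your audit results
p0=4; p1=6; p2=5; p3=2; p4=1

# awk handles the fractional hour multipliers from the worksheet
total=$(awk -v a="$p0" -v b="$p1" -v c="$p2" -v d="$p3" -v e="$p4" \
  'BEGIN { print a*0.5 + b*0.75 + c*1.5 + d*3 + e*5 }')
echo "Estimated cleanup: $total hours"   # with 4,6,5,2,1 -> 25 hours
```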
What you should have after 25 minutes:
- A prioritized removal list
- Estimated effort for each priority tier
- A total cleanup hours estimate
- A shortlist of quick wins you can tackle immediately
Minute 25-30: Create cleanup tickets
The audit is worthless if it does not lead to action. In the final five minutes, convert your findings into trackable work items.
Creating effective cleanup tickets
For each P0 and P1 flag, create a ticket with this template:
## Remove feature flag: [FLAG_KEY]
**Category:** [Orphaned / Completed / Stale]
**Risk Score:** [X/70]
**Age:** [X days]
**Owner:** [Original author]
**Files affected:** [List files]
### Context
[One sentence about what this flag controlled]
### Acceptance Criteria
- [ ] Flag evaluation code removed from all files
- [ ] Default/winning code path preserved
- [ ] Dead code path removed
- [ ] Flag removed from management platform (if applicable)
- [ ] Tests updated to remove flag-specific branches
- [ ] No regressions in affected test suites
### Estimated Effort
[X hours]
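If your audit CSV follows the spreadsheet template from the previous section, the ticket stubs can be generated rather than hand-written. A sketch, shown against a two-row demo CSV (swap in your real file; the comma parsing is naive and assumes no field contains a comma):

```shell
# Demo CSV in the audit-spreadsheet column order
cat > /tmp/demo-audit.csv <<'EOF'
Flag Key,Category,Age (Days),Owner,Files Referenced,Risk Score,Platform Status,Last Evaluated,Action,Priority,Ticket
enable-new-checkout,Completed,127,jane.doe,3,18,100% ON,2025-08-15,Remove,High,
experiment-pricing-v2,Active,12,bob.smith,2,8,50% rollout,2025-09-10,Monitor,Low,
EOF

# Emit a ticket stub for every High/Critical-priority row (field 10 = Priority)
awk -F',' 'NR > 1 && ($10 == "High" || $10 == "Critical") {
  printf "## Remove feature flag: %s\n", $1
  printf "**Category:** %s | **Risk Score:** %s/70 | **Age:** %s days | **Owner:** %s\n\n", $2, $6, $3, $4
}' /tmp/demo-audit.csv > /tmp/tickets.md

cat /tmp/tickets.md
```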
Batch tickets for efficiency
For P2 and P3 flags, group them into batch tickets by service or module:
## Flag cleanup batch: [Service/Module Name]
**Flags to remove:** [count]
**Total estimated effort:** [X hours]
### Flags
| Flag Key | Category | Risk Score | Files |
|----------|----------|------------|-------|
| flag-1 | Stale | 28 | 3 |
| flag-2 | Stale | 31 | 4 |
| flag-3 | Completed| 15 | 2 |
### Notes
[Any dependencies or ordering requirements]
The summary ticket
Create one summary ticket that captures the audit results:
## Feature Flag Audit Results - [Date]
### Summary
- **Total flags found:** [X]
- **Active (keep):** [X]
- **Removal candidates:** [X]
- **Estimated total cleanup:** [X hours]
### Breakdown
| Category | Count | Est. Hours |
|----------|-------|-----------|
| Orphaned (P0) | X | X |
| Completed (P1) | X | X |
| Stale (P2) | X | X |
| Complex (P3) | X | X |
| Risky (P4) | X | X |
### Recommendation
[Your recommendation for cleanup cadence and timeline]
What you should have after 30 minutes:
- Individual tickets for P0 and P1 flags
- Batch tickets for P2 and P3 flags
- A summary ticket with full audit results
- A recommended cleanup timeline
After the audit: Making it stick
A one-time audit solves today's problem. Preventing the next flag graveyard requires building auditing into your regular workflow.
Establish a recurring audit cadence
| Team Size | Recommended Cadence | Time Investment |
|---|---|---|
| 1-10 engineers | Monthly | 30 minutes |
| 10-30 engineers | Bi-weekly | 30-45 minutes |
| 30-100 engineers | Weekly | 45-60 minutes |
| 100+ engineers | Continuous (automated) | Tool-driven |
Automate what you can
The manual grep-and-git-blame approach works for a quick audit, but it does not scale. As your team and codebase grow, you need automated flag detection and lifecycle tracking.
Tools like FlagShark automate the entire inventory and age-tracking process by analyzing every pull request for flag additions and removals. Instead of running grep commands monthly, you get a continuously updated view of every flag in your codebase, when it was added, who added it, and whether it has been removed.
Set flag hygiene metrics
Track these metrics over time to measure improvement:
| Metric | Target | How to Measure |
|---|---|---|
| Total flag count | Stable or declining | Monthly audit |
| Stale flag percentage | < 20% | Flags > 90 days with no changes |
| Average flag age | < 60 days | Mean age of all active flags |
| Orphaned flag count | 0 | Cross-reference code vs. platform |
| Time to remove | < 14 days after 100% rollout | Track from rollout completion to removal |
| Audit completion rate | 100% | Tickets created vs. tickets completed |
The flag hygiene dashboard
If you track nothing else, track the stale flag ratio: the percentage of flags older than 90 days with no recent modifications. This single metric captures the health of your flag management practices better than any other.
A stale flag ratio under 20% means your team is actively managing flag lifecycle. Between 20-40% suggests flag debt is accumulating but manageable. Over 40% means you are heading toward flag hell and need to prioritize cleanup.
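Computing the stale flag ratio from your age inventory takes only a few lines. This sketch runs against demo data in the same `flag | hash date ...` format (GNU date assumed; point it at /tmp/flag-ages.txt for the real number):

```shell
# Demo inventory: two old flags plus one created ten days ago
recent=$(date -d '10 days ago' +%F)   # GNU date
printf '%s\n' \
  "checkout-v2 | abc 2020-01-10 10:00:00 +0000 Jane" \
  "legacy-auth | def 2019-06-01 09:00:00 +0000 Bob" \
  "fresh-flag | ghi $recent 09:00:00 +0000 Ann" > /tmp/metric-ages.txt

now=$(date +%s)
stale=0; total=0
while IFS='|' read -r flag meta; do
  created=$(echo "$meta" | awk '{print $2}')
  ts=$(date -d "$created" +%s 2>/dev/null) || continue   # skip unknown-origin lines
  total=$((total + 1))
  if [ $(( (now - ts) / 86400 )) -ge 90 ]; then
    stale=$((stale + 1))
  fi
done < /tmp/metric-ages.txt
echo "Stale flag ratio: $((100 * stale / total))%"   # 2 of 3 here -> 66%
```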
Common audit pitfalls to avoid
Pitfall 1: Only searching for known patterns
Your grep commands only find flags you know about. Teams often have flags hidden in configuration files, environment variables, or database-driven feature toggles that do not appear in code searches. Check your config files, YAML, JSON, and environment templates too.
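A quick way to catch config-only flags is a second grep pass over configuration files. The pattern below is a naive starting point you will want to adapt to your own naming conventions; the demo directory and flag name are invented.

```shell
# A flag defined only in config, invisible to the SDK-call grep
mkdir -p /tmp/flag-demo
cat > /tmp/flag-demo/app.yaml <<'EOF'
features:
  enable_beta_dashboard: true
timeout_ms: 3000
EOF

# Search config formats for flag-ish names; tune the pattern to your conventions
grep -rn --include='*.yaml' --include='*.yml' --include='*.json' \
  -iE 'feature|toggle|enable_' /tmp/flag-demo > /tmp/config-flag-hits.txt

cat /tmp/config-flag-hits.txt
```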
Pitfall 2: Ignoring test files
Test files contain flag references that need cleanup too. When you remove a flag from production code, the corresponding test mocks, fixtures, and assertions also need updating. Include test files in your audit scope.
Pitfall 3: Treating all flags equally
A kill switch that protects against a known failure mode should not be treated the same as a stale release flag. Your scoring rubric accounts for this, but resist the urge to set a blanket "remove everything older than X days" policy without nuance.
Pitfall 4: Auditing without authority to act
An audit that produces tickets nobody works on is worse than no audit at all -- it creates the illusion of progress while debt continues to accumulate. Before you start, confirm that your team lead or engineering manager supports allocating time for the resulting cleanup work.
The 30-minute audit in practice
Here is what a real audit looks like for a mid-size codebase:
| Metric | Example Result |
|---|---|
| Total flags found | 67 |
| Active (keep) | 12 |
| Completed (P1 removal) | 18 |
| Orphaned (P0 removal) | 8 |
| Stale (P2/P3) | 23 |
| Risky (P4 investigate) | 6 |
| Estimated cleanup hours | 47 hours |
| Stale flag ratio | 46% |
In this example, 26 flags (the P0 and P1 items) can be removed with relatively low risk, taking roughly 18 hours of work (8 x 0.5 + 18 x 0.75). That is just over two days of focused cleanup that eliminates nearly 40% of the flag inventory. The remaining 29 flags need more investigation, but they are now tracked and prioritized instead of invisible.
Forty-seven hours might sound like a lot, but spread across a team of 10 engineers over two sprints, that is less than 3 hours per person per sprint. The productivity gains from removing 55 flags will pay that investment back within weeks.
Thirty minutes. That is all it takes to go from "we probably have some stale flags" to a prioritized, ticketed cleanup plan backed by real data. The flags hiding in your codebase are not going to clean themselves up. But now you know exactly where they are, how old they are, who owns them, and what it will take to remove them.
The only question left is when you start. Set a timer. Open a terminal. The audit begins now.