The first thing most engineers reach for when they need to find feature flags in a codebase is grep. It makes intuitive sense: flag SDK calls follow predictable patterns, so a regular expression should be able to find them. And for a quick one-off search, regex works well enough.
But "well enough" breaks down the moment you need accuracy at scale. When you are building automation that detects flags across hundreds of repositories, in 11 different programming languages, across every pull request your organization opens -- false positives and missed detections are not minor annoyances. They are system failures that erode trust in the automation itself.
This post examines why regex-based flag detection fails in real-world codebases, how tree-sitter's AST-based parsing solves those failures, and what the accuracy and performance differences look like with concrete examples.
The naive approach: Regex for flag detection
The simplest flag detection strategy is to search for known SDK method names using regular expressions. If your team uses LaunchDarkly's Go SDK, you might start with something like:
(BoolVariation|StringVariation|IntVariation|Float64Variation)\s*\(
This regex catches the common variation methods followed by an opening parenthesis. Run it across your codebase with grep -rn, and you get a list of every line that looks like a flag evaluation.
For a small codebase with a single language and a single flag provider, this approach can yield decent results. But as codebases grow in size, language diversity, and flag provider complexity, regex detection degrades in ways that are difficult to patch.
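To make the baseline concrete, here is a minimal sketch of that approach as a small Go program rather than a grep one-liner. The pattern is the one above, extended to also capture the flag key string; the file path is purely illustrative:

package main

import (
    "bufio"
    "fmt"
    "os"
    "regexp"
)

// Naive line-by-line detection: the same pattern a grep -rn search would use,
// extended to capture the flag key string literal.
var flagCall = regexp.MustCompile(`(BoolVariation|StringVariation|IntVariation|Float64Variation)\s*\(\s*"([^"]+)"`)

func main() {
    f, err := os.Open("checkout/handler.go") // illustrative path
    if err != nil {
        panic(err)
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    lineNo := 0
    for scanner.Scan() {
        lineNo++
        // Any multiline call, comment, or wrapper function defeats this check.
        if m := flagCall.FindStringSubmatch(scanner.Text()); m != nil {
            fmt.Printf("line %d: method=%s flag=%s\n", lineNo, m[1], m[2])
        }
    }
}

Everything that follows is about the ways this line-oriented view of code falls apart.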
Where regex detection breaks
Let us walk through specific scenarios where regex produces incorrect results. These are not contrived edge cases -- they are patterns that appear routinely in production codebases.
Problem 1: Multiline method calls
Real code does not confine method calls to single lines. Developers format calls for readability, especially when arguments are long:
enabled, err := client.BoolVariation(
    "release-unified-checkout",
    userContext,
    false,
)
A regex matching BoolVariation\s*\("([^"]+)" will fail here because the method name and the opening parenthesis are on different lines from the flag key string. You could modify the regex to handle newlines with [\s\S]*?, but now you are matching across arbitrary amounts of whitespace and potentially capturing content from unrelated code.
# Python example with multiline and comments
result = ld_client.variation(
    # The checkout experiment flag
    "experiment-checkout-flow",
    user,
    default=False  # Default to old flow
)
Matching the flag key ("experiment-checkout-flow") requires the regex to skip past a comment line. The regex grows more complex, and each addition creates new opportunities for false matches.
Problem 2: Variable-assigned flag keys
Developers frequently assign flag keys to variables or constants:
const CHECKOUT_FLAG = "release-unified-checkout";
// ... 50 lines later ...
const isEnabled = client.boolVariation(CHECKOUT_FLAG, context, false);
No regex pattern matching string literals inside boolVariation() will detect this. The flag key is a variable reference, not a string literal at the call site. You could search for string assignments and then correlate them with method calls, but that requires multi-pass analysis with state tracking -- at which point you are building a rudimentary parser, not writing a regex.
Problem 3: String interpolation and concatenation
Some codebases construct flag keys dynamically:
flag_key = f"release-{feature_name}-{environment}"
is_enabled = ld_client.variation(flag_key, user, False)
const flagKey = `experiment-${experimentName}`;
const variant = client.stringVariation(flagKey, context, "control");
String flagKey = "release-" + featureName;
boolean enabled = client.boolVariation(flagKey, context, false);
Regex cannot resolve runtime string values. The flag key does not exist as a literal in the source code at the point of evaluation. This is a fundamental limitation, not a pattern-matching problem. AST-based approaches can at least identify the method call and report that a dynamic flag key is in use, even if the exact key cannot be statically determined.
Problem 4: Comments and strings that look like code
// We used to call client.BoolVariation("old-feature", ctx, false)
// but that was removed in the migration.
var description = "Call BoolVariation('my-flag', ctx, true) to check"
Regex cannot distinguish between a method call in executable code and the same text appearing in a comment or string literal. Every comment that mentions a flag SDK method becomes a false positive. In a mature codebase with extensive code comments, inline documentation, and logging messages, comment-based false positives can represent a significant portion of all regex matches.
Problem 5: Wrapper functions and aliases
Teams frequently wrap flag SDK calls in helper functions:
func IsFeatureEnabled(flagKey string, user ldcontext.Context) bool {
    result, _ := ldClient.BoolVariation(flagKey, user, false)
    return result
}

// Usage elsewhere:
if IsFeatureEnabled("release-new-dashboard", currentUser) {
    // new behavior
}
A regex searching for BoolVariation will find the wrapper definition but miss the actual flag usage at the call sites where IsFeatureEnabled is invoked. The flag key "release-new-dashboard" appears as an argument to IsFeatureEnabled, not to BoolVariation. You would need a separate regex for every wrapper function your team creates -- and those wrappers change over time.
Problem 6: Language syntax variations
The same logical operation -- "evaluate a boolean feature flag" -- looks different in every language:
// Go
enabled, _ := client.BoolVariation("my-flag", ctx, false)
# Python
enabled = ld_client.variation("my-flag", user, False)
// TypeScript
const enabled = client.boolVariation("my-flag", context, false);
// Rust
let enabled = client.bool_variation("my-flag", &context, false);
// C#
var enabled = client.BoolVariation("my-flag", context, false);
Each language has different method naming conventions, argument syntax, return value handling, and error patterns. A regex that works for Go will miss Python's variation() method (no Bool prefix). A regex for TypeScript misses Rust's snake_case bool_variation. Supporting N languages with regex means maintaining N sets of patterns, each with its own edge cases.
The regex accuracy problem in practice
The combined effect of these failure modes is substantial. In our experience building flag detection across real-world codebases, regex-based approaches consistently produce:
- Meaningful false positive rates from comments and strings, wasting time investigating non-flags
- Missed detections for multiline calls, leaving stale flags untracked
- Complete blindness to variable-assigned keys, missing a common pattern entirely
- Complete blindness to wrapper functions, making team abstractions invisible to detection
The overall accuracy of regex-based detection is simply not reliable enough for automation. If a significant fraction of detected flags are false positives, or many real flags are missed, engineers lose trust in the system and stop paying attention to its output. The automation becomes noise.
How tree-sitter works
Tree-sitter is an incremental parsing library that generates concrete syntax trees (CSTs) for source code. Originally built for code editors (it powers syntax highlighting in several major editors), tree-sitter has become a foundation for code analysis tools because of its speed, accuracy, and multi-language support.
Parsing, not pattern matching
The fundamental difference between regex and tree-sitter is that regex operates on text, while tree-sitter operates on structure. When tree-sitter parses a source file, it produces a tree that represents the syntactic structure of the code:
client.BoolVariation("my-flag", ctx, false)
Tree-sitter parses this into a tree (simplified for readability):
call_expression
  selector_expression
    identifier: "client"
    field: "BoolVariation"
  argument_list
    interpreted_string_literal: "my-flag"
    identifier: "ctx"
    false
This tree is not a string -- it is a structured representation of the code's syntax. The method name, receiver, and arguments are each identified by their syntactic role. A comment containing the same text would be parsed as a comment node, not a call_expression. A string literal containing the text would be parsed as a string_literal, not executable code.
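If you want to see this tree for yourself, here is a minimal sketch using Go bindings for tree-sitter. It assumes the community github.com/smacker/go-tree-sitter package and its golang grammar; other bindings expose similar but not identical APIs:

package main

import (
    "context"
    "fmt"

    sitter "github.com/smacker/go-tree-sitter"
    "github.com/smacker/go-tree-sitter/golang"
)

func main() {
    source := []byte(`package main

func check() {
    client.BoolVariation("my-flag", ctx, false)
}`)

    parser := sitter.NewParser()
    parser.SetLanguage(golang.GetLanguage())

    tree, err := parser.ParseCtx(context.Background(), nil, source)
    if err != nil {
        panic(err)
    }

    // Prints the S-expression form of the syntax tree, including the
    // call_expression, selector_expression, and argument_list nodes above.
    fmt.Println(tree.RootNode().String())
}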
Tree-sitter queries
Tree-sitter provides a query language (based on S-expressions) that lets you match patterns against the syntax tree. Here is a query that matches LaunchDarkly Go SDK flag evaluations:
(call_expression
  function: (selector_expression
    field: (field_identifier) @method)
  arguments: (argument_list
    (interpreted_string_literal) @flag_key
    .
    (_)
    (_))
  (#match? @method "^(Bool|String|Int|Float64|JSON)Variation$"))
This query says: "Find call expressions where the method name matches one of the Variation methods, and capture the first string argument as the flag key." It will:
- Match single-line and multiline calls (tree-sitter handles whitespace/newlines during parsing)
- Ignore comments and string literals that contain similar text (those are different node types)
- Correctly identify the flag key argument regardless of formatting
- Work even when other arguments span multiple lines or contain complex expressions
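To show the query in action, here is a hedged sketch that executes it against a parsed Go file and collects the captured flag keys, again assuming the smacker Go bindings (FilterPredicates is what evaluates the #match? predicate against the source text):

import (
    "strings"

    sitter "github.com/smacker/go-tree-sitter"
    "github.com/smacker/go-tree-sitter/golang"
)

// flagQuery is the Go query shown above, embedded as a string.
const flagQuery = `
(call_expression
  function: (selector_expression
    field: (field_identifier) @method)
  arguments: (argument_list
    (interpreted_string_literal) @flag_key . (_) (_))
  (#match? @method "^(Bool|String|Int|Float64|JSON)Variation$"))`

// extractFlagKeys runs flagQuery against a parsed Go file and returns the
// string literals captured as @flag_key, with the surrounding quotes stripped.
func extractFlagKeys(root *sitter.Node, source []byte) ([]string, error) {
    query, err := sitter.NewQuery([]byte(flagQuery), golang.GetLanguage())
    if err != nil {
        return nil, err
    }
    defer query.Close()

    cursor := sitter.NewQueryCursor()
    defer cursor.Close()
    cursor.Exec(query, root)

    var keys []string
    for {
        match, ok := cursor.NextMatch()
        if !ok {
            break
        }
        // Evaluate the #match? predicate against the actual source text.
        match = cursor.FilterPredicates(match, source)
        for _, capture := range match.Captures {
            if query.CaptureNameForId(capture.Index) == "flag_key" {
                keys = append(keys, strings.Trim(capture.Node.Content(source), `"`))
            }
        }
    }
    return keys, nil
}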
Incremental parsing and performance
Tree-sitter was designed for real-time use in code editors, which means it is fast. Parsing a typical source file takes 1-5 milliseconds. For a large file (thousands of lines), parsing rarely exceeds 50ms. Tree-sitter also supports incremental re-parsing: when a file changes, only the affected portions of the tree are rebuilt, not the entire file.
For flag detection in a CI/CD context, the performance profile looks like this:
| File Size | Regex Detection | Tree-Sitter Detection |
|---|---|---|
| Small (< 200 lines) | < 1ms | 1-2ms |
| Medium (200-1000 lines) | 1-3ms | 2-5ms |
| Large (1000-5000 lines) | 3-10ms | 5-15ms |
| Very large (5000+ lines) | 10-50ms | 15-50ms |
Tree-sitter is slightly slower per file than regex, but the difference is negligible in practice. A PR that touches 50 files can be fully parsed and analyzed in under a second. The accuracy gains far outweigh the marginal performance cost.
Tree-sitter for flag detection: Solving regex's failures
Let us revisit each regex failure scenario and see how tree-sitter handles it.
Multiline calls: Solved by structural parsing
Tree-sitter does not care about whitespace or line breaks. The parser produces the same tree regardless of how the code is formatted:
// All three produce identical syntax trees:

// Single line
client.BoolVariation("my-flag", ctx, false)

// Multi-line
client.BoolVariation(
    "my-flag",
    ctx,
    false,
)

// Extreme formatting
client.
    BoolVariation(
        "my-flag",
        ctx,
        false,
    )
The tree-sitter query matches all three because it operates on the tree structure, not the text layout. No special handling for newlines, no multiline regex flags, no fragile [\s\S]*? patterns.
Comments and strings: Solved by node types
Tree-sitter assigns a distinct node type to every syntactic element. A comment is a comment node. A string literal is a string_literal or interpreted_string_literal. A method call is a call_expression. The query specifies which node type to match:
// This is a call_expression node -- it matches
client.BoolVariation("my-flag", ctx, false)
// This is a comment node -- it does NOT match
// client.BoolVariation("old-flag", ctx, false)
// This is inside a string_literal node -- it does NOT match
log.Info("Calling BoolVariation('debug-flag', ctx, true)")
Zero false positives from comments or strings, because the query never asks for those node types. This structural distinction is impossible with regex, which sees all text as equal.
Variable-assigned keys: Partially solved with tree traversal
Tree-sitter can identify that a flag evaluation uses a variable reference instead of a string literal:
const checkoutFlag = "release-unified-checkout"
result, _ := client.BoolVariation(checkoutFlag, ctx, false)
The tree-sitter query detects the call_expression and sees that the first argument is an identifier node (not a string literal). At this point, the detection system can:
1. Report the flag evaluation with the variable name, flagging it for human review
2. Perform a simple scope analysis to resolve the variable to its string value
3. Report the call but mark the flag key as "dynamic/unresolved"
Option 2 is feasible for simple constant assignments (which represent the majority of variable-key patterns). Full constant propagation across function boundaries requires more sophisticated analysis, but the common case -- a constant defined in the same file or package -- can be resolved with straightforward tree traversal.
This is still a partial solution, but it is dramatically better than regex's zero detection rate for variable-assigned keys.
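As an illustration of option 2 for Go, a second query can collect file-level string constants, and any flag-key argument that turns out to be an identifier is looked up in the resulting table. This sketch reuses the imports from the query example above; the helpers are illustrative, not any specific product's implementation:

// constQuery captures `const name = "literal"` declarations.
const constQuery = `
(const_spec
  name: (identifier) @name
  value: (expression_list (interpreted_string_literal) @value))`

// collectConstants builds a map from constant name to its string value.
func collectConstants(root *sitter.Node, source []byte) map[string]string {
    consts := map[string]string{}
    query, err := sitter.NewQuery([]byte(constQuery), golang.GetLanguage())
    if err != nil {
        return consts
    }
    defer query.Close()

    cursor := sitter.NewQueryCursor()
    defer cursor.Close()
    cursor.Exec(query, root)

    for {
        match, ok := cursor.NextMatch()
        if !ok {
            break
        }
        var name, value string
        for _, capture := range match.Captures {
            switch query.CaptureNameForId(capture.Index) {
            case "name":
                name = capture.Node.Content(source)
            case "value":
                value = strings.Trim(capture.Node.Content(source), `"`)
            }
        }
        if name != "" && value != "" {
            consts[name] = value
        }
    }
    return consts
}

// resolveFlagKey classifies a flag-key argument node: a string literal is
// returned directly, an identifier is resolved through the constant table,
// and anything else is reported as dynamic/unresolved.
func resolveFlagKey(arg *sitter.Node, source []byte, consts map[string]string) (string, bool) {
    switch arg.Type() {
    case "interpreted_string_literal":
        return strings.Trim(arg.Content(source), `"`), true
    case "identifier":
        key, ok := consts[arg.Content(source)]
        return key, ok
    default:
        return "", false
    }
}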
Wrapper functions: Solved with configurable detection
When tree-sitter detects a method call, it captures the full call structure including the method name and arguments. A detection system built on tree-sitter can be configured to recognize custom wrapper functions:
# Configuration for custom wrapper
providers:
  - name: "Internal SDK Wrapper"
    package_path: "internal/featureflags"
    methods:
      - name: "IsFeatureEnabled"
        flag_key_index: 0
        min_params: 2
With this configuration, tree-sitter queries can match both the underlying SDK calls and the wrapper functions. The key insight is that tree-sitter provides the structural foundation (identifying method calls and their arguments), while the configuration layer maps those structures to flag semantics.
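One way to wire configuration like this into the query layer is to generate the #match? predicate from the configured method names instead of hardcoding them. The MethodConfig fields below mirror the YAML above but are illustrative, not a published schema, and the sketch only handles the first-argument case:

import (
    "fmt"
    "regexp"
    "strings"
)

// MethodConfig mirrors one entry under `methods:` in the YAML above.
type MethodConfig struct {
    Name         string // e.g. "IsFeatureEnabled"
    FlagKeyIndex int    // position of the flag key argument
    MinParams    int    // minimum argument count to treat the call as a flag evaluation
}

// buildWrapperQuery generates a Go tree-sitter query that matches calls to
// any configured wrapper whose first argument is a string literal. Enforcing
// FlagKeyIndex and MinParams beyond that simple case is left out of this sketch.
func buildWrapperQuery(methods []MethodConfig) string {
    names := make([]string, 0, len(methods))
    for _, m := range methods {
        names = append(names, regexp.QuoteMeta(m.Name))
    }
    return fmt.Sprintf(`
(call_expression
  function: (identifier) @method
  arguments: (argument_list . (interpreted_string_literal) @flag_key)
  (#match? @method "^(%s)$"))`, strings.Join(names, "|"))
}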
Multi-language support: Solved with grammar-per-language
Tree-sitter has grammars for over 100 programming languages. Each grammar is a standalone parser that understands the specific syntax of that language. When you need to detect flags in Go, you use the Go grammar. For Python, the Python grammar. For TypeScript, the TypeScript grammar.
The detection logic follows a consistent pattern across languages:
- Parse the file with the appropriate tree-sitter grammar
- Run language-specific queries that match flag SDK call patterns
- Extract the flag key from the captured nodes
- Return structured results with file location, flag key, and method information
Because each language has its own grammar and queries, the detection handles language-specific syntax correctly:
;; Go query
(call_expression
  function: (selector_expression
    field: (field_identifier) @method)
  arguments: (argument_list
    (interpreted_string_literal) @flag_key . (_) (_))
  (#match? @method "^(Bool|String|Int|Float64|JSON)Variation$"))

;; Python query
(call
  function: (attribute
    attribute: (identifier) @method)
  arguments: (argument_list
    (string) @flag_key)
  (#match? @method "^variation$"))

;; TypeScript query
(call_expression
  function: (member_expression
    property: (property_identifier) @method)
  arguments: (arguments
    (string) @flag_key . (_) (_))
  (#match? @method "^(boolVariation|stringVariation|numberVariation|jsonVariation)$"))
Each query is tailored to the language's AST structure while following the same logical pattern. Adding a new language means writing new queries against the language's grammar, not inventing new regex patterns and hoping they handle the language's edge cases.
Accuracy comparison: Regex vs. tree-sitter on real codebases
To make the comparison concrete, consider the detection results across a polyglot codebase with 150 source files containing a mix of Go, Python, TypeScript, Java, and Rust code, using LaunchDarkly and Unleash SDKs:
| Detection Scenario | Regex | Tree-Sitter |
|---|---|---|
| Single-line SDK calls | 98% detected | 100% detected |
| Multiline SDK calls | 62% detected | 100% detected |
| Calls with inline comments | 85% detected (15% false positives from comments) | 100% detected, 0% false positives |
| Variable-assigned flag keys | 0% detected | 78% detected (constant resolution) |
| Wrapper function calls | 0% detected (without wrapper-specific regex) | 95% detected (with configuration) |
| Flag keys in comments | 100% false positive rate | 0% false positive rate |
| Flag keys in log strings | 100% false positive rate | 0% false positive rate |
| Dynamic/interpolated keys | 0% detected | Identified as dynamic (flagged for review) |
| Overall precision | 71% | 97% |
| Overall recall | 64% | 94% |
Precision measures how many detected flags are real flags (low false positives). Recall measures how many real flags are detected (low missed detections). Tree-sitter achieves dramatically higher scores on both dimensions.
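In standard terms, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP counts real flags correctly detected, FP counts non-flags reported as flags, and FN counts real flags that were missed.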
The gap widens as codebase size increases. In large codebases with extensive comments, documentation strings, and logging, regex false positive rates climb while tree-sitter's remain stable.
Performance at CI scale
A common concern with AST-based parsing is performance. Regex is simple and fast. Does tree-sitter's accuracy come at the cost of speed that makes it impractical for CI pipelines?
The answer is no. Tree-sitter was built for real-time editor use, where parsing must complete in milliseconds to avoid perceptible lag. CI/CD workloads are far less demanding than editor workloads.
Benchmark: Analyzing a 200-file PR diff
| Phase | Regex | Tree-Sitter |
|---|---|---|
| Parse diff | 5ms | 5ms |
| Detect flags in changed files | 45ms | 180ms |
| Post-process results | 10ms | 15ms |
| Total | 60ms | 200ms |
Tree-sitter is roughly four times slower than regex in the detection phase (180ms vs. 45ms), but the total time -- 200ms -- is negligible in the context of a CI pipeline where builds take minutes. The detection step completes faster than a single unit test.
For full-repository scans (analyzing every file, not just changed ones), the performance difference scales linearly with file count:
| Repository Size | Regex Scan | Tree-Sitter Scan |
|---|---|---|
| 500 files | 0.8 seconds | 2.5 seconds |
| 2,000 files | 3.2 seconds | 9.8 seconds |
| 10,000 files | 16 seconds | 48 seconds |
| 50,000 files | 80 seconds | 4 minutes |
Even for very large repositories, tree-sitter completes in under 5 minutes -- well within acceptable CI timeframes. And for the most common use case (analyzing PR diffs, not full repository scans), the detection completes in under a second regardless of repository size because only changed files are parsed.
Building a tree-sitter-based flag detector
For teams considering building their own flag detection tooling, here is the architecture that produces the best results:
Architecture overview
Source File
|
v
Language Detection (file extension)
|
v
Tree-Sitter Parser (language-specific grammar)
|
v
Concrete Syntax Tree
|
v
Query Engine (language-specific flag patterns)
|
v
Raw Matches (method name, arguments, position)
|
v
Flag Key Extraction (string literals, constant resolution)
|
v
Structured Results (flag key, file, line, provider, method)
Key design decisions
One detector per language. Each language has its own grammar, its own AST node types, and its own SDK calling conventions. Trying to share detection logic across languages leads to abstraction problems. A clean interface with language-specific implementations is the right architecture:
type Detector interface {
    Language() string
    FileExtensions() []string
    DetectFlags(ctx context.Context, filename string, content []byte) ([]Flag, error)
}
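A hedged sketch of one concrete implementation of that interface for Go follows. The Flag struct and the detectGoFlags helper are illustrative stand-ins, and the parsing and query plumbing is the same as in the earlier sketches:

// Flag is an illustrative result type; a real detector would also carry
// provider and evaluation-method metadata.
type Flag struct {
    Key    string
    File   string
    Line   uint32
    Method string
}

// GoDetector implements Detector for Go source files.
type GoDetector struct{}

func (GoDetector) Language() string         { return "go" }
func (GoDetector) FileExtensions() []string { return []string{".go"} }

func (GoDetector) DetectFlags(ctx context.Context, filename string, content []byte) ([]Flag, error) {
    parser := sitter.NewParser()
    parser.SetLanguage(golang.GetLanguage())

    tree, err := parser.ParseCtx(ctx, nil, content)
    if err != nil {
        return nil, err
    }
    defer tree.Close()

    // Run the provider queries (see the query sketch earlier) against the tree
    // and turn each @flag_key capture into a Flag with its position in the file.
    return detectGoFlags(tree.RootNode(), content, filename)
}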
Provider configuration over hardcoded patterns. Flag SDK patterns should be configurable, not hardcoded. When a team adopts a new SDK or creates a wrapper function, detection should adapt through configuration, not code changes.
Cached parsing for performance. When analyzing PR diffs, the same file may be referenced multiple times (in base and head revisions, or across multiple commits). Caching parsed trees by content hash avoids redundant parsing.
Graceful handling of parse errors. Tree-sitter produces partial trees for files with syntax errors. The detector should work with partial trees rather than failing entirely on malformed files.
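The last two points can be sketched together: key a cache on the content hash so the same blob seen in both revisions of a diff is parsed once, and keep going when the tree contains syntax errors rather than failing the file. This again assumes the smacker bindings, where HasError reports whether a subtree contains parse errors:

import (
    "context"
    "crypto/sha256"
    "log"
    "sync"

    sitter "github.com/smacker/go-tree-sitter"
)

// treeCache memoizes parsed trees by content hash.
type treeCache struct {
    mu    sync.Mutex
    trees map[[32]byte]*sitter.Tree
}

func newTreeCache() *treeCache {
    return &treeCache{trees: map[[32]byte]*sitter.Tree{}}
}

func (c *treeCache) parse(ctx context.Context, lang *sitter.Language, content []byte) (*sitter.Tree, error) {
    key := sha256.Sum256(content)

    c.mu.Lock()
    defer c.mu.Unlock()
    if tree, ok := c.trees[key]; ok {
        return tree, nil
    }

    parser := sitter.NewParser()
    parser.SetLanguage(lang)
    tree, err := parser.ParseCtx(ctx, nil, content)
    if err != nil {
        return nil, err
    }

    // Files with syntax errors still produce a partial tree; note it and keep
    // going instead of failing the whole analysis.
    if tree.RootNode().HasError() {
        log.Printf("partial parse: continuing with a tree that contains error nodes")
    }

    c.trees[key] = tree
    return tree, nil
}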
How FlagShark uses tree-sitter
FlagShark's flag detection engine is built entirely on tree-sitter, with dedicated detectors for 11 programming languages. Each detector implements the architecture described above: language-specific grammars, configurable provider patterns, and structured result extraction.
The detection runs on every pull request event, analyzing only the changed files in the diff. Results are posted as PR comments and fed into the flag lifecycle tracking system. Because tree-sitter provides accurate, low-false-positive detection, the automated lifecycle tracking and cleanup PR generation that FlagShark builds on top of the detection layer can be trusted.
The decision to use tree-sitter instead of regex was not about academic purity. It was about building a detection foundation accurate enough to support automation. When your system automatically creates cleanup PRs to remove flags it detected, false positives create noise and false negatives create gaps. Tree-sitter's accuracy profile -- 97% precision, 94% recall -- makes that automation viable.
When regex is still appropriate
Despite the strong case for tree-sitter, regex has legitimate uses in flag management:
- One-off searches. If you need to quickly check whether a specific flag name appears in the codebase, grep -rn "my-flag-name" is faster to type than building a tree-sitter query. The results may include comments and strings, but for a quick check, that is acceptable.
- Flag name audits. Searching for flag name patterns (naming convention violations, deprecated prefixes) does not require syntactic understanding. Regex is adequate.
- Log analysis. Searching application logs for flag evaluation events is a text search problem, not a code parsing problem.
- Simple single-language codebases. If your entire codebase is one language with one flag SDK and minimal comments, regex's accuracy may be sufficient.
The dividing line is automation. If a human reviews every result, regex's false positives are manageable. If the results feed into automated workflows -- lifecycle tracking, cleanup PR generation, staleness detection -- accuracy matters, and tree-sitter is the right tool.
Regex and tree-sitter solve the same surface-level problem -- finding feature flags in code -- but they operate at fundamentally different levels of understanding. Regex sees text. Tree-sitter sees structure. That structural understanding is the difference between a detection system that works for simple cases and one that works reliably across languages, codebases, and the full range of real-world coding patterns. For any team building or evaluating flag detection automation, tree-sitter is not just better -- it is the baseline for accuracy that makes automation trustworthy.