perf: some python optimizations #21
Draft
jonmmease wants to merge 5 commits into fix/reindent-quadratic-perf
Replace the per-position inner loop over ~43 individual regex patterns with a single combined alternation regex using named groups. This eliminates millions of Python-level re.Pattern.match calls on large inputs. The dollar-quoted string pattern (which uses a backreference) is handled separately since backreferences break in combined alternation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
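A minimal sketch of the combined-alternation idea, not the code from this commit. The four-entry token table, the `tokenize` function, and the `DOLLAR_STRING`/`ERROR` labels are hypothetical stand-ins for the real lexer's ~43 patterns. The dollar-quoted pattern stays separate because its numeric backreference `\1` would point at a different group once patterns are joined into one alternation.

```python
import re

# Hypothetical token table, far smaller than the real lexer's pattern list.
TOKEN_SPECS = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("WS", r"\s+"),
    ("PUNCT", r"[(),;*=]"),
]

# One alternation with a named group per token kind, compiled once.
COMBINED = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPECS))

# Kept out of the alternation: \1 refers to group 1 of *this* pattern only.
DOLLAR_QUOTED = re.compile(r"\$(\w*)\$.*?\$\1\$", re.DOTALL)


def tokenize(text):
    """Yield (kind, value) pairs using one regex call per position."""
    pos = 0
    while pos < len(text):
        match, kind = None, None
        if text[pos] == "$":
            match = DOLLAR_QUOTED.match(text, pos)
            kind = "DOLLAR_STRING"
        if match is None:
            match = COMBINED.match(text, pos)
            # lastgroup names the alternative that matched.
            kind = match.lastgroup if match else None
        if match is None:
            yield ("ERROR", text[pos])
            pos += 1
        else:
            yield (kind, match.group())
            pos = match.end()
```

Each position costs one or two `match` calls here regardless of how many token kinds exist, which is the effect the commit message describes.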
Replace O(n) insert_before + del mutations in _split_kwds and _split_statements with a two-pass scan-then-rebuild approach. Pass 1 scans the unmodified token list to collect all edit operations. Pass 2 builds the new token list in a single O(n) sweep.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
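The scan-then-rebuild pattern can be sketched roughly as follows. This is an illustration on plain strings, assuming the only edit is "insert a newline before certain tokens"; the function name `insert_newlines_before` and the `is_split_point` predicate are hypothetical and do not mirror `_split_kwds` or `_split_statements`.

```python
def insert_newlines_before(tokens, is_split_point):
    """Return a new list with "\n" inserted before each split point."""
    # Pass 1: scan the unmodified list and record where edits belong.
    edit_positions = {
        i for i, tok in enumerate(tokens) if i > 0 and is_split_point(tok)
    }

    # Pass 2: build the result in one sweep. Each append is amortized O(1),
    # whereas list.insert in the middle of a list shifts every later element.
    rebuilt = []
    for i, tok in enumerate(tokens):
        if i in edit_positions:
            rebuilt.append("\n")
        rebuilt.append(tok)
    return rebuilt
```

Because pass 1 never mutates the list, the recorded indices stay valid, and k edits on n tokens cost O(n + k) rather than O(n * k).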
Make TokenList.value a computed property instead of a stale cache. Previously __init__ called str(self) which flattened all descendants O(n). Now value is computed lazily only when accessed.
- Add value property to TokenList that computes from children
- Use str(token) instead of token.value at sites with TokenList args
- Fix StripCommentsFilter to not recurse into Comment groups (previously relied on stale .value cache)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
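A simplified illustration of a computed `value` property, assuming stripped-down `Token` and `TokenList` classes that are hypothetical and much smaller than the real ones in sql.py:

```python
class Token:
    """Leaf token holding its own text."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value


class TokenList:
    """Group of tokens whose text is derived from its children on demand."""

    def __init__(self, tokens):
        # No str(self) call here, so construction does not flatten descendants.
        self.tokens = list(tokens)

    @property
    def value(self):
        # Computed on each access, so it cannot go stale when children change.
        return "".join(str(tok) for tok in self.tokens)

    def __str__(self):
        return self.value
```

The trade-off in this sketch is that every access to `value` walks the subtree, so the approach pays off only when groups are constructed far more often than their text is read.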
Replace O(n) list slicing in group_tokens with None sentinels. Consumed tokens are set to None instead of deleted, keeping list length stable. Compact all sentinels after each grouping pass.
- group_tokens: use None sentinels instead of slice assignment
- Reverse-scan optimization for extend case avoids O(n^2)
- _compact_all: recursive cleanup after each pass in group()
- Add None guards in _token_matching, flatten, get_sublists
- _group_matching/_group: use live-list iteration with None skips
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
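The sentinel approach might look roughly like this on a flat list. The helpers `group_range` and `compact` are hypothetical names for illustration; the real `group_tokens` and `_compact_all` also handle nesting and the extend case, which this sketch omits.

```python
def group_range(tokens, start, end, make_group):
    """Collapse tokens[start:end] into one group without resizing the list."""
    members = [tok for tok in tokens[start:end] if tok is not None]
    tokens[start] = make_group(members)
    # Mark consumed slots instead of deleting them; no elements are shifted,
    # so indices held by the caller remain valid during the pass.
    for i in range(start + 1, end):
        tokens[i] = None


def compact(tokens):
    """Drop all sentinels in place with a single O(n) sweep."""
    tokens[:] = [tok for tok in tokens if tok is not None]
```

Any code that iterates the list between `group_range` and `compact` has to skip `None` entries, which is why the commit adds guards to the lookup helpers.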
Replace generic imt() dispatch with inline isinstance/ttype checks at hot call sites in grouping.py and sql.py. Remove imt import from grouping.py. imt() is kept in utils.py for cold paths and external use.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
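To illustrate the kind of change described, here is a sketch comparing a generic matcher with an equivalent inline check. `generic_match` is a hypothetical stand-in and does not reproduce the real `imt()` signature or semantics; the `Token` and `Identifier` classes and the string ttypes are likewise simplified assumptions.

```python
class Token:
    def __init__(self, ttype, value):
        self.ttype = ttype
        self.value = value


class Identifier(Token):
    pass


def generic_match(token, classes=(), ttypes=()):
    """Hypothetical generic helper: match by class or by token type."""
    if token is None:
        return False
    if classes and isinstance(token, classes):
        return True
    return token.ttype in ttypes


def count_generic(tokens):
    # One extra function call plus keyword-argument handling per token.
    return sum(
        1 for t in tokens
        if generic_match(t, classes=(Identifier,), ttypes=("Name",))
    )


def count_inline(tokens):
    # Same predicate written out at the call site, with no helper call.
    return sum(
        1 for t in tokens
        if t is not None and (isinstance(t, Identifier) or t.ttype == "Name")
    )
```

Both functions return the same count; the inline form only avoids per-token call overhead, so it is worth the duplication only on paths that profiling shows to be hot.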
cc @stevephodgson and @glentakahashi: I'm not convinced this is beneficial enough to be worth a detailed review, but I wanted to at least document it in case you see it differently.
Had Claude do some more performance investigation on top of #18. It found some wins, but nothing that improved benchmarks by more than 33%, so I'm on the fence about whether it's worth the review effort. I wanted to document it here regardless.
Summary
Five independent pure-Python performance optimizations, each in its own commit for easy bisecting:
- (lexer.py): Replace per-pattern inner loop with single alternation regex. Dollar-quoted strings handled separately due to backreference.
- (reindent.py): Replace repeated list.insert/del mutations with two-pass scan-then-rebuild.
- (sql.py): Eliminate O(n) flatten during construction; value becomes a computed @property.
- (sql.py, grouping.py): Use None sentinels instead of O(n) list slicing per group operation; compact once per pass.
- (grouping.py, sql.py): Inline isinstance/ttype checks at hot call sites to eliminate function call overhead.
Benchmark results (base → optimized)
Biggest wins are on large flat-list benchmarks where OPT-8 (deferred compaction) and OPT-13 (batch reindent) eliminate O(n²) behavior.
🤖 Generated with Claude Code