Improve extract performance via ignoring directory early during os.walk by stkao05 · Pull Request #694 · python-babel/babel

stkao05 · 2020-02-11T07:14:44Z

Currently, the extraction code uses os.walk to perform a deep file search. This traversal can be very slow when the tree contains deep directories with many files. Even if you have marked some directories as ignored in the mapping file, os.walk still descends into them.

This PR improves extraction performance by skipping ignored directories early during os.walk, so they are never explored.
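
For illustration, here is a minimal sketch of the general technique: pruning `dirnames` in place so that a top-down os.walk never descends into ignored directories. This is not the code from this PR. The helper names `is_ignored_dir` and `walk_with_pruning` are hypothetical, and `fnmatch.fnmatchcase` is only a stand-in for Babel's own pattern matching.

```python
import fnmatch
import os
import tempfile


def is_ignored_dir(root, dirpath, dirname, patterns):
    """Return True if the directory, taken relative to root, matches a pattern.

    Hypothetical helper: fnmatchcase stands in for Babel's real matcher.
    """
    rel = os.path.relpath(os.path.join(dirpath, dirname), root)
    rel = rel.replace(os.sep, '/')
    return any(fnmatch.fnmatchcase(rel, pattern) for pattern in patterns)


def walk_with_pruning(root, ignore_patterns):
    """Yield file paths relative to root, never entering ignored directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment edits the list in place. In top-down mode os.walk
        # only recurses into the entries that remain in dirnames.
        dirnames[:] = sorted(
            d for d in dirnames
            if not is_ignored_dir(root, dirpath, d, ignore_patterns)
        )
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            yield rel.replace(os.sep, '/')


# Small demo tree: one source file and one file buried in node_modules.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'src'))
    os.makedirs(os.path.join(tmp, 'node_modules', 'lib'))
    open(os.path.join(tmp, 'src', 'app.py'), 'w').close()
    open(os.path.join(tmp, 'node_modules', 'lib', 'index.js'), 'w').close()

    found = list(walk_with_pruning(tmp, ['node_modules']))
    found_all = sorted(walk_with_pruning(tmp, []))
```

With the `node_modules` pattern, the walk yields only the file under `src`. With no patterns, both files are visited.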

stkao05 · 2020-02-11T07:19:21Z

A real-life scenario I have experienced

When you work on a front-end project, you typically have a node_modules directory in your codebase that contains the source code of all third-party libraries (similar to Python's /site-packages/), and this directory is typically very large.

codecov-io · 2020-02-11T07:19:40Z

Codecov Report

Merging #694 into master will decrease coverage by 0.03%.
The diff coverage is 77.77%.

@@            Coverage Diff             @@
##           master     #694      +/-   ##
==========================================
- Coverage   90.97%   90.94%   -0.04%     
==========================================
  Files          24       24              
  Lines        4176     4184       +8     
==========================================
+ Hits         3799     3805       +6     
- Misses        377      379       +2

Impacted Files	Coverage Δ
babel/messages/extract.py	`94.38% <77.77%> (-0.56%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0cfa69e...6ada6ee.

stkao05 · 2020-05-21T13:16:33Z

Ping @akx

akx · 2020-10-01T07:38:32Z

babel/messages/extract.py

+        if dirname.startswith('.') or dirname.startswith('_'):
+            return False
+
+        absdir = os.path.join(root, dirname).replace(os.sep, '/')


Since the logic for ignoring filenames uses

filename = relpath(filepath, dirpath)

I think this should also use a relative path to the root. Otherwise this might end up ignoring paths that happen to contain an ignored fragment outside the relative root.

That is, if your project lives in /foobars/myproject/, and you've ignored *foobar* (as it has a special meaning within the myproject directory), and you invoke Babel from within /foobars/myproject/, this would ignore all files.
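
The concern can be reproduced with a small sketch. It assumes `fnmatch.fnmatchcase` as a stand-in for Babel's matcher, and the `src` and `foobar_assets` directory names are hypothetical. Only the `/foobars/myproject` root and the `*foobar*` pattern come from the example above.

```python
import fnmatch
import posixpath


def matches(path, pattern):
    # Stand-in matcher (assumption): Babel uses its own path matching.
    return fnmatch.fnmatchcase(path, pattern)


root = '/foobars/myproject'
pattern = '*foobar*'

# A directory that should NOT be ignored (hypothetical name).
candidate = posixpath.join(root, 'src')
# A directory the pattern is actually meant to catch (hypothetical name).
intended = posixpath.join(root, 'foobar_assets')

# Matching the absolute path: the parent directory "/foobars" already
# contains the fragment, so every directory under root looks ignored.
abs_match = matches(candidate, pattern)

# Matching the path relative to root: only the project's own layout counts.
rel_match = matches(posixpath.relpath(candidate, root), pattern)
intended_match = matches(posixpath.relpath(intended, root), pattern)
```

Here `abs_match` is true even though `src` has nothing to do with the pattern, while the relative form ignores only `foobar_assets`.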

stkao05 · 2020-12-18

Thanks for the feedback. I just applied your suggestion in 252323a.

akx · 2022-01-28T15:13:38Z

Hi @stkao05 – #832 landed today, so this would need to be rebased :)

stkao05 requested a review from akx February 11, 2020 10:40

akx requested changes Oct 1, 2020

Improve extract performance via ignoring directory early during os.walk

3a8587a

stkao05 force-pushed the faster_extract branch from 6ada6ee to 3a8587a Compare December 18, 2020 06:40

Ingore dir base on relative path pattern

252323a

stkao05 requested a review from akx December 18, 2020 12:52

akx mentioned this pull request Jan 27, 2022

Implement directory filter for extract #832

Merged

akx mentioned this pull request Feb 7, 2023

Improve extract performance via ignoring directories early during os.walk #968

Merged

akx closed this in #968 Dec 29, 2025

stkao05 wants to merge 2 commits into python-babel:master from stkao05:faster_extract

3 participants
