-
-
Notifications
You must be signed in to change notification settings - Fork 33.9k
gh-96346: Use double caching for re._compile() #96347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
9151549
5fe3232
d619f6d
077e6dd
bd618bf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -229,6 +229,7 @@ def compile(pattern, flags=0): | |
| def purge(): | ||
| "Clear the regular expression caches" | ||
| _cache.clear() | ||
| _cache2.clear() | ||
| _compile_repl.cache_clear() | ||
|
|
||
| def template(pattern, flags=0): | ||
|
|
@@ -266,40 +267,61 @@ def escape(pattern): | |
| # -------------------------------------------------------------------- | ||
| # internals | ||
|
|
||
| _cache = {} # ordered! | ||
|
|
||
| # Use the fact that dict keeps the insertion order. | ||
| # _cache2 uses the simple FIFO policy which has better latency. | ||
| # _cache uses the LRU policy which has better hit rate. | ||
| # OrderedDict is not used because it adds a new dependence, and | ||
| # performance difference is negligible. | ||
| _cache = {} # LRU | ||
| _cache2 = {} # FIFO | ||
| _MAXCACHE = 512 | ||
| _MAXCACHE2 = 256 # Must be less than _MAXCACHE. | ||
serhiy-storchaka marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| def _compile(pattern, flags): | ||
| # internal: compile pattern | ||
| if isinstance(flags, RegexFlag): | ||
| flags = flags.value | ||
| try: | ||
| return _cache[type(pattern), pattern, flags] | ||
| return _cache2[type(pattern), pattern, flags] | ||
| except KeyError: | ||
| pass | ||
| if isinstance(pattern, Pattern): | ||
| if flags: | ||
| raise ValueError( | ||
| "cannot process flags argument with a compiled pattern") | ||
| return pattern | ||
| if not _compiler.isstring(pattern): | ||
| raise TypeError("first argument must be string or compiled pattern") | ||
| if flags & T: | ||
| import warnings | ||
| warnings.warn("The re.TEMPLATE/re.T flag is deprecated " | ||
| "as it is an undocumented flag " | ||
| "without an obvious purpose. " | ||
| "Don't use it.", | ||
| DeprecationWarning) | ||
| p = _compiler.compile(pattern, flags) | ||
| if not (flags & DEBUG): | ||
|
|
||
| key = (type(pattern), pattern, flags) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can't this be defined before the I also wonder if an
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All this would add a small but measurable overhead in the common case. I tested this when I wrote the current implementation. |
||
| p = _cache.pop(key, None) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why does it remove it from the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because we need to move it to the end. |
||
| if p is None: | ||
| if isinstance(pattern, Pattern): | ||
| if flags: | ||
| raise ValueError( | ||
| "cannot process flags argument with a compiled pattern") | ||
| return pattern | ||
| if not _compiler.isstring(pattern): | ||
| raise TypeError("first argument must be string or compiled pattern") | ||
| if flags & T: | ||
| import warnings | ||
| warnings.warn("The re.TEMPLATE/re.T flag is deprecated " | ||
| "as it is an undocumented flag " | ||
| "without an obvious purpose. " | ||
| "Don't use it.", | ||
| DeprecationWarning) | ||
| p = _compiler.compile(pattern, flags) | ||
| if flags & DEBUG: | ||
| return p | ||
| if len(_cache) >= _MAXCACHE: | ||
| # Drop the oldest item | ||
| # Drop the least recently used item | ||
| try: | ||
| del _cache[next(iter(_cache))] | ||
serhiy-storchaka marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| except (StopIteration, RuntimeError, KeyError): | ||
| pass | ||
| _cache[type(pattern), pattern, flags] = p | ||
| # Append to the end | ||
| _cache[key] = p | ||
|
|
||
| if len(_cache2) >= _MAXCACHE2: | ||
| # Drop the oldest item | ||
| try: | ||
| del _cache2[next(iter(_cache2))] | ||
| except (StopIteration, RuntimeError, KeyError): | ||
| pass | ||
| _cache2[key] = p | ||
| return p | ||
|
|
||
| @functools.lru_cache(_MAXCACHE) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Use double caching for compiled RE patterns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would omit the OrderedDict part of the comment. It is debatable and doesn't need to be in the code. The important part is the two lines before that explain the two caches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an explanation why OrderedDict is not used at first place, no?
Also, I was afraid that some new contributor passing through the code and not aware about the history of this code can submit a PR with an "obvious" improvement, and it can be merged while I am not here. The history of this code contains many changes and reversions.
I removed this, and hope the new comment about
next(iter(_cache))will be enough.