Improve using // in XPath performance#249
Conversation
5bf858e to
119ed65
Compare
|
I have some concerns about maintainability because cache invalidation bug is hard to debug. Here are some ideas of cache strategy.
def node_test
namespaces_cache = {}
# Cache scope is enclosed in this method.
# There is no need to worry about cache invalidation.
call node.namespaces(namespaces_cache) to use the cache
end
def namespaces(cache = nil)
if cache
cache[self] ||= calculate_namespaces(cache)
else
calculate_namespaces
end
end
def namespaces
parent_namespaces = parent&.namespaces
# If parent.namespaces or @attributes is changed, cache is automatically invalidated
return @namespaces if parent_namespaces.equal?(@parent_namespaces) && @attrubutes.frozen?
@attributes.freeze # Need to unfreeze before modifying @attributes
@parent_namespaces = parent_namespaces
@namespaces = @parent_namespaces.merge(@attributes).freeze
end |
119ed65 to
34cca4c
Compare
For maintainability, I have changed my mind about adopting policy 1 above. |
|
I have a bit concern about adding a I feel that it may a bit ad-hoc approach. I'm not object this approach but can we improve our API? For example, how about keeping a cache per document? class XPathParser
def parse path, nodeset
path_stack = @parser.parse( path )
nodeset.first.document.enable_cache do
# New document level cache is created and available in this block.
# This API is thread unsafe. Users can't change this document in this block.
match( path_stack, nodeset )
end
end
end |
34cca4c to
cea9bb8
Compare
|
I add |
cea9bb8 to
ffe6f9a
Compare
|
I have corrected the points you pointed out. |
ffe6f9a to
24cfc1b
Compare
|
I have corrected the points you pointed out. |
ff07142 to
acf7ecf
Compare
|
I have corrected the points you pointed out. |
#252) `XPath.match`, `XPath.first`, `XPath.each`, `XPathParser#parse` and `XPathParser#match` accepted nodeset as element. This pull request changes the first parameter of these method to be an element instead of nodeset. Passing nodeset will be deprecated. ```ruby # Documented usage. OK REXML::XPath.match(element, xpath) # Undocumented usage. Deprecate in this pull request nodeset = [element] REXML::XPath.match(nodeset, xpath) ``` ### Background #249 will introduce a temporary cache. ```ruby def parse path, nodeset path_stack = @parser.parse( path ) nodeset.first.document.send(:enable_cache) do match( path_stack, nodeset ) end end ``` But the signature `XPathParser#match(path, nodeset)` does not guarantee that all nodes in the nodeset has the same root document. So cache does not work in the code below. It's still slow. ```ruby REXML::XPath.match(2.times.map { REXML::Document.new('<a>'*400+'</a>'*400) }, 'a//a') ``` The interface is holding our back, so I propose to drop accepting array as element. This change is a backward incompatibility, but it just drops undocumented feature. I think only the test code was unintentionally using this feature. ### XPath.match with array XPath.match only traverse the first element of the array for some selectors. ```ruby nodeset = [REXML::Document.new("<a><b/></a>"), REXML::Document.new("<a><c/></a>")] REXML::XPath.match(nodeset, "a/*") #=> [<b/>, <c/>] REXML::XPath.match(nodeset, "//a/*") #=> [<b/>] # I expect [<b/>, <c/>] but the second document is ignored ``` It indicates that `XPath.match` is not designed to search inside multiple nodes/documents. --------- Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
acf7ecf to
7279481
Compare
|
#252 merged, so I rebased this PR and updated. |
7279481 to
88ec36f
Compare
|
I have corrected the points you pointed out. |
88ec36f to
1f67fbd
Compare
|
I have corrected the points you pointed out. |
1f67fbd to
b28407e
Compare
|
I have corrected the points you pointed out. |
## Why? for fix get_namespace performance > # FIXME: This DOUBLES the time XPath searches take Co-authored-by: tomoya ishida <tomoyapenguin@gmail.com> Co-authored-by: Sutou Kouhei <kou@clear-code.com>
## Benchmark (Comparison with rexml 3.4.1)
```
$ benchmark-driver benchmark/xpath.yaml
Calculating -------------------------------------
rexml 3.4.1 master 3.4.1(YJIT) master(YJIT)
REXML::XPath.match(REXML::Document.new(xml), 'a//a') 29.215 234.909 108.945 492.410 i/s - 100.000 times in 3.422925s 0.425697s 0.917898s 0.203083s
Comparison:
REXML::XPath.match(REXML::Document.new(xml), 'a//a')
master(YJIT): 492.4 i/s
master: 234.9 i/s - 2.10x slower
3.4.1(YJIT): 108.9 i/s - 4.52x slower
rexml 3.4.1: 29.2 i/s - 16.85x slower
```
- YJIT=ON : 4.52x faster
- YJIT=OFF : 8.04x faster
b28407e to
6c573e0
Compare
|
I have corrected the points you pointed out. |
|
Thanks. |
When using
//in XPath, the deeper the tag hierarchy, the slower it becomes due to the namespace acquisition process.Caching namespace information improves performance when using
//with XPath.Benchmark (Comparison with rexml 3.4.1)