Minor: Refactor memory size estimation for HashTable#10748
Minor: Refactor memory size estimation for HashTable#10748alamb merged 8 commits intoapache:mainfrom
Conversation
yyy1000
left a comment
There was a problem hiding this comment.
I see it's good abstraction.
My only concern is, it seems that the original goal was kind of ideal, which just pass the hashtable and get the estimate size. But now before using this function, user have to calculate the fixed_size themself, and also the calculation is not always the same.
Though the abstraction is not pretty enough, I think given the current code this PR is still an improvement. :)
Yes, I totally agree with you on this. However, as discussed in the issue, it still might be worth it due to the improved documentation. |
Agree! I left a comment which may improve but it's not necessary. Thanks! |
alamb
left a comment
There was a problem hiding this comment.
Thank you so much @marvinlanhenke -- this looks like a great step forward in readability 🙏
Thank you @yyy1000 for your review
I left some suggestions / comments. Let me know what you think
|
@alamb I changed |
|
Thanks again @marvinlanhenke |
* refactor: extract estimate_memory_size * refactor: cap at usize::MAX * refactor: use estimate_memory_size * chore: add examples * refactor: return Result<usize>; add testcase * fix: docs * fix: remove unneccessary checked_div * fix: remove additional and_then
Which issue does this PR close?
Closes #8764.
Rationale for this change
As stated in the issue here the goal is to consolidate estimation logic into a single function. This allows for better documentation as well as better maintainability.
Also there was an issue with the existing implementation that could still overflow if the
checked_mulwas capped at usize::MAX. I tried to fix this by checking for overflows and returning usize::MAX as a 'max cap'.What changes are included in this PR?
count_distinct/native.rsandhash_join.rsAre these changes tested?
Yes.
Are there any user-facing changes?
No.