Conversation
|
LGTM - merge or continue as you choose. |
| "System`N": process_assign_n, | ||
| "System`NumericQ": process_assign_numericq, | ||
| "System`MessageName": process_assign_messagename, | ||
| "System`$RandomState": process_assign_random_state, |
There was a problem hiding this comment.
Something to keep in mind for when the time is right...
The key should be a Symbol not a str. This will probably change the code a bit though (but for the better).
There was a problem hiding this comment.
I have been oscillating on what is the best. Last year I had the idea that the key for Definitions should be a Symbol, instead of a str. However, for the way in which Python dict works, each time we call the get method of the dict, we have to compute a hash value. But then, to evaluate a hash value for a native type is much faster than doing it from a custom class. This is why here and in several other places of the code, the keys are str instead of Symbols.
There was a problem hiding this comment.
If the Symbols are unique, can you use an approach like this?
https://stackoverflow.com/a/17054416/24819
if the id could be reused it won't work.
There was a problem hiding this comment.
Symbols should not be a problem, because are singletons and are not being destroyed during a session. However, in any case, my only concern here is not to go back to the spaghetti version of this code...
There was a problem hiding this comment.
I have been oscillating on what is the best. Last year I had the idea that the key for
Definitionsshould be aSymbol, instead of astr. However, for the way in which Pythondictworks, each time we call thegetmethod of the dict, we have to compute a hash value. But then, to evaluate a hash value for a native type is much faster than doing it from a custom class. This is why here and in several other places of the code, the keys arestrinstead ofSymbols.
This is penny wise and pound foolish.
For now, the overriding concern should be clarity, not efficiency.
A lot of damage to this code base has been done in the name of efficiency, and the code is neither clear nor efficient.
So let us start out with what is the intent. The intent is that the action to perform to "set", varies based on the left-hand side (Symbol in the WL sense, Function, or Expression) rather than some arbitrary string as it appears now.
As I am coming to learn though, we shouldn't be going down the dictionary-lookup of a name/symbol path at all because the intent is that an assignment changes depending on the kind of LHS or head item and its underlying type.
One problem benchmarking dictionary .get() of a string str versus our current Symbol class, is that Python is constantly changing.
Here are timing results for Python 3.6.15 versus 3.10.8
First 3.6.15:
$ python
Python 3.6.15 (default, Sep 21 2021, 05:10:37)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> from mathics.core.systemsymbols import SymbolContext
>>> special_cases1 = {"System`Context": 1}
>>> special_cases2 = {SymbolContext: 1}
>>> timeit.timeit(lambda: special_cases1.get("System`Context"))
0.23206797917373478
>>> timeit.timeit(lambda: special_cases2.get(SymbolContext))
0.6045385750476271
>>> timeit.timeit(lambda: SymbolContext.get_name())
0.27853861986659467
>>>Now Python 3.10.8:
$ python
Python 3.10.8 (main, Oct 23 2022, 18:47:59) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> from mathics.core.systemsymbols import SymbolContext
>>> special_cases1 = {"System`Context": 1}
>>> special_cases2 = {SymbolContext: 1}
>>> timeit.timeit(lambda: special_cases1.get("System`Context"))
0.13024636614136398
>>> timeit.timeit(lambda: special_cases2.get(SymbolContext))
0.3871238350402564
>>> timeit.timeit(lambda: SymbolContext.get_name())
0.22654549591243267There are a couple of things to notice. Yes, the way the code as written now would be slightly slower. But notice how that gap decreases as we get to more modern Python. And this I don't think is an accident: we should let Python worry about the implementation of dict.get.
One of our problems in the code is that there are a lot of conversions into and out of strings. And this is part of a general looseness in how things are represented. Since too often have to go with the simple but very inefficient Expression representation we have a lot of conversions into that as well.
So in the timings, note that you just can't compare get of a Symbol with get of a str, but you also need to count the time to do the conversion to a string which happens all over the place and makes for conceptually more difficult code.
More importantly, the reason dict.get() with our Symbol class is slow is because there is this custom __hash()__ function. (the suggestion by @axkr also points in this direction).
The current Mathics code is "too clever by a half". If we were to have a Symbol object that functioned like a Lisp Symbol object you'd find that a get on a dictionary is faster than a get on a string.
For Python 3.10.8:
>>> class Context(): pass
...
>>> special_cases3 = {Context: 1}
>>> timeit.timeit(lambda: special_cases3.get(Context))
0.12829051516018808This takes less time than dict.get on a str (0.13024636614136398) and it doesn't require the conversion.
In sum, we have this very unfortunate sequence of events.
The code, as it is written, is written in a vague, simplistic and sometimes sloppy way; and its intent isn't made clear, tending to be "clever" than clear.
The next generation comes around and says I'll just try to speed it up with low-level tricks without trying to make sense of it. That is not only not working out but in some cases makes the situation worse.
Instead of trying to figure out how to make dict.get run faster, we should be trying to understand how to get the Symbol class just be about Lisp-like Symbols and nothing else. Improvements to the Python interpreter are not going to address this bigger problem that is the source of slowness and conceptual code complexity.
Things like computing sort order on Atomic Elements, or the Mathics binary Equal operator (which is distinct from SameQ) should not exist on Lisp-like Symbols, but instead some other class should do that.
@rocky, these are my suggestions for your PR.