Skip to content

my suggestions#592

Merged
mmatera merged 1 commit intoautoload-formsfrom
autoload-forms-mm
Nov 4, 2022
Merged

my suggestions#592
mmatera merged 1 commit intoautoload-formsfrom
autoload-forms-mm

Conversation

@mmatera
Copy link
Contributor

@mmatera mmatera commented Nov 4, 2022

@rocky, these are my suggestions for your PR.

@rocky
Copy link
Member

rocky commented Nov 4, 2022

LGTM - merge or continue as you choose.

"System`N": process_assign_n,
"System`NumericQ": process_assign_numericq,
"System`MessageName": process_assign_messagename,
"System`$RandomState": process_assign_random_state,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to keep in mind for when the time is right...

The key should be a Symbol not a str. This will probably change the code a bit though (but for the better).

Copy link
Contributor Author

@mmatera mmatera Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been oscillating on what is the best. Last year I had the idea that the key for Definitions should be a Symbol, instead of a str. However, for the way in which Python dict works, each time we call the get method of the dict, we have to compute a hash value. But then, to evaluate a hash value for a native type is much faster than doing it from a custom class. This is why here and in several other places of the code, the keys are str instead of Symbols.

Copy link
Collaborator

@axkr axkr Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Symbols are unique, can you use an approach like this?
https://stackoverflow.com/a/17054416/24819

if the id could be reused it won't work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbols should not be a problem, because are singletons and are not being destroyed during a session. However, in any case, my only concern here is not to go back to the spaghetti version of this code...

Copy link
Member

@rocky rocky Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been oscillating on what is the best. Last year I had the idea that the key for Definitions should be a Symbol, instead of a str. However, for the way in which Python dict works, each time we call the get method of the dict, we have to compute a hash value. But then, to evaluate a hash value for a native type is much faster than doing it from a custom class. This is why here and in several other places of the code, the keys are str instead of Symbols.

This is penny wise and pound foolish.

For now, the overriding concern should be clarity, not efficiency.

A lot of damage to this code base has been done in the name of efficiency, and the code is neither clear nor efficient.

So let us start out with what is the intent. The intent is that the action to perform to "set", varies based on the left-hand side (Symbol in the WL sense, Function, or Expression) rather than some arbitrary string as it appears now.

As I am coming to learn though, we shouldn't be going down the dictionary-lookup of a name/symbol path at all because the intent is that an assignment changes depending on the kind of LHS or head item and its underlying type.

One problem benchmarking dictionary .get() of a string str versus our current Symbol class, is that Python is constantly changing.

Here are timing results for Python 3.6.15 versus 3.10.8

First 3.6.15:

$ python
Python 3.6.15 (default, Sep 21 2021, 05:10:37)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> from mathics.core.systemsymbols import SymbolContext
>>> special_cases1 = {"System`Context": 1}
>>> special_cases2 = {SymbolContext: 1}
>>> timeit.timeit(lambda: special_cases1.get("System`Context"))
0.23206797917373478
>>> timeit.timeit(lambda: special_cases2.get(SymbolContext))
0.6045385750476271
>>> timeit.timeit(lambda: SymbolContext.get_name())
0.27853861986659467
>>>

Now Python 3.10.8:

$ python
Python 3.10.8 (main, Oct 23 2022, 18:47:59) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> from mathics.core.systemsymbols import SymbolContext
>>> special_cases1 = {"System`Context": 1}
>>> special_cases2 = {SymbolContext: 1}
>>> timeit.timeit(lambda: special_cases1.get("System`Context"))
0.13024636614136398
>>> timeit.timeit(lambda: special_cases2.get(SymbolContext))
0.3871238350402564
>>> timeit.timeit(lambda: SymbolContext.get_name())
0.22654549591243267

There are a couple of things to notice. Yes, the way the code as written now would be slightly slower. But notice how that gap decreases as we get to more modern Python. And this I don't think is an accident: we should let Python worry about the implementation of dict.get.

One of our problems in the code is that there are a lot of conversions into and out of strings. And this is part of a general looseness in how things are represented. Since too often have to go with the simple but very inefficient Expression representation we have a lot of conversions into that as well.

So in the timings, note that you just can't compare get of a Symbol with get of a str, but you also need to count the time to do the conversion to a string which happens all over the place and makes for conceptually more difficult code.

More importantly, the reason dict.get() with our Symbol class is slow is because there is this custom __hash()__ function. (the suggestion by @axkr also points in this direction).

The current Mathics code is "too clever by a half". If we were to have a Symbol object that functioned like a Lisp Symbol object you'd find that a get on a dictionary is faster than a get on a string.

For Python 3.10.8:

>>> class Context(): pass
...
>>> special_cases3 = {Context: 1}
>>> timeit.timeit(lambda: special_cases3.get(Context))
0.12829051516018808

This takes less time than dict.get on a str (0.13024636614136398) and it doesn't require the conversion.

In sum, we have this very unfortunate sequence of events.

The code, as it is written, is written in a vague, simplistic and sometimes sloppy way; and its intent isn't made clear, tending to be "clever" than clear.

The next generation comes around and says I'll just try to speed it up with low-level tricks without trying to make sense of it. That is not only not working out but in some cases makes the situation worse.

Instead of trying to figure out how to make dict.get run faster, we should be trying to understand how to get the Symbol class just be about Lisp-like Symbols and nothing else. Improvements to the Python interpreter are not going to address this bigger problem that is the source of slowness and conceptual code complexity.

Things like computing sort order on Atomic Elements, or the Mathics binary Equal operator (which is distinct from SameQ) should not exist on Lisp-like Symbols, but instead some other class should do that.

@mmatera mmatera merged commit d9b9edc into autoload-forms Nov 4, 2022
@mmatera mmatera deleted the autoload-forms-mm branch November 4, 2022 13:16
rocky pushed a commit that referenced this pull request Nov 6, 2022
rocky pushed a commit that referenced this pull request Nov 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants