my suggestions by mmatera · Pull Request #592 · Mathics3/mathics-core

mmatera · 2022-11-04T12:14:42Z

@rocky, these are my suggestions for your PR.

rocky · 2022-11-04T12:23:40Z

LGTM - merge or continue as you choose.

rocky · 2022-11-04T12:56:29Z

mathics/builtin/assignments/internals.py

-        "System`N": process_assign_n,
-        "System`NumericQ": process_assign_numericq,
-        "System`MessageName": process_assign_messagename,
+        "System`$RandomState": process_assign_random_state,


Something to keep in mind for when the time is right...

The key should be a Symbol not a str. This will probably change the code a bit though (but for the better).

I have been oscillating on what is the best. Last year I had the idea that the key for Definitions should be a Symbol, instead of a str. However, for the way in which Python dict works, each time we call the get method of the dict, we have to compute a hash value. But then, to evaluate a hash value for a native type is much faster than doing it from a custom class. This is why here and in several other places of the code, the keys are str instead of Symbols.

If the Symbols are unique, can you use an approach like this?
https://stackoverflow.com/a/17054416/24819

if the id could be reused it won't work.

Symbols should not be a problem, because are singletons and are not being destroyed during a session. However, in any case, my only concern here is not to go back to the spaghetti version of this code...

I have been oscillating on what is the best. Last year I had the idea that the key for Definitions should be a Symbol, instead of a str. However, for the way in which Python dict works, each time we call the get method of the dict, we have to compute a hash value. But then, to evaluate a hash value for a native type is much faster than doing it from a custom class. This is why here and in several other places of the code, the keys are str instead of Symbols.

This is penny wise and pound foolish.

For now, the overriding concern should be clarity, not efficiency.

A lot of damage to this code base has been done in the name of efficiency, and the code is neither clear nor efficient.

So let us start out with what is the intent. The intent is that the action to perform to "set", varies based on the left-hand side (Symbol in the WL sense, Function, or Expression) rather than some arbitrary string as it appears now.

As I am coming to learn though, we shouldn't be going down the dictionary-lookup of a name/symbol path at all because the intent is that an assignment changes depending on the kind of LHS or head item and its underlying type.

One problem benchmarking dictionary .get() of a string str versus our current Symbol class, is that Python is constantly changing.

Here are timing results for Python 3.6.15 versus 3.10.8

First 3.6.15:

$ python Python 3.6.15 (default, Sep 21 2021, 05:10:37) [GCC 9.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import timeit >>> from mathics.core.systemsymbols import SymbolContext >>> special_cases1 = {"System`Context": 1} >>> special_cases2 = {SymbolContext: 1} >>> timeit.timeit(lambda: special_cases1.get("System`Context")) 0.23206797917373478 >>> timeit.timeit(lambda: special_cases2.get(SymbolContext)) 0.6045385750476271 >>> timeit.timeit(lambda: SymbolContext.get_name()) 0.27853861986659467 >>>

Now Python 3.10.8:

$ python Python 3.10.8 (main, Oct 23 2022, 18:47:59) [GCC 11.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import timeit >>> from mathics.core.systemsymbols import SymbolContext >>> special_cases1 = {"System`Context": 1} >>> special_cases2 = {SymbolContext: 1} >>> timeit.timeit(lambda: special_cases1.get("System`Context")) 0.13024636614136398 >>> timeit.timeit(lambda: special_cases2.get(SymbolContext)) 0.3871238350402564 >>> timeit.timeit(lambda: SymbolContext.get_name()) 0.22654549591243267

There are a couple of things to notice. Yes, the way the code as written now would be slightly slower. But notice how that gap decreases as we get to more modern Python. And this I don't think is an accident: we should let Python worry about the implementation of dict.get.

One of our problems in the code is that there are a lot of conversions into and out of strings. And this is part of a general looseness in how things are represented. Since too often have to go with the simple but very inefficient Expression representation we have a lot of conversions into that as well.

So in the timings, note that you just can't compare get of a Symbol with get of a str, but you also need to count the time to do the conversion to a string which happens all over the place and makes for conceptually more difficult code.

More importantly, the reason dict.get() with our Symbol class is slow is because there is this custom __hash()__ function. (the suggestion by @axkr also points in this direction).

The current Mathics code is "too clever by a half". If we were to have a Symbol object that functioned like a Lisp Symbol object you'd find that a get on a dictionary is faster than a get on a string.

For Python 3.10.8:

>>> class Context(): pass ... >>> special_cases3 = {Context: 1} >>> timeit.timeit(lambda: special_cases3.get(Context)) 0.12829051516018808

This takes less time than dict.get on a str (0.13024636614136398) and it doesn't require the conversion.

In sum, we have this very unfortunate sequence of events.

The code, as it is written, is written in a vague, simplistic and sometimes sloppy way; and its intent isn't made clear, tending to be "clever" than clear.

The next generation comes around and says I'll just try to speed it up with low-level tricks without trying to make sense of it. That is not only not working out but in some cases makes the situation worse.

Instead of trying to figure out how to make dict.get run faster, we should be trying to understand how to get the Symbol class just be about Lisp-like Symbols and nothing else. Improvements to the Python interpreter are not going to address this bigger problem that is the source of slowness and conceptual code complexity.

Things like computing sort order on Atomic Elements, or the Mathics binary Equal operator (which is distinct from SameQ) should not exist on Lisp-like Symbols, but instead some other class should do that.

my suggestions

df45e8c

rocky reviewed Nov 4, 2022

View reviewed changes

mmatera merged commit d9b9edc into autoload-forms Nov 4, 2022

mmatera deleted the autoload-forms-mm branch November 4, 2022 13:16

rocky pushed a commit that referenced this pull request Nov 6, 2022

my suggestions (#592)

8713078

rocky pushed a commit that referenced this pull request Nov 6, 2022

my suggestions (#592)

5a1d0de

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

my suggestions#592

my suggestions#592
mmatera merged 1 commit intoautoload-formsfrom
autoload-forms-mm

mmatera commented Nov 4, 2022 •

edited by gitpod-io bot

Loading

Uh oh!

rocky commented Nov 4, 2022

Uh oh!

rocky Nov 4, 2022

Uh oh!

mmatera Nov 4, 2022 •

edited

Loading

Uh oh!

axkr Nov 4, 2022 •

edited

Loading

Uh oh!

mmatera Nov 4, 2022

Uh oh!

rocky Nov 4, 2022 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

mmatera commented Nov 4, 2022 • edited by gitpod-io bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rocky commented Nov 4, 2022

Uh oh!

rocky Nov 4, 2022

Choose a reason for hiding this comment

Uh oh!

mmatera Nov 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

axkr Nov 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mmatera Nov 4, 2022

Choose a reason for hiding this comment

Uh oh!

rocky Nov 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mmatera commented Nov 4, 2022 •

edited by gitpod-io bot

Loading

mmatera Nov 4, 2022 •

edited

Loading

axkr Nov 4, 2022 •

edited

Loading

rocky Nov 4, 2022 •

edited

Loading