
Pass **kwargs through Form functions so rendering has access to a CharacterEncoding option set, e.g. in ToString #1749

Open
rocky wants to merge 5 commits into master from
makeboxes-kwargs-pass-down

Conversation

@rocky
Member

@rocky rocky commented Mar 20, 2026

Also, add CharacterEncoding="ASCII" in the test helper

Makes ToString[x >= b, CharacterEncoding -> "ASCII"] not convert >= to Unicode.

@rocky rocky marked this pull request as draft March 20, 2026 22:39
@rocky rocky requested a review from mmatera March 20, 2026 22:39
@rocky
Member Author

rocky commented Mar 20, 2026

While this causes some tests to fail, it fixes one code path where options were not being used. Subsequent commits address the failures.

I noticed this while trying to understand what's up with #1735.

@rocky rocky force-pushed the makeboxes-kwargs-pass-down branch from 938f1a2 to fbfee4e Compare March 23, 2026 14:59
@rocky rocky marked this pull request as ready for review March 23, 2026 16:48
@rocky
Member Author

rocky commented Mar 23, 2026

One thing observed in this PR... I had hoped that with these changes, we could remove setting MATHICS_CHARACTER_ENCODING="ASCII".

Because symbols can appear in error messages that reside outside of the ToString-wrapped evaluation, we can't. Possibly, tests that expect errors can be grouped in an environment that sets this up.
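As a rough illustration of grouping error-expecting tests under a fixed encoding, here is a minimal standard-library sketch. Only the environment variable name MATHICS_CHARACTER_ENCODING comes from the comment above; the helper `ascii_encoding` and the stand-in reader `current_encoding` are hypothetical and are not part of Mathics3.

```python
import os
from contextlib import contextmanager
from unittest import mock


@contextmanager
def ascii_encoding():
    """Temporarily set MATHICS_CHARACTER_ENCODING to "ASCII" (hypothetical helper)."""
    # patch.dict restores the previous environment when the block exits.
    with mock.patch.dict(os.environ, {"MATHICS_CHARACTER_ENCODING": "ASCII"}):
        yield


def current_encoding(default="UTF-8"):
    # Stand-in for however a session reads its encoding; this is an assumption.
    return os.environ.get("MATHICS_CHARACTER_ENCODING", default)
```

Tests that compare error messages could then run inside `with ascii_encoding():`, while the remaining tests rely on the CharacterEncoding option passed to ToString.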

@mmatera
Contributor

mmatera commented Mar 23, 2026

@rocky, again, I think CharacterEncoding is not a parameter used by MakeBoxes, but something that happens at the render level. Here is an experiment in WMA that shows this.
The following produces a Boxes representation of the expression Cross[a,b]:

In[1]:= $CharacterEncoding                                                                                                                                                    

Out[1]= UTF-8

In[2]:= boxes1=Cross[a,b]//MakeBoxes                                                                                                                                          

Out[2]= RowBox[{a, , b}]

In[3]:= op1=boxes1[[1,2]]                                                                                                                                                     

Out[3]= 

In[4]:= ToCharacterCode[op1]                                                                                                                                                  

Out[4]= {62624}

Notice that MakeBoxes uses the operator "\[Cross]" with UTF-8 representation "" (character 62624).

Now I set another encoding, and do the same:

In[5]:= $CharacterEncoding="ASCII"                                                                                                                                            

Out[5]= ASCII

In[6]:= boxes2=Cross[a,b]//MakeBoxes                                                                                                                                          

Out[6]= RowBox[{a, x, b}]

In[7]:= op2=boxes2[[1,2]]                                                                                                                                                     

Out[7]= x

In[8]:= ToCharacterCode[op2]                                                                                                                                                  

Out[8]= {62624}

Notice that internally, op2 is still the character 62624, but when it is rendered, it is now shown as x.

Besides, since internally MakeBoxes does not look into the encoding,

In[9]:= op1==op2                                                                                                                                                             

Out[9]= True

In[10]:= boxes1==boxes2                                                                                                                                                       

Out[10]= True

the resulting Box expressions are considered equal, even though the strings obtained from rendering them are different.
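The separation described here can be sketched in Python. This is only an illustration under stated assumptions: `RowBox`, `make_boxes_cross`, `render_text`, and the one-entry `ASCII_FALLBACK` table are hypothetical names, not the Mathics3 or WMA implementation. The only value taken from the session above is the character code 62624.

```python
from dataclasses import dataclass

CROSS = "\uf4a0"  # character code 62624, as reported by ToCharacterCode above

# Hypothetical ASCII fallback table; a real one would cover many more characters.
ASCII_FALLBACK = {CROSS: "x"}


@dataclass(frozen=True)
class RowBox:
    items: tuple


def make_boxes_cross(a: str, b: str) -> RowBox:
    # No encoding argument: the box always stores the internal character.
    return RowBox((a, CROSS, b))


def render_text(box: RowBox, encoding: str = "UTF-8") -> str:
    # The encoding is consulted only here, at render time.
    text = "".join(box.items)
    if encoding == "ASCII":
        text = "".join(ASCII_FALLBACK.get(ch, ch) for ch in text)
    return text
```

Two boxes built under different encodings compare equal because they never record the encoding; only the output of `render_text` differs.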

@mmatera
Contributor

mmatera commented Mar 23, 2026

Also,

In[1]:= $CharacterEncoding                                                                                                                                                    

Out[1]= UTF-8

In[2]:= s1=OutputForm["a\[Cross]b"]//MakeBoxes                                                                                                                                

Out[2]= InterpretationBox[PaneBox["ab"], ab, Editable -> False]

In[3]:= $CharacterEncoding="ASCII"                                                                                                                                            

Out[3]= ASCII

In[4]:= s2=OutputForm["a\[Cross]b"]//MakeBoxes                                                                                                                                

Out[4]= InterpretationBox[PaneBox["axb"], axb, Editable -> False]

In[5]:= s1==s2                                                                                                                                                                

Out[5]= True

In[6]:= StringLength["a\[Cross]b"]                                                                                                                                            

Out[6]= 3

So the same happens when OutputForm is used: CharacterEncoding affects special characters only at render time, not in their internal representation. Otherwise, Out[5] would be False.

@rocky
Member Author

rocky commented Mar 23, 2026

@rocky, again, I think CharacterEncoding is not a parameter used by MakeBoxes, but is something that happens at the render level.
[Lots of low-level commands to support this idea]...

I should have made it clear earlier that I agree with that. That's not what's going on here.

There is no code inside this PR or elsewhere in MakeBoxes that decides to set an encoding parameter.

This PR addresses the issue that certain built-in functions, such as ToString, accept a CharacterEncoding parameter. In order to have that filter down to a render routine, it passes this information through Form routines. It does not have direct access to the render routine, just the Form.

Setting an optional encoding parameter, I suspect, is the most natural way for ToString to let the render routine know (via the form) that an encoding has been set in this expression.
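The pass-through being described might look roughly like the sketch below. All three function names and the single-operator substitution are assumptions made for illustration; they are not the functions changed in this PR. The point is only that the Form layer forwards keyword arguments it does not interpret.

```python
GREATER_EQUAL = "\u2265"  # Unicode form of the >= operator


def render(text: str, encoding: str = "UTF-8") -> str:
    # Render step: the only place that looks at the encoding.
    if encoding == "ASCII":
        return text.replace(GREATER_EQUAL, ">=")
    return text


def output_form(expr: str, **kwargs) -> str:
    # Form layer: does not interpret kwargs, it only forwards them.
    return render(expr, **kwargs)


def to_string(expr: str, **options) -> str:
    # Translate the user-facing option name into the render keyword.
    kwargs = {}
    if "CharacterEncoding" in options:
        kwargs["encoding"] = options["CharacterEncoding"]
    return output_form(expr, **kwargs)
```

With this shape, `to_string("x \u2265 b", CharacterEncoding="ASCII")` reaches `render` with `encoding="ASCII"`, while a call without the option keeps the default.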

As we have seen too many times, yes, there are numerous bugs. Mathics3/Mathics3-scanner#166 was opened and fixed as a result of investigating this issue.

The changes here do not change the behavior of any of the examples you give, either to fix them or, as far as I can tell, to make them harder to fix.

What this PR addresses is what was reported initially:

Makes ToString[x >= b, CharacterEncoding -> "ASCII"] not convert >= to Unicode.

I don't see how any of the other material is related to this, and I am not seeing how these changes preclude other changes that fix those bugs.

@mmatera
Contributor

mmatera commented Mar 23, 2026

@rocky, again, I think CharacterEncoding is not a parameter used by MakeBoxes, but is something that happens at the render level.
[Lots of low-level commands to support this idea]...

I should have made it clearer earlier that I agree with that. That's not what motivated this, and that's not what's going on here.

There is no code inside MakeBoxes that decides to set an encoding parameter.

This PR is to address the problem that certain built-in functions like ToString accept a CharacterEncoding parameter. In order to have that filter down to a render routine, it passes this information through Form routines. It does not have direct access to the render routine, just the Form.

Setting an optional encoding parameter, I suspect, is the most natural way for ToString to let the render routine know (via the form) that an encoding has been set in this expression.

Notice that ToString calls boxes_to_text, which calls the text render function. That function accepts an encoding parameter via kwargs. What is missing is the function that takes a String in the internal encoding and maps it to the final representation. This is what I was doing in #1735.
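A function of the kind described as missing could be sketched as follows. This is not the code from #1735; the name `encode_for_output`, the substitution table, and the "?" placeholder are all assumptions made for illustration.

```python
# Hypothetical per-character substitutions; the table contents are assumptions.
ASCII_SUBSTITUTIONS = {"\uf4a0": "x", "\u2265": ">="}


def encode_for_output(text: str, encoding: str = "UTF-8") -> str:
    """Map a string in the internal encoding to its final representation."""
    # Simplistic name mapping: anything other than "ASCII" is treated as UTF-8.
    codec = "ascii" if encoding == "ASCII" else "utf-8"
    out = []
    for ch in text:
        try:
            ch.encode(codec)  # keep characters the target codec can represent
            out.append(ch)
        except UnicodeEncodeError:
            # Fall back to a substitution, or "?" when none is known.
            out.append(ASCII_SUBSTITUTIONS.get(ch, "?"))
    return "".join(out)
```

The text render function could call something like this as its last step, using the encoding it receives via kwargs.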

@rocky
Member Author

rocky commented Mar 23, 2026

@rocky, again, I think CharacterEncoding is not a parameter used by MakeBoxes, but is something that happens at the render level.
[Lots of low-level commands to support this idea]...

I should have made it clearer earlier that I agree with that. That's not what motivated this, and that's not what's going on here.
There is no code inside MakeBoxes that decides to set an encoding parameter.
This PR is to address the problem that certain built-in functions like ToString accept a CharacterEncoding parameter. In order to have that filter down to a render routine, it passes this information through Form routines. It does not have direct access to the render routine, just the Form.
Setting an optional encoding parameter, I suspect, is the most natural way for ToString to let the render routine know (via the form) that an encoding has been set in this expression.

Notice that ToString calls boxes_to_text, which calls the text render function. That function accepts an encoding parameter via kwargs.

Right. And what we saw in #1735 is that a conversion in there had already taken place via a prior rendering through a Form format. This corrects that problem so that in #1735 the "unconversion" is not needed.

What is missing is the function that takes a String in the internal encoding and maps it to the final representation. This is what I was doing in #1735.

I am not 100% sure that this is exactly what's missing. I suspect, however, that a bit of #1735 is needed.

@mmatera
Contributor

mmatera commented Mar 23, 2026

Again, in WMA neither Format nor MakeBoxes takes that argument, for the reason I tried to explain before. The problem right now is that ToString in master is not taking the CharacterEncoding parameter. I will try to go over this tomorrow.

@rocky
Member Author

rocky commented Mar 23, 2026

Again, in WMA neither Format nor MakeBoxes takes that argument, for the reason I tried to explain before.

This PR does not add a user-settable parameter to any MakeBoxes built-in function.

The encoding parameter exists only at eval_... inside mathics.format.box.outputforms, and it appears after the evaluation parameter, which can't occur in a class method call. I did this intentionally to suggest that this is not just a direct call from the builtin class's eval method.

The problem right now is that ToString in master is not taking the CharacterEncoding parameter.

And it does with this PR :-)

I will try to go over this tomorrow.

Sure. There may be another approach or solution. We should discuss and pick the best one.

@rocky rocky changed the title from "Pass **kwargs for encoding" to "Pass **kwargs through Form function so rendering has access to any CharacterEncoding option set" Mar 23, 2026
@rocky rocky changed the title from "Pass **kwargs through Form function so rendering has access to any CharacterEncoding option set" to "Pass **kwargs through Form functions so rendering has access to any CharacterEncoding option set" Mar 23, 2026
@rocky rocky changed the title from "Pass **kwargs through Form functions so rendering has access to any CharacterEncoding option set" to "Pass **kwargs through Form functions so rendering has access to any CharacterEncoding option set" Mar 23, 2026
@rocky rocky changed the title from "Pass **kwargs through Form functions so rendering has access to any CharacterEncoding option set" to "Pass **kwargs through Form functions so rendering has access to a CharacterEncoding option set, e.g. in ToString" Mar 23, 2026
rocky added 5 commits March 24, 2026 18:19
Also, add CharacterEncoding="ASCII" in test helper
In particular, add examples with the CharacterEncoding option.
due to a character code page problem
@rocky rocky force-pushed the makeboxes-kwargs-pass-down branch from 300b9e5 to 8ccf13b Compare March 24, 2026 22:21