
Fix lexing for unterminated strings/heredocs etc. #3924

Merged
kddnewton merged 1 commit into ruby:main from Earlopain:unterminated-heredoc-v2
Feb 13, 2026

@Earlopain
Collaborator

When we hit EOF and still have lex modes left, it means some content was unterminated. Heredocs in particular have logic that needs to run once the body has finished lexing. If we don't reset the mode back to what it was before, lexing will not continue at the correct place.
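
To make the mechanism concrete, here is a minimal sketch in C of a lex-mode stack being unwound at EOF. It is a hypothetical toy model, not Prism's actual code: `toy_lexer_t`, `toy_push`, `toy_unwind_at_eof` and the `resume` offset are invented for illustration, and the sketch assumes that a heredoc mode records where lexing of its opening line should continue. Non-heredoc modes are simply popped; popping a heredoc also moves the cursor back so the rest of that line still gets lexed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in Prism. */
typedef enum { MODE_STRING, MODE_EMBEXPR, MODE_HEREDOC } toy_mode_kind_t;

typedef struct {
    toy_mode_kind_t kind;
    size_t resume; /* heredocs only: offset where lexing of the opening line continues */
} toy_mode_t;

#define TOY_MAX_MODES 16

typedef struct {
    toy_mode_t stack[TOY_MAX_MODES];
    size_t depth;  /* number of modes currently pushed */
    size_t cursor; /* current byte offset into the source */
    size_t length; /* total source length in bytes */
} toy_lexer_t;

static void toy_push(toy_lexer_t *lexer, toy_mode_kind_t kind, size_t resume) {
    assert(lexer->depth < TOY_MAX_MODES);
    lexer->stack[lexer->depth++] = (toy_mode_t) { kind, resume };
}

/* Called when the cursor reaches EOF. Every mode still on the stack belongs to
 * an unterminated construct, so pop it. A heredoc body is lexed "out of line",
 * so popping a heredoc mode also moves the cursor back to its opening line.
 * Returns true if there is still input left to lex after unwinding. */
static bool toy_unwind_at_eof(toy_lexer_t *lexer) {
    while (lexer->depth > 0) {
        toy_mode_t mode = lexer->stack[--lexer->depth];
        if (mode.kind == MODE_HEREDOC) {
            lexer->cursor = mode.resume;
            return lexer->cursor < lexer->length;
        }
    }
    return false;
}
```

Stopping at the first heredoc is a deliberate choice in this sketch: any modes below it stay on the stack and are unwound the next time the cursor reaches EOF. For input that starts with `"#{<<A` and whose heredoc body never terminates, the first unwind pops only the heredoc mode and resumes right after `<<A`.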

Follow-up to #3918. We can't call into `parser_lex` because it resets token locations, so I went back to using `goto`.
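
The `goto` remark can be illustrated with a hypothetical toy lexer (again not Prism's code; `toy_tokenizer_t`, `toy_lex` and `pending_unwind` are assumptions made for this sketch). The bookkeeping at the top of the function runs once per token: it saves the last token into `previous` and resets the current token's location to the cursor. Restarting with a recursive `toy_lex()` call would run that bookkeeping a second time and overwrite `previous` with the unfinished token; jumping to a label placed after it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t start, end; /* byte offsets of the token */
} toy_token_t;

typedef struct {
    const char *source;
    size_t length;
    size_t cursor;
    bool pending_unwind;  /* hypothetical: a lex mode must be popped before lexing on */
    toy_token_t previous; /* token produced by the previous call */
    toy_token_t current;  /* token being produced by this call */
} toy_tokenizer_t;

/* Produces one token per call: a single non-space byte (empty token at EOF). */
static void toy_lex(toy_tokenizer_t *lexer) {
    /* Per-token bookkeeping: remember the last token, reset the current location. */
    lexer->previous = lexer->current;
    lexer->current.start = lexer->current.end = lexer->cursor;

lex_next_token:
    if (lexer->pending_unwind) {
        /* Pop the mode (details omitted), then restart token lexing with goto.
         * Calling toy_lex(lexer) here instead would redo the bookkeeping above
         * and clobber `previous`. */
        lexer->pending_unwind = false;
        goto lex_next_token;
    }

    while (lexer->cursor < lexer->length && lexer->source[lexer->cursor] == ' ') {
        lexer->cursor++;
    }
    lexer->current.start = lexer->cursor;
    if (lexer->cursor < lexer->length) {
        lexer->cursor++;
    }
    lexer->current.end = lexer->cursor;
}
```

With the source `"a b"`, lexing `a`, flagging an unwind, and lexing again leaves `previous` covering `a` and `current` covering `b`, which is exactly what a recursive restart would break.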

Closes #3911

```diff
 pm_statements_node_t *statements = NULL;

-if (!match1(parser, PM_TOKEN_EMBEXPR_END)) {
+if (!match3(parser, PM_TOKEN_EMBEXPR_END, PM_TOKEN_HEREDOC_END, PM_TOKEN_EOF)) {
```
Collaborator Author

@Earlopain Feb 13, 2026


`#{` currently gives the interpolation a statements node containing a `MissingNode`. That's not correct: the missing end token already results in a syntax error. It also messes up the locations when the parser runs into the synthetic heredoc end token. I don't think there are any other tokens to consider here; hopefully these cover all cases.
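
The parser-side condition from the diff can be modeled with a hypothetical toy (the `toy_*` names are invented, and the static `toy_parsed_statements` object merely stands in for a real statements node). Statements are only parsed when the token after `#{` is not the embexpr end, a synthetic heredoc end, or EOF, so an unterminated `#{` leaves the statements slot `NULL` instead of filling it with a `MissingNode`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds; Prism's real ones are PM_TOKEN_* values. */
typedef enum {
    TOY_TOKEN_IDENTIFIER,
    TOY_TOKEN_EMBEXPR_END,
    TOY_TOKEN_HEREDOC_END,
    TOY_TOKEN_EOF
} toy_token_type_t;

typedef struct {
    const char *label;
} toy_statements_t;

/* Placeholder for whatever parsing the statements would return. */
static toy_statements_t toy_parsed_statements = { "parsed statements" };

static bool toy_match3(toy_token_type_t current, toy_token_type_t a,
                       toy_token_type_t b, toy_token_type_t c) {
    return current == a || current == b || current == c;
}

/* Same shape as the patched code: leave statements NULL when nothing can follow `#{`. */
static toy_statements_t *toy_parse_embexpr_statements(toy_token_type_t next) {
    toy_statements_t *statements = NULL;

    if (!toy_match3(next, TOY_TOKEN_EMBEXPR_END, TOY_TOKEN_HEREDOC_END, TOY_TOKEN_EOF)) {
        statements = &toy_parsed_statements; /* stands in for parsing the statements */
    }

    return statements;
}
```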

```
# Before
Prism.parse("\"\#{").value.statements.body[0]
=>
@ InterpolatedStringNode (location: (1,0)-(1,3))
├── flags: newline
├── opening_loc: (1,0)-(1,1) = "\""
├── parts: (length: 1)
│   └── @ EmbeddedStatementsNode (location: (1,1)-(1,3))
│       ├── flags: ∅
│       ├── opening_loc: (1,1)-(1,3) = "\#{"
│       ├── statements:
│       │   @ StatementsNode (location: (1,1)-(1,3))
│       │   ├── flags: ∅
│       │   └── body: (length: 1)
│       │       └── @ MissingNode (location: (1,1)-(1,3))
│       │           └── flags: ∅
│       └── closing_loc: (1,3)-(1,3) = ""
└── closing_loc: ∅

# After
Prism.parse("\"\#{").value.statements.body[0]
=>
@ InterpolatedStringNode (location: (1,0)-(1,3))
├── flags: newline
├── opening_loc: (1,0)-(1,1) = "\""
├── parts: (length: 1)
│   └── @ EmbeddedStatementsNode (location: (1,1)-(1,3))
│       ├── flags: ∅
│       ├── opening_loc: (1,1)-(1,3) = "\#{"
│       ├── statements: ∅
│       └── closing_loc: (1,3)-(1,3) = ""
└── closing_loc: ∅
```

@kddnewton merged commit 27c24fd into ruby:main on Feb 13, 2026
67 checks passed


Development

Successfully merging this pull request may close these issues.

Prism.lex_compat creates wrong on_sp token when used with heredoc and unclosed embexpr

2 participants