
HTML API: Use locked tokens to implement safe fragment parsing #7912

Draft
sirreal wants to merge 26 commits into WordPress:trunk from sirreal:html-api/try-context-stacks

@sirreal (Member) commented Nov 28, 2024

Addresses problems with using the HTML specification's fragment parsing algorithm, which can produce documents that are impossible to represent in HTML.

An HTML processor should reject documents that cannot be represented in HTML. This includes any document fragment (a document with a context) whose content could leak out of its context.

This PR includes a number of tests with examples of problematic HTML; a simple example is the HTML <p> in the context of a P element. Parsed naively, it would produce a tree like P > P, which cannot be represented in HTML. With this change, the processor bails with an unsupported error upon encountering the <p> tag in the context of the P element.


Implementation

Instead of using a simple context element in the fragment parser, this change moves a copy of the stack of open elements, the stack of active formatting elements, and the head and form element pointers into the context processor. These copied elements have a new locked property set. The implementation is adapted so that locked items on the stacks cannot be modified and the element pointers cannot be changed.

The goal is to maintain a coherent HTML structure of the fragment document inside its context.
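To illustrate the idea, here is a minimal Python sketch of a stack of open elements seeded with locked copies of the context's elements, applied to the <p>-in-P example above. All names (StackOfOpenElements, start_tag_p, UnsupportedError) are hypothetical; the PR's actual PHP implementation is not reproduced here. The sketch assumes a simplified "in body" handling of a <p> start tag that skips generating implied end tags and omits the MathML/SVG scope boundaries.

```python
class UnsupportedError(Exception):
    """Hypothetical stand-in for the processor's "unsupported" bail-out."""


class Node:
    def __init__(self, tag: str, locked: bool = False) -> None:
        self.tag = tag
        # Locked nodes are copies of the context's open elements; the
        # fragment parser must never pop or otherwise modify them.
        self.locked = locked


# HTML-namespace boundaries for "has an element in button scope".
# Simplified: the MathML and SVG boundary elements are omitted.
BUTTON_SCOPE_BOUNDARIES = frozenset(
    {"applet", "caption", "html", "table", "td", "th",
     "marquee", "object", "template", "button"}
)


class StackOfOpenElements:
    def __init__(self, context_tags: list[str]) -> None:
        # Seed the stack with locked copies of the context's open elements.
        self.nodes = [Node(tag, locked=True) for tag in context_tags]

    def push(self, tag: str) -> None:
        self.nodes.append(Node(tag))

    def pop(self) -> Node:
        # Popping a locked node would let the fragment leak out of its context.
        if self.nodes[-1].locked:
            raise UnsupportedError(f"cannot close locked <{self.nodes[-1].tag}>")
        return self.nodes.pop()

    def has_in_button_scope(self, tag: str) -> bool:
        for node in reversed(self.nodes):
            if node.tag == tag:
                return True
            if node.tag in BUTTON_SCOPE_BOUNDARIES:
                return False
        return False

    def tags(self) -> list[str]:
        return [node.tag for node in self.nodes]


def start_tag_p(stack: StackOfOpenElements) -> None:
    """Handle a <p> start tag in the "in body" insertion mode (simplified)."""
    if stack.has_in_button_scope("p"):
        # "Close a p element": pop until a p has been popped.
        # Generating implied end tags is omitted in this sketch.
        while stack.pop().tag != "p":
            pass
    stack.push("p")


# <p> in the context of a P element: the only open p is locked, so bail.
stack = StackOfOpenElements(["html", "body", "p"])
try:
    start_tag_p(stack)
except UnsupportedError as error:
    print(error)  # cannot close locked <p>
```

The design choice the sketch highlights is that the guard lives in pop() itself, so any algorithm step that would close a context element fails the same way instead of each step needing its own check.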

Trac ticket: https://core.trac.wordpress.org/ticket/62584


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@sirreal force-pushed the html-api/try-context-stacks branch from 5067487 to 950bade on November 28, 2024 15:27
@sirreal force-pushed the html-api/try-context-stacks branch 2 times, most recently from 4c59182 to 5adcba3 on December 2, 2024 16:26
@sirreal (Member Author) commented Dec 2, 2024

This approach makes set_inner_html relatively easy to implement in a safe way: #7932

@sirreal changed the title from "HTML API: Try using context stacks instead of a context element" to "HTML API: Use locked tokens to implement safe fragment parsing"
@sirreal (Member Author) commented Dec 10, 2024

When controlling contexts this way, the insertion mode is also a hazard, and it is handled incorrectly in this PR as implemented.

The fragment implementation currently on trunk is correct: a BODY context element is not on the stack of open elements, which causes </body> tokens to be ignored. The same is true for </html>.

In this PR, these checks do not behave as the spec expects, so they could cause the parser to move incorrectly into the "after body" or "after after body" insertion modes.
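The difference can be sketched in Python with a simplified version of the spec's "in body" handling of a </body> end tag: if the stack of open elements has no body element in scope, the token is ignored; otherwise the parser switches to "after body". The function names are hypothetical, only the HTML-namespace scope boundaries are included, and the remaining steps of the algorithm are omitted. The first call models trunk, where (following the spec's fragment algorithm) only the root html element is on the stack; the second models a locked copy of the context stack that includes body.

```python
# Simplified HTML-namespace boundaries for the default "has an element in
# scope" check (MathML and SVG boundaries omitted).
SCOPE_BOUNDARIES = frozenset(
    {"applet", "caption", "html", "table", "td", "th",
     "marquee", "object", "template"}
)


def has_in_scope(stack: list[str], tag: str) -> bool:
    """Walk the stack from the current node down to a scope boundary."""
    for name in reversed(stack):
        if name == tag:
            return True
        if name in SCOPE_BOUNDARIES:
            return False
    return False


def end_tag_body(stack: list[str], mode: str) -> str:
    """Return the insertion mode after a </body> end tag in "in body" mode.

    If no body element is in scope, this is a parse error and the token is
    ignored; otherwise the parser switches to "after body".
    """
    if not has_in_scope(stack, "body"):
        return mode  # Token ignored; insertion mode unchanged.
    return "after body"


# Trunk: with a BODY context, only the root html element is on the stack.
print(end_tag_body(["html"], "in body"))          # in body
# Locked copy of the context stack: body is now in scope.
print(end_tag_body(["html", "body"], "in body"))  # after body
```

The </html> end tag in "in body" mode begins with the same body-in-scope check, so it is affected in the same way.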
