fix(paste): detect heading levels from Google Docs styled paragraphs#2178
fix(paste): detect heading levels from Google Docs styled paragraphs#2178ErickPetru wants to merge 4 commits intosuperdoc-dev:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 167f5aa820
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
packages/super-editor/src/core/inputRules/google-docs-paste/google-docs-paste.js
Outdated
Show resolved
Hide resolved
packages/super-editor/src/core/inputRules/google-docs-paste/google-docs-paste.js
Outdated
Show resolved
Hide resolved
caio-pizzol
left a comment
There was a problem hiding this comment.
nice work on this, thanks for picking it up! the code is clean, tests are thorough, and the prior bot feedback (bold false positives + list-item corruption) is both addressed in the current version.
two things worth discussing:
multi-span headings won't be detected
getHeadingStyleProps only reads styles from a single child <span> (children.length === 1). a heading that contains a link, bookmark, or line break will produce multiple child elements — and since Google Docs usually puts the styles on the spans (not the <p>), those headings will be silently skipped.
e.g. this wouldn't be detected:
<p>
<span style="font-size:20pt;font-weight:700">Heading with </span>
<a href="..."><span style="font-size:20pt;font-weight:700">a link</span></a>
</p>one option: check that all child elements share the same font-size and bold status instead of requiring exactly one span. up to you whether that's worth the added complexity for this PR or a follow-up.
minor: closest('li') instead of parentElement
the list-item filter only checks the immediate parent. the rest of the pipeline makes the same assumption (<li> > <p> direct child), so it's consistent. but !p.closest('li') would be a zero-cost safety net in case Google Docs ever wraps list content differently.
|
Thanks for the thorough review, @caio-pizzol! Both points addressed:
|
caio-pizzol
left a comment
There was a problem hiding this comment.
@ErickPetru the full-paragraph bold check and the closest('li') fix both look good — nice work on those.
one thing still worth a look: left an inline comment on the paragraph-level bold case in fromSpans. it's not a blocker since the pattern is uncommon in real Google Docs output, but an easy one-liner if you want to close the gap.
| const [firstSpanSize] = sizes; | ||
| if (sizes.some((size) => size !== firstSpanSize)) return notHeading; | ||
|
|
||
| return { fontSize: firstSpanSize, isBold: allSpansBold }; |
There was a problem hiding this comment.
fromSpans checks bold on the child spans, but if bold is set on the <p> itself (<p style="font-weight:700">) with font-size only on spans, allSpansBold will be false and the heading gets skipped. worth patching for consistency?
| return { fontSize: firstSpanSize, isBold: allSpansBold }; | |
| return { fontSize: firstSpanSize, isBold: allSpansBold || boldWeightRegex.test(el.style.fontWeight || '') }; |
There was a problem hiding this comment.
Good catch, applied your suggestion and also extended it so elIsBold (the root element's bold) is checked consistently in both fromElement and fromSpans paths.
Problem
Closes #2152
Cause
Google Docs converts most heading levels to
<p>tags with inlinefont-size/font-weightstyling instead of semantic<h1>–<h6>tags. ProseMirror'sparagraph.parseDOMhas rules forh1–h6but they never fire for these styled<p>elements, so the heading level is silently dropped.Fix
convertStyledHeadingsto the Google Docs paste pipeline, running before ProseMirror parses the pasted DOM<p>elements carrying bold + a recognised font-size (checked on both the<p>itself and its first child<span>, covering both Google Docs style placements) and replaces them with the corresponding<h1>–<h5>elementh1)