pr: don't convert to String when storing lines to print#11327
Merged
cakebaker merged 4 commits intouutils:mainfrom Mar 16, 2026
Merged
pr: don't convert to String when storing lines to print#11327cakebaker merged 4 commits intouutils:mainfrom
cakebaker merged 4 commits intouutils:mainfrom
Conversation
|
GNU testsuite comparison: |
Contributor
Author
|
the error on the test seems infrastructure issue unrelated to this change — the VM fails at SSH connection. Style and lint checks passed |
cakebaker
reviewed
Mar 15, 2026
| let formatted_line_number = get_formatted_line_number(options, file_line.line_number, index); | ||
|
|
||
| let mut complete_line = format!("{formatted_line_number}{}", file_line.line_content); | ||
| let content = String::from_utf8_lossy(&file_line.line_content); |
Contributor
There was a problem hiding this comment.
While this is an improvement to the previous code, it's not entirely correct. GNU pr also prints non-UTF8 characters.
You can see the difference with a tool like hexdump:
$ printf $'\xFFfoo' | cargo run -q pr | hexdump -C
[snipped]
*
00000040 20 20 20 20 50 61 67 65 20 31 0a 0a 0a ef bf bd | Page 1......|
00000050 66 6f 6f 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |foo.............|
00000060 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |................|
*
00000090
$ printf $'\xFFfoo' | pr | hexdump -C
[snipped]
*
00000040 20 20 20 20 50 61 67 65 20 31 0a 0a 0a ff 66 6f | Page 1....fo|
00000050 6f 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |o...............|
00000060 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |................|
*
00000080 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |..............|
0000008e
My suggestion is to add a todo that support for non-UTF8 is not implemented yet.
cakebaker
reviewed
Mar 15, 2026
src/uu/pr/src/pr.rs
Outdated
Comment on lines
899
to
901
| /// # Errors | ||
| /// | ||
| /// Returns an error if the bytes are not a valid UTF-8 string. |
Contributor
There was a problem hiding this comment.
A detail: this part of the comment is no longer valid and can be removed.
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
Contributor
|
Thanks for your PR! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Changed
FileLine.line_contentfromStringtoVec<u8>to avoidunnecessary byte → String → byte conversions.
Changes:
FileLine.line_contentis nowVec<u8>apply_expand_tabnow takes&mut Vec<u8>instead of&mut Stringfrom_bufno longer needs to returnResultsince UTF-8 validationis no longer needed
get_pagesalso no longer needs to returnResultas a cascade effectget_line_for_printingusesString::from_utf8_lossyonly at theformatting stage
Note:
test_dd::test_iso8859_1_case_conversionfails but is pre-existingand unrelated to this change.
Fixes #10333