Skip to content

pr: don't convert to String when storing lines to print#11327

Merged
cakebaker merged 4 commits intouutils:mainfrom
venoosoo:fix-pr-string-to-bytes
Mar 16, 2026
Merged

pr: don't convert to String when storing lines to print#11327
cakebaker merged 4 commits intouutils:mainfrom
venoosoo:fix-pr-string-to-bytes

Conversation

@venoosoo
Copy link
Contributor

@venoosoo venoosoo commented Mar 14, 2026

Changed FileLine.line_content from String to Vec<u8> to avoid
unnecessary byte → String → byte conversions.

Changes:

  • FileLine.line_content is now Vec<u8>
  • apply_expand_tab now takes &mut Vec<u8> instead of &mut String
  • from_buf no longer needs to return Result since UTF-8 validation
    is no longer needed
  • get_pages also no longer needs to return Result as a cascade effect
  • get_line_for_printing uses String::from_utf8_lossy only at the
    formatting stage

Note: test_dd::test_iso8859_1_case_conversion fails but is pre-existing
and unrelated to this change.

Fixes #10333

@github-actions
Copy link

GNU testsuite comparison:

Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.

@venoosoo
Copy link
Contributor Author

the error on the test seems infrastructure issue unrelated to this change — the VM fails at SSH connection. Style and lint checks passed

let formatted_line_number = get_formatted_line_number(options, file_line.line_number, index);

let mut complete_line = format!("{formatted_line_number}{}", file_line.line_content);
let content = String::from_utf8_lossy(&file_line.line_content);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is an improvement to the previous code, it's not entirely correct. GNU pr also prints non-UTF8 characters.

You can see the difference with a tool like hexdump:

$ printf $'\xFFfoo' | cargo run -q pr | hexdump -C
[snipped]
*
00000040  20 20 20 20 50 61 67 65  20 31 0a 0a 0a ef bf bd  |    Page 1......|
00000050  66 6f 6f 0a 0a 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |foo.............|
00000060  0a 0a 0a 0a 0a 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |................|
*
00000090
$ printf $'\xFFfoo' | pr | hexdump -C
[snipped]
*
00000040  20 20 20 20 50 61 67 65  20 31 0a 0a 0a ff 66 6f  |    Page 1....fo|
00000050  6f 0a 0a 0a 0a 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |o...............|
00000060  0a 0a 0a 0a 0a 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a  |................|
*
00000080  0a 0a 0a 0a 0a 0a 0a 0a  0a 0a 0a 0a 0a 0a        |..............|
0000008e

My suggestion is to add a todo that support for non-UTF8 is not implemented yet.

Comment on lines 899 to 901
/// # Errors
///
/// Returns an error if the bytes are not a valid UTF-8 string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A detail: this part of the comment is no longer valid and can be removed.

@github-actions
Copy link

GNU testsuite comparison:

GNU test failed: tests/misc/io-errors. tests/misc/io-errors is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/cut/bounded-memory (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/rm/many-dir-entries-vs-OOM is now being skipped but was previously passing.

@github-actions
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)

@cakebaker cakebaker merged commit bc58d3d into uutils:main Mar 16, 2026
160 checks passed
@cakebaker
Copy link
Contributor

Thanks for your PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pr: don't convert to String when storing lines to print

2 participants