Fixed Process Watcher garbled text on Windows with non-UTF-8 locales. #9457 #9749
Conversation
Walkthrough: Detect subprocess output encoding at startup using …

Changes
🧹 Nitpick comments (1)
web/pgadmin/misc/bgprocess/process_executor.py (1)
215-223: Consider using `codecs.lookup()` for robust UTF-8 detection.

The check `_subprocess_encoding.lower() not in ('utf-8', 'utf8')` may miss other UTF-8 aliases. For example, on Windows, `'cp65001'` is UTF-8 but wouldn't match, causing unnecessary decode/encode cycles.

Also, with the `'replace'` error handler on line 219, `UnicodeDecodeError` should never be raised; only `LookupError` can occur if the encoding name is invalid.

♻️ Suggested improvement using codecs normalization

```diff
+import codecs
+
+def _is_utf8_encoding(encoding):
+    """Check if encoding is a UTF-8 variant."""
+    try:
+        return codecs.lookup(encoding).name == 'utf-8'
+    except LookupError:
+        return False
+
 # In the log method:
-        if _subprocess_encoding and _subprocess_encoding.lower() not in (
-                'utf-8', 'utf8'):
+        if _subprocess_encoding and not _is_utf8_encoding(
+                _subprocess_encoding):
             try:
                 msg = msg.decode(
                     _subprocess_encoding, 'replace'
                 ).encode('utf-8')
-            except (UnicodeDecodeError, LookupError):
+            except LookupError:
                 # If decoding fails, write as-is
                 pass
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/misc/bgprocess/process_executor.py` around lines 215 - 223, The current encoding check on _subprocess_encoding can miss UTF-8 aliases and unnecessarily re-encode; use codecs.lookup(_subprocess_encoding).name to get the canonical encoding name and compare that to 'utf-8' (or compare lowercased canonical name) before doing the decode/encode, and only catch LookupError from codecs.lookup (since using 'replace' prevents UnicodeDecodeError); update the block that decodes msg (the lines referencing _subprocess_encoding and msg.decode(...).encode('utf-8')) to perform a codecs.lookup check and remove UnicodeDecodeError from the except clause, keeping only LookupError handling.
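As a standalone illustration of the normalization this comment suggests, the helper below (a sketch; `is_utf8_encoding` is an illustrative name, not part of pgAdmin) uses `codecs.lookup()` to resolve encoding aliases to a canonical codec name before comparing:

```python
import codecs

def is_utf8_encoding(encoding: str) -> bool:
    """Return True when `encoding` is any registered alias of UTF-8."""
    try:
        # codecs.lookup() resolves aliases to a canonical codec name,
        # e.g. 'UTF8', 'utf_8', and (on Python 3.8+) 'cp65001' all
        # normalize to 'utf-8'.
        return codecs.lookup(encoding).name == 'utf-8'
    except LookupError:
        # Unknown encoding name: treat as non-UTF-8 and let the caller decide.
        return False

print(is_utf8_encoding('UTF8'))    # True
print(is_utf8_encoding('cp932'))   # False
print(is_utf8_encoding('bogus'))   # False
```

This avoids maintaining a hand-written tuple of aliases and makes the subsequent `except` clause need only `LookupError`.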
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 396bf3b0-c2b1-477e-a23c-c6636d52126a
📒 Files selected for processing (1)
web/pgadmin/misc/bgprocess/process_executor.py
Actionable comments posted: 1
🧹 Nitpick comments (3)
web/pgadmin/misc/bgprocess/process_executor.py (3)
215-223: Consider handling the `_subprocess_encoding is None` case more explicitly.

While the current code safely falls through (no conversion) when `_subprocess_encoding` is `None`, this could happen if the module is imported/used outside the `if __name__ == '__main__':` block. A defensive comment or explicit handling might improve clarity.

Optional: Add a comment for clarity

```diff
     # Convert subprocess output from system encoding to UTF-8
     # This fixes garbled text on Windows with non-UTF-8 locales
     # (e.g., Japanese CP932, Chinese GBK)
     msg = msg.lstrip(b'\r\n' if _IS_WIN else b'\n')
-    if _subprocess_encoding and \
+    # _subprocess_encoding is set at startup; if None, skip conversion
+    if _subprocess_encoding and \
             _subprocess_encoding.lower() not in ('utf-8', 'utf8'):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/misc/bgprocess/process_executor.py` around lines 215 - 223, The branch that converts msg based on _subprocess_encoding should explicitly handle the None case for clarity: update the conditional around _subprocess_encoding in process_executor.py (the block referencing _subprocess_encoding and the msg.decode(...).encode('utf-8') logic) to either add an explicit early-check for _subprocess_encoding is None (and skip conversion) or add a concise defensive comment explaining that None means no conversion is performed; ensure the check remains: if _subprocess_encoding and _subprocess_encoding.lower() not in ('utf-8', 'utf8'): and document the None behavior so future imports/usages outside __main__ are clear.
214-214: Verify the necessity of `lstrip()` on subprocess output.

The `lstrip(b'\r\n' if _IS_WIN else b'\n')` strips all leading newline characters. Since `readline()` in the `run()` method returns lines with trailing (not leading) newlines, this might be addressing a specific edge case. However, `lstrip()` could inadvertently remove meaningful empty lines if the subprocess output begins with intentional blank lines.

Was this added to handle a specific scenario with garbled output, or could it be narrowed to a more targeted fix?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/misc/bgprocess/process_executor.py` at line 214, The code currently uses msg = msg.lstrip(b'\r\n' if _IS_WIN else b'\n') which strips all leading newlines and can remove intentional blank lines; in the run() method narrow this to only remove a single leading newline sequence when present (rather than all leading newline characters). Locate the msg variable in process_executor.py (inside run()) and replace the lstrip call with logic that checks startswith(b'\r\n') on Windows or startswith(b'\n') otherwise and then slices off only that one sequence (e.g., msg = msg[len(prefix):] when present), so you preserve multiple intentional leading blank lines while handling the specific garbled-leading-newline case.
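The narrower stripping this prompt describes can be sketched as follows (the function name is illustrative, not from the codebase): remove at most one leading newline sequence instead of all of them.

```python
def strip_one_leading_newline(msg: bytes, is_win: bool) -> bytes:
    """Remove at most one leading newline sequence, preserving any
    further intentional blank lines in the subprocess output."""
    prefix = b'\r\n' if is_win else b'\n'
    return msg[len(prefix):] if msg.startswith(prefix) else msg

# lstrip() would collapse all three leading newlines; this keeps two.
raw = b'\n\n\nreport body'
print(strip_one_leading_newline(raw, is_win=False))  # b'\n\nreport body'
print(raw.lstrip(b'\n'))                             # b'report body'
```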
191-229: Consider adding test coverage for encoding conversion.

This fix addresses a platform-specific issue (Windows with non-UTF-8 locales like CP932, GBK). Manual testing across different locales is difficult. Consider adding unit tests that mock `_subprocess_encoding` to verify:
_subprocess_encodingto verify:
- UTF-8 input passes through unchanged
- Non-UTF-8 input (e.g., CP932 bytes) is correctly converted to UTF-8
- Invalid/malformed bytes are handled gracefully with 'replace'
Would you like me to help draft test cases for this encoding conversion logic?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/misc/bgprocess/process_executor.py` around lines 191 - 229, Add unit tests for process_executor.log to cover encoding conversion: create a ProcessExecutor instance (or the class under test) with a writable in-memory logger and mock get_current_time to a fixed value, then set the module-global _subprocess_encoding to each scenario and call log(msg) with representative byte inputs; verify the logger content contains the fixed timestamp prefix and the UTF-8 payload. Test cases: (1) _subprocess_encoding set to 'utf-8' and msg already UTF-8 — assert bytes passed through unchanged (minus leading newline stripping); (2) set _subprocess_encoding to a non-UTF8 encoding (e.g., 'cp932') and provide bytes that decode under that encoding — assert output is converted to UTF-8; (3) provide malformed bytes and set _subprocess_encoding to a non-UTF8 value to force the except path — assert log still writes something (no exception) and uses replacement characters. Ensure tests restore/cleanup module globals (_subprocess_encoding, _IS_WIN if modified) and the in-memory logger between tests.
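The conversion logic under test can be exercised in isolation. The sketch below re-implements the decode/encode branch as a free function (a hypothetical name, factored out for testability) so the three scenarios listed above can be asserted without spawning a subprocess or patching module globals:

```python
def convert_to_utf8(msg: bytes, subprocess_encoding) -> bytes:
    """Mirror of the log() conversion branch: re-encode non-UTF-8
    subprocess output to UTF-8, passing UTF-8 (or None) input through."""
    if subprocess_encoding and \
            subprocess_encoding.lower() not in ('utf-8', 'utf8'):
        try:
            msg = msg.decode(subprocess_encoding, 'replace').encode('utf-8')
        except LookupError:
            pass  # unknown encoding name: write bytes as-is
    return msg

# (1) UTF-8 input passes through unchanged
assert convert_to_utf8('テスト'.encode('utf-8'), 'utf-8') == 'テスト'.encode('utf-8')
# (2) CP932 bytes are converted to UTF-8
assert convert_to_utf8('テスト'.encode('cp932'), 'cp932') == 'テスト'.encode('utf-8')
# (3) Malformed bytes degrade to U+FFFD instead of raising
assert b'\xef\xbf\xbd' in convert_to_utf8(b'\xff', 'ascii')
```

In the real tests the same assertions would run against `log()` with `_subprocess_encoding` patched and the logger replaced by an in-memory buffer, as the prompt describes.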
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@web/pgadmin/misc/bgprocess/process_executor.py`:
- Around line 210-225: The log writer currently converts subprocess output using
_subprocess_encoding and then writes bytes via self.logger.write(msg), which
leads to log files encoded as UTF-8 but read back using sys.getdefaultencoding()
in read_log(), causing garbled text on Windows non-UTF-8 locales; fix by
normalizing to UTF-8 at the write side and making the reader explicitly use
UTF-8: in process_executor.py ensure the msg path always encodes to UTF-8 (i.e.,
after any decoding/replacement do .encode('utf-8') unconditionally before
calling self.logger.write), and update read_log() in processes.py to decode
using 'utf-8' (not sys.getdefaultencoding()) so both writer (msg handling) and
reader (read_log) consistently use UTF-8.
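The write/read symmetry this comment asks for can be sketched as below (the file path and inline open() calls are illustrative; the real code writes through `self.logger` and reads in `read_log()` in processes.py):

```python
import os
import tempfile

# Writer side: normalize subprocess output to UTF-8 before it hits the log.
log_path = os.path.join(tempfile.mkdtemp(), 'out.log')
cp932_output = 'プロセス完了'.encode('cp932')  # bytes from a Japanese locale
with open(log_path, 'wb') as f:
    f.write(cp932_output.decode('cp932', 'replace').encode('utf-8'))

# Reader side: decode explicitly as UTF-8 rather than relying on
# sys.getdefaultencoding(), so the round trip is locale-independent.
with open(log_path, 'rb') as f:
    text = f.read().decode('utf-8')
print(text)  # プロセス完了
```

Fixing only the writer leaves the reader free to misinterpret the bytes on a non-UTF-8 system default; both sides must name the encoding explicitly.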
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 006b5aef-d407-4a91-9c1a-c65905e12f47
📒 Files selected for processing (1)
web/pgadmin/misc/bgprocess/process_executor.py
Detail Info: #9457