Fixed Process Watcher garbled text on Windows with non-UTF-8 locales. #9457 #9749
Conversation
Walkthrough: Detect subprocess output encoding at startup using …

Changes
🧹 Nitpick comments (1)
web/pgadmin/misc/bgprocess/process_executor.py (1)
215-223: Consider using `codecs.lookup()` for robust UTF-8 detection.

The check `_subprocess_encoding.lower() not in ('utf-8', 'utf8')` may miss other UTF-8 aliases. For example, on Windows, `'cp65001'` is UTF-8 but wouldn't match, causing unnecessary decode/encode cycles.

Also, with the `'replace'` error handler on line 219, `UnicodeDecodeError` should never be raised; only `LookupError` can occur if the encoding name is invalid.

♻️ Suggested improvement using codecs normalization

```diff
+import codecs
+
+def _is_utf8_encoding(encoding):
+    """Check if encoding is a UTF-8 variant."""
+    try:
+        return codecs.lookup(encoding).name == 'utf-8'
+    except LookupError:
+        return False
+
 # In the log method:
-        if _subprocess_encoding and _subprocess_encoding.lower() not in (
-                'utf-8', 'utf8'):
+        if _subprocess_encoding and not _is_utf8_encoding(
+                _subprocess_encoding):
             try:
                 msg = msg.decode(
                     _subprocess_encoding, 'replace'
                 ).encode('utf-8')
-            except (UnicodeDecodeError, LookupError):
+            except LookupError:
                 # If decoding fails, write as-is
                 pass
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/misc/bgprocess/process_executor.py` around lines 215 - 223, The current encoding check on _subprocess_encoding can miss UTF-8 aliases and unnecessarily re-encode; use codecs.lookup(_subprocess_encoding).name to get the canonical encoding name and compare that to 'utf-8' (or compare lowercased canonical name) before doing the decode/encode, and only catch LookupError from codecs.lookup (since using 'replace' prevents UnicodeDecodeError); update the block that decodes msg (the lines referencing _subprocess_encoding and msg.decode(...).encode('utf-8')) to perform a codecs.lookup check and remove UnicodeDecodeError from the except clause, keeping only LookupError handling.
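As a standalone illustration of the normalization this comment suggests, the helper below (a sketch; `is_utf8_encoding` is an illustrative name, not part of pgAdmin) uses `codecs.lookup()` to resolve encoding aliases to a canonical codec name before comparing:

```python
import codecs

def is_utf8_encoding(encoding: str) -> bool:
    """Return True when `encoding` is any registered alias of UTF-8."""
    try:
        # codecs.lookup() resolves aliases to a canonical codec name,
        # e.g. 'UTF8', 'utf_8', and (on Python 3.8+) 'cp65001' all
        # normalize to 'utf-8'.
        return codecs.lookup(encoding).name == 'utf-8'
    except LookupError:
        # Unknown encoding name: treat as non-UTF-8 and let the caller decide.
        return False

print(is_utf8_encoding('UTF8'))    # True
print(is_utf8_encoding('cp932'))   # False
print(is_utf8_encoding('bogus'))   # False
```

This avoids maintaining a hand-written tuple of aliases and makes the subsequent `except` clause need only `LookupError`.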
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 396bf3b0-c2b1-477e-a23c-c6636d52126a
📒 Files selected for processing (1)
web/pgadmin/misc/bgprocess/process_executor.py
Actionable comments posted: 1
🧹 Nitpick comments (3)
web/pgadmin/misc/bgprocess/process_executor.py (3)
215-223: Consider handling the `_subprocess_encoding is None` case more explicitly.

While the current code safely falls through (no conversion) when `_subprocess_encoding` is `None`, this could happen if the module is imported/used outside the `if __name__ == '__main__':` block. A defensive comment or explicit handling might improve clarity.

Optional: Add a comment for clarity

```diff
     # Convert subprocess output from system encoding to UTF-8
     # This fixes garbled text on Windows with non-UTF-8 locales
     # (e.g., Japanese CP932, Chinese GBK)
     msg = msg.lstrip(b'\r\n' if _IS_WIN else b'\n')
-    if _subprocess_encoding and \
+    # _subprocess_encoding is set at startup; if None, skip conversion
+    if _subprocess_encoding and \
             _subprocess_encoding.lower() not in ('utf-8', 'utf8'):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/misc/bgprocess/process_executor.py` around lines 215 - 223, The branch that converts msg based on _subprocess_encoding should explicitly handle the None case for clarity: update the conditional around _subprocess_encoding in process_executor.py (the block referencing _subprocess_encoding and the msg.decode(...).encode('utf-8') logic) to either add an explicit early-check for _subprocess_encoding is None (and skip conversion) or add a concise defensive comment explaining that None means no conversion is performed; ensure the check remains: if _subprocess_encoding and _subprocess_encoding.lower() not in ('utf-8', 'utf8'): and document the None behavior so future imports/usages outside __main__ are clear.
214-214: Verify the necessity of `lstrip()` on subprocess output.

The `lstrip(b'\r\n' if _IS_WIN else b'\n')` strips all leading newline characters. Since `readline()` in the `run()` method returns lines with trailing (not leading) newlines, this might be addressing a specific edge case. However, `lstrip()` could inadvertently remove meaningful empty lines if the subprocess output begins with intentional blank lines.

Was this added to handle a specific scenario with garbled output, or could it be narrowed to a more targeted fix?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/misc/bgprocess/process_executor.py` at line 214, The code currently uses msg = msg.lstrip(b'\r\n' if _IS_WIN else b'\n') which strips all leading newlines and can remove intentional blank lines; in the run() method narrow this to only remove a single leading newline sequence when present (rather than all leading newline characters). Locate the msg variable in process_executor.py (inside run()) and replace the lstrip call with logic that checks startswith(b'\r\n') on Windows or startswith(b'\n') otherwise and then slices off only that one sequence (e.g., msg = msg[len(prefix):] when present), so you preserve multiple intentional leading blank lines while handling the specific garbled-leading-newline case.
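The narrower stripping this prompt describes can be sketched as follows (the function name is illustrative, not from the codebase): remove at most one leading newline sequence instead of all of them.

```python
def strip_one_leading_newline(msg: bytes, is_win: bool) -> bytes:
    """Remove at most one leading newline sequence, preserving any
    further intentional blank lines in the subprocess output."""
    prefix = b'\r\n' if is_win else b'\n'
    return msg[len(prefix):] if msg.startswith(prefix) else msg

# lstrip() would collapse all three leading newlines; this keeps two.
raw = b'\n\n\nreport body'
print(strip_one_leading_newline(raw, is_win=False))  # b'\n\nreport body'
print(raw.lstrip(b'\n'))                             # b'report body'
```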
191-229: Consider adding test coverage for encoding conversion.

This fix addresses a platform-specific issue (Windows with non-UTF-8 locales like CP932, GBK). Manual testing across different locales is difficult. Consider adding unit tests that mock `_subprocess_encoding` to verify:
_subprocess_encodingto verify:
- UTF-8 input passes through unchanged
- Non-UTF-8 input (e.g., CP932 bytes) is correctly converted to UTF-8
- Invalid/malformed bytes are handled gracefully with 'replace'
Would you like me to help draft test cases for this encoding conversion logic?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/misc/bgprocess/process_executor.py` around lines 191 - 229, Add unit tests for process_executor.log to cover encoding conversion: create a ProcessExecutor instance (or the class under test) with a writable in-memory logger and mock get_current_time to a fixed value, then set the module-global _subprocess_encoding to each scenario and call log(msg) with representative byte inputs; verify the logger content contains the fixed timestamp prefix and the UTF-8 payload. Test cases: (1) _subprocess_encoding set to 'utf-8' and msg already UTF-8 — assert bytes passed through unchanged (minus leading newline stripping); (2) set _subprocess_encoding to a non-UTF8 encoding (e.g., 'cp932') and provide bytes that decode under that encoding — assert output is converted to UTF-8; (3) provide malformed bytes and set _subprocess_encoding to a non-UTF8 value to force the except path — assert log still writes something (no exception) and uses replacement characters. Ensure tests restore/cleanup module globals (_subprocess_encoding, _IS_WIN if modified) and the in-memory logger between tests.
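The conversion logic under test can be exercised in isolation. The sketch below re-implements the decode/encode branch as a free function (a hypothetical name, factored out for testability) so the three scenarios listed above can be asserted without spawning a subprocess or patching module globals:

```python
def convert_to_utf8(msg: bytes, subprocess_encoding) -> bytes:
    """Mirror of the log() conversion branch: re-encode non-UTF-8
    subprocess output to UTF-8, passing UTF-8 (or None) input through."""
    if subprocess_encoding and \
            subprocess_encoding.lower() not in ('utf-8', 'utf8'):
        try:
            msg = msg.decode(subprocess_encoding, 'replace').encode('utf-8')
        except LookupError:
            pass  # unknown encoding name: write bytes as-is
    return msg

# (1) UTF-8 input passes through unchanged
assert convert_to_utf8('テスト'.encode('utf-8'), 'utf-8') == 'テスト'.encode('utf-8')
# (2) CP932 bytes are converted to UTF-8
assert convert_to_utf8('テスト'.encode('cp932'), 'cp932') == 'テスト'.encode('utf-8')
# (3) Malformed bytes degrade to U+FFFD instead of raising
assert b'\xef\xbf\xbd' in convert_to_utf8(b'\xff', 'ascii')
```

In the real tests the same assertions would run against `log()` with `_subprocess_encoding` patched and the logger replaced by an in-memory buffer, as the prompt describes.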
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@web/pgadmin/misc/bgprocess/process_executor.py`:
- Around line 210-225: The log writer currently converts subprocess output using
_subprocess_encoding and then writes bytes via self.logger.write(msg), which
leads to log files encoded as UTF-8 but read back using sys.getdefaultencoding()
in read_log(), causing garbled text on Windows non-UTF-8 locales; fix by
normalizing to UTF-8 at the write side and making the reader explicitly use
UTF-8: in process_executor.py ensure the msg path always encodes to UTF-8 (i.e.,
after any decoding/replacement do .encode('utf-8') unconditionally before
calling self.logger.write), and update read_log() in processes.py to decode
using 'utf-8' (not sys.getdefaultencoding()) so both writer (msg handling) and
reader (read_log) consistently use UTF-8.
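The write/read symmetry this comment asks for can be sketched as below (the file path and inline open() calls are illustrative; the real code writes through `self.logger` and reads in `read_log()` in processes.py):

```python
import os
import tempfile

# Writer side: normalize subprocess output to UTF-8 before it hits the log.
log_path = os.path.join(tempfile.mkdtemp(), 'out.log')
cp932_output = 'プロセス完了'.encode('cp932')  # bytes from a Japanese locale
with open(log_path, 'wb') as f:
    f.write(cp932_output.decode('cp932', 'replace').encode('utf-8'))

# Reader side: decode explicitly as UTF-8 rather than relying on
# sys.getdefaultencoding(), so the round trip is locale-independent.
with open(log_path, 'rb') as f:
    text = f.read().decode('utf-8')
print(text)  # プロセス完了
```

Fixing only the writer leaves the reader free to misinterpret the bytes on a non-UTF-8 system default; both sides must name the encoding explicitly.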
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 006b5aef-d407-4a91-9c1a-c65905e12f47
📒 Files selected for processing (1)
web/pgadmin/misc/bgprocess/process_executor.py
Detail Info: #9457