Fix race condition in new_session() by avoiding list-sessions query#625
Fix race condition in new_session() by avoiding list-sessions query#625neubig wants to merge 5 commits intotmux-python:masterfrom
Conversation
0f803dd to
e81a44f
Compare
|
Thanks @tony! Signed the CLA. I checked and the version is And in terms of CI where this failed, the reason why we were able to identify this is because this CI run failed: https://github.com/OpenHands/software-agent-sdk/actions/runs/21638051841/job/62369267088 We were able to trace it back to the upgrade from Python 3.12 to 3.13, and after reverting to 3.12, the consistent failures due to the issue I cited here disappeared (although this particular run still has a few failures): https://github.com/OpenHands/software-agent-sdk/actions/runs/21710957108 |
Previously, new_session() would run 'tmux new-session -P -F#{session_id}'
then immediately query 'tmux list-sessions' to fetch full session data.
This created a race condition in PyInstaller + Python 3.13+ + Docker
environments where list-sessions might not see the newly created session.
The fix expands the -F format string to include all Obj fields, parsing
the output directly into a Session object without a separate query.
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Tony Narlock <tony@git-pull.com>
Co-authored-by: Tony Narlock <tony@git-pull.com>
8b00212 to
058ee6d
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #625 +/- ##
==========================================
+ Coverage 45.39% 45.43% +0.03%
==========================================
Files 22 22
Lines 2249 2256 +7
Branches 360 360
==========================================
+ Hits 1021 1025 +4
- Misses 1082 1085 +3
Partials 146 146 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@neubig Thank you very much! I'll look at merging this as a pre-release tomorrow so we can test if it works on your side. |
|
Great, thank you! |
why: Graham's race-condition fix introduced get_output_format() and parse_output() in neo.py; fetch_objs() had duplicate logic doing the same thing. Functions also lacked the NumPy-style docstrings and doctests required by project standards. what: - Add NumPy-style docstrings with doctests to get_output_format(), parse_output(), and fetch_objs() - Make parse_output() call get_output_format() instead of duplicating the field-list computation - Refactor fetch_objs() to use get_output_format() and parse_output() instead of inline format/parse logic - Cache get_output_format() with @functools.cache (fields are static) - Remove unused formats import from server.py - Simplify format_string = get_output_format()[1] in new_session()
e672929 to
bc33be0
Compare
why: The race condition fix changed new_session() to construct Session from -P output instead of a follow-up list-sessions query, but had no dedicated test verifying session_id and session_name are populated. what: - Add test asserting session_id is not None and session_name matches
bc33be0 to
6d202e0
Compare
|
@neubig (when github is back online): what's the best way for you to test this? Can you pin the branch? Do you need a PyPI release (and would an alpha do?) Any disagreement / opinion on the changes I made in the PR? |
|
Thanks @tony! I can pin the branch and I'm testing at the moment, will try to get this done this week. |
|
@neubig If you have a chance let me know and if you prefer I go for a prerelease on PyPI (if that's easier let me know as well) |
|
Hi @tony, thanks a lot! I was able to test this, and there were two issues (another being the locale settings that were used by pyinstaller). But this solved part of the problem, so I think it's good! |
Fixes: #624
Previously, new_session() would:
This created a race condition in some environments where list-sessions might not see the newly created session yet, causing TmuxObjectDoesNotExist errors.
The fix:
This is more efficient (one fewer subprocess call) and eliminates the race condition by making session creation atomic.
I tested this using the script in the original issue and it works in all three cases: