Skip to content

Epoll#116

Merged
sgerbino merged 1 commit intocppalliance:developfrom
sgerbino:pr/epoll
Feb 10, 2026
Merged

Epoll#116
sgerbino merged 1 commit intocppalliance:developfrom
sgerbino:pr/epoll

Conversation

@sgerbino
Copy link
Collaborator

@sgerbino sgerbino commented Feb 9, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of non-blocking socket readiness and cancellation to reduce missed or delayed I/O completions and ensure more reliable error propagation.
  • Chores

    • Consolidated internal event-processing and operation-routing logic to improve stability, predictability, and efficiency of asynchronous socket operations.

@coderabbitai
Copy link

coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

Walkthrough

Consolidates epoll readiness and op handling: descriptor mutex is acquired unconditionally and connect/read/write ops are processed or reassigned under the guard; EAGAIN/EWOULDBLOCK is normalized, cancellations post/finish deterministically, and completed ops are moved to a local queue for scheduling.

Changes

Cohort / File(s) Summary
Epoll socket I/O
src/corosio/src/detail/epoll/sockets.cpp
Replaced per-operation perform_now checks with a single mutex-guarded path that handles readiness, EAGAIN/EWOULDBLOCK normalization, cancellation, and reassignment of connect/read/write ops in-place.
Epoll acceptor
src/corosio/src/detail/epoll/acceptors.cpp
Removed split perform_now logic: acquire descriptor mutex, inline perform_io when read_ready, handle EAGAIN/cancellation explicitly, and store or post read_op under the guard.
Epoll scheduler / descriptor state
src/corosio/src/detail/epoll/scheduler.cpp
Consolidated EPOLLIN/EPOLLOUT/error paths to process read_op, connect_op, and write_op under the mutex; normalize retries, drain/complete ops on error, and move completed ops to a local queue via pointer exchanges.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client as Client / Socket
participant Descriptor as descriptor_state
participant Scheduler as Scheduler / Reactor
participant LocalOps as LocalOps Queue
Client->>Descriptor: epoll event (EPOLLIN / EPOLLOUT / ERR)
Descriptor->>Descriptor: lock mutex; inspect read_op/write_op/connect_op and ready flags
alt op exists and ready
Descriptor->>Client: call op.perform_io()
Client-->>Descriptor: result (ok / EAGAIN / error / cancelled)
alt EAGAIN
Descriptor->>Descriptor: normalize errno; set *_ready or reassign op pointer
else cancelled or error
Descriptor->>LocalOps: exchange pointer → enqueue (post/finish)
else ok
Descriptor->>LocalOps: move completed op to queue
end
else no op ready
Descriptor->>Descriptor: set read_ready/write_ready
end
Descriptor->>Scheduler: unlock mutex
Scheduler->>LocalOps: schedule/dispatch enqueued ops

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 In a burrow of sockets I hop and spin,
I grab the mutex, tidy ops within,
If EAGAIN nudges, I pause and then queue,
Cancel or finish — I post it through,
Hopping events forward, neat and trim 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The pull request title 'Epoll' is too vague and does not clearly convey the main objective of the changes, which is to fix a lost-wakeup window in epoll event handling. Consider a more descriptive title such as 'Fix lost-wakeup window in epoll edge-triggered mode' or 'Add edge-triggered retry loop to epoll perform_now paths' to better reflect the actual changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

No actionable comments were generated in the recent review. 🎉


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cppalliance-bot
Copy link

cppalliance-bot commented Feb 9, 2026

An automated preview of the documentation is available at https://116.corosio.prtest3.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-02-10 01:51:17 UTC

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/corosio/src/detail/epoll/sockets.cpp`:
- Around line 376-395: The do_write_io retry loop misses a cancellation check
after it re-parks the operation into desc_state_.write_op: after the block that
sets desc_state_.write_ready = false / desc_state_.write_op = &op and breaks,
add the same cancellation-check performed elsewhere (check op.cancelled or
whatever cancellation flag the code uses, and if cancelled call svc_.post(&op)
and svc_.work_finished() then return) so we don't skip handling a cancellation
when we re-park the write; locate the logic around op.perform_io(), op.errn,
svc_.post, svc_.work_finished, and desc_state_.write_op in do_write_io and
insert the check immediately after re-parking/breaking from the retry loop.
- Around line 287-306: The read retry loop that calls op.perform_io() and then
parks the operation by setting desc_state_.read_op = &op lacks the same
cancellation check used in the non-perform_now path; before returning after
breaking out of that loop (i.e., after desc_state_.read_op is set and control
would hit the return), check op.cancelled and if true, clear
desc_state_.read_op, call svc_.post(&op) and svc_.work_finished() (matching the
cancellation handling used elsewhere), then return; update the logic around the
loop that sets desc_state_.read_op to mirror the cancel-handling in the other
path so cancelled ops are completed immediately.
- Around line 187-206: After the loop that sets desc_state_.connect_op = &op and
breaks, add the same cancellation re-check used in the non-perform_now path:
after parking the op (when op.errn was EAGAIN/EWOULDBLOCK and
desc_state_.write_ready was false) check op.cancelled under the same
locking/claiming logic and, if cancelled, clear desc_state_.connect_op, post the
op to svc_ and call svc_.work_finished() (or otherwise reclaim/complete the
operation) before returning std::noop_coroutine(); this mirrors the cancel()
claim semantics and prevents stuck ops—look for the connect handling around
op.perform_io(), desc_state_.connect_op and the cancel()/op.cancelled logic to
copy the behavior exactly.
🧹 Nitpick comments (1)
src/corosio/src/detail/epoll/sockets.cpp (1)

187-205: Consider extracting the duplicated retry-loop logic into a helper.

All three retry loops (connect, read, write) follow an identical structure: for(;;)perform_io() → check errn → post/finish or re-check readiness → park or continue. The only differences are the readiness flag name (write_ready/read_ready) and the op slot (connect_op/read_op/write_op). A small template or lambda helper taking the flag and slot as parameters would eliminate the triplication and reduce the risk of one path diverging from the others (as demonstrated by the cancellation gap above).

Also applies to: 287-305, 376-394

@cppalliance-bot
Copy link

cppalliance-bot commented Feb 9, 2026

GCOVR code coverage report https://116.corosio.prtest3.cppalliance.org/gcovr/index.html
LCOV code coverage report https://116.corosio.prtest3.cppalliance.org/genhtml/index.html
Coverage Diff Report https://116.corosio.prtest3.cppalliance.org/diff-report/index.html

Build time: 2026-02-10 01:54:13 UTC

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/corosio/src/detail/epoll/scheduler.cpp`:
- Around line 238-255: The error-only drain currently runs only when err && !(ev
& (EPOLLIN | EPOLLOUT)) which can leave write/connect ops stranded if EPOLLERR
arrives with partial direction flags; change the logic that handles err to
unconditionally drain and complete any pending read_op, write_op, and connect_op
when err is set (i.e., check if (err) and for each of read_op, write_op,
connect_op call complete(err, 0) and push via std::exchange as done today), so
all directions are cleared defensively while preserving the existing completion
semantics and use of local_ops.

Move perform_io() calls inside the per-descriptor mutex in both
the scheduler dispatch (descriptor_state::operator()()) and the
initiator perform-now paths (connect, do_read_io, do_write_io,
accept). This eliminates race windows by construction, removing
the 3-phase lock-claim/unlock-IO/lock-repark pattern, the
perform_now boolean, and separate post-park cancel re-checks.
@sgerbino sgerbino changed the title Add edge-triggered retry loop to epoll perform_now paths Epoll Feb 10, 2026
@codecov
Copy link

codecov bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 45.00000% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.14%. Comparing base (b058adb) to head (6922bdb).
⚠️ Report is 7 commits behind head on develop.

Files with missing lines Patch % Lines
src/corosio/src/detail/epoll/sockets.cpp 33.33% 22 Missing ⚠️
src/corosio/src/detail/epoll/scheduler.cpp 58.33% 15 Missing ⚠️
src/corosio/src/detail/epoll/acceptors.cpp 36.36% 7 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #116      +/-   ##
===========================================
+ Coverage    79.70%   80.14%   +0.43%     
===========================================
  Files           65       65              
  Lines         5661     5640      -21     
===========================================
+ Hits          4512     4520       +8     
+ Misses        1149     1120      -29     
Files with missing lines Coverage Δ
src/corosio/src/detail/epoll/acceptors.cpp 81.85% <36.36%> (+2.18%) ⬆️
src/corosio/src/detail/epoll/scheduler.cpp 80.86% <58.33%> (+0.66%) ⬆️
src/corosio/src/detail/epoll/sockets.cpp 71.46% <33.33%> (+2.60%) ⬆️

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b058adb...6922bdb. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sgerbino sgerbino merged commit ad5dd27 into cppalliance:develop Feb 10, 2026
17 checks passed
@sgerbino sgerbino deleted the pr/epoll branch February 10, 2026 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants