Skip to content

TASK-209191 block global trace while inheriting thread data#12

Open
balupillai wants to merge 4 commits intomainfrom
TASK-209191
Open

TASK-209191 block global trace while inheriting thread data#12
balupillai wants to merge 4 commits intomainfrom
TASK-209191

Conversation

@balupillai
Copy link
Contributor

@balupillai balupillai commented Mar 4, 2026

Note

Medium Risk
Touches low-level coroutine lifecycle and concurrent GC/heap transfer logic; incorrect refcounting or lock ordering could cause leaks, deadlocks, or crashes under concurrency.

Overview
Prevents garbage-collection races around thread suspension and reclaimed-thread heap transfer.

lua_suspend now increments the coroutine GC refcount and lua_resume decrements it when resuming from LUA_SUSPEND, pinning suspended coroutines until they are resumed.

luaC_inherit_thread now takes the target thread lock and blocks the collector while moving objects/heaps, ensuring global trace cannot run concurrently with heap inheritance and avoiding lock contention/corruption during transfer.

Written by Cursor Bugbot for commit 13ee876. This will update automatically on new commits. Configure here.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent the global tracing GC from running while a thread’s heap/object data is being inherited into another thread, by adding an explicit collector barrier around the inheritance operation.

Changes:

  • Add per-OS-thread state lookup in luaC_inherit_thread.
  • Block the collector before inheriting a thread’s heap contents and unblock afterward.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

GCheader *steal, *tmp;
thr_State *pt;

lua_lock(th);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-locking th causes deadlock with existing callers

High Severity

The newly added lua_lock(th) inside luaC_inherit_thread double-locks th because at least two call sites already hold that lock. In reclaim_object (lgc.c), the caller does lua_lock(th) before calling luaC_inherit_thread(L, th). In lua_delrefthread (lapi.c), lua_lock(L) is called on the source thread which becomes the th parameter. If lua_lock is a non-recursive mutex, this will deadlock.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

} LUAI_TRY_END(L);
if (was_suspended) {
ck_pr_dec_32(&L->gch.ref);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref count decremented on failed resume while status unchanged

High Severity

When lua_resume is called on a suspended coroutine but fails with "C stack overflow," resume_error does not change L->status (it stays LUA_SUSPEND), yet ck_pr_dec_32(&L->gch.ref) still executes because was_suspended is true. On a subsequent retry of lua_resume, was_suspended is true again, causing another decrement — but the ref was only incremented once in lua_suspend. This ref count underflow can lead to premature GC collection of a still-live coroutine.

Additional Locations (1)
Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants