Closed
Labels
bug (Something isn't working)
Description
Describe the bug
There are two bugs.
- Losing the event SchedulerServerEvent::JobSubmitted results in the job never being scheduled.
- A concurrency issue when ExecutorData is updated simultaneously.
For the first bug:
In the method SchedulerServerEventAction::offer_resources, every executor in the returned available_executors may have 0 available_task_slots. In that case, no tasks are scheduled for the job and no SchedulerServerEvent::JobSubmitted is resent to the channel. As a result, the job gets stuck.
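One possible way to avoid the stuck job is to resend the event whenever no slot could be offered. The sketch below only illustrates that idea and is not the project's actual code: the types are simplified stand-ins, the event payload (a job id string), the free function `offer_resources`, and the use of a `std::sync::mpsc` channel are all assumptions made for the example.

```rust
use std::sync::mpsc::{channel, Sender};

// Hypothetical, simplified stand-ins for the scheduler types named in this issue.
#[derive(Debug, Clone, PartialEq)]
enum SchedulerServerEvent {
    // Assumption: the event carries only the job id.
    JobSubmitted(String),
}

#[derive(Debug, Clone)]
struct ExecutorData {
    available_task_slots: u32,
}

/// Returns the total number of free task slots across `available_executors`.
/// If that total is 0, nothing can be scheduled now, so the JobSubmitted
/// event is sent again and the job will be retried instead of being lost.
fn offer_resources(
    job_id: &str,
    available_executors: &[ExecutorData],
    tx: &Sender<SchedulerServerEvent>,
) -> u32 {
    let free_slots: u32 = available_executors
        .iter()
        .map(|e| e.available_task_slots)
        .sum();

    if free_slots == 0 {
        // Re-enqueue the event so the job is offered resources again later.
        tx.send(SchedulerServerEvent::JobSubmitted(job_id.to_string()))
            .expect("event channel closed");
    }
    free_slots
}
```

With a receiver created via `channel()`, calling `offer_resources` with executors that all report 0 slots leaves one `JobSubmitted` event in the channel, while a call with free slots leaves the channel empty. A real fix would likely also need some back-off so the resent event does not spin in a tight loop.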
For the second bug:
The get_executor_data and save_executor_data operations are not performed atomically as a pair, so concurrent read-modify-write sequences on the same ExecutorData can overwrite each other's updates.
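A minimal sketch of one way to close this race is to perform the read and the write under a single lock. Everything here is hypothetical: `ExecutorStore`, `reserve_slots`, and `reserve_concurrently` are names invented for the example, and an in-memory `Mutex<HashMap>` stands in for whatever state backend the scheduler really uses.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical, simplified stand-in for the ExecutorData named in this issue.
#[derive(Debug, Clone)]
struct ExecutorData {
    available_task_slots: u32,
}

// Hypothetical in-memory store; the lock guards the whole map.
struct ExecutorStore {
    inner: Mutex<HashMap<String, ExecutorData>>,
}

impl ExecutorStore {
    fn new() -> Self {
        ExecutorStore { inner: Mutex::new(HashMap::new()) }
    }

    fn save_executor_data(&self, id: &str, data: ExecutorData) {
        self.inner.lock().unwrap().insert(id.to_string(), data);
    }

    fn get_executor_data(&self, id: &str) -> Option<ExecutorData> {
        self.inner.lock().unwrap().get(id).cloned()
    }

    /// Reads and updates the slot count while holding the lock, so no other
    /// thread can interleave between the read and the write.
    /// Returns how many slots were actually granted.
    fn reserve_slots(&self, id: &str, wanted: u32) -> u32 {
        let mut map = self.inner.lock().unwrap();
        match map.get_mut(id) {
            Some(data) => {
                let granted = wanted.min(data.available_task_slots);
                data.available_task_slots -= granted;
                granted
            }
            None => 0,
        }
    }
}

/// Spawns `threads` workers that each try to reserve one slot and returns
/// the total number of slots granted.
fn reserve_concurrently(store: &Arc<ExecutorStore>, executor_id: &str, threads: usize) -> u32 {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let store = Arc::clone(store);
            let id = executor_id.to_string();
            thread::spawn(move || store.reserve_slots(&id, 1))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}
```

If each worker instead called `get_executor_data`, decremented the copy, and then called `save_executor_data`, two workers could read the same slot count and both succeed, granting more slots than exist. With `reserve_slots`, the total granted never exceeds the number of slots saved for the executor.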
To Reproduce
Run a load test with the push-based task scheduling policy, as described in #1983.