server: prevent data race from HTTP threads #18263
Conversation
Stress testing to reproduce the race condition with and without this PR.

Without the PR, results: the script seems to be working, perhaps a little too heavy-handed. We'll see what happens with the PR!

With this PR: I'm surprised by the improvement, because I really overloaded the server with this LLM-made script. I think I need to narrow it down, but it's good proof that the PR makes things way more reliable! There is some strange behavior remaining, like HTTP 000 errors:
I'm trying to narrow down the HTTP 000 case.
Quite interesting, I think this script can be useful to test changes related to batching too. BTW, looking at your "No PR" results, I suppose the 503 errors were because the test doesn't wait until the server starts, right?
The HTTP 000 case has a timeout of exactly 10 seconds, which seems quite suspicious; probably a client-side timeout.
The server was already started. I think the 503 errors in the "No PR" results are the data race your PR fixes!
So yes, the HTTP 000 at 10 s is definitely a curl timeout, with the status code defaulting to 000. The HTTP 000 at 2 s is, I think, a reverse-proxy timeout.
Better script:
I've already restarted the Windows runner. I'll have to test it on my Windows machine! I'm trying a server-queue.cpp/h patch.
If that was for the |
`prompt processing progress, n_tokens = 1355, batch.n_tokens = 1355, progress = 1.000000` (sent from my smartphone; I'll check tomorrow morning).
I spotted some more bugs along the way, fixed in the last commit(s):
Edit: hmm, the LoRA endpoint can also cause a data race the same way as
So it turns out the endpoints for LoRA hot-swap can also cause a data race, as they read data directly off
Compared to what I did from the phone (where the server wasn't working), after these 5 commits everything works: the last DoS script no longer shows an HTTP 000 error, and the server recovers after a while. It seems more robust.
Fix #18234
This includes quite many changes, but on a high level:
- The `server_context_impl` class member is now highly restricted. Only some pointers like `vocab`, `model`, `mctx` are exposed via `server_context_meta`. This is to prevent any accesses to non-thread-safe data inside `server_context_impl`.
- From `server_routes`, the HTTP layer can only access some pointers like `vocab`, `model`, `mctx`. Any other data MUST be passed through `server_context_meta`.

As a consequence:
- `/models` and `/v1/models` can no longer be accessed during model loading. That is NOT thread-safe and can potentially cause a data race.
- `/models` and `/v1/models` can now be accessed while the server is sleeping, because they no longer access `server_context_impl` directly.

Also included are some other fixes described in #18263 (comment) to make things safer.
cc @ServeurpersoCom, I would appreciate it if you could do some testing, thanks!