feat: gpt-5.2 & refactors & tools #107
base: main
Conversation
I’ve implemented a major update and would appreciate your help testing. I’ve done an initial pass, but since this touches core streaming and UI logic, I want to ensure no regressions were missed. Key Changes to Verify:
Please let me know if you find any bugs or if the UX doesn't feel right!
@Junyi-99 quick check,
In
4ndrelim left a comment
I'll continue testing, but these are all I have for now.
Great job on such a huge undertaking! The new file-operation tools make for a more integrated environment, and the new model support offers great diversity.
```tsx
const MarkdownComponent = memo(({ children, animated }: MarkdownComponentProps) => {
  return <Markdown options={markdownOptions}>{children}</Markdown>;
});
```
```
You are PaperDebugger, a large language model tweaked by PaperDebugger Inc.

## tool_call_limit
You have a maximum of 20 tool calls per conversation turn. Please plan your tool usage carefully and avoid unnecessary tool calls.
```
We might have to handle context better for weaker models, or find a way to manage cancellation of tool calls. Below, gpt-4.1 calls the wrong tool and then enters a frenzied loop, repeatedly calling read_file (likely trying to read the full file). During this process it is not possible to cancel it; even after creating a new conversation, the read_file tool operation persists (it should be cancelled).
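One way to address the cancellation gap could be a per-conversation cancellation token that every tool call checks between steps. This is a hypothetical, framework-free sketch (in the real frontend an AbortController could play this role); all names here are illustrative, not the project's actual API:

```typescript
// Minimal cancellation token; each conversation gets one, and a runaway
// tool loop (like the repeated read_file calls) checks it between calls.
class CancelToken {
  cancelled = false;
  cancel(): void {
    this.cancelled = true;
  }
}

const tokens = new Map<string, CancelToken>();

// One token per conversation; lazily created on first tool call.
function tokenFor(conversationId: string): CancelToken {
  let t = tokens.get(conversationId);
  if (!t) {
    t = new CancelToken();
    tokens.set(conversationId, t);
  }
  return t;
}

// Called when the user cancels the turn or switches to a new conversation,
// so stale tool operations do not persist across conversations.
function cancelConversation(conversationId: string): void {
  tokens.get(conversationId)?.cancel();
  tokens.delete(conversationId);
}

// A tool loop that stops between calls once the token is cancelled.
async function runToolLoop(
  conversationId: string,
  step: () => Promise<boolean> // returns true if the model wants another call
): Promise<number> {
  const token = tokenFor(conversationId);
  let calls = 0;
  while (!token.cancelled) {
    calls++;
    const wantsMore = await step();
    if (!wantsMore) break;
  }
  return calls;
}
```

Deleting the token on cancellation means a genuinely new conversation starts with a fresh, uncancelled token, which is the behavior the bug report asks for.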
edit: it seems the wrong tool calls are partly caused by the staging environment not being deployed with XtraMCP due to a dns_rebinding issue. I will push a commit to staging to update this, but the tool-cancellation issue remains.
As for the weaker models, they may get trapped in an infinite tool-call loop. We have to mitigate this by:
- refining our system prompt
- limiting consecutive tool calls in the backend

I think option 2 is more reliable.
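Option 2 could look something like the guard below: cap consecutive tool calls per turn in the backend and force a text answer once the cap is hit. This is a sketch under assumed names (the cap of 20 mirrors the system-prompt limit quoted above; `TurnState` and the function names are illustrative):

```typescript
// Backend-side guard: only *consecutive* tool calls count, so the counter
// resets whenever the model emits normal text.
const MAX_CONSECUTIVE_TOOL_CALLS = 20;

interface TurnState {
  consecutiveToolCalls: number;
}

// Returns true if the model may issue another tool call this turn.
// Once the cap is reached, the backend refuses further tool calls and the
// model must respond with text instead, breaking any frenzy loop.
function allowToolCall(state: TurnState): boolean {
  if (state.consecutiveToolCalls >= MAX_CONSECUTIVE_TOOL_CALLS) {
    return false;
  }
  state.consecutiveToolCalls += 1;
  return true;
}

// Reset when the model emits answer text between tool calls.
function onTextEmitted(state: TurnState): void {
  state.consecutiveToolCalls = 0;
}
```

Enforcing this server-side makes it reliable even for models that ignore the prompt-level limit.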
```diff
@@ -0,0 +1,299 @@
+/**
```
Yes, it's an awkward pause. It's very hard to reproduce reliably.
@4ndrelim You raised a good point. BYOK is not affecting the … Yes, we should consider warning those who want to self-host about this assumption. We can create a new issue to warn them, but I suggest not addressing that in this pull request.
@4ndrelim Thank you for your comprehensive test.

These bugs are confirmed and fixed:

- Unsetting BYOK did not grey out models (5ea58f1): the greyed-out issue was fixed by updating the supported-models React Query cache when the API key is saved (…).
- No tools for …: yes, because implementing the …
- Faded bottom (262f6b7): yes, the reasoning dropdown is intentionally designed to fade near the edges, which indeed affects readability. This feature has been removed because it would confuse users.
- Embedded sidebar messes up the markdown display (262f6b7): yes, confirmed. This only happens when Dark Mode is enabled: the text turns white and becomes unreadable against PaperDebugger's white background, because we currently do not support Dark Mode. I think we can add Dark Mode support; it should be relatively easy.

These bugs are not confirmed:

- Generate Citations Tool Card
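The greyed-out fix can be understood with a framework-free sketch: when a BYOK key is saved or removed, the cached supported-models list is dropped so the disabled flags are recomputed (the actual fix updates the React Query cache; the model names and shapes below are illustrative, not the real API):

```typescript
interface SupportedModel {
  id: string;
  requiresUserKey: boolean;
  disabled: boolean;
}

let userHasKey = false;
let cache: SupportedModel[] | null = null;

// Models that need a user-provided key are disabled until one is configured.
function fetchSupportedModels(): SupportedModel[] {
  const all = [
    { id: "gpt-4.1", requiresUserKey: false },
    { id: "gpt-5.2", requiresUserKey: true },
  ];
  return all.map((m) => ({ ...m, disabled: m.requiresUserKey && !userHasKey }));
}

function getSupportedModels(): SupportedModel[] {
  if (cache === null) cache = fetchSupportedModels();
  return cache;
}

// Invalidate on key change — analogous to updating/invalidating the
// supported-models React Query cache when the API key is saved.
function onApiKeyChanged(hasKey: boolean): void {
  userHasKey = hasKey;
  cache = null;
}
```

The bug was effectively a stale cache: without the invalidation step, the UI kept showing the flags computed before the key was saved.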










This pull request introduces several significant improvements and fixes across the chat service, focusing on model support, streaming logic, tool registration, and prompt instructions. The changes expand the list of supported models, enhance the streaming of reasoning and answer content, improve tool availability, and clarify system prompt instructions. Additionally, several bug fixes and refactorings improve consistency in message IDs and tool registration.
Model support and API logic:

- Updated `ListSupportedModels`, adding new models (e.g., GPT-5 series, Qwen3, GLM 4.5 Air, o1/o3/o4) and introducing logic to enable or disable models based on whether the user has provided their own OpenAI API key. Models that require a user key are now marked as disabled if the user hasn't configured one.

Streaming and message handling improvements:

- Updated `ChatCompletionStreamV2` to correctly send a `StreamPartBegin` event before any content (including reasoning content) is streamed, ensuring proper frontend behavior. Enhanced handling of reasoning content, supporting both `reasoning_content` and `reasoning` fields, and passing both reasoning and answer content to the handler. [1] [2] [3]
- Unified the `"openai_"` message-ID prefix in both V1 and V2 chat message handlers and utilities, ensuring consistency across the system. [1] [2] [3] [4]

Tool registration and project handling:

- Made `read_file`, `list_folder`, `searchStringFromTheFile`, `search_file`, `get_document_structure`, `locate_section`, `read_section_source`, and `read_source_line_range` available to the AI client.

Prompt and instruction updates:

- Content is now wrapped in a `<PaperDebugger>` tag rather than triple backticks. [1] [2]

Dependency and import clean-up:
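The streaming event-ordering fix can be sketched as follows. This is a minimal illustration under assumed event and delta shapes (the real `StreamPartBegin` payload and handler signatures may differ): the begin event is always emitted before any content, and either upstream reasoning field is accepted.

```typescript
type StreamEvent =
  | { kind: "StreamPartBegin"; partId: string }
  | { kind: "Content"; partId: string; reasoning: string; answer: string };

// Upstream delta: some providers send `reasoning_content`, others `reasoning`.
interface Delta {
  reasoning_content?: string;
  reasoning?: string;
  content?: string;
}

function streamPart(partId: string, deltas: Delta[]): StreamEvent[] {
  // Emit StreamPartBegin first, before any content (including reasoning),
  // so the frontend can set up the part before tokens arrive.
  const events: StreamEvent[] = [{ kind: "StreamPartBegin", partId }];
  for (const d of deltas) {
    const reasoning = d.reasoning_content ?? d.reasoning ?? "";
    const answer = d.content ?? "";
    if (reasoning || answer) {
      // Both reasoning and answer content are passed to the handler.
      events.push({ kind: "Content", partId, reasoning, answer });
    }
  }
  return events;
}
```

Without the up-front begin event, a part whose first tokens are reasoning content would reach the frontend before it had created the part, which is the bug this change addresses.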