
Conversation

@Junyi-99
Member

This pull request introduces significant improvements across the chat service, focusing on model support, streaming logic, tool registration, and prompt instructions. The changes expand the list of supported models, enhance the streaming of reasoning and answer content, improve tool availability, and clarify system prompt instructions. Additionally, several bug fixes and refactorings improve consistency in message IDs and tool registration.

Model support and API logic:

  • Refactored and expanded the supported model list in ListSupportedModels, adding new models (e.g., GPT-5 series, Qwen3, GLM 4.5 Air, o1/o3/o4) and introducing logic to enable or disable models based on whether the user has provided their own OpenAI API key. Models that require a user key are now marked as disabled if the user hasn't configured one.

Streaming and message handling improvements:

  • Improved streaming logic in ChatCompletionStreamV2 to correctly send a StreamPartBegin event before any content (including reasoning content) is streamed, ensuring proper frontend behavior. Enhanced handling of reasoning content, supporting both reasoning_content and reasoning fields, and passing both reasoning and answer content to the handler. [1] [2] [3]
  • Standardized assistant message IDs by removing the "openai_" prefix in both V1 and V2 chat message handlers and utilities, ensuring consistency across the system. [1] [2] [3] [4]

Tool registration and project handling:

  • Enabled and registered additional file and LaTeX tools in the toolkit client, making functions like read_file, list_folder, searchStringFromTheFile, search_file, get_document_structure, locate_section, read_section_source, and read_source_line_range available to the AI client.
  • Fixed a bug where project instructions could be incorrectly referenced in debug mode, ensuring the correct value is used based on the conversation type. [1] [2] [3]

Prompt and instruction updates:

  • Updated system prompt templates to add a tool call usage limit section and clarified the required format for revised LaTeX text, instructing the model to wrap revised text in a <PaperDebugger> tag rather than triple backticks. [1] [2]
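A simplified sketch of extracting the revised text from the new wrapper format (first match only; the helper name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// extractRevision pulls the revised LaTeX out of a
// <PaperDebugger>...</PaperDebugger> wrapper, the format the updated
// prompt asks the model to use instead of triple backticks.
func extractRevision(reply string) (string, bool) {
	const openTag, closeTag = "<PaperDebugger>", "</PaperDebugger>"
	i := strings.Index(reply, openTag)
	if i < 0 {
		return "", false
	}
	rest := reply[i+len(openTag):]
	j := strings.Index(rest, closeTag)
	if j < 0 {
		return "", false
	}
	return strings.TrimSpace(rest[:j]), true
}

func main() {
	rev, ok := extractRevision("Here you go:\n<PaperDebugger>\\section{Intro}</PaperDebugger>")
	fmt.Println(ok, rev)
}
```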

Dependency and import clean-up:

  • Added missing imports for file and LaTeX tools, and removed unused imports to clean up the codebase. [1] [2]

@Junyi-99 Junyi-99 marked this pull request as ready for review January 29, 2026 06:50
@Junyi-99 Junyi-99 requested review from 4ndrelim and kah-seng January 29, 2026 06:50
@Junyi-99
Member Author

@4ndrelim @kah-seng

I’ve implemented a major update and would appreciate your help testing. I’ve done an initial pass, but since this touches core streaming and UI logic, I want to ensure no regressions were missed.

Key Changes to Verify:

  • New Models: Confirm GPT-5.2 and Reasoning Models work as expected.
  • Reasoning Flow: Check the new reasoning_chunk handling. It should feel "smooth and elegant"—please flag it if the UI feels clunky during the thinking process.
  • Markdown & Textpatch: We’ve switched to streamdown. High priority: Please test the copy-paste/insert (textpatch) feature specifically, as the component swap may have affected it.
  • Tool Use: Verify that basic tool calls trigger and display correctly.
  • Streaming Logic: Ensure the refactored streaming remains stable under different network speeds.

Please let me know if you find any bugs or if the UX doesn't feel right!

@4ndrelim
Member

4ndrelim commented Jan 29, 2026

@Junyi-99 quick check:

Model support and API logic:

  • Refactored and expanded the supported model list in ListSupportedModels, adding new models (e.g., GPT-5 series, Qwen3, GLM 4.5 Air, o1/o3/o4) and introducing logic to enable or disable models based on whether the user has provided their own OpenAI API key. Models that require a user key are now marked as disabled if the user hasn't configured one.

In client_v2.go, we check for gpt-5-nano even for user-input BYOK API keys. This is fine for OpenRouter since it's managed by us, but I'm unsure whether it's OK to assume the model is enabled by default for BYOK. That said, NewAIClientV2 in client_v2.go seems to be called only during initialization, so users shouldn't encounter this problem unless they're doing a local deployment, right? Should we consider warning those who want to self-host about this assumption?

@4ndrelim
Member

4ndrelim commented Jan 29, 2026

Key Changes to Verify:

  • New Models: Confirm GPT-5.2 and Reasoning Models work as expected.
    ✅ (models work) 🟡 (but removing API key still displays the model)

It seems that the unavailable models are not greyed out when the user removes BYOK. Also, the model still generates a completion, so is the key being persisted by the backend? We might need to look into completion_v2.go or stream_v2.go to handle BYOK validation.

Below shows the models even after BYOK was removed:

Screenshot 2026-01-30 at 1 15 47 AM

I suspect that after BYOK is removed, it defaults to OpenRouter gpt-4.1. Any model selected afterwards does not apply properly; e.g. selecting the gpt-5.1 reasoning model does not trigger reasoning.


  • Reasoning Flow: Check the new reasoning_chunk handling. It should feel "smooth and elegant"—please flag it if the UI feels clunky during the thinking process.
✅ Seems OK for now, but I'll test again once the above is resolved. I currently can't trigger the reasoning model, likely due to caching of the BYOK key or the default model being applied regardless of selection.

  • Markdown & Textpatch: We’ve switched to streamdown. High priority: Please test the copy-paste/insert (textpatch) feature specifically, as the component swap may have affected it.

Textdiff no longer works. Insert is still OK, but forgoing textdiff might not be desirable for users who value visibility.

Screenshot 2026-01-30 at 1 36 00 AM
  • Tool Use: Verify that basic tool calls trigger and display correctly.
    ✅ (tool calling) 🟡 (markdown display)

Strangely, generate_citations isn't rendering the proper frontend card, although the tool itself runs properly. I'll investigate whether it's due to an underlying existing issue.
Screenshot 2026-01-30 at 1 31 39 AM


  • Streaming Logic: Ensure the refactored streaming remains stable under different network speeds.
    [STILL TESTING]



Additional

Interestingly, listing tools also lists GPT-5's inherent tooling. Your new LaTeX file-operation tools are also nicely captured:

Screenshot 2026-01-30 at 12 35 34 AM

There are tools for search_file and read_file, but no explicit write_file tool. Shouldn't PD still allow textpatch as existing behavior even without an explicit tool coded, or is that only available for the main file?

Screenshot 2026-01-30 at 12 46 00 AM

Also, unsure if this is intended: in the Reasoning dropdown, the text is very faded near the edges, especially the bottom line. Maybe add a newline or some padding for a slightly clearer display?

Member

@4ndrelim 4ndrelim left a comment


I'll continue testing, but these are all I have for now.

Great job on such a huge undertaking! The new file-operation tools will make for a more integrated environment, and the new model support offers great diversity.

Review comment on the MarkdownComponent diff:

```tsx
);

return <Markdown options={markdownOptions}>{children}</Markdown>;

const MarkdownComponent = memo(({ children, animated }: MarkdownComponentProps) => {
```
Member


I'm unsure if this is intended, but it seems a newline is now forced after every numbered entry:

Image

Might be unrelated to this PR, but I noticed the embedded sidebar messes up the markdown display:
Screenshot 2026-01-30 at 1 27 07 AM

Review comment on the system prompt diff:

```
You are PaperDebugger, a large language model tweaked by PaperDebugger Inc.

## tool_call_limit
You have a maximum of 20 tool calls per conversation turn. Please plan your tool usage carefully and avoid unnecessary tool calls.
```
Member


We might have to handle context better for weaker models, or find a way to manage the cancellation of tool calls. Below, gpt-4.1 calls the wrong tool and then enters a frenzied loop, repeatedly calling read_file (likely trying to read the full file). During this process it is not possible to cancel it; even after creating a new conversation, the read_file tool operation persists (it should be cancelled).

Image

Member


Edit: it seems the wrong tool calling is partially caused by the staging environment not being deployed with XtraMCP due to a dns_rebinding issue. I'll push a commit to staging to update this, but the tool-cancellation issue remains.

Member Author


As for the weaker models, they may get trapped in an infinite tool-call loop. We have to mitigate this by

  1. refining our system prompt, or
  2. limiting consecutive tool calls in the backend.

I think option 2 is more reliable.

Review comment on a new 299-line file in the diff:
Member


In the past, there used to be filler words like "thinking..." or "preparing..." while the backend prepared a response and streamed. I'm not sure if this was removed, as there seems to be an awkward pause now:

Image

Member Author

@Junyi-99 Junyi-99 Jan 29, 2026


Yes, it's an awkward pause... it's very hard to reproduce reliably.

@Junyi-99
Member Author

@Junyi-99 quick check:

Model support and API logic:

  • Refactored and expanded the supported model list in ListSupportedModels, adding new models (e.g., GPT-5 series, Qwen3, GLM 4.5 Air, o1/o3/o4) and introducing logic to enable or disable models based on whether the user has provided their own OpenAI API key. Models that require a user key are now marked as disabled if the user hasn't configured one.

In client_v2.go, we check for gpt-5-nano even for user-input BYOK API keys. This is fine for OpenRouter since it's managed by us, but I'm unsure whether it's OK to assume the model is enabled by default for BYOK. That said, NewAIClientV2 in client_v2.go seems to be called only during initialization, so users shouldn't encounter this problem unless they're doing a local deployment, right? Should we consider warning those who want to self-host about this assumption?

@4ndrelim You raise a good point. BYOK does not affect client_v2.go because each BYOK request creates a new OpenAI client via GetOpenAIClient, and the NewAIClientV2 function is only used when we start the backend.

Yes, we should consider warning those who want to self-host about this assumption. We can open a new issue for it, but I suggest not addressing it in this pull request.

@Junyi-99
Member Author

@4ndrelim Thank you for your comprehensive test.

These bugs are confirmed and fixed:

Unsetting BYOK did not grey out models (5ea58f1)

The greyed-out issue was fixed by updating the supported-models React Query cache when the API key is saved (queryClient.setQueryData in setting-text-input.tsx); the key persistence and selection behaviour likely need backend checks in completion_v2.go / stream_v2.go for BYOK validation and model routing.

No write_file tool

Yes, implementing the write_file tool would require a huge refactor of both backend and frontend, so I skipped this tool for now.

Faded bottom (262f6b7)

Yes, the reasoning dropdown is intentionally designed to fade near the edges, which indeed hurts readability. The effect has been removed because it would confuse users.

Embedded sidebar messes up the markdown display (262f6b7)

Yes, confirmed. This bug occurs because we don't currently support Dark Mode. It only happens with Dark Mode enabled: the text turns white, making it impossible to see against PaperDebugger's white background.

I think we can add Dark Mode support; it should be relatively easy.

These bugs are not confirmed:

TextPatch diff
image

Generate Citations Tool Card

image
