
Conversation

@Junyi-99
Member

This pull request introduces significant improvements across the chat service, focusing on model support, streaming logic, tool registration, and prompt instructions. The changes expand the list of supported models, enhance the streaming of reasoning and answer content, improve tool availability, and clarify system prompt instructions. Additionally, several bug fixes and refactorings improve consistency in message IDs and tool registration.

Model support and API logic:

  • Refactored and expanded the supported model list in ListSupportedModels, adding new models (e.g., GPT-5 series, Qwen3, GLM 4.5 Air, o1/o3/o4) and introducing logic to enable or disable models based on whether the user has provided their own OpenAI API key. Models that require a user key are now marked as disabled if the user hasn't configured one.

Streaming and message handling improvements:

  • Improved streaming logic in ChatCompletionStreamV2 to correctly send a StreamPartBegin event before any content (including reasoning content) is streamed, ensuring proper frontend behavior. Enhanced handling of reasoning content, supporting both reasoning_content and reasoning fields, and passing both reasoning and answer content to the handler. [1] [2] [3]
  • Standardized assistant message IDs by removing the "openai_" prefix in both V1 and V2 chat message handlers and utilities, ensuring consistency across the system. [1] [2] [3] [4]

Tool registration and project handling:

  • Enabled and registered additional file and LaTeX tools in the toolkit client, making functions like read_file, list_folder, searchStringFromTheFile, search_file, get_document_structure, locate_section, read_section_source, and read_source_line_range available to the AI client.
  • Fixed a bug where project instructions could be incorrectly referenced in debug mode, ensuring the correct value is used based on the conversation type. [1] [2] [3]

Prompt and instruction updates:

  • Updated system prompt templates to add a tool call usage limit section and clarified the required format for revised LaTeX text, instructing the model to wrap revised text in a <PaperDebugger> tag rather than triple backticks. [1] [2]
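A simplified sketch of extracting the revised text from the new wrapper format (first match only; the helper name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// extractRevision pulls the revised LaTeX out of a
// <PaperDebugger>...</PaperDebugger> wrapper, the format the updated
// prompt asks the model to use instead of triple backticks.
func extractRevision(reply string) (string, bool) {
	const openTag, closeTag = "<PaperDebugger>", "</PaperDebugger>"
	i := strings.Index(reply, openTag)
	if i < 0 {
		return "", false
	}
	rest := reply[i+len(openTag):]
	j := strings.Index(rest, closeTag)
	if j < 0 {
		return "", false
	}
	return strings.TrimSpace(rest[:j]), true
}

func main() {
	rev, ok := extractRevision("Here you go:\n<PaperDebugger>\\section{Intro}</PaperDebugger>")
	fmt.Println(ok, rev)
}
```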

Dependency and import clean-up:

  • Added missing imports for file and LaTeX tools, and removed unused imports to clean up the codebase. [1] [2]

@Junyi-99 Junyi-99 marked this pull request as ready for review January 29, 2026 06:50
@Junyi-99 Junyi-99 requested review from 4ndrelim and kah-seng January 29, 2026 06:50
@Junyi-99
Member Author

@4ndrelim @kah-seng

I’ve implemented a major update and would appreciate your help testing. I’ve done an initial pass, but since this touches core streaming and UI logic, I want to ensure no regressions were missed.

Key Changes to Verify:

  • New Models: Confirm GPT-5.2 and Reasoning Models work as expected.
  • Reasoning Flow: Check the new reasoning_chunk handling. It should feel "smooth and elegant"—please flag it if the UI feels clunky during the thinking process.
  • Markdown & Textpatch: We’ve switched to streamdown. High priority: Please test the copy-paste/insert (textpatch) feature specifically, as the component swap may have affected it.
  • Tool Use: Verify that basic tool calls trigger and display correctly.
  • Streaming Logic: Ensure the refactored streaming remains stable under different network speeds.

Please let me know if you find any bugs or if the UX doesn't feel right!

@4ndrelim
Member

4ndrelim commented Jan 29, 2026

@Junyi-99 quick check:

Model support and API logic:

  • Refactored and expanded the supported model list in ListSupportedModels, adding new models (e.g., GPT-5 series, Qwen3, GLM 4.5 Air, o1/o3/o4) and introducing logic to enable or disable models based on whether the user has provided their own OpenAI API key. Models that require a user key are now marked as disabled if the user hasn't configured one.

In client_v2.go, we check for gpt-5-nano even for user-input BYOK API keys. This is fine for OpenRouter since it's managed by us, but I'm unsure whether it's OK to assume the model is enabled by default for BYOK. That said, NewAIClientV2 in client_v2.go seems to be called only during initialization, so users shouldn't encounter this problem unless they're doing a local deployment, right? Should we consider warning those who want to self-host about this assumption?

@4ndrelim
Member

4ndrelim commented Jan 29, 2026

Key Changes to Verify:

  • New Models: Confirm GPT-5.2 and Reasoning Models work as expected.
    ✅ (models work) 🟡 (but removing API key still displays the model)

It seems that the unavailable models are not greyed out when the user removes BYOK. Also, the model still generates a completion, so is the key being persisted by the backend? We might need to look into completion_v2.go or stream_v2.go to handle BYOK validation.

Below shows the models even after BYOK was removed:

Screenshot 2026-01-30 at 1 15 47 AM

I suspect that after BYOK is removed, it defaults to OpenRouter gpt-4.1. Any model selected afterwards does not apply properly; e.g. selecting the gpt-5.1 reasoning model does not trigger reasoning.


  • Reasoning Flow: Check the new reasoning_chunk handling. It should feel "smooth and elegant"—please flag it if the UI feels clunky during the thinking process.
✅ Seems OK for now, but I'll test again once the above is resolved. I currently can't trigger the reasoning model, likely due to caching of the BYOK key or the default model being applied regardless of selection.

  • Markdown & Textpatch: We’ve switched to streamdown. High priority: Please test the copy-paste/insert (textpatch) feature specifically, as the component swap may have affected it.

Textdiff no longer works. Insert is still OK, but forgoing textdiff might not be desirable for users who value visibility.

Screenshot 2026-01-30 at 1 36 00 AM
  • Tool Use: Verify that basic tool calls trigger and display correctly.
    ✅ (tool calling) 🟡 (markdown display)

Strangely, generate_citations isn't rendering the proper frontend card, although the tool itself runs properly. I'll investigate whether it's due to an underlying existing issue.
Screenshot 2026-01-30 at 1 31 39 AM


  • Streaming Logic: Ensure the refactored streaming remains stable under different network speeds.
    [STILL TESTING]



Additional

Interestingly, listing tools also lists GPT-5's inherent tooling. Your new LaTeX file-operation tools are also nicely captured:

Screenshot 2026-01-30 at 12 35 34 AM

There are tools for search_file and read_file, but no explicit write_file tool. Shouldn't PD still allow textpatch as existing behavior even without an explicit tool coded, or is that only available for the main file?

Screenshot 2026-01-30 at 12 46 00 AM

Also, unsure if this is intended: in the Reasoning dropdown, the text is very faded near the edges, especially the bottom line. Maybe add a newline or some padding for a slightly clearer display?

Member

@4ndrelim 4ndrelim left a comment


I'll continue testing, but these are all I have for now.

Great job on such a huge undertaking! The new file-operation tools will make for a more integrated environment, and the new model support offers great diversity.

Review comment on the MarkdownComponent diff:

```tsx
);

return <Markdown options={markdownOptions}>{children}</Markdown>;

const MarkdownComponent = memo(({ children, animated }: MarkdownComponentProps) => {
```
Member


I'm unsure if this is intended, but it seems a newline is now forced after every numbered entry:

Image

Might be unrelated to this PR, but I noticed the embedded sidebar messes up the markdown display:
Screenshot 2026-01-30 at 1 27 07 AM

Review comment on the system prompt diff:

```
You are PaperDebugger, a large language model tweaked by PaperDebugger Inc.

## tool_call_limit
You have a maximum of 20 tool calls per conversation turn. Please plan your tool usage carefully and avoid unnecessary tool calls.
```
Member


We might have to handle context better for weaker models, or find a way to manage the cancellation of tool calls. Below, gpt-4.1 calls the wrong tool and then enters a frenzied loop, repeatedly calling read_file (likely trying to read the full file). During this process it is not possible to cancel it; even after creating a new conversation, the read_file tool operation persists (it should be cancelled).

Image

Member


Edit: it seems the wrong tool calling is partially caused by the staging environment not being deployed with XtraMCP due to a dns_rebinding issue. I'll push a commit to staging to update this, but the tool-cancellation issue remains.

Member Author


As for the weaker models, they may get trapped in an infinite tool-call loop. We have to mitigate this by

  1. refining our system prompt, or
  2. limiting consecutive tool calls in the backend.

I think option 2 is more reliable.

Review comment on a new 299-line file in the diff:
Member


In the past, there used to be filler words like "thinking..." or "preparing..." while the backend prepared a response and streamed. I'm not sure if this was removed, as there seems to be an awkward pause now:

Image

Member Author

@Junyi-99 Junyi-99 Jan 29, 2026


Yes, it's an awkward pause... it's very hard to reproduce reliably.

@Junyi-99
Member Author

@Junyi-99 quick check:

Model support and API logic:

  • Refactored and expanded the supported model list in ListSupportedModels, adding new models (e.g., GPT-5 series, Qwen3, GLM 4.5 Air, o1/o3/o4) and introducing logic to enable or disable models based on whether the user has provided their own OpenAI API key. Models that require a user key are now marked as disabled if the user hasn't configured one.

In client_v2.go, we check for gpt-5-nano even for user-input BYOK API keys. This is fine for OpenRouter since it's managed by us, but I'm unsure whether it's OK to assume the model is enabled by default for BYOK. That said, NewAIClientV2 in client_v2.go seems to be called only during initialization, so users shouldn't encounter this problem unless they're doing a local deployment, right? Should we consider warning those who want to self-host about this assumption?

@4ndrelim You raise a good point. BYOK does not affect client_v2.go because each BYOK request creates a new OpenAI client via GetOpenAIClient, and the NewAIClientV2 function is only used when we start the backend.

Yes, we should consider warning those who want to self-host about this assumption. We can open a new issue for it, but I suggest not addressing it in this pull request.

@Junyi-99
Member Author

@4ndrelim Thank you for your comprehensive test.

These bugs are confirmed and fixed:

Unsetting BYOK did not grey out models (5ea58f1)

The greyed-out issue was fixed by updating the supported-models React Query cache when the API key is saved (queryClient.setQueryData in setting-text-input.tsx); the key persistence and selection behaviour likely need backend checks in completion_v2.go / stream_v2.go for BYOK validation and model routing.

No write_file tool

Yes, implementing the write_file tool would require a huge refactor of both backend and frontend, so I skipped this tool for now.

Faded bottom (262f6b7)

Yes, the reasoning dropdown is intentionally designed to fade near the edges, which indeed hurts readability. The effect has been removed because it would confuse users.

Embedded sidebar messes up the markdown display (262f6b7)

Yes, confirmed. This bug occurs because we don't currently support Dark Mode. It only happens with Dark Mode enabled: the text turns white, making it impossible to see against PaperDebugger's white background.

I think we can add Dark Mode support; it should be relatively easy.

These bugs are not confirmed:

TextPatch diff
image

Generate Citations Tool Card

image
