Skip to content

Conversation

@brendan-kellam
Copy link
Contributor

@brendan-kellam brendan-kellam commented Jan 27, 2026

This PR grew larger than it should of. It does a lot of things (see below). The two biggest things it does are the following:

  1. It implements server side offset pagination for /api/repos and /api/commits.
  2. It improves on the MCP server with better prompts, and takes advantage of server side pagination for more efficient retrieval of repos and commits.

In a follow-up PR, I will take some time to formalize our REST api into a openapi spec, which will allow us to generate some docs + (hopefully) allow us to more easily share types between MCP and the main package.

Sourcebot

Added

  • Added offset pagination to the /api/repos endpoint. #795
  • Added offset pagination to the /api/commits endpoint. #795

Changed

  • Made the code search lang: filter case insensitive. #795
  • Changed the /api/source endpoint from a POST request to a GET request. Repo, path, and ref are specified as query params. #795
  • Changed the /api/commits endpoint to be a POST request to a GET request. #795
  • Renamed webUrl to externalWebUrl for various apis. Moving forward, webUrl will be used for URLs that point to Sourcebot, and externalWebUrl will be used for URLs that point to external code hosts. #795
  • Renamed various fields on the /api/source endpoint response body. #795

@sourcebot/mcp

Added

  • Added server side pagination support for list_commits and list_repos. #795
  • Added filterByFilepaths and useRegex params to the search_code tool. #795

Changed

  • Renamed search_commits tool to list_commits. #795
  • Renamed gitRevision param to ref on search_code tool. #795
  • Generally improved tool and tool param descriptions for all tools. #795

Summary by CodeRabbit

  • New Features

    • Pagination for repository and commit listings; paged responses include total counts and Link headers
    • File-path filtering and regex option for code search
  • Improvements

    • Language filter is now case-insensitive
    • Repository links and “open in” behavior use external browse URLs for more consistent navigation
    • Strengthened API response validation and error handling for more reliable results

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 27, 2026

Walkthrough

Implements offset pagination and RFC5988 Link headers for repos/commits, converts several POST endpoints to GET with query params, centralizes response validation, renames webUrl→externalWebUrl and searchCommits→listCommits, and introduces pagination/schema/typing updates across MCP and web packages.

Changes

Cohort / File(s) Summary
Changelog & Package deps
CHANGELOG.md, packages/mcp/CHANGELOG.md, packages/mcp/package.json
Changelog updates; added dedent dependency to packages/mcp.
MCP schemas & types
packages/mcp/src/schemas.ts, packages/mcp/src/types.ts
Renamed schemas/types (listRepos*, listCommits*), added paging fields (page, perPage, sort, direction), renamed repo/path/ref fields and webUrl→externalWebUrl.
MCP client & tools
packages/mcp/src/client.ts, packages/mcp/src/index.ts
Added common parseResponse, switched several endpoints from POST→GET (query params), renamed searchCommits→listCommits, updated tool inputs/outputs to schema-driven JSON.
Web client layer
packages/web/src/app/api/(client)/client.ts
Renamed getRepos→listRepos, migrated /api/source to GET with query params, updated type imports to ListReposQueryParams/ListReposResponse.
Server routes — repos/commits/source
packages/web/src/app/api/(server)/repos/route.ts, .../commits/route.ts, .../source/route.ts
Moved to query-parameter validation, added X-Total-Count and Link headers, GET handlers for commits/repos/source, swapped schemaValidationError→query/request-specific helpers.
Server routes — other APIs
packages/web/src/app/api/(server)/*/{chat,files,find_definitions,find_references,search,stream_search,tree}/route.ts
Replaced schemaValidationError with requestBodySchemaValidationError for consistent request-body validation responses.
Pagination utility & errors
packages/web/src/lib/pagination.ts, packages/web/src/lib/serviceError.ts, packages/web/src/lib/errorCodes.ts
Added buildLinkHeader and PaginationParams; added queryParamsSchemaValidationError and INVALID_QUERY_PARAMS error code; renamed schemaValidationError→requestBodySchemaValidationError.
Web schemas & types
packages/web/src/lib/schemas.ts, packages/web/src/lib/types.ts, packages/web/src/features/search/types.ts, packages/mcp/src/schemas.ts
Renamed/added listRepos* schemas, made webUrl required, added externalWebUrl/pushedAt, updated exported types (ListReposResponse, ListReposQueryParams, listCommits schemas).
Git/commits implementation & tests
packages/web/src/features/search/gitApi.ts, .../gitApi.test.ts
Renamed searchCommits→listCommits, changed input repo field and return shape to { commits, totalCount }, added rev-list count for totalCount, updated tests.
Search parsing & zoekt
packages/web/src/features/search/parser.ts, packages/web/src/features/search/zoektSearcher.ts
Language filter made case-insensitive; zoekt responses now compute webUrl/getBrowsePath and use base URL/branch handling.
File source shape & APIs
packages/web/src/features/search/fileSourceApi.ts, packages/web/src/app/api/(server)/source/route.ts, packages/mcp/src/client.ts
Renamed repository-related fields to repo*, added repoExternalWebUrl/externalWebUrl fields, moved getFileSource API to GET with query params.
Front-end UI updates (webUrl → externalWebUrl)
multiple components under packages/web/src/app/[domain]/..., packages/web/src/features/chat/..., packages/web/src/ee/...
Replaced repo.webUrl with repo.externalWebUrl across PathHeader, codeHostInfo lookups, repository carousel, search results, and chat referenced sources; adjusted getBrowsePath handling (revision optional).
Utilities & exports
packages/mcp/src/utils.ts, packages/web/src/lib/utils.ts
Added ServiceErrorException; getCodeHostInfoForRepo and CodeHostInfo now use externalWebUrl instead of repoLink/webUrl.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client
participant WebServer
participant SearchService
participant GitStore
Client->>WebServer: GET /api/commits?repo=...&page=...&perPage=...
WebServer->>SearchService: listCommits(queryParams)
SearchService->>GitStore: git log (ref, skip, maxCount, filters)
GitStore-->>SearchService: commits[]
SearchService->>GitStore: git rev-list --count (same filters)
GitStore-->>SearchService: totalCount
SearchService-->>WebServer: { commits, totalCount }
WebServer-->>Client: 200 + JSON body + headers (X-Total-Count, Link)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

sourcebot-team

Suggested reviewers

  • msukkari
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'feat(web): API & MCP improvements' is vague and overly broad, using generic terms that don't convey meaningful information about the primary changes in this large changeset. Consider a more specific title that highlights the main change, such as 'feat: Add server-side pagination for repos and commits' or 'refactor: Migrate API endpoints to query parameters with pagination'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@brendan-kellam brendan-kellam marked this pull request as ready for review January 28, 2026 03:55
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (1)

60-65: Rename webUrl to externalWebUrl in repositoryInfoSchema for consistency.

The repositoryInfoSchema at line 29 of packages/web/src/features/search/types.ts still defines the property as webUrl, while the code at line 64 of filterPanel/index.tsx reads from repo.webUrl but passes it as the externalWebUrl parameter to getCodeHostInfoForRepo. To align with the PR-wide rename of webUrlexternalWebUrl, update the schema to use externalWebUrl: z.string().optional() instead of webUrl. This will ensure naming consistency across the codebase and eliminate the property/parameter name mismatch.

packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (1)

94-99: Use repo.externalWebUrl for the external link target.
externalWebUrl: repo.webUrl likely points at the Sourcebot URL after the rename, so the code-host icon will open the wrong destination.

🛠️ Proposed fix
-                        externalWebUrl: repo.webUrl,
+                        externalWebUrl: repo.externalWebUrl,
packages/web/src/ee/features/codeNav/components/exploreMenu/referenceList.tsx (1)

93-99: Use repoInfo.externalWebUrl for the external link target.
externalWebUrl: repoInfo.webUrl will likely resolve to the Sourcebot URL after the rename, so the icon link won’t go to the external code host.

🛠️ Proposed fix
-                                        externalWebUrl: repoInfo.webUrl,
+                                        externalWebUrl: repoInfo.externalWebUrl,
🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Line 21: Fix the grammar in the CHANGELOG entry for the /api/commits endpoint:
change the phrase "to be a POST request to a GET request" to "from a POST
request to a GET request" so it reads that the /api/commits endpoint changed
from POST to GET; update the single line mentioning "/api/commits" accordingly.

In `@packages/mcp/CHANGELOG.md`:
- Line 13: Update the changelog entry text "Added server side pagination support
for `list_commits` and `list_repos`." to hyphenate the compound adjective by
changing "server side" to "server-side" so the line reads "Added server-side
pagination support for `list_commits` and `list_repos`."

In `@packages/mcp/src/index.ts`:
- Around line 29-37: The current zod schema unconditionally transforms query
(the query .transform(...) on the query schema) to a quoted literal and doesn't
escape backslashes, which breaks regex searches and can corrupt queries; remove
that unconditional transform and instead handle quoting/escaping in the request
handler where useRegex (the useRegex z.boolean().optional()) is available: when
useRegex is true pass the raw query through, and when false produce a literal
phrase by escaping double-quotes and backslashes then wrapping in quotes; apply
the same change to the second occurrence of the query transform (the other block
noted around lines 67–77) so both code paths behave the same.
- Around line 46-49: The schema field filterByFilepaths currently claims to
accept regexes but the code escapes input (via escapeStringRegexp), so either
update the description to state "literal path match" or stop escaping so real
regexes are allowed; to fix, locate the z.array(z.string()) definition named
filterByFilepaths and either change its .describe(...) text to "Scope the search
to the provided filepaths. Interpreted as literal path match." or remove the
escapeStringRegexp call where those values are later converted to regexes (so
the raw strings are passed through), and make the same change for the other
identical field instance referenced around the second occurrence.
- Around line 121-126: Update the snippet that builds the `text` string to
prefer `file.externalWebUrl` over `file.webUrl` (with `file.webUrl` as a
fallback) so the output matches the tool contract; locate the `text`
construction (the `dedent` block referencing `file.webUrl`, `numMatches`,
`file.repository`, `file.language`) and change the URL insertion to use
`file.externalWebUrl ?? file.webUrl` (or equivalent null/undefined-safe
fallback) while leaving the other fields unchanged.

In `@packages/web/src/app/api/`(server)/commits/route.ts:
- Around line 1-6: Add the missing 'use server' directive as the very first
statement in the file so the route runs in server context; place it before the
imports and keep the rest of the module (including the export GET handler and
uses of listCommits, buildLinkHeader, serviceErrorResponse, etc.) unchanged to
mirror other server routes like tree/route.ts and search/route.ts.

In `@packages/web/src/app/api/`(server)/repos/route.ts:
- Around line 77-82: The Link header built by buildLinkHeader is missing the
search query so pagination loses filters; update the call that constructs
linkHeader (where buildLinkHeader(request, { page, perPage, totalCount,
extraParams: { sort, direction } })) to include the active query parameter
(e.g., add query from request.searchParams or request.url into extraParams: {
sort, direction, query }) so next/prev links preserve the search term when using
the repos route.
- Around line 41-43: The totalCount calculation uses prisma.repo.count without
the search filter, so it returns an unfiltered total and breaks pagination;
update the prisma.repo.count call to use the same where clause used for
prisma.repo.findMany (include orgId plus the search conditions/filters derived
from query, e.g., name contains/insensitive or whatever filter logic is used) so
X-Total-Count and pagination reflect the filtered result set (modify the count
call at prisma.repo.count to mirror the findMany where object).

In `@packages/web/src/features/search/gitApi.ts`:
- Around line 99-105: The rev-list args currently append ref twice because
sharedOptions includes a computed key equal to ref; when building countArgs in
the loop over sharedOptions skip the entry whose key equals the ref variable (or
filter sharedOptions to remove that key) so only the explicit ref pushed at the
start is used; update the iteration around countArgs (the loop over
Object.entries(sharedOptions>) to continue/ignore when key === ref, leaving
git.raw(countArgs) unchanged.
🧹 Nitpick comments (6)
packages/web/src/actions.ts (1)

481-495: Consider if getRepos should guard headers() for potential future non-request contexts.

headers() requires a request context and will throw outside of it (e.g., background jobs). While all current call sites are server-side page components within requests, if getRepos is ever invoked in a background context, the function will hard-fail. A fallback to env.AUTH_URL would provide robustness:

Optional defensive guard
-        const headersList = await headers();
-        const baseUrl = getBaseUrl(headersList);
+        let baseUrl = env.AUTH_URL;
+        try {
+            const headersList = await headers();
+            baseUrl = getBaseUrl(headersList);
+        } catch {
+            // No request context; use fallback
+        }
packages/mcp/src/types.ts (2)

35-36: Type naming inconsistency.

ListCommitsQueryParamsSchema includes "Schema" in the type name, which is inconsistent with other type naming conventions in this file (e.g., ListReposQueryParams, SearchRequest). Consider renaming to ListCommitsQueryParams for consistency.

Suggested fix
-export type ListCommitsQueryParamsSchema = z.infer<typeof listCommitsQueryParamsSchema>;
+export type ListCommitsQueryParams = z.infer<typeof listCommitsQueryParamsSchema>;

26-26: Inconsistent use of z.input vs z.infer.

ListReposQueryParams uses z.input while ListCommitsQueryParamsSchema uses z.infer. For schemas with defaults:

  • z.input makes defaulted fields optional (useful for input/request types)
  • z.infer makes defaulted fields required (useful for output/parsed types)

If both are intended for client-side request building, consider using z.input consistently. If both are for parsed responses, use z.infer consistently.

Also applies to: 35-35

packages/web/src/features/search/zoektSearcher.ts (1)

386-388: Verify request-scoped headers usage is safe in all call paths.

headers() is request-scoped in Next.js; if transformZoektSearchResponse is ever invoked outside a route handler/server action (e.g., background tasks), this will throw. Consider passing baseUrl from the caller if that’s a possibility.

packages/web/src/app/api/(client)/client.ts (1)

54-68: Consider removing unnecessary Content-Type header and extracting URL builder.

  1. The Content-Type: application/json header on Line 62-64 is unnecessary for a GET request with no body (note: getFileSource correctly omits it).

  2. The URL construction pattern (lines 55-58) is duplicated from getFileSource. Consider extracting a helper.

♻️ Suggested refactor
+const buildUrl = (path: string, params: Record<string, unknown>): URL => {
+    const url = new URL(path, window.location.origin);
+    for (const [key, value] of Object.entries(params)) {
+        if (value !== undefined) {
+            url.searchParams.set(key, String(value));
+        }
+    }
+    return url;
+};
+
 export const listRepos = async (queryParams: ListReposQueryParams): Promise<ListReposResponse | ServiceError> => {
-    const url = new URL("/api/repos", window.location.origin);
-    for (const [key, value] of Object.entries(queryParams)) {
-        url.searchParams.set(key, value.toString());
-    }
-
+    const url = buildUrl("/api/repos", queryParams);
     const result = await fetch(url, {
         method: "GET",
-        headers: {
-            "Content-Type": "application/json",
-        },
     }).then(response => response.json());

     return result as ListReposResponse | ServiceError;
 }
packages/web/src/features/search/gitApi.ts (1)

78-90: Consider consolidating --regexp-ignore-case handling.

When both author and query are provided, --regexp-ignore-case is set twice in the spread. While JavaScript objects deduplicate keys (last wins), the intent is clearer if case-insensitivity is handled once.

♻️ Suggested refactor
             const sharedOptions: Record<string, string | number | null> = {
                 [ref]: null,
                 ...(gitSince ? { '--since': gitSince } : {}),
                 ...(gitUntil ? { '--until': gitUntil } : {}),
-                ...(author ? {
-                    '--author': author,
-                    '--regexp-ignore-case': null /// Case insensitive
-                } : {}),
-                ...(query ? {
-                    '--grep': query,
-                    '--regexp-ignore-case': null /// Case insensitive
-                } : {}),
+                ...(author ? { '--author': author } : {}),
+                ...(query ? { '--grep': query } : {}),
+                ...((author || query) ? { '--regexp-ignore-case': null } : {}),
             };

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Line 22: The CHANGELOG line uses "apis" in lowercase; update the sentence to
consistently capitalize it as "APIs" (preserve the rest of the text and the
identifiers `webUrl` and `externalWebUrl` unchanged) so the line reads with
"APIs" instead of "apis".
- Line 19: Update the changelog entry that reads "Made the code search `lang:`
filter case insensitive.
[`#795`](https://github.com/sourcebot-dev/sourcebot/pull/795)" by hyphenating
"case-insensitive" so it reads "Made the code search `lang:` filter
case-insensitive. [`#795`](https://github.com/sourcebot-dev/sourcebot/pull/795)";
locate the exact line containing that text in CHANGELOG.md and replace the
unhyphenated phrase with the hyphenated form to maintain style consistency.

In `@packages/web/src/app/api/`(server)/commits/route.ts:
- Around line 8-17: The schema listCommitsQueryParamsSchema currently allows an
empty repo string; update the repo validator to reject empty values by replacing
z.string() with a non-empty string check (e.g., z.string().nonempty() or
z.string().min(1)) so requests with repo == "" fail validation earlier and avoid
downstream "Repository \"\" not found" errors.
🧹 Nitpick comments (1)
packages/web/src/features/search/gitApi.ts (1)

107-107: Consider adding a type guard or mapper instead of double assertion.

The as unknown as Commit[] assertion bypasses TypeScript's type checking. While simple-git's DefaultLogFields likely contains the same fields as Commit, this assertion hides any potential mismatches.

♻️ Optional: Add explicit field mapping for type safety
-            return { commits: log.all as unknown as Commit[], totalCount };
+            const commits: Commit[] = log.all.map(entry => ({
+                hash: entry.hash,
+                date: entry.date,
+                message: entry.message,
+                refs: entry.refs,
+                body: entry.body,
+                author_name: entry.author_name,
+                author_email: entry.author_email,
+            }));
+            return { commits, totalCount };

This makes the mapping explicit and would surface any field mismatches at compile time.

@brendan-kellam brendan-kellam merged commit c00c67b into main Jan 28, 2026
9 checks passed
@brendan-kellam brendan-kellam deleted the bkellam/repos-pagination branch January 28, 2026 05:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants