Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Conversation

@camdencheek
Copy link
Member

@camdencheek camdencheek commented Jun 9, 2023

This adds incremental stats reporting to our embedding indexing jobs and displays it in the UI while the job is processing.

Closes https://github.com/sourcegraph/sourcegraph/issues/52358

Screenshot 2023-06-08 at 18 50 28

Test plan

Manually tested clarity and usability. Added database test. Added unit test that ensures we update periodically.

@cla-bot cla-bot bot added the cla-signed label Jun 9, 2023
@camdencheek camdencheek force-pushed the cc/embeddings-progress branch from 5dcab3b to 0ea0281 Compare June 9, 2023 17:43
Comment on lines +320 to +321
// JSONMessage wraps a value that can be encoded/decoded as JSON so that
// it implements db.Scanner and db.Valuer.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat unrelated to the PR, but it makes working with JSON a little less painful. Normally, we'd scan to a []byte, then unmarshal separately and do the opposite during insert, but this wrapper allows you to just use a JSON-serializable type directly.

@camdencheek camdencheek changed the title Cc/embeddings progress Embeddings: add progress updates during indexing Jun 9, 2023
Comment on lines +221 to +236
"""
The number of files scheduled to be embedded.
"""
filesScheduled: Int!

"""
The number of files we generated embeddings for.
This will be updated periodically while the embeddings job is processing.
"""
filesEmbedded: Int!

"""
The number of files skipped.
This will be updated periodically while the embeddings job is processing.
"""
filesSkipped: Int!
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to expose more granular information, but I'd like to keep it simple for now.

@camdencheek camdencheek marked this pull request as ready for review June 9, 2023 21:48
@sourcegraph-bot
Copy link
Contributor

sourcegraph-bot commented Jun 9, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff 26de97c...a25bd51.

Notify File(s)
@efritz enterprise/cmd/worker/internal/embeddings/repo/handler.go
internal/database/dbutil/dbutil.go

@camdencheek camdencheek requested a review from a team June 9, 2023 21:50
@sourcegraph-bot
Copy link
Contributor

sourcegraph-bot commented Jun 9, 2023

📖 Storybook live preview

@jtibshirani jtibshirani requested a review from efritz June 12, 2023 19:57
Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Just one remaining comment: the new table will still grow at the same rate as the main job table (and we don't prune old records), so it'd still be good to reduce the number of columns it has.

@camdencheek camdencheek merged commit 2f1faff into main Jun 13, 2023
@camdencheek camdencheek deleted the cc/embeddings-progress branch June 13, 2023 17:24
ErikaRS pushed a commit that referenced this pull request Jun 22, 2023
This adds incremental stats reporting to our embedding indexing jobs and
displays it in the UI while the job is processing.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Show progress in job list

5 participants