Open
Labels
component:cli (Command-line interface), epic (Large feature containing multiple stories), priority:medium (Normal priority)
Description
PDFVEC-004
Goals
- Sub-second startup time
- Unix pipeline friendly (stdin/stdout)
- Parallel processing of multiple files
- Progress indication for large jobs
Build a fast, user-friendly CLI tool for extracting text from PDFs. It should support single files, directories, and stdin/stdout for pipeline integration.
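One detail of the "Unix pipeline friendly" goal is how the tool behaves when the downstream reader closes early (for example `pdfvec extract file.pdf | head`). The following is a minimal sketch using only the standard library; `write_text` and `ignore_broken_pipe` are hypothetical helper names, not part of the issue. Rust ignores SIGPIPE by default, so a closed pipe surfaces as an `ErrorKind::BrokenPipe` error on write, and the sketch treats that as a normal exit.

```rust
use std::io::{self, BufWriter, ErrorKind, Write};

/// Write extracted text through a buffer and flush it explicitly.
/// `write_text` is a hypothetical helper used only for illustration.
pub fn write_text<W: Write>(out: W, text: &str) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    out.write_all(text.as_bytes())?;
    out.flush()
}

/// Map a broken-pipe error to success so that a reader which stops early
/// (such as `head`) does not turn into a reported failure.
/// All other errors are passed through unchanged.
pub fn ignore_broken_pipe(result: io::Result<()>) -> io::Result<()> {
    match result {
        Err(e) if e.kind() == ErrorKind::BrokenPipe => Ok(()),
        other => other,
    }
}
```

In a binary this would likely be called as `ignore_broken_pipe(write_text(io::stdout().lock(), &text))`, with any remaining error reported on stderr so that stdout carries only extracted text.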
Acceptance Criteria
AC-1
- Given: a PDF file path
- When: pdfvec extract file.pdf is run
- Then: text is printed to stdout
AC-2
- Given: PDF data on stdin
- When: cat file.pdf | pdfvec extract - is run
- Then: text is printed to stdout
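AC-1 and AC-2 differ only in where the PDF bytes come from, so both can share one code path. The sketch below assumes the positional argument `-` selects stdin, as in AC-2. The `InputSource` type and the `parse_source`, `read_all`, and `load` functions are hypothetical names introduced for illustration. Argument parsing itself (which the issue assigns to clap) is left out.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::PathBuf;

/// Where the PDF bytes come from. Hypothetical type, not taken from the issue.
#[derive(Debug, PartialEq)]
pub enum InputSource {
    Stdin,
    File(PathBuf),
}

/// Interpret the positional argument of `pdfvec extract`.
/// A lone "-" means "read from stdin"; anything else is treated as a path.
pub fn parse_source(arg: &str) -> InputSource {
    if arg == "-" {
        InputSource::Stdin
    } else {
        InputSource::File(PathBuf::from(arg))
    }
}

/// Read every byte from a reader into memory.
pub fn read_all<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(buf)
}

/// Load the raw PDF bytes for either kind of source.
pub fn load(source: &InputSource) -> io::Result<Vec<u8>> {
    match source {
        InputSource::Stdin => read_all(io::stdin().lock()),
        InputSource::File(path) => read_all(File::open(path)?),
    }
}
```

Because `read_all` is generic over `Read`, the extraction step downstream only ever sees a byte buffer and does not need to know whether the data came from a file or a pipe.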
AC-3
- Given: a directory of PDFs
- When: pdfvec extract --recursive dir/ is run
- Then: all PDFs are processed in parallel
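AC-3 combines two steps: walking the directory tree to find PDFs, then fanning the work out across threads. The issue lists rayon for the parallel part. To stay self-contained, the sketch below swaps in scoped threads from the standard library (Rust 1.63 or later), so it is an illustration of the shape of the solution rather than the planned implementation. `collect_pdfs`, `is_pdf`, and `process_parallel` are hypothetical names, and matching on the `.pdf` extension is an assumption about how PDFs are detected.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// True if the path has a `.pdf` extension (case-insensitive).
/// Detecting PDFs by extension is an assumption made for this sketch.
pub fn is_pdf(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("pdf"))
}

/// Recursively collect PDF paths under `dir` into `out`.
/// Symlink cycles are not guarded against here.
pub fn collect_pdfs(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_pdfs(&path, out)?;
        } else if is_pdf(&path) {
            out.push(path);
        }
    }
    Ok(())
}

/// Apply `work` to every path using up to `workers` threads.
/// Results are returned in the same order as `paths`.
pub fn process_parallel<T, F>(paths: &[PathBuf], workers: usize, work: F) -> Vec<T>
where
    T: Send,
    F: Fn(&Path) -> T + Sync,
{
    if paths.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division, so every path lands in exactly one chunk.
    let chunk_size = (paths.len() + workers - 1) / workers;
    let work = &work;
    thread::scope(|scope| {
        let handles: Vec<_> = paths
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|p| work(p.as_path())).collect::<Vec<T>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the output aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Static chunking is the simplest split but balances poorly when file sizes vary widely. A work-stealing pool such as rayon's would likely handle that better, which may be why the issue names it.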
Technical Context
Crates: clap, indicatif, rayon
Files:
- src/bin/pdfvec.rs
- src/cli/mod.rs
- src/cli/extract.rs
Source: epics/03-cli.json
Content Hash: 87fae0ee33336c46
Child Issues: PDFVEC-040, PDFVEC-041