-
Notifications
You must be signed in to change notification settings - Fork 0
TIKA-4578 - Add GPG signed release support for future Tika 4.0.0+ releases #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…Store - Created build-from-branch.sh script to build Docker images from Git branches - Added Dockerfile.ignite for building with Ignite ConfigStore support - Added sample Ignite configuration and documentation - Updated main README with build-from-branch instructions This enables testing development features (like TIKA-4583 Ignite ConfigStore) before they are officially released, without needing to modify the main Tika repository per PR #2462. Usage: ./build-from-branch.sh -b TIKA-4583-ignite-config-store -i Features: - Builds from any Git branch or tag - Optional Ignite ConfigStore plugin inclusion - Supports custom repositories (forks) - Automatic testing after build - Optional push to registry Related to: apache/tika#2462, TIKA-4583
- Add republish-images.sh for rebuilding all versions from signed releases - Update README.md with pre-release status and clear build instructions - Document current build method (build-from-branch.sh for development) - Document future build method (docker-tool.sh with GPG verification) - Explain relationship with tika repo Maven builds - Add notes to Dockerfiles explaining they require official releases - Align with tika-docker GPG verification approach The Dockerfiles already contain GPG signature verification logic and will work automatically once tika-server-grpc-4.0.0.jar is published to Apache distribution mirrors. Until then, use build-from-branch.sh to build from source.
The actual Maven artifact is tika-grpc-${VERSION}.jar, not
tika-server-grpc-${VERSION}.jar. Updated all Dockerfiles and
documentation to use the correct artifact name.
🔧 Fixed JAR Artifact NamePushed an update to fix a critical issue - the Dockerfiles were referencing the wrong JAR artifact name. The Problem
The FixUpdated all references in:
The actual Maven artifact built by the tika-grpc module is Now when Tika 4.0.0 is released, the Dockerfiles will correctly download |
- Created full/Dockerfile.source with multi-stage build - Stage 1: Clones Git repo and builds tika-grpc with Maven - Stage 2: Creates runtime image with built JAR - Updated build-from-branch.sh to use Dockerfile.source by default - Successfully tested full circle build from TIKA-4578 branch - Dockerfile.ignite remains for Ignite-specific builds This enables users to easily build Docker images from any tika branch for development and testing without needing local tika checkout.
- Removed Ignite-specific handling (plugin loaded via config, not build-time) - Created Dockerfile.source for building from Git branches - Removed -i flag and Dockerfile.ignite (unnecessary complexity) - All plugins (including Ignite) are loaded at runtime via tika-config.xml - Successfully tested building from TIKA-4578 branch This simplifies the build process - there's only one way to build from source, and plugins are configured at runtime, not build-time.
✅ Full Circle Build Test - SUCCESSFULSuccessfully tested the complete workflow of building a Docker image from the TIKA-4578 source branch. Changes in Latest UpdateSimplified build-from-branch.sh:
Added Dockerfile.source:
Test Results./build-from-branch.sh -b TIKA-4578 -t tika-4578-test✅ Build successful:
✅ Image functional: $ docker run --rm apache/tika-grpc:tika-4578-test --help
Usage: <main class> [options]
Options:
-c, --config The tika config file
-l, --plugins The tika pipes plugins config file
-p, --port The grpc server port
...SummaryThe repository now supports two build paths:
Both paths tested and working! 🎉 |
- Added -l option to build-from-branch.sh for building from local tika directory - Created Dockerfile.local for building from pre-built local JAR - Script copies local JAR to build context and builds Docker image - Successfully tested building from local /home/user/tika directory - Successfully tested running container with tika-config.json - Added sample-configs/test-simple.json for testing Users can now: 1. Build from Git branch: ./build-from-branch.sh -b TIKA-4578 2. Build from local dir: ./build-from-branch.sh -l /path/to/tika -t my-build This is useful for rapid iteration during development without needing to push changes to a Git branch.
- Converted all XML configs to JSON format (Tika 4.x standard) - grobid/tika-config.json - Grobid journal parser config - customocr/tika-config-inline.json - OCR on inline PDF images - customocr/tika-config-rendered.json - OCR on rendered PDF pages - ner/tika-config.json - Named Entity Recognition config - vision/inception-rest.json - TensorFlow image recognition - vision/inception-rest-caption.json - TensorFlow image captioning - vision/inception-rest-video.json - TensorFlow video recognition XML files remain for backward compatibility but JSON is preferred for Tika 4.x.
🎉 Complete Implementation SummaryThis PR is now complete with full GPG signed release support and comprehensive development workflow improvements. Latest Updates1. Local Directory Build Support ✅
2. Sample Configs Converted to JSON ✅
Build Options SummaryUsers now have three ways to build tika-grpc Docker images: # 1. Build from Git branch (remote)
./build-from-branch.sh -b TIKA-4578
# 2. Build from local tika directory (for rapid development)
./build-from-branch.sh -l /home/user/tika -t my-test
# 3. Build from GPG-signed release (future - post 4.0.0)
./docker-tool.sh build 4.0.0 4.0.0What's Included✅ GPG signature verification Dockerfiles (ready for 4.0.0) Testing Completed✅ Built from TIKA-4578 Git branch - SUCCESS Ready for review and merge! 🚀 |
- Removed -i flag references (Ignite is just a plugin loaded via config) - Updated examples to show local directory build option - Simplified documentation to focus on core build methods - Ignite plugin can be used via tika-config.json, no special build needed
- Documented how tika-grpc-docker ensures reproducible builds - Explained GPG signature verification for release builds - Described Git-based traceability for development builds - Added verification instructions for both build types - Highlighted security and compliance benefits - Linked to reproducible-builds.org for more information This helps users understand the security guarantees and transparency provided by the build process.
🔒 Added Reproducible Builds DocumentationAdded a comprehensive section on Reproducible Builds to the README, documenting how tika-grpc-docker ensures transparency and security in the software supply chain. What's DocumentedFor Official Releases:
For Development Builds:
Verification Instructions:
Why This MattersReproducible builds are critical for:
This aligns with Apache best practices and modern security standards. 🔐 |
Summary
This PR prepares tika-grpc-docker for the future release of Tika 4.0.0 by implementing GPG signed release verification, aligning with the tika-docker repository's approach.
Background
tika-grpc has not been officially released yet - it only exists in Tika 4.0.0-SNAPSHOT. This PR prepares the repository to handle both:
Changes
Added Files
Modified Files
README.md - Major updates:
build-from-branch.shfor development)docker-tool.shwith GPG verification)minimal/Dockerfile - Added comment explaining it requires official Apache releases (Tika 4.0.0+)
full/Dockerfile - Same note as minimal/Dockerfile
How It Works
Current State (Pre-Release)
Build from source using:
Future State (Post Tika 4.0.0 Release)
Build from GPG-signed releases:
This will:
tika-server-grpc-4.0.0.jarfrom Apache distribution mirrors.ascGPG signature fileAlignment with tika-docker
This PR ensures tika-grpc-docker follows the EXACT same pattern as tika-docker:
Testing
The Dockerfiles already contain the GPG verification logic (inherited from tika-docker). They will work automatically once
tika-server-grpc-4.0.0.jaris published to Apache distribution mirrors.Development builds can be tested now with:
Related
This complements the work in apache/tika PR #2462 (TIKA-4578) which adds Maven-based Docker builds to the tika repository for development purposes.