This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is the Python client library for Project amp - a system for querying an amp server (Flight SQL based) and loading the returned data into various storage systems. The library focuses on zero-copy operations and high-performance data streaming.
# Install dependencies and build
uv build
# Create virtual environment
uv venv
# Run Marimo notebook environment
uv run marimo edit

# Quick unit tests (no database dependencies)
make test-unit
# All tests with coverage
make test-all
# Specific loader tests
make test-postgresql
make test-redis
# Run a single test file
uv run pytest tests/unit/test_postgresql_loader.py -v
# Run a single test function
uv run pytest tests/unit/test_postgresql_loader.py::TestPostgreSQLLoader::test_connection -v

# Format code
make format
# or
uv run ruff format
# Lint code
make lint
# or
uv run ruff check .
# Auto-fix linting issues
uv run ruff check . --fix

The core architecture follows a plugin-based loader system with:
- Generic DataLoader[TConfig] base class in src/amp/loaders/base.py
- Auto-discovery via registry in src/amp/loaders/registry.py
- Zero-copy operations using PyArrow for performance
- Built-in resilience (retry, backpressure), state management, and reorg handling
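
The plugin pattern described above can be sketched roughly as follows. This is a simplified illustration, not the code in src/amp/loaders/base.py or src/amp/loaders/registry.py: the decorator `register_loader`, the factory `create_loader`, and the `MemoryLoader`/`MemoryConfig` pair are hypothetical names, and explicit registration stands in for whatever auto-discovery mechanism the real registry uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Type, TypeVar

TConfig = TypeVar("TConfig")

# Hypothetical registry: maps a loader name to its class.
_REGISTRY: Dict[str, Type["DataLoader"]] = {}


def register_loader(name: str) -> Callable[[type], type]:
    """Class decorator that records a loader class under `name`."""

    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls

    return decorator


class DataLoader(Generic[TConfig]):
    """Simplified stand-in for the generic base class (assumption)."""

    def __init__(self, config: TConfig) -> None:
        self.config = config

    def load_batch(self, rows: List[dict], table_name: str) -> int:
        raise NotImplementedError


@dataclass
class MemoryConfig:
    """Hypothetical config: caps how many rows one batch may store."""

    max_rows: int = 1000


@register_loader("memory")
class MemoryLoader(DataLoader[MemoryConfig]):
    """Toy loader that keeps rows in a dict instead of a database."""

    def __init__(self, config: MemoryConfig) -> None:
        super().__init__(config)
        self.tables: Dict[str, List[dict]] = {}

    def load_batch(self, rows: List[dict], table_name: str) -> int:
        # Accept at most `max_rows` rows and report how many were stored.
        accepted = rows[: self.config.max_rows]
        self.tables.setdefault(table_name, []).extend(accepted)
        return len(accepted)


def create_loader(name: str, config) -> DataLoader:
    """Look up a registered loader class by name and instantiate it."""
    try:
        return _REGISTRY[name](config)
    except KeyError:
        raise ValueError(f"unknown loader: {name!r}") from None
```

Typing the base class on its config lets each loader declare its own settings dataclass while callers select a backend by name only.
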
For detailed implementation instructions, see src/amp/loaders/NEW_LOADER_GUIDE.md
Files to create:
- src/amp/loaders/implementations/xxx_loader.py - Main implementation
- tests/integration/loaders/backends/test_xxx.py - Integration tests
Files to modify:
- src/amp/loaders/implementations/__init__.py - Add import
- tests/conftest.py - Add testcontainer and config fixtures
Required implementation:
@dataclass
class XxxConfig:
    host: str = 'localhost'
    port: int = 1234
    # ... connection settings

class XxxLoader(DataLoader[XxxConfig]):
    SUPPORTED_MODES = {LoadMode.APPEND}
    SUPPORTS_TRANSACTIONS = False

    def connect(self) -> None: ...
    def disconnect(self) -> None: ...
    def _load_batch_impl(self, batch, table_name, **kwargs) -> int: ...
    def _create_table_from_schema(self, schema, table_name) -> None: ...
    def table_exists(self, table_name) -> bool: ...

Test implementation:
class XxxTestConfig(LoaderTestConfig):
    loader_class = XxxLoader
    config_fixture_name = 'xxx_test_config'

    def get_row_count(self, loader, table_name) -> int: ...
    def query_rows(self, loader, table_name, where, order_by) -> List[Dict]: ...
    def cleanup_table(self, loader, table_name) -> None: ...
    def get_column_names(self, loader, table_name) -> List[str]: ...

class TestXxxCore(BaseLoaderTests):
    config = XxxTestConfig()  # Inherits 6 generalized tests

class TestXxxStreaming(BaseStreamingTests):
    config = XxxTestConfig()  # Inherits 5 streaming tests

- ClickHouse: OLAP, columnar, no transactions - clickhouse_loader.py
- PostgreSQL: OLTP, connection pooling, transactions - postgresql_loader.py
- Redis: Key-value, multiple data structures - redis_loader.py
- Snowflake: Cloud warehouse - snowflake_loader.py
- DeltaLake: File-based, ACID transactions - deltalake_loader.py
- Iceberg: Catalog-based, partitioned tables - iceberg_loader.py
- LMDB: Embedded, memory-mapped - lmdb_loader.py
- Unit tests: Test pure logic and data structures WITHOUT mocking. Unit tests should be simple, fast, and test isolated components (dataclasses, utility functions, partitioning logic, etc.). Do NOT add tests that require mocking to tests/unit/.
- Integration tests: Use testcontainers for real database testing. Tests that require external dependencies (databases, Flight SQL server, etc.) belong in tests/integration/.
- Performance tests: Benchmark data loading operations
- Tests can be filtered using pytest markers (e.g., -m unit for unit tests only)
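
As an illustration of the unit-test guidance above, here is a minimal sketch of a mock-free test of pure logic. The helper `split_into_batches` is hypothetical and not part of this repository. In the real test suite such tests would presumably also carry the `unit` marker (for example via `@pytest.mark.unit`) so that `-m unit` selects them. That decorator is left out here, as an assumption about the marker setup, to keep the sketch dependency-free.

```python
from typing import List, Sequence


def split_into_batches(rows: Sequence[int], batch_size: int) -> List[List[int]]:
    """Hypothetical pure helper: chunk rows into lists of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(rows[i:i + batch_size]) for i in range(0, len(rows), batch_size)]


def test_split_into_batches_keeps_order_and_remainder() -> None:
    # No mocks: call the function and compare plain values.
    assert split_into_batches([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]


def test_split_into_batches_rejects_non_positive_size() -> None:
    try:
        split_into_batches([1], 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for batch_size=0")
```

A test that needed a database connection or a patched client to exercise this logic would instead belong in tests/integration/.
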
- pyarrow: Core data processing and zero-copy operations
- Flight SQL: Protocol for querying amp server
- Database drivers: psycopg2 (PostgreSQL), redis
- Testing: pytest with testcontainers for integration tests
- Always use uv run prefix for Python commands to ensure proper environment
- Integration tests require Docker for database containers
- The project uses Ruff for both linting and formatting with specific per-file ignores
- Protocol buffer files (FlightSql_pb2.py) are excluded from linting