This project reads data from a CSV file and populates two tables in a PostgreSQL database (browsable with a client such as DBeaver).
The script maps CSV columns to two database tables (a sketch of this grouping appears after the lists).

Tasks table (job data):

- tasker_id
- metro_name
- job_id
- postal_code
- latitude
- longitude
- country_key
- latest_schedule_start_at
- time_zone
- is_job_bundle
- is_assigned
- is_accepted
- is_scheduled
- marketplace_key
- description
- duration_hours
- tasker_take_home_pay

Tasker data table:

- tasker_id
- name
- phone_number
- tenure_months
- lifetime_submitted_invoices_bucket
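
Grouped in code, the mapping might look like the following sketch (the names `TASKS_COLUMNS` and `TASKERS_COLUMNS` are illustrative, not necessarily the script's actual identifiers):

```python
# Illustrative grouping of the CSV columns listed above.
TASKS_COLUMNS = [
    "tasker_id", "metro_name", "job_id", "postal_code", "latitude",
    "longitude", "country_key", "latest_schedule_start_at", "time_zone",
    "is_job_bundle", "is_assigned", "is_accepted", "is_scheduled",
    "marketplace_key", "description", "duration_hours",
    "tasker_take_home_pay",
]

TASKERS_COLUMNS = [
    "tasker_id", "name", "phone_number", "tenure_months",
    "lifetime_submitted_invoices_bucket",
]
```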
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file based on `env_example.txt` and fill in your database credentials:

  ```bash
  cp env_example.txt .env
  ```

- Update the `.env` file with your actual database connection details.
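
For reference, a minimal `.env` template consistent with the configuration variables documented below might look like this (the values are placeholders, not the actual `env_example.txt` contents):

```
DB_HOST=localhost
DB_PORT=5432
DB_NAME=your_database
DB_USER=your_username
DB_PASSWORD=your_password
```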
Run the script against a CSV file:

```bash
python db_populator.py --csv-path /path/to/your/csv/file.csv
```

Run in test mode:

```bash
python db_populator.py --csv-path /path/to/your/csv/file.csv --test
```

Use a custom `.env` file:

```bash
python db_populator.py --csv-path /path/to/your/csv/file.csv --env-file /path/to/custom/.env
```

Arguments:

- `--csv-path`: Required. Path to the CSV file to process.
- `--test`: Optional. Use test tables instead of production tables.
- `--env-file`: Optional. Path to a custom `.env` file (default: `.env`).
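
A minimal sketch of how `db_populator.py` might parse these flags with `argparse` (the structure is an assumption, not the script's actual code):

```python
import argparse

def parse_args() -> argparse.Namespace:
    # Mirrors the CLI documented above; names and structure are illustrative.
    parser = argparse.ArgumentParser(
        description="Populate PostgreSQL tables from a CSV file."
    )
    parser.add_argument("--csv-path", required=True,
                        help="Path to the CSV file to process")
    parser.add_argument("--test", action="store_true",
                        help="Use test tables instead of production tables")
    parser.add_argument("--env-file", default=".env",
                        help="Path to a custom .env file (default: .env)")
    return parser.parse_args()
```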
The script uses environment variables for database configuration:
- `DB_HOST`: Database host address
- `DB_PORT`: Database port (default: 5432)
- `DB_NAME`: Database name
- `DB_USER`: Database username
- `DB_PASSWORD`: Database password
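
Assuming `python-dotenv` and `psycopg2` are among the dependencies (an assumption; check `requirements.txt`), the variables might be consumed like this:

```python
import os

import psycopg2                  # assumed dependency
from dotenv import load_dotenv   # assumed dependency (python-dotenv)

load_dotenv(".env")  # load the variables above into the environment

conn = psycopg2.connect(
    host=os.environ["DB_HOST"],
    port=int(os.environ.get("DB_PORT", 5432)),  # 5432 is the documented default
    dbname=os.environ["DB_NAME"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
)
```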
- Automatic duplicate removal: removes duplicates based on `tasker_id` for the tasker data table only (see the sketch after this list)
- Full job data: keeps all job records in the tasks table (`job_id` should be unique)
- Column validation: validates that all required CSV columns are present
- Test/production mode: automatically selects the appropriate table names
- Comprehensive logging: detailed logging for troubleshooting
- Error handling: graceful error handling with informative messages
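
Assuming the script uses pandas (an assumption), the column validation and deduplication could look roughly like this, reusing `TASKS_COLUMNS` and `TASKERS_COLUMNS` from the mapping sketch above:

```python
import pandas as pd

def load_and_split(csv_path: str):
    df = pd.read_csv(csv_path)

    # Column validation: fail fast if any required column is missing.
    required = set(TASKS_COLUMNS) | set(TASKERS_COLUMNS)
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")

    # Keep every job record for the tasks table (job_id should be unique)...
    tasks = df[TASKS_COLUMNS]
    # ...but drop duplicate taskers, keyed on tasker_id, for the tasker data table.
    taskers = df[TASKERS_COLUMNS].drop_duplicates(subset="tasker_id")
    return tasks, taskers
```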