
TaskRabbit Tasker Assignment Database Populator

This project reads data from a CSV file and populates two tables in a PostgreSQL database (accessed here via DBeaver).

CSV Data Mapping

The script maps CSV columns to two database tables:

Tasks Table (taskrabbit_tasks_1 or test_task_rabbit_tasks_1)

  • tasker_id
  • metro_name
  • job_id
  • postal_code
  • latitude
  • longitude
  • country_key
  • latest_schedule_start_at
  • time_zone
  • is_job_bundle
  • is_assigned
  • is_accepted
  • is_scheduled
  • marketplace_key
  • description
  • duration_hours
  • tasker_take_home_pay

Tasker Data Table (taskrabbit_tasker_data_1 or test_taskrabbit_tasker_data_1)

  • tasker_id
  • name
  • email
  • phone_number
  • tenure_months
  • lifetime_submitted_invoices_bucket
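
The column routing above can be sketched as a small helper that splits one CSV row into a task record and a tasker record. This is an illustrative sketch, not the script's actual code; the `split_row` name is hypothetical, and the column lists are taken directly from the mapping above:

```python
# Columns routed to each table, per the mapping above.
TASK_COLUMNS = [
    "tasker_id", "metro_name", "job_id", "postal_code", "latitude",
    "longitude", "country_key", "latest_schedule_start_at", "time_zone",
    "is_job_bundle", "is_assigned", "is_accepted", "is_scheduled",
    "marketplace_key", "description", "duration_hours", "tasker_take_home_pay",
]
TASKER_COLUMNS = [
    "tasker_id", "name", "email", "phone_number",
    "tenure_months", "lifetime_submitted_invoices_bucket",
]

def split_row(row: dict) -> tuple[dict, dict]:
    """Split one CSV row (e.g. from csv.DictReader) into a task record
    and a tasker record. Raises KeyError if a required column is missing."""
    task = {col: row[col] for col in TASK_COLUMNS}
    tasker = {col: row[col] for col in TASKER_COLUMNS}
    return task, tasker
```

Note that `tasker_id` appears in both lists, so it lands in both records.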

Setup

  1. Install dependencies:
     pip install -r requirements.txt
  2. Create a .env file based on env_example.txt:
     cp env_example.txt .env
  3. Update the .env file with your actual database connection details.

Usage

Production Tables (Default)

python db_populator.py --csv-path /path/to/your/csv/file.csv

Test Tables

python db_populator.py --csv-path /path/to/your/csv/file.csv --test

Custom Environment File

python db_populator.py --csv-path /path/to/your/csv/file.csv --env-file /path/to/custom/.env

Command Line Arguments

  • --csv-path: Required - Path to the CSV file to process
  • --test: Optional - Use test tables instead of production tables
  • --env-file: Optional - Path to custom .env file (default: .env)
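
The argument handling above can be sketched with `argparse`; this is an illustrative reconstruction of the interface described, not the actual contents of `db_populator.py`:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the CLI described above; db_populator.py's internals may differ.
    parser = argparse.ArgumentParser(
        description="Populate TaskRabbit tables from a CSV file."
    )
    parser.add_argument("--csv-path", required=True,
                        help="Path to the CSV file to process")
    parser.add_argument("--test", action="store_true",
                        help="Use test tables instead of production tables")
    parser.add_argument("--env-file", default=".env",
                        help="Path to custom .env file (default: .env)")
    return parser
```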

Configuration

The script uses environment variables for database configuration:

  • DB_HOST: Database host address
  • DB_PORT: Database port (default: 5432)
  • DB_NAME: Database name
  • DB_USER: Database username
  • DB_PASSWORD: Database password
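
A filled-in .env file based on the variables above might look like the following; all values here are placeholders, and the actual env_example.txt may differ:

```shell
# Placeholder values — replace with your real connection details
DB_HOST=localhost
DB_PORT=5432
DB_NAME=taskrabbit
DB_USER=postgres
DB_PASSWORD=changeme
```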

Features

  • Automatic duplicate removal: Removes duplicate rows from the tasker data table, keyed on tasker_id (the tasks table is left untouched)
  • Full job data: Keeps all job records in the tasks table (each job_id is expected to be unique)
  • Column validation: Validates that all required CSV columns are present
  • Test/Production mode: Automatically selects appropriate table names
  • Comprehensive logging: Detailed logging for troubleshooting
  • Error handling: Graceful error handling with informative messages
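
The tasker-table deduplication could look like the sketch below. The keep-first policy and the `dedupe_taskers` name are assumptions for illustration; the script's actual keep-first/keep-last choice is not documented:

```python
def dedupe_taskers(rows: list[dict]) -> list[dict]:
    """Keep one row per tasker_id (first occurrence wins — assumed policy)."""
    seen: dict = {}
    for row in rows:
        # setdefault stores the row only if this tasker_id is new
        seen.setdefault(row["tasker_id"], row)
    return list(seen.values())
```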
