Question: How to optimize load_data-Operation #2960

@inkrement

I want to copy my MySQL data (more than 200 million rows) to BigQuery, so I wrote a Python script that uses this library. At the moment it streams 1000 rows per request and manages about 1.1 requests/second. That is not very fast, and transferring the whole dataset would take days. I am sure this can be optimized, but I don't know how. Do you have any suggestions? You can find my source code here.
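To put a number on "days", here is a rough back-of-the-envelope estimate using only the figures from the post. It is a sketch, not a measurement: it assumes the request rate stays constant at 1.1 requests/second, uses 200 million rows as a lower bound, and the function name `estimate_transfer_seconds` is made up for illustration.

```python
import math

# Figures taken from the post; 200 million is a lower bound (">200 Mio. rows").
TOTAL_ROWS = 200_000_000
ROWS_PER_REQUEST = 1000
REQUESTS_PER_SECOND = 1.1


def estimate_transfer_seconds(total_rows, rows_per_request, requests_per_second):
    """Estimate total transfer time, assuming a constant request rate."""
    requests_needed = math.ceil(total_rows / rows_per_request)
    return requests_needed / requests_per_second


seconds = estimate_transfer_seconds(TOTAL_ROWS, ROWS_PER_REQUEST, REQUESTS_PER_SECOND)
print(f"{seconds / 3600:.1f} hours, {seconds / 86400:.1f} days")
# prints "50.5 hours, 2.1 days"
```

At roughly 1100 rows/second, the request rate is the bottleneck, so either more rows per request or more requests per second would shorten the run proportionally.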

I thought about the following points:

  • Each request contains 1000 rows; should I choose a larger number?
  • Does this library use gzip by default?
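Both points can be explored offline before touching the API. The sketch below uses only the Python standard library: it makes the batch size a parameter and compares the raw and gzip-compressed size of one batch serialized as newline-delimited JSON. It does not call this library and says nothing about whether the library compresses requests by default; the helper names `chunked` and `encode_batch` and the sample row fields are hypothetical stand-ins for the real MySQL rows.

```python
import gzip
import json
from itertools import islice


def chunked(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def encode_batch(batch):
    """Serialize a batch as newline-delimited JSON; return (raw, gzipped) bytes."""
    raw = "\n".join(json.dumps(row) for row in batch).encode("utf-8")
    return raw, gzip.compress(raw)


# Hypothetical sample rows standing in for the MySQL result set.
sample_rows = [
    {"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(5000)
]

for batch_size in (500, 1000, 5000):
    first_batch = next(chunked(sample_rows, batch_size))
    raw, packed = encode_batch(first_batch)
    print(f"batch_size={batch_size}: {len(raw)} bytes raw, {len(packed)} bytes gzipped")
```

Running this against a sample of the real data gives a feel for how large each request body becomes as the batch size grows, and how much compression could save on the wire.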

Labels: api: bigquery (Issues related to the BigQuery API), type: question (Request for information or clarification. Not an issue.)
