Closed
Labels
bug: Something isn't working
Description
Describe the bug
I was playing around with the DataFusion CSV parser, using the example from https://duckdb.org/2025/09/08/duckdb-on-the-framework-laptop-13, but DataFusion refused to load the data into Parquet.
To Reproduce
Get the data
wget https://blobs.duckdb.org/nl-railway/railway-services-80-months.zip
unzip railway-services-80-months.zip
Then run
mkdir services-parquet
datafusion-cli
Convert each file to Parquet:
COPY 'services/services-2019.csv' TO 'services-parquet/services-2019.parquet';
COPY 'services/services-2020.csv' TO 'services-parquet/services-2020.parquet';
COPY 'services/services-2021.csv' TO 'services-parquet/services-2021.parquet';
COPY 'services/services-2022.csv' TO 'services-parquet/services-2022.parquet';
COPY 'services/services-2023.csv' TO 'services-parquet/services-2023.parquet';
COPY 'services/services-2024.csv' TO 'services-parquet/services-2024.parquet';
COPY 'services/services-2025-01.csv' TO 'services-parquet/services-2025-01.parquet';
COPY 'services/services-2025-02.csv' TO 'services-parquet/services-2025-02.parquet';
COPY 'services/services-2025-03.csv' TO 'services-parquet/services-2025-03.parquet';
COPY 'services/services-2025-04.csv' TO 'services-parquet/services-2025-04.parquet';
COPY 'services/services-2025-05.csv' TO 'services-parquet/services-2025-05.parquet';
COPY 'services/services-2025-06.csv' TO 'services-parquet/services-2025-06.parquet';
COPY 'services/services-2025-07.csv' TO 'services-parquet/services-2025-07.parquet';
COPY 'services/services-2025-08.csv' TO 'services-parquet/services-2025-08.parquet';
And then run
DataFusion CLI v49.0.2
> select * from 'services-parquet' limit 10;
Arrow error: Schema error: Fail to merge schema field 'Stop:Arrival time' because the from data_type = Timestamp(Second, None) does not equal Utf8
Expected behavior
I expect to be able to read the data correctly.
Additional context
One problem is that the type of the 'Stop:Arrival time' column has been converted to different types in different files. Sometimes it is a timestamp and sometimes a string:
> describe 'services-parquet/services-2020.parquet';
+------------------------------+-----------+-------------+
| column_name | data_type | is_nullable |
+------------------------------+-----------+-------------+
| Service:RDT-ID | Int64 | YES |
| Service:Date | Date32 | YES |
| Service:Type | Utf8View | YES |
| Service:Company | Utf8View | YES |
| Service:Train number | Int64 | YES |
| Service:Completely cancelled | Boolean | YES |
| Service:Partly cancelled | Boolean | YES |
| Service:Maximum delay | Int64 | YES |
| Stop:RDT-ID | Int64 | YES |
| Stop:Station code | Utf8View | YES |
| Stop:Station name | Utf8View | YES |
| Stop:Arrival time | Utf8View | YES |
| Stop:Arrival delay | Utf8View | YES |
| Stop:Arrival cancelled | Utf8View | YES |
| Stop:Departure time | Utf8View | YES |
| Stop:Departure delay | Utf8View | YES |
| Stop:Departure cancelled | Utf8View | YES |
+------------------------------+-----------+-------------+
17 row(s) fetched.
Elapsed 0.009 seconds.
> describe 'services-parquet/services-2021.parquet';
+------------------------------+-------------------------+-------------+
| column_name | data_type | is_nullable |
+------------------------------+-------------------------+-------------+
| Service:RDT-ID | Int64 | YES |
| Service:Date | Date32 | YES |
| Service:Type | Utf8View | YES |
| Service:Company | Utf8View | YES |
| Service:Train number | Int64 | YES |
| Service:Completely cancelled | Boolean | YES |
| Service:Partly cancelled | Boolean | YES |
| Service:Maximum delay | Int64 | YES |
| Stop:RDT-ID | Int64 | YES |
| Stop:Station code | Utf8View | YES |
| Stop:Station name | Utf8View | YES |
| Stop:Arrival time | Timestamp(Second, None) | YES | <--- Note this field type is different
| Stop:Arrival delay | Int64 | YES |
| Stop:Arrival cancelled | Boolean | YES |
| Stop:Departure time | Utf8View | YES |
| Stop:Departure delay | Utf8View | YES |
| Stop:Departure cancelled | Utf8View | YES |
+------------------------------+-------------------------+-------------+
17 row(s) fetched.
Elapsed 0.008 seconds.
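To make the mismatch easier to see, here is a minimal Python sketch that compares the two schemas shown above. It is only an illustration, not part of DataFusion: `schema_conflicts` is a hypothetical helper, and the column types are copied by hand from the two `describe` outputs.

```python
from typing import Dict, Tuple

# Column types copied by hand from `describe 'services-parquet/services-2020.parquet'`.
SCHEMA_2020: Dict[str, str] = {
    "Service:RDT-ID": "Int64",
    "Service:Date": "Date32",
    "Service:Type": "Utf8View",
    "Service:Company": "Utf8View",
    "Service:Train number": "Int64",
    "Service:Completely cancelled": "Boolean",
    "Service:Partly cancelled": "Boolean",
    "Service:Maximum delay": "Int64",
    "Stop:RDT-ID": "Int64",
    "Stop:Station code": "Utf8View",
    "Stop:Station name": "Utf8View",
    "Stop:Arrival time": "Utf8View",
    "Stop:Arrival delay": "Utf8View",
    "Stop:Arrival cancelled": "Utf8View",
    "Stop:Departure time": "Utf8View",
    "Stop:Departure delay": "Utf8View",
    "Stop:Departure cancelled": "Utf8View",
}

# Same columns for services-2021.parquet; only the three arrival columns differ.
SCHEMA_2021: Dict[str, str] = dict(
    SCHEMA_2020,
    **{
        "Stop:Arrival time": "Timestamp(Second, None)",
        "Stop:Arrival delay": "Int64",
        "Stop:Arrival cancelled": "Boolean",
    },
)


def schema_conflicts(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {column: (type_in_a, type_in_b)} for shared columns whose types differ."""
    return {
        name: (a[name], b[name])
        for name in a
        if name in b and a[name] != b[name]
    }


conflicts = schema_conflicts(SCHEMA_2020, SCHEMA_2021)
for column, (type_2020, type_2021) in conflicts.items():
    print(f"{column}: 2020={type_2020}, 2021={type_2021}")
```

Run against the two schemas above, this reports three conflicting columns rather than one: 'Stop:Arrival time', 'Stop:Arrival delay' and 'Stop:Arrival cancelled'. The error message only names the first of them.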