Can't read a directory of CSV files / CSV schema evolution:  incorrect number of fields for line 1, expected 17 got 20

### Describe the bug

I was trying to see how fast the Datafusion CSV parser was by using the example from https://duckdb.org/2025/09/08/duckdb-on-the-framework-laptop-13 but DataFusion refused to load it for a few reasons

### To Reproduce

```shell
wget https://blobs.duckdb.org/nl-railway/railway-services-80-months.zip
unzip railway-services-80-months.zip
```

Then run 
```shell
datafusion-cli
```

```sql
andrewlamb@Andrews-MacBook-Pro-3:~/Downloads$ datafusion-cli
DataFusion CLI v49.0.2
> select * from 'services';
Arrow error: Csv error: incorrect number of fields for line 1, expected 17 got 20
```

### Expected behavior

I expect the directory to be treated as a single table, correctly

Note that selecting from individual files work fine:

```sql
> select * from 'services/services-2025-01.csv' limit 10;
+----------------+--------------+--------------+-----------------+----------------------+------------------------------+--------------------------+-----------------------+-------------+-------------------+-------------------+---------------------+--------------------+------------------------+---------------------------+----------------------+--------------------------+
| Service:RDT-ID | Service:Date | Service:Type | Service:Company | Service:Train number | Service:Completely cancelled | Service:Partly cancelled | Service:Maximum delay | Stop:RDT-ID | Stop:Station code | Stop:Station name | Stop:Arrival time   | Stop:Arrival delay | Stop:Arrival cancelled | Stop:Departure time       | Stop:Departure delay | Stop:Departure cancelled |
+----------------+--------------+--------------+-----------------+----------------------+------------------------------+--------------------------+-----------------------+-------------+-------------------+-------------------+---------------------+--------------------+------------------------+---------------------------+----------------------+--------------------------+
| 15122556       | 2025-01-07   | Intercity    | NS              | 3022                 | false                        | false                    | 0                     | 136517277   | AMRN              | Alkmaar Noord     | 2025-01-07T08:52:00 | 1                  | false                  | 2025-01-07T09:52:00+01:00 | 1                    | false                    |
...
| 15122557       | 2025-01-07   | Sprinter     | NS              | 6929                 | false                        | false                    | 0                     | 136517286   | ZTM               | Zoetermeer        | 2025-01-07T06:55:00 | 0                  | false                  | 2025-01-07T07:55:00+01:00 | 0                    | false                    |
+----------------+--------------+--------------+-----------------+----------------------+------------------------------+--------------------------+-----------------------+-------------+-------------------+-------------------+---------------------+--------------------+------------------------+---------------------------+----------------------+--------------------------+
10 row(s) fetched.
Elapsed 0.019 seconds.
```

Also, if you remove the few most recent
```shell
rm services/services-2025-04.csv
rm services/services-2025-05.csv
rm services/services-2025-06.csv
rm services/services-2025-07.csv
rm services/services-2025-08.csv
```

It works fine:
```sql
> select * from 'services' limit 10;
+----------------+--------------+--------------+-----------------+----------------------+------------------------------+--------------------------+-----------------------+-------------+-------------------+--------------------------+---------------------------+--------------------+------------------------+---------------------------+----------------------+--------------------------+
| Service:RDT-ID | Service:Date | Service:Type | Service:Company | Service:Train number | Service:Completely cancelled | Service:Partly cancelled | Service:Maximum delay | Stop:RDT-ID | Stop:Station code | Stop:Station name        | Stop:Arrival time         | Stop:Arrival delay | Stop:Arrival cancelled | Stop:Departure time       | Stop:Departure delay | Stop:Departure cancelled |
+----------------+--------------+--------------+-----------------+----------------------+------------------------------+--------------------------+-----------------------+-------------+-------------------+--------------------------+---------------------------+--------------------+------------------------+---------------------------+----------------------+--------------------------+
| 15566803       | 2025-03-15   | Intercity    | NS              | 3970                 | false                        | false                    | 0                     | 140690195   | UT                | Utrecht Centraal         | 2025-03-15T20:17:00+01:00 | 0                  | false                  | 2025-03-15T20:19:00+01:00 | 0                    | false                    |
...
| 15566803       | 2025-03-15   | Intercity    | NS              | 3970                 | false                        | false                    | 0                     | 140690204   | BKF               | Bovenkarspel Flora       | 2025-03-15T21:49:00+01:00 | 0                  | false                  | 2025-03-15T21:49:00+01:00 | 0                    | false                    |
+----------------+--------------+--------------+-----------------+----------------------+------------------------------+--------------------------+-----------------------+-------------+-------------------+--------------------------+---------------------------+--------------------+------------------------+---------------------------+----------------------+--------------------------+
10 row(s) fetched.
Elapsed 0.015 seconds.
```

```sql
> copy 'services' to 'services.parquet';
+-----------+
| count     |
+-----------+
| 135465619 |
+-----------+
1 row(s) fetched.
Elapsed 9.661 seconds.
```

### Additional context

I suspect that the issue is that 'services/services-2025-07.csv' and others has 20 columns while the other files have 17 columns and for some reason the CSV format is not adapting the schema the same way as the parquet can

```sql
> describe 'services/services-2024.csv';
+------------------------------+-----------+-------------+
| column_name                  | data_type | is_nullable |
+------------------------------+-----------+-------------+
| Service:RDT-ID               | Int64     | YES         |
| Service:Date                 | Date32    | YES         |
| Service:Type                 | Utf8      | YES         |
| Service:Company              | Utf8      | YES         |
| Service:Train number         | Int64     | YES         |
| Service:Completely cancelled | Boolean   | YES         |
| Service:Partly cancelled     | Boolean   | YES         |
| Service:Maximum delay        | Int64     | YES         |
| Stop:RDT-ID                  | Int64     | YES         |
| Stop:Station code            | Utf8      | YES         |
| Stop:Station name            | Utf8      | YES         |
| Stop:Arrival time            | Utf8      | YES         |
| Stop:Arrival delay           | Utf8      | YES         |
| Stop:Arrival cancelled       | Utf8      | YES         |
| Stop:Departure time          | Utf8      | YES         |
| Stop:Departure delay         | Utf8      | YES         |
| Stop:Departure cancelled     | Utf8      | YES         |
+------------------------------+-----------+-------------+
17 row(s) fetched.
Elapsed 0.008 seconds.

> describe 'services/services-2025-07.csv';
+------------------------------+-----------+-------------+
| column_name                  | data_type | is_nullable |
+------------------------------+-----------+-------------+
| Service:RDT-ID               | Int64     | YES         |
| Service:Date                 | Date32    | YES         |
| Service:Type                 | Utf8      | YES         |
| Service:Company              | Utf8      | YES         |
| Service:Train number         | Int64     | YES         |
| Service:Completely cancelled | Boolean   | YES         |
| Service:Partly cancelled     | Boolean   | YES         |
| Service:Maximum delay        | Int64     | YES         |
| Stop:RDT-ID                  | Int64     | YES         |
| Stop:Station code            | Utf8      | YES         |
| Stop:Station name            | Utf8      | YES         |
| Stop:Arrival time            | Utf8      | YES         |
| Stop:Arrival delay           | Utf8      | YES         |
| Stop:Arrival cancelled       | Utf8      | YES         |
| Stop:Departure time          | Utf8      | YES         |
| Stop:Departure delay         | Utf8      | YES         |
| Stop:Departure cancelled     | Utf8      | YES         |
| Stop:Platform change         | Boolean   | YES         | <-- New columns!
| Stop:Planned platform        | Utf8      | YES         |
| Stop:Actual platform         | Utf8      | YES         |
+------------------------------+-----------+-------------+
20 row(s) fetched.
Elapsed 0.005 seconds.
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can't read a directory of CSV files / CSV schema evolution: incorrect number of fields for line 1, expected 17 got 20 #17516

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Can't read a directory of CSV files / CSV schema evolution: incorrect number of fields for line 1, expected 17 got 20 #17516

Description

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions