Skip to content

CSV output doesn't preserve order of columns in SELECT queries #65

@cameronfreer

Description

@cameronfreer

When @LuluIto and I run inferenceql.query.main -o csv, the output CSV is in an order that doesn't match the order returned by the SELECT queries in question (and specifying the order of columns explicitly rather than using SELECT * doesn't seem to help).

For example, with this query:

SELECT * FROM GENERATE race, gender, education, religion, registered_to_vote, party_allegiance, age, policy_support_ban_assault_rifles, used_social_media_in_past_24hrs, finished_school_last_year, received_a_raise_last_year, mental_health_status, policy_support_prohbit_all_abortions_after_week_20 UNDER lpm GIVEN race = "White" AND gender = "male" AND education = "(e) 4-year college" AND religion = "Roman Catholic" AND registered_to_vote = "yes" AND party_allegiance = "Repulican" AND age = 70 LIMIT 1

the command
inferenceql.query.main --table data=stlouis.csv --db db.edn --lang permissive
yields

|  race | gender |          education |       religion | registered_to_vote | party_allegiance | age | policy_support_ban_assault_rifles | used_social_media_in_past_24hrs | finished_school_last_year | received_a_raise_last_year | mental_health_status | policy_support_prohbit_all_abortions_after_week_20 |
|-------+--------+--------------------+----------------+--------------------+------------------+-----+-----------------------------------+---------------------------------+---------------------------+----------------------------+----------------------+----------------------------------------------------|
| White |   male | (e) 4-year college | Roman Catholic |                yes |        Repulican |  70 |                               yes |                             yes |                        no |                         no |             (c) Good |                                                yes |

as expected, while
inferenceql.query.main --table data=stlouis.csv --db db.edn --lang permissive -o csv
yields

gender,mental_health_status,policy_support_prohbit_all_abortions_after_week_20,religion,received_a_raise_last_year,age,registered_to_vote,used_social_media_in_past_24hrs,race,party_allegiance,finished_school_last_year,policy_support_ban_assault_rifles,education
male,(c) Good,no,Roman Catholic,no,70,yes,yes,White,Repulican,no,yes,(e) 4-year college

Changing the query to be explicit about order like this doesn't help:

SELECT race, gender, education, religion, registered_to_vote, party_allegiance, age, policy_support_ban_assault_rifles, used_social_media_in_past_24hrs, finished_school_last_year, received_a_raise_last_year, mental_health_status, policy_support_prohbit_all_abortions_after_week_20 FROM GENERATE race, gender, education, religion, registered_to_vote, party_allegiance, age, policy_support_ban_assault_rifles, used_social_media_in_past_24hrs, finished_school_last_year, received_a_raise_last_year, mental_health_status, policy_support_prohbit_all_abortions_after_week_20 UNDER lpm GIVEN race = "White" AND gender = "male" AND education = "(e) 4-year college" AND religion = "Roman Catholic" AND registered_to_vote = "yes" AND party_allegiance = "Repulican" AND age = 70 LIMIT 1

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions