
fix: Fix sequence item 2: expected str instance, NoneType found exception when table output is set to markdown.#27

Merged
Filimoa merged 3 commits into Filimoa:main from ic-xu:main
Apr 19, 2024


@ic-xu ic-xu commented Apr 19, 2024

Behavior:

Parsing a PDF table raises the following exception:

/python3.10/site-packages/openparse/tables/pymupdf/parse.py", line 25, in output_to_markdown
 markdown_output = "| " + " | ".join(headers) + " |\n"
TypeError: sequence item 2: expected str instance, NoneType found

This happens when parsing PDF tables with the following table arguments:

table_args = {
    "parsing_algorithm": "pymupdf",
    "table_output_format": "markdown",
}

After analysis, I found that the likely cause is the following. When the table headers are:

headers = ['(See Note 11)', '', None, None]

and the following code runs:

 markdown_output = "| " + " | ".join(headers) + " |\n"
 markdown_output += "|---" * len(headers) + "|\n"

This raises the following error:

/python3.10/site-packages/openparse/tables/pymupdf/parse.py", line 25, in output_to_markdown
 markdown_output = "| " + " | ".join(headers) + " |\n"
TypeError: sequence item 2: expected str instance, NoneType found
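
The failure can be reproduced without openparse or a PDF. The snippet below is a minimal sketch that reuses only the header list and the join expression quoted above; the `error_message` variable is introduced here for illustration.

```python
# Header row as reported above: two header cells came back as None.
headers = ['(See Note 11)', '', None, None]

try:
    # str.join requires every item to be a str, so the first None fails.
    markdown_output = "| " + " | ".join(headers) + " |\n"
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The empty string at index 1 is accepted by `join`; only the `None` values at indexes 2 and 3 are a problem, which is why the message names item 2.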

My fix is to replace each None header with ' ' before joining, so that every item passed to join is a string.
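
A rough sketch of that idea follows. It is not the merged diff: `clean_cell` and `headers_to_markdown` are hypothetical names, and the real `output_to_markdown` in openparse may be structured differently. Only the two string-building lines are taken from the snippet above.

```python
from typing import List, Optional


def clean_cell(cell: Optional[str]) -> str:
    """Return a single space for a missing cell, otherwise the cell as text."""
    # Hypothetical helper: maps None to ' ' as described in this PR.
    return " " if cell is None else str(cell)


def headers_to_markdown(headers: List[Optional[str]]) -> str:
    """Build the markdown header row and separator row from raw headers."""
    # Normalise first so that join only ever sees str items.
    safe_headers = [clean_cell(h) for h in headers]
    markdown_output = "| " + " | ".join(safe_headers) + " |\n"
    markdown_output += "|---" * len(safe_headers) + "|\n"
    return markdown_output


# The header list that previously raised TypeError now renders.
result = headers_to_markdown(['(See Note 11)', '', None, None])
print(result)
```

Body rows extracted from the same table could presumably contain None as well, so applying the same normalisation to every row before joining may be worth considering.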

@Filimoa Filimoa merged commit 106465d into Filimoa:main Apr 19, 2024