Header creates several extra columns for the rest of the table #1168
-
Thanks for the kind words, @schoodic100! Depending on how consistent your PDFs/tables are, it might help to:
Does that work? If not, I can take a closer look at the specific PDF. (Thank you for attaching it.)
-
Thanks, @jsvine. Yes, I was able to swipe some code from another discussion post to deal with the "fake" rectangles. It also gave me better insight into the various tactics one can employ. Impressive stuff.
-
@dimavologzh, here is what I more or less used to exclude "fake" rectangles.
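For readers who want a starting point, the general idea of excluding rectangles before table detection can be sketched as a predicate over pdfplumber's object dictionaries. This is a rough illustration only, not necessarily the code referred to above: the name `keep_object` is hypothetical, and the rule for what counts as "fake" (a rectangle that is filled but not stroked, such as background shading) is an assumption that would need tuning for a given PDF.

```python
def keep_object(obj):
    """Return False for rectangles treated as "fake", True for everything else.

    Assumption (hypothetical rule): a "fake" rectangle is one that is filled
    but has no stroked outline, e.g. row shading rather than a cell border.
    """
    # Leave chars, lines, curves, images, etc. untouched.
    if obj.get("object_type") != "rect":
        return True

    fill_only = bool(obj.get("fill")) and not obj.get("stroke")
    return not fill_only
```

With pdfplumber, a predicate like this can be passed to `page.filter(keep_object)`, which returns a version of the page containing only the objects for which the function returns `True`; calling `extract_table()` on that filtered page then ignores the excluded rectangles.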
-
Howdy,
Your tool is so good. As someone who has suffered through PDF extraction for a decade, I am so happy to have found your library.
This is most likely a newbie question, but I thought you might have a quick answer that would be hugely helpful. I am attaching a very long PDF along with the "display" from pdfplumber for one of the pages (they are all more or less the same). It appears that most of the table is being thrown off by the column headings, which add several "blank" columns to the output.
Is there a strategy for getting better behavior? I have played with table settings a bit but haven't found a combination that works too well.
Sample output:
[['Prescription Drug Name', None, '', 'Drug', '', '', 'Coverage', ''], [None, None, None, 'Tier', None, None, 'Requirements/Limits', None], ['', 'ANTI-INFECTIVE AGENTS', None, None, None, None, None, ''], ['', 'ANTHELMINTICS', None, None, None, None, None, ''], ['albendazole tabs 200 mg', None, '1', None, None, '', None, None], ['BILTRICIDE TABS 600 MG [praziquantel]', None, '2', None, None, '', None, None], ['ivermectin tabs 3 mg', None, '1', None, None, '', None, None].....
Ideally the output would be (one drug example): ['ampicillin sodium solr 1 gm', '1', 'MB']
What is being produced is TOTALLY usable, but I thought perhaps there was something I was missing to make it even better.
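Since the current output is already usable, one fallback is to clean it up after extraction. The following is a minimal sketch, assuming the rows look like the sample above (lists of strings, `None`, and empty strings); `compact_row` and `compact_table` are hypothetical helper names, not part of pdfplumber.

```python
def compact_row(row, width=None):
    """Drop None/blank cells from a row, keeping the remaining cells in order.

    If `width` is given, pad short rows with empty strings so every row
    has at least that many cells.
    """
    cells = [cell.strip() for cell in row if cell is not None and cell.strip()]
    if width is not None:
        cells += [""] * (width - len(cells))
    return cells


def compact_table(table, width=None):
    """Compact every row and discard rows that contained no text at all."""
    compacted = []
    for row in table:
        cells = [cell.strip() for cell in row if cell is not None and cell.strip()]
        if cells:
            compacted.append(compact_row(row, width))
    return compacted


sample = [
    ["albendazole tabs 200 mg", None, "1", None, None, "", None, None],
    ["", "ANTHELMINTICS", None, None, None, None, None, ""],
]
print(compact_table(sample))
```

One caveat with this approach: removing blanks discards positional information, so a row with an empty middle column would have its later cells shift left. For this table, where a drug row is a name, a tier, and an optional trailing requirement, that is probably acceptable, but it is not a general fix for the extra columns themselves.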
Thanks and have a great day,
Jeff
Kaiser_Commercial.pdf