Extract PDF tables to CSV

Drop a PDF, pick the tables you want, get clean CSV. No upload, no page cap, no Pro tier. Works on text-based PDFs.

When you'd extract tables from a PDF

  • Invoices and receipts. Vendors send PDFs; finance teams want spreadsheets. Extracting line items into CSV lets you load them into accounting software, or tally totals without rekeying.
  • Bank and credit card statements. Statements are almost always PDFs. CSV extraction makes reconciliation, expense categorisation, and personal-finance imports possible.
  • Annual reports and financial statements. Public-company filings publish key tables in PDFs. CSV gets the numbers into a spreadsheet for analysis.
  • Government data published as PDFs. Census tables, statistical bulletins, and policy documents are surprisingly often released only as PDFs. Extracting to CSV gets the figures into a form you can actually analyse.
  • Research papers and scientific articles. Tables of results, parameter values, sample sizes. Extracting to CSV supports meta-analysis and reproducibility.

Worked example: a multi-page invoice

A two-page invoice with a line-item table:

Page 1
+------+----------+----------+--------+
| #    | Item     | Quantity | Total  |
+------+----------+----------+--------+
| 1    | Widget A |   3      | 87.00  |
| 2    | Widget B |   1      | 45.00  |
| ...  | ...      |   ...    | ...    |

Page 2
+------+----------+----------+--------+
| #    | Item     | Quantity | Total  |   <- header repeated
+------+----------+----------+--------+
| 14   | Widget Q |   2      | 18.00  |
| 15   | Widget R |   5      | 75.00  |

ExploreMyData detects that the header on page 2 matches the one on page 1, drops the repeat, and concatenates the rows. The CSV output:

#,Item,Quantity,Total
1,Widget A,3,87.00
2,Widget B,1,45.00
...
14,Widget Q,2,18.00
15,Widget R,5,75.00
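
The merge step can be sketched in a few lines of Python. This is an illustration of the idea only, not ExploreMyData's actual implementation: the function names `merge_pages` and `to_csv` are hypothetical, and the sketch assumes each page's table has already been extracted as a list of rows.

```python
import csv
import io


def merge_pages(pages):
    """Concatenate per-page tables, keeping the header row only once.

    `pages` is a list of tables; each table is a list of rows,
    and each row is a list of cell strings.
    """
    merged = []
    header = None
    for table in pages:
        rows = [[cell.strip() for cell in row] for row in table]
        if not rows:
            continue
        if header is None:
            header = rows[0]      # first non-empty page defines the header
            merged.append(header)
            rows = rows[1:]
        elif rows[0] == header:
            rows = rows[1:]       # repeated header on a later page: drop it
        merged.extend(rows)
    return merged


def to_csv(rows):
    """Serialise rows to a CSV string (quoting handled by the csv module)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# The rows shown in the invoice above (elided rows left out).
page1 = [["#", "Item", "Quantity", "Total"],
         ["1", "Widget A", "3", "87.00"],
         ["2", "Widget B", "1", "45.00"]]
page2 = [["#", "Item", "Quantity", "Total"],
         ["14", "Widget Q", "2", "18.00"],
         ["15", "Widget R", "5", "75.00"]]

print(to_csv(merge_pages([page1, page2])))
```

Only the first row of each later page is compared with the header, so a data row that happens to equal the header elsewhere on a page is kept.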

Format-specific gotchas

  • Text-based vs scanned. If you can select text in the PDF, it's text-based and we can extract directly. If dragging only selects a rectangle (the page is an image), the PDF is scanned and needs OCR first. We don't run OCR; use a dedicated OCR tool, then come back.
  • Multi-page tables. If the header repeats on each page (most invoices do this), we detect it and merge. If it doesn't, either mark the continuation pages manually or extract each page separately and concatenate the results.
  • Borderless tables. Tables without explicit grid lines work, but column detection relies on whitespace alignment. If the auto-detection puts a column boundary in the wrong place, drag it in the preview before exporting.
  • Multi-line cells. Cells that wrap onto multiple lines within one row are joined back into a single value with a space. Address fields and item descriptions often hit this.
  • Footers, page numbers, captions. Page-level text is ignored when it sits clearly outside the detected table region. Sometimes a footer line gets pulled into the table; you can deselect it in the preview.
  • Right-aligned numbers. PDFs often right-align numeric columns. If the column boundaries are slightly off, decimals can drift into the next column. The preview lets you verify before exporting.
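
To make the multi-line cell case concrete, here is a minimal Python sketch of one way wrapped rows could be joined. It is an assumption-laden illustration, not the tool's real logic: it treats any row whose key column (the first column by default) is empty as a continuation of the row above, and the function name `join_wrapped_rows` and the sample data are hypothetical.

```python
def join_wrapped_rows(rows, key_col=0):
    """Fold continuation rows into the row above, joining cells with a space.

    Assumption: a row is a continuation when its key column is empty.
    All rows are expected to have the same number of cells.
    """
    joined = []
    for row in rows:
        is_continuation = bool(joined) and not row[key_col].strip()
        if is_continuation:
            prev = joined[-1]
            for i, cell in enumerate(row):
                text = cell.strip()
                if text:
                    prev[i] = f"{prev[i]} {text}".strip()
        else:
            joined.append([cell.strip() for cell in row])
    return joined


# Hypothetical extraction where an item description wrapped onto a second line.
raw_rows = [
    ["1", "Widget A, stainless", "3", "87.00"],
    ["", "steel, 40 mm", "", ""],
    ["2", "Widget B", "1", "45.00"],
]

for row in join_wrapped_rows(raw_rows):
    print(row)
```

The heuristic fails when the key column is legitimately blank in a data row, which is one reason a preview step before export is useful.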

How this differs from the alternatives

  • vs Sejda. Browser PDF-to-Excel/CSV with two table modes and password support. Free tier capped at 10 pages, 50 MB, and 3 tasks per hour. ExploreMyData has none of those caps.
  • vs Convertio (pdf-csv). Generic file-conversion service that uploads files to its servers, with a 100 MB free cap; OCR for scanned PDFs is gated behind a separate paid tool. We don't upload anything.
  • vs Zamzar (pdf-to-csv). Server-side conversion with a 50 MB free cap; converted files are stored for 24 hours on Zamzar's infrastructure. We process locally with no retention.
  • vs Smallpdf (pdf-to-excel). Polished UX, but OCR for scanned PDFs is a Pro feature, and the free tier throttles after a few daily conversions. ExploreMyData has no daily quota.
  • vs Tabula (CLI/desktop). Excellent open-source extractor for text-based PDFs. Requires Java installed and manual area/column flags per PDF. ExploreMyData runs in any browser without an install.
  • vs Camelot (Python). Strong Python library with two extraction modes (lattice for grid lines, stream for whitespace). Needs Python, Ghostscript, and OpenCV installed, plus per-PDF tuning. We cover the typical case with point-and-click.
  • vs Adobe Acrobat Pro. Industry standard, with built-in OCR. Requires a subscription (around USD 20 per month) and a multi-hundred-megabyte install. ExploreMyData is free and runs in a browser, with the trade-off that we don't include OCR.

Frequently Asked Questions

Does it work on scanned PDFs?

ExploreMyData's PDF table extractor works on text-based PDFs (the kind where you can select text in your PDF reader). Scanned PDFs are essentially images, and we don't include OCR. If your file is scanned, run it through an OCR tool first, then drop the resulting text-based PDF in here.

How does it handle tables that span multiple pages?

If a table continues across pages with the same header row repeated, ExploreMyData detects the repetition and concatenates the rows into one CSV. If the header isn't repeated, you can manually mark continuation pages in the page picker.

What if the table has no grid lines?

Borderless tables are detected by analysing horizontal and vertical whitespace between text. Detection is less reliable than for tables with explicit lines, but it works for most invoices and statements. Misaligned columns can be fixed by dragging the column-boundary markers in the preview before exporting.
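
As a rough illustration of whitespace-based column detection, the Python sketch below finds character positions that are blank on every line and treats each sufficiently wide run as a column boundary. It is a simplification under stated assumptions: real PDFs position text by x-coordinate rather than by character column, the input must be non-empty, and the names `find_column_gaps`, `split_by_gaps`, and `min_gap` are hypothetical.

```python
def find_column_gaps(lines, min_gap=2):
    """Return (start, end) spans that are blank on every line.

    Leading and trailing blank runs are ignored; only interior
    gaps at least `min_gap` characters wide count as boundaries.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    blank = [all(line[i] == " " for line in padded) for i in range(width)]

    gaps = []
    start = None
    for i, is_blank in enumerate(blank + [False]):  # sentinel closes a final run
        if is_blank and start is None:
            start = i
        elif not is_blank and start is not None:
            if i - start >= min_gap and start > 0 and i < width:
                gaps.append((start, i))
            start = None
    return gaps


def split_by_gaps(line, gaps):
    """Cut one line into cells at the detected gaps."""
    cells = []
    prev = 0
    for start, end in gaps:
        cells.append(line[prev:start].strip())
        prev = end
    cells.append(line[prev:].strip())
    return cells


# Build a small borderless table with right-aligned numeric columns.
sample = [("Item", "Quantity", "Total"),
          ("Widget A", "3", "87.00"),
          ("Widget B", "1", "45.00")]
lines = [f"{item:<12}{qty:>8}   {total:>5}" for item, qty, total in sample]

gaps = find_column_gaps(lines)
for line in lines:
    print(split_by_gaps(line, gaps))
```

A gap only counts if it is blank on every line, so one long cell that spills into the gap removes the boundary. That is the kind of case where dragging a boundary by hand is needed.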

Does it support password-protected PDFs?

Yes. If a PDF asks for a password, you'll be prompted to enter it. The password and the file content stay in your browser; nothing is uploaded.

Is there a page or file size limit?

No fixed cap. The PDF is parsed locally; large files (hundreds of pages) work, with the practical ceiling being your browser's memory.

Can I extract a specific table from a page that has several?

Yes. The page preview shows every detected table with a checkbox; tick only the ones you want. You can also draw a region manually if the auto-detection misses one.
