If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful this is — you can’t easily copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data in CSV format, through a simple interface. And now you can download Tabula and run it on your own computer, like you would with OpenRefine.
1. Download the version of Tabula for your operating system.
2. Extract the zip file using the file extractor of your choice (such as 7-Zip).
3. Open the folder you just extracted and run the "Tabula" program inside it.
4. Tabula should open in your web browser automatically. If it doesn't, go to http://localhost:8080/ in your browser. There's Tabula!
5. Upload a file of your choice, select part of a table, and go.
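If the browser doesn't open on its own, the check in step 4 can also be scripted. Below is a minimal sketch using only Python's standard library that reports whether anything is answering at http://localhost:8080/, the address given above. The function name `tabula_is_running` is hypothetical, and a successful check only means that some web server is listening on that port, not necessarily Tabula.

```python
import urllib.error
import urllib.request


def tabula_is_running(url: str = "http://localhost:8080/", timeout: float = 2.0) -> bool:
    """Return True if some HTTP server answers at `url`.

    Assumption: Tabula serves its web interface at http://localhost:8080/.
    This does not verify that the server that answers is actually Tabula.
    """
    # Ignore any proxy settings from the environment: we want to talk to
    # this machine directly, not route localhost through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404): it is still running.
        return True
    except OSError:
        # Connection refused, timed out, or unreachable: nothing is listening.
        return False


if __name__ == "__main__":
    if tabula_is_running():
        print("Something is answering at http://localhost:8080/ (probably Tabula).")
    else:
        print("Nothing is answering on port 8080. Is the Tabula program running?")
```

`HTTPError` is caught before the broader `OSError` on purpose: it is a subclass of `OSError`, but it means a server did respond.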
To extract a table:
1. Upload a file with tables you would like to copy.
2. Draw a box around the area of the table you would like to copy.
(Note: Tabula currently can't select a table that spans multiple pages.)
3. You can then copy the table as CSV (comma-separated values), or download it as a CSV or TSV (tab-separated values) file. If you notice any errors in the table, you can edit the selected text before copying or downloading.
4. Now you can work with your data in a spreadsheet or text file rather than a PDF!
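Because Tabula can't select a table that spans multiple pages, one workaround is to select the table on each page separately, export each part, and stitch the files together afterwards. Here is a minimal sketch using Python's standard `csv` module. It assumes each export is a `.csv` or `.tsv` file saved as UTF-8, and that pages which repeat the header row repeat it exactly; the function names are hypothetical, not part of Tabula.

```python
import csv
from pathlib import Path
from typing import List, Optional, Sequence, Union

PathLike = Union[str, Path]


def read_tabula_export(path: PathLike) -> List[List[str]]:
    """Read one exported CSV or TSV file into a list of rows.

    The delimiter is chosen from the file extension: a tab for .tsv,
    a comma for anything else. UTF-8 encoding is an assumption.
    """
    path = Path(path)
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter=delimiter))


def merge_page_exports(paths: Sequence[PathLike]) -> List[List[str]]:
    """Concatenate per-page exports of one table into a single list of rows.

    The first file's first row is treated as the header. In later files,
    a first row identical to that header is dropped; otherwise every row
    is kept. Empty files are skipped.
    """
    merged: List[List[str]] = []
    header: Optional[List[str]] = None
    for path in paths:
        rows = read_tabula_export(path)
        if not rows:
            continue
        if header is None:
            header = rows[0]
            merged.extend(rows)
        elif rows[0] == header:
            merged.extend(rows[1:])
        else:
            merged.extend(rows)
    return merged
```

For example, if page 1 of a table was exported as `page1.csv` and page 2 as `page2.tsv`, `merge_page_exports(["page1.csv", "page2.tsv"])` returns one list of rows with the header appearing once, which you can save with `csv.writer(...).writerows(...)` and open in a spreadsheet.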
Note: Tabula currently works only on text-based PDFs, not scanned documents.
Want to contribute? Fork it on GitHub and check out the to-do list for ideas.