Tabula

Tabula is a tool for liberating data tables trapped inside PDF files.

View the Project on GitHub jazzido/tabula

Current Version: 0.9.3 (Jan 18 2014)

Other Versions: pre-releases & archives

Why Tabula?

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is: you can’t easily copy and paste rows of data out of PDF files. Tabula lets you extract that data in CSV format through a simple interface. You can also download Tabula and run it on your own computer, as you would with OpenRefine.

Download and install Tabula

Note: You’ll need a copy of Java installed. You can download Java here.

1. Download the version of Tabula for your operating system:

2. Extract the zip file using the file extractor of your choice (such as 7-Zip).

3. Go into the folder you just extracted. Run the "Tabula" program inside.

4. In your web browser, go to http://localhost:8080/ (your browser should open this page automatically). There's Tabula!

5. Upload a file of your choice. Select a section of a table, and go.
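If the page does not come up, the two prerequisites from the steps above (Java installed, Tabula answering at http://localhost:8080/) can be checked with a few lines of Python. This is only a rough sketch and not part of Tabula: the function names `java_available` and `tabula_running` are made up for illustration, and the check assumes that `java` is on your PATH and that Tabula is using the default address from step 4.

```python
import shutil
import urllib.error
import urllib.request
import webbrowser

TABULA_URL = "http://localhost:8080/"  # address given in step 4


def java_available():
    """Return True if a `java` executable is found on the PATH."""
    return shutil.which("java") is not None


def tabula_running(url=TABULA_URL, timeout=2.0):
    """Return True if something answers HTTP requests at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means a server is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing is listening.
        return False


if __name__ == "__main__":
    if not java_available():
        print("Java was not found on the PATH; install it before running Tabula.")
    elif tabula_running():
        webbrowser.open(TABULA_URL)
    else:
        print("Nothing is answering at", TABULA_URL, "- start the Tabula program first.")
```

The check only tells you that some web server is answering at that address; it does not confirm that the server is Tabula.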

Using Tabula

1. Upload a file with tables you would like to copy.

2. Draw a box around the area of the table you would like to copy.
(Note: Tabula currently can't select a table that spans multiple pages.)

3. You will be given the option to copy the table as CSV (comma-separated values) or to download it as a CSV or TSV (tab-separated values) file. If you notice any errors in the table, you can edit the selected text before copying or downloading.

4. Now you can work with your data in a spreadsheet or text file rather than a PDF!

Note: Tabula only works on text-based PDFs at this time, not scanned documents.
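A downloaded CSV or TSV file can also be read by a program instead of a spreadsheet. The sketch below uses Python's standard `csv` module; the helper name `read_table` and the sample data are invented for illustration and do not come from Tabula itself.

```python
import csv
import io


def read_table(lines, delimiter=","):
    """Parse CSV text (or TSV, when given a tab as delimiter) into a list of rows.

    `lines` can be an open file or any other iterable of text lines.
    """
    return list(csv.reader(lines, delimiter=delimiter))


# Made-up sample standing in for a table exported from Tabula.
sample_csv = 'Name,Amount\n"Smith, Jane",1200\n"Doe, John",850\n'

rows = read_table(io.StringIO(sample_csv))
header, records = rows[0], rows[1:]

# Every cell arrives as a string, so convert before doing arithmetic.
total = sum(int(amount) for _name, amount in records)

print(header)
print(total)
```

To read a real export, pass an open file instead of the sample string, for example `open("my-table.csv", newline="")`, where `my-table.csv` is a placeholder for whatever name you saved the download under. Quoted cells that contain commas, such as the names in the sample, are kept in one piece by the `csv` module.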

Authors and Contributors

Tabula was created by Manuel Aristarán with the help of ProPublica, La Nación DATA and Knight-Mozilla OpenNews.

Want to contribute? Fork it on GitHub and check out the to-do list for ideas.

Learn more about this project on Source: "Introducing Tabula"