Reading Data From a Spreadsheet with Roo

Have you ever been given a spreadsheet full of data that you need to use in an application? Were you super not excited about copying and pasting the data from each cell for 2 hours? The roo gem can be your bff when you have to read data from a spreadsheet.

What's This Roo Business?

Roo is a ruby gem that gives you a simple interface for reading data from a spreadsheet in .ods, .xls, .xlsx, and Google Doc spreadsheet format (and writing data in the case of a Google Docs spreadsheet). Cell data types are recognized, data can be output in YAML and CSV format, and formulas are recognized in .odt and Google doc spreadsheets. Roo is available on RubyForge and Thomas Preymesser is the gem maintainer. Note that if you come across the GitHub repository page, that repository is no longer maintaind by Hugh McGowan as mentioned in this issue comment.

Using Roo

Using Roo is fairly straightforward. First install the gem using:

gem install roo

Then, hopefully you can use the following example and the references section as a starting point for working with Roo. For this example, we'll use a stripped down version of the A List Apart 2010 Web Design Survey data. The raw data is kindly made available for further analysis; we'll just use it as sample data for our example. And instead of using the whole spreadsheet (which is relatively huge), we'll just use the first six columns and the first one hundred rows.

The below example is super simple; it reads data from an Excel spreadsheet and writes the data to standard out. Instead of printing data to standard out, you could of course do something more interesting like read data from a client-provided Excel spreadsheet, write the data out to YAML files, and use those files to populate HTML email templates built with Nanoc (i.e. what I used Roo for).

We start by opening the spreadsheet and setting the default worksheet we want to work with using the Excel.new class method. Then the sheets property gives us access to all the worksheets in the spreadsheet in an array. The sample spreadsheet has only one worksheet, so we set the default worksheet to the first worksheet in the array using sheets[0].

Next we create a hash of the column position and the header names to make it somewhat easier to work with the data. This hash will let us ask for data from the fields/columns in each row using the column headings from the first row. We can then iterate over all the rows in the spreadsheet using first_row and last_row methods. We iterate over the number of rows in the spreadsheet passing the row block variable to the row method, which returns an array of all the columns in the row. We use the headings hash we created earlier to access the columns by the heading name, and finally, we print the data to standard out.

References and Further Reading

  • Roo. Retrieved October 10, 2011.
    The project home page on RubyForge. There's a nice walkthrough of many of the methods and properties provided by Roo, and there is a similar reading data from a spreadsheet and printing the data to standard out example, though it's implemented in a slightly different way. There is also a discussion about cell data types, formulas, and writing data to Google Doc spreadsheets. The most detailed discussion on using Roo.
  • ruby-roo | Google Groups. Retrieved November 13, 2011.
    Google group for support using Roo. There's a lot of a good information here, so if you're having trouble with Roo, you can browse/search the archive, or post your own question. Preymesser, the gem author is active on the list.
  • roo-1.2.3 Documentation. Retrieved November 10, 2011.
    The Roo API documentation. Check here for more detailed information on the methods and properites provided by Roo.
  • Documentation spreadsheet 0.6.5.9. Retrieved November 13, 2011.
    Project home page for the spreadsheet gem, an alternate gem for working with Excel spreadsheets (support for Open Office and Google Doc spreadsheets is not yet available).
  • Mullet, David. RUBY ON WINDOWS. Retrieved November 15, 2011.
    David Mullet's blog on using Ruby on the Windows platform. A lot of great code samples for doing various tasks with Microsof Word, Excel, Outlook, etc. Has a short post on using Roo to read Excel spreadsheets, but also has a number of posts on using Win32OLE to work with Excel. Win32OLE comes with Ruby as of version 1.8, though it is not included if you're on Linux.
  • Sulc, David. Using Ruby and Win32OLE to manipulate Excel. Retrieved November 10, 2011.
    The first in a series of posts using Win32OLE to work with Excel documents. Along with Mullet's blog, we can see that Win32OLE provides more funcationality than Roo, but the big drawback is that you have to be on Windows for it to work (though there are some options like Wine if you are on Linux).