Thursday, April 14, 2016

For A Decent CSV Spreadsheet App

All I want is a decent spreadsheet app that does not insist on mangling my CSV files, which often have ID numbers in them which I might want to view as text and not numbers. Apple's Numbers is maddening (you have to export to CSV, extra steps, and it has a relatively low row limit, 65,535 I believe) and Microsoft's Excel is a little better but I'll use it as an example here of What You See Is Not What You Get.

I am doing some work on cities and (county-level) FIPS codes (so, in the US, FIPS codes are Federal level identifiers useful for a lot of things, they identify counties). Some cities are large and lie in more than one county. Some of the data I have deals with cities, and the income data is on the county level, so I need to map from cities to county FIPS.

Excel did not make this easy.

The file I grabbed off the net to help me map cities to FIPS (counties) quite correctly listed all the appropriate FIPS codes for each city. I needed to narrow this down to one (Wikipedia helped a lot, the geopolitical Wikipedians are nitpickers).

FIPS codes for counties have two parts, two leading digits for the state and then three digits for the county. So all FIPS codes that start with 36, for instance, are counties in New York state.

The format from my source file looked like this:

Raleigh, NC:    37063,183
Birmingham, AL: 01073,117
New York, NY:   36005,047,061,081,085

(I am pretty sure those 5 numbers for NYC are the 5 boroughs, I know Brooklyn is its own county, Kings county.)

Excel, however, would show the following in the main view, interpreting these IDs as numbers--errors are in the parentheses, A, B, and C:
Raleigh, NC:    37,063,183 (A)
Birmingham, AL: 1,073,117 (A,B)
New York, NY:   36,005,047,061,081,000 (A,C)

Errors:
  1. Added a comma that isn't there.
  2. Dropped leading zero.
  3. Rounded rightside digits.
So there are at least three issues there, but the most difficult one is that it put a comma in after the two digits for the state, initially making me think that indeed the source file had a comma after the state component of the FIPS code. It did not. Parsing the file did not work.

That was all extremely infuriating, and reminded me of Microsoft's Clippy, where the coders thought they always knew better than you. Granted, a lot of apps and even programming language packages try to be smart and guess formats, and yes this can be useful. But if there are leading zeros and commas in odd places (or not) and it's a CSV (text) file, there could be a default "read CSV as text". Of course it seems that neither of these two programs have been coded to play nice with CSV files.

As such, they are not overly useful data science tools.