Monday, November 19, 2012

R and Date Formats

There are many ways to represent dates in computer languages. You can have something human-readable, with a year, month, and day, in some order and some combination of text and numbers.

  • 2012-11-19, year month day.
  • 11/19/12, US month day year.
  • 11/12/11, which is Nov 12 in US order or Dec 11 in European order.
  • Nov-19-2012, a combination of text and numbers, to a human.
  • "2012-11-19", which is probably text to a computer given the quotation marks, but numbers to a human.
  • 1353340800, a big number that means nothing to most people, but is today at 4pm UTC, in seconds since Jan 1, 1970. This is Unix (epoch) time, a standard time format, I've found.
We will gloss over leap years, leap seconds, and different historical calendar systems, but it's all a fascinating and nuanced topic.
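To make the last item in that list concrete, here is a minimal sketch of turning the big number into something readable in R. The variable names are mine, and I'm assuming the seconds are counted in UTC, which is the usual convention.

```r
# Seconds since 1970-01-01 00:00:00 UTC (the value from the list above)
secs <- 1353340800

# origin and tz are spelled out so the result does not depend on the
# machine's local time zone
when <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(when, "%Y-%m-%d %H:%M:%S")  # "2012-11-19 16:00:00"
as.numeric(when)                   # back to the original seconds
```

Leaving out tz would print the same instant in whatever time zone the computer is set to, so the hour could look different.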

But I ran into a huge (yet small) problem with time formats in R while converting seconds since 1-1-1970 into calendar dates. I had 45,000 data points, and because every timestamp was distinct, they ended up as 45,000 different points and didn't aggregate at all. Hard to visualize. So I wanted to aggregate them by month: instead of 45,000 points I would have, say, 120 (10 years, 12 months each). I'm taking averages, so this approach makes sense. Essentially I was just trimming everything but the year and month (dropping days, hours, minutes, seconds). Simple! It turned out to be harder than I thought.

Edit: I'm including a screen shot of the code, so Google doesn't destroy it (due to the characters in it and how the R code is interpreted by the Google HTML parser).
"the_df" is the data frame.
"cm" is creation month.
"bday" is the seconds since 1-1-1970.


The "as POSIX" call converts the seconds from a plain number into a date-time value.
The "strftime" call formats the time as a text string and keeps only the year and month, dropping the day and everything after it (the name is short for "string format time").
The paste is the key part; it won't work without it. You have to add the day back on (the 1st of the month for this analysis). The result is still a text string, not a date.
Then the "as.Date" call converts that text string to a date.
(Or something generally like that.)
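Here is a rough text sketch of those four steps. It is not a copy of the original code: the sample rows and the "value" column are made up, I'm guessing as.POSIXct for the "as POSIX" call, and I'm assuming UTC throughout. Only the names the_df, cm, and bday come from the post.

```r
# Made-up sample data; only the column names the_df, bday, cm follow the post
the_df <- data.frame(
  bday  = c(1353340800, 1353427200, 1355000000),  # seconds since 1970-01-01
  value = c(10, 20, 30)                           # hypothetical measurement
)

# 1. seconds as a number -> seconds as a time
when <- as.POSIXct(the_df$bday, origin = "1970-01-01", tz = "UTC")

# 2. keep only year and month, as text (tz is passed explicitly because
#    strftime defaults to the local time zone)
ym <- strftime(when, format = "%Y-%m", tz = "UTC")

# 3. add the day back on (the 1st of the month); still text
first_of_month <- paste(ym, "01", sep = "-")

# 4. text -> Date
the_df$cm <- as.Date(first_of_month)

# Average per month: one row per distinct value of cm
monthly <- aggregate(value ~ cm, data = the_df, FUN = mean)
monthly
```

With these three sample rows, the two November timestamps collapse into a single 2012-11-01 row and the December one stays on its own, which is the whole point of the exercise.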

The point is, if you use "as.Date" in R, you have to hand it a day. I tried just handing it year-month, but that ends up as NA, even though I can't find anything that says that kind of thing shouldn't work.
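A small sketch of the difference, using a made-up year-month string. As far as I can tell, the help page for strptime does mention that when a month is given the day has to be given too, which would explain the NA.

```r
ym <- "2012-11"

# A format with no day in it parses to NA
no_day <- as.Date(ym, format = "%Y-%m")
is.na(no_day)                    # TRUE

# Supplying a day, here the 1st, gives a real Date
with_day <- as.Date(paste(ym, "01", sep = "-"), format = "%Y-%m-%d")
format(with_day)                 # "2012-11-01"
```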



Friday, November 9, 2012

Election Numbers and Massachusetts

(I could reframe this in the following manner: The most telling vote of the election... was it how women voted? How minorities voted? Latinos and Hispanics? Regions? White people? Evangelicals? No. The most telling vote was instead from one small state in the corner of America: Massachusetts.)

Perhaps the most telling numbers from the election come from Massachusetts--the only state to have direct experience with both President Obama and Mitt Romney as chief executives. Massachusetts voters had four years of Mitt Romney as governor, and also had four years of President Obama as US President. Given this unparalleled experience, how did they vote?

The results are clear.

Massachusetts

            Votes       Share
Obama       1,900,575   60.8%
Romney      1,177,370   37.6%

That's a one-sided endorsement. (Percentages do not add to 100% because I didn't include the "Other" votes which are part of the total.)