Monday, November 19, 2012

R and Date Formats

There are many ways to represent dates in computer languages. You can have something human-readable, with a year, month, and day, in some order and some combination of text and numbers.

  • 2012-11-19, year month day.
  • 11/19/12, US month day year.
  • 11/12/11, which is Nov 12 in the US or 11 Dec in the EU.
  • Nov-19-2012, which a human reads as a mix of text and numbers.
  • "2012-11-19", which is probably text to a computer given the quotation marks, but numbers to a human.
  • 1353340800, a big number that means nothing to most people, but is today at 4pm (UTC) in seconds since Jan 1, 1970. This is a standard time format (usually called Unix or epoch time), I've found.
We will gloss over leap years, leap seconds, and different historical calendar systems, but it's all a fascinating and nuanced topic.
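
To see what that last number means, R can turn the seconds count back into a readable timestamp. This is a minimal sketch, assuming the value is seconds since 1970-01-01 in UTC; the names `secs` and `stamp` are just for illustration.

```r
# Seconds since 1970-01-01 00:00:00 UTC (the value from the list above)
secs <- 1353340800

# Interpret the number as a point in time; tz = "UTC" keeps the result
# independent of the machine's local time zone
stamp <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(stamp, "%Y-%m-%d %H:%M:%S")   # "2012-11-19 16:00:00"
as.numeric(stamp)                    # back to 1353340800
```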

But I ran into a huge (yet small) problem with time formats in R while converting seconds since 1-1-1970 into calendar dates for 45,000 data points. When I tried to analyze them, every timestamp was its own distinct value, so nothing aggregated at all and the result was hard to visualize. So I wanted to aggregate them by month: instead of 45,000 points I would have, say, 120 (10 years, 12 months each). I'm taking averages, so this approach makes sense. Essentially I was just trimming off everything but the year and month (that is, the days, hours, minutes, and seconds). Simple! It turned out to be harder than I thought.

Edit: I'm including a screen shot of the code, so Google doesn't destroy it (due to the characters in it and how the R code is interpreted by the Google HTML parser).
"the_df" is the data frame.
"cm" is creation month.
"bday" is the seconds since 1-1-1970.


The "as POSIX" call (one of R's as.POSIX* functions, such as as.POSIXct) converts the plain number of seconds into a date-time value.
The "strftime" call strips off the day and the time (everything after the month) and makes it a text string (the name stands for string-format-time).
The paste is the key part. It won't work without it: you have to add the day back on (the 1st of the month for this analysis). The result of paste is still a text string, not a date.
Then the "as.Date" call converts that text string to a date.
(Or something generally like that.)
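
Since the code itself is only in the screenshot, here is a text sketch of the four steps as described. It is a reconstruction, not the original code: "the_df", "cm", and "bday" come from the post, while the toy data, the "value" column, the intermediate names, and the UTC time zone are assumptions made for illustration.

```r
# Toy stand-in for the real data: "bday" is seconds since 1970-01-01,
# "value" is a hypothetical column to average
the_df <- data.frame(
  bday  = c(1353340800, 1353427200, 1355932800),
  value = c(10, 20, 30)
)

# 1. number of seconds -> date-time (UTC assumed)
when <- as.POSIXct(the_df$bday, origin = "1970-01-01", tz = "UTC")

# 2. date-time -> "YYYY-MM" text, dropping the day and the time
#    (tz is passed explicitly; strftime otherwise uses the local time zone)
ym <- strftime(when, format = "%Y-%m", tz = "UTC")

# 3. add the day back on (the 1st of the month); this is still text
ym_day <- paste(ym, "-01", sep = "")

# 4. text -> Date
the_df$cm <- as.Date(ym_day)

# Average per creation month: one row per month instead of one per timestamp
monthly <- aggregate(value ~ cm, data = the_df, FUN = mean)
```

With the toy data, the two November timestamps collapse into a single 2012-11-01 row and the December one becomes 2012-12-01, which is the kind of aggregation the 45,000 points needed.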

The point is, if you use "as.Date" in R, you have to hand it a day. I tried just handing it year-month, but that ends up as NA, even though I can't find anything that says that kind of thing shouldn't work.
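
A small check of that behaviour, as a sketch. With an explicit year-month format, as.Date returns NA because the parser has no day to work with; pasting a day on fixes it. As far as I can tell, the ?strptime help page does mention that a day has to be supplied whenever a month is, but it is easy to miss. The variable names here are illustrative only.

```r
ym <- "2012-11"

# Year and month only: parsing fails and yields NA
no_day <- as.Date(ym, format = "%Y-%m")
is.na(no_day)                       # TRUE

# Add the 1st of the month, then parse
with_day <- as.Date(paste(ym, "-01", sep = ""))
format(with_day)                    # "2012-11-01"
```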