Monday, October 21, 2013

R and head() and tail()

So, if you're not careful using R's head() and tail() functions, you'll end up with a little surprise. Perhaps I should say: if you're not careful reading the documentation.

The functions head() and tail() do not return just one item from the list (or whatever); they return several, six by default. So head does not mean "first," and tail does not mean "last."

Read carefully: the help page says head() "Returns the first or last parts of a vector, matrix, table, data frame or function." PARTS. Plural. There's an 's' on there.
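To see the plural "parts" in action, here is a quick check on a plain integer vector (x is just an illustrative name). With no n given, both functions fall back to their default of six elements:

```r
x <- 1:10      # integer vector 1, 2, ..., 10

head(x)        # first six elements: 1 2 3 4 5 6
tail(x)        # last six elements: 5 6 7 8 9 10
length(head(x))  # 6, so the result is a vector, not a single value
```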

Example:

our_list = list(3, 7)       # Makes a list with two items: the first is 3, the second is 7. (Doubles, strictly; 3L and 7L would be integers.)


Note I used "=" since if I use a "less than" bracket, Google's Blogger freaks out. (The typical R code is "less than" followed by a dash, which makes an arrow representing "gets": the left side gets [is given] the right side.) It's giving me a hard time with formatting as it is.
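In an actual R console there is no Blogger to fight with, so here is a minimal sketch showing that, at the top level, the arrow and "=" build the same thing (a and b are illustrative names):

```r
a <- list(3, 7)    # the usual "gets" arrow
b = list(3, 7)     # the form used in this post
identical(a, b)    # TRUE: both hold the same two-item list
```

One caveat worth knowing: inside a function call, "=" names an argument rather than assigning, which is why head(our_list, n = 1) sets head()'s n argument and does not create a variable called n.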
If you type in "our_list", R will print our_list:

[[1]]
[1] 3

[[2]]
[1] 7

So the first item in our_list, [[1]], is a vector holding one value; the [1] marks that vector's first element, which is a 3.
The second item in our_list, [[2]], likewise holds one value, a 7.

I don't fully understand the difference between [[x]] and [x]; it seems mysterious.
Edit: see The R Inferno, 8.1.54... aha. Still mysterious, though.
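One rough way to see the difference, using the same two-item list: single brackets hand back a smaller list, while double brackets pull out the element itself. (one_bracket and two_bracket are just illustrative names.)

```r
our_list <- list(3, 7)

one_bracket <- our_list[1]     # a list of length 1 that still wraps the 3
two_bracket <- our_list[[1]]   # the element itself: the number 3

class(one_bracket)             # "list"
class(two_bracket)             # "numeric"
```

The same idea explains the printout above: [[1]] labels an element of the list, and the [1] underneath labels the first value inside the vector stored there.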

If you type:
head(our_list)


...it would be nice to get just the head, that is, the first item. But no, you get the whole list. (Since this list is small you get all of it; a longer list would return only its first six items, the default.)

What you want is:
head(our_list, n=1)

...where n says how many items you want. (You don't actually need the "n=", I have noticed; head(our_list, 1) works the same, since n is the second argument.)
When I try n=3 on this two-item list, it just gives the first two items (i.e., the entire list in this case) and does not raise an error.
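Pulling those head() observations together, here is a short sketch with the same two-item list (the variable names are illustrative). Note that even with n = 1 the result is still a list, so [[1]] is what gets you the bare value; a negative n, for what it's worth, drops items from the end instead:

```r
our_list <- list(3, 7)

first_part   <- head(our_list, n = 1)   # a list containing just the 3
last_part    <- tail(our_list, 1)       # a list containing just the 7 (no "n=" needed)
too_many     <- head(our_list, n = 3)   # asks for 3, quietly returns both items
all_but_last <- head(our_list, n = -1)  # negative n: everything except the last item

first_part[[1]]                         # 3, the value itself rather than a list
```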

Note I made our_list have [[1]] == 3 and [[2]] == 7 because far too often examples use [[1]] == 1 and [[2]] == 2, and really, people, that's just not clear. If you're trying to make a useful example, don't use the same symbols (1, 2) to represent widely different things.

Also, Googling for info on R's by() function is just impossible, as "r by" is not a specific enough search string (in context it's fine, though). That's why I sometimes like books (yes, paper): if the index is any good, there you go.