Sunday, January 25, 2015

Random R Notes - Factors, Rank v. Order, Unsplit

Some R issues I have run into recently...

I split a dataframe, then split it again, and the analysis was taking forever. Something was wrong. On inspection, the sub-split DF had all the factor levels from the original DF! Terrible, since there were about 25,000 levels in one variable. It just took forever, which I don't think it should, but hey, I gave up on it.

I needed droplevels. You can apply it to the whole DF, which removes the unused levels from every factor column, and then assign the result (to a new DF, or just write over the old one).

your_new_df = droplevels(your_old_df)

(Google's Blogger cannot handle the less-than sign in R's "gets" arrow, which R uses for assignment instead of =, inside either the code tag or the pre tag; it just blows it up. Annoying.)
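To see the problem and the fix on something small, here is a minimal sketch with a made-up data frame (the names df, grp, sub, and pieces are just for illustration, not from my real data). It shows that subsetting or splitting a data frame keeps every level of a factor column, and that droplevels removes the unused ones, either on one data frame or on each piece of a split via lapply.

```r
# Made-up data frame: grp is a factor with three levels
df = data.frame(id  = 1:6,
                grp = factor(c("a", "a", "b", "b", "c", "c")))

# Subsetting keeps only the "a" rows, but the factor still carries all levels
sub = df[df$grp == "a", ]
levels(sub$grp)                 # "a" "b" "c"

# droplevels() on the whole data frame removes the unused levels
sub = droplevels(sub)
levels(sub$grp)                 # "a"

# The same thing happens with split(): every piece keeps all the levels...
pieces = split(df, df$grp)
levels(pieces$b$grp)            # "a" "b" "c"

# ...so apply droplevels() to each piece of the list
pieces = lapply(pieces, droplevels)
levels(pieces$b$grp)            # "b"
```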

Then I could run the ordering code on dates. But no: there is, I learned, order and there is rank. There is also sort, but I managed to avoid it somehow, so I won't discuss it here.

Note that for rank you need to figure out what to do with ties! (That is, when values are equal, how exactly to rank them; rank's ties.method argument controls this.)
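A minimal sketch of what ties do, using a made-up numeric vector with one tie. The ties.method argument decides how equal values are ranked; the default, "average", gives tied values the mean of the ranks they would occupy. Only a few of the available methods are shown.

```r
# Made-up vector with a tie: the two 20s are equal
x = c(10, 20, 20, 30)

rank(x)                          # 1.0 2.5 2.5 4.0  (default: "average")
rank(x, ties.method = "min")     # 1 2 2 4  (ties share the lowest rank)
rank(x, ties.method = "first")   # 1 2 3 4  (ties ranked in order of appearance)

# order() has no ties.method; tied elements keep their original order
order(x)                         # 1 2 3 4
```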

There was a really great post about it on Stack Overflow, but I can't find it at the moment. This post might help, though.

Or, I made a nice little example! I use "R:" at the start of the input lines, since the greater-than symbol (R's prompt) and Blogger are not friends.

R: the_list = c('A', 'D', 'B', 'C')

R: order(the_list)

[1] 1 3 4 2

R: rank(the_list)

[1] 1 4 2 3


So, you see the two outputs are different.
Order says: put the first element (A) first, then the third element (B), then the fourth element (C), then the second element (D).
Rank says: the first element (A) is first, the second element (D) is fourth of them all, the third element (B) is second overall, and the fourth element (C) is third overall.
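One way to see how the two relate, as a small sketch reusing the_list (redefined here so the snippet runs on its own): order gives the indices that sort the vector, and as long as there are no ties, rank is the inverse of that permutation, so taking the order of the order gives back the rank.

```r
the_list = c('A', 'D', 'B', 'C')

# order() gives indices; using them to index the vector sorts it
the_list[order(the_list)]          # "A" "B" "C" "D"

# rank() says where each element lands; with no ties it is the
# inverse permutation of order(), so order(order(x)) equals rank(x)
order(order(the_list))             # 1 4 2 3
rank(the_list)                     # 1 4 2 3

# Put the ranks in sorted position and you get 1, 2, ..., n
rank(the_list)[order(the_list)]    # 1 2 3 4
```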

So after that, I wanted to unsplit. But no: I had added the rank column, and instead of 400 rows I only got 100 (I had 100 DFs with 4 rows each). Unsplit does not seem to work well (or at all?) once you add (or remove?) columns. So Stack Overflow told me I needed do.call and rbind ("row bind").

rejoined_df = do.call(rbind, splitted_df)


Note that splitted_df is the result of the split() call, which I'm not showing, so it is not actually a DF: split() on a data frame returns a list of DFs, one per group. But you can pass that list directly to do.call, which hands each DF to rbind as a separate argument. If you use split, you should familiarize yourself a bit with the resulting object.
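Putting the pieces together, here is a minimal sketch of the split, add-a-rank-column, and rejoin workflow. The data frame, its id and date columns, and the date_rank column are made up for illustration (three groups of four rows rather than 100 of them), and ties.method = "first" is just one reasonable choice.

```r
# Made-up data: 3 groups (ids) with 4 dated rows each
df = data.frame(
  id   = factor(rep(c("x", "y", "z"), each = 4)),
  date = as.Date("2015-01-01") + c(3, 1, 2, 0,  5, 7, 6, 4,  9, 8, 11, 10)
)

# split() returns a list of data frames, one per level of id
splitted_df = split(df, df$id)

# Add a per-group rank of the dates to every piece
splitted_df = lapply(splitted_df, function(piece) {
  piece$date_rank = rank(piece$date, ties.method = "first")
  piece
})

# Stack the pieces back into a single data frame
rejoined_df = do.call(rbind, splitted_df)
rownames(rejoined_df) = NULL   # rbind builds names like "x.1"; reset them
```

Because rbind only needs every piece to have the same columns, adding the rank column to all of them doesn't get in the way.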

There you have it, some random R notes.

Tuesday, January 20, 2015

Apple's Audio Outputs and #Fail

If you want to stream sound from your Mac to both the speakers plugged into its stereo jack and your Apple Airport Express at the same time, you can't if the sound isn't coming through iTunes but, say, from the web (Pandora, in my case). Not at the same time, regardless of what the web says. The stereo jack output apparently doesn't work that way. (Note that this does work quite well if your sound source is iTunes.)

A lot of sites say you can do this in general, and give detailed explanations of how to make a new device through the Audio MIDI Setup app. An article at Macworld hints at the problem but isn't at all clear: "Let’s say you have an Airplay device plus a USB, ethernet, or Firewire audio interface attached to your Mac..." Right. Note the article doesn't mention the stereo out port (the one that matches the little headphones for your iPhone, Android, or old-school Walkman), because it won't work.

The best I got was stereo on my Mac and one horribly noisy speaker on the Airport Express (through a receiver). The easy solution, although I haven't actually tried it, is to use one of the Mac's other sound outputs (probably the monitor over FireWire) instead of the speakers, but I like my speakers here. The other solution, better and more expensive, is to go with Sonos. A friend of mine has some Sonos devices and it is an awesome setup: very easy to use and understand, it streams from lots of sources, and the iPhone app seems great to me.

Update: You can make an "aggregate" device or a "multi-output" device, and neither works for me at all. When I tried using the Airport and the FireWire monitor speakers, which all the online help suggested should work, it didn't. The best I got was one of the two speakers on the receiver via the Airport (both speakers work fine through iTunes, or through System Preferences when the Airport is the only sound output device), and the monitor speakers had a lot of noise (a hiss, to be specific). This is unbelievable. Apple is usually really good about making simple things easy, but not here. Weird, given how great NeXT hardware and software were at sound.

Thursday, January 8, 2015

Amazon's Data Fail

So, I believe I have been an Amazon customer for well over ten years: at least eight since I moved here to NYC, and I ordered stuff from them before that.

However, Amazon continues to ask me if I want the college student discount almost every time I check out. This is absurd. And there is no way to toggle it off: I had to reach customer support in chat, and not even the agent could do it; he had to bump it up to his supervisor.

Given all the data Amazon has on me, they should know better. It's not just a question of an algorithm: someone (most likely a team) was in charge of the implementation here, not just of the algorithm but of the page and its features.

They have the data and the feature knowledge, yet they failed tremendously anyway. This is not the "Target knew a woman was pregnant before her parents did!" story.

Here is the actual screenshot I took; this is not some random illustrative image I grabbed from somewhere else on the web:

Update, Jan 21: Almost two full weeks later, I got the page again. Amazing and pathetic.

Friday, November 14, 2014

Marshmallows

A great take on the children, restraint, and marshmallows experiment over at Math Babe's blog. She does great work and everyone should learn from her.

All the Data? Yes.

Been quiet here since I took a three-month post over at Democracy Works for the Voting Information Project (you know, twelve-hour days, weekends, all that, which is still a lot easier, hours-wise, than being a professor). It's an ongoing project, though outwardly an election-year one, with Google and the Pew Charitable Trusts to help get voters information about when and where to vote, as well as ballot information (OK, technically that's the Ballot Information Project, but it's all bundled together into one end product).

So we had to get... ALL THE DATA... for every polling location in the 50 US states and DC. Some states partner with the project, but for the ones that don't, we had to visit lots of county websites and make lots of phone calls to county clerks. LOTS.

It was awesome. But it took all my time. And you can't even see it anymore! If you Googled something to the effect of "where do I vote," you'd get an infobox where you could enter your address; geocoding magic would happen, and you'd get a map to your polling location!

Awesome.


Sunday, September 7, 2014

Horrible Web Ads

I am tired of horrible web ads, I am tired of women's images being used to manipulate curiosity and click-throughs, and I am tired of pathetically transparent false geo-targeting, but it's kind of funny when it doesn't work.


And the misuse of quotation marks. 'Rattled'? What does that even mean?

Saturday, August 30, 2014

(Mis)Information Propagation

"The One Dirty Little Secret About The Web You Don't Know!" Of course, you probably do know it, and that's an intentionally horrible click-bait line. The reason you see Wikipedia's content scraped and republished in so many places is that people are too lazy to do the actual work required to do whatever it is they are trying to do (usually just make a buck). But this gets interesting, slightly, when the information is wrong.

My main example (it was going to be the only one) is the gas station, or service station, that does not exist around the corner from me here in Brooklyn, which Apple Maps alerted me to. This is why I don't use Apple Maps. I have reported it several times, and it is still there. They could use Google Street View to see very easily that there is no service station at that location (or anywhere near it), but whoever maintains that information does not care. Not at all. But first, a quick few paragraphs about the iconic "ironworkers on a beam high above NYC" photo.

Smithsonian magazine (and note that I love and respect the museum, and a good friend of mine works there) has an article about the photo. Which is great, except it's completely wrong. The man on the right of the beam isn't Patrick "Sonny" Glynn, as the article claims: "Pat Glynn is also the source for the identity of this worker, who he claims is his father, Patrick 'Sonny' Glynn." He's the grandfather of some friends of mine whom I have known for over 30 years. (And if you look closely, you can see he's missing part of a finger, which he lost in a construction accident.)

To say that "for 80 years, the 11 ironworkers in the iconic photo have remained unknown" is horrible: it's overselling, it's hype, and it's completely untrue. Just because the general public didn't know who those men were doesn't at all mean they were unknown. Just because there wasn't a source for the information doesn't mean it was unknown. What does it mean to be unknown? By whom? Who gets to count as knowing?

As far as I can tell, the Smithsonian has not corrected this article at all, which is disappointing. However, there was a fair amount of press around the same time, which you can still find online; I believe it came about because a movie about the photo was released then.

Which brings us to Apple Maps and the service station that is not in my neighborhood.

Here's the Apple Maps image, from August 29th, 2014:



Yes, the blue circle is approximately me. Note the "7th Ave Performance Center" at 121 7th Ave. There is no such business there.

But the internet, well, OK, Google, will tell you there is:
(They're all purple because I clicked them.) These are the top ten results (somehow out of almost 5 million, which makes no sense whatsoever). Nine of them are completely wrong. Only the second one gets it right (and I looked at this a few years ago, when I had some small hope for Apple Maps): there is probably a service station down at 7121 7th Avenue, and somewhere along the way the leading 7 of the street address got lost, so site after site unthinkingly copies the error. (The second link there, and I know it's a screenshot here, also has the zip code correct.)

Which brings us to my overall annoyance. All these sites are just copying information. They don't particularly care if it's correct. That's really pathetic. All right, I have once again submitted it as an error; maybe one day I'll see it corrected.