Friday, November 14, 2014


A great take on the children, restraint, and marshmallows experiment over at Math Babe's blog. She does great work and everyone should learn from her.

All the Data? Yes.

Been quiet here since I took a three-month post over at Democracy Works for the Voting Information Project (you know, twelve-hour days, weekends, all that, which is still a lot easier hours-wise than being a professor). It's an ongoing project, though outwardly an election-year one, with Google and The Pew Charitable Trusts to help get voters information about when and where to vote, as well as ballot information (ok, technically that's the Ballot Information Project, but it's all bundled together into one end product).

So we had to get... ALL THE DATA... for every polling location in the 50 US states and DC. Some states partner with the project, but for the ones that don't, we had to make lots of visits to county websites and lots of phone calls to county clerks. LOTS.

It was awesome. But it took all my time. And you can't even see it anymore! If you Googled something to the effect of "where do I vote", you'd get an infobox where you could enter your address; geocoding magic would happen, and you could get a map to your polling location!


Sunday, September 7, 2014

Horrible Web Ads

I am tired of horrible web ads, I am tired of women's images being used to manipulate curiosity and click-throughs, and I am tired of pathetically transparent false geo-targeting, but it's kind of funny when it doesn't work.

And the misuse of quotes: 'Rattled'? What does that even mean?

Saturday, August 30, 2014

(Mis)Information Propagation

"The One Dirty Little Secret About The Web You Don't Know!" Of course you probably do know it, and that's an intentionally horrible click-bait line. The reason you see Wikipedia's content scraped and republished in so many places is that people are too lazy to do the actual work required to do whatever it is they are trying to do (usually just make a buck). But this gets interesting, slightly, when the information is wrong.

My main, and originally my sole, example was going to be the gas station / service station around the corner from me here in Brooklyn that does not exist, which Apple Maps alerted me to. This is why I don't use Apple Maps. I have reported it several times, and it is still there. They could use Google Street View to see very easily that there is no service station at that location (or anywhere near it), but whoever maintains that information does not care. Not at all. But first, a few quick paragraphs about the iconic "ironworkers on a beam high above NYC" photo.

Smithsonian Magazine (and note that I love and respect the museum, and a good friend of mine works there) has an article about the photo. Which is great, except it's completely wrong. According to the article, "Pat Glynn is also the source for the identity of this worker, who he claims is his father, Patrick 'Sonny' Glynn." But the man on the right of the beam isn't Patrick "Sonny" Glynn. He's the grandfather of some friends of mine whom I have known for over 30 years. (And if you look closely you can see he's missing part of a finger, which he lost in a construction accident.)

To say that "for 80 years, the 11 ironworkers in the iconic photo have remained unknown" is horrible, because it's overselling, hype, and completely untrue. Just because the general public didn't know who those men were doesn't at all mean they were unknown. Just because there wasn't a source for the information didn't mean it was unknown. What does it mean to be unknown? By whom? Who gets to count as knowing?

As far as I can tell, the Smithsonian has not corrected this article at all, which is disappointing. However, there was a fair amount of press one can find online from the same time, which I believe came about because a movie about the photo was released then.

Which brings us to Apple Maps and the service station that is not in my neighborhood.

Here's the Apple Maps image, from August 29th, 2014:

Yes, the blue circle is approximately me. Note the "7th Ave Performance Center" at 121 7th Ave. There is no such commercial establishment there.

But the internet, well ok Google, will tell you there is:
(They're all purple because I clicked them.) These are the top ten results (somehow out of almost 5 million results, which makes no sense whatsoever). Nine of them are completely wrong. Only the second one gets it right (and I looked at this a few years ago, back when I had some small hope for Apple Maps): there probably is a service station down at 7121 7th Avenue, and somewhere along the way the leading 7 of the street address got lost, and site after site unthinkingly copied the error. (The second link there, and I know it's a screenshot here, also has the zip code correct.)

Which brings us to my overall annoyance. All these sites are just copying information. They don't particularly care whether it's correct. That's really pathetic. All right, I have once again submitted it as an error; maybe one day I'll see them correct it.

When Google Was Better

Google used to be about finding information for people (search), now it's about finding information on people (advertising).

Facebook used to be about keeping track of your friends (social), now it's about keeping track of your habits (advertising).

tl;dr: Public sphere, corruption thereof by advertising, Habermas. (That was a very heavily coded sentence using an academic concept and its author.)

Saturday, August 2, 2014

Not Ok, Cupid

Regarding the recent OkCupid "study", there's a nice piece about both it and the horrible Facebook study over at Kottke. One thing I like is that it essentially discusses the community in which FB posters exist, which I think is an important and overlooked issue.

"It's not A/B testing. It's just being an asshole."

Thursday, July 31, 2014

Gephi and Importing an Edges List

So, I wanted to import a CSV I had made in Python into Gephi. Easy. But, no. Gephi wasn't importing it correctly. File: Open wasn't reading the file correctly, the headers were being counted as nodes, and the weights were also being counted as nodes.

To the Internets! They were horrible. They all said to use the "Import Spreadsheet" button in the Data Table tab in the Data Laboratory. I didn't have such a button in the Data Laboratory. Bastards.

Mine looked like this:

As you can see there are no buttons with "Import Spreadsheet" there at all.

The Internets (like here, here, and the wiki page) kept telling me it should look like this:

BUT IT DIDN'T. Come on. Killin' me.

Solution: File: New Project. (Or command-N.) Really. That's all.
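For reference, here's a minimal sketch of writing that kind of edges list from Python with the standard csv module. It assumes the header names Source, Target, and Weight, which as far as I know are the column names Gephi's spreadsheet importer looks for in an edges table; the function name and the example edges are made up.

```python
import csv


def write_edges_csv(edges, fileobj):
    """Write (source, target, weight) tuples as an edges list.

    Hypothetical helper: the header names are an assumption about what
    Gephi's spreadsheet importer expects for an edges table.
    """
    writer = csv.writer(fileobj)
    # One header row; via the spreadsheet importer it should be read as
    # column names rather than counted as nodes.
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, weight])
```

To use it, open the file with open("edges.csv", "w", newline="") (the csv docs recommend newline="") and pass that file object in, then import it into a fresh project.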

So I tried to fix the wiki page about it, but you have to have an account and I don't see anywhere to make one. So I tried the contact for the main community person, but the link to his page went nowhere, and the link for the American community guy went to some old generic page that has been retired. (I can't easily find those pages now and it's not worth the time.)

Not useful, people, not useful. A shame too, since Gephi is fantastic.

Friday, July 11, 2014

Python, Multiprocessing, and Queues

Here's a little bit on what I've been working on lately: using Python's multiprocessing package to take advantage of the multiple cores on my machine, and also using queues (from the same package) to gather results from the processes. I'm writing it up because some of the info I found on queues wasn't particularly helpful, and in one case was just wrong.

So, basics first:

import multiprocessing

Ok good, got that step out of the way. Some things you will want:

num_cores = multiprocessing.cpu_count()

That gives you an integer of how many cores you have on your machine. If you have a newer machine where the cores are so awesome that they virtualize themselves into two cores each, it will return the virtual number (so, on my 6-core [hardware] Mac Pro it returns 12 [virtual!], which is twice as awesome as 6).
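As a sketch of how that number can feed the chunk_ranges list in the code below, here's a hypothetical helper (make_chunk_ranges is my own name, not part of multiprocessing) that splits a number of rows into one contiguous range per core. It assumes half-open [start, stop) ranges, like Python slices, so with a pandas DataFrame a chunk would be df.iloc[start:stop].

```python
import multiprocessing


def make_chunk_ranges(n_rows, n_chunks=None):
    """Split row indexes 0..n_rows into contiguous [start, stop) ranges.

    Hypothetical helper; defaults to one chunk per (virtual) core.
    """
    if n_chunks is None:
        n_chunks = multiprocessing.cpu_count()
    # Never more chunks than rows, but always at least one chunk.
    n_chunks = max(1, min(n_chunks, n_rows))
    base, extra = divmod(n_rows, n_chunks)
    ranges = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks get one extra row each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append([start, stop])
        start = stop
    return ranges
```

For example, make_chunk_ranges(10, 3) gives [[0, 4], [4, 7], [7, 10]]: when the rows don't divide evenly, the first few chunks get one extra row.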

The multiprocessing code seems a little weird at first. Here's some code:

proc_list = list()
    # This list is so you can "join" them. Important!
my_queue = multiprocessing.Queue()
   # Queues do q.put() and q.get() and other things.

for a_range in chunk_ranges:
      # The number of chunks in my code equals the number of processes.
      # "chunk_ranges" is a list of lists, [[index_a, index_b], ...]
      # indexes in a Pandas DF I am parsing. Easy to parallelize.
      # Your for loop will differ, depending on your data.
  a_proc = multiprocessing.Process(target=your_worker_function,
    args=(a_range, the_data, my_queue))
      # To your worker function, pass the args -- here it's the data range,
      # the data itself (each process gets its own copy, not a pointer),
      # and the queue object.
  proc_list.append(a_proc)
      # So you have a list of them and can iterate through the list
      # for join to end them nicely.
  a_proc.start() # Starts one!

# Waits for them to end and ends them nicely, cleanup.
for p in proc_list:
  p.join()
  print('%s.exitcode = %s' % (p.name, p.exitcode))
    # You don't need this but it's nice to see them end
    # and their exit status number.

print('Getting elements from the queue.')
while not my_queue.empty():
  one_procs_data = my_queue.get()
  # then do something with that, maybe add it to a list, depends on what it is.

Ok, so that's commented, but how does it work? How does the worker function ("worker" is the term of art, it appears, for the function that runs in each process) deal with the queue object? Let's look at code for that. Note that this is a somewhat simple example with one queue; I've seen nice examples with an input queue and an output queue. This example deals only with an output queue, because my project chunks a Pandas DF into processable pieces, and that essentially plays the role of the input queue.

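def your_worker_function(a_range, the_data, your_queue):
  # Do your processing here, e.g. on the rows from
  # a_range[0] to a_range[1] of the_data.
  your_results = []  # placeholder for whatever your processing produces
  your_queue.put(your_results)
  # That's it, no "return" needed!
# End of your_worker_function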

Not bad! Hand the worker function the queue object like you would any function's argument. You don't need to use the "return" call, since you use ".put" on the queue object and put the data you want into it. What is nice is that Python takes care of the worker functions all putting their data into the queue so you don't have to worry about them all getting smashed up (not a technical term) and your code barfing when/if they all try to access the object at the same time. No worries! Love it.
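For comparison, here's a minimal sketch of the input-queue-plus-output-queue style mentioned above, under my own assumptions: the task (squaring numbers), the function names, and the use of None as a "no more work" sentinel are all hypothetical. Each worker loops on get() until it sees the sentinel, and the parent puts one sentinel per worker.

```python
import multiprocessing

STOP = None  # hypothetical sentinel meaning "no more work"


def square_worker(in_queue, out_queue):
    """Pull tasks from in_queue until STOP; put (task, result) pairs."""
    while True:
        task = in_queue.get()  # blocks until a task is available
        if task is STOP:
            break
        out_queue.put((task, task * task))


def run_two_queues(tasks, n_workers, ctx=None):
    """Feed tasks to n_workers processes and collect one result per task."""
    if ctx is None:
        ctx = multiprocessing.get_context()  # platform's default start method
    tasks = list(tasks)
    in_queue, out_queue = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=square_worker, args=(in_queue, out_queue))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for task in tasks:
        in_queue.put(task)
    for _ in procs:
        in_queue.put(STOP)  # one sentinel per worker so every loop ends
    # Collect exactly len(tasks) results before joining the workers.
    results = [out_queue.get() for _ in tasks]
    for p in procs:
        p.join()
    return results
```

Call it from code under an if __name__ == "__main__": guard (needed where processes are started by spawning a fresh interpreter); sorted(run_two_queues(range(10), 4)) should pair each number with its square. Results arrive in whatever order the workers finish, hence the sort.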

So how does that previous code work, the code that calls the worker function?

Declare a list object to populate with the processes (the Process objects themselves, which is what you call join on). This is important, and is so you can end them all nicely and do cleanup. Also declare your queue object.

You use a looping function (here a for loop) to give out the jobs to the right number of processes. "target" is the worker function, and then "args" are the arguments you hand to that function, just as you would normally. Then, add the process that was just made to the list and start it.

The "join" for loop -- I have no idea why it is called "join" btw -- nicely cleans up after all the processes. I'm not discussing errors and such here; that's more advanced. The for loop does indeed loop through them all, and p.join() waits for each process to finish, in list order. I was a little curious about what happens if the first process in the list fails or doesn't stop. The later ones still run and finish on their own, but a join() with no timeout blocks until its process ends, so an infinitely looping process early in the list will hold the loop right there (yes, I speak from experience, oops).

Then, you can call your now-populated queue object and ".get" the items from it (whatever data type you put in). Deal with them appropriately -- so, maybe you made lists, and you don't want a list of lists when you're done, you want one list, so ".extend" to the outer main list object. Whatever is appropriate for your job.

There you have it. If you have 12 cores you could make it go to 11 instead of 12, but it's more fun (or at least faster) to say that you could and use 12 anyways.

Edit: Apparently, queue objects have a size limit, and if you overload your queue, things can grind to a halt: your processes stop but won't complete (that is, your CPU won't be doing anything, but your processes won't join). Oddly this doesn't crash, perhaps because exceptions don't get pickled and passed around the levels of multiprocessing. That was a very detail-thin explanation. Suffice it to say, I have some code that is fine but grinds to a halt: the worker functions finish, yet the processes don't join. Right now I'm looking at the queue as the culprit.

Edit 2: I took out the my_queue.put(item) call in the worker process and replaced it with a file write (using the process name for unique filenames) and... it worked! Actually, first I just took out the my_queue.put(item), and then they all joined, so I'm not happy with the queue. If you are doing multiprocessing you might have a lot of data, but I guess the queue can't handle it. And, worse, on OS X you can't get the size of the queue, since qsize() isn't implemented there (it raises NotImplementedError), and queue.full() isn't totally reliable either.
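A likely explanation for the hang in these edits, and a sketch of a fix: the Python multiprocessing docs warn that a process that has put items on a queue won't terminate until all its buffered items have been flushed to the underlying pipe, so joining it before those items are consumed can deadlock. Small results presumably fit in the pipe's buffer, which would explain why the join-then-get order above works on small jobs. This sketch keeps the single output queue but gets exactly one result per worker before joining, and it avoids empty(), which the docs describe as unreliable. sum_chunk is a hypothetical stand-in worker, and the ctx parameter is my own addition so you can pick a start method.

```python
import multiprocessing


def sum_chunk(a_range, the_data, out_queue):
    """Hypothetical worker: sum the_data[start:stop] and put one result."""
    start, stop = a_range
    out_queue.put((start, sum(the_data[start:stop])))


def run_chunks(worker, chunk_ranges, the_data, ctx=None):
    """One process per chunk; drain the queue first, THEN join."""
    if ctx is None:
        ctx = multiprocessing.get_context()  # default start method
    out_queue = ctx.Queue()
    procs = []
    for a_range in chunk_ranges:
        p = ctx.Process(target=worker, args=(a_range, the_data, out_queue))
        procs.append(p)
        p.start()

    # Each worker puts exactly one item, so get() exactly that many.
    # get() blocks until an item arrives, so no empty() polling is needed.
    results = [out_queue.get() for _ in procs]

    # Only now join: everything put on the queue has been consumed,
    # so no worker is stuck waiting to flush its buffer into the pipe.
    for p in procs:
        p.join()
    return results
```

One caveat with this design: if a worker can die before its put(), the get() loop would wait forever; passing a timeout to get() (it raises queue.Empty when the timeout expires) is one way to guard against that.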

Friday, July 4, 2014

How To Circumvent Your IRB In 4 Easy Steps

  1. Design a study with a big company that is going to run it without you.
    • The study can manipulate emotions yet include people with mood disorders and people under the age of 18.
  2. Have the big company run it and have them collect the data.
    • You don't know how to code in their massive and distributed server environment anyways.
  3. Approach your IRB and ask to use data that has already been collected; because it has already been collected, no human subjects approval from the IRB is needed.
    • Hope your IRB is lazy and doesn't think about this too much.
  4. Publish and over-hype your results.
    • Issue lots of non-apologies when the research community calls you on your lack of ethics.

Sunday, June 29, 2014

That Facebook Study on Manipulating Emotions

My Summary
People are instinctually driven to be a part of communities (thanks to evolution). Facebook wants to be our go-to place for easier communication with our communities (notice the similarity between those two words). We know that being a community member means celebrating the good and giving support when things are bad. By taking away both positive and negative posts, Facebook took away our ability to do that, and in doing so threatened our ability to take part in our communities, which not incorrectly is seen as a threat to our livelihood. That is a big part of the reaction here, and it's not getting the attention it deserves.

Let's be clear: the study was completely unethical, and it is horrifying that everyone involved was apparently blind to this obvious fact. Yes, obvious fact, and no, the fact that so many educated people missed, and continue to miss, this important point doesn't make it either non-obvious or not a fact.

Some Links

  1. The actual, rather short, paper about the study
  2. A response at the AV Club, which is where I first learned about it.
  3. A great piece at Tumbling Conduct
  4. Another great piece at The Laboratorium, by James Grimmelmann. 
  5. A great takedown of the methods at Psychcentral
  6. Forbes wrote about it, and included the (lame) Facebook explanation from one of the authors.
  7. A good blog post about the lack of informed consent and why it matters here.
  8. A great NYTimes opinion piece by Jaron Lanier.
  9. The not-quite retraction by the journal, an "Editorial Expression of Concern".
  10. A lengthy write up at Science Based Medicine, quite good.
  11. Statsblog has a guest post that is also worth reading.
(I am editing this over the course of Sunday, Monday, and Tuesday: reflection and thought are more important than speed of posting. And now Friday to add the "Editorial Expression of Concern" from PNAS.)

Terms of Service: They Don't Care
No one gave informed consent to this, and yes, that matters. The Terms of Service is not informed consent. It is laughable to think it is. Some people are saying that because not all studies need informed consent, this one didn't; that's not true.

Now it turns out that the Facebook TOS didn't actually include the word "research" in it at the time. Let's be honest though, the only real weight of this discovery is that Facebook doesn't follow its own TOS, which isn't surprising.

And now (Tuesday, July 1) I am reading that there may have been Facebook users who were under the age of 18 in the study, in a followup at Forbes which links to a login-protected WSJ article. (I am guessing that under 18 is a different category for studies and there may be some legal issue about that, but I don't do A/B research on young people.)

Cornell's IRB: Oops
And it also looks like Cornell's IRB is trying to wash its hands of the IRB process: it looks like they just rubber stamped it because the experiment had already been run by the time it came to them. That is, the study was run without academic IRB approval. They actually have a statement about it.

Cornell's IRB statement is horrible and intentionally misleading. It says of the Cornell researchers' role that it "was limited to initial discussions, analyzing the research results and working with colleagues from Facebook to prepare the peer-reviewed paper."
What this means is that they did everything except run a bunch of extremely complicated code on the Facebook system, which would have selected user accounts for the study, manipulated the study conditions, and then scraped the relevant data out of a big data cloud computing environment. The only people qualified to do that are the Facebook techies.

There is no "limited" part here, they did everything, from start to finish, with a bit of help on the technical side. This is a very large and total failure of the IRB process.

Furthermore, Cornell faculty member Professor Hancock "was not directly engaged in human research," which is laughable. Cynically, I could say that what we see here is that neither Facebook nor Cornell considers us human. My real guess is that Cornell's IRB just rubber-stamped this and has a very poor oversight process, or a very weak understanding of Facebook.

The researchers had a theory that they could indeed manipulate people's behavior, as shown by what they post in Facebook, by manipulating what people saw in their feed. Some say this is irrelevant because Facebook manipulates our feeds all the time, and this is apparently in part why IRB approval was given. That argument is what's irrelevant. Facebook manipulates (this word is used slightly differently in research communities than in the rest of the real world, where it is very creepy, as it should be) our news feed, yes, but by "most popular", and never before has it been suggested that it is by mood. This is totally different and an important distinction.

Effect Or Not
Some people also say that it is irrelevant because there was no effect (the authors of the paper do claim a finding, even though the difference is roughly equivalent to zero). But that only means there was no real effect that could be measured in Facebook. We have no idea what the real world effects were, if any. And that's important. Don't confuse big data with the real world. Big is not complete, as someone once said about big data.

That the finding was so small but statistically significant makes it a bit paradoxical to talk about. So the researchers can claim a finding -- they wrote in the paper that "We show, via a massive (N = 689,003) experiment on Facebook, that emotional states can be transferred to others" [italics added] but then Sheryl Sandberg, Facebook's COO, said "Facebook cannot control emotions of users." So much for being on the same page.
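To see why a tiny effect can still come out statistically significant at this scale, here's a back-of-the-envelope calculation. It is an illustration only, not the paper's analysis: it assumes the 689,003 people were split evenly into two groups (N/2 each) and uses a hypothetical effect of one hundredth of a standard deviation, d = 0.01.

$$
z = \frac{d}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{d}{\sqrt{4/N}} = \frac{d\sqrt{N}}{2} = \frac{0.01 \times \sqrt{689{,}003}}{2} \approx \frac{0.01 \times 830.1}{2} \approx 4.15
$$

A z of about 4 is far past the usual 1.96 cutoff for p < 0.05, even though an effect of 0.01 standard deviations is, for practical purposes, zero. The denominator shrinks like 1/√N, so with enough people almost any nonzero difference clears the bar.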

The Cornell Press Release department heavily stresses the effects, repeatedly quoting one of the authors.
“Online messages influence our experience of emotions, which may affect a variety of offline behaviors,” Hancock said.
But of course they didn't take any offline measures at all.

Professor Jessica Vitak pointed out, in a Facebook thread, that it is most likely they didn't measure emotion at all (since we can't say that Facebook posts are that representative of emotion all the time). What they could have measured was something along the lines of social acceptability of the emotional leaning of posts (she summarized it much better than I did there and had a better phrase for it). We know they measured post language, but we don't really know what that represents beyond Facebook posts, if anything. That's not good science.

The Sample: Representative? No
The sample, its representativeness, and who the (non) results apply to are also problematic. Facebook users are not representative of the population at large. They just aren't. They have internet access and computer skills. Not everyone has those two things. We are not really sure about the sample from the study; it's Facebook users whose posts were in English, but that is all we know about them. It is scientifically unsound to then claim that the (non) results here apply to everyone else, because we don't know enough about who the unwilling participants were and how they match up with other groups of people.

Of course if you only care about Facebook advertising, then the only relevant sample is Facebook users.

The Sample: Mental Health? Users Between 13-18?
Only a few comments I've seen have explored the public health angle, and it's complex. One comment said that about 10% of people have a mental health disorder; here's the National Institute of Mental Health, which says 9.5%.

9.5% of 689,003 = 65,455 people in the study with a mood disorder (most likely -- this is statistics).
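If you treat that as if each person in the study independently had a 9.5% chance of having a mood disorder (a big simplifying assumption, since Facebook users aren't a random sample of anyone), the count is binomial, and the expected number and its spread work out to:

$$
\begin{aligned}
\mathbb{E}[X] &= np = 689{,}003 \times 0.095 \approx 65{,}455 \\
\mathrm{SD}(X) &= \sqrt{np(1-p)} = \sqrt{65{,}455.3 \times 0.905} \approx \sqrt{59{,}237} \approx 243
\end{aligned}
$$

So under that assumption the count would almost certainly land within about a thousand of 65,455 (roughly four standard deviations either way); the real uncertainty is in the 9.5% itself.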

Could seeing fewer positive or negative posts cause problems? Yes. Will it, for any one person? We don't know, there are many many factors at play here. But if you're running a study where the point is to manipulate mood and you're going to have 65,455 people with a mood disorder in it, you need to be really clear about that and really careful, and this study comes nowhere near that standard.

Others have pointed out that, besides there being no way to filter out those with mood disorders in a study meant to manipulate moods, we have no idea if the study filtered out young people.

Additionally, some people have pointed out the public health issues around this kind of experimentation and manipulation:

A/B Testing Is Done All The Time! So What?
Some have also said that it's ok because companies do A/B tests all the time (that is, tests with two conditions). Well, does that make every A/B test ok? No, it does not. Also, Facebook is not like other companies -- other companies are not the home of our digital communities. Facebook likes to say how big and important it is because of this, but if these communities are so important to people then it is not okay to manipulate the emotional content in them at all. Yes, communities can be informational, but a lot of the time Facebook friends are also real world friends and family and the emotional content is really, really important.

Communication Is Community
In-group, out-group is important. This is Facebook, which for most of us is an out-group, manipulating the messaging in our in-groups. Facebook degraded our communication, and communication is community (they have the same root in English), and when out-groups do that I think it is rightly seen as a threat.

I want to stress the community angle. Communication forms community. This experiment reduced important, emotional communication in communities for hundreds of thousands of people. Taking part in emotional communication is a vital ritual for community members that both reinforces that community and affirms that person's membership in that community. This includes both our taking part in emotional support (replying to something negative) and our taking part in celebratory communication. To reduce our capability to take part in important community ritual is a direct threat to our social survival, and it is anathema for a company that wants to be, and currently is, the largest online community platform in the world. (Two of my favorite thinkers about community and ritual are Clifford Geertz, whose chapter about a funeral in Java is worth reading on this topic, and James Carey, who has written about community, communication, and ritual.)

Some people have said that this is ok because the researchers didn't add any negative posts, they merely took away positive ones (in one of the test conditions), and you could still see those on your friend's page. No, it's not ok. (Do you really go to each and every one of your friends' pages every time you go to Facebook? Do you know anyone who does? I didn't think so.) Taking away a negative post is horrible, because it takes away my ability to support a friend in need; that is, it undermines my ability to act appropriately in my community, and that is hugely problematic. The same is true for my missing out on a positive post: I am denied the opportunity to take part in a positive celebration in one of my communities.

What Was The Purpose?
Some have said that the researchers weren't trying to manipulate people's emotions, just their behavior on Facebook. Well no, that's ridiculous for at least two reasons. One is the title, which contains the phrase "emotional contagion", so we know what they were thinking. The other one is of course researchers always want to have something larger to say about human behavior. You can't manipulate what people are doing in terms of their emotions without perhaps affecting their emotions. If you don't know, you are obligated to find out. But again, we have no idea how, if at all, this affected people who were in the study in their real world lives.

As some have pointed out, this was research done on a not very interesting question (this seems pretty obvious to me), on people who did not give consent, with an ineffective IRB, under academic auspices but lacking academic standards, with no consideration of real world effects, with faulty methods, and which could have been done somewhat differently looking for correlations in what people saw and what they posted using data mining and no manipulations at all.

I am actually debating quitting Facebook because of this. Google+, anyone?

John Gruber, long time computer industry expert, has a post about it with one line I'll cite: "Yes, this is creepy as hell, and indicates a complete and utter lack of respect for their users’ privacy or the integrity of their feed content. Guess what: that’s Facebook." [Italics in original.]

But it was also Cornell and two Cornell-affiliated researchers.

Friday, June 27, 2014

Picturephone Redux

I wrote a little about the Picturephone a few years ago, so it is cool to see it mentioned in the NYTimes today with a photo I hadn't seen before (this one I'm including).

NYTimes caption: "In New York on Dec. 21, 1965, Keum Ja Kim, 15, a soloist with the World Vision Orphan Choir, used the Picturephone to audition for Robert Merrill, a star with the Metropolitan Opera, who was in Washington to sing at the White House. Credit Bettmann/Corbis"

Wednesday, June 25, 2014

The Real Problem with the iPhone Fingerprint Sensor

It gets shmutzy and won't read your fingerprint until you clean it. This might involve pressing it hard enough to activate Siri. This is not a bad problem to have, "boo hoo I have a smart phone my life sucks."

When it came out, there were far too many breathless articles about how someone could make a rubber copy of your fingerprint and then hack your phone. This was stupid, and everyone knew it was stupid, but people wrote about it anyways. (And apparently a lot of people don't even have passwords at all on their phones, so....) In order to do this, you'd need to steal their phone and make a rubber copy of their fingerprint. How many times has this happened? Zero. Why? Because the internets would blow up if it did, and that hasn't happened.

(Photo credit Apple Inc.)

Wednesday, June 18, 2014


Went to a cool, informative meetup at NYU's Center for Urban Science and Progress (CUSP) where Mike Flowers did a Q&A about the NYC government data field. Flowers has a very interesting and diverse background which, importantly, includes a lot of hands-on getting things done (and by "things" I do not mean writing academic papers like I mostly do). Currently he is CUSP’s first Urban Science Fellow, although I am not sure exactly what that means. (Perhaps it's, "person who can teach smart things but doesn't have a PhD since he's a practitioner.")

One of his comments that I found very accurate and liked very much was his focus on what the data means -- that is, the numbers do not speak for themselves, you have to know what they actually represent. You go out into the field with the people that generate those numbers and have a hands-on understanding of them. What are they quantifying? What do the measures mean? How are they measured?

One great quote, which I will paraphrase since I don't have it exactly, was that, in order to understand NYC data, you have to have an understanding of the history of the city.

Flowers is a person who understands how to understand data! History! Awesome.

An Obvious Realization

So, Habermas' concept of the public sphere: people who aren't in the government talk about government stuff, and the ideas can be judged on merit. I had been thinking recently about how Facebook apparently started out as a way to meet... ok, let's be honest, stalk... women, and turned into a massive social advertising platform ("platform", I know, I know). Google too started as one thing (search for users) and became a massive social advertising platform (search on users). Just as I used to read about with television (at least the advertising-supported model we rely on here in the US), you are the product, not the shows; those are just there to get you to watch so they can advertise at you, that is, sell you to advertisers. And that's what happened to the public sphere: it became corrupted by advertising. How's your belly fat, by the way? Did you know that moms in (your town) discovered one shocking secret that doctors don't want you to know? Or wait, maybe the president changed laws in (your town) and you can get car insurance for $5. Lies, all lies. Not all advertising is terrible, but too much is.

Wednesday, June 11, 2014

Test Your Surveys

If you don't actually test your surveys, you roll out something for a major US telecommunications carrier that doesn't pick up the right value and then prints blanks and is clearly wrong, like this:
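One cheap test that would catch this kind of blank: fill in the survey text with a function that fails loudly instead of printing an empty slot. Here's a minimal sketch using Python's string.Template; the question text and the placeholder names ($plan, $carrier) are made up for illustration.

```python
from string import Template

# Hypothetical question text; $plan and $carrier are made-up placeholders.
QUESTION = Template("How satisfied are you with your $plan plan from $carrier?")


def render_question(values):
    """Fill in QUESTION, refusing missing or blank values."""
    for name, value in values.items():
        if not str(value).strip():
            raise ValueError("blank value for %r" % name)
    # substitute() raises KeyError for any placeholder with no value
    # (safe_substitute() would quietly leave "$plan" in the text instead).
    return QUESTION.substitute(values)
```

Running every question through something like this once with real data before launch would turn the blank into an error you see, instead of one your customers see.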

Monday, May 19, 2014

An Interesting Sentence

An interesting sentence, unlike any I've ever seen (well, that I recall), from the New Yorker, where the copy editors are intense:

He liked to roar, though also: he liked quiet.
I would probably have written "He liked to roar, though he also liked quiet." but the colon puts a really hard stop in there. From The New Yorker, May 12, 2014, p. 72, by Jill Lepore.

I had more in this post, but I had copied and pasted it and it got a bit messed up, and then it was deleted accidentally. The point here is that The New Yorker is very pedantic and precise about grammar.

Thursday, May 15, 2014

Excel and Date Formats

I hereby hate the people who programmed Excel and how it deals with date formats.

This error warning makes absolutely no sense. Sure, there are a ton of date formats. But the source file, which I made myself, is just a CSV file. The destination file is a brand new empty Excel file. Four years? You are kidding me. Approximately? What? This is not acceptable.

Edit: Aha! So this is what is going on. Not at all acceptable, since it means that Excel is interpreting the text as dates when I just want it to passively see everything as text. Microsoft has made Excel overdo it here, and it is not helping.

Friday, May 2, 2014

Ubisoft Toronto Visit

I was lucky enough to be part of CHI's game group (see also CHI Play), which headed over to Ubisoft's Toronto offices and their User Research Lab!

Yup. Ok we missed the "FT" part but hey.

Testing Station: Xbox One, PS4, PC under the table.

Testing stations.

Testing room with one-way mirror!

Through the looking glass.
You can see me in the mirror there.

This room is made to feel like a living room.


The Psychology of Seating

I'm sure research has been done in this area, but I never see it reflected in conference seating layouts... probably because there is "easy" and there is "this is what best practice and research articles show."

Indeed, Google Scholar has articles on seating, though most of what I see at a cursory glance is about children and education. But anyway.

As you can see in the following photo from CHI 2014 in Toronto (just ended this week!), there are people who haven't taken a seat even though seats are open. Why? That seems mostly obvious (spatial norms, seats that are too small, and no outside aisles are just a few possibilities). But what bothered me is this: given that we know a thing or two about this kind of seating and human behavior, why can't we do better?

Monday, April 28, 2014


At CHI this week, aka the "ACM CHI Conference on Human Factors in Computing Systems". Very exciting. Looking forward to seeing some friends and some great projects and presentations!

Thursday, April 24, 2014

Facebook Is Still Not Everyone

Annoying article in the New York Times recently, one that held much promise: "Up Close on Baseball’s Borders." The authors use Facebook data to determine the boundaries of US baseball (MLB) team fan geography. Except this doesn't work, because Facebook is not everyone. Is Facebook statistically representative on this measure? We don't know. Is this only those who "like" a team, have their location entered, and have their accounts public? It appears so. That's a pretty specific group, even if it is numerically large.

Millions of [Facebook users] do make their preferences public on Facebook...
And, from our knowledge of Facebook and common sense, that means that millions don't.
We were able to create an unprecedented look at the geography of baseball fandom...
Well, no, not at all of baseball fandom, it is a look at:
  1. Facebook users....
  2. Who also make their profiles public....
  3. Who also "like" an MLB team.
This group does not even come close to equating with baseball fandom. Does it represent baseball fandom accurately in terms of geography in the US? That is unknown. Maybe, maybe not. But to claim that it does is simply inaccurate and shows a lack of solid understanding of samples, statistics, and data of all sizes ("big" in terms of big data does not mean better).

One group of die-hard baseball fans this data does not contain, I guarantee you, is children who are baseball fans, since children are not (usually) on Facebook. Young kids can be so into their teams ("their" teams, note!). Kids' affiliations will be influenced by their geography, their friends, and their parents, but it is not always straightforward: my nephew likes Kevin Durant, yet he lives in NYC and has never been to Oklahoma.
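The three conditions in the list above act as a chain of filters, and a toy sketch makes the selection problem concrete. The `Fan` record, its fields, and the three fans are entirely made up for illustration; this is not Facebook's data model or anything from the article. Every fan below is a real baseball fan, but only one survives the filters, and who gets dropped (the kid, the private person) is not random.

```python
from dataclasses import dataclass

@dataclass
class Fan:
    # Hypothetical fields for illustration only.
    name: str
    on_facebook: bool
    public_profile: bool
    likes_mlb_team: bool

def visible_fans(fans: list) -> list:
    """Keep only fans who pass all three filters: on Facebook, public, and 'likes' a team."""
    return [
        f for f in fans
        if f.on_facebook and f.public_profile and f.likes_mlb_team
    ]

# Made-up fans: all three follow baseball closely.
fans = [
    Fan("kid fan who isn't on Facebook", False, False, False),
    Fan("adult fan with a private profile", True, False, True),
    Fan("adult fan who publicly likes a team", True, True, True),
]

print([f.name for f in visible_fans(fans)])
```

Adding more records of the same kind makes the visible group numerically larger, but it doesn't change which kinds of fans the filters systematically leave out.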

Dear New York Times people who wrote this article -- you have oversold your data! You can do better!

Monday, April 21, 2014

Research Presentation: CUHK, Communities in Online Games: Tools, Methods, Observations

Part of being unusually busy lately (in addition to the typical research, deadlines, and upcoming summer conference season) included giving an invited talk at the City University of Hong Kong in the Department of Media and Communication. Titled "Communities in Online Games: Tools, Methods, Observations", it was a great opportunity to talk for 50 minutes instead of the usual 12, so I could knit together threads and themes that you usually don't get to address at all, such as larger overarching issues for research.

I mostly focused on the importance of theory for big data, and how big data might not be as big as you think, using online game communities and a couple of data sources as examples. It was really enjoyable and the audience was great.

The slides are available online, but they make much more sense in the context of the talk itself which is also available thanks to the great AV people at CUHK's Department of Media and Communication.

Tuesday, February 25, 2014

How The Mightily Hyped Have Fallen

Second Life inspired a wave of hype and fear not seen since... well ok, we see that kind of thing all the time, but it was like Internet Hype 2.0, which of course was another version of Digital Utopia (version number too large to count, really). So, every company had to get into Second Life, even when much of it turned out to be weird and porny. They got in like there was no tomorrow, with massive announcements, and then quietly left, trying to hide their embarrassment. Even Wired left their offices.

I just ran into this ad on a wiki. Wow. They've really hit bottom. Even I am a little embarrassed for them. (Google will never do text flow correctly, drives me nuts.) "YES WE ARE SLUTASTIC." Nicely done, Lindens (if it is even still the Lindens), embrace your scumbaggery.

Thursday, January 30, 2014

Silver's "The Signal and the Noise"

There are some errors in Silver's (excellent? interesting? occasionally vexing?) book from a little over a year ago, The Signal and the Noise, about prediction and Bayesian theory. The two I noticed are strange, because they are very basic. (Other people have noticed these errors.)

On page 269, the book says that 20 x 20 is 4,000, when it is actually 400. This is pretty basic. Would Silver make this kind of error? Was he rushed? Did he and the editors miss it? Did someone else ghostwrite parts of the book? If this error is here, what errors didn't I notice?
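For the record, the printed figure is exactly ten times the correct product, which is consistent with a stray zero rather than a conceptual mistake:

```latex
20 \times 20 = 400, \qquad 4{,}000 = 10 \times 400
```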

There is also an error or two with the arrow-length illusion (Müller-Lyer) image on page 367. For one, the description is not clear, and Silver seems to describe the illusion the wrong way around (that is, he says the line that appears longer appears shorter). Moreover, I measured the lines of the arrows in my printing, and the one that is supposed to appear longer via the illusion actually is longer, by a millimeter. It looks like a printing error: the lines may have been the same length, but when the arrowheads were added, they were added compactly to the "will look shorter" line but not to the "will look longer" line. The thickness of the added arrowheads is just enough to throw the whole illustration off. So not only does Silver's description get it wrong, but the illusion isn't an illusion: one line is actually longer than the other.

This really undermines the entire book, since I'm reading it to learn about things I don't know, and if Silver can't get right the things I do know, then I have no idea whether he is getting right all the material I don't know.

Unrelated to these errors, I did like this review: