Monday, April 28, 2014


At CHI this week, aka the "ACM CHI Conference on Human Factors in Computing Systems". Very exciting. Looking forward to seeing some friends and some great projects and presentations!

Thursday, April 24, 2014

Facebook Is Still Not Everyone

Annoying article in the New York Times recently, one that held much promise: "Up Close on Baseball’s Borders." The authors use Facebook data to determine the boundaries of US baseball (MLB) team fan geography. Except this doesn't work, because Facebook is not everyone. Is Facebook statistically representative on this measure? We don't know. Is this only those who "like" a team, have their location entered, and have their accounts public? It appears so. That's a pretty specific group, even if it is numerically large.

"Millions of [Facebook users] do make their preferences public on Facebook..."
And, from our knowledge of Facebook and common sense, that means that millions don't.
"We were able to create an unprecedented look at the geography of baseball fandom..."
Well, no, not at all of baseball fandom. It is a look at:
  1. Facebook users....
  2. Who also make their profiles public....
  3. Who also "like" an MLB team.
This group does not even come close to equating with baseball fandom. Does it represent baseball fandom accurately in terms of geography in the US? That is unknown. Maybe, maybe not. But to claim that it does is simply inaccurate and shows a lack of solid understanding of samples, statistics, and data of all sizes ("big" in terms of big data does not mean better).
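To make the sampling point concrete, here is a minimal sketch of a toy simulation. Everything in it is an assumption for illustration: the two teams, the 50/50 true split, and the visibility rates are made up and are not taken from the article or from Facebook. It compares a large sample in which one team's fans are more likely to show up with a small simple random sample.

```python
import random


def simulate(n_fans=200_000, true_share_a=0.5,
             p_visible_a=0.30, p_visible_b=0.15,
             small_n=1_000, seed=0):
    """Compare a large self-selected sample with a small random one.

    All parameters are hypothetical. p_visible_a and p_visible_b stand in
    for the chance that a fan of team A or team B is on Facebook, has a
    public profile, and has "liked" the team.
    """
    rng = random.Random(seed)

    # The full population of fans in one region, split between two teams.
    population = ["A" if rng.random() < true_share_a else "B"
                  for _ in range(n_fans)]

    # Large but self-selected sample: visibility depends on the team.
    visible = [team for team in population
               if rng.random() < (p_visible_a if team == "A" else p_visible_b)]

    # Small simple random sample drawn from the same population.
    small = rng.sample(population, small_n)

    def share_a(sample):
        return sum(1 for team in sample if team == "A") / len(sample)

    return {
        "true_share_a": share_a(population),
        "visible_n": len(visible),
        "visible_share_a": share_a(visible),
        "random_n": len(small),
        "random_share_a": share_a(small),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(key, value)
```

Under these made-up rates, the self-selected sample is tens of times larger than the random one, yet it puts team A at around two-thirds of fans when the true share is about half. The small random sample lands much closer. Whether the real Facebook data is skewed in a way like this is exactly what we don't know.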

One group of die-hard baseball fans this data does not contain, I guarantee you, is children who are baseball fans, since children are not (usually) on Facebook. Young kids can be so into their teams ("their" teams, note!). Kids' affiliations will be influenced by their geography, their friends, and their parents. It is not always straightforward, though: my nephew likes Kevin Durant, but he lives in NYC and has never been to Oklahoma.

Dear New York Times people who wrote this article -- you have oversold your data! You can do better!

Monday, April 21, 2014

Research Presentation: CUHK, Communities in Online Games: Tools, Methods, Observations

Part of the reason I have been unusually busy lately (in addition to the typical research, deadlines, and upcoming summer conference season) was giving an invited talk at the City University of Hong Kong in the Department of Media and Communication. Titled "Communities in Online Games: Tools, Methods, Observations", it was a great opportunity to talk for 50 minutes instead of the usual 12, so I could knit together threads and themes that you usually don't get to address at all, such as larger overarching issues for research.

I mostly focused on the importance of theory for big data, and how big data might not be as big as you think, using online game communities and a couple of data sources as examples. It was really enjoyable and the audience was great.

The slides are available online, but they make much more sense in the context of the talk itself, which is also available thanks to the great AV people at CUHK's Department of Media and Communication.