Tuesday, January 19, 2016

Meaningless Data Viz

This Google Trends data visualization is horrible. It does indeed show "top searched candidate by state", I would guess, but that doesn't at all mean what the map implies it means -- that is, positive popularity of that candidate and also a lead over the other candidates. It doesn't even come close to showing that.



The data underlying this map could be any one of these completely different scenarios, using just the first three listed candidates to show the problem:

Some Example Possibilities

Candidate    State A    State B      State C
1. Trump     1          1,000,000    1,000,000
2. Cruz      0          0            999,999
3. Rubio     0          0            999,999

The order of the candidates in the image may be from the data, or it may be from polls, or it may be something else; we don't know.

In theoretical State A, Trump does lead, but it's meaningless and no one is searching.

In theoretical State B, Trump leads in a statistically meaningful manner, and people are searching (but we don't know exactly on what terms: "Trump liar" and "Trump bankruptcy" and "Trump racist" are not endearing search terms).

In theoretical State C, Trump leads, but it's a statistical tie, and lots of people are searching.

Each of these scenarios is massively different, yet they would all result in the same visualization.

There are other numerical combinations; this is just a sample of three.
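To make the point concrete, here is a minimal sketch in Python using the made-up numbers from the example table above. The names (scenarios, top_searched, summarize) are my own for illustration, and the "margin" is just a crude share-of-total, not a real statistical test. The map only ever shows the result of top_searched; the totals and margins are what it throws away.

```python
# Hypothetical search counts per state, copied from the example table.
scenarios = {
    "State A": {"Trump": 1, "Cruz": 0, "Rubio": 0},
    "State B": {"Trump": 1000000, "Cruz": 0, "Rubio": 0},
    "State C": {"Trump": 1000000, "Cruz": 999999, "Rubio": 999999},
}


def top_searched(counts):
    """Return the candidate with the most searches (all the map shows)."""
    return max(counts, key=counts.get)


def summarize(counts):
    """Return (leader, total searches, leader's margin as a share of total)."""
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    # Margin of first place over second place, relative to all searches.
    margin = (ranked[0] - ranked[1]) / float(total) if total else 0.0
    return top_searched(counts), total, margin


for state in sorted(scenarios):
    leader, total, margin = summarize(scenarios[state])
    print("%s: leader=%s total=%d margin=%.6f" % (state, leader, total, margin))
```

All three states report the same leader, while the total volume and the margin differ enormously, which is exactly the information a "top searched candidate" map cannot show.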

This visualization also conflates geography with population; that is, it doesn't have any state-level per-capita correction. For this you need, I have learned, a cartogram (I think I've linked to that page before, it's really informative--here's one for the world with a slightly different approach). And it only considers people who have internet access and who are using Google and who are actively searching during the debate. That leaves out lots of people.

And, it leaves out anything that isn't a state (such as Puerto Rico), although I assume Washington, DC, is in there (who can tell?). It also, and this is a minor peeve, makes it look like the top of Minnesota is connected by land (it isn't).

Edit: Apparently, this map is actually from Google, their "Google News Lab" according to one video where I got this map for the Democrats and it suffers the exact same problem:

Tuesday, January 12, 2016

HICSS 2016

Just spent a great week in Hawaii at HICSS 2016. Some great people and great papers! Also a few not so great papers and some not great presentations, which are not problems I recall from previous HICSS.

And, I am now co-chairing the new Games and Gaming mini-track in the Digital and Social Media track, so there's some work to do there. Should be awesome! Actually, I'm just chairing it, as we don't have any other co-chairs yet, but we will.

Thursday, November 19, 2015

Python local variable referenced before assignment

tl;dr: A mix of tabs and spaces for your indent will cause this problem. At least in Python 2.7.

I post this since the answers I see on Google and Stack Overflow all talk about scope, and I didn't have a scope issue. That's pretty much it. This usually happens when I paste in some code from an example online (like from Stack Overflow, which I do like most of the time): it comes in with spaces, but I prefer to use tabs.
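If you want to check a file for this, here is a rough sketch of a detector. The function names (indent_kinds, has_mixed_indent) are my own, not from any library, and the check is deliberately crude: it only looks at whether both tabs and spaces show up in leading whitespace anywhere in the source, so it will also flag tab-indented code that uses spaces to align continuation lines.

```python
def indent_kinds(source):
    """Return the set of indent characters ('tab', 'space') used in source."""
    kinds = set()
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no meaningful indentation
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            kinds.add("tab")
        if " " in indent:
            kinds.add("space")
    return kinds


def has_mixed_indent(source):
    """True if the source indents with both tabs and spaces."""
    return len(indent_kinds(source)) > 1


# A tab-indented line followed by a pasted, space-indented line.
pasted = "def f():\n\tx = 1\n        return x\n"
print(has_mixed_indent(pasted))
```

As far as I know, running Python 2 with the -tt option turns inconsistent tab usage into an error instead of silent misbehavior, and Python 3 raises a TabError for it, so either is a reasonable alternative to a hand-rolled check like this one.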

Sunday, November 15, 2015

Facebook, Paris, Beirut

A lot has been written about how Facebook activated its "Safety Check" feature for the recent Paris terrorist attack, but not for one the previous day in Beirut. There is some good commentary, from less well known sites to the bigger news sites to personal blogs.

So what are we left with?

Facebook is a part of the global media fabric as much as any other site: they not only act as gatekeepers for news but also act within the news environment and its ideas about what constitutes news. Facebook is based in California, and so most of the employees who make up the organization we call Facebook are American: they grew up within a specific media environment which had clear but subtle ideas about what news is. (As an aside, I wonder how many employees at Facebook who are involved in the curation of people's feeds have any background in actual journalism.) To understand this framing of what is news and what isn't, we need to understand global history and the flow of information, a flow which often parallels economic flows. So yes, we need to understand words that some framings have determined should make us uneasy, such as imperialism and colonialism. (Think about it this way: in the Babar series of children's books, when Africans wear European clothes they are portrayed as good, but when there are Africans who wear non-European, "traditional", or "pre-contact" clothes [I am not sure what the best term is], they are portrayed as backwards -- yes, they are elephants and rhinoceroses, but they are being used as humans in the stories. This has not gone unnoticed.) This matters because the parts of the world that were used for colonialism and imperialism, empire building, and the extraction of goods through forced labor and violence are often now the parts deemed not worthy of news coverage, although it isn't quite so easy and straightforward.

But what we do have is highly problematic, besides the sadly common lack of coverage in some parts of the world -- the Western news media didn't cover the Beirut attacks very much, neither did Facebook, and both of these non-reactions are for exactly the same reasons which can be couched in economic terms but have deeper cultural and historical roots. Facebook doesn't have as many users, most likely, who are directly connected to Beirut, but it has many more with connections to France. The same is true for their employees. But they are also reacting to what they see in the media and perhaps to trends they are continually monitoring, live, in the overall Facebook environment.

Like the media, Facebook is essentially bestowing the idea of newsworthiness on some issues and also deciding that some other things are not newsworthy at all. That really is a big problem, as it's clear no one there is qualified to do so. This is also a well-known issue more broadly and is not at all new. This is not to say it's good, it's not at all good, nor is it to say Facebook shouldn't have activated the "Safety Check" feature. I appreciated it, as I have friends in Paris.

There are also some technical issues, beyond deciding which events qualify.

For example, for an earthquake, what if I check in as "safe" after the initial earthquake, and then am killed shortly thereafter by an aftershock? (The same issue holds for other kinds of events, such as terrorist attacks.)

Facebook's page about the Safety Check says it's for natural disasters (as of November 15th, 2015), and does not mention other events such as terrorist attacks, nor how any of these will be selected. Yet it was used for an event that was not a natural disaster.

More broadly, it could be argued that being black, female, or GLBT in America is to live under constant threat (there are many other examples but I am not qualified to discuss them much, nor can I make an exhaustive list; this is just an example). But, like the framework that silently suggests that Beirut is less coverage-worthy than Paris, there is a framework that silently suggests these issues should be kept quiet.

Facebook has taken action in a very contentious area, one where ideology and hegemony are heavily invested in outcomes and how we think about what is worth thinking about. Yes, as we should expect from most gigantic global companies, they did a bad job. As we know, people have been discussing these issues for a long time. These issues are still issues. Now, more people are talking. That's an important step. Steps are how we move forward.

Addendum: There is also the profile picture change to overlay the French flag on your profile photo, which again is not a bad idea. The problem is still which events are worthy of this level of attention, who is deciding, and how these decisions are made. It's the same problem as that with big data algorithms, except here it's with people's decision making.


Addendum, part 2: Here is an article from The Verge about why the people at Facebook who make these decisions did so, but I personally don't find the official explanation very satisfying about the criteria for their selection of events because the official response avoids all of the difficult issues that most people are talking about. If you're a company like Facebook, you can address these issues in a much better, direct, and clear manner. 

Friday, October 30, 2015

The Civic Data Divide

I'd like to coin the term "civic data divide", and given that Google shows zero results for it, I think I can make that claim.

More importantly, I've been working on a paper looking at factors that affect the strength of a nation's open data policy. The numbers show that, although some people have theorized about the importance of both internet access and education for open civic data, neither of these factors plays a role, at least not on the national level, indicating that, as we might expect, the early users of and agitators for open civic data are those who can use it: those who have internet access and those with the education to work with numbers. This is a minority of the citizenry, and as such the national measures for overall education and overall internet connectivity do not statistically relate to the strength of a nation's open data policy.

This should not come as a surprise, given the many other socio-economic divides in terms of access we've seen before, such as the digital divide. The civic data divide, I would argue, is an extension of what scholar Pippa Norris has discussed as the democratic divide, where there are some who use the internet to engage with governance and those who do not or cannot.

Yet this is not an overly problematic scenario. Internet access and education are indeed not evenly distributed across any one country, but many of those working with civic data and open data policies work every day with the issues faced by nations and cities, and as such they are aware of and engaged with socio-economic inequalities, and, more importantly, are trying to address the issues and make things better for all citizens. Along with the expansion of civic data programs and outreach (such as MIT's Civic Data Design Lab and NYU's Center for Urban Science and Progress), although the civic data divide currently exists, the very ideals behind open civic data are working to overcome it.

Google as of October 30, 2015, 5pm NYC time.

Edit: So, this does imply that those working on civic data are able to utilize their resources (socio-economic) for education so they have data skills, and so they can do this outside of their nation's educational system. But besides those that already have the resources for education, I'm guessing there are some who instead learn the needed skills via online courses, meetups, and other alternative educational avenues. But you still have to have the time to work on these projects.

Tuesday, October 27, 2015

Blue Background, White Text

Microsoft Word used to have a fantastic option: making the background blue (instead of white) and the text white (instead of black). Many people, myself included, liked the change of contrast. I first remember falling in love with this feature in the much-loved Word 5.1 a long, long time ago (circa 1992).

I use it on my home machine with Word 2011. But with my new laptop, Word 2011 was not an option; I had to use Word 2016, which I rather like so far (despite it initially causing massive problems for my citation management software). And the option for blue background, white text is gone. That's disappointing and problematic.

Having most of the screen be white (the background) makes the screen very bright. It's like staring into a light, albeit a dim one. You want to keep the contrast between the text and the background, but you don't need black on white to do that. A lot of interfaces do that and I think it is stupid. Even this Blogger editor is doing that (but notice what I've chosen for my blog layout). This is a blog, it isn't ink on paper, it's way beyond that.

Which is another part of the issue: the paradigm. This is a computer, it's not ink on paper, which is a whole other technology. Yes, writing papers on the computer stems from typing in black ink on white paper on a typewriter, but this isn't a typewriter. You can change the writing in your document to two columns, add images, add footnotes, move anything anywhere, add page numbers, make sections, change something to italics after you write it, have hyperlinks.... You know. Computer word processing is based on typing on a typewriter, but it is light years beyond even an IBM Selectric II with correctable ribbon. The computer can spellcheck. You can edit on the page and it will shuffle the text around. You can justify the text after you type it and change all the margins, then undo and redo all of that. You can repaginate on the fly (they actually just do this these days). I could list probably hundreds of ways in which a word processor is different from the black ink on white paper typewriter experience. You can change the typeface and font size after you have typed the words--try that on a typewriter. Yet, the product managers for Word at Microsoft have decided that this is the right way, and the only way, to do it. It's an outdated paradigm, and it sucks for my eyes.

Feature creep is one thing. Removing a useful feature that's been around for over 20 years is another.

And I loved the file icon:

AoIR 2015

Just spent a few days at AoIR 2015 in Phoenix! Note the hilarious typo ("indipendent"); we were not sure how that happened (it was discussed).