Amazon still thinks I'm a student, even though for years I've told them I'm not. How is this a good use of customer data? How is this responsive to customers? It's insane and idiotic (and annoying: when I'm trying to give them my money, they make it harder to do so -- but yes, ok, ok, I am still an Amazon customer, so what do they care?).
Here's a screen grab from this month, September 2016:
But I've told their online help people that I'm not a student, back in April as you can see here and previously in January of 2015 as you can see here. So much for customer feedback.
Additionally, Twitter's recommender needs some help:
Data & Society is an incorporated entity, like Valvoline, but the two organizations are nothing alike, and neither are their Twitter feeds. The Valvoline recommendation isn't marked as a "sponsored" post (i.e., paid advertising), and even if it were, the mismatch would still be hilarious.
And currently I live in New York and I don't own a car.
The data is there, people just aren't using it well at all.
Tuesday, September 20, 2016
Still Data Fail
Monday, September 12, 2016
Star Trek and the Future
Although this has been acknowledged before (Star Trek inventing the future and all that), I was struck during the recent 50th anniversary of Star Trek, when many or all of the original episodes (remastered so they look nicer on today's televisions) were shown, by the earpieces they use: they are like clunky Bluetooth earpieces. Here's a quick (and thus blurry) photo I grabbed off my TV, with Spock in the foreground wearing his easily observable earpiece and Uhura in the background adjusting hers. Granted, for TV such technologies would need to be easily observable by the audience, especially in the mid-1960s; today not so much, since we actually have these things.
Thursday, July 28, 2016
Online Game Communities Presentation
About a month ago I did an online presentation for a summer class taught by Dr. Jaime Banks, who was in Germany at the time for the summer session she and Dr. Nick Bowman are involved with: SPICE, the Summer Program in Communications Erfurt. It was really great, and the students had some good questions. I put the slides (slightly edited) up on SlideShare; you can find them here. The talk looked at some work in gaming, play, and communities, using different data. The slides alone are not as good as the slides plus the audio, but there they are.
Sunday, July 24, 2016
Tkinter, ttk, and Progressbar
tl;dr: ttk.Progressbar maxes out at 99 by default, not 100, despite the documentation. If you try to overfill it, it won't accept the call that does so.
I was building a front end for a scraper app. At first I tried Xcode and Interface Builder (which I first saw over two decades ago on a NeXT machine; it was glorious then and it still is), but I couldn't get it to mesh with my Python code (so much of the online help is out of date). A friend told me I was being an idiot and should try something simpler, and I settled on Tkinter, which had me up and running in very little time. (The front end took only two days, but I wasn't committing every waking hour to it, and I had to figure out how to take my linear Python script and reconceive it in the looping GUI manner, which was difficult.)
I wanted a text box field the scraper could print to, the way it prints to the terminal with Python's print statement (I don't want the user to have to deal with the terminal or the console). I ended up using ScrolledText, which you have to import separately (as far as I can tell; it's working, and once it works, I don't have time to poke at it too much). A few ScrolledText notes:
- I needed setgrid=True to make the frames resize nicely. This was VITAL; packing frames in Tkinter is an art I do not yet understand.
- You might want state='normal' to print to the field, then state='disabled' so the user doesn't type in it (but this loses copy capability).
- You'll want insert(END, new_string) to print at the bottom of the field, but then you also need see(END) so that it scrolls to the bottom -- otherwise it prints at the bottom but the view stays put at the top. Details.
Then I wanted two progress bars: one to show the user the scrape progress and a second to show the parsing progress. The scraping one I needed to fudge a little, so I tried....
my_window.scrape_progress.step(10) # first init step
my_window.scrape_progress.step(20) # bigger step
my_window.scrape_progress.step(20) # another step
my_window.scrape_progress.step(50) # final step
As you can see, that's 10 + 20 + 20 + 50 = 100.
The bar would fill 10% (10), then to about 30% (10+20), then to about 50% (10+20+20), then it wouldn't fill anymore.
Eventually, out of annoyance while trying alternatives, I used 49 instead of 50 for the last step, and it worked.
So no, the max is not 100, it's 99, so the bar values are probably 0-99 for 100 increments, as 0-100 would be 101 increments. I suspect that step(100) won't work, but step(99) should fill it to 100%.
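If that reading is right, the behavior can be modeled as the value wrapping around modulo the maximum. Here's a toy sketch of my guess at what's happening (ToyProgressbar is made up for illustration; this is not ttk's actual code):

```python
# A toy model of the wrap-around I think I'm seeing: the bar's value
# seems to be kept modulo its maximum, so landing exactly on the maximum
# resets it to zero. This class is made up; it is not ttk's real code.
class ToyProgressbar(object):
    def __init__(self, maximum=100):
        self.maximum = maximum
        self.value = 0

    def step(self, amount):
        self.value = (self.value + amount) % self.maximum

bar = ToyProgressbar()
for amount in (10, 20, 20, 50):
    bar.step(amount)
# 10 + 20 + 20 + 50 = 100, which wraps to 0: the bar looks stuck/empty.

bar_fixed = ToyProgressbar()
for amount in (10, 20, 20, 49):
    bar_fixed.step(amount)
# 10 + 20 + 20 + 49 = 99 stays below the maximum: effectively full.
```

Under this model, any sequence of steps totaling exactly the maximum will appear to reset the bar, which matches what I saw.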
Some code:
from Tkinter import *
from ttk import * # ttk widgets should overwrite the Tkinter ones in the namespace.
import ScrolledText as tkst # Not sure why this is its own library.

# From my window class def; nothing to do with the Progressbar:
def print_to_text_field(self, the_string):
    new_string = '\n' + the_string
    self.the_text_field.configure(state='normal')
    self.the_text_field.insert(END, new_string)
    self.the_text_field.see(END)
    self.the_text_field.configure(state='disabled')
    tk_root.update()
    tk_root.update_idletasks()
Monday, July 4, 2016
Making a Spectrum/Gradient Color Palette for R / iGraph
How to make a color gradient palette in R for iGraph (that was written tersely for search engine results). Despite some online help, I still had a really hard time figuring it out. As usual, now that it works, it doesn't seem too hard, but anyways.
(I had forgotten how horrible Blogger is at R code with the "gets" syntax -- the arrow, the less-than with a dash. Google parses it as code, not text, and it just barfs all over the page, so I think I have to use the equals sign [old-school R] instead. It is also completely failing at typeface changes from Courier back to the default. I see why people use WordPress....)
- Set the resolution for the gradient, that is, how many color steps there are/you want.
- Set up the palette object with a start color and an end color. (Don't call it "palette" like I did at first, that is apparently some other object and it will blow up your code but the error message won't help with figuring it out.)
- You'll want a vector of values that will match to colors in the gradient for your observations, for what I'm doing I got the maximum on the variable in one step...
- And then set up the vector in the second step (so, this is a vector of the same length as the number of observations you have, since each value represents the value that matches up against a color in the gradient). (In my code here, it's a ratio, but the point is you have numerical values for your observations [your nodes] that will be matched to colors in the gradient.)
- Create a vector that is your gradient that has the correct color value for each observation. (The examples of this I could find online were very confusing, and that's why I'm making this post.)
- Draw! (Or you could assign colors to your graph object and then draw.)
Also note that, I think, the my_palette object is actually a function; it definitely isn't a "palette" in the sense of a selection (or vector) of colors or color values. I think that is part of what makes line 4, below, unusual. Maybe I should have called it my_palette_f to be clearer, but if you've made it this far, I have faith in you. (Also note that colorRampPalette is part of R, not part of iGraph.)
- Set resolution, I'm using 100: my_resolution = 100
- Set palette end points, this starts with low values at blue and high values at red: my_palette = colorRampPalette(c('blue','red'))
- Get the max from your variable you want colorized to make the ratio: my_max = max(V(g)$my_var_of_interest, na.rm=TRUE)
- Create your vector of values which will determine the color values for each node. For me it was a ratio, so based on the max value: my_vector = V(g)$my_var_of_interest / my_max
- Notice here we have iGraph's V(g)$var syntax.
- Create the vector of color values, based on your variable of interest and the palette end points and the resolution (how many steps of colors). This will give you a vector of color values with the correct color value in the correct location for your variables in your df-like object: my_colors = my_palette(my_resolution)[as.numeric(cut(my_vector, breaks=my_resolution))]
- Ok, let's explain that. Take my_vector and bin it into a number of parts -- how many? That's set by the resolution variable (my_resolution). By "bin" I mean cut it up, divide it up, separate it into my_resolution number of elements. So if I have 200 items, I am still going to have 100 colors, because I want to see where on the spectrum they all fall. Take that vector as.numeric (since maybe it comes back as factors; I don't know, I didn't poke at that). Then use that resulting vector of numeric bin indices (which are determined by my_var_of_interest and my_resolution) to index into my_palette(my_resolution), which returns a vector of hex color values -- the colors you want, in the correct order.
- Draw! plot(g, vertex.color=my_colors)
- Note that we aren't modifying the colors in the iGraph object; we're just assigning them at run time for plot(). We could assign them to the iGraph object and then draw the graph instead.
my_resolution = 100
my_palette = colorRampPalette(c('blue','red'))
# This gives you the colors you want for every point.
my_max = max(V(g)$my_var_of_interest, na.rm=TRUE)
my_vector = V(g)$my_var_of_interest / my_max
my_colors = my_palette(my_resolution)[as.numeric(cut(my_vector, breaks=my_resolution))]
# Now you just need to plot it with those colors.
plot(g, vertex.color=my_colors)
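The cut-and-index trick is easier to see outside of R. Here's the same binning idea sketched in plain Python (the function names are hypothetical, and the blue-to-red interpolation is only a rough stand-in for colorRampPalette):

```python
# The binning idea from the R code, in plain Python: build a gradient of
# `resolution` colors, turn each observation into a ratio of the max,
# then map each ratio to a bin index and look up that bin's color.
def make_palette(resolution):
    # Rough linear blue -> red gradient, like colorRampPalette(c('blue','red')).
    colors = []
    for i in range(resolution):
        t = i / float(resolution - 1)
        red = int(round(255 * t))
        blue = int(round(255 * (1 - t)))
        colors.append('#{:02x}00{:02x}'.format(red, blue))
    return colors

def colorize(values, resolution=100):
    palette_colors = make_palette(resolution)
    vmax = max(values)
    node_colors = []
    for v in values:
        ratio = v / float(vmax)                                   # like my_vector
        bin_index = min(int(ratio * resolution), resolution - 1)  # like cut()
        node_colors.append(palette_colors[bin_index])
    return node_colors

colors = colorize([1, 2, 4])  # the max value gets the reddest color
```

Same shape as the R: one palette of my_resolution colors, one ratio per observation, one lookup per ratio.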
Sunday, July 3, 2016
Gephi and iGraph: graphml
When Gephi, which is great, decides to not exactly work, you can save your Gephi graph file in graphml format and then import it into R (or Python or C/C++) using iGraph so you can also draw it the way you were hoping to. (I'm having an issue with setting the colors at all in Gephi.)
It took me a few tries to figure out which format would work. I need location (since Gephi is good at that but I don't know how to make iGraph or R's SNA package do that) and attributes for the data. So far, so good!
Some helpful pages:
- An iGraph tutorial.
- Another iGraph tutorial.
- iGraph docs for R.
- Gephi file formats.
- iGraph read_graph().
Note!!!! Apparently, if you make a variable in R (at least while trying to graph something with plot) and you name your palette variable palette, you will destroy (ok, ok, overwrite) some other official variable or setting also named palette, but the error you get will not at all clue you in to what happened. Better to call your variable my_palette or the_palette, which is what I usually do (so why didn't I do it here?).
Saturday, June 18, 2016
Wednesday, June 15, 2016
Recent Travel
I've been to Germany for ICWSM 2016, then Paris, then Hong Kong, then Japan for ICA 2016. You can see some of my travel photos in Instagram. 4 weeks on the road.
ICA 2016, Fukuoka, Japan
Had a great and busy time at ICA 2016: one paper, one panel presentation, moderated a session, and won an award! (Google is being impossible with photos and tables as usual. So much for interfaces.)
I was lucky enough to be invited to speak on the new Computational Methods panel, for the CM interest group. I tried to exhort the crowd to engage with such methods, because we as social scientists have a lot to offer computational analyses. You can see the slides on SlideShare, but I don't spell it all out in the slides when I present. My presentation got a nice tweet, too!
Presenting on the Computational Methods panel.
As part of the Games Division pre-conference in Tokyo at Nihon University (I love the neighborhood there, the Ekoda stop on the Seibu-Ikebukuro line), we all went to Akihabara, and of course we saw and did cool things, like engage in deep discourse with Mario, the working-class Italian-Japanese plumber.
"You don't think quantitative and qualitative methods are complementary? Explain!"
I also was lucky enough to run into Sanrio's Gudetama in Hong Kong and then again in Japan.
Gudetama!
CityU Hong Kong Summer School
Had a great time teaching a class and also an impromptu session on Gephi at the City University of Hong Kong's Summer School in Social Science Research! It's in the Department of Media and Communication, and run by my friend Dr. Marko Skoric. The main instructor was Dr. Wouter van Atteveldt, who is awesome and has great hats as you can see.
I also was fortunate enough to attend CityU's Workshop on Computational Approaches to Big Data in the Social Sciences and Humanities, which was great and had lots of great speakers.
Me, showing some great students a few things about Gephi.
The three of us in front of the department sign.
Tuesday, April 19, 2016
When Companies Fail The Data
Recently, I have encountered three examples of how giant data-gathering companies have completely failed to use the data they gather in any sensible way. The companies are Facebook, Amazon, and Pandora.
- The people at Facebook do not care about the accuracy of the ads they serve.
- The people at Facebook do not care if the ads they serve are purely for emotional manipulation.
- The people at Facebook are not using the 11 years of data they have on me to realize that I would not like this ad because:
- I do not like advertisements that lie.
- I do not like advertisements that manipulate.
- I am not a fan of Sylvester Stallone.
And yes, I know the image is an ad for Flonase, not for cars; it just happens to have a car. I use it here because it's in Spanish (although I am complaining more about the audio ads; images clearly work better here).
Thursday, April 14, 2016
For A Decent CSV Spreadsheet App
All I want is a decent spreadsheet app that does not insist on mangling my CSV files, which often contain ID numbers that I might want to view as text and not numbers. Apple's Numbers is maddening (you have to export to CSV, which is extra steps, and it has a relatively low row limit, 65,535 I believe), and Microsoft's Excel is a little better, but I'll use it as an example here of What You See Is Not What You Get.
I am doing some work on cities and (county-level) FIPS codes. (In the US, FIPS codes are Federal-level identifiers useful for a lot of things; here, they identify counties.) Some cities are large and lie in more than one county. Some of the data I have deals with cities, but the income data is on the county level, so I need to map from cities to county FIPS.
Excel did not make this easy.
The file I grabbed off the net to help me map cities to FIPS (counties) quite correctly listed all the appropriate FIPS codes for each city. I needed to narrow this down to one (Wikipedia helped a lot, the geopolitical Wikipedians are nitpickers).
FIPS codes for counties have two parts, two leading digits for the state and then three digits for the county. So all FIPS codes that start with 36, for instance, are counties in New York state.
The format from my source file looked like this:
Raleigh, NC: 37063,183
Birmingham, AL: 01073,117
New York, NY: 36005,047,061,081,085
(I am pretty sure those 5 numbers for NYC are the 5 boroughs; I know Brooklyn is its own county, Kings County.)
Excel, however, would show the following in the main view, interpreting these IDs as numbers -- the errors are marked in parentheses as A, B, and C:
Raleigh, NC: 37,063,183 (A)
Birmingham, AL: 1,073,117 (A, B)
New York, NY: 36,005,047,061,081,000 (A, C)
- (A) Added a comma that isn't there.
- (B) Dropped the leading zero.
- (C) Rounded the rightmost digits (the NYC string, read as one long number, has more digits than floating-point precision can hold).
That was all extremely infuriating, and reminded me of Microsoft's Clippy, where the coders thought they always knew better than you. Granted, a lot of apps and even programming-language packages try to be smart and guess formats, and yes, this can be useful. But if there are leading zeros and commas in odd places (or not) and it's a CSV (text) file, there could be a default "read CSV as text." Of course, it seems that neither of these two programs has been coded to play nice with CSV files.
As such, they are not overly useful data science tools.
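For the record, a plain-text read does exactly the right thing here. Python's stdlib csv module treats every field as a string unless you convert it yourself, so the leading zeros and comma-packed IDs survive (the sample rows below are the examples from above, wired up by hand):

```python
import csv
import io

# A plain-text CSV read keeps ID fields as strings: no added commas,
# no dropped leading zeros, no rounding.
sample = ('city,fips\n'
          '"Birmingham, AL","01073,117"\n'
          '"New York, NY","36005,047,061,081,085"\n')
rows = list(csv.DictReader(io.StringIO(sample)))

# The leading zero on Birmingham's state code (01 = Alabama) is intact:
birmingham = rows[0]['fips']  # '01073,117'

# Expand the NYC shorthand into full five-digit county FIPS codes,
# using the two leading state digits described above:
parts = rows[1]['fips'].split(',')
state = parts[0][:2]          # '36' = New York state
nyc_fips = [parts[0]] + [state + p for p in parts[1:]]
# ['36005', '36047', '36061', '36081', '36085']
```

Reading as text first and converting deliberately is the whole ask; a spreadsheet could offer the same as a default for CSV.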
Tuesday, April 5, 2016
Case Study in Data Ethics at Data & Society
I am pleased to announce that a case study on data ethics, by myself and co-author Dr. Roei Davidson, has been published at Data & Society! Titled "The Ethics of Using Hacked Data: Patreon's Data Hack and Academic Data Standards", it looks at issues around using hacked data (or not).
Basically, no.
But I wanted to. See the paper for details! (It's free and concise, don't worry.)
Thursday, March 24, 2016
Microsoft's Epic Twitterbot Fail
If you read this blog, you've read about the rather hilarious failure of Microsoft's experiment with a learning Twitter bot. Trolls gave it so much input it started turning out hateful, sexist, racist tweets.
So we really have to wonder...
- Why are Microsoft engineers so ignorant of Internet culture?
- Why do Microsoft engineers who program text-based bots have no idea about the range of text available?
Monday, March 14, 2016
Plagued By Bad Design, Still
Design, from websites to cities to forks, is so important, all around us, and so easy to get right -- but also easy to get wrong in some cases. Here's one that was easy to get right, but the designers and the people who approved it still got it wrong (don't they even test these things?).
The NYC MTA information/help audio posts found in many subway stations have two words, and two buttons, as you can almost see in the first photo. Except that the second button is really hard to see (this photo unintentionally makes it look worse than usual, but it's still pretty bad).
Actual info post thing.
There are two overall problems, which you can see a little in the below photo.
- The physical placement of the words in relation to the buttons.
- The color of the buttons.
They don't.
Notice the yellow lines are longer than the blue line.
Much better!
Sunday, March 6, 2016
Yelverton Seven
We held the seventh installment of the Yelverton Sessions (Yelverton Seven) in conjunction with CSCW 2016. Named after the location of the third meeting, held in Yelverton, England, the Yelverton Sessions involve both intensive work sessions combined with cultural and natural places of interest not only as a break but as inspiration. And, a lot of coffee and good food. They usually, but not always, are in conjunction with a conference.
We voted to name it after the third session as by then we realized that yes, this was a sustained effort we wanted to continue. And, who doesn't like the word Yelverton?
- Yelverton One, Bangor Maine and Fredericton Canada (ICA 2011).
- Yelverton Two, Flagstaff Arizona and The Grand Canyon (ICA 2012).
- Yelverton Three, Devon England (ICA 2013).
- Yelverton Four, Bainbridge Washington (ICA 2014).
- Yelverton Five, Hong Kong (WUN Understanding Global Digital Cultures 2015).
- Yelverton Six, Austin Texas (2016).
- Yelverton Seven, Santa Cruz California (CSCW 2016).
NYC School of Data
Spent most of the day yesterday at the NYC School of Data conference -- accurately billed as "NYC's civic technology & open data conference." Sponsored by a wide variety of organizations, such as Microsoft and Data & Society, the day involved a lot of great organizations, such as various NYC government data departments; included great NYC people, such as Manhattan Borough President Gale Brewer and New York City Council member Ben Kallos; and was held at my workplace, the awesome Civic Hall.
CSCW 2016
Just got back from CSCW 2016 in San Francisco -- was part of a great pre-conference workshop on data ethics, saw some great papers and some great people. Also, telepresence robots!
Friday, February 12, 2016
UT Austin!
Just spent some time down in Austin with some friends and colleagues, what a great time and a great place! (Natalie, JD and soon to be PhD, wrote about it too.)
We also stopped by both Communication Studies (Hearst, of course!) and the School of Information for seminars.
Yes, we actually did a ton of work. (RStudio, variables, models, theorizing, all that good stuff. And coffee.)
Tuesday, January 19, 2016
Meaningless Data Viz
This Google Trends data visualization is horrible. It does indeed show "top searched candidate by state", I would guess, but that doesn't at all mean what the map implies it means -- that is, positive popularity of that candidate and also a lead over the other candidates. It doesn't even come close to showing that.
The data underlying this map could be any one of these completely different scenarios, using just the first three listed candidates to show the problem:
Candidate | State A | State B   | State C
Trump     | 1       | 1,000,000 | 1,000,000
Cruz      | 0       | 0         | 999,999
Rubio     | 0       | 0         | 999,999
The order of the candidates in the image may be from the data, or it may be from polls, or it may be something else, we don't know.
In theoretical State A, Trump does lead, but it's meaningless and no one is searching.
In theoretical State B, Trump leads, in a statistically meaningful manner, and people are searching (but we don't know exactly on what terms, "Trump liar" and "Trump bankruptcy" and "Trump racist" are not endearing search terms).
In theoretical State C, Trump leads, but it's a statistical tie, and lots of people are searching.
Each of these scenarios is massively different from the others, yet they would all result in the same visualization.
There are other numerical combinations, this is just a sample of three.
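To make the point concrete, here's a quick sketch (using the made-up numbers from the table above) showing that all three scenarios produce the identical "top searched candidate" label in every state:

```python
# Three wildly different search-count distributions that nonetheless
# yield the same "top searched candidate" label in every state.
scenarios = {
    'State A': {'Trump': 1, 'Cruz': 0, 'Rubio': 0},                  # meaningless lead
    'State B': {'Trump': 1000000, 'Cruz': 0, 'Rubio': 0},            # meaningful lead
    'State C': {'Trump': 1000000, 'Cruz': 999999, 'Rubio': 999999},  # statistical tie
}

# The map only shows the argmax per state, discarding the magnitudes:
top = {state: max(counts, key=counts.get) for state, counts in scenarios.items()}
# Every state gets the same label, so the map is colored identically in
# all three scenarios even though they mean completely different things.
```

The magnitudes, which carry all the meaning, never make it into the visualization.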
This visualization also conflates geography with population; that is, it doesn't have any state-level per-capita correction. For that you need, I have learned, a cartogram (I think I've linked to that page before; it's really informative -- here's one for the world with a slightly different approach). And it only considers people who have internet access, who are using Google, and who are actively searching during the debate. That leaves out lots of people.
And it leaves out anything that isn't a state (such as Puerto Rico), although I assume Washington, DC, is in there (who can tell?). It also -- and this is a minor peeve -- makes it look like the top of Minnesota is connected by land (it isn't).
Edit: Apparently this map is actually from Google, their "Google News Lab," according to one video where I got this map for the Democrats, and it suffers the exact same problem:
Tuesday, January 12, 2016
HICSS 2016
Just spent a great week in Hawaii at HICSS 2016. Some great people and great papers! Also a few not so great papers and some not great presentations, which are not problems I recall from previous HICSS.
And, I am now co-chairing the new Games and Gaming mini-track in the Digital and Social Media track, so, there's some work to do there. Should be awesome!
Update: The G&G mini-track has been approved and the CFP is out!