Monday, April 20, 2015

I am not a number!

I was poking around the web after the TtW conference and found this blog post rather apt.

And then this excerpt from Joe Turow's The Daily You aligns well with it.

Friday, April 17, 2015

Theorizing the Web 2015

Theorizing the Web 2015 is on! Here in the Bowery (LES?), two days of presentations. Already made some contacts about work and am looking forward to tomorrow when some people I know will be keynoting!

Ignite Talks at Civic Hall: Great Stuff!

Omidyar Network hosted a great bunch of Ignite talks (the crazy auto-slide-advancing ones) at Civic Hall on April 13th, including a few by people I know, which was exciting!

The Speakers:

  • Laurenellen McCann, "Mo Open Data"
  • Tony Schloss, "Red Hook WIFI- The Realest Community Tech"
  • Miriam Altman, "The uphill battle to graduation day"
  • Chris Whong, "In Search of Hess' Triangle"
  • Rose Broome, "What is the Basic Income?"
  • Lane Becker, "Good Enough for Government Work?"
  • David Riordan, "The NYC Space/Time Directory: How The New York Public Library is unlocking NYC's past"
  • Gavin Weale, "SA Elections 2014: a youth odyssey"
  • Paul Lenz, "The Resilience of the Past"
  • Kathryn Peters, "Disrupting government"
  • Joel Mahoney, "Civic Technology and the Calculus of the Common Good"
  • Kate Krontiris, "Understanding America's Interested Bystander"
  • Nick Doiron, "The Civic Deep Web"
  • Jessie Braden, "Open Data Perils: Question Everything"
  • Daniel X. O'Neil, "Changing Civic Tech Culture from Projects to Products"
  • Daniel Latorre, "What I Learned About CivicTech from Eastern Europe"
  • Noel Hidalgo, "Clear eyes. Full heart. Can't Loose! NYC's civic tech in 2022."

Great topics, some great ideas, and some great slides.

Friday, April 10, 2015

SQLite and Python Notes

I don't have a background in SQL, so getting the syntax correct for SQLite in Python was a little tricky, especially since it reads like it's straight out of 1983. So, here is some working syntax for a search/select and replace, and also a search/select and an iteration through the results.

Search for one result (on a unique variable) and change some of that entry's data:

cursor.execute("UPDATE outfits SET size=?, members=?, scraped=? WHERE id=?", (how_many_members, char_id_list, 1, int(outfit_id)))


This assumes you know a bit about SQL. cursor is your cursor object, and outfits is the name of the table in the db (well, it's the name of my table in my db). This snippet finds the rows where the id column matches the value of outfit_id. In this case id is unique, since I declared it that way when I made the db (with some other SQLite code that is in multiple other places on the net), so at most one row matches. The line sets those three columns on that row, and then you have to commit on the connection object, which is what actually writes the change to the file. That seems really weird to me, either do it or don't do it, but the point is that uncommitted changes can still be rolled back. I assume this made some sense back in 1983 when people wrote code in capital letters.
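For what it's worth, here is a minimal, self-contained sketch of that update-then-commit sequence. It runs against an in-memory database; the table layout, the sample values, and the name conn are my assumptions for illustration, not the original schema.

```python
import sqlite3

# Assumption: an in-memory db and a made-up schema, just for illustration.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE outfits (id INTEGER PRIMARY KEY, size INTEGER, members TEXT, scraped INTEGER)"
)
cursor.execute("INSERT INTO outfits VALUES (?, ?, ?, ?)", (42, 0, "", 0))

# Hypothetical values standing in for the scraped data.
outfit_id = "42"
how_many_members = 3
char_id_list = "101,102,103"  # kept as text here; a Python list cannot be bound directly

cursor.execute(
    "UPDATE outfits SET size=?, members=?, scraped=? WHERE id=?",
    (how_many_members, char_id_list, 1, int(outfit_id)),
)
conn.commit()  # commit lives on the connection, not the cursor

# Read the row back to confirm the change landed.
updated_row = cursor.execute("SELECT * FROM outfits WHERE id=?", (42,)).fetchone()
print(updated_row)
```

Until the commit, the change is part of an open transaction and can still be undone with conn.rollback().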

Iterator on search results:

cursor.execute("SELECT * FROM outfits WHERE scraped=?", (0,)) # this selects them but doesn't return them for use. NOTE TUPLE!!! 

not_scraped_outfits = cursor.fetchall() # aha! 

for an_outfit_row in not_scraped_outfits:
    # do your stuff here
    pass  # placeholder so the loop runs as written

That seems weird to me, but I guess I don't understand the cursor idea. You SELECT in caps, but then you have to fetchall(), which looks like two steps where you only need one. As far as I can tell, execute() runs the query and leaves the results waiting on the cursor, and fetchall() copies them into a Python list. So, you SELECT every column (the asterisk) from your table for rows that match the WHERE clause, here where scraped is 0. (SQLite has no separate Boolean type, so it's really just 0 or 1.) That returns zero, one, or many rows. Usually for me in this particular code it will return several, and then you have to iterate through the results. I think there are a few ways to code the iterator call, but the code I have here works, so there you go. Execute a SELECT, which is a search (the WHERE), then fetchall() the results, then you can iterate through them.
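On the "few ways to code the iterator call" point, here is a small sketch of two of them side by side: fetchall() into a list, and looping over the cursor directly. The table and its rows are made up for illustration.

```python
import sqlite3

# Assumption: a tiny made-up outfits table in an in-memory db.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE outfits (id INTEGER PRIMARY KEY, scraped INTEGER)")
cursor.executemany("INSERT INTO outfits VALUES (?, ?)", [(1, 0), (2, 1), (3, 0)])
conn.commit()

# Way 1: execute, then pull every matching row into a Python list.
cursor.execute("SELECT * FROM outfits WHERE scraped=? ORDER BY id", (0,))
not_scraped_outfits = cursor.fetchall()
ids_from_fetchall = [row[0] for row in not_scraped_outfits]

# Way 2: loop over the cursor itself, which hands back rows one at a time.
ids_from_cursor = []
for an_outfit_row in cursor.execute(
    "SELECT * FROM outfits WHERE scraped=? ORDER BY id", (0,)
):
    ids_from_cursor.append(an_outfit_row[0])

print(ids_from_fetchall, ids_from_cursor)
```

The second form avoids building the whole list in memory, which probably only matters for large result sets.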

NB: Tuple! When you do the funky security thing (the ? placeholders, which keep values from being read as SQL; I don't much care since I am only running local code), the arguments have to be a sequence such as a tuple, so if you are just passing one argument you need a trailing comma. Without it, (0) is just the number 0 in parentheses:

cursor.execute("SELECT * FROM outfits WHERE scraped=?", (0)) # fail 

cursor.execute("SELECT * FROM outfits WHERE scraped=?", (0,)) # success, due to the last comma there
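A quick way to see the difference is to check the types in plain Python before SQLite gets involved. This is only a sketch, and the one-row table is made up so the working form has something to find.

```python
import sqlite3

# Parentheses alone do not make a tuple; the comma does.
not_a_tuple = (0)
a_tuple = (0,)
print(type(not_a_tuple).__name__)
print(type(a_tuple).__name__)

# Assumption: a made-up one-row table, just to run the working form.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE outfits (id INTEGER PRIMARY KEY, scraped INTEGER)")
cursor.execute("INSERT INTO outfits VALUES (?, ?)", (7, 0))
matching_rows = cursor.execute(
    "SELECT id FROM outfits WHERE scraped=?", a_tuple
).fetchall()
print(matching_rows)
```

A one-element list such as [0] is accepted as well, if the trailing comma is too easy to miss.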

Also, one of the many pages I was poking around at suggested SQLite Manager, a plugin for Firefox. There may be other similar things, I have no idea, but I really like it, it's free, and if you don't have anything that allows you to view the innards of your SQL db easily, I strongly recommend it. If you don't use Firefox, heck it's just another app (I tend to think I don't need three browsers on my machines, but hey).

More also, it is apparently a good idea to store really long ID numbers as text, not numeric. (Because something, somewhere, decided to round them all off so they were all wrong.)
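I never tracked down what did the rounding, but one common culprit is a conversion to a double-precision float somewhere along the way, since a double only holds about 15 to 17 significant digits. The sketch below shows that effect and the text workaround; the ID and the characters table are made up for illustration.

```python
import sqlite3

# Hypothetical 19-digit ID, made up for illustration.
long_id = 5428010618020694593

# A double-precision float cannot hold all 19 digits, so the round trip changes the value.
id_after_float = int(float(long_id))
print(long_id == id_after_float)

# Stored as TEXT, the digits come back exactly as they went in.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE characters (char_id TEXT PRIMARY KEY)")
cursor.execute("INSERT INTO characters VALUES (?)", (str(long_id),))
conn.commit()
stored_id = cursor.execute("SELECT char_id FROM characters").fetchone()[0]
print(stored_id == str(long_id))
```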

Thursday, April 9, 2015

Kickstarter Talks

A great evening at Kickstarter HQ here in Greenpoint! Three fantastic talks about processes they use there:

  1. Kumquat: Rendering Graphs and Data from R into Rails, by Fred Benenson, Head of Data.
  2. Rack::Attack: Protect your app with this one weird gem! by Aaron Suggs, Engineering Lead.
  3. Testing Is Fun Again, by Rebecca Sliter, Engineer.

It is always great to see uses of ggplot, especially on data upon which I also use ggplot, and I got to talk to Aaron about the throttling they do to stop malicious scrapers (I fall into the non-malicious camp of course!).

They have a really great little auditorium, and they were pretty awesome and had a live speech-to-text system for anyone who was hearing impaired. That's the bright rectangle in the lower left (it's washed out due to contrast).

Here is Aaron talking about throttling overly requesty processes, which I found really funny since I have scraped Kickstarter, but hopefully for good, not for evil.

Finally, here is a ggplot chart I made of some Kickstarter data for US Music projects (with various other long-winded details). It shows that, out of those who succeeded on their first project (blue line), people with a lower pledged/funder ratio (left side) were slightly more likely to do a second project (higher on the Y axis) than those with a higher ratio. We call this ratio "the sugar daddy" measure, since if you are higher on it, maybe your rich uncle came in at the last minute to save your project.

Friday, April 3, 2015

R and Unlist Your List!

If you try to assign a list to a column in an R data frame, it won't quite work; you need unlist. (That's the short version for the search engine snippet. It does not make for a great narrative intro; it's more the concise summary.)

A few days ago, I was working in R and was generating a new data frame from another one. It was a little more complex than I was used to: for instance, I had to bin one variable by the values of another variable and compute some new percentages/frequencies in the new, smaller data frame, so I couldn't just use the non-looping approach that is common in R (and which is a lot nicer and faster). The data frame was relatively small, so one loop level was not a problem, even on my 4.5-year-old MacBook Air.

In one section, I generated a list of values (numeric, nothing fancy), and then assigned that list to a column in my new data frame. When I called the data frame to look at it, it looked fine, but if I did str(my_df) or summary(my_df) something was horribly wrong -- the column wasn't a numeric column, it was some odd list format and wasn't working for my ggplot.

I tried assigning the generated values directly to the column in the data frame, with something like this inside the loop, where I also incremented i:

my_df[i, 'the_variable'] = one_generated_variable

(Note I can't use the usual R assignment arrow there, with the less than sign; Google barfs on the code even though it's text, so I have to use the equals sign, which R also accepts.)

one_generated_variable was just a numeric value. Should have been fine, I thought! But no, it still came out as a list. I have no idea why, honestly it seems impossible since the values were generated one at a time and assigned then and there -- they were not bundled into a list first. But, unlist fixed the problem.

my_df$the_variable = unlist(my_df$the_variable)

That did it. I still don't understand the details, since I don't see why it was a list in the first place (aren't columns vectors anyways?). I have never run into that problem before, although mostly I've been working in Python lately.

Also, a friend put me onto data.table instead of data.frame for bigger data.