Thursday, February 19, 2015

Nice Little Python Trick

I can't summarize this for a headline, but I have a list (a CSV) of Kickstarter data, where each line is a project and includes the project URL and the founder's Kickstarter username. I wanted to go in and get the biography of each founder who had more than two projects. First I needed to drop all the one- and two-project people (easy), but that still left me with multiple URLs for each remaining founder. Someone with three projects would have three distinct URLs, and I only needed one.
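For the "easy" part, here is a rough sketch of one way to drop the one- and two-project founders, not necessarily how I did it at the time. It assumes each row is a dict with a 'founder' key; the sample rows, the function name keep_repeat_founders, and the min_projects parameter are all made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows standing in for the real Kickstarter CSV data.
data_list = [
    {"founder": "alice", "proj_url": "https://example.com/a1"},
    {"founder": "alice", "proj_url": "https://example.com/a2"},
    {"founder": "alice", "proj_url": "https://example.com/a3"},
    {"founder": "bob", "proj_url": "https://example.com/b1"},
    {"founder": "carol", "proj_url": "https://example.com/c1"},
    {"founder": "carol", "proj_url": "https://example.com/c2"},
]


def keep_repeat_founders(rows, min_projects=3):
    """Keep only rows whose founder has at least min_projects projects."""
    # Count how many rows (projects) each founder appears in.
    counts = Counter(row["founder"] for row in rows)
    # "More than two projects" means three or more, hence the default of 3.
    return [row for row in rows if counts[row["founder"]] >= min_projects]


repeat_rows = keep_repeat_founders(data_list)
```

With the sample rows above, only the three "alice" rows survive, which is exactly the situation described: one founder, three URLs.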

I imagined sorting, or splitting, or checking against founder names that were already accounted for. Horrible. Then I realized... Dictionaries! With founder as the key.

So I read the CSV as a list of dicts (the typical way to do it, though horridly few examples online show this), which gave me

data_list[i]['founder']
and
data_list[i]['proj_url']
to work with.

And now Google insists on line breaks there with the pre tag. Sigh.
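Since so few examples show the list-of-dicts approach, here is a minimal sketch using csv.DictReader from the standard library. It assumes the CSV has a header row with columns named founder and proj_url; the helper name read_rows and the in-memory sample data are placeholders, not my real file.

```python
import csv
import io


def read_rows(csv_file):
    """Return one dict per CSV row, keyed by the names in the header line."""
    return list(csv.DictReader(csv_file))


# In-memory stand-in for the real file (hypothetical data).
sample_csv = io.StringIO(
    "founder,proj_url\n"
    "alice,https://example.com/a1\n"
    "bob,https://example.com/b1\n"
)

data_list = read_rows(sample_csv)
first_founder = data_list[0]["founder"]
first_url = data_list[0]["proj_url"]
```

For a file on disk, the same helper should work with something like open(path, newline=""), which is the form the csv module documentation recommends.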

But, the solution! Since I only needed one URL per founder, it didn't matter which one I kept. So I could just loop through the data once, grab every URL, and let the dict overwrite each founder's entry with whichever of their URLs came along next. So, the following code only loops once, builds a nice dict object, and uses founder names as the keys.

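url_dict = {}  # founder username -> one of that founder's project URLs
for project in data_list:
    # A later project by the same founder overwrites the earlier URL.
    url_dict[project['founder']] = project['proj_url']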

So for someone with, say, three projects, it assigns the first URL to their username, then overwrites it with the second, and then overwrites that with the third. So I end up with every username associated with exactly one of their URLs (the last one in the list, as it happens). Perfect. No sorting, no checking, no nothing. Automatic, essentially.
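To see the overwriting in action, here is a small runnable sketch with made-up rows. It also shows a variant I didn't need: if you ever did care about keeping the first URL instead of the last, dict.setdefault only stores a value when the key is not already present. The sample data and the name first_url_dict are assumptions for illustration.

```python
# Hypothetical rows: "alice" has three projects, "bob" has one.
data_list = [
    {"founder": "alice", "proj_url": "https://example.com/a1"},
    {"founder": "bob", "proj_url": "https://example.com/b1"},
    {"founder": "alice", "proj_url": "https://example.com/a2"},
    {"founder": "alice", "proj_url": "https://example.com/a3"},
]

# Plain assignment: each later row overwrites the earlier one,
# so the last URL seen for a founder is the one that remains.
url_dict = {}
for project in data_list:
    url_dict[project["founder"]] = project["proj_url"]

# Variant: setdefault leaves an existing key alone,
# so the first URL seen for a founder is the one that remains.
first_url_dict = {}
for project in data_list:
    first_url_dict.setdefault(project["founder"], project["proj_url"])
```

Either way there is one pass over the data and one entry per founder; the only difference is which of the founder's URLs ends up in the dict.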

So I thought that was nice.