Monday, July 4, 2016

Making a Spectrum/Gradient Color Palette for R / iGraph

How to make a color gradient palette in R for iGraph (that was written tersely for search engine results), since despite some online help I still had a really hard time figuring it out. As usual, now that it works, it doesn't seem too hard, but anyways.

(I had forgotten how horrible blogger is at R code with the "gets" syntax, the arrow, the less than with a dash. Google parses it as code, not text, and it just barfs all over the page, so I think I have to use the equal sign [old school R] instead. It is also completely failing at typeface changes from courier back to default. I see why people use WordPress....)

The way I will do it here takes six steps (and so six lines of code). There are a few different ways you could do this, such as where you set the gradient, or whether you assign the colors to the vertices (nodes) in the graph object or just use them at the time of drawing without actually assigning them in the graph object itself. The variable I based the gradient on is an integer, and given my analysis I'm making a ratio of "for each item in my data, what is its value on that variable as a fraction of the maximum?" It's a character level in a game, so if a character is level 5 and the max level is 10, then the value I want is 0.5 (i.e. half).
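As a quick sketch of that ratio, here it is on some made-up levels (my_levels is a hypothetical vector used only for illustration, not my real data). The na.rm=TRUE is there so a missing level doesn't turn the max into NA.

```r
# Hypothetical character levels; NA stands in for a missing value.
my_levels = c(5, 10, 2, NA)

# Each level as a fraction of the maximum level.
my_level_max = max(my_levels, na.rm = TRUE)
my_ratios    = my_levels / my_level_max

print(my_ratios)   # a level-5 character with max level 10 comes out as 0.5
```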

Keep in mind that the gradient you use here isn't analog (like a rainbow with thousands [more I think] of colors), it's a finite number of colors, with a starting color and an ending color. If your resolution is 10 then you have ten colors in your gradient, determined by the software as 8 steps between the color you told it to start at and the color you told it to end at (8 steps + start color + end color = 10 colors).
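A small sketch of that, assuming a resolution of 10 and the same blue-to-red end points I use further down (my_ten_colors is just a name for this example):

```r
# Build a 10-color gradient from blue to red.
my_ten_colors = colorRampPalette(c('blue', 'red'))(10)

length(my_ten_colors)   # 10 colors in total
my_ten_colors[1]        # the start color, blue
my_ten_colors[10]       # the end color, red
print(my_ten_colors)    # the 8 in-between colors sit in positions 2 to 9
```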

The general conceptual steps for how I did it:
  1. Set the resolution for the gradient, that is, how many color steps there are/you want.
  2. Set up the palette object with a start color and an end color. (Don't call it "palette" like I did at first; palette() is already a function in base R, and reusing the name blew up my code with an error message that didn't help with figuring it out.)
  3. You'll want a vector of values that will match to colors in the gradient for your observations, for what I'm doing I got the maximum on the variable in one step...
  4. And then set up the vector in the second step (so, this is a vector of the same length as the number of observations you have, since each value represents the value that matches up against a color in the gradient). (In my code here, it's a ratio, but the point is you have numerical values for your observations [your nodes] that will be matched to colors in the gradient.)
  5. Create a vector that is your gradient that has the correct color value for each observation. (The examples of this I could find online were very confusing, and that's why I'm making this post.)
  6. Draw! (Or you could assign colors to your graph object and then draw.)
Let's look at some code and, on occasion, the resulting objects. (I'll include the code as one code block below this explained version.)

Don't forget library(igraph) 

Also, if you're new to iGraph, note that it uses slightly unusual (well, to me at least) syntax to access and assign values to the nodes, that is, the vertices of your graph: V(your_igraph_object), so you end up with things like V(g)$my_variable. (Below I do use "my_whatever" to highlight user-made objects, except I did use just "g" for my iGraph graph object.)
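Here is a minimal sketch of that V(g)$ syntax on a toy graph. The ring graph and the my_level attribute are made up for the example; your own graph and variable names will differ.

```r
library(igraph)

# A toy graph with 5 nodes (hypothetical, just to have something to poke at).
g = make_ring(5)

# Assign a value to every vertex, in vertex order.
V(g)$my_level = c(1, 5, 10, 3, 7)

# Read the values back: all of them, or just one.
V(g)$my_level
V(g)$my_level[2]
```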

Also note that the my_palette object is actually a function (colorRampPalette returns a function that takes the number of colors you want), so it definitely isn't a "palette" in the sense of a selection (or vector) of colors or color values. That is part of what makes step 5, below, unusual. Maybe I should have used my_palette_f to be more clear, but if you've made it this far, I have faith in you. (Also note that colorRampPalette is part of R, not part of iGraph.)
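You can check this yourself with a couple of lines. This is only a sketch; my_three_colors is a name I made up for the example.

```r
my_palette = colorRampPalette(c('blue', 'red'))

# my_palette is a function, not a vector of colors.
is.function(my_palette)

# Calling it with a number is what actually produces the colors.
my_three_colors = my_palette(3)
is.character(my_three_colors)
length(my_three_colors)
```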

Using the language from the above steps...
  1. Set resolution, I'm using 100: my_resolution = 100
  2. Set palette end points, this starts with low values at blue and high values at red: my_palette = colorRampPalette(c('blue','red'))
  3. Get the max from your variable you want colorized to make the ratio: my_max = max(V(g)$my_var_of_interest, na.rm=TRUE)
  4. Create your vector of values which will determine the color values for each node. For me it was a ratio, so based on the max value: my_vector = V(g)$my_var_of_interest / my_max
    • Notice here we have iGraph's V(g)$var syntax.
  5. Create the vector of color values, based on your variable of interest and the palette end points and the resolution (how many steps of colors). This will give you a vector of color values, one per node and in the same order as the nodes: my_colors = my_palette(my_resolution)[as.numeric(cut(my_vector, breaks=my_resolution))]
    • Ok let's explain that. Take my_vector and bin it with cut(). How many bins? That's set by the resolution variable (my_resolution). By "bin" I mean divide the range of the values (from the smallest value in my_vector to the largest) into my_resolution equal-width intervals and note which interval each value falls in. So if I have 200 items, I still have at most 100 colors, because I want to see where on the spectrum they all fall. cut() returns a factor, so as.numeric turns it into plain bin numbers from 1 to my_resolution. Separately, my_palette(my_resolution) returns a vector of my_resolution hex color values running from the start color to the end color. Indexing that vector with the bin numbers gives one hex color per observation, in the same order as the observations.
  6. Draw! plot(g, vertex.color=my_colors)
    • Note that we aren't modifying the colors in the iGraph object, we're just assigning them at run time for plot(). We could assign them to the iGraph object and then draw the graph instead.
Done! Let's look at two of the resulting vectors (but you should be using RStudio of course so you can see them anyways), as looking at them helped me understand what was going on.
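If the one-liner in step 5 is hard to read, here is a sketch that pulls it apart into separate pieces. It uses a short stand-in vector instead of a real graph, and the intermediate names (all_colors, my_bins, my_index) are made up for the example. Because this stand-in vector has a different range than my real data, the colors it produces are not the ones from my graph.

```r
my_resolution = 100
my_palette    = colorRampPalette(c('blue', 'red'))

# A short stand-in for V(g)$my_var_of_interest / my_max.
my_vector = c(0.31, 0.581, 0.112, 0.108, 0.181)

# Piece 1: the full gradient, my_resolution hex colors from blue to red.
all_colors = my_palette(my_resolution)

# Piece 2: which bin each value falls in (cut() returns a factor).
my_bins = cut(my_vector, breaks = my_resolution)

# Piece 3: turn the factor into plain bin numbers.
my_index = as.numeric(my_bins)

# Piece 4: look up the color for each bin number.
my_colors = all_colors[my_index]
print(my_colors)
```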

So, my_vector is the vector of values for the variable of interest which determine the colors. They aren't the color values themselves, they are the positions on the scale which will get mapped to colors in the spectrum / gradient. (Note I have 1,019 observations in this data.)

my_vector   num [1:1019] 0.31 0.581 0.112 0.108 0.181 ...

So, we can see these are ratios and we know they're between 0 and 1 since that's how I set it up. (A percentage of the max value in this data.) These will map to the right colors in the gradient. Note we can change the gradient, either its start color, end color, or the resolution (how many steps), and this my_vector won't change. This my_vector gets mapped to the colors. What the colors in the gradient are depends on the start color, the end color, and how many steps in the gradient there are.

Then there is also my_colors, which have colors in hex! Exciting to see it work.

my_colors   chr [1:1019] "#4D00B1" "#92006C" "#1900E5" "#1900E5" ...

If you are great at mentally mapping hex RGB values to positions between blue and red (blue and red being the start [i.e. 0] and end [i.e. 1] points as set in step 2 up above) you'll note that the values in my_vector do indeed map to the colors in my_colors, which is cool. (You will notice the middle two digits, the green in RGB, are always 00, since there is no green when you go from blue to red.) Note that the 3rd and 4th values in the hex list (my_colors) are the same, as they are mapping from 0.112 and 0.108, which, when binned into 100 bins, land in the same bin, around 0.11. Thus they have the same color value: in #RRGGBB terms, 19 in hex of red and E5 of blue, with FF being the max, so lots of blue and a little red, as they are both roughly 11% of the way on the scale from the bottom (blue) end to the top (red) end. This makes sense.
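Here is a sketch of that "same bin, same color" effect on made-up values (my_values and my_partial are hypothetical, not my data). It also shows something worth knowing about cut(): when breaks is a single number, the bins cover the range of the data you hand it, not a fixed 0 to 1 scale. If you would rather pin the bins to 0 to 1, one option (an alternative to what I did above, not what my code does) is to pass explicit break points.

```r
my_resolution = 100
my_palette    = colorRampPalette(c('blue', 'red'))

# Two nearby values (0.112 and 0.118) fall into the same bin...
my_values = c(0, 0.112, 0.118, 0.31, 0.581, 1)
my_bins   = as.numeric(cut(my_values, breaks = my_resolution))
my_bins[2] == my_bins[3]

# ...so they get the same color.
my_colors = my_palette(my_resolution)[my_bins]
my_colors[2] == my_colors[3]

# Values that only cover 0.505 to 1: data-range bins start at the smallest value,
# so 0.505 lands in bin 1 (pure blue).
my_partial = c(0.505, 0.75, 1)
range_bins = as.numeric(cut(my_partial, breaks = my_resolution))

# Alternative: fixed breaks from 0 to 1, so 0.505 lands near the middle instead.
fixed_breaks = seq(0, 1, length.out = my_resolution + 1)
fixed_bins   = as.numeric(cut(my_partial, breaks = fixed_breaks, include.lowest = TRUE))

range_bins[1]
fixed_bins[1]
```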

So, there you go.

# Load igraph; g is assumed to be your existing igraph graph object.
library(igraph)

# Set up resolution and palette.
my_resolution = 100
my_palette    = colorRampPalette(c('blue','red'))

# This gives you the colors you want for every point.
my_max    = max(V(g)$my_var_of_interest, na.rm=TRUE)
my_vector = V(g)$my_var_of_interest / my_max
my_colors = my_palette(my_resolution)[as.numeric(cut(my_vector, breaks=my_resolution))]

# Now you just need to plot it with those colors.
plot(g, vertex.color=my_colors)
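
If you would rather store the colors in the graph object and then draw, here is a sketch of that variant. The toy ring graph and its my_var_of_interest values are made up so the example runs on its own. It relies on iGraph using a vertex attribute named color when plotting if you don't pass vertex.color.

```r
library(igraph)

# Hypothetical toy graph with a made-up variable of interest.
g = make_ring(6)
V(g)$my_var_of_interest = c(1, 2, 4, 6, 8, 10)

# Same steps as above.
my_resolution = 100
my_palette    = colorRampPalette(c('blue', 'red'))
my_max        = max(V(g)$my_var_of_interest, na.rm = TRUE)
my_vector     = V(g)$my_var_of_interest / my_max
my_colors     = my_palette(my_resolution)[as.numeric(cut(my_vector, breaks = my_resolution))]

# Store the colors on the vertices, then plot without vertex.color.
V(g)$color = my_colors
plot(g)
```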