Showing posts with label R. Show all posts

Tuesday, April 7, 2020

US Maps in R

A really great read (it's a chapter) about the use of maps in R (at least, US maps specifically), from Kieran Healy's Data Visualization. What are you trying to show with your map? What is your data? Is it spatial? Or, maybe it's actually about population, so why is Montana bigger than Connecticut?

There are some great projections, there's the standard geographical one, and the weird "geography squished into population size" one (Figure 7.1, lower left), and the electoral college/population one isn't bad depending on what you are trying to do (Figure 7.1, lower right), although I end up liking the one that makes all the states the same size, each a square (statebins, in section 7.3). (Of course, what is a state? They are not all comparable at all! What is Washington, D.C.? Why not Puerto Rico? Etc.!)

No post about maps is complete without XKCD's heatmap comic and another on map projections, as well as a link to the segment from The West Wing about map projections which everyone should watch.

Friday, February 8, 2019

Sorting Out Regressions

A great post over at R-Bloggers, [link broken as of 12/2020] "15 Types of Regression you should know." Also, how to choose which one is the right one? So many stats books (I have a few) are just terrible. They tend to throw stats language at people, and it's incredibly bizarre (p-hat? a hat? are you kidding me?) which is a problem because language is supposed to enlighten, not confuse. They also, at least for my English-language background, tend to throw Greek letters around and assume you know what the heck they are, which is an idiotic assumption. Again, language, especially in a textbook that is supposed to be explaining things, should be enlightening and clear, not obtuse. But, all the textbooks I have rarely cover anything but the most basic regression, and they always throw the regression equation at you, which is weird since never have I seen a paper with a regression equation in it, it's always a table.


Edit: Except now the post is no longer there and the URL redirects to R-Bloggers' best guess. Possibly this is the post, or something like it: https://www.listendata.com/2018/03/regression-analysis.html

Monday, July 4, 2016

Making a Spectrum/Gradient Color Palette for R / iGraph

How to make a color gradient palette in R for iGraph (that was written tersely for search engine results), since despite some online help I still had a really hard time figuring it out. As usual, now that it works, it doesn't seem too hard, but anyways.


(I had forgotten how horrible blogger is at R code with the "gets" syntax, the arrow, the less than with a dash. Google parses it as code, not text, and it just barfs all over the page, so I think I have to use the equal sign [old school R] instead. It is also completely failing at typeface changes from courier back to default. I see why people use WordPress....)

The way I will do it here takes six steps (and so six lines of code). There are a few different ways you could do this, such as where you set the gradient, or whether you assign the colors to the vertices (nodes) in the graph object or just use them at the time of drawing without actually assigning them in the graph object itself. The variable I based the gradient on is an integer, and given my analysis I'm making a ratio of "for each item in my data, what is its percentage on that variable compared to the maximum?" It's a character level in a game, so if a character is level 5 and the max level is 10, then the value I want is 0.5 (i.e. half).

Keep in mind that the gradient you use here isn't analog (like a rainbow with thousands [more I think] of colors), it's a finite number of colors, with a starting color and an ending color. If your resolution is 10 then you have ten colors in your gradient, determined by the software as 8 steps between the color you told it to start at and the color you told it to end at (8 steps + start color + end color = 10 colors).

The general conceptual steps for how I did it:
  1. Set the resolution for the gradient, that is, how many color steps there are/you want.
  2. Set up the palette object with a start color and an end color. (Don't call it "palette" like I did at first, that is already the name of a built-in R function, palette(), and it will blow up your code but the error message won't help with figuring it out.)
  3. You'll want a vector of values that will match to colors in the gradient for your observations, for what I'm doing I got the maximum on the variable in one step...
  4. And then set up the vector in the second step (so, this is a vector of the same length as the number of observations you have, since each value represents the value that matches up against a color in the gradient). (In my code here, it's a ratio, but the point is you have numerical values for your observations [your nodes] that will be matched to colors in the gradient.)
  5. Create a vector that is your gradient that has the correct color value for each observation. (The examples of this I could find online were very confusing, and that's why I'm making this post.)
  6. Draw! (Or you could assign colors to your graph object and then draw.)
Let's look at some code and, on occasion, the resulting objects. (I'll include the code as one code block below this explained version.)

Don't forget library(igraph) 

Also, if you're new to iGraph, note that it uses slightly odd (well to me at least) syntax, or you can use slightly odd syntax, to access and assign values to the nodes, that is, the Vertices of your graph, with V(your_igraph_object), which looks a little odd when you do V(g)$my_variable, for instance. (Below I do use "my_whatever" to highlight user made objects, except I did use just "g" for my iGraph graph object.)

Also note that the my_palette object is actually a function (colorRampPalette returns a function that you call with the number of colors you want), so it definitely isn't a "palette" in the sense of a selection (or vector) of colors or color values. I think that is part of what makes line 4, below, unusual. Maybe I should have used my_palette_f to be more clear, but if you've made it this far, I have faith in you. (Also note that colorRampPalette is part of R, in the grDevices package, not part of iGraph.)

Using the language from the above steps...
  1. Set resolution, I'm using 100: my_resolution = 100
  2. Set palette end points, this starts with low values at blue and high values at red: my_palette = colorRampPalette(c('blue','red'))
  3. Get the max from your variable you want colorized to make the ratio: my_max = max(V(g)$my_var_of_interest, na.rm=TRUE)
  4. Create your vector of values which will determine the color values for each node. For me it was a ratio, so based on the max value: my_vector = V(g)$my_var_of_interest / my_max
    • Notice here we have iGraph's V(g)$var syntax.
  5. Create the vector of color values, based on your variable of interest and the palette end points and the resolution (how many steps of colors). This will give you a vector of color values with the correct color value in the correct location for your variables in your df-like object: my_colors = my_palette(my_resolution)[as.numeric(cut(my_vector, breaks=my_resolution))]
    • Ok let's explain that. Take my_vector, and bin it into a number of parts -- how many? That's set by the resolution variable (my_resolution). By "bin" I mean cut it up, divide it up, separate it into my_resolution number of intervals. So if I have 200 items, I am still going to have 100 colors because I want to see where on the spectrum they all fall. cut() returns a factor, so as.numeric turns each observation's bin into a plain bin number (1 up to my_resolution). Meanwhile, my_palette(my_resolution) returns a vector of my_resolution hex color values running from the start color to the end color. Indexing that vector of colors with the bin numbers picks out the right color for each observation, so you end up with the colors you want in the correct order.
  6. Draw! plot(g, vertex.color=my_colors)
    • Note that we aren't modifying the colors in the iGraph object, we're just assigning them at run time for plot(). We could assign them to the iGraph object and then draw the graph instead.
Done! Let's look at two of the resulting vectors (but you should be using RStudio of course so you can see them anyways), as when I did, it helped me understand what was going on.

So, my_vector is the vector of values for the variable of interest which determine the colors. They aren't the color values themselves, they are the positions on the scale which will get mapped to colors in the spectrum / gradient. (Note I have 1,019 observations in this data.)

my_vector   num [1:1019] 0.31 0.581 0.112 0.108 0.181 ...

So, we can see these are ratios and we know they're between 0 and 1 since that's how I set it up. (A percentage of the max value in this data.) These will map to the right colors in the gradient. Note we can change the gradient, either its start color, end color, or the resolution (how many steps), and this my_vector won't change. This my_vector gets mapped to the colors. What the colors in the gradient are depends on the start color, the end color, and how many steps in the gradient there are.

Then there is also my_colors, which have colors in hex! Exciting to see it work.

my_colors   chr [1:1019] "#4D00B1" "#92006C" "#1900E5" "#1900E5" ...

If you are great at mentally mapping hex RGB values to colors between blue and red to a percentage between blue and red (blue and red being the start [i.e. 0] and end [i.e. 1] points as determined in line 2 up above) you'll note that the values in my_vector do indeed map to the colors in my_colors which is cool. (You will notice all the middle two values, the green in RGB, are 00, since there is no green when you go from blue to red.) Note that the 3rd and 4th values in the hex list (my_colors) are the same, as they are mapping from 0.112 and 0.108, which are, when binned into 100 bins, both being approximated to, most likely, 0.11. Thus they have the same color value (which is 19 in hex of red, RGB or #RRGGBB, and E5 of blue, so E5 is out of FF max, so lots of blue and a little red, as they are both 11% of the way on the scale from the bottom (blue) end to the top (red) end. This makes sense.)

So, there you go.

# Set up resolution and palette.
my_resolution = 100
my_palette    = colorRampPalette(c('blue','red'))

# This gives you the colors you want for every point.
my_max    = max(V(g)$my_var_of_interest, na.rm=TRUE)
my_vector = V(g)$my_var_of_interest / my_max
my_colors = my_palette(my_resolution)[as.numeric(cut(my_vector, breaks=my_resolution))]

# Now you just need to plot it with those colors.
plot(g, vertex.color=my_colors)
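
The binning step can be tried on its own, without a graph. This is a minimal sketch in base R only: the five values in my_vector are made up for illustration, and the resolution is 10 rather than 100 so the gradient is easy to read. One thing to be aware of: when breaks is a single number, cut() divides the observed range of the data (min to max) into that many intervals, not the 0-to-1 scale. The last lines show an alternative with explicit breaks, which is an assumption about what you might want, not what the code above does.

```r
# Made-up ratios standing in for V(g)$my_var_of_interest / my_max.
my_vector     <- c(0.05, 0.5, 1.0, 0.108, 0.112)
my_resolution <- 10
my_palette    <- colorRampPalette(c("blue", "red"))

# my_palette is a function; calling it returns my_resolution hex colors.
gradient <- my_palette(my_resolution)

# cut() returns a factor of intervals; as.numeric() gives the bin number.
bins      <- as.numeric(cut(my_vector, breaks = my_resolution))
my_colors <- gradient[bins]

# Alternative (an assumption, not the post's code): fixed bins on the 0-1 scale.
fixed_breaks <- seq(0, 1, length.out = my_resolution + 1)
fixed_bins   <- as.numeric(cut(my_vector, breaks = fixed_breaks,
                               include.lowest = TRUE))
fixed_colors <- gradient[fixed_bins]
```

With the fixed breaks, a value of 0.5 always lands in the middle of the gradient, no matter what the smallest value in the data happens to be.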

Friday, April 3, 2015

R and Unlist Your List!

If you try to assign a list to a column in an R data frame, it won't quite work, you need unlist. (That's the short version for the search engine snippet, although it does not make for a great narrative intro, it's more the concise summary.)

A few days ago, I was working in R, and was generating a new data frame from another one. It was a little more complex than I was used to, for instance I had to bin one variable by the values of another variable, and make some new percentage/frequencies in the new, smaller, data frame, so I couldn't just use the non-looping approach that is common to R (and which is a lot nicer and faster). The data frame was relatively small, so one loop level was not a problem, even on my 4.5 year old MacBook Air.

In one section, I generated a list of values (numeric, nothing fancy), and then assigned that list to a column in my new data frame. When I called the data frame to look at it, it looked fine, but if I did str(my_df) or summary(my_df) something was horribly wrong -- the column wasn't a numeric column, it was some odd list format and wasn't working for my ggplot.

I tried assigning the generated values directly to the column in the data frame, with something like this inside the loop, where I also incremented i:

my_df[i, 'the_variable'] = one_generated_variable

(Note I can't use R syntax there with the less than sign, Google barfs on the code even though it's text, so I have to use the equals sign which is older R style.)

one_generated_variable was just a numeric value. Should have been fine, I thought! But no, it still came out as a list. I have no idea why, honestly it seems impossible since the values were generated one at a time and assigned then and there -- they were not bundled into a list first. But, unlist fixed the problem.

my_df$the_variable = unlist(my_df$the_variable)

That did it. I still don't understand the details, since I don't see why it was a list in the first place (aren't columns vectors anyways?). I have never run into that problem before, although mostly I've been working in Python lately.
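
For what it's worth, a data frame column does not have to be a plain vector; it can also be a list with one element per row. Here is a minimal sketch of one way a list column can appear (not necessarily what happened in the loop above) and how unlist flattens it. The data frame and its values are made up for illustration.

```r
# A small made-up data frame.
my_df <- data.frame(id = 1:3)

# Assigning a list (one element per row) creates a list column.
my_df$the_variable <- list(0.5, 0.25, 0.75)
was_list <- is.list(my_df$the_variable)

# unlist() flattens it back into an ordinary numeric vector.
my_df$the_variable <- unlist(my_df$the_variable)
now_numeric <- is.numeric(my_df$the_variable)
```

Running str(my_df) before and after the unlist line shows the difference between the two column types.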

Also, a friend put me onto data.table instead of data.frame for bigger data.

Sunday, January 25, 2015

Random R Notes - Factors, Rank v. Order, Unsplit

Some R issues I have run into recently...

I split a dataframe, then split it again, and the analysis was taking forever. Something was wrong. After inspection, the sub-split DF had all the factor levels from the original DF! Terrible since there were about 25,000 in one variable. It just took forever, which I don't think it should but hey I gave up on it.

I needed droplevels. You can apply it to the whole DF, removing unused levels from your old DF and then assign it (to a new one or just write over the old one).

your_new_df = droplevels(your_old_df)

(Google cannot handle the less than sign, used for "get" in R instead of =, with either the code tag or the pre tag, it just blows it up. Annoying.)
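
A small sketch of the leftover-levels problem, with a made-up three-level factor standing in for my 25,000-level one. The names old_df, sub_df and new_df are just for illustration.

```r
# Made-up data: a factor with three levels.
old_df <- data.frame(user  = factor(c("a", "b", "c", "a")),
                     score = c(1, 2, 3, 4))

# Subsetting keeps every level, even the ones no longer present.
sub_df        <- old_df[old_df$user == "a", ]
levels_before <- nlevels(sub_df$user)

# droplevels() removes the unused levels from every factor column.
new_df       <- droplevels(sub_df)
levels_after <- nlevels(new_df$user)
```

Here levels_before is still 3 even though only "a" is left in the rows, and levels_after is 1.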

Then I could run the ordering code on dates. But no: there is, I learned, order and there is rank. There is also sort but I managed to avoid that somehow, so I won't discuss that here.

Note that for rank you need to figure out what to do with ties! (That is, when values are equal, how to rank them exactly.)

There was a really great post about it on Stackoverflow but I can't find it at the moment. This post might help, though.

Or, I made a nice little example! I use R: as the start of the input lines since the greater than symbol and blogger are not friends.

R: the_list = c('A', 'D', 'B', 'C')

R: order(the_list)

[1] 1 3 4 2

R: rank(the_list)

[1] 1 4 2 3


So, you see the two outputs are different.
Order says, put the first element first, then the third element would come next (B), then the fourth element (C), then the second element (D).
Rank says, the first element is first, the second element (D) is the fourth of them all, the third element (B) is the second overall, and the fourth element (C) is the third overall.

Edit: I called sort!

R: sort(the_list)


[1] "A" "B" "C" "D"


That's awesome.
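
On the ties question for rank, here is a quick sketch using the ties.method argument of base R's rank(). The scores are made up, with one tie in the middle.

```r
# Made-up values with a tie (the two 20s).
scores <- c(10, 20, 20, 30)

rank_average <- rank(scores)                         # default: ties share the mean rank
rank_min     <- rank(scores, ties.method = "min")    # ties all get the lowest rank
rank_first   <- rank(scores, ties.method = "first")  # ties broken by position
```

The default gives 1, 2.5, 2.5, 4; "min" gives 1, 2, 2, 4; and "first" gives 1, 2, 3, 4.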

So after that, I wanted to unsplit. But, no, I had added the rank column, so instead of 400 rows I only got 100 (I had 100 df's with 4 rows each). Unsplit does not work well (or at all?) if you add (or subtract?) items. So stackoverflow told me I needed do.call and rbind ("row bind").

rejoined_df = do.call(rbind, splitted_df)


Note that the splitted_df is the result of the split() call, which I'm not showing, so it is not actually a DF: split() on a data frame returns a list of DFs, one per group. But you can hand it directly to do.call, and if you use split you should familiarize yourself a bit with the resulting object.
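
Here is a minimal sketch of the whole round trip: split, add a rank column to each piece, and stitch the pieces back together. The data and the column names (group, when, when_rank) are made up; only split, rank, do.call and rbind come from the notes above.

```r
# Made-up data: two groups with two rows each.
the_df <- data.frame(group = rep(c("g1", "g2"), each = 2),
                     when  = c(30, 10, 5, 20))

# split() returns a named list of data frames, one per group.
splitted_df <- split(the_df, the_df$group)

# Add a rank column inside each piece.
splitted_df <- lapply(splitted_df, function(piece) {
  piece$when_rank <- rank(piece$when, ties.method = "min")
  piece
})

# Row-bind all the pieces back into one data frame.
rejoined_df <- do.call(rbind, splitted_df)
```

The result has all four rows plus the new column, which is what unsplit would not give me.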

There you have it, some random R notes.

Wednesday, October 23, 2013

R and Regex Named Matches

I use Python and R to do stuff, Python for web scraping and text clean up, R for the analysis. But people have expanded the functionality of the two, and they are overlapping (it's hard enough not to get them confused as it is, some of the time). I found I needed to use named groups in regex in R, and... couldn't figure it out. The web did not help.

SHORT VERSION: Turn on the Perl regex style (perl = TRUE) and go read some Perl regex pages, you'll be fine. Name the match: (?<name>...), to match it later: \\g{name}
This is completely different from what I was used to.
Google blogger will try to blow this post up as I want to have greater than and less than symbols. Yeah they can't get that right. Oh maybe it's working.

Typically, if I use regular expressions it's in Python, but R can do it too, and sometimes you'll want to do that. But there isn't a ton of help online about it (despite the links I have lined up to include below) and there are some things that confuse the issue (to Perl or not to Perl...).

If you just want to do some work with strings, first check out Hadley Wickham's stringr. It's awesome.

I, however, wanted to do some pattern matching that included a repeated section, so I needed regex's named group functionality, which I couldn't find or figure out in stringr. I was looking for patterns like this:

5,-1,5,-1,5

...where 5 could be any number between 0 - 500 or so, but it would repeat. I had already removed spaces and added commas for easier parsing. (So other matches would be, like, 17,-1,17,-1,17.... etc.) So I needed to make sure the first match there was repeated, thus, named groups (or any group capture really, but I wanted to name it).

But I also couldn't figure it out in R. I can do it in Python, but the Python code for regex wouldn't work in R, alas. It was not clear what changes needed to be made.

One reason was that the \ needs to be escaped, that is, \\. So for example, \d+ needed to be \\d+. That wasn't too hard to figure out. But the rest was.

You can have Perl style, or POSIX, or not. Uh, what? No idea! I just needed it to work. Specifically, named groups in R. I found this page which said "Named subpatterns... are not covered here." Hmm. Another page said how "examples for the use of regex in R are rather rare" and had some useful examples. Eventually I figured I would set the Perl option and see what I could do; at least I could search on "perl", and that made all the difference as I could find out how to do named groups in Perl-style regex and there you go.

Name a group in Python: (?P<name>...)    
Name a group in R, Perl style: (?<name>...)

Note: I expect the less than and greater than symbols fail at some point.

Reference it later in Python: (?P=name)
Reference it later in R, Perl style: \\g{name}
    So curly braces (Perl?), and double backslash for R.
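
Putting the naming and the back-reference together for the 5,-1,5,-1,5 pattern, here is a minimal sketch. The group name num, the ^ and $ anchors, and the three test strings are my choices for illustration, not the exact pattern from my analysis.

```r
# Name the first number "num", then require the same number twice more.
pattern    <- "^(?<num>\\d+),-1,\\g{num},-1,\\g{num}$"
candidates <- c("5,-1,5,-1,5", "17,-1,17,-1,17", "5,-1,6,-1,5")

# perl = TRUE turns on the Perl-style syntax that allows named groups.
is_repeat <- grepl(pattern, candidates, perl = TRUE)

# regexpr() with perl = TRUE also reports where each group matched.
m           <- regexpr(pattern, candidates[2], perl = TRUE)
group_names <- attr(m, "capture.names")
start       <- attr(m, "capture.start")[1, 1]
len         <- attr(m, "capture.length")[1, 1]
repeated_number <- substring(candidates[2], start, start + len - 1)
```

The first two strings match and the third does not, since its middle number is 6 rather than 5.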

Some useful Perl-regex links:
http://modernperlbooks.com/books/modern_perl/chapter_06.html
http://perldoc.perl.org/perlre.html

Although honestly one problem I have with a lot of online examples (and the R help files) is that they are completely arcane. If I'm looking for help with syntax, a complex example isn't going to solve it, that's bad usability.


Post Keywords: regex, R, r-project, cran, grep, regular expressions, named groups.

Monday, October 21, 2013

R and head() and tail()

So, if you're not careful using R's head() and tail() commands, you'll end up with a little surprise. Perhaps I should say not careful reading the documentation.

head() and tail() do not return just one item from the list (or whatever), they return several. So head does not mean first, and tail does not mean last.

Read carefully: "Returns the first or last parts of a vector, matrix, table, data frame or function" [Italics added.] PARTS. Plural. An 's' on there.

Example:

our_list = list(3, 7)       # Makes a list with two items, the first is 3, the second is 7. Numbers (doubles, strictly; 3L would be an integer).


Note I used "=" since if I use a "less than" bracket, Google blogger freaks out. (The typical R code is "less than" followed by a dash, which make an arrow, representing "gets", the left side gets [is given] the right side.) It's giving me a hard time with formatting as it is.
If you type in "our_list", R will print our_list:

[[1]]
[1] 3

[[2]]
[1] 7

So, the first item in our_list [[1]] has one item [1], which is a 3.
The second item in our_list [[2]] has one item [1], which is a 7.

I don't fully understand the difference between [[x]] and [x], it seems mysterious.
Edit: The R Inferno, 8.1.54.... aha. Still is mysterious, though.

If you type:
head(our_list)


...it would be nice to get just the head, that is, the first item. But no, you get the whole list (since the list is small you get the entire list; larger lists would only return the first few items, six by default).

What you want is:
head(our_list, n=1)

...where the 'n' gives the value of how many items you want. (You don't actually need the n=, I have noticed.)
When I try "n=3" for this two item list, it just gives the first two items (i.e., the entire list in this case) and does not give an error.
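
A small sketch of what comes back in each case, using the same our_list. It also shows the [x] versus [[x]] difference in passing: single brackets give you a (shorter) list, double brackets give you the item itself. The variable names on the left are just for illustration.

```r
our_list <- list(3, 7)

# head() with n = 1 still returns a list, just a list of length one.
first_as_list <- head(our_list, n = 1)

# Single brackets do the same thing: a sub-list.
also_a_list <- our_list[1]

# Double brackets pull out the item itself, here the number 3.
first_value <- our_list[[1]]

# tail() works the same way from the other end.
last_as_list <- tail(our_list, n = 1)

# Asking for more items than exist just returns everything.
too_many <- head(our_list, n = 3)
```

So if you want the first value itself rather than a one-item list, our_list[[1]] is the one to reach for.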

Note I made our_list have [[1]] == 3 and [[2]] == 7 since far too often [[1]] == 1 and [[2]] == 2 and really people that's just not clear. If you're trying to make a useful example, don't make it where the same symbols (1, 2) are being used to represent widely different things.

Also, Googling for info on R's "by" command is just impossible, as "r by" is not a specific enough search string (in-context it's fine though). That's why I like books (yes paper) sometimes, if the index is any good, there you go.

Sunday, October 6, 2013

R For Loops Indexing

R does something a little unexpected -- well, unexpected to me -- with the indexing of the for loop (and maybe this is more general, I don't know).

If you have... (note I can't use "get" with the arrow made of brackets, Google does not parse that in terms of HTML and it kills the code...)

n = 5
for (i in 1 : n+1) {
    print(i)    # do stuff
}

...the index is 2 to 6, not 1 to 6. The colon operator binds tighter than the plus sign, so 1:n is built first and then the +1 gets added to every element.
So it's exactly (i in (1:n) + 1).

What you need is....

for (i in 1 : (n+1)) {....
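
You can see the difference without a loop at all, just by looking at the vectors. A minimal sketch; the variable names are made up, and seq_len() is an alternative that sidesteps the precedence question entirely.

```r
n <- 5

without_parens <- 1:n+1        # parsed as (1:n) + 1
with_parens    <- 1:(n+1)      # what I actually wanted
with_seq_len   <- seq_len(n + 1)
```

Here without_parens is 2 3 4 5 6, while with_parens and with_seq_len are both 1 2 3 4 5 6.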

This is related to off by one errors (humorous explanation and more serious explanation), but I certainly didn't expect it to be parsed like that.

Monday, November 19, 2012

R and Date Formats

There are many ways to represent dates in computer languages. You can have something human-readable, with a year, month, and day, in some order and some combination of text and numbers.

  • 2012-11-19, year month day.
  • 11/19/12, US month day year.
  • 11/12/11, Nov 12 US or Dec 11 EU.
  • Nov-19-2012, a combination of text and numbers, to a human.
  • "2012-11-19", which is probably text to a computer given the quotation marks, but numbers to a human.
  • 1353340800, a big number that means nothing to most people, but is today at 4pm in seconds since Jan 1, 1970. This is a standard time format, I've found.
We will gloss over leap years, leap seconds, and different historical calendar systems, but it's all a fascinating and nuanced topic.

But I ran into a huge (yet small) problem with time formats in R, and trying to convert from seconds since 1-1-1970 to some current dates, where I had 45,000 data points. If I tried to analyze them, they ended up as 45,000 different points and didn't aggregate at all. Hard to visualize. So I wanted to aggregate them all by month, and instead of 45,000 points I would have, say 120 (10 years, 12 months each). I'm taking averages, and this approach makes sense. This turned out to be harder than I thought -- essentially I was just trimming everything but the year and month (days, hours, minutes, seconds). Simple!

Edit: I'm including a screen shot of the code, so Google doesn't destroy it (due to the characters in it and how the R code is interpreted by the Google HTML parser).
"the_df" is the data frame.
"cm" is creation month.
"bday" is the seconds since 1-1-1970.


The "as POSIX" call converts the number of seconds as a number to number of seconds as time.
The "strftime" strips off the day (off the end) and makes it a text string (string-from-time, I think).
The paste is the key part, it won't work without it, you have to add the day back on (the 1st of the month for this analysis). I think this makes it a string value (not a date).
Then the "as date" call converts it to a date from a text string.
(Or something generally like that.)

The point is, if you use "as.Date" in R, you have to hand it a day. I tried just handing it year-month, but that ends up as NA, even though I can't find anything that says that kind of thing shouldn't work.
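
Since the screen shot doesn't travel well, here is a rough sketch of those four steps as text. It is a reconstruction from the description above, not a copy of the original code: the names the_df, cm and bday are from the post, the second and third timestamps are made up, and the UTC time zone is an assumption to keep the example reproducible.

```r
# bday is seconds since 1970-01-01; the first value is the one from the list above.
the_df <- data.frame(bday = c(1353340800, 1353427200, 1356998400))

# 1. Seconds as a number -> seconds as a time.
the_times <- as.POSIXct(the_df$bday, origin = "1970-01-01", tz = "UTC")

# 2. Keep only year and month, as a text string.
year_month <- strftime(the_times, format = "%Y-%m", tz = "UTC")

# 3. Paste the day back on (the 1st of the month), still a text string.
with_day <- paste(year_month, "-01", sep = "")

# 4. Text string -> Date.
the_df$cm <- as.Date(with_day)
```

The first two rows both end up as 2012-11-01, so they now aggregate into the same month, and the third becomes 2013-01-01.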



Monday, August 20, 2012

R and Outputs and Printing and Loops

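First off, R is really similar to Python and it is making some things easier and some things more difficult.

But, R apparently won't print in loops. This is sort of weird. (What's going on, as far as I can tell: the console auto-prints the value of an expression only when you run it at the top level; inside a loop or a function, nothing gets printed unless you ask with print().)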

So if you have...

for (i in 1:5) { summary(my_lm) }

...that won't output to the screen, but if you then just select the "summary(my_lm)" part and run that it prints fine like it should albeit once (perhaps that's a hint of sorts).

So you need...

for (i in 1:5) { print(summary(my_lm)) }

...which works like summary(my_lm) works when it's not in a loop.
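
A tiny sketch that shows the difference without needing a model. It uses capture.output() to collect whatever would have been printed to the screen; the i * 10 expression is just a stand-in for summary(my_lm).

```r
# Inside a loop, a bare expression is evaluated but never printed.
silent <- capture.output(for (i in 1:3) i * 10)

# Wrapping it in print() makes the output appear.
shown <- capture.output(for (i in 1:3) print(i * 10))
```

Here silent comes back empty, while shown holds the three printed lines.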

Update: Apparently the same thing is true for two other conditions! One, I believe, is for outputs in general, so includes if you are trying to output to a file (or at least save to a file). So if you are trying to do something like the following generic example:

pdf("my_filename.pdf")
plot(x, y)
dev.off()


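...generally, that works -- unless it's in a loop! (This applies to plots that are objects drawn by printing them, like lattice or ggplot2 plots; base plot() draws directly and shouldn't need the print.) Again, you need to do: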

pdf("my_filename.pdf")
print(plot(x, y))
dev.off()


Note I am assuming you know how to do the whole pdf() or maybe there's jpg() or whatever with dev.off() at the end there.

Two, is that the output (to screen or file) call doesn't have to be in the loop itself, not exactly. It can be in a function which is called by a loop, and R notices it, so again you need to explicitly call "print()" with whatever you were trying to do that works fine when not in a loop. 

So to be clear, this does not work:

make_Plots <- function(ii) {
    # Makes the plots, saves to PDF (no it doesn't!)
    my_file <- paste("My_filename_sequence_", ii, ".pdf", sep="")
    pdf(my_file)
    plot(x, y)
    dev.off()
} # End of make_Plots 


# Main, as it were.
for (i in 1:10) {
    make_Plots(i)
}



So you see in the function call, we change the output device to be the PDF device (not the screen or anything else). So, when we call "plot" and then close the PDF device with "dev.off()" the plot call gets outputted into a PDF file with the specified name. But it doesn't, since this is an output/print call that is in a loop -- even though the loop is in the main and the output call is not!

Again, you need print(), like so:

make_Plots <- function(ii) {
    # Makes the plots, saves to PDF (this one does!)
    my_file <- paste("My_filename_sequence_", ii, ".pdf", sep="")
    pdf(my_file)
    print(plot(x, y))
    dev.off()
} # End of make_Plots 


# Main, as it were.
for (i in 1:10) {
    make_Plots(i)
}


So, I believe if you were to call the first make_Plots (no print) while control is not in a loop, it would work fine. But if you then call this otherwise fine, working, tested function from inside a loop, it will stop working. That is insanely annoying, but I assume there is some reason for it. Sort of addressed in this FAQ, but just briefly.

Wednesday, July 25, 2012

Because You Should Use R

A colleague asked me about starting with R, which is not the easiest thing to do unless you know the ideas behind object-oriented programming (for instance, you don't open files, you load a spreadsheet or CSV into a data object, probably a dataframe -- this is not Excel or SPSS where you have the file open and staring you in the face).

I came up with a list of intro books I have so far found useful, besides help from friends and all of the awesome help online.

There are some good books by the publisher Springer, in its "Use R!" series and related books, with nice matching spines for if they are ever shelved (so far mine are all desk).

Also, here's a useful intro PDF a friend of mine put me onto, it's by John Verzani, if you prefer digital to paper.

Springer

A Beginner's Guide to R
www.amazon.com/Beginners-Guide-Use-Alain-Zuur/dp/0387938362/

Introductory Statistics with R
http://www.amazon.com/Introductory-Statistics-R-Computing/dp/0387790535/

For graphing you will use the excellent ggplot (aka ggplot2) package.
ggplot2: Elegant Graphics for Data Analysis
www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis/dp/0387981403/

If you are an SPSS or SAS user, a Rosetta Stone:
http://www.amazon.com/SAS-SPSS-Users-Statistics-Computing/dp/1461406846/

Or if you are a Stata person instead:
http://www.amazon.com/R-Stata-Users-Statistics-Computing/dp/1441913173/


And some non-Springer books too.
I have liked both of these a great deal.

Using R for Introductory Statistics
http://www.amazon.com/Using-Introductory-Statistics-John-Verzani/dp/1584884509/


R in a Nutshell
http://www.amazon.com/Nutshell-Desktop-Quick-Reference-OReilly/dp/059680170X/

That should get you started!

Thursday, June 14, 2012

Mixed Shapes/Colors in R's ggplot

R is easier to learn than Dwarf Fortress, and just as awesome, although it is for statistical analysis and visualization, not beer-swilling dwarf sim fans. But I had a little problem when trying to get both different shapes and colors, at the same time, in a scatterplot in ggplot.

(Ok I went back to edit this and Google destroyed it, sometimes blogger is ok with greater- and less-than, other times, it will destroy the entire post.) But, what was going on? I knew you could do both the color and the shape simultaneously, as there is an example of both shape and color being user-set in the ggplot2 book by ggplot's creator, Hadley Wickham, but there was no code sample (page 112, figure 6.14).

Turns out you have to "activate" the attributes you want to set manually!

I had a hard time finding and understanding the extensive and somewhat diverse help sources, so thought I'd put this up.

Here is the final working code -- I had to go back into my files to get it. Note I'll use the equals sign here (so the post doesn't blow up again).

my_graph = ggplot(my_csv_data) + 
    geom_point(aes(my_x_value, my_y_value, color = Desc, shape = Desc)) +
    scale_colour_manual(name = "Legend Name", values = c("This" = "black", "That" = "red", "Also" = "grey33")) +
    scale_shape_manual(name = "Legend Name", values = c(1, 2, 5)) 

The comment I have after that code:
You need BOTH color = and shape = in the aes call or else the later scale calls won't work for whichever you don't have!