Wednesday, January 7, 2009

Of Surveys and Statistics

Two items today: a very bad online survey (sadly, most surveys I take online have problems), and the statistical package R making the NYTimes!

Let's start with R. You can read about R in the NYT story, or on Wikipedia, or go to R's homepage itself.
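If you have never seen R, here is a tiny taste of my own (not anything from the NYT piece), using the mtcars data set that ships with R:

    # Regress fuel economy (mpg) on car weight (wt) using R's built-in mtcars data
    fit <- lm(mpg ~ wt, data = mtcars)
    summary(fit)  # coefficients, standard errors, R-squared, p-values

Two lines, and you have a full regression with diagnostics. Not bad for "freeware."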

But what I want to point you to (besides R's open-source status) is the horrible quote in the NYT article from a marketing shill at SAS, a competing statistics package (as are SPSS and Stata).

...Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

The implication: if it is free, it is bad. This is FUD, and it is so annoying. She offers no facts about R to back up her claim; all we actually learn is that she is badmouthing a competitor. She must be concerned, which lends R legitimacy when she was trying to do the opposite. And if SAS is paying her to market SAS, why is she marketing R? (It's always a bad day when I can do marketing better than a marketer.)

In survey news, I took an online survey this morning -- it's fun to see what they are asking about, and (especially for me) how they do it. Often they do it badly.

Today's inbox offering turned out not to be a survey but a pre-survey: do you qualify to take the survey? That's annoying. Don't make your sample jump through hoops. Given that the emails only go out to people who signed up (and who have an Internet connection), the sample is already rather small and, statistically speaking, unrepresentative.

One question in the pre-survey (this is important for the story) asked which brands of jeans I had heard of. It was a large list, and in many cases I had heard of the brand but didn't know it made jeans (not important to the story, just something I found interesting). But there was a list. Keep that in mind.

So, after saying "no!" to the common "do you or anyone in your household work in PR, marketing, advertising, or market research" question (you know that if you say yes, they don't let you take the survey), I eventually got to the real questions. One other important tidbit: the pre-survey and the survey were run by different companies.

The survey asked about purchasing habits and jeans, and its authors made lots of bad assumptions. One was assuming I remember how much I paid for my recent jeans purchases. I have no idea. None. Seriously: if I like a pair (and price is a factor), I buy it (assuming I am actually shopping for jeans, that is). Once I notice that the price is acceptable, I don't need to know the price anymore, ever. It is completely irrelevant. This is a serious error in a lot of online surveys.

Another problem: they assumed I know where I purchased the jeans. I know I bought Gap jeans at the Gap, but I couldn't remember where I bought the Levi's (there are a million stores here in NYC). Oh, wait, I got them at the Levi's Store, maybe. I think so. I don't know exactly; it isn't important. (The war in Iraq, the global economic meltdown -- those are important. Where I bought some pants? Not a big deal, really.)

Another problem (am I using that phrase too much?) arose when the real survey asked what brands of jeans I had heard of. This was the exact same question I had seen in the pre-survey. Yes, the two surveys were from different companies, but the list seemed identical, and the two companies were coordinating anyway because of the pre-survey. Technically, I had heard of all of the brands just a minute previously! When you create a survey, you need to be aware of question ordering, and of what facts (or not) you prime the respondent with. For example, if you mention to respondents that Gerald Ford was photographed falling down the steps of an airplane, some might think he was clumsy while others might feel sympathy for him; if you mention that he went to UM, people who are pro-UM might feel more positively about him, while OSU and MSU people might feel more negatively -- might.

The other issue with the second brand question was that its HTML layout was terrible compared to the pre-survey's version. In the pre-survey, you could click a "yes" box if you had heard of the brand and otherwise ignore the entry. In the actual survey, there was a yes button and a no button for every entry, and there were about 30 brands. I don't want to click yes or no 30 times; how about I just click yes for the 7 I know? Don't make it difficult for your respondents to answer your questions.
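To make the difference concrete, here is a rough sketch of the two layouts in minimal HTML (the brand names are placeholders, not the survey's actual list):

    <!-- Pre-survey layout: one checkbox per brand; skip the ones you don't know -->
    <label><input type="checkbox" name="brand" value="levis"> Levi's</label>
    <label><input type="checkbox" name="brand" value="gap"> Gap</label>
    <!-- ...and so on down the list... -->

    <!-- Survey layout: a yes/no radio pair for every brand, each demanding a click -->
    Levi's: <label><input type="radio" name="levis" value="yes"> yes</label>
            <label><input type="radio" name="levis" value="no"> no</label>
    <!-- ...29 more rows like this... -->

The checkbox version costs the respondent 7 clicks; the radio version costs 30, and punishes honesty with tedium.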

What really irritated me, from a methodological point of view, was the following question. Let me first point out that they (whoever "they" were) built the survey so you couldn't copy and paste (way too complex HTML), but luckily there is always the "view source" option.
Q: Which two of the following categories are most important to you, meaning you would not be willing to cut your spending if you needed to spend less on something?

The categories were mundane: apparel, personal electronics, dining out, home improvement, sports, and the like. The problem is the assumption that their categories include two things on which you, the respondent, would not be willing to reduce spending. That is a horrible assumption, and it is flat-out wrong. With incorrect questions and incorrect answer sets, you end up with incorrect data. If the economy is bad enough, and it is, I'll cut spending across the board! By forcing the answer set on you, they think they are forcing you to reveal your deeply hidden inner truths. In some survey circumstances (far removed from this type of question), that may be appropriate -- for instance, asking a 4-point question (very yes, somewhat yes, somewhat no, very no) and not mentioning "not sure" while still accepting it as an answer (this usually happens in face-to-face or telephone interviews; it doesn't work with fixed online surveys).

"Most important" does not equal "not be willing to cut your spending."

And why two?
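To see how much the forced format can manufacture out of nothing, here is a toy simulation of my own in R (invented numbers, purely illustrative): 1,000 respondents who would happily cut spending in every category, forced to name two anyway.

    set.seed(1)
    categories <- c("apparel", "personal electronics", "dining out",
                    "home improvement", "sports")
    # Everyone picks two categories at random -- no real preference exists
    picks <- replicate(1000, sample(categories, 2))
    round(table(picks) / 1000, 2)
    # Each category comes out looking "uncuttable" to roughly 40% of
    # respondents, even though, by construction, nobody holds that view.

The output looks like meaningful preference data; it is noise in the exact shape of the answer set.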

Update: About forcing answers/opinions... From Bennett and Iyengar, in the Journal of Communication, 58(4). They mention "...news-driven polling that pushes people with no basis for having opinions into opinion expression..." That's what I'm talking about.