Ah, yes, memories of Stats...
Tannin said:
It's a self-selecting sample [...] you must have an appropriate sample selection method. To try to do otherwise achieves one thing only: it demonstrates your scientific incompetence beyond all doubt. [...] your original data remains meaningless, as does any conclusion you draw from it.
Indeed, you are correct, but in the real world we're often presented with imperfect data collected via flawed sampling and have to make do with it, drawing tentative conclusions where we can. IMO, it's better than nothing. (Blasphemy coming from the mouth of a former science major, but I have become more pragmatic these days.)
Yes, you want a random sample, not a self-selecting one, but that's just not going to happen with a survey on the web. Why would I allow a self-selecting sample? Let's think about why this is a theoretical no-no -- the participants may not be representative of the population, and they may be actively trying to bias the data. Those are the two major issues that come to mind.
Why would it not be representative? With large enough numbers of participants, I think we have enough owners of all flavours of drives to get a representative cross-section. Barring some kind of scandal like "WD hires child labourers and supports the Colombian drug cartels", I don't think you would get an over-representation of any particular drive or manufacturer (except, of course, IBM's 75GXP -- but we know that already). It's not like there is an SCO or a Microsoft in the survey, where people would be motivated to participate more than they normally would and answer maliciously to skew the results.
Indeed, I wouldn't trust a survey like this to produce anywhere near accurate results (which I think is what you are hung up on)... but the important point is that any biasing is likely to affect all drives from all manufacturers fairly equally. I don't care that the reliability results for all drives are off by a factor of ten as long as all drives are more or less off by a factor of ten. It's the relative differences I care about. And I think that even the SR reliability survey preserves enough of these differences to be able to say that the X15 (98th percentile) is more reliable than the Maxtor DM+9 (21st percentile).
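To make the "off by a factor of ten" point concrete, here is a rough Python sketch. The drive names and failure rates are made up purely for illustration (they are not from the SR survey), and the `rank` and `biased` helpers are hypothetical. It only shows that a bias applied equally to every drive leaves the ordering intact, while an uneven bias can scramble it.

```python
# Hypothetical "true" failure rates; invented numbers, not survey data.
true_rates = {"drive_a": 0.01, "drive_b": 0.03, "drive_c": 0.08}


def rank(rates):
    """Return drive names ordered from most to least reliable (lowest rate first)."""
    return sorted(rates, key=rates.get)


def biased(rates, factors):
    """Multiply each drive's rate by its own over-reporting factor."""
    return {name: rate * factors[name] for name, rate in rates.items()}


# Every drive over-reported by the same factor of ten: ordering is preserved.
uniform = biased(true_rates, {"drive_a": 10, "drive_b": 10, "drive_c": 10})

# Only one drive heavily over-reported: ordering changes.
uneven = biased(true_rates, {"drive_a": 10, "drive_b": 2, "drive_c": 1})

print("true:   ", rank(true_rates))
print("uniform:", rank(uniform))
print("uneven: ", rank(uneven))
```

The whole argument rests on the survey looking like the `uniform` case. If owners of one particular drive are far more motivated to report failures (the 75GXP situation), you are in the `uneven` case and the rankings can no longer be trusted for that drive.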
If faced with a purchasing decision between two otherwise similar drives (price, capacity, performance) and I see that one drive is in the high 80s by percentile and the other is in the low 30s, I would probably lean towards the higher-rated drive. That's a big enough difference that even a self-selecting sample couldn't invalidate it. Now, given two drives in the same decile? Who would care?