Which components for a basic, high-reliability system?

honold

Storage is cool
Joined
Nov 14, 2002
Messages
764
well i should thank my lucky stars for not needing to view real/wmp9 files :)

i use wmp6.4 on xp (kudos ms for leaving that in), and quicktime when i need it. never use it enough to crash.

but i do derail threads with internet explorer!
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
I rarely have Quacktime crash on me; what I usually get is a crash in some other app after I've been using Quacktime. It's pretty consistent, and fairly easy to reproduce. At various stages, probably dependent on the particular configuration I had at the time, the QuackCrashes have been in Internet Explorer, Age of Empires, or ... was it Opera? Or Mozilla? Maybe it was both. I have sometimes wondered if it was really a video driver issue (Matrox G450), and I usually have reasonably recent versions, but in the end it's easier just to remember not to use Quacktime if I'm planning on doing anything important (such as playing Age of Empires).
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
honold said:
if you want to prove me wrong on the dell thread, do it on the dell thread.
Huh? No-one has mentioned Dell in this thread. Do you think people are arguing with you just to score off you or something? :roll:

You said: eliminating moving parts will always increase reliability as long as the fanless parts are of equal quality to the fanned ones ... if i truly care about reliability, i don't use moving parts

What you are saying is an engineering rule of thumb, but that's all it is. It's no substitute for good design. There are plenty of examples of machines that could be built with far fewer moving parts, yet the designers discarded that approach. You need to look at all the points of failure, not just the moving ones.
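The "look at all the points of failure" argument can be sketched with a toy series-reliability model. Every number below is invented purely for illustration; the point is only that removing a moving part doesn't help if the parts that remain are weaker:

```python
# Toy series-reliability comparison (all probabilities are made up).
# In a series model the system fails if ANY component fails, so annual
# survival is the product of the per-component survival probabilities.

def system_survival(component_survival_probs):
    """Probability the whole system survives the year (series model)."""
    p = 1.0
    for s in component_survival_probs:
        p *= s
    return p

# Fanned PSU: good electronics (0.995) plus a fan that might die (0.97).
fanned = system_survival([0.995, 0.97])

# Fanless PSU: no fan, but hotter, harder-worked electronics (0.95).
fanless = system_survival([0.95])

print(f"fanned:  {fanned:.4f}")   # ~0.9652
print(f"fanless: {fanless:.4f}")  # 0.9500 -- fewer moving parts, lower reliability
```

With these (hypothetical) numbers the fanned unit wins despite having an extra moving part, which is exactly the "rule of thumb is no substitute for design" point.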

That was my point, pure and simple. As I said, "Fanless doesn't necessarily mean higher reliability ... I'm particularly disputing your suggestion that fanless power supplies are more reliable than actively cooled ones."

I don't understand why you got your knickers in a knot. :)
 

honold

Storage is cool
Joined
Nov 14, 2002
Messages
764
time said:
What you are saying is an engineering rule of thumb, but that's all it is. It's no substitute for good design. There are plenty of examples of machines that could be built with far fewer moving parts, yet the designers discarded that approach. You need to look at all the points of failure, not just the moving ones.
i never said it was a substitute for good design. i 'ignored' other points of failure because they are categorically irrelevant. perhaps next time i'll do the right thing and suggest he buy a psu whose caps won't leak?

i never said anything about other points of failure, but it's pretty obvious that as long as one operates in a cool environment and uses quality power sources (preferably with a ups) the only way to mitigate component failure is to reduce dependency on moving parts and buy quality hardware.
That was my point, pure and simple. As I said, "Fanless doesn't necessarily mean higher reliability ... I'm particularly disputing your suggestion that fanless power supplies are more reliable than actively cooled ones."
i'll put decent external dc against decent internal ac anytime. i would love to wager on it.
 

CougTek

Hairy Aussie
Joined
Jan 21, 2002
Messages
8,726
Location
Québec, Québec
I disagree with Honold. And I'm the voice of reason on this one. Three years ago, I was working with the development team of a certain telecommunications multinational and we were working on its next-generation, bleeding-edge product. It was supposed to be the next piece of hardware to serve as a backbone for the Net. Design life was targeted at 25 years. It was as big as a refrigerator and glowed red inside from the heat when under full load. And it was full of high-speed fans too.

"Fewer mechanical parts equals better reliability" is dead wrong. Component life is mainly determined by the temperature at which components run and the amount of power they dissipate relative to the amount they were designed to dissipate. There are plenty of VCRs out there without fans that blow up after a year and a half. Yet there are military parts lasting decades without fans too. Why? Design tolerance. Use a component like a resistor or a transistor in a place where it will dissipate on average ~75% of its maximum dissipation limit and it will last two years. Replace that component with another designed to sustain 10x the load and it will last 25 years. That's why cheapo electronic devices die quickly: they use lower-grade components (which are way cheaper too) that run close to their operating limit. BTW, fans have design tolerances too and obey the same rule.
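That derating rule of thumb can be turned into a back-of-the-envelope model. The sketch below is a hypothetical power-law fit to the two figures quoted in the post (~75% of rating → ~2 years, 10x headroom → ~25 years); real derating curves are usually Arrhenius-style and temperature-driven, so treat this as illustration only:

```python
import math

# Toy power-law derating model: life = A * stress**(-n),
# where stress = actual dissipation / rated dissipation.
# Calibrated to the two data points in the post (assumed values):
s1, t1 = 0.75, 2.0     # part run at 75% of rating: ~2-year life
s2, t2 = 0.075, 25.0   # 10x-derated part (~7.5% of rating): ~25-year life

n = math.log(t2 / t1) / math.log(s1 / s2)  # fitted exponent, ~1.10
A = t1 * s1 ** n                           # scale constant

def life_years(stress):
    """Estimated life for a part run at `stress` fraction of its rating."""
    return A * stress ** (-n)

print(f"exponent n = {n:.2f}")
print(f"life at 50% of rating: {life_years(0.5):.1f} years")
```

By construction the curve passes through both of CougTek's quoted points, so it just interpolates his claim rather than proving anything about real components.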

Fans only help to lower the average operating temperature. They are a means to an end, not an end in themselves. Sure, a CPU fan that dies will cause a system failure (doesn't mean it will kill the CPU though, just lock it). But if your fanless Epia board uses components that are closer to their load limit, take my bet: it will die, not just lock and reboot, before the Athlon or Pentium 4 box will.

Blown capacitors or overheating resistors, in my experience, caused something like five times as many PSU failures as fan failures.

IIRC, the 75GXP fiasco was caused by chips on the PCB, not the glass platters, GMR head or the actuators.

Enough to make my point?
 

honold

Storage is cool
Joined
Nov 14, 2002
Messages
764
CougTek said:
Enough to make my point?
sure. it's a fantastic point. too bad we're talking about pcs.
Sure, a CPU fan that dies will cause a system failure (doesn't mean it will kill the CPU though, just lock it)
dead athlon cpu fan = dead athlon cpu. the system will lock and the load will significantly decrease, but they will still self-immolate doing nothing at all.
But if your fanless Epia board use components that are closer to their load limit, take my bet it will die, not just lock and reboot, before the Athlon or Pentium 4 box will.
emphasis added on 'if'. so that's an 'if'. wow! very interesting! here's another reliability consideration: if an epia board used components closer to their load limit, yet a grand piano fell on the intel/amd-based pc, the epia board would outlast it! stuff to think about when making your next purchase...

hard drives and fans (along with the components which depend upon them) fail more often than any other component in a pc. reducing dependency on these items (which should be done reasonably) will increase reliability. 'and i'm the voice of reason on this one.'
 

honold

Storage is cool
Joined
Nov 14, 2002
Messages
764
before this one gets yanked out of context, i didn't mean 'we're talking about pcs' to say that cougtek's points (however irrelevant) were at all invalid. rather, that they don't matter in this thread, as such factors cannot be reasonably determined by consumers for pc components.
 

honold

Storage is cool
Joined
Nov 14, 2002
Messages
764
CougTek said:
Blown capacitors or overheating resistors, in my experience, caused something like five times the amount of PSU failures than fan failures.
did you determine this by verifying fan functionality on every dead psu you encountered? dead fans promote such symptoms.
IIRC, the 75GXP fiasco was caused by chips on the PCB, not the glass platters, GMR head or the actuators.
and this is yet another irrelevant point. we're talking about general pcs and general failures. the majority of people who experience disk crashes don't experience them because their ibm 75gxp drive has a pcb-related manufacturing glitch.

if i had suggested he use an ibm 75gxp, you could have made some kind of statement about it. the 75gxp's failure is very different from general drive failures.
 

CougTek

Hairy Aussie
Joined
Jan 21, 2002
Messages
8,726
Location
Québec, Québec
honold said:
'and i'm the voice of reason on this one.'
In fact, you're a (fairly good) software guy talking through his ass about hardware. But that's ok. We need debates.

A modern motherboard with CPU overheating protection will shut down the system before the Athlon dies.

What's the reasoning behind the piano part, and how is that part more relevant to the thread than my post?

Answer is 'yes' to the question about PSU. Did you determine the functionality of the PCB of the dead hard drives the majority of people experience?

It's 2:35am here. Good night.
 

Tea

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
3,749
Location
27a No Fixed Address, Oz.
Website
www.redhill.net.au
Whereas I, on the other hand, am a (fairly good) hardware ape who is talking through her elbow about hardware. (Elbow? I think that's the word. Whatever you call the bendy bit that you lift up your glass of gin and tonic with.)

I too would question Honold's assertion that it's fans and hard drives that fail. However, I agree entirely that it is moving parts that are the problem. In fact, there is one particular moving part that, in my experience, is responsible for 100 percent of PC failures. Yes, you read it right, one hundred percent.

That one particular moving part, astonishingly enough, gets almost no attention in the technical press, and in stark contrast to other, more glamorous parts like hard drives and CPUs and cooling fans, costs only a few cents and could very easily be improved.

(Tea? That is nonsense. You are making a fool of yourself. Again.)

(Shutup, Tannin. I waz doing juzt fine until you butted in.)

Err .. az I was saying, if the people who design computers would only realise that the power switch is the critical item, we wouldn't have all this trouble.

(You are nuts!)

(No I'm not. Have you ever had a computer dissolve into a pile of smoking rubbish before you even plugged it in and touched the power switch?)

(No. But ...)

(Well there you are then. Have you ever had a computer fail half-way through the boot sequence without you having touched the power switch first?)

(Well, no, but ...)

(What about a Blue Screen of Death on a machine that isn't plugged in and switched on yet?)

(Oh for God's sake, Tea. Get a grip on reality!)

(No need to be rude about it. I was doing perfectly fine in this thread till you came along.)

(Rubbish. You were being stupid.)

(Fine! I present a zerious theory, bring an impressive array of evidence along to zupport it, and then just because it doezn't agree with your preconceptionz and you are utterly unable to refute it, you dezcend to perzonal abuse. If you're zo clever, Mr Tannin, you write the thread!)

(Oh calm down. I was only pointing out that ... er ... Tea?)

(Tea?)

(TEA!)








(stupid ape)
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
Hmmph. Well, if I'm writing the thread, I'll make it clock up against my post-count, not yours.

As I was saying ...

(Who waz zaying?)

(Err ... as Tea was zaying - I mean saying, damnit!)

(That's better.)

As Tea wasn't saying but would have been if she had any more than three or four brain cells capable of firing off a neuron at any given moment, I'm not so sure about the "moving parts fail" theory.

But I don't think that this is because Honold is wrong, rather, I think it's because he and I (as a general rule) work on different types of hardware: him in the corporate sector (mostly) and me in the general home/small business market (90% of the time). Let's work through the machinery in question and see if we can pick the similarities and differences.

First, we religiously use full-size, industry-standard cases with fair-dinkum full-size ATX power supplies - none of these dinky-toy slimline corporate vomit boxes, thank you.

(I thought you were going to be polite.)

(Who? Me? Whatever for? I'm not even being polite to you tonight, let alone Dell.)

(You fool! Now you've done it! You mentioned the "D" word!)

(Look, if I give you two bananas and a glass of gin will you shutup for a while?)

(That dependz.)

(Depends on what?)

(On whether the term "two bananas and a glass of gin" is to be interpreted as including a small quantity of chocolate ... say, er ... about half a kilo.)

(Oh all right then. You know where it is.)

(Well, I know where it used to be, up until about 6 o'clock this afternoon.)

(OK, OK, there is some more under the front seat of the car. Don't take anything else except the chocolate.)

Ahem ... As I was saying, those full-size cases and PSUs of ours almost certainly run a good deal cooler than your typical micro-form corporate unit. I've frequently seen systems (especially in tower, as opposed to desktop, form) run perfectly happily for months on end with a failed PSU fan. Not always, of course, but the bigger the box and the better the natural airflow, the more likely that is. Also, we build in a good deal more tolerance, power-rating-wise, than the corporates - i.e., we will use a 350W unit where a Compaq or an IBM will often try to get away with a micro-form 170W unit.

Obviously, this doesn't apply to server-standard stuff, but I'll come to that later. Err ... excuse me a moment...

(On the nail by the back door. It's the one with Clocker's avatar engraved on it. Oh, and try not to lock them in the car this time.)

Sorry, I'm back. Now, the heavier-duty corporate stuff: there I'm assuming that we are talking quality components. I'd expect an IBM or a ... er ... can I say the "D" word? ... would build a server PSU with good-quality, over-engineered electronics, and that therefore we can expect those PSUs to have an electronic failure (blown cap, etc.) only very rarely. So here too, if my house-of-cards logic can go this little bit further, we can expect to see a relatively higher number of fan failures leading to system failure. In the little cheap vomit boxen, they are under-engineered and under-cooled, so a fan failure is catastrophic; in a server they are over-engineered on the electronic side, so there isn't much else to go wrong besides the fan; and in my mid-range systems, the fan isn't so critical to the system (especially because there are often several other fans somewhere in the box to help it out) and the electronics are (generally speaking) amply big enough but of only moderate build quality.

The long and the short of it is that we see quite a number of PSU failures in the average year, but only perhaps 5 or at most 15% of them are attributable to a failed fan.
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
Now, time to consider hard drives. Bottom line is, these days, hard drive failure accounts for a tiny percentage of system failures, and that percentage is continuing to fall with every passing month as the number of in-service systems we have running relatively low-reliability drives falls.

The vomit box makers, on the other hand, buy whatever happens to be cheapest at the time, and as a matter of routine they use drives that, by our standards, simply don't cut the mustard. Seagate U Series things, for example. You know, we still send more Seagate drives back for warranty replacement than we do Samsungs, and more Western Digitals too, and yet we haven't bought either brand in any large quantity for over two years now. We sell something like 95% Samsung, yet our returns are about 10 to 20% Samsung. That figure is gradually rising as the number of in-service and under-warranty WD and Seagate drives we have falls. (Currently we also have 2 in-service Maxtor drives, and zero IBM or Fujitsu. Neither Maxtor has failed. All those thousands of IBM drives we sold in 1GB and 4GB days are OOW now.) At a wild guess, 80 to 90% of our current in-warranty systems are Samsung-equipped. Eventually (barring the odd-bod we get to cover out of stocks) 100% of our failed drives will be Samsungs, simply because we won't have any other brand in service - and our hard drive RMA numbers, already tiny, will become smaller still. (Unless, of course, Samsung replace their 40GB/platter 7200 with their own home-grown 75GXP imitation - don't laugh! It happens to every drive maker eventually - look at the wonderful reliability record IBM drives had up until the disaster model. IBM were the best around before that.)
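A quick sanity check on those return figures. The numbers below take midpoints of the ranges quoted above (85% as the in-warranty Samsung share, 15% as the Samsung share of RMAs), so the output is an order-of-magnitude estimate, nothing more:

```python
# Rough per-brand return-rate comparison implied by Tannin's figures.
# Both inputs are midpoints of his quoted ranges (assumptions, not data).

fleet_samsung = 0.85    # share of in-warranty drives that are Samsung (80-90%)
returns_samsung = 0.15  # share of RMAs that are Samsung (10-20%)

# Returns relative to fleet share, for each group:
samsung_rate = returns_samsung / fleet_samsung             # ~0.18
others_rate = (1 - returns_samsung) / (1 - fleet_samsung)  # ~5.7

print(f"other brands return ~{others_rate / samsung_rate:.0f}x as often per drive")
```

Even allowing generous slack in both inputs, the other brands come out returning tens of times more often per in-service drive, which is the shape of the claim being made.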

Anyway, the bottom line is that (in our particular case) hard drive failures are rare, and getting rarer. They are outnumbered by ... oh ... 10 to 1 by motherboard failures, maybe 3 to 1 by video card failures, 5 or 10 to 1 by PSU failures, and so on. Even then, the proportion of those failed drives to which we can attribute the failure to mechanical (i.e., moving parts) problems is very low. "Won't spin up" goes onto our hard drive RMA forms now and then, but only for perhaps 1 in 10 drives - 1 in 5 at the absolute maximum. Bad sectors would be the single most common fault, probably accounting for 40-odd percent of drive RMAs, maybe even more. But bad sectors are not always mechanical in origin anyway. It's very easy indeed to have "bad sectors" as the symptom of a drive that is, in fact, in perfect mechanical condition, but is having problems with its read channel or the phenomenally complex electronic system that is responsible for head positioning, timing, and sequencing. Let's be generous and say that "won't spin up" and "bad sectors" account for 50% in total: that still leaves the other 50% of hard drive failures - all the "no detect" and "no LBA function" and "read channel failure" and "write failure" and the like - to be accounted for on the non-mechanical side of the system failure ledger.
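The bookkeeping in that paragraph can be written out explicitly. These are Tannin's own rough percentages, taken at their most generous:

```python
# Drive-RMA failure-mode bookkeeping, using the rough shares quoted above.
rma_possibly_mechanical = {
    "won't spin up": 0.10,  # "1 in 10 drives - 1 in 5 at the absolute maximum"
    "bad sectors": 0.40,    # "40-odd percent" (and not always mechanical anyway)
}
possibly_mechanical = sum(rma_possibly_mechanical.values())  # 0.50, being generous
electronic_side = 1.0 - possibly_mechanical

print(f"at least {electronic_side:.0%} of drive RMAs are non-mechanical")
```

Since "bad sectors" often turn out to be read-channel or positioning-electronics problems, the true non-mechanical share would be even higher than this lower bound.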

Finally, there is the matter of motherboards. We switch brands quite a lot: we find a model that works well for us, so we use lots of them. Eventually, that model goes EOL (or just becomes commercially obsolete because of newer models creating demand for something else) and we drop it. Then we have to start trying 5 of these and ten of those until we find one we like again. I have never yet found a motherboard brand that I can always rely on: they have all had their horror boards, even my long-term favourite brands like Gigabyte (586-S2), ASUS (virtually anything that started with CUSI, not to mention several others of doubtful worth), Soltek (SL-75DRV-4), Epox (forget the model number but it was about the same time), MSI (pick a card, pick any card), and so on.

Anyway, we see quite a few motherboards fail - far more than we see by way of failed CPUs - and for this purpose I am counting a failed fan as if it was the CPU's fault, if you know what I mean. Honold, on the other hand, at least in the server and high-end space, presumably works with really well-engineered boards. (At least I certainly hope that that $8000 IBM server I saw an ad for the other day has one hell of a good motherboard in it!) And for some reason I have never quite got my mind around, vomit box motherboards seem to be, on the whole, remarkably reliable. We get any number of IBMs and Compaqs and Hewlett-Crapard Pavilions in for repair, but it's fairly rare for the motherboard to be at fault. Chief failure points are (in order) hard drive, PSU, then modem. (They use the most incredibly crappy modems in them, I have no idea why, though it may have to do with needing to use a non-US model to fit with Australian specs and, not being based here, getting the choice comprehensively wrong.) Motherboard failures would (I think) be next after those three - which, when you think about it, is a pretty decent record on the part of the people that make their motherboards for them.

Bear in mind, however, that we rarely get to see vomit boxen in the first 12 months before their warranty runs out - why pay me to fix it when IBM have to do it for nothing? This almost certainly skews the figures, as the vast majority of our motherboard failures occur within the first ... oh ... three to six months. (Or, naturally, once we start hitting the end-of-useful-life wall at maybe 5 or 6 years. But I'm not considering really old systems in this thread.) We do see in-warranty vomit boxen sometimes, just the same; usually it's under one of two circumstances: (a) the customer needs it done right now and can't wait while Hardly Normal's send it off to Sydney and back, or (b) when it's the fourth or fifth failure inside six months and they just can't stand the grief they are getting from the damn thing and they want it fixed right. Both circumstances are relatively rare.

(I sometimes wonder if, somewhere in Ballarat, there is an irate customer handing a computer over a counter and saying "I've given up on those mongrels over at the Tannin Shop, and even though it's under warranty, I want you to fix it for me!" I don't think so, but that's the thing with that particular circumstance - in the nature of things, if it was happening, I'd be the last to hear about it, just the same as HP never get to hear about the 5th disaster with their crappy Pavilion - the customer tells me all about it, in graphic detail, complete with life history, views on politics, children's health, and violins playing in the background.)

Anyway, my point is that vomit box mainboard failures are relatively rare, but this may (or may not) be a matter of sampling error. Someone who works behind the counter of Hardly Normal's or Myer would know. (Hmm ... I know a couple of people ... maybe if I poured a beer or two into them I could find out?)

Errr .... has anybody seen Tea? She's awfully quiet....
 

honold

Storage is cool
Joined
Nov 14, 2002
Messages
764
CougTek said:
In fact, you're a (fairly good) software guy taking thru his ass about hardware. But that's ok. We need debates.
it doesn't take an astronomer to observe that the sun comes up in the morning. i did warranty work on desktops/workstations/servers for 6 years for hp/compaq/ibm/dell.

and a debate isn't needed. there is fact and there is fiction, and any time we spend getting to fact is wasted if the evidence is already before us.
What's the reasoning behind the piano part and how's that part more revelant to the thread than my post?
you were speaking on an 'if'. 'if the epia is designed or operates worse, it will fail more quickly'. well, la dee frickin' da. it's a useless observation. you don't know if it's better or worse, so my piano statement was about as valuable. 'if your house is set on fire, it will burn.' there is no insight in that statement.
Answer is 'yes' to the question about PSU. Did you determine the functionality of the PCB of the dead hard drives the majority of people experience?
i never made an explicit claim about what % of hard drive failures are attributed to what - only that they have moving parts, and that they do often fail.
 

honold

Storage is cool
Joined
Nov 14, 2002
Messages
764
honold said:
hard drives and fans (along with the components which depend upon them) fail more often than any other component in a pc.
feel free to disagree with me on that one. tell me floppy drives and cd audio cables fail more often - i don't care.
reducing dependency on these items (which should be done reasonably) will increase reliability.

how will reducing dependency on hard drives/fans fail to increase reliability?
 

Prof.Wizard

Wannabe Storage Freak
Joined
Jan 26, 2002
Messages
1,460
Off Topic: honold vs CougTek

I won't comment on who's who...
 