Why ECC RAM?

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
This is a serious question for thinking minds. :)

We've all been raised on the principle that servers need error-checked RAM but client PCs don't. I've seen papers that point to a PC with 8GB RAM experiencing a soft error (due to cosmic rays at sea level) about once a day (some claim much more, but let's try to be realistic here).

The thing is, Intel has obviously decided that's bullsh*t, because none of their extensive i-core series support ECC RAM. Not even their most expensive 6-core (12 if you believe in hyperthreading) CPU.

Xeon still does, but it's optional. Unsurprisingly then, you have to work at it to even find DDR3 RAM with ECC.

What the hell is going on here? Personally, I suspect that cosmic rays are likely to affect far more than 1 bit per 32 or 64-bit word. So unless you're on super high-end hardware with 'chipkill', perhaps ECC - with its extremely limited correction ability of just a single bit - is more or less a waste of time?
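
To make "correction ability of just a single bit" concrete, here is a minimal sketch of a SECDED (single-error-correct, double-error-detect) code. It is a toy: it protects 4 data bits with 4 check bits, whereas ECC DIMMs typically protect 64 data bits with 8 check bits. The function names `encode` and `decode` are made up for illustration, not taken from any real memory controller.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of 0/1)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0: overall parity; indexes 1..7: Hamming(7,4) positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos            # XOR of the positions holding a 1
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        code[syndrome] ^= 1            # single flip: the syndrome is its position
        status = "corrected"
    else:
        return "uncorrectable", None   # two flips: detected, cannot be repaired
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[5] ^= 1                 # one flipped bit
print(decode(word))          # ('corrected', 11)
word[6] ^= 1                 # a second flipped bit in the same word
print(decode(word))          # ('uncorrectable', None)
```

Two flips in one word are only detected, and three or more can be silently "corrected" to the wrong value, which is the limitation being questioned above.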
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
22,232
Location
I am omnipresent
I noticed that when I was looking at my quiet server. I sort of shrugged and chalked it up to some silly attempt at making an artificial product distinction; Intel might announce at some future point an i-series chipset that does do ECC, for all the folks (scientific computing?) waiting for one.
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,916
Location
USA
I've had a production ESX server with 48GB of RAM report numerous parity errors/corrections, which turned out to be caused by a faulty RAM chip. The more eggs in one basket with VMs, the greater the chance of a RAM parity error or problem over time. It's not a terrible idea to have some parity correction if something is on its way out the door. In the case above, I was able to schedule the downtime and move all the VMs off the system so that I could bring it down to swap out the bad memory.
 

Santilli

Hairy Aussie
Joined
Jan 27, 2002
Messages
5,257
For what it's worth, my server is running 2 gigs of said ram.
I was looking at adding some ram, since Windows 7 and a few programs consistently use 70-76% of the ram in the machine.
2 gigs of ECC from Crucial is about 160 bucks, with shipping.
With the speed of the new chips, that's nearly the cost of a 920.
Think I'll save my money and upgrade to a non-ECC motherboard, and chip.

Sounds to me like this was another make-money-off-the-consumer angle by Intel and company.
 

blakerwry

Storage? I am Storage!
Joined
Oct 12, 2002
Messages
4,203
Location
Kansas City, USA
Website
justblake.com
time said:
So unless you're on super high-end hardware with 'chipkill', perhaps ECC - with its extremely limited correction ability of just a single bit - is more or less a waste of time?

I don't think you need super high end hardware for chipkill. SDDC is the equivalent, I believe, and is available on ~$1000 standard Intel server platforms from any manufacturer.

In the right platform, ECC can be very handy - we've been alerted to DIMM failures on 3 separate servers (typically after several years of use) due to ECC errors reported by the system during operation. It's certainly better to receive an email that DIMM #3 is "about to fail" than to wait unknowingly for your data to get corrupted or the server to lock up, and then have to restore from backups and troubleshoot the problem after the fact. It's not always convenient to take a server down for an overnight run of memtest to determine if a stick of RAM is faulty.
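
As a rough illustration of where such alerts can come from, here is a minimal sketch that reads the corrected/uncorrected error counters the Linux EDAC subsystem exposes under sysfs. It assumes a Linux host with an EDAC driver loaded for the memory controller; the servers described above may well have used BMC or vendor tooling instead. The function name `read_edac_counts` is made up for this example.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())  # corrected errors
            ue = int((mc / "ue_count").read_text().strip())  # uncorrected errors
        except (OSError, ValueError):
            continue  # counter missing or unreadable: skip this controller
        counts[mc.name] = (ce, ue)
    return counts
```

Calling `read_edac_counts()` from a cron job and mailing the result whenever a corrected count grows would give the same kind of early warning; on a machine without EDAC it simply returns an empty dict.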


time said:
Unsurprisingly then, you have to work at it to even find DDR3 RAM with ECC.

I found the opposite: looking at 8GB+ DDR3 sticks, you can't find any that aren't ECC - at least, not on Crucial's site.


santilli said:
2 gigs of ECC from Crucial is about 160 bucks, with shipping.
With the speed of the new chips, that's nearly the cost of a 920.
Think I'll save my money and upgrade to a non-ECC motherboard, and chip.

ECC does seem to carry a 25-50% premium at crucial.com, but that still makes your 2GB of DDR1? DDR2? about $100. 4GB DDR3 ECC sticks only carried a $20 premium over non-ECC sticks (not mixing in buffered vs unbuffered). I think the cost factor you're running into is likely due to supporting "legacy" hardware. Look on the bright side, at least the current cost of 2GB of RAM is probably half the cost of what you paid for the initial 2GB ;- )
 

Chewy509

Wotty wot wot.
Joined
Nov 8, 2006
Messages
3,348
Location
Gold Coast Hinterland, Australia
Re: Chipkill: it is available on any AMD Opteron-based system. Even my 6-year-old system has it (Tyan K8W motherboard and AMD Opteron 242s).

The rate of detected ECC soft errors on my system is about one every 6-9 months, but since I had no choice but to use Reg ECC, it wasn't a debate I had to enter into. It was certainly nice to know that one of my DIMMs was dying before experiencing hard errors, and it was easy to prove for RMA as well.

IIRC all AMD chips (Opteron, Athlon64 and newer) support Reg ECC, but it is up to the motherboard BIOS implementer to enable the use of Reg ECC.

On the Intel side, the X38 was the last desktop/workstation chipset to support Reg ECC with regular desktop chips such as the Core 2 Duo/Quad. Most X58 systems support Reg ECC, but only if you use a compatible Xeon CPU. (Most single socket 3400/3500 series only have a small price premium over the same i7 model at the same clock speed, but your mileage may vary depending on your location).
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
So in the Intel world, if you want any kind of error checking/correction at all, you need a Xeon.

If you want something better than simple single-bit ECC, you need a first generation i-Core socket 1366 system. Socket 1155 and 1156 won't cut it, so tough if you want second generation.

If you want 8-bit correction (two modules rather than one), you need to leave one of the three s1366 RAM channels empty.

In the AMD world, any Phenom II supports ECC, you just need the motherboard BIOS to enable it. That means Asus but not Gigabyte.

And Opteron supports Chipkill (which survives the failure of an entire memory chip) provided you stick with single channel operation (I think), as well as plain single-bit ECC.
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
Here's some definitive research that blows away some popular preconceptions regarding memory errors:

a) Soft errors aren't that important after all, because hard errors and datapath errors are a couple of orders of magnitude more common than previously thought. :errr: That is, most of the errors appear to be down to failing or bad hardware.

b) Error rates are directly related to utilization, so lightly loaded PCs are likely to be fine. The data came from the opposite end of the spectrum: Google's server fleet. They saw an average of 2,000 to 6,000 corrected errors per gigabyte per year, as well as a smaller number of uncorrectable errors (which prompted DIMM replacement).

In other words, a server with 24GB of utilized RAM did, on average, experience about 10 errors per hour when under load, >97% of which were able to be corrected. Chipkill was 4 to 10 times more effective than single-bit correction.

Having said that, 2 out of 3 of their servers didn't see an issue in a given year. So your hardware is more likely than not perfect, otherwise it's a bag of eels. :(
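
The "about 10 errors per hour" figure is just the per-gigabyte rate scaled up, and a quick back-of-the-envelope sketch shows the range. It assumes the rate is spread evenly over the year and that all 24GB is in use; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def errors_per_hour(ram_gb, errors_per_gb_per_year):
    """Average corrected errors per hour for a fully utilized machine."""
    return ram_gb * errors_per_gb_per_year / HOURS_PER_YEAR


low = errors_per_hour(24, 2000)
high = errors_per_hour(24, 6000)
print(f"{low:.1f} to {high:.1f} corrected errors per hour")  # 5.5 to 16.4

# For comparison, the "once a day on 8GB" soft-error claim from the first post,
# expressed in the same per-gigabyte-per-year units:
claimed_soft_rate = 365 / 8
print(f"{claimed_soft_rate:.1f} errors per GB per year")
```

So 10 per hour sits in the middle of the 5.5 to 16.4 range, and the cosmic-ray estimate from the first post is a couple of orders of magnitude below the fleet average, consistent with point (a).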
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
So why can't you build a server with an AMD X6 on an Asus M4A88TD-M/USB3 or M4A88TD-M EVO/USB3?

Six cores, 6MB L3 cache, five or six SATA 3 (6.0Gb/s) ports, USB3, 16GB of ECC RAM - all in a microATX form factor for not much more than $500. Maybe half the price of an equivalent Xeon s1156 platform, but it sure as hell will be more than half as fast. Food for thought.
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
blakerwry said:
I don't think you need super high end hardware for chipkill. SDDC is the equivalent, I believe, and is available on ~$1000 standard Intel server platforms from any manufacturer.

For just the motherboard and part of the CPU, that sounds about right. :p You need a 5xxx chipset and they don't come cheap.

blakerwry said:
I found the opposite: looking at 8GB+ DDR3 sticks, you can't find any that aren't ECC - at least, not on Crucial's site.

The significance lies in the fact that you can't find any 8GB sticks that aren't registered. In practice, the ECC is a given once you walk down that (incompatible) path.

What I found to be thin on the ground were 4GB unregistered ECC sticks.
 

Santilli

Hairy Aussie
Joined
Jan 27, 2002
Messages
5,257
Slight thread drift. Dual Xeons, 3 OCZ Vertex Turbos in RAID 0, through a PCI-X 9550.
2 gigs ram, using about 60-78% for normal usage.

Would there be any noticeable difference in speed going up to 4 gigs of ram? Using Windows 7 Ultimate, 32-bit?
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,916
Location
USA
Santilli said:
Slight thread drift. Dual Xeons, 3 OCZ Vertex Turbos in RAID 0, through a PCI-X 9550.
2 gigs ram, using about 60-78% for normal usage.

Would there be any noticeable difference in speed going up to 4 gigs of ram? Using Windows 7 Ultimate, 32-bit?

This should be in its own thread, but I would say I doubt it if you're only using 60-78% of the 2GB you have now.
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
Y.A.S.H.

or possibly Y.A.S.H.F.A.H.P.

Yet another Santilli Hijack fretting about his PC.
 

CougTek

Hairy Aussie
Joined
Jan 21, 2002
Messages
8,728
Location
Québec, Québec
Yes, it is. But I don't find him annoying. He's just Santilli. And unless it comes from Mark, he doesn't mind when someone is poking at him. I feel that with Greg, laughing at Santilli is just like laughing with Santilli. Bored this evening? Just make fun of Santilli. He won't be pissed. He'll know it's not done with the intent to hurt him. Seen this huge target wearing a Hawaiian shirt? Oh, it's just Greg asking to be hit again. Just for fun between old friends.

Sorry for your thread. You won't read about ECC here from now on. It's not his fault. To Greg's eyes, all threads are small pieces of the "Something Random" thread. We like him anyway.
 

Santilli

Hairy Aussie
Joined
Jan 27, 2002
Messages
5,257
Thanks CT

Let's look at this topic:

"Why ECC RAM?
This is a serious question for thinking minds.

We've all been raised on the principle that servers need error-checked RAM but client PCs don't. I've seen papers that point to a PC with 8GB RAM experiencing a soft error (due to cosmic rays at sea level) about once a day (some claim much more, but let's try to be realistic here).

The thing is, Intel has obviously decided that's bullsh*t, because none of their extensive i-core series support ECC RAM. Not even their most expensive 6-core (12 if you believe in hyperthreading) CPU.

Xeon still does, but it's optional. Unsurprisingly then, you have to work at it to even find DDR3 RAM with ECC.

What the hell is going on here? Personally, I suspect that cosmic rays are likely to affect far more than 1 bit per 32 or 64-bit word. So unless you're on super high-end hardware with 'chipkill', perhaps ECC - with its extremely limited correction ability of just a single bit - is more or less a waste of time? "

It appears you already have your conclusion. Time, do you have any computers with ECC ram? If so why?

I tend to agree with you, when I'm wearing my tinfoil hat, that ECC might be worth something.;-)

On the other hand, the point I was trying to make is that in actual use, one, I've NEVER had an error message.

Two: the stuff is VERY expensive. When my favorite Dolphin was still posting here, he used a similar setup to mine for something over 100 workstations, and they worked very well. But he was working for NASA. Wonder if too many rude comments drove him, Buck, and a few other folks out of this forum, a huge loss to the rest of us?

Some stuff is designed for NASA, etc. with huge budgets of taxpayer dollars that they spend like water, depending on Congress' budget for them at the time. Something I'm sure never happens in OZ.

I can't help but think this thread needs drift. Anyone that starts it with a serious topic for serious minds, and then proceeds to discuss the effects of cosmic rays must have a triple layer tinfoil hat.

I also suspect a cheaper way to stop the alleged effect of cosmic rays would be some sort of shielding: perhaps tin foil?
To be honest, I thought this was tongue in cheek from the time it was posted.
 

CougTek

Hairy Aussie
Joined
Jan 21, 2002
Messages
8,728
Location
Québec, Québec
Santilli said:
I also suspect a cheaper way to stop the alleged effect of cosmic rays would be some sort of shielding: perhaps tin foil?

Cosmic rays travel through planets. There's a project in Antarctica to try to observe those that cross Earth at the bottom of the kilometer-deep ice cap. If neutrinos can travel all that way through rock and ice, you really think a sheet of aluminum will block them?
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,454
Location
USA
On some days you can really feel those cosmic rays.
 

Santilli

Hairy Aussie
Joined
Jan 27, 2002
Messages
5,257
Tongue in cheek, or humor if you would. My first reaction was lead...
but to be real, not a bloody thing we can do about such particles, hence my reaction to the thread.
 

Santilli

Hairy Aussie
Joined
Jan 27, 2002
Messages
5,257
Santilli said:
Tongue in cheek, or humor if you would. My first reaction was lead...
but to be real, not a bloody thing we can do about such particles, hence my reaction to the thread.

Discussing cosmic rays starts me thinking that the old comic books
like the Fantastic Four, actually did real well, considering what we are finding now.

Watching the Roxy Pro in Aussie land, and, Carissa Moore just won.
Cute, and great surfer.
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
CougTek said:
Cosmic rays travel through planets.

No, most are stopped by the atmosphere. You're confusing them with neutrinos.

CougTek said:
There's a project in Antarctica to try to observe those that cross Earth at the bottom of the kilometer-deep ice cap.

The cosmic rays don't travel a kilometer through the ice; they're stopped when they collide with atoms in the ice and give off "fleeting flashes of blue light" - that's what the detectors underneath the ice are looking for.

CougTek said:
If neutrinos can travel all that way through rock and ice, you really think a sheet of aluminum will block them?

Not at all, but we're talking about cosmic rays, not neutrinos. :)
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
Santilli said:
Tongue in cheek, or humor if you would. My first reaction was lead...

It may surprise you to learn that there are different types of 'radiation'; lead is useful for X-rays and Gamma rays.

Santilli said:
but to be real, not a bloody thing we can do about such particles, hence my reaction to the thread.

You can bury your equipment, that's 100% effective. Conversely, the problem is many times worse in an airliner:
Cosmic rays were recently suspected as a possible cause of a Qantas Airlines in-flight incident where an Airbus A330 airliner twice plunged hundreds of feet after an unexplained malfunction in its flight control system. Many passengers and crew members were injured, some seriously. After this incident, the accident investigators determined that the airliner's flight control system had received a data spike that could not be explained, and that all systems were in perfect working order. This has prompted a software upgrade to all A330 and A340 airliners, worldwide, so that any data spikes in this system are filtered out electronically. (from Wikipedia)

Santilli said:
Discussing cosmic rays starts me thinking that the old comic books
like the Fantastic Four, actually did real well, considering what we are finding now.

So you're saying that we should worry about our PCs turning into rubber or catching fire and flying around the room?

Santilli said:
Watching the Roxy Pro in Aussie land, and, Carissa Moore just won.
Cute, and great surfer.

I notice you managed to acknowledge her skill after praising her appearance. That attitude has been a topic of debate over here; my daughters would call you a misogynistic prick, but I'm okay with your need to enlighten us about your TV viewing habits and appreciation for the fit female form, especially in the middle of a thread about the merits of error correction. Excuse me while I look for my tin foil hat. Thank you.
 

Santilli

Hairy Aussie
Joined
Jan 27, 2002
Messages
5,257
"So you're saying that we should worry about our PCs turning into rubber or catching fire and flying around the room?"

Considering what just happened in Japan, that is no longer funny, and, in some places, my guess is that's exactly what happened.

"I notice you managed to acknowledge her skill after praising her appearance. That attitude has been a topic of debate over here; my daughters would call you a misogynistic prick, but I'm okay with your need to enlighten us about your TV viewing habits and appreciation for the fit female form, especially in the middle of a thread about the merits of error correction. Excuse me while I look for my tin foil hat. Thank you. "

Quite alright. Funny how those that championed certain causes are now condemned by those whose causes they championed.

I don't want to enlighten you too much, but, women's surfing has a bit of history pretty similar to Women's Professional Golf. Taken in that context, the issue of appearance was, and is finally fading as an issue.
Raging feminism can be rather ugly when it makes its appearance, and another area of similar concern is Women's Basketball officials in the United States.

And actually, thank you for a topic that turned out to be more fascinating than it first appeared.

It does appear to be a serious problem, once one leaves the earth, and, as chips continue to get smaller.
 