Memtest x86 Question

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
How many passes of Memtest x86 (3.2) would people consider necessary before ruling out memory issues?

My X2 is starting to give errors in F@H, where I'm getting Error 72, which research indicates is memory related. The machine has not been overclocked for weeks. BIOS settings are standard, or auto. Memory is KingMax Hardcore DDR500. So far, the memory has done 7 passes without issue.

Sometimes a new work unit is downloaded and it aborts straight away. I run two instances. One instance seems to give more errors than the other.

After noticing the errors, I ran StressCPU last night for about 45 minutes (two instances) without error.

Anybody have any suggestions for troubleshooting?
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
22,232
Location
I am omnipresent
I normally either do 3 passes or let it run for 24 hours.
I hate to bring up Sandra, but as it stresses a wider array of system components, it might be another tool to try.
 

P5-133XL

Xmas '97
Joined
Jan 15, 2002
Messages
3,173
Location
Salem, Or
3 passes or overnight is what I typically run for Memtest.

If you really want to stress a system for stability, I use Folding@Home: it seems to detect problems far more reliably than any other stress tester that I've used. Its major flaw is that while it seems to be particularly sensitive at detecting problems, it is a poor tool for determining what the problem is...

How about under-clocking (both CPU and RAM) as a component testing procedure?

Also install a motherboard monitor with alarms and then check for temp and/or voltage issues: PSs that don't hold voltages steady sometimes show up as highly intermittent (an error or two a week) memory errors.

All this being said, F@H sometimes produces errors through no HW fault. If you are getting an error WU only very occasionally then don't worry about it. As the frequency increases, HW problems become much more likely.
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
Thanks guys. I knew that F@H produced the odd error, but it is a little more frequent than that. The XP 1600+ I had (150MHz == 1800+) only produced an error every odd week. The A64 3200+ I've had going since last week has been running at 2.2GHz, 2.3GHz and now 2.4GHz and has yet to produce an error. The X2 just gave three EUE errors in quick succession.

Temps are in the high 40's, low 50's with both cores folding. I've just updated to the latest BIOS. I'll keep looking, as something doesn't seem right.
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
I always run Speedfan. Just checked the voltages. There are two +5V readings being logged??? Anyhoo, both of them are averaging 4V, ranging anywhere between 3.5V and 6V. The fluctuations are quite steep. I'll check with another PS and get back to you.
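
For reference, a rough sketch of how readings like these could be checked against spec. It assumes the ATX tolerance of +/-5% on the positive rails; the function names and the sample log are made up for illustration and are not Speedfan output. Readings this far out of range can also come from the sensor or the monitoring software rather than the rail itself.

```python
# Assumption: ATX allows +/-5% on the +3.3V, +5V and +12V rails.
ATX_TOLERANCE = 0.05


def rail_limits(nominal, tolerance=ATX_TOLERANCE):
    """Return the (low, high) acceptable voltages for a rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def out_of_spec(nominal, readings, tolerance=ATX_TOLERANCE):
    """Return the readings that fall outside the tolerance band."""
    low, high = rail_limits(nominal, tolerance)
    return [v for v in readings if v < low or v > high]


# Hypothetical samples in the range described above (3.5V to 6V).
five_volt_log = [3.5, 4.0, 6.0, 4.9, 5.1]

low, high = rail_limits(5.0)
bad = out_of_spec(5.0, five_volt_log)
print(f"+5V band: {low:.2f}V to {high:.2f}V")
print(f"out-of-spec readings: {bad}")
```

With a +5V band of roughly 4.75V to 5.25V, anything like 3.5V or 6V is far outside what a real rail could deliver with the machine still running, which is a hint to cross-check with a multimeter or another monitoring tool.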
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
I know this doesn't seem to relate to your error code, but I would also test the hard drive. If you are running large WU's, memory will be hogged and Windows will swap more frequently, even for routine duties.
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
I ran chkdsk over all the disks last night and it didn't report any errors. However, a week ago, my 3-year-old switched the machine off and it hosed my C: drive. I had to do a repair job on Windows, and it went right back to the install screen at the next boot. Chkdsk also reported a lot of drive errors. It hasn't "felt" the same since. I know that's not scientific, but that is the only way to describe things. SCSIMax reports that the drive is fine though.
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
Hmm, replaced a Zalman 400W with an Antec SmartPower 350W, and the 5V rails show the same fluctuations between 2V and 5.5V. The +12V, and +3.3V rails are flat as. Motherboard voltage regulators on the blink?
 

Bozo

Storage? I am Storage!
Joined
Feb 12, 2002
Messages
4,396
Location
Twilight Zone
Try Prime95. It can be set up to stress the CPU and memory. You can also run multiple sessions.

Look for bulging capacitors on your motherboard.

Bozo :mrgrn:
 

paugie

Storage is cool
Joined
Dec 13, 2003
Messages
702
Location
Bulacan, Philippines
LiamC said:
"Hmm, replaced a Zalman 400W with an Antec SmartPower 350W, and the 5V rails show the same fluctuations between 2V and 5.5V. The +12V, and +3.3V rails are flat as. Motherboard voltage regulators on the blink?"
Is your motherboard by chance a Gigabyte? I've had the same observations on the 5V rail on 3 Gigabyte mobos, one a KT600 and the other 2, NF3 boards. But all of them are very stable. It could be a quirk of the monitoring chip or of Speedfan. I would not worry about it.
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
'Tis a Gigabyte: K8NS Ultra 939. Strangely enough, it is stable with a single core 3200+ in it, but definitely not stable with a 3800+ X2. I am going to try other monitoring software.
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
Could it possibly be just incompatibility with the KingMax memory?

The K8NS runs fine with a 3200+ (single core) in it—with Corsair Value Select Memory.
 

LiamC

Storage Is My Life
Joined
Feb 7, 2002
Messages
2,016
Location
Canberra
Heat. Bloody heat.

I have been running the X2 on the Asrock for a couple of months without errors. Then, about the time the 12cm fan died in the SLK1650, the Early Unit End errors came back. And about the time I swapped motherboards, I swapped the Zalman p/s for the Smartpower p/s, which has its air inlet on the bottom, sucking air from directly above the CPU. In another thread, I alluded to the fact that the p/s swap dropped the case temp by 4 to 6 degrees Celsius. With the dead fan, Speedfan was reporting the CPU temp as 56 degrees.

Enter AMD64 TCaseMax
http://www.thecoolest.zerobrains.com/forums/viewtopic.php?t=83

According to it, my TCaseMax is 61 degrees. If the reported temps are "off" by a couple of degrees, then voila! That seems to be the problem. I swapped the Gigabyte back in when I replaced the 12cm case fan, and the CPU temps have been in the 41~47 degree range @ 100% CPU usage (2 instances of F@H), with not one EUE! Problem solved!

Implications. The overclocking headroom is limited by heat/case/environment. Sitting (the motherboard) on the test bench in my metal shed in Autumn/Winter in Canberra, the CPU was registering ~28 degrees. Ambient was in single figures/low double digits. Sitting in the case with three HDD (one a 15K SCSI), an optical, and GeForce 6600GT with a mass of cables, and things get toastier. 41 degrees now with an ambient of 15~16. When the heating comes on (~21 ambient), the CPU temp goes up to 46~47.

In Summer, the ambient can be in the mid thirties, for a couple of months straight.
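
A back-of-envelope sketch of what that means, using the figures above. It assumes the CPU sits a roughly constant number of degrees over ambient at full load, takes the midpoints of the quoted ranges (15.5 and 46.5), and picks 35 degrees for "mid thirties"; those choices and the function names are illustrative assumptions, and the Speedfan reading may not match the true case temperature.

```python
def rise_over_ambient(cpu_temp_c, ambient_c):
    """Degrees the CPU runs above ambient at full load."""
    return cpu_temp_c - ambient_c


def projected_cpu_temp(ambient_c, rise_c):
    """Estimate CPU temp at a new ambient, assuming the rise stays constant."""
    return ambient_c + rise_c


# (cpu_temp, ambient) pairs from the post, using range midpoints (assumption).
observations = [(41.0, 15.5), (46.5, 21.0)]
rises = [rise_over_ambient(cpu, amb) for cpu, amb in observations]
avg_rise = sum(rises) / len(rises)

TCASE_MAX_C = 61.0       # value reported by AMD64 TCaseMax in the post
SUMMER_AMBIENT_C = 35.0  # assumed figure for "mid thirties"

summer_temp = projected_cpu_temp(SUMMER_AMBIENT_C, avg_rise)
headroom = TCASE_MAX_C - summer_temp

print(f"average rise over ambient: {avg_rise:.1f} C")
print(f"projected summer CPU temp: {summer_temp:.1f} C")
print(f"headroom to TCaseMax: {headroom:.1f} C")
```

On these numbers the rise is about 25.5 degrees in both observations, so a 35 degree day projects to about 60.5 degrees, essentially no margin below the 61 degree TCaseMax even at stock speeds.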

If you're not running your CPU flat out 24/7, and you have Cool 'n' Quiet enabled, there is no issue, or you would strike it only infrequently (3D games, video encoding?). Infrequently enough that you would probably blame the game/app for the crash.

If you bought the box from a shop, they probably tested it on an open bench, where the problem would never show up. Food for thought.
 