problem Computer crash when processing large amounts of data

KrazeyKami

What is this storage?
Joined
Jan 2, 2011
Messages
24
Hi all,

As a short introduction to my tech level: I’ve been a system manager for the past 8 years, and I’ve been playing around with computers for nearly 20 years.
Until recently, I had never faced a problem with my own computer that I couldn’t solve or at least trace to its cause. That has changed now, and that’s why I need your help and a fresh look at things.

First, my setup:

Motherboard: ASUS P6T SE
Motherboard BIOS: v.0808
CPU: Intel Core i7-920 @ 2.66 GHz (Stock)
CPU Heatsink: Prolimatech Megahalems + 2 Cooler Master 120mm fans
CPU Idle Temp: 30-35 degrees per core
CPU Load Temp: 45-50 degrees per core (Prime95)
Memory Part Number: 6 GB OCZ Gold PC3-8500U + 6 GB OCZ Gold PC3-10700U (Both OCZ3G1333LV6GK @ 1066 MHz, Stock)
Memory Voltage: 1.65
Video Card(s): Asus nVidia GTX295
Sound Card: Sound Blaster Fatal1ty X-Fi
PSU Model Number: Cooler Master 1000W Real PowerPro
Hard Drive(s): Intel X25-M SSD 80 GB (OS/Boot), 4x 2TB HD204UI Samsung Spinpoint F4 (all on ICH10 / SATA2)
Optical Drive(s): GGW-H20L Blu-ray RW
Other Cooling: Cooler Master Stacker 831 case with 6x Cooler Master 120mm fans
Operating System: Windows 7 Professional x64

This machine is running 24x7, rock solid. I never had any issues with performance or stability.

Now, for the problem:

Until a few weeks ago, I didn’t have the 4x 2TB drives in it.

I bought the drives, made a RAID5 array, using the ICH10R, and that’s when the problems began:
When copying (or downloading) large amounts of data to the array, thus creating high I/O on any of the 2TB drives, my computer suddenly reboots. There are no blue screens (although that option is switched on), and the only entries in Event Viewer are Event ID 6008 (the system rebooted unexpectedly) and Event ID 41 (possible cause: power failure).
No error codes, no nothing.

I decided to break up the array and see what happens when I copy 1.5 TB of data from one drive to the next: the computer reboots again. The drives are now single SATA2 drives, each with a 2 TB partition. After 10-80 minutes, the system reboots without notice or error.

So, I started to systematically remove drives, and test with the other drives.
Regardless of which drive is the source or destination (I tried all combos, in all directions), the system reboots when a large amount of data is transferred. This happens not just when copying existing data, but also when downloading (while at the same time repairing files (.PAR) and extracting).

I tried the following:

  • Checked for overheating: all values are well below 40 degrees C (I also checked the drives).
  • Even put an active cooler on my southbridge (the ICH10).
  • Memory checks: ran 3 different programs to test my memory, overnight for hours and hours, multiple passes, 0 errors.
  • Removed all other hardware, except for the absolute minimum necessary.
  • Swapped / replaced power cords and SATA cables, and even rotated the drive positions on the SATA connectors.
  • Reset the BIOS settings.
  • Reinstalled Windows 7 (deleted the entire 80GB partition on the SSD and reinstalled, no other tweaks, and started the copy operations right after the install) to rule out faulty software and / or drivers.
  • Reformatted / recreated the drives and partitions: tried both MBR and GPT partitioning, and different block sizes.
  • Turned off Write Back Cache, to further rule out a problem with my RAM.
  • Calculated the PSU needs: I tried multiple programs, even a paid one, and counted manually. Granted, on a full Direct3D load (games on high etc.), my GFX card needs around 450 Watts. This makes a grand total of 950 Watts. However, the problems occur while idle in Windows, so the consumption of my GFX card is max. 100 Watts, making a total of (roughly) 600 Watts, well within the limits of my 1000W PSU.
  • CPU check with Prime95: runs for days, stable, without a single error.
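
The PSU estimate in the list above can be reproduced with a few lines of arithmetic. This is only a sketch built from the numbers quoted in the post; the figure for "everything except the graphics card" is inferred from the 950 Watt grand total and is an assumption, not a measured value.

```python
# Rough power budget from the figures quoted in the post (all values in watts).
PSU_RATING_W = 1000
GPU_FULL_LOAD_W = 450      # GTX295 under a full Direct3D load
GPU_IDLE_W = 100           # GTX295 while idle in Windows
TOTAL_FULL_LOAD_W = 950    # grand total reported by the PSU calculators

# Everything except the graphics card (inferred from the totals, not measured).
rest_of_system_w = TOTAL_FULL_LOAD_W - GPU_FULL_LOAD_W


def headroom(gpu_w):
    """Return the unused PSU capacity for a given graphics card draw."""
    return PSU_RATING_W - (rest_of_system_w + gpu_w)


idle_total_w = rest_of_system_w + GPU_IDLE_W
print(f"Idle total: {idle_total_w} W, headroom: {headroom(GPU_IDLE_W)} W")
print(f"Full load total: {TOTAL_FULL_LOAD_W} W, headroom: {headroom(GPU_FULL_LOAD_W)} W")
```

A total in watts says nothing about how the load is spread over the individual rails, which is why the 12V question further down still matters.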

The facts:
  • The problem occurs only (and only) when copying / downloading a large amount of data.
  • The system runs flawlessly under high load (playing games, watching movies, running programs etc.).
  • I never had this problem before, but then again, I never had the space to start downloading 250 GB of data, or copying 1.5 TB (I didn’t even have 1.5 TB ^^) of data to other drives.
I am able to reproduce a “fast” reboot error:
I created 3 separate batch files, which basically tell Robocopy to copy data from:
  • Drive D: to Drive E:
  • Drive D: to Drive F:
  • Drive D: to Drive G:
  • When I run these scripts separately, each runs for a while, but still reboots / crashes after an hour or so.
  • When I start these scripts all at once, the system reboots within 5 minutes.
  • Remember that I already cloned the drives, so I could rotate the source drive, and I systematically removed / switched the destination drive, thus trying all the different combos (and checking whether one of the drives might be faulty).
  • This also makes me doubt whether it’s the sheer size of the data that causes the problem, because within 5 minutes not even 100 MB has been transferred, and still it reboots.
  • This is making me think that the PSU might be the problem. As soon as these drives are actively called upon, I can imagine that a sudden increase in load on the +12V rail could overload it... Although my PSU has 6x 12V rails, I'm not too convinced this works as well as people say; there are many discussions on the web about the use of 6x 12V rails. Could it be that my 12V rail is maxed out, considering it's giving power to the mobo, the CPU (4/6 pins, can't remember), the GFX card (both 6 and 8 pins), 5 drives (SSD + 4x 2 TB), an optical BD-RW, and of course the onboard devices (sound card) and USB devices (headset, webcam)?
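
For anyone who wants to repeat the "all at once" test, the three batch files can be approximated by a small Python script. This is a minimal sketch, not the original batch files: the post does not list the Robocopy options that were used, so the single `/E` switch (copy subdirectories, including empty ones) and the helper names `build_robocopy_commands` and `run_all_at_once` are assumptions made for illustration.

```python
import subprocess


def build_robocopy_commands(source, destinations):
    """Build one Robocopy command line per destination drive.

    /E copies subdirectories, including empty ones. The options used in
    the original batch files are not known, so this is an assumption.
    """
    return [["robocopy", source, dest, "/E"] for dest in destinations]


def run_all_at_once(commands):
    """Start every copy at the same time and wait for all of them to finish.

    Returns the list of exit codes, one per command.
    """
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in processes]


commands = build_robocopy_commands("D:\\", ["E:\\", "F:\\", "G:\\"])
for cmd in commands:
    print(" ".join(cmd))

# To reproduce the parallel case on Windows, call: run_all_at_once(commands)
```

Running the commands one after another instead of through `run_all_at_once` would correspond to the "separate" case described above.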

I am all out of ideas. If there is something that I haven’t checked / tried, please tell me. I think I wrote down everything I tried thus far; maybe I missed something, but I’ve been testing and trying for 3 weeks now.


For now, my suspects are:
  • The motherboard. Either a chip in the ICH10 was fried, or the SMBus got a dent.
  • The motherboard (or ICH10 / SMBus) is just not capable of processing such large amounts of data.
  • My PSU, mainly the +12V rail. It could be (maybe) that my 12V rail is loaded to the max, and when the extra drive operations kick in, the load fluctuates and tips a bit above its maximum, thus crashing my computer. Looking at the symptoms (sudden reboots without any errors), it might be a more plausible cause than all the other things I tried. And yet, if you look at my hardware setup, I cannot imagine I reached the 12V max.
    However, I will be testing this week by taking the PSU from a second desktop and connecting my hard drives to that power supply. Or maybe I’ll even take an el cheapo GFX card, remove my GTX295, and see if the computer stays stable during copying…

I’ll post the results shortly. In the meantime, if anyone has seen this problem before, or has other ideas / solutions to try, please let me know here and I’ll try them.


Many thanks in advance for thinking with me.

Kind regards,
Kami.
 

Stereodude

Not really a
Joined
Jan 22, 2002
Messages
10,865
Location
Michigan
Based on all the testing you've done, it seems like bad hardware (PSU or mobo).

Have you tried Windows XP or Linux?
 

Howell

Storage? I am Storage!
Joined
Feb 24, 2003
Messages
4,740
Location
Chattanooga, TN
You've done a lot of good work there to narrow down the problem, and you have good ideas for getting more information. Let us know how it turns out.
 

KrazeyKami

@ Howell: Thank you, I will post the results soon. I did check something though:

According to the PSU calculator, this is the recommended wattage / amperage for my setup at a full 100% load; below that is a table of the power my PSU can handle:

psu.JPG


Afaik, I can add the 12V rails together, so my PSU can handle max. 128 A on the 12V rail?
This is my PSU: http://www.coolermaster.com/product.php?product_id=2519


Does this mean my PSU should have more than enough power?
I'm still going to try a different PSU or GFX card to be sure, but according to the above I think all should be covered... Any thoughts?

@Stereodude:
Regarding the PSU, see above.
Regarding the mobo, that was one of my thoughts as well, although... everything runs smooth and stable, as long as I don't pressure the HDs... So that makes me wonder whether the mobo is to blame... Surely it would have shown more issues then, not only with the HDs?
 

KrazeyKami

P.S. Stereodude: I also thought of installing WinXP x86, to see if the problem is related to W7 x64. Maybe I will, if all other things fail. Linux is not really an option for me, because I don't have enough experience with Linux to know what the heck I'm doing, let alone how to thoroughly test stuff ;)
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
Hi Kami

You've been meticulous and your further diagnostic proposals are good.

But your power calculations are wrong. Firstly, you can't just add all the current for each 12V line together. See where it says 960W? The total draw on all 12V lines can't exceed that number, which works out to 80A.

Having said that, your PSU is massive overkill. I estimate your power budget to be about 53A, which equates to a good quality 650W PSU. Where did you find the calculator you mentioned?
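
The rail arithmetic above can be written out as a short sketch. It uses only figures quoted in this thread (the 960W combined limit, the 128 A sum of the individual rails, and the 53A estimate) and assumes a nominal 12.0 V; it says nothing about how the load is distributed over the six rails.

```python
RAIL_VOLTAGE = 12.0            # nominal rail voltage (assumed exact)
COMBINED_12V_LIMIT_W = 960.0   # combined +12V limit from the PSU label
SUM_OF_RAIL_LIMITS_A = 128.0   # naive sum of the per-rail current ratings
ESTIMATED_DRAW_A = 53.0        # estimated 12V budget for this system

# The combined wattage limit caps the total current, whatever the rails add up to.
max_combined_current_a = COMBINED_12V_LIMIT_W / RAIL_VOLTAGE
usable_current_a = min(SUM_OF_RAIL_LIMITS_A, max_combined_current_a)

# The estimated draw expressed in watts on the 12V side.
estimated_draw_w = ESTIMATED_DRAW_A * RAIL_VOLTAGE

print(f"Usable 12V current: {usable_current_a:.0f} A")
print(f"Estimated 12V draw: {estimated_draw_w:.0f} W")
```

The estimated draw lands just under 650W, which is where the "good quality 650W PSU" comparison comes from.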

Of course, that assumes your PSU isn't faulty.

Once you've eliminated the PSU (give the GTX295 a holiday and almost any PSU will do as a test, even 300W), I can't see how it could be anything else other than the motherboard, although it might be interesting to lose the SSD for a while.
 

BingBangBop

Storage is cool
Joined
Nov 15, 2009
Messages
667
My immediate reaction is that you've repeatedly installed the wrong chipset drivers into Windows. What happens if you boot to MS-DOS and transfer tons of data to and from a pair of drives? Do you still get a reboot? If so, then it is hardware, likely the motherboard or RAM, with the PSU a distant third.
 

Stereodude

KrazeyKami said: "P.S. Stereodude: I also thought of installing WinXP x86, to see if the problem is related to W7 x64. Maybe I will, if all other things fail. Linux is not really an option for me, because I don't have enough experience with Linux to know what the heck I'm doing, let alone how to thoroughly test stuff ;)"
Instead of installing Linux, you could use a Linux LiveCD / LiveUSB (like Ubuntu 10.10). You don't have to install anything since it boots from the CD / USB drive. Then, start several data copies going and see what happens.
 

Bozo

Storage? I am Storage!
Joined
Feb 12, 2002
Messages
4,396
Location
Twilight Zone
MS-DOS? :rotfl:

Did we suddenly go back to 1993 or something? :eek:

I still write small xxx.bat scripts to copy a large amount of data. It sure beats all the "are you sure..." prompts that you get with the GUI.
These run from the command prompt. And they seem to run faster.
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,742
Location
Horsens, Denmark
My first instinct is a slightly goofy PSU. I would second an Ubuntu LiveCD; it is super easy, and copying information is also simple. Motherboard would be a maybe, but PSU first.
 

Howell

You might give your memory a thorough testing if you haven't recently. One of the sticks at the upper end of that 12 GB may be flaky.

One other cheapish thing to try would be an add-in SATA controller.
 

Stereodude

Howell said: "You might give your memory a thorough testing if you haven't recently. One of the sticks at the upper end of that 12 GB may be flaky."
He had this in his post "Memory checks. Ran 3 different programs to check / test my memory, ran overnight for hours and hours, multiple passes, 0 errors."
 

KrazeyKami

Problem Solved!!

Hi everyone,

Thank you all for your feedback.

I did do a reinstall of W7, and with 0 additional drivers (or updates) the problem still persisted. I had my PSU checked, and it's a monster. No way that it could be undervoltage (or insufficient wattage).

In the end, with some help from an overclocker, I found the problem:

The X58 mobo can't handle 12 GB of RAM.
I removed 6 GB, and the machine is rock-solid again. It seems that with the Auto settings in the BIOS, the QPI voltage is too low for 12 GB of RAM and needs to be entered manually. I am going to try the other 6 GB later tonight to make sure it isn't a faulty DIMM, but the QPI story seems to be on the mark. So, I expect the system to stay stable even with (only) the other 6 GB installed.


I'm still waiting for a reply from OCZ on what the best settings would be to have both types of RAM run at the right voltage and SPD settings.

Thank you all for your replies. I hope this post will help others with similar problems.


Kind regards,
Kami.
 

KrazeyKami

Yeah, I don't understand why a good RAM testing program wouldn't have detected the issue.

I used the following 3 programs:

- MemTest86 3.5a
- Memtest86+ 4.20
- HCI MemTest

All programs ran for hours overnight, and all gave 0 errors.

However, according to the person who helped me get on this trail, he had the very same issue: the memory seems to be fine in programs like MemTest.

I guess we cannot blindly trust those programs to find this kind of error....
 

KrazeyKami


It can't. Not without proper manual Voltage settings.

Sure, it boots, it works, everything runs fine... and then try to cross-copy 1.5 TB.
Besides, this is all over the OCZ and Asus forums.
Even just now, I got a call from Asus Support, and they confirmed that the Auto setting on Asus mobos gives the OCZ memory too low a voltage, which should be raised manually.

Anyway, the post speaks for itself...
 

MaxBurn

Storage Is My Life
Joined
Jan 20, 2004
Messages
3,245
Location
SC
My point is that OCZ sells out of spec, overclocked memory that may or may not have been tested at all. If you get memory that meets the specifications of the chipset you don't need to do these things.
 

Bozo

MaxBurn said: "My point is that OCZ sells out of spec, overclocked memory that may or may not have been tested at all. If you get memory that meets the specifications of the chipset you don't need to do these things."

Looks like a good reason to get out of the memory business.
 

MaxBurn

I'm sure it was a money-based decision; you just can't keep selling crap that doesn't work, process all the returns, and still make a profit.

From what I understand, on most of the drives they are just a sticker and a vendor channel for the makers, which is a good thing here I guess.
 

ddrueding

I would have mentioned RAM voltage earlier, as I had a similar issue, but in your post above you mention 1.65v. In every machine I've tried, the RAM defaults to 1.5v which simply isn't stable at higher capacities.
 

MaxBurn

Isn't JEDEC standard DDR3L memory 1.35V while DDR3 is 1.5V?
 

ddrueding

The standard is 1.5v, though every module I've purchased has specified 1.65v on the packaging and in the XMP memory profiles. Typically all I need to do is go into the BIOS and enable XMP.
 

time

There is plenty of 1.5V RAM, it's just that the catalogues are cluttered with fanboy overclock shit, which quite ironically needs 1.65V.

Kingston also makes low voltage RAM ("LoVo") that does 1600MHz at 1.25V and 1866MHz at 1.35V. Highly desirable and hard to get.

If you buy shit RAM, expect to have to push the limits of the CPU (1.65V) to get it to run satisfactorily. Any more than that and you clearly need to bin it (if your CPU is Intel).
 

time

Possibly, memory checking utilities are still writing and reading one word at a time. That's hardly stressing RAM and isn't how block data is managed, so perhaps they need to be dragged kicking and screaming out of the nineties and into the modern world?

I haven't used one for years because they didn't confirm subtle RAM problems for me.
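
As a rough illustration of block-oriented testing, here is a minimal copy-and-verify sketch: it writes random data in large blocks, copies the file, and compares checksums. It is not a replacement for a memory tester and is not something anyone in this thread ran; the block size, the file name `stress.bin`, and the helper names are arbitrary choices for the example.

```python
import hashlib
import os
import shutil
import tempfile

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block; an arbitrary choice


def write_random_file(path, blocks):
    """Write `blocks` blocks of random data; return the SHA-256 of what was written."""
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(blocks):
            block = os.urandom(BLOCK_SIZE)
            digest.update(block)
            f.write(block)
    return digest.hexdigest()


def hash_file(path):
    """Read a file back block by block and return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK_SIZE), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(src_dir, dst_dir, blocks=8):
    """Create a random file in src_dir, copy it to dst_dir, and compare hashes."""
    src = os.path.join(src_dir, "stress.bin")
    dst = os.path.join(dst_dir, "stress.bin")
    expected = write_random_file(src, blocks)
    shutil.copyfile(src, dst)
    return hash_file(dst) == expected


# Demo with two temporary directories; point these at real drives for a real test.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    ok = copy_and_verify(a, b)
    print("copy verified" if ok else "MISMATCH")
```

For a meaningful load the directories would need to sit on different physical drives and the block count would have to be far larger than in this demo.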
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
Perhaps the motherboard is defective. 24GB is no problem on a decent X58 board.
 

KrazeyKami

Hi everyone,

Thank you for all your feedback and ideas / suggestions.

Well, I have an update on the issue:

I've tried the 6GB sets separately from each other. It seems that with 6GB, I can stress my PC all I want (high I/O, downloads, etc.) and it runs stable without crashing.
So, this rules out the possibility of faulty RAM.

It could very well be that (one of) the other 3 RAM sockets is broken / damaged. I'm just not sure if I can start by filling the A2, B2, C2 slots. I read that (and afaik it always was like this) you should start with A1, B1, C1 in case of triple-channel memory. Anyway, let's assume the slots are fine. I intend to test this tonight, though.

I made a post on the OCZ forum a few days ago, no reply yet: http://www.ocztechnologyforum.com/f...3-10700U-(Both-OCZ3G1333LV6GK)-on-ASUS-P6T-SE

I tested the RAM sets separately and checked the settings / voltages etc. with CPU-Z. This is the result:

3x 2GB PC3-8500F:
PC3-8500F.jpg



3x 2GB PC3-10700F:
PC3-10700F.jpg



I notice that, indeed, the voltage is automatically set to 1.5 volts.
Furthermore, I notice that they run at different timings: 7-7-7 vs 8-8-8.

If I look in the Timings Table, I see that, when comparing e.g. the 8-8-8 settings, the two sets have different frequencies and different tRAS, tRC and tRFC values.

I'm not sure if this would be an indicator of the problem with combining these DIMMs.
Does this mean they'd run at different, out-of-sync settings, or are they forced to the same settings? Also, the sticker on my memory says 9-9-9@1.65v.

I hope OCZ will respond soon to these questions. Basically, I'd like to know what the right settings are to enter manually in the BIOS, and whether this will give any problems with the 2 sets combined.

Oh, I also got a call from an Asus support agent last night. He confirmed that OCZ memory is known to get the wrong / too low voltages from the Auto settings. He recommends manually setting the voltage to 1.65v. Still, I'd like OCZ to confirm this and also give me the rest of the settings.

I'll keep you all posted, and if someone has some facts on this, I'm all ears / eyes! :)

Kind regards,
Kami.

p.s.:
Just for the record: it seems this problem overall has nothing to do with my HDs, Silent Data Corruption, the ICH10(R) or any RAID5 constructions. That was just where I started my search to identify the problem.
 

Bozo

It's my understanding that memory with different timings will default to the slowest.
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
I would be interested in seeing if the problem persists with the RAM changed out to either 3 of these: KVR1066D3N7/4G ($42.00 each @ egg) or 3 of these: KVR1333D3N9/4G ($42.00 each @ egg). These are Kingston ValueRam sticks. Both models are on Kingston's compatibility list for your Asus board.
 

MaxBurn

My jibe yesterday was mostly rhetorical. Yeah, what Buck said, and note that both of those are 1.5v modules. Plenty of memory out there that doesn't need to be overvolted. Starting with the HCL is a really good idea too...
 

Howell

The motherboard manual does suggest filling the memory slots in the following order for best performance: A1, B1, C1, A2, B2, C2.

You have three memory channels and they are suggesting you spread your memory across the channels. This is all well and good if all of your sticks of memory are of a uniform type.

You have 6G (3x2G) of one type of stick and 6G (3x2G) of another type of stick. I suspect the voltage and timing are controlled at best per channel and that you are accidentally mixing sticks within a channel.

Since you have 3 channels I would suggest you place two sticks of your fastest memory in channel A, two sticks of your other memory in channel B and test. If that works out OK add one of the remaining sticks to channel C and test. If that works out OK add the remaining stick to channel C and test. Channel C will be a mixed channel and may be unstable.

You may also be able to fix this issue by manually specifying timings and voltages. Beware, there are warnings in the manual (pg. 2-11) about running the voltage at 1.65V.
 

KrazeyKami

Hey Howell,

What do you mean by accidentally mixed?
I have 6 DIMM slots, of which 3 are orange (that's A1, B1, C1) and 3 are black (A2, B2, C2).

I've tried to run the older memory in A1, B1, C1, and the faster memory in A2, B2, C2. I also switched them around (fast in A1, B1, C1), but to no avail.
I don't think the memory channels are mixed? It's triple-channel DDR3, and the types are all on their own channel.

Are you suggesting placing the fast memory in A1, B1 and the slow memory in A2, B2, to see if that runs? Aren't you mixing the memory then? I believe the channels are defined by the number, not by the letter (ABC is a triple channel).

I also tried to change the settings manually, as suggested by OCZ:

http://www.ocztechnologyforum.com/f...3-10700U-(Both-OCZ3G1333LV6GK)-on-ASUS-P6T-SE

Nothing worked. I'm seriously doubting that the memory types are compatible.
The funny thing is, OCZ is still claiming this is exactly the same type; it's even sold at the local store under the same product number. However, both SiSoft and CPU-Z claim these memory types are different, and each has its own specific speeds and SPD settings. Funny fact though: the stickers on both modules say PC-10666 - 9-9-9@1.65v.
If you look at my post from Yesterday 09:04 AM, you can see the screenshots of how the memory is being detected by my computer.

I'm still waiting for a reply from OCZ and I wonder how they will react. Personally, I'd like to try 6 DIMMs of the PC3-10700U memory. I have a gut feeling that this is the key to all the problems.
 

time

You don't understand the meaning of channel. Each channel has its own driver circuit. Howell postulated that each channel may be able to use parameters distinct from the other channels.

Most motherboards allow up to two RAM modules to be connected to each channel. However, this doubles the capacitance and often requires the channel to run at a lower clock frequency than might otherwise be achieved.

The color coding of RAM sockets caters for the majority of PCs where there is only one module per channel. By sticking to the one color, you guarantee that corresponding banks in all channels are populated, leading to maximum performance (the channels can operate concurrently). There have also been boards that wouldn't work properly if you failed to populate a channel.

The colors become meaningless if you choose to populate all the sockets.

When you put a DDR3-1066 module in the orange A1 socket and a DDR3-1333 in the black A2, you are actually mixing RAM types on the one channel. Howell has suggested - as a diagnostic - that you try to keep the same type on each channel.

Having said all that, I believe a modern motherboard such as yours should use the slower timings, rather than just the ones from the first bank - which I think you are mindful of.
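
The slot-versus-channel point can be made concrete with a small sketch. It assumes, as described above, that the letter in a slot name (A, B, C) identifies the channel and the digit identifies the socket within that channel; the function name `mixed_channels` and the module labels are illustrative only.

```python
from collections import defaultdict


def mixed_channels(population):
    """Return the channels whose populated slots hold more than one module type.

    `population` maps a slot name such as "A1" to a module type label.
    The first character of the slot name is taken to be the channel.
    """
    types_per_channel = defaultdict(set)
    for slot, module in population.items():
        types_per_channel[slot[0]].add(module)
    return sorted(ch for ch, types in types_per_channel.items() if len(types) > 1)


# Populating by socket colour: one type in the orange slots, the other in the black.
by_colour = {"A1": "DDR3-1066", "B1": "DDR3-1066", "C1": "DDR3-1066",
             "A2": "DDR3-1333", "B2": "DDR3-1333", "C2": "DDR3-1333"}

# First step of Howell's diagnostic: one type per channel, channel C left empty.
howell_step_one = {"A1": "DDR3-1333", "A2": "DDR3-1333",
                   "B1": "DDR3-1066", "B2": "DDR3-1066"}

print(mixed_channels(by_colour))        # every channel holds both types
print(mixed_channels(howell_step_one))  # no channel mixes types
```

Under this model, sorting the modules by socket colour is exactly what puts both types on every channel.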
 

time

You may be able to solve your problem by increasing the voltage to the maximum 1.65V - it worked for Ddrueding and others. This increases the signaling levels and therefore the ability of the circuitry to resolve hi/lo transitions on the bus.

Unfortunately, the OCZ staffer has ruined your solution by advising you to overclock your RAM when you are already having trouble keeping it stable at stock settings! Effective memory speed on a s1366 board should be 1066MHz, not 1333MHz. That's what CPU-Z was telling you when it reported 533MHz (1066 / 2).
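
The relationship between the clock CPU-Z reports and the effective memory speed is simple enough to sketch. DDR memory transfers data on both clock edges, and a standard 64-bit module moves 8 bytes per transfer; the function names below are illustrative, and the nominal clocks of 533.33 MHz and 666.67 MHz are written as fractions so the rounding works out.

```python
def effective_rate_mt_s(reported_clock_mhz):
    """DDR transfers data on both clock edges, so the rate is twice the clock."""
    return 2 * reported_clock_mhz


def peak_bandwidth_mb_s(reported_clock_mhz, bus_width_bytes=8):
    """Peak module bandwidth: transfers per second times bytes per transfer."""
    return effective_rate_mt_s(reported_clock_mhz) * bus_width_bytes


# CPU-Z shows 533 MHz, which corresponds to an effective 1066 MHz.
print(effective_rate_mt_s(533))

# The PC3 module names are the peak bandwidth in MB/s, rounded for marketing.
print(round(peak_bandwidth_mb_s(1600 / 3)))  # nominal 533.33 MHz clock
print(round(peak_bandwidth_mb_s(2000 / 3)))  # nominal 666.67 MHz clock
```

The two bandwidth figures are where the PC3-8500 and PC3-10600 / PC3-10666 / PC3-10700 labels seen in this thread come from.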

This forum has a quite astonishing amount of collective technical expertise. Several people have raised concerns about your use of OCZ RAM, as well as dealing with the company. Surely you can now acknowledge that the company has falsely labelled your RAM and then given you incorrect technical advice?

Populating both banks in every channel of a s1366 motherboard requires carefully designed modules with low capacitance. Qualified RAM has been tested to meet that requirement at 1.5V. Food for thought.
 