Ryzen

Stereodude · Mar 10, 2020

I watched it yesterday and marveled at the stupidity of it. Anyone dropping that much on a system can afford a few hundred dollars extra for a much more proper chassis like a Norco 4224.

Handruin · Mar 10, 2020

I don't even think the Norco is appropriate for $10k worth of drives. I would at least consider a decent Supermicro with a quality SAS backplane and expander.

Stereodude · Mar 10, 2020

Handruin said:
I don't even think the Norco is appropriate for $10k worth of drives. I would at least consider a decent Supermicro with a quality SAS backplane and expander.

Well, I did call it a more proper chassis.

That'd be the minimum for such a build IMHO. As you mention, Supermicro would be the right choice. My main server is in a Supermicro 4U chassis. My backup server is in a 4U Norco.

I also don't understand why he didn't use a SAS expander connected (dual link) to one SAS HBA instead of using a pile of HBAs with 8 drives on each. There wouldn't have been any performance difference, and he wouldn't have needed all those PCIe x16 slots. Then it wouldn't have had to run headless either.

Handruin · Mar 10, 2020

Fair...but then there would be a less click-bait title to drive ad revenue for Linus.

I do enjoy when he tries some outlandish stuff but this one wasn't enough on the outlandish side to be that interesting. If he had done a collaboration with some local metal working fabrication company and built some ridiculous enclosure, then I'd be on board with that kind of stupidity.

Stereodude · Mar 10, 2020

Handruin said:
Fair...but then there would be a less click-bait title to drive ad revenue for Linus. I do enjoy when he tries some outlandish stuff but this one wasn't enough on the outlandish side to be that interesting. If he had done a collaboration with some local metal working fabrication company and built some ridiculous enclosure, then I'd be on board with that kind of stupidity.

He bought a ridiculous metal "enclosure" in this one.

snowhiker · Mar 10, 2020

Is Linus truly a dope, or does he just pretend to be one so people will feel superior watching his videos and want to watch more to see what dopey thing he will do next??!

I guess with 10M YouTube subs he's doing something right.

Stereodude · Mar 11, 2020

Handruin said:
I'll be curious to get your feedback on the Scythe Fuma 2 when you're done with your build and have tested it for a while. This CPU has a decent amount of heat to get rid of, and I wanted to keep it reasonably quiet without dealing with AIO water coolers.

I saw this the other day

Handruin · Mar 11, 2020

I like the HC channel, they're pretty decent. Seems like a good value & performer for a HSF.

Stereodude · Mar 16, 2020

I'm trying to track down some crashes while encoding with x265 on the 3950X. I'm nervous that it's an underlying hardware issue. I made some changes to the encoding settings, and now I get crashes if I run too many encodes at the same time (enough to really saturate the processor). I'm running 8 encoding jobs simultaneously, and I had over 5 crashes in the course of 24 hours before I stopped restarting the crashed segments. Running only 4 at a time, at roughly 50% CPU usage, the same segments that crashed previously completed without issue. Now I'm trying to test whether the encoding settings are triggering or exposing the issue. Even setting aside potential underlying HW causes, there are a lot of variables to rule out: encoding switches, x265 builds, how the video is fed to the encoder, etc.

Earlier I did see that occasionally one of the P95 stress test threads would stop and report an error when stress testing the system. I didn't see that with the first motherboard that I exchanged for the 2nd.

Handruin · Mar 16, 2020

Those types of instability are a huge pain to get to the bottom of. Maybe try uncoupling the infinity fabric and see if it becomes more stable.

I've had some wonky instability when I was using Ryzen PBO and would turn off my system overnight. When I turned it back on the next day, the system would fail to POST for 3 attempts and then reset the BIOS to defaults. I'd have to re-set the RAM XMP profile and the infinity fabric coupled mode. It's not an issue now since I'm not using PBO, but it's still weird that it would only happen after the system was powered off for some amount of time.

Stereodude · Mar 16, 2020

*fingers crossed* I think I may have isolated which command line switch for x265 is causing the issue to appear. Now I need to see if a different build of the same x265 version (different compiler) has the same behavior or not. I also need to run this out longer to confirm that I haven't just gotten lucky.

Stereodude · Mar 17, 2020

So apparently I got lucky. With the different command line switches I still got a crash, it just took much longer to happen. With the other switches I got 5 crashes in 24 hours (before I stopped restarting the crashed encodes) vs. 1. I did find another build of the same version of x265 that I'm trying now just to rule out some odd compiler bug. Presuming I still get crashes (which I expect I will) my first step will be to not use the XMP profile for the memory and see what happens.

snowhiker · Mar 17, 2020

SD, your random computer lockups are spreading like a virus. I just had a strange lockup myself a few minutes ago. I tried to edit my long post in the COVID-19 thread and the computer froze. CTRL-ALT-DEL didn't do anything. Even the hardware reset button on my case didn't reboot the computer right away; it has always been an instant restart after the reset button is pressed. I pressed it several times and finally held it in for 10-15 seconds before the computer rebooted. A real WTF? moment when the case reset button doesn't immediately work.

Stereodude · Mar 17, 2020

I got a crash of x265 with a build of the same x265 version made with a different compiler... Looks like the system isn't quite stable under 100% load (with no overclocking). I'll admit I didn't stress test the 2nd motherboard with a variety of memory tests like I did the first one. Given that it's not overclocked, the XMP profile on the memory is my prime suspect.

sedrosken · Mar 18, 2020

I'm actually quite happy with my stability -- since I got everything dialed in, not one crash or lockup, and I've been periodically folding and stress testing with mPrime and big compilation jobs using all threads. For a cheap upgrade this is really looking nice.

Gödel · Mar 18, 2020

Stereodude said:
I got a crash of x265 with a build of the same x265 version made with a different compiler... Looks like the system isn't quite stable under 100% load (with no overclocking). I'll admit I didn't stress test the 2nd motherboard with a variety of memory tests like I did the first one. Given that it's not overclocked, the XMP profile on the memory is my prime suspect.

In the words of an ex-president, "I feel your pain." How distressing to have it run OK for many hours before flaking out. I assume that you (and snow and handy) are using quality components, and you already checked the RAM "with a variety of memory tests", and presumably "reseated" connections, etc.

But offhand, to me, your issues seem unlikely to be caused by software. Wouldn't that, including the compiler, have been tested by many users in a variety of situations? Or do you think it might be due to some arcane multi-thread-handling "glitches" that are extremely rare?

If you back down from XMP to "hard code" the various RAM params in BIOS, and that fixes the problem for you, that would be great, but it would still leave the nagging problem that XMP wasn't being handled correctly for your hardware.

Stereodude · Mar 18, 2020

It's definitely not the x265 compiler build; I got a crash with another build also. I've had the whole system reboot from a bluescreen twice in the past 24 hours: once while running a Prime95 torture test, the other time running x265. I also saw a failure in Prime95 where it detected a HW failure. My concern is that the Prime95 failure occurred using one of the small tests, which doesn't use system memory and fits entirely within the cache on the chip. I haven't slowed the memory down yet; I'm about to go make the change. It seems like it's getting worse (less stable) over time.

Edit: I already got a Prime95 HW failure detected within 1 minute of running with the memory at the slow non-XMP clocks.

Code:

[Wed Mar 18 16:13:30 2020]
FATAL ERROR: Resulting sum was 6.058846950323651e+016, expected: 5.022461721491425e+016
Hardware failure detected, consult stress.txt file.

Stereodude · Mar 18, 2020

I got a second. I think they're always the same worker. I've definitely seen #25 several times. Maybe I saw #26 once.

Code:

[Wed Mar 18 17:17:34 2020]
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected, consult stress.txt file.

Stereodude · Mar 18, 2020

Based on some Googling it seems like I may have a bad CPU. I guess there are a few more things I can try like running on only one stick of RAM (trying each stick by itself) and clearing the CMOS but it's not looking promising.

snowhiker · Mar 18, 2020

^^^ That blows.

sedrosken · Mar 19, 2020

Ouch. Really sorry to see that. Hope the RMA process goes smoothly with all of the... current events going on.

Chewy509 · Mar 19, 2020

Sucky, but out of interest, have you confirmed your DIMM rank setup?

I've found that some boards require single rank (SR) DIMMs when all DIMM slots are populated. (Just double check the DIMMs, their rank, and what the motherboard says is supported for your CPU.) It's a long shot, but you never know.

I updated my system at work to a Ryzen 9 3900X based setup about 2 months ago and have never had any crashes or lockups. The system has been rock stable (running Debian 10).

Stereodude · Mar 19, 2020

Chewy509 said:
Sucky, but out of interest, have you confirmed your DIMM rank setup?

I've found that some boards require single rank (SR) DIMMs when all DIMM slots are populated. (Just double check the DIMMs, their rank, and what the motherboard says is supported for your CPU.) It's a long shot, but you never know.

I'm not sure where I'd find that information. The manual doesn't have it. The memory QVL list has single and dual rank memory on it shown as being supported in 1's and 2's. The memory I'm using isn't on the list.

There was a later BIOS for the motherboard. It did not solve the issue. I'm testing it with one stick of RAM at a time now (I only have 2).

Stereodude · Mar 19, 2020

Single sticks of RAM failed in two different slots in the motherboard.

I guess I can try adding more voltage or underclocking. The voltage settings for these CPUs are as clear as mud. When underclocking via a set multiplier it looks like the "turbo" boost doesn't kick in.

Stereodude · Mar 19, 2020

I bumped Vcore by 0.05V and looks like that might be enough to do the trick. Running a much longer test now. Of course that's slowed the chip down slightly under full load by ~50-75MHz & increased the temps by 1-2C.

Chewy509 · Mar 20, 2020

Your DIMMs are dual rank: https://www.pic-upload.de/view-33001480/F4-3200C16D-32GVK.jpg.html

And if you look at ASRock's page for a similar board: https://www.asrock.com/mb/AMD/X570 Taichi/#Specification

Under DIMMs, 4x Dual Rank DIMMs is limited to DDR4-2666... (I don't know why Gigabyte don't publish the same list).

The DIMMs are rated at 1.2V, but will only achieve speed at 1.35V (so check the voltage in the BIOS). So drop the speed to 2666, ensure the voltage is 1.35V and see how you go.

Stereodude · Mar 20, 2020

Chewy509 said:
Your DIMMs are dual rank: https://www.pic-upload.de/view-33001480/F4-3200C16D-32GVK.jpg.html

And if you look at ASRock's page for a similar board: https://www.asrock.com/mb/AMD/X570 Taichi/#Specification

Under DIMMs, 4x Dual Rank DIMMs is limited to DDR4-2666... (I don't know why Gigabyte don't publish the same list).

The DIMMs are rated at 1.2V, but will only achieve speed at 1.35V (so check the voltage in the BIOS). So drop the speed to 2666, ensure the voltage is 1.35V and see how you go.

That link is to the specs of the wrong memory. But, I'm way past that point anyhow. I had them running at 2133 for most of the testing and only with a single stick at the end and the failures persisted. Also, I only have 2 sticks, not 4 sticks.

It has run for over 12 hours without any crashes or errors with the +0.05V Vcore boost with both sticks of RAM back in the system running the XMP profile at 3600MHz. I'm going to let it run for 24 hours. I probably won't RMA the CPU until the Wuhan Flu nonsense is over.

Stereodude · Mar 21, 2020

I got >24 hours without an error in Prime95 small FFT with the +0.05V Vcore boost on the R9 3950X. Now I'm trying a x265 encoding session.

Chewy509 · Mar 21, 2020

Certainly sounds like a bad memory controller on the CPU, or a bad motherboard (power delivery or DIMM related). Glad to see the Vcore adjustment appears to be working.

snowhiker · Mar 22, 2020

Since you had to bump the voltage a bit, is there any "test points/contacts" of the MB you can manually check voltages with a multimeter and compare to reported voltages on BIOS screen? Perhaps the CPU/RAM is being unintentionally under-volted and the voltage drops enough to cause a crash?

Stereodude · Mar 22, 2020

Chewy509 said:
Certainly sounds like a bad memory controller on the CPU, or a bad motherboard (power delivery or DIMM related). Glad to see the Vcore adjustment appears to be working.

Vcore doesn't change the memory voltage. It changes the CPU core voltage. I don't get the connection you're making.

Chewy509 · Mar 22, 2020

Vcore is used by the memory controller in the CPU, does it not?

Stereodude · Mar 22, 2020

Chewy509 said:
Vcore is used by the memory controller in the CPU, does it not?

No, there's a separate voltage control for the memory controller.

Stereodude · Mar 27, 2020

I'm about to throw this system in the trash. I've never had a system this unstable and problematic. If I could get 75% of my money back and be rid of it, I'd do it in a heartbeat and consider the 25% the cost of relearning my lesson on AMD.

Now I'm noticing (for about the past week or so) that after about a day or so of uptime the task manager will just stop updating. Today it did that plus newly launched programs failed to appear in the taskbar. The taskbar was still otherwise totally functional. Programs launched yesterday could be manipulated from the taskbar and showed up in it with their icon.

I'm still having network issues. Iperf still says everything is fine, but I can tell something is up. I've been using it primarily by remote desktop, and the mouse will jump around, fail to respond to clicks, respond with a delay, etc. I can tell there's something wrong, like dropped packets, packet latency issues, or interrupt latency issues. I use another PC primarily by remote desktop and never have these issues, so it's the system, not RD.

I did a sfc /scannow and it says everything is fine.

I'm pretty much to the point of clean installing Windows 10 as a last troubleshooting step before I just consider this another AMD lesson learned and try to recoup what I can out of it.

I plan to:
1) Run some prolonged memory tests outside of Windows
2) Take out / disconnect some of the extra non essential hardware (10gig nic, optical drive, HDDs) and see if anything improves/changes
3) Clean install Windows 10
...
x) Have a bonfire

Gödel · Mar 27, 2020

Aw, κραπ, first the hardware problems, and now Windows OS problems. Did you have task manager running "full-time"?

Stereodude · Mar 27, 2020

Gödel said:
Aw, κραπ, first the hardware problems, and now Windows OS problems. Did you have task manager running "full-time"?

What do you mean running "full-time"? It was sitting open in the bottom corner of the desktop from about the time I booted up the computer. Typically closing it and re-opening it is possible, but it doesn't update. This morning it wouldn't close (it hung not responding). I've never seen this sort of behavior on another Windows 10 system. Even ones that have been up for weeks under a heavy load.

Gödel · Mar 27, 2020

Yeah, that's what I meant. I usually start it only when I want to do some "spot-checking" (with <Ctrl><Shift><Esc>), then exit it again (<Esc>).

Stereodude · Mar 27, 2020

Gödel said:
Yeah, that's what I meant. I usually start it only when I want to do some "spot-checking" (with <Ctrl><Shift><Esc>), then exit it again (<Esc>).

I don't think that should matter. It sits open for weeks on my Xeon based system with no similar issue.

snowhiker · Mar 28, 2020

Flakey motherboard and/or CPU.

Gödel · Mar 28, 2020

Yeah, very vexing to have such a fundamental program, seemingly not requiring much processing power, flake out.
