Tannin
Storage? I am Storage!
In another thread, I mentioned the excellent performance of the old K6-III CPUs on integer tasks. Then Cas said:
...my 400MHz K6-III on a P5A handily outperformed my 500MHz Xeon box with one of its processors disabled. Of course, even the K6-III couldn’t keep up with the dual configuration. Fortunately the datasets for compilation tend to fit nicely into the K6-III’s cache. I suspect that the device would not have fared so well in a database server. Despite its impressive cache hierarchy, performance dropped off quickly for applications with larger datasets. The Xeon offered almost twice the memory bandwidth.
Now this raises an interesting question. My main workstation/server was a K6-III 500 for a long, long time. (Actually a 450+ overclocked to about 560 - 5x multiplier, 112MHz FSB.) I replaced the K6-III several times but could never actually make the thing go faster in all circumstances than the old K6-III. I slipped in a Duron 700, an Athlon 1000, an 1100, played with P-IIIs, even a Thunderbird 1200 and a 1333 or 1400, but I always kept getting dissatisfied with the performance and downgrading it back to the old K6-III again. The bigger Athlons were faster at some tasks, of course, but it wasn't until I went XP 1700 and DDR that I finally had a machine that was faster in all circumstances.
Now, in explaining this to someone - a customer, I think it was - I remember being asked how it was that the XP could be faster when it only had 384k of cache, as against the K6-III's 1.3MB. And I remember oversimplifying it by saying that, given the vastly greater clock speed of the 266MHz DDR as compared to the 100 (or 112) MHz tertiary cache RAM on the K6 board, it wouldn't be too far off the mark to think of the Athlon's RAM as being "all cache".
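To put rough numbers on the clock-speed half of that comparison, here is a minimal sketch of the peak (theoretical) transfer rates. It assumes a 64-bit data path in every case, which is my assumption rather than anything measured here, and it says nothing about latency. The function name is just for illustration.

```python
# Peak transfer rate for a memory or cache bus, ignoring latency entirely.
# Assumption: every bus compared here has a 64-bit (8-byte) data path.

def peak_bandwidth_mb_s(mega_transfers_per_s: float, bus_width_bits: int = 64) -> float:
    """Return the theoretical peak rate in MB/s (1 MB = 10**6 bytes)."""
    return mega_transfers_per_s * bus_width_bits / 8


buses = [
    ("K6-III board cache at 100MHz", 100),
    ("K6-III board cache at 112MHz", 112),
    ("Athlon XP main memory, 266MHz DDR", 266),
]

for label, rate in buses:
    print(f"{label}: {peak_bandwidth_mb_s(rate):.0f} MB/s peak")
```

On those assumptions the DDR main memory has well over twice the peak rate of the board-level cache, which is the sense in which the "all cache" line holds. Peak rate is only half the story, though.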
Was I oversimplifying? I know the clock speeds, of course, but what about the latency? I've seen figures for bits and pieces of this here and there around the traps - notably in an excellent article over at Ace's comparing the caching strategies of the Athlon Classic, the Coppermine P-III, and the then-unreleased Athlon Thunderbird - but never managed to gather it all together in one place.
How many CPU wait states are involved with a secondary cache miss for an XP 1800 running 266MHz DDR, for example? And how does that compare with a secondary cache miss (i.e., a tertiary cache access) for a K6-III? And so on.
Anyone care to write a mini-thesis on this?
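For anyone who does have real latency figures to hand, the conversion from nanoseconds to stalled CPU cycles is simple enough to sketch. The 100 ns value below is purely a placeholder, not a measurement of either system, and the 1533MHz core clock for the XP 1800 is my assumption. The 560MHz figure is the overclocked K6-III mentioned above.

```python
# Convert an access latency in nanoseconds into stalled core clock cycles.
# The latency used below is a PLACEHOLDER, not measured data for either CPU.

def stall_cycles(latency_ns: float, core_mhz: float) -> float:
    """Core cycles that elapse while waiting latency_ns for data."""
    return latency_ns * core_mhz / 1000.0


PLACEHOLDER_LATENCY_NS = 100.0  # hypothetical; substitute a measured value

cpus = [
    ("K6-III at 560MHz", 560.0),
    ("Athlon XP 1800 at 1533MHz (assumed clock)", 1533.0),
]

for label, mhz in cpus:
    cycles = stall_cycles(PLACEHOLDER_LATENCY_NS, mhz)
    print(f"{label}: about {cycles:.0f} cycles per {PLACEHOLDER_LATENCY_NS:.0f} ns")
```

The point of the sketch is only that the same latency in nanoseconds costs a faster core proportionally more cycles, so the comparison needs measured latencies for each level of each hierarchy before it means anything.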