It looks like the recently released 6850/6870 are just slightly tuned 57XX GPUs. There are no changes in supported instructions, cache size, double precision (still not supported, same as before) or anything else. The only difference I’ve found so far, reading forums and papers, is that “flow control clauses don’t require as many cycles [as on 5XXX]“. This means that complex kernels with a large number of control flow clauses may work better on 68XX than on 57XX. But it doesn’t matter for hash calculations (well, maybe a bit for huge multi-hash lists). It looks like the marketing guys won, and so it’s the 68XX family now rather than 67XX, though from a peak performance point of view it looks weird that the 6870 is slower than the 5850. Of course, that has nothing to do with 3D gaming benchmarks. But who buys modern GPUs for gaming these days? :D
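
The “6870 slower than 5850” remark is easy to check on paper. Below is a minimal sketch, assuming peak single precision throughput is stream processors × 2 FLOPs (one multiply-add) × clock, and using the reference-board specs (1120 SP @ 900 MHz for the 6870, 1440 SP @ 725 MHz for the 5850), which are not taken from this post. The helper name peak_gflops is just for illustration.

```python
# Peak single-precision throughput of a VLIW ATI GPU, assuming each stream
# processor retires one multiply-add (2 FLOPs) per clock.
def peak_gflops(stream_processors: int, clock_mhz: float) -> float:
    return stream_processors * 2 * clock_mhz / 1000.0

# Reference-board specs (assumed here, not stated in the post).
hd6870 = peak_gflops(1120, 900)
hd5850 = peak_gflops(1440, 725)

print(f"HD 6870: {hd6870:.0f} GFLOPS, HD 5850: {hd5850:.0f} GFLOPS")
```

This gives 2016 GFLOPS for the 6870 against 2088 GFLOPS for the 5850, so the newer, higher-numbered card indeed has slightly lower peak throughput.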

Anyway, while 68XX looks totally boring from a programming point of view, the upcoming 69XX is a different story. It turns out that even Catalyst 10.6 can compile code for the mysterious ISA id=15, and the resulting disassembly looks very interesting: the T unit is indeed gone from ATI’s thread processors, and the XYZW units can now process instructions they weren’t able to handle before, such as 32-bit integer multiplies. This basically means that utilization for ATI GPUs should grow, so I decided to check it.

I took 2 GPU kernels to analyze: PBKDF2 (the algorithm core here is just 2xSHA1 transforms) and single MD5. Right now (for the 5XXX family), utilization for PBKDF2 is already at 95.5%. After analyzing the disassembly for ISA id=15, it turns out that it increases to 99.2%. Also, the number of instructions is reduced by about 1%, making the final estimated performance gain 4.6%.
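
One way to sanity-check such an estimate is sketched below. It assumes (my assumption, not necessarily how the figure above was computed) that run time per stream processor is proportional to instruction count divided by slot utilization, so the per-SP gain is (u_new / u_old) × (I_old / I_new) − 1. The function names are hypothetical. With the numbers above, utilization alone gives about 3.9%; exactly 1% fewer instructions would give about 4.9%, while a 4.6% total gain corresponds to roughly 0.7% fewer instructions, which fits “about 1%”.

```python
# Assumed model: time ~ instructions / utilization, so per-SP speed ~ util / instr.
def estimated_gain(util_old: float, util_new: float, instr_reduction: float) -> float:
    """Relative per-SP speed-up for a given utilization change and instruction cut."""
    return (util_new / util_old) / (1.0 - instr_reduction) - 1.0

def implied_instr_reduction(util_old: float, util_new: float, gain: float) -> float:
    """Instruction-count reduction needed to reach a given total gain."""
    return 1.0 - (util_new / util_old) / (1.0 + gain)

# PBKDF2 figures from the text: 95.5% on 5XXX, 99.2% on ISA id=15.
util_only = 0.992 / 0.955 - 1.0
print(f"utilization only: {util_only:.1%}")
print(f"with 1% fewer instructions: {estimated_gain(0.955, 0.992, 0.01):.1%}")
print(f"reduction implied by 4.6%: {implied_instr_reduction(0.955, 0.992, 0.046):.1%}")
```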

For MD5 the results look way more impressive. Right now it’s really hard to fully utilize all 5 stream cores; I’ve made several tests with different numbers of hashes processed simultaneously per thread and ended up with the first (and default) value of 4. Utilization in this case is only around 83.5%. But with the new 4-wide stream cores, the 4xMD5 hashes can be perfectly vectorized, hitting 98.6% utilization. So it’s a ~18% speed-up just from the architecture change from 4+1 to 4-wide stream cores. In other words, if there are 69XX cards with 768 SP @ 850 MHz, they should show about 2100M single MD5 speed, compared to the current 1870M on a 5770 (800 SP @ 850 MHz).
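
A toy model shows why 4 hashes per thread fit a 4-wide core so much better than a 4+1 one, and the same numbers reproduce the 2100M estimate. This is a sketch under stated assumptions: it pretends each MD5 stream contributes only one independent instruction per bundle (real MD5 steps have some internal parallelism, which is why the measured 83.5% beats the toy 80%), and it scales the measured 5770 speed linearly by SP count, clock and utilization. The names ideal_utilization and scaled_speed are hypothetical.

```python
import math

def ideal_utilization(independent_ops: int, slots: int) -> float:
    """Toy model: each hash is a strict dependency chain, so every VLIW bundle
    of `slots` ALUs gets at most one op per hash, i.e. `independent_ops` ops."""
    cycles = math.ceil(independent_ops / slots)
    return independent_ops / (cycles * slots)

print(ideal_utilization(4, 5))  # 4 hashes on XYZW+T (VLIW5)
print(ideal_utilization(4, 4))  # 4 hashes on XYZW only (VLIW4)

def scaled_speed(base: float, base_sp: int, base_mhz: float,
                 new_sp: int, new_mhz: float,
                 util_old: float, util_new: float) -> float:
    """Scale a measured speed linearly by SP count, clock and utilization."""
    return base * (new_sp / base_sp) * (new_mhz / base_mhz) * (util_new / util_old)

# 5770 measurement and hypothetical 69XX config from the text.
est = scaled_speed(1870, 800, 850, 768, 850, 0.835, 0.986)
print(f"estimated 69XX single MD5: {est:.0f}M")
```

The toy model prints 0.8 for the 4+1 layout and 1.0 for the 4-wide one, and the scaled estimate comes out at about 2120M, in line with the ~2100M above.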

Of course, these are very premature assumptions, and there is some chance that ISA id=15 will turn out to be just a myth, who knows. But if you’re planning to upgrade your GPU right now (and use it mainly for GPGPU, not games), I suggest waiting for the 69XX release.

Updated speed estimates (with the above assumptions for 69XX) are available here.

An updated ighashgpu is available here. It should work with the 68XX family now; if it doesn’t, run it with the /debuglog switch and send me the “CAL device N, target = XX” value.