[Bf-cycles] correlation between GTX680 dual-precision FP and cycles performance?
mjg at see3d.co.uk
Wed Mar 28 10:06:31 CEST 2012
Thank you Brecht, very much appreciated.
On 03/27/12 19:14, Brecht Van Lommel wrote:
> I basically don't know at this point, it requires careful analysis of
> the code running on such a GPU. Maybe it is a matter of tweaking some
> parameters, or maybe the new scheduling really is a problem that is very
> difficult to overcome. At least two other GPU raytracers seem to run
> slower on it than the GTX 580, so that is worrying, but I just have
> not done any testing or analysis of Cycles running on this GPU.
> We also don't use 64 bit precision, so that should have no influence.
> On Tue, Mar 27, 2012 at 4:26 PM, Matt Gray <mjg at see3d.co.uk> wrote:
>> Hi Brecht,
>> Hoping for your comment on my speculation as to why the GTX680 is not
>> the Cycles powerhouse a lot of people hoped it would be:
>> It is possible the performance deficit results from nothing more than a
>> lack of software optimisation (CUDA 3.0 for the GTX680?), but the
>> anandtech review of the card listed quite a few other examples where the
>> 680 has been 'detuned' as far as compute is concerned.
>> The brush-stroke summary being that the GTX 680 is Nvidia's first
>> 'efficient' architecture in a long time, at least for gaming, precisely
>> because a lot of the heavy-lifting silicon for compute purposes was
>> removed. This would not be unexpected on what is supposed to be a
>> mid-range GPU like the 460/560 generation, as it was rumoured to be
>> before a spot of re-branding occurred.
>> A few quotes from anandtech:
>> "The CUDA FP64 block contains 8 special CUDA cores that are not part of
>> the general CUDA core count and are not in any of NVIDIA’s diagrams.
>> These CUDA cores can only do and are only used for FP64 math. What's
>> more, the CUDA FP64 block has a very special execution rate: 1/1 FP32.
>> With only 8 CUDA cores in this block it takes NVIDIA 4 cycles to execute
>> a whole warp, but each quarter of the warp is done at full speed as
>> opposed to ½, ¼, or any other fractional speed that previous
>> architectures have operated at. Altogether GK104’s FP64 performance is
>> very low at only 1/24 FP32 (1/6 * ¼), but the mere existence of the CUDA
>> FP64 block is quite interesting because it’s the very first time we’ve
>> seen 1/1 FP32 execution speed. Big Kepler may not end up resembling
>> GK104, but if it does then it may be an extremely potent FP64 processor
>> if it’s built out of CUDA FP64 blocks."
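As a side note, the 1/24 figure in the quote above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the 8 FP64 cores and the 4 cycles per warp come from the quote, while the warp size of 32 and the 192 FP32 cores per SMX are assumed GK104 figures that the quote does not state.

```python
from fractions import Fraction

WARP_SIZE = 32             # assumption: threads per warp, not stated in the quote
FP64_CORES_PER_SMX = 8     # from the quoted AnandTech text
FP32_CORES_PER_SMX = 192   # assumption: GK104 SMX core count, not stated in the quote

# 8 FP64 cores need 32 / 8 = 4 passes to execute one full warp.
cycles_per_fp64_warp = WARP_SIZE // FP64_CORES_PER_SMX

# Direct ratio of FP64 to FP32 execution units in one SMX.
ratio_direct = Fraction(FP64_CORES_PER_SMX, FP32_CORES_PER_SMX)

# The quote's decomposition: one warp's worth of lanes out of 192 FP32
# lanes (1/6), times one quarter-warp per cycle (1/4).
warp_share = Fraction(WARP_SIZE, FP32_CORES_PER_SMX)
quarter_warp = Fraction(1, cycles_per_fp64_warp)
ratio_from_quote = warp_share * quarter_warp

print(cycles_per_fp64_warp)  # 4
print(ratio_direct)          # 1/24
print(ratio_from_quote)      # 1/24
```

Both routes give the same 1/24, so the quoted "1/6 * 1/4" is consistent with simply counting 8 FP64 units against 192 FP32 units, under the assumptions above.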
>> "So NVIDIA has replaced Fermi’s complex scheduler with a far more
>> simpler scheduler that still uses scoreboarding and other methods for
>> inter-warp scheduling, but moves the scheduling of instructions in a
>> warp into NVIDIA’s compiler. In essence it’s a return to static
>> scheduling. Ultimately it remains to be seen just what the impact of
>> this move will be. Hardware scheduling makes all the sense in the world
>> for complex compute applications, which is a big reason why Fermi had
>> hardware scheduling in the first place, and for that matter why AMD
>> moved to hardware scheduling with GCN."
>> "What makes this launch particularly interesting if not amusing though
>> is how we’ve ended up here. Since Cypress and Fermi NVIDIA and AMD have
>> effectively swapped positions. It’s now AMD who has produced a higher
>> TDP video card that is strong in both compute and gaming, while NVIDIA
>> has produced the lower TDP part that is similar to the Radeon HD 5870
>> right down to the display outputs."
>> So my questions:
>> 1. Does Cycles use double-precision FP (FP64)?
>> 2. If not, does the poor performance result from the scheduler and other
>> architectural deficiencies?
>> 3. If yes, how much of the poor performance derives from the lack of
>> double-precision grunt?
>> 4. Or, are we jumping the gun branding the GTX680 as poor, and optimised
>> builds will surprise?
>> Many thanks
>> Bf-cycles mailing list
>> Bf-cycles at blender.org