[Bf-cycles] RFC: Deprecate and remove ssef/ssei/sseb ?

Mohamed Sakr 3dsakr at gmail.com
Thu Mar 23 22:45:29 CET 2017

Hi Sergey,

For me, anything that gives a performance gain is a plus.
From experience, every time I optimize a class to use SSE/vectorization, the
gain is very small (sometimes it even loses performance!), typically less
than 4%.
In general, the main factor that affects performance is how memory is
accessed: most of the time the bottleneck is in the memory operations, not
in the floating-point operation count.
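
A minimal sketch of what I mean by memory access, not taken from Cycles: the
two hypothetical functions below (sum_contiguous and sum_strided are names
made up for this example) do exactly the same number of floating-point
additions on a row-major matrix, and only the order in which memory is
touched differs. On large matrices the strided walk is usually the slower
one, but how much depends on the hardware, so I give no numbers here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored row-major, visiting elements in the
// order they are laid out in memory (stride of 1 element).
double sum_contiguous(const std::vector<float> &m, std::size_t rows, std::size_t cols)
{
  assert(m.size() == rows * cols);
  double total = 0.0;
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      total += m[r * cols + c];
    }
  }
  return total;
}

// Same additions, but the loops are swapped, so consecutive reads are
// `cols` elements apart (strided access over the same data).
double sum_strided(const std::vector<float> &m, std::size_t rows, std::size_t cols)
{
  assert(m.size() == rows * cols);
  double total = 0.0;
  for (std::size_t c = 0; c < cols; ++c) {
    for (std::size_t r = 0; r < rows; ++r) {
      total += m[r * cols + c];
    }
  }
  return total;
}
```

Timing both functions on a matrix much larger than the cache is a quick way
to see how much of the cost is memory rather than arithmetic.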

Conclusion: do whatever you think is right. If it gives even a 1%
performance gain, then it is good to go :)

Mohamed Sakr

On Thu, Mar 23, 2017 at 9:11 PM, Sergey Sharybin <sergey.vfx at gmail.com>
wrote:

> Hey everyone,
> This topic is inspired by the annoyance of having both float{3,4} and ssef
> data types in Cycles. For a long time there was a good reason for that: we
> did not have any vectorization on float3/4 operations because that was
> causing a rendering slowdown. But since Blender 2.78b we've had global SSE
> optimization enabled on AVX and AVX2 kernels, and to my knowledge we can
> enable it for SSE4.1 kernels. This creates redundancy and leads to the
> following issues:
> - There are now two almost matching code bases: one is the vectorization of
> float3/4 and the other one is ssef.
> - Such duplication increases the risk of the two code bases diverging from
> each other: we can fix a bug in one of the code paths but not in the other.
> - It is not really clear now whether someone needs to prefer ssef over
> float4 for their optimized code.
> - This often causes avoidable duplicated code paths which are ifdef-ed in
> the kernel (one using float4 and the other one using ssef).
> Similar notes apply to ssei and sseb as well.
> I think it makes sense to prefer float3/4 (and their integer and
> boolean analogs) nowadays and retire the sse{f,i,b} implementations. This will
> definitely avoid confusion about which data type to use for new code and
> avoid having vectorization code implemented twice. There are not so many
> places where these types are used in the kernel, and in most cases it's quite
> trivial to replace them with float4 directly.
> However, there are the following downsides:
> - We'll need to support some vectorized operations on float4, for
> example, len_squared<>(ssef).
> This is quite a trivial job, it just needs to be done with care. Not so much
> of an issue.
> - In most cases ssef is passed as a constant reference.
> This is a bit more tricky. From experiments, passing float4 to a
> force-inlined function does not always prevent the copy constructor from
> being called. This is causing issues in the Pluecker intersection code.
> The simplest solution here would be to still keep the code path ifdef-ed and
> keep constant references in there. This wouldn't allow us to merge the GPU
> and CPU code paths easily, but it would free us from the redundant classes
> without performance loss.
> We'll need to bring up the introduction of constant references anyway. The
> only blocker here is OpenCL, which does not have them. A crazy approach could
> be to have a ccl_ref macro, so we can write foo(const float4 ccl_ref bar)
> (similar to ccl_restrict). This would allow us to merge some code paths
> between CPU and GPU and avoid unwanted copy-constructor overhead on CPU.
> Perhaps we can save this constant reference topic for later and solve
> the issues one by one.
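
A rough sketch of how the ccl_ref idea together with a float4 len_squared
could look, assuming C++ on the CPU side. Only the name ccl_ref and the call
shape foo(const float4 ccl_ref bar) come from the mail above. The float4
struct, the SKETCH_USE_SSE switch, and the body of len_squared are made up
for this example and are not the actual Cycles definitions. The OpenCL
branch of the macro only illustrates the intent (pass by value where
references do not exist); the rest of the snippet is CPU-only.

```cpp
#if defined(__SSE2__) || defined(_M_X64)
#  include <emmintrin.h>
#  define SKETCH_USE_SSE 1  // hypothetical switch, only for this sketch
#endif

// Hypothetical macro: expands to a reference where the language has
// references (C++) and to nothing under OpenCL C, which has none.
#ifdef __OPENCL_VERSION__
#  define ccl_ref
#else
#  define ccl_ref &
#endif

// Stand-in for a vectorizable float4, NOT the real Cycles type.
// 16-byte alignment lets the SSE path use an aligned load.
struct alignas(16) float4 {
  float x, y, z, w;
};

// Squared length over all four components. On the CPU the argument is
// taken by constant reference through ccl_ref, so no copy is required.
inline float len_squared(const float4 ccl_ref v)
{
#ifdef SKETCH_USE_SSE
  const __m128 a = _mm_load_ps(&v.x);  // load (x, y, z, w)
  const __m128 sq = _mm_mul_ps(a, a);  // (xx, yy, zz, ww)
  // Horizontal add using SSE1/SSE2 shuffles only.
  __m128 shuf = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 3, 0, 1));  // (yy, xx, ww, zz)
  __m128 sums = _mm_add_ps(sq, shuf);  // (xx+yy, xx+yy, zz+ww, zz+ww)
  shuf = _mm_movehl_ps(shuf, sums);    // low lanes now hold zz+ww
  sums = _mm_add_ss(sums, shuf);       // lane 0 = xx+yy+zz+ww
  return _mm_cvtss_f32(sums);
#else
  return v.x * v.x + v.y * v.y + v.z * v.z + v.w * v.w;
#endif
}
```

The call site looks the same either way, e.g. len_squared(v) for a float4 v,
so a shared CPU/GPU code path would not need an ifdef at the point of use.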
> The mail is getting too long now, so let me ask this: what do you guys
> think? Does this ssef to float4 replacement make sense? Am I missing
> something, and do we still need to have the sse types?
> --
> With best regards, Sergey Sharybin