[Bf-committers] CPUDevice and implementation selection (native/sse2/sse3)

Mon Feb 11 23:58:42 CET 2013

Two questions..

Currently in CPUDevice/CPUDeviceTask, wherever an optimization can be used,
the code first calls the system_cpu_support_sse2() check, and only if that
fails does it try system_cpu_support_sse3(). I guess this is really a
two-part question in itself: 1) Is it wrong to assume SSE3 would be
better than SSE2? If SSE3 is better, shouldn't its check come first,
followed by the less ideal SSE2 (followed by the even less ideal basic
implementation)? 2) If a CPU supports SSE3, would it always also support
SSE2, meaning the SSE3 implementation is effectively never used as-is
(since SSE2 always gets picked instead)?
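To illustrate the ordering issue, here is a minimal sketch comparing the current SSE2-first order with a best-first order. The stubbed system_cpu_support_*() bodies, the KernelImpl enum and both select_* functions are hypothetical stand-ins, and the stubs assume (as question 2 suggests) that an SSE3-capable CPU also reports SSE2.

```cpp
// Hypothetical stand-ins for the real runtime checks; hard-wired here to
// simulate a CPU that reports both SSE2 and SSE3.
bool system_cpu_support_sse2() { return true; }
bool system_cpu_support_sse3() { return true; }

enum class KernelImpl { Basic, SSE2, SSE3 };

// Current order: SSE2 is tested first, so on a CPU that reports both,
// the SSE3 branch can never be reached.
KernelImpl select_kernel_sse2_first()
{
    if (system_cpu_support_sse2())
        return KernelImpl::SSE2;
    else if (system_cpu_support_sse3())
        return KernelImpl::SSE3;
    return KernelImpl::Basic;
}

// Best-first order: most capable instruction set first, then fall back
// to SSE2, then to the basic implementation.
KernelImpl select_kernel_best_first()
{
    if (system_cpu_support_sse3())
        return KernelImpl::SSE3;
    else if (system_cpu_support_sse2())
        return KernelImpl::SSE2;
    return KernelImpl::Basic;
}
```

With both checks returning true, the first function picks SSE2 and the second picks SSE3. Whether the SSE3 kernel is actually faster still depends on the kernel itself; the reordering only makes that path reachable.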

The other thing: since CPUDeviceTask is already OO-based, rather
than doing checks each time an optimizable method is called to
determine which implementation to use, wouldn't it be cleaner to make
CPUDeviceTask semi-abstract and create three subclasses (e.g.
BasicCPUDeviceTask, SSE2CPUDeviceTask, SSE3CPUDeviceTask), each with its
own implementation, and just have task_add() [or something] decide which
one to create? Depending on how often these methods are called, it may or
may not save much time (by not doing those checks on every call), but it
would seem more maintainable than having several related #ifdef's and
system_cpu_support_*() calls scattered about. It might also eventually
help allow other implementations to be dropped in without needing
large chunks of the core CPUDeviceTask modified (i.e. if pluggable
support for devices is ever reached/to be reached).
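A rough sketch of the subclass idea, assuming the CPU check is done once in a factory (standing in for task_add()). The path_trace_sample() and run() methods, name(), create_cpu_device_task() and the stubbed checks are all hypothetical; the real CPUDeviceTask interface will differ.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for the real runtime checks (simulating an SSE3 CPU).
bool system_cpu_support_sse2() { return true; }
bool system_cpu_support_sse3() { return true; }

// Semi-abstract base: shared task logic lives here, while the optimizable
// kernel call is a pure virtual supplied by each subclass.
class CPUDeviceTask {
public:
    virtual ~CPUDeviceTask() = default;

    // Shared, implementation-independent loop; no CPU checks happen in here.
    int run(int num_samples) const
    {
        int done = 0;
        for (int sample = 0; sample < num_samples; sample++)
            done += path_trace_sample(sample);
        return done;
    }

    virtual std::string name() const = 0;

protected:
    // Hypothetical optimizable method; returns 1 per processed sample.
    virtual int path_trace_sample(int sample) const = 0;
};

class BasicCPUDeviceTask : public CPUDeviceTask {
public:
    std::string name() const override { return "basic"; }
protected:
    int path_trace_sample(int /*sample*/) const override { return 1; /* plain kernel */ }
};

class SSE2CPUDeviceTask : public CPUDeviceTask {
public:
    std::string name() const override { return "sse2"; }
protected:
    int path_trace_sample(int /*sample*/) const override { return 1; /* SSE2 kernel */ }
};

class SSE3CPUDeviceTask : public CPUDeviceTask {
public:
    std::string name() const override { return "sse3"; }
protected:
    int path_trace_sample(int /*sample*/) const override { return 1; /* SSE3 kernel */ }
};

// The CPU is checked once, when the task is created, instead of on every call.
std::unique_ptr<CPUDeviceTask> create_cpu_device_task()
{
    if (system_cpu_support_sse3())
        return std::make_unique<SSE3CPUDeviceTask>();
    if (system_cpu_support_sse2())
        return std::make_unique<SSE2CPUDeviceTask>();
    return std::make_unique<BasicCPUDeviceTask>();
}
```

Note that this trades a per-call branch for a per-call virtual dispatch, so any speedup is likely small; the main gain would be keeping each implementation in one place.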

Also, for builds that target (and thus require) SSE-capable CPUs, how hard would it
be to compile the non-optimized calls (and functions) out to reduce the
final executable size, since that code will never be called in these
cases? For example, the final 'else' branch of the 'if/else if' chain could be
removed and replaced with an #else (on WITH_OPTIMIZED_KERNEL) around the
non-optimized parts. Ok.. this makes it 3.5 questions total! =)
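Something along these lines, as a sketch only. It assumes an optimized build is only ever run on CPUs with at least SSE2; otherwise removing the basic path would leave no fallback at all. The kernel_* functions, run_kernel() and the stubbed check are hypothetical.

```cpp
#include <string>

#ifdef WITH_OPTIMIZED_KERNEL
// Optimized build: SSE2 is assumed to be the minimum, so the basic kernel
// is not compiled in at all.
bool system_cpu_support_sse3() { return true; }  // hypothetical stub
std::string kernel_sse2() { return "sse2"; }
std::string kernel_sse3() { return "sse3"; }
#else
// Non-optimized build: only the basic kernel exists.
std::string kernel_basic() { return "basic"; }
#endif

std::string run_kernel()
{
#ifdef WITH_OPTIMIZED_KERNEL
    if (system_cpu_support_sse3())
        return kernel_sse3();
    return kernel_sse2();  // replaces the old final 'else' with the assumed baseline
#else
    return kernel_basic();
#endif
}
```

Built without WITH_OPTIMIZED_KERNEL, only the basic kernel is compiled; built with it, the basic kernel and its call site disappear from the binary.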

-Chad