[Bf-committers] CPUDevice and implementation selection (native/sse2/sse3)

Tue Feb 12 00:17:03 CET 2013

Hi,

On Mon, Feb 11, 2013 at 11:58 PM, Chad Fraleigh <chadf at triularity.org> wrote:
> Two questions..
>
> Currently in CPUDevice/CPUDeviceTask where optimization can be used it
> first calls the system_cpu_support_sse2() check, and then if that is
> unavailable tries system_cpu_support_sse3(). I guess this is also a
> two-part question in itself: 1) Is it false to assume SSE3 would be
> better than SSE2? If SSE3 is better then shouldn't this check be first,
> followed by the less ideal SSE2 (followed by the even less ideal basic
> impl)?  2) If a CPU supports SSE3, would it always also support SSE2,
> so that, as-is, the SSE3 implementation is effectively never used (as
> SSE2 always gets used instead)?

The order should indeed be switched; I will commit a fix for that. And
yes, any CPU supporting SSE3 will also support SSE2 in practice.
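
Roughly, the corrected order looks like the sketch below. This is only
an illustration, not the actual Cycles code: CpuFeatures, KernelVariant
and select_kernel() are hypothetical names standing in for the
system_cpu_support_sse2()/system_cpu_support_sse3() checks and the
kernel calls in CPUDevice.

```cpp
// Hypothetical stand-in for the results of system_cpu_support_sse2()
// and system_cpu_support_sse3(); not the real Cycles API.
struct CpuFeatures {
    bool sse2;
    bool sse3;
};

// Hypothetical label for the kernel implementation that gets used.
enum class KernelVariant { Basic, SSE2, SSE3 };

// Test the most capable instruction set first, then fall back.
// With the SSE2 check first, the SSE3 branch would never be reached,
// because an SSE3-capable CPU also reports SSE2 support.
KernelVariant select_kernel(const CpuFeatures &cpu)
{
    if (cpu.sse3)
        return KernelVariant::SSE3;
    if (cpu.sse2)
        return KernelVariant::SSE2;
    return KernelVariant::Basic;
}
```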

> The other thing is since CPUDeviceTask is already OO-based, rather
> than doing checks each time an optimizable method is called to
> determine what implementation to use, wouldn't it be cleaner to make
> CPUDeviceTask semi-abstract and create three sub-classes (e.g.
> BasicCPUDeviceTask, SSE2CPUDeviceTask, SSE3CPUDeviceTask) with each
> custom impl and just have task_add() [or something] decide which to
> create? Depending on how often these methods are called it may or may
> not have much time saving (by not doing those checks each time), but
> would seem more maintainable than having several related #ifdef's and
> system_cpu_support_*()'s scattered about. It might also eventually
> help allow other implementations to be dropped in without needing
> large chunks of the core CPUDeviceTask modified (i.e. if pluggable
> support for devices is ever reached/to be reached).

Subclassing would indeed be possible. I think it's just a matter of
preference when you do that vs. just adding some if statements. At the
moment I don't think it would help clarify much, but if the code gets
bigger it might be a good change.
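
For reference, the subclassing variant could look something like the
sketch below. The class names are the ones suggested in the quoted
mail; CpuFeatures, kernel_name() and make_task() are hypothetical
placeholders (kernel_name() stands in for the optimizable methods), and
none of this is the current CPUDeviceTask code.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the system_cpu_support_*() results.
struct CpuFeatures {
    bool sse2;
    bool sse3;
};

// Semi-abstract base class. kernel_name() is a placeholder for the
// methods that would have a per-instruction-set implementation.
class CPUDeviceTask {
public:
    virtual ~CPUDeviceTask() {}
    virtual std::string kernel_name() const = 0;
};

class BasicCPUDeviceTask : public CPUDeviceTask {
public:
    std::string kernel_name() const override { return "basic"; }
};

class SSE2CPUDeviceTask : public CPUDeviceTask {
public:
    std::string kernel_name() const override { return "sse2"; }
};

class SSE3CPUDeviceTask : public CPUDeviceTask {
public:
    std::string kernel_name() const override { return "sse3"; }
};

// The CPU check happens once, when the task is created, instead of
// inside every optimizable method.
std::unique_ptr<CPUDeviceTask> make_task(const CpuFeatures &cpu)
{
    if (cpu.sse3)
        return std::unique_ptr<CPUDeviceTask>(new SSE3CPUDeviceTask());
    if (cpu.sse2)
        return std::unique_ptr<CPUDeviceTask>(new SSE2CPUDeviceTask());
    return std::unique_ptr<CPUDeviceTask>(new BasicCPUDeviceTask());
}
```

The trade-off is three small classes plus a factory versus a couple of
if statements per method, which is why it only starts to pay off once
there are more implementations or more optimizable methods.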

> Also, for CPUs that support (and thus require) SSE, how hard would it
> be to compile the non-optimized calls (and functions) out to reduce the
> final executable size, as that code will never be called in these
> cases? The final 'else' part of the 'if/else if' could be removed and
> an #else used instead (on WITH_OPTIMIZED_KERNEL) for the non-optimized
> parts. Ok.. this makes it 3.5 questions total! =)

I guess it's possible, but the plan is to add explicit SSE instructions
eventually, and then it's nice for testing to be able to quickly try
the non-SSE version even if the CPU does not need it. The plan is also
to take advantage of SSE4/AVX in the future, so we'll probably get a
few more options. Since this only concerns the performance-critical
kernel, I don't think binary size is that important.
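
To illustrate what I mean by keeping the non-SSE version reachable,
here is a minimal sketch. WITH_OPTIMIZED_KERNEL is the existing define
mentioned above; KernelPath, pick_kernel_path() and the force_basic
flag are hypothetical and only show the idea, not how Cycles does or
will do it.

```cpp
// Hypothetical label for the code path that ends up being used.
enum class KernelPath { Basic, SSE2, SSE3 };

// The basic path is always compiled in. The optimized paths are only
// compiled when WITH_OPTIMIZED_KERNEL is defined, and a (hypothetical)
// force_basic flag allows testing the non-SSE version on any CPU.
KernelPath pick_kernel_path(bool has_sse2, bool has_sse3, bool force_basic)
{
#ifdef WITH_OPTIMIZED_KERNEL
    if (!force_basic) {
        if (has_sse3)
            return KernelPath::SSE3;
        if (has_sse2)
            return KernelPath::SSE2;
    }
#else
    // Without optimized kernels the arguments are unused.
    (void)has_sse2;
    (void)has_sse3;
    (void)force_basic;
#endif
    return KernelPath::Basic;
}
```

Replacing the final else with an #else would remove exactly this
fallback from optimized builds.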

Brecht.