[Bf-committers] SSE/AVX in cloth simulator

Fri Dec 20 12:57:43 CET 2019

> On 19. Dec 2019, at 13:55, Mariusz Pluciński <plucinski.mariusz at gmail.com> wrote:
> 
> 2. I tried explicitly enabling auto vectorization in GCC, but it didn't
> change much. Is that normal? If not, which flags should be used?

Auto vectorization is hit and miss - it can help in straightforward cases, but compilers can’t always automatically find the best code paths. 
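
As a rough illustration of "straightforward" versus not, here is a minimal sketch. Both functions are hypothetical and not taken from the cloth simulator. With GCC, -O3 turns on -ftree-vectorize, and -fopt-info-vec-missed reports the loops it gave up on, which is usually more informative than guessing at flags.

```c
#include <stddef.h>

/* Straightforward case: iterations are independent, access is unit-stride,
 * and `restrict` promises the compiler that the arrays do not overlap. */
void axpy(float *restrict y, const float *restrict x, float a, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    y[i] += a * x[i];
  }
}

/* Harder case: every iteration reads the result of the previous one
 * (a loop-carried dependency), so compilers typically leave it scalar. */
void prefix_sum(float *v, size_t n)
{
  for (size_t i = 1; i < n; i++) {
    v[i] += v[i - 1];
  }
}
```

Whether a given loop in the simulator behaves like the first or the second function is something only the compiler report (or the generated assembly) can confirm.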

> 3. If there's no other way, I may be ready to try to rewrite critical parts
> of the simulator for SSE/AVX. In such case, could you give me a guideline
> on how to do it correctly (with ultimate merge into master in mind)?

The important parts IMHO are:
* leave the scalar part intact for non-x86 architectures such as ARM
* profile first, optimize second. Make sure that you understand what the bottleneck is before jumping to conclusions. SIMD helps when compute is the bottleneck, but it won’t do much if branch mispredictions or cache misses are the performance limiters.
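
The first point can be sketched roughly as follows. This is an assumption-laden example, not Blender code: the function names add_scaled and add_scaled_scalar are made up, and the __SSE__ check relies on the macro that GCC and Clang predefine when SSE is enabled (other compilers may need a different test). The scalar routine stays as the reference and also handles the leftover elements of the SIMD loop.

```c
#include <stddef.h>

#if defined(__SSE__) /* predefined by GCC/Clang when SSE is available */
#  include <xmmintrin.h>
#endif

/* Scalar reference path: always compiled, and the only path on
 * non-x86 targets such as ARM. */
static void add_scaled_scalar(float *dst, const float *src, float s, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    dst[i] += s * src[i];
  }
}

/* Hypothetical entry point: dst[i] += s * src[i] for i in [0, n). */
void add_scaled(float *dst, const float *src, float s, size_t n)
{
  size_t i = 0;
#if defined(__SSE__)
  const __m128 vs = _mm_set1_ps(s);
  /* Process four floats per iteration; unaligned loads and stores are
   * used so the caller does not have to guarantee 16-byte alignment. */
  for (; i + 4 <= n; i += 4) {
    const __m128 d = _mm_loadu_ps(dst + i);
    const __m128 v = _mm_loadu_ps(src + i);
    _mm_storeu_ps(dst + i, _mm_add_ps(d, _mm_mul_ps(vs, v)));
  }
#endif
  /* Remaining elements (all of them when SSE is not available). */
  add_scaled_scalar(dst + i, src + i, s, n - i);
}
```

Keeping the scalar function around also gives you something to compare the SIMD results and timings against.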

-Stefan