[Bf-committers] SSE/AVX in cloth simulator
stewreo at gmail.com
Fri Dec 20 12:57:43 CET 2019
> On 19. Dec 2019, at 13:55, Mariusz Pluciński <plucinski.mariusz at gmail.com> wrote:
> 2. I tried explicitly enabling auto vectorization in GCC, but it didn't
> change much. Is that normal? If not, which flags should be used?
Auto vectorization is hit and miss - it can help in straightforward cases, but compilers can’t always automatically find the best code paths.
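As a rough illustration of a "straightforward case", here is a minimal sketch of a loop that GCC can usually vectorize on its own. The function name `saxpy` and the kernel are hypothetical and not taken from the cloth simulator. The flags in the comment are standard GCC options, and their report is the easiest way to see which loops were vectorized and which were skipped.

```c
#include <stddef.h>

/* Hypothetical kernel for illustration only: y[i] += a * x[i].
 *
 * Build with, for example:
 *   gcc -O3 -fopt-info-vec -fopt-info-vec-missed -c saxpy.c
 * -O3 turns on -ftree-vectorize; -fopt-info-vec lists the loops that were
 * vectorized, and -fopt-info-vec-missed lists the ones that were not.
 *
 * `restrict` promises the compiler that x and y do not overlap, which
 * removes one common obstacle to vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Loops with data-dependent branches, indirect indexing, or pointers that may alias are the typical cases where the vectorizer gives up, and the "missed" report should show those.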
> 3. If there's no other way, I may be ready to try to rewrite critical parts
> of the simulator for SSE/AVX. In such case, could you give me a guideline
> on how to do it correctly (with ultimate merge into master in mind)?
The important parts IMHO are:
* leave the scalar part intact for non-x86 architectures such as ARM
* profile first, optimize second. Make sure that you understand what the bottleneck is before jumping to conclusions. SIMD helps when compute is the bottleneck, but it won’t do much if branch mispredictions or cache misses are the performance limiters.
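To make the first point concrete, here is a minimal sketch of one way to add an SSE path while keeping the scalar code intact. All function names are hypothetical and this is not code from the simulator. It assumes GCC or Clang, which define `__SSE__` when SSE is enabled; other compilers may need a different check.

```c
#include <stddef.h>

#if defined(__SSE__) /* defined by GCC/Clang when SSE is available */
#  include <xmmintrin.h>
#endif

/* Scalar reference path: left intact and compiled on every architecture. */
static void add_scaled_scalar(float *dst, const float *a, const float *b,
                              float s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + s * b[i];
    }
}

/* dst[i] = a[i] + s * b[i]; uses SSE for blocks of 4 floats when available
 * and falls back to the scalar path for the tail, or for everything on
 * non-x86 targets such as ARM. */
void add_scaled(float *dst, const float *a, const float *b, float s, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 vs = _mm_set1_ps(s);
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads/stores, so callers need no alignment guarantees. */
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, _mm_mul_ps(vs, vb)));
    }
#endif
    add_scaled_scalar(dst + i, a + i, b + i, s, n - i);
}
```

Keeping the scalar function as the reference also gives you something to compare the SIMD results against in a test, and timing both variants is a quick check on whether compute really was the bottleneck.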