[Soc-2014-dev] Weekly Report #1 Cycles
Thomas Dinges
blender at dingto.org
Fri May 23 13:14:20 CEST 2014
Hey everyone,
here is my report for the first week of work on Cycles optimizations:
http://wiki.blender.org/index.php/User:DingTo/GSoC_2014/Weekly_Reports/Week1
Best regards,
Thomas
== Pre work ==
I started my GSoC work early, so I have already made progress on some of
my goals.
* Calculate face normal on the fly: Instead of storing the face normal,
we now calculate it during rendering. See commit (6d62837e5bb2). The
performance loss is only ~1-2%, and it saves a fair amount of memory. I
still hope to speed this up, but first I need to find the right place
inside the BVH traversal to calculate it and then store it somewhere
(Intersection struct?).
* AVX2 kernel: I added an AVX2 kernel for Intel Haswell CPUs (it can also
be used on AMD CPUs as soon as they support AVX2). The AVX2 kernel makes
rendering about 3-5% faster in several scenes. I tested this with clang
on Mac OS with files from our test suite. The AVX2 kernel relies on
AVX2, FMA3, BMI and BMI2 instruction sets, and we use some dedicated
FMA3 intrinsics already in the kernel. More improvements here can
probably be made, but I think it's already a solid basis. See commits
(ac908f6c1f6d, 3844b8f85c7d and caaf0e484da8)
* I also looked into Multi Lamp Sampling for Volumes, and submitted a
first patch. This needs additional work for Equi-angular sampling
though. https://developer.blender.org/D526
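For the face normal item above, the on-the-fly calculation comes down to a cross product of two triangle edges. Here is a minimal sketch, assuming counter-clockwise vertex order. Vec3 and triangle_face_normal are hypothetical names for illustration, not the code from the commit:

```cpp
#include <cmath>

// Hypothetical minimal vector type for this sketch (not the Cycles float3).
struct Vec3 {
    float x, y, z;
};

inline Vec3 sub(const Vec3 &a, const Vec3 &b) {
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}

inline Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

// Normalized geometric normal of the triangle (v0, v1, v2).
// Returns the zero vector for a degenerate (zero-area) triangle.
inline Vec3 triangle_face_normal(const Vec3 &v0, const Vec3 &v1, const Vec3 &v2) {
    const Vec3 n = cross(sub(v1, v0), sub(v2, v0));
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len == 0.0f) {
        return Vec3{0.0f, 0.0f, 0.0f};
    }
    const float inv_len = 1.0f / len;
    return Vec3{n.x * inv_len, n.y * inv_len, n.z * inv_len};
}
```

The trade-off is the one described above: a few extra arithmetic operations per lookup instead of one stored normal per triangle.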
== What I did this week ==
This week I spent most of my time on research and tests, but I also
looked into the fast inverse sqrt instructions.
* Read some documentation on SIMD intrinsics and C++ code optimization,
thanks to Marcos Sánchez-Dehes for pointing me to these!
http://www.agner.org/optimize/
* I looked into high-performance timers for benchmarking purposes, but I
don't have a working implementation yet. It looks like each OS might
need its own implementation, e.g. QueryPerformanceCounter on Windows.
Maybe there is a better solution here; some feedback on this would be
appreciated! I should probably also look into profilers. I am mainly
interested in benchmarking specific code parts or a single function, to
see whether a change improves performance or not.
* I started to look into fast inverse sqrt instructions. Here is a
simple patch: http://pasteall.org/51827/diff I still need to do more
performance tests with it, and the render result is slightly different
with the patch. Maybe the result needs to be refined with one or more
Newton-Raphson steps? Also, it looks like we only use 1/sqrt() in the
Microfacet and Ward closure code, which are not really bottlenecks afaik.
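To illustrate the Newton-Raphson idea from the last item: the SSE rsqrt instruction only gives roughly 12 bits of precision, and one refinement step should roughly double that. This is a sketch under assumptions, not the linked patch. The names rsqrt_approx, rsqrt_refined and HAVE_SSE_RSQRT are hypothetical, and builds without SSE fall back to a plain 1/sqrt():

```cpp
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  define HAVE_SSE_RSQRT 1  // hypothetical macro, only used inside this sketch
#endif

// Approximate 1/sqrt(x) for x > 0.
// With SSE this maps to the RSQRTSS instruction (roughly 12 bits of precision).
inline float rsqrt_approx(float x) {
#ifdef HAVE_SSE_RSQRT
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return 1.0f / std::sqrt(x);  // exact fallback when SSE is not available
#endif
}

// One Newton-Raphson step on f(y) = 1/(y*y) - x, which gives
//   y_next = y * (1.5 - 0.5 * x * y * y)
inline float rsqrt_refined(float x) {
    const float y = rsqrt_approx(x);
    return y * (1.5f - 0.5f * x * y * y);
}
```

The refinement costs a few extra multiplications, so whether it is still faster than a full 1/sqrt() would have to be measured.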
== Next week ==
Continue to look into the Face Normal calculation code and start with
uchar attribute support, for things like Vertex colors (to reduce memory
usage).
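As a rough illustration of the memory saving, here is a sketch of packing a color into 8 bits per channel. It assumes a four-float RGBA layout purely for comparison. Color4f, Color4b and the helper functions are hypothetical names, not existing Cycles types:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical layouts for this sketch only.
struct Color4f { float r, g, b, a; };          // 16 bytes per color
struct Color4b { std::uint8_t r, g, b, a; };   //  4 bytes per color

// Quantize a finite value to 8 bits, clamping to [0, 1] and rounding to nearest.
inline std::uint8_t float_to_byte(float v) {
    const float clamped = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(clamped * 255.0f + 0.5f);
}

// Expand an 8 bit value back to a float in [0, 1].
inline float byte_to_float(std::uint8_t v) {
    return static_cast<float>(v) / 255.0f;
}

inline Color4b pack_color(const Color4f &c) {
    return Color4b{float_to_byte(c.r), float_to_byte(c.g),
                   float_to_byte(c.b), float_to_byte(c.a)};
}

inline Color4f unpack_color(const Color4b &c) {
    return Color4f{byte_to_float(c.r), byte_to_float(c.g),
                   byte_to_float(c.b), byte_to_float(c.a)};
}
```

The price is precision: each channel is limited to 256 levels and to the [0, 1] range, which should be fine for vertex colors but not for arbitrary float attributes.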
== Questions ==
See above. Mainly, some input about profiling and benchmarking would be cool. :)
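One candidate for the timer question, assuming a C++11 compiler is an option, is std::chrono::steady_clock, which would avoid a separate implementation per OS. A minimal sketch follows. average_time_ms is a hypothetical helper name, and the actual resolution depends on the standard library implementation:

```cpp
#include <chrono>

// Runs fn() `repeats` times (repeats > 0) and returns the average
// wall-clock time per call in milliseconds.
template <typename Fn>
double average_time_ms(Fn &&fn, int repeats) {
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        fn();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / repeats;
}
```

Calling it once with the old and once with the new version of a function, with enough repeats to get above the clock resolution, would give a rough before/after comparison. A real profiler is still the better tool for finding where the time goes.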
Thanks!