[Soc-2016-dev] Weekly Report #12 - Cycles Denoising

Lukas Stockner lukas.stockner at freenet.de
Sat Aug 13 05:47:21 CEST 2016


Hi!

I started this week over the weekend with some minor changes and fixes:
 - Cycles Standalone now has a command-line option to set the tile size (ffea3f5a in the branch, ef27d8ec in master)
 - Revert a "fix" that actually totally broke Glossy/Glass GGX (10cb9a19)
 - Fix building after the debug_fpe commit (edcf60b4)
 - Fix a wrong calculation of the feature matrix norm (d23f0003)
 - Remove some unneeded files that I had accidentally added a while back
 - Move the denoising utility and prefiltering functions into separate files to clean up the main file (b7dc25cb)

The next larger feature was standalone denoising of single frames: if you render with denoising information enabled (denoising itself doesn't need to be activated) and save the result as a Multilayer EXR, Cycles Standalone can then denoise that EXR to produce a clean output file. On its own, this feature is mainly useful for development, since it allows pre-rendering once and then testing just the filter. (Commit: 3f94371a)

After that, I moved on to the SIMD kernel optimization. That resulted in:
 - A fix for a long-standing hidden issue in master, where the fallback implementations of SSE4.1 functions were accidentally also used to override the native functions (cf017e81)
 - A few SSE utility functions, like horizontal maximum and sum (741a2453)
 - An SSE4.1-optimized version of the first kernel, which used to take up most of the time. The speedup in that function depends on a few factors, but it's usually about 2-3 times faster (dba99c49).
 - An SSE4.1-optimized version of the NLM prefiltering kernel, which reduces prefiltering time by a factor of about 3 (95fa4836)
Together, these functions make denoising more than twice as fast on compatible processors (pretty much any processor since 2011).
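
"Horizontal" operations reduce the four lanes of one SSE register to a single value instead of working lane by lane. The following is a minimal sketch of the usual shuffle-based pattern, not the code from 741a2453; the names hsum4 and hmax4 are made up for illustration. The SSE path sits behind a preprocessor check with a scalar fallback, so the fallback is only compiled where the native instructions are unavailable.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
static float max2(float a, float b) { return a > b ? a : b; }
#endif

/* Hypothetical helper: horizontal sum of four floats.
 * With SSE, two shuffle+add steps leave the total in every lane. */
float hsum4(const float v[4])
{
#ifdef HAVE_SSE
    __m128 a = _mm_loadu_ps(v);
    /* add swapped neighbours: (a0+a1, a1+a0, a2+a3, a3+a2) */
    __m128 t = _mm_add_ps(a, _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1)));
    /* add swapped halves: every lane now holds (a0+a1) + (a2+a3) */
    t = _mm_add_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(t);
#else
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}

/* Hypothetical helper: horizontal maximum, same shuffle pattern with max. */
float hmax4(const float v[4])
{
#ifdef HAVE_SSE
    __m128 a = _mm_loadu_ps(v);
    __m128 t = _mm_max_ps(a, _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1)));
    t = _mm_max_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(t);
#else
    return max2(max2(v[0], v[1]), max2(v[2], v[3]));
#endif
}
```

A reduction like this is usually done once, after a loop has accumulated partial results in the four lanes, since horizontal steps are slower than the lane-wise arithmetic they summarize.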

Next, I created a clean implementation of the Blue-Noise dithering patch, which is now under review as D2149. While doing so, I also fixed a problem in master regarding CUDA texture limits (bbbc079a), fixed the padding of the KernelIntegrator structure, which had been wrong since the Light Portal commit (82e65abf), and added a CTest to catch problems like that in the future (7c3a06c3).
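
A padding bug like this is easy to reintroduce whenever a field is added to one of the kernel structs. Below is a minimal sketch of a compile-time check, assuming that these structs are built from 4-byte fields and must keep their size a multiple of 16 bytes; the struct and field names are hypothetical, and the actual CTest in 7c3a06c3 may check this differently, for example at runtime.

```c
/* Hypothetical kernel-side settings struct (not the real KernelIntegrator).
 * Every field is 4 bytes; the pad fields round the size up to a multiple of 16. */
typedef struct KernelExampleSettings {
    int min_bounce;
    int max_bounce;
    float sample_clamp;
    int use_portals;   /* e.g. a field added by a later feature */
    int num_portals;
    int pad1, pad2, pad3;
} KernelExampleSettings;

/* Fails the build as soon as a new field breaks the 16-byte multiple. */
_Static_assert(sizeof(KernelExampleSettings) % 16 == 0,
               "KernelExampleSettings must be padded to a multiple of 16 bytes");
```

If a later change adds one more int without removing a pad field, the size becomes 36 bytes and compilation stops with the message above, instead of the mismatch only showing up as broken renders.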

One of the components of D2149, the simulated annealing tool used to precalculate the dither matrix, took some time to optimize - but after a number of improvements, such as approximate math functions and yet another SSE4.1-optimized code path, it now runs 9 times as fast! A simulation pass with 3 billion iterations is running right now; I'll upload the result once it's done.
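
To illustrate what such an annealing tool has to do, here is a minimal scalar sketch. It assumes an energy like the one from Georgiev and Fajardo's "Blue-noise Dithered Sampling": a sum over pixel pairs of exp(-|p-q|^2/sigma_i^2 - |v_p-v_q|^(d/2)/sigma_s^2), here with one value per pixel (d = 1), on a tiny 8x8 wrapping matrix of ranks. Random pixel swaps are accepted with the Metropolis rule under linear cooling. fast_exp_neg stands in for the "approximate math functions": it builds 2^i from the float exponent bits and approximates the fractional part with a polynomial. All names and parameters are hypothetical, and there is no SSE4.1 path, so this is not the D2149 tool.

```c
#include <stdint.h>
#include <string.h>

#define N 8                    /* hypothetical N x N matrix that wraps around */
#define NN (N * N)
#define SIGMA_I2 (2.1f * 2.1f) /* spatial sigma squared */
#define SIGMA_S2 1.0f          /* value sigma squared */

/* Approximate exp(x) for x <= 0: e^x = 2^y with y = x*log2(e) = i + f, where
 * 2^i is built from the float exponent bits and 2^f from a degree-5 polynomial. */
static float fast_exp_neg(float x)
{
    if (x < -87.0f) return 0.0f;  /* keeps 2^i inside the normal float range */
    if (x > 0.0f) x = 0.0f;
    float y = x * 1.44269504f;
    int i = (int)y;
    if ((float)i > y) i--;        /* floor() for negative y */
    float f = y - (float)i;       /* f in [0, 1) */
    float p = 1.0f + f * (0.69314718f + f * (0.24022651f + f * (0.05550411f +
              f * (0.00961813f + f * 0.00133336f))));
    uint32_t bits = (uint32_t)(i + 127) << 23;
    float scale;
    memcpy(&scale, &bits, sizeof scale);
    return p * scale;
}

/* sqrt(|dk| / NN) / sigma_s^2, tabulated once with Newton's method (no libm). */
static float value_term(int dk)
{
    static float table[NN];
    static int ready = 0;
    if (!ready) {
        for (int k = 0; k < NN; k++) {
            float x = (float)k / NN, r = 1.0f;  /* x < 1, so r starts above sqrt(x) */
            for (int it = 0; it < 30; it++) r = 0.5f * (r + x / r);
            table[k] = r / SIGMA_S2;
        }
        ready = 1;
    }
    return table[dk];
}

/* exp(-|p-q|^2 / sigma_i^2 - |v_p - v_q|^(1/2) / sigma_s^2), toroidal distance. */
static float pair_energy(const int *rank, int p, int q)
{
    int dx = p % N - q % N, dy = p / N - q / N, dk = rank[p] - rank[q];
    if (dx < 0) dx = -dx;
    if (dy < 0) dy = -dy;
    if (dk < 0) dk = -dk;
    if (dx > N / 2) dx = N - dx;
    if (dy > N / 2) dy = N - dy;
    return fast_exp_neg(-(float)(dx * dx + dy * dy) / SIGMA_I2 - value_term(dk));
}

/* Energy between pixel p and every pixel except p and skip. */
static double local_energy(const int *rank, int p, int skip)
{
    double e = 0.0;
    for (int r = 0; r < NN; r++)
        if (r != p && r != skip) e += pair_energy(rank, p, r);
    return e;
}

double total_energy(const int *rank)
{
    double e = 0.0;
    for (int p = 0; p < NN; p++)
        for (int q = p + 1; q < NN; q++) e += pair_energy(rank, p, q);
    return e;
}

static uint32_t xorshift32(uint32_t *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 17;
    *s ^= *s << 5;
    return *s;
}

/* Metropolis acceptance of random pixel swaps under linear cooling.
 * Returns the lowest energy seen and leaves that matrix in rank[]. */
double anneal(int *rank, long iters, float t0, uint32_t seed)
{
    int best[NN];
    double cur = total_energy(rank), best_e = cur;
    uint32_t s = seed ? seed : 1u;  /* xorshift state must be non-zero */
    memcpy(best, rank, sizeof best);
    for (long it = 0; it < iters; it++) {
        float t = t0 * (1.0f - (float)it / (float)iters);
        int p = (int)(xorshift32(&s) % NN), q = (int)(xorshift32(&s) % NN);
        if (p == q) continue;
        /* only pairs involving p or q change; the (p, q) pair itself does not */
        double before = local_energy(rank, p, q) + local_energy(rank, q, p);
        int tmp = rank[p]; rank[p] = rank[q]; rank[q] = tmp;
        double delta = local_energy(rank, p, q) + local_energy(rank, q, p) - before;
        float u = (float)(xorshift32(&s) >> 8) / 16777216.0f;  /* uniform [0, 1) */
        if (delta <= 0.0 || u < fast_exp_neg((float)(-delta / t))) {
            cur += delta;
            if (cur < best_e) { best_e = cur; memcpy(best, rank, sizeof best); }
        } else {
            rank[q] = rank[p]; rank[p] = tmp;  /* rejected: undo the swap */
        }
    }
    memcpy(rank, best, sizeof best);
    return best_e;
}
```

The incremental delta keeps each step cheap: a swap only touches the 2 x (NN - 2) pairs involving the two swapped pixels (124 here) instead of all NN x (NN - 1) / 2 pairs (2016 here).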

After that, I finished the standalone denoising by finally adding the inter-frame denoising mode (1c675f1c, e0208200, 2af90268) - now, the denoiser can use previous and later frames to avoid flickering and produce a better result in general!
The filtering is a bit slow, though. One reason is that OIIO needs about 10 seconds to read five Multilayer EXRs from disk, and I don't understand why yet (disk I/O isn't the bottleneck; I even tried a RAM disk). The other is that there are simply more pixels to process, but that could be improved, for example by using a smaller half window for the secondary frames.
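
To put "more pixels" into numbers: if a filter compares each output pixel against every pixel in a (2r+1) x (2r+1) window of each frame, five frames mean five times as many candidates. The hypothetical helper below counts candidates per output pixel when the secondary frames use their own half window. It ignores the per-candidate patch cost, which stays the same, and uses r = 8 only as an example value.

```c
/* Hypothetical helper: candidate pixels per output pixel when the center frame
 * uses half window r and each of `secondary` neighbouring frames uses r_sec. */
long candidate_pixels(int r, int secondary, int r_sec)
{
    long side = 2L * r + 1, side_sec = 2L * r_sec + 1;
    return side * side + (long)secondary * side_sec * side_sec;
}
```

With r = 8, a single frame gives 17 x 17 = 289 candidates, five frames with the same window give 1445, and halving the window on the four secondary frames (r_sec = 4) brings that down to 289 + 4 x 81 = 613, roughly 42% of the full five-frame cost.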



So, since next week is the final GSoC week, I'll spend most of my time on the final documentation.
The project in general is in working shape now: I covered the main parts of the proposal (except possibly adaptive sampling), but the branch is not yet close to being finished and polished.

Of course, though, I'll continue to work on it after the GSoC ends - my goal is to get the denoiser into master, after all!

Lukas
