[Soc-2016-dev] Weekly Report #12 - Cycles Denoising

Sergey Sharybin sergey.vfx at gmail.com
Mon Aug 15 09:24:56 CEST 2016


Hi,

When you mention using intrinsics to speed something up, it's always
interesting to know what kind of speedup this gives.

Also, while blue noise is an interesting experiment, it's not a part of
GSoC, so I'm not sure what it is doing in the report.

In Blender we use threaded EXR decoding, so that could be one reason why
OIIO performs poorly in comparison. AFAIR OIIO has some options for
threading, so make sure the I/O is threaded there.

On Sat, Aug 13, 2016 at 5:47 AM, Lukas Stockner <lukas.stockner at freenet.de>
wrote:

> Hi!
>
> This week, I started over the weekend by doing some minor changes and
> fixes:
>  - Now, Cycles Standalone has a command line option to set the tile size
> (ffea3f5a in the branch, ef27d8ec in master)
>  - Revert a "fix" that actually totally broke Glossy/Glass GGX (10cb9a19)
>  - Fix building after the debug_fpe commit (edcf60b4)
>  - Fix a wrong calculation of the feature matrix norm (d23f0003)
>  - Remove some useless files I added a while back by accident
>  - Moving denoise utility and prefiltering functions into separate files
> to clean up the main file (b7dc25cb)
>
> The next larger feature was standalone denoising of single frames: By
> rendering with denoising information enabled (no need to activate denoising
> itself) and saving the result as Multilayer EXR, that EXR can then be
> denoised by Cycles Standalone to produce a clean output file. In itself,
> that feature is mainly useful for development, since it allows rendering
> once and then testing just the filter. (Commit: 3f94371a)
>
> After that, I decided to go for the SIMD kernel optimization next. That
> resulted in:
>  - A fix for a pretty longstanding hidden issue in master, where SSE4.1
> function fallbacks were accidentally also used to override the native
> functions (cf017e81)
>  - A few SSE utility functions, like horizontal maximum and sum (741a2453)
>  - An SSE4.1-optimized version of the first kernel, which used to take up
> most of the time. The speedup in that function depends on a few factors,
> but the function is usually about 2-3 times faster (dba99c49).
>  - An SSE4.1-optimized version of the NLM prefiltering kernel, which
> reduces prefiltering time by a factor of about 3 (95fa4836)
> Together, these functions make denoising more than twice as fast on
> compatible processors (pretty much any processor since 2011).
>
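
To illustrate the kind of SSE utility mentioned above: a horizontal sum and
maximum over four packed floats can be sketched like this. This is only a
minimal sketch, not the code from 741a2453. The names horizontal_sum4 and
horizontal_max4 are made up for the example, and it sticks to plain SSE
intrinsics with a scalar fallback so that it also builds on machines
without SSE.

```cpp
#include <algorithm>

#if defined(__SSE__) || defined(_M_X64)
#  include <xmmintrin.h>
#  define EXAMPLE_HAVE_SSE 1
#endif

// Sum of the four floats starting at v (hypothetical helper name).
static float horizontal_sum4(const float *v)
{
#ifdef EXAMPLE_HAVE_SSE
    __m128 x = _mm_loadu_ps(v);                                     // [a b c d]
    __m128 swapped = _mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 3, 0, 1)); // [b a d c]
    __m128 pairs = _mm_add_ps(x, swapped);          // [a+b a+b c+d c+d]
    __m128 high = _mm_movehl_ps(pairs, pairs);      // upper half moved down
    return _mm_cvtss_f32(_mm_add_ss(pairs, high));  // (a+b) + (c+d)
#else
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}

// Maximum of the four floats starting at v (hypothetical helper name).
static float horizontal_max4(const float *v)
{
#ifdef EXAMPLE_HAVE_SSE
    __m128 x = _mm_loadu_ps(v);
    __m128 swapped = _mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 pairs = _mm_max_ps(x, swapped);          // pairwise maxima
    __m128 high = _mm_movehl_ps(pairs, pairs);
    return _mm_cvtss_f32(_mm_max_ss(pairs, high));
#else
    return std::max(std::max(v[0], v[1]), std::max(v[2], v[3]));
#endif
}
```

Both helpers reduce in two steps (pairs first, then the two halves), which
is the usual pattern when no dedicated horizontal instruction is used.
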
> Next, I created a clean implementation of the Blue-Noise dithering patch -
> now under review at D2149. While doing so, I also fixed a problem in master
> regarding CUDA texture limits (bbbc079a), fixed the KernelIntegrator
> structure padding that was wrong since the Light Portal commit (82e65abf)
> and added a CTest that checks for problems like that in the future
> (7c3a06c3).
>
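
A padding check like the one described above could be sketched roughly as
follows. This is not the test from 7c3a06c3. The struct names and fields
are hypothetical stand-ins, and the rule that the struct size must be a
multiple of 16 bytes is an assumption made for the example.

```cpp
#include <cstddef>

// Hypothetical stand-in for a kernel data struct, NOT the real
// KernelIntegrator. The trailing member pads the size to 16 bytes.
struct ExamplePaddedData {
    int use_direct_light;
    int max_bounce;
    float sample_clamp;
    int pad1;  // explicit padding
};

// Same fields without the padding member: 12 bytes on common platforms.
struct ExampleUnpaddedData {
    int use_direct_light;
    int max_bounce;
    float sample_clamp;
};

// Assumed rule for this sketch: struct size is a multiple of 16 bytes.
template<typename T> constexpr bool is_padded_to_16_bytes()
{
    return sizeof(T) % 16 == 0;
}

// Fails at compile time as soon as someone adds a member and forgets
// to adjust the padding.
static_assert(is_padded_to_16_bytes<ExamplePaddedData>(),
              "ExamplePaddedData must be padded to a multiple of 16 bytes");
```

A compile-time check catches the mistake earlier than a test run would,
while a runtime test can report all offending structs at once.
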
> One of the components of D2149, the simulated annealing tool used to
> precalculate the dither matrix, took a bit of time to optimize - but after
> a number of improvements, such as approximate math functions and yet
> another SSE4.1-optimized code path, it now runs 9 times as fast! A
> simulation pass with 3 billion iterations is running right now; I'll upload
> the result once it's done.
>
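
For readers unfamiliar with the approach, simulated annealing over a
dither matrix can be sketched as below. This is a toy version under stated
assumptions, not the tool from D2149: the energy function (a Gaussian in
toroidal distance times an exponential in value difference), the constants,
the linear cooling schedule and all names are picked for illustration only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

struct AnnealResult {
    std::vector<int> matrix;  // permutation of 0 .. size*size-1
    double initial_energy;
    double best_energy;
};

// Squared distance between cells i and j on a wrapping size x size grid.
static double torus_dist2(int i, int j, int size)
{
    int dx = std::abs(i % size - j % size);
    int dy = std::abs(i / size - j / size);
    dx = std::min(dx, size - dx);
    dy = std::min(dy, size - dy);
    return double(dx * dx + dy * dy);
}

// Assumed energy: high when nearby cells hold similar values.
static double energy(const std::vector<int> &m, int size)
{
    const int n = size * size;
    double e = 0.0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            const double dv = std::abs(m[i] - m[j]) / double(n);
            e += std::exp(-torus_dist2(i, j, size) / (2.1 * 2.1) - dv);
        }
    }
    return e;
}

AnnealResult anneal_dither_matrix(int size, int iterations, unsigned seed)
{
    const int n = size * size;
    std::mt19937 rng(seed);
    std::vector<int> m(n);
    std::iota(m.begin(), m.end(), 0);
    std::shuffle(m.begin(), m.end(), rng);

    std::uniform_int_distribution<int> pick(0, n - 1);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    double current = energy(m, size);
    AnnealResult result{m, current, current};
    for (int it = 0; it < iterations; it++) {
        // Linear cooling; stays positive because it < iterations.
        const double temperature = 0.1 * (1.0 - double(it) / iterations);
        const int a = pick(rng), b = pick(rng);
        std::swap(m[a], m[b]);
        // Full recomputation keeps the sketch short; a fast tool would
        // only update the terms touched by the swap.
        const double candidate = energy(m, size);
        const double delta = candidate - current;
        if (delta <= 0.0 || uniform(rng) < std::exp(-delta / temperature)) {
            current = candidate;  // accept the swap
            if (current < result.best_energy) {
                result.best_energy = current;
                result.matrix = m;
            }
        }
        else {
            std::swap(m[a], m[b]);  // reject: undo the swap
        }
    }
    return result;
}
```

Calling anneal_dither_matrix(8, 2000, 1234) returns the lowest-energy 8x8
matrix seen during the run. Since only swaps are applied, the result is
always a permutation of the original values.
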
> After that, I finished the standalone denoising by finally adding the
> inter-frame denoising mode (1c675f1c, e0208200, 2af90268) - now, the
> denoiser can use previous and later frames to avoid flickering and produce
> a better result in general!
> The filtering is a bit slow, though. One reason for that is that OIIO
> actually needs about 10 seconds to read five Multilayer EXRs from the disk,
> and I don't yet understand why (disk I/O isn't the bottleneck; I even tried
> a RAM disk). Also, it's just more pixels, but that could be improved by
> doing things like using a smaller half window for secondary frames.
>
>
>
> So, since next week is the final GSoC week, I'll spend most of my time on
> final documentation.
> The project in general is in working shape now: I covered the main parts
> of the proposal (except for possibly adaptive sampling), but the branch
> isn't close to being finished and polished yet.
>
> Of course, though, I'll continue to work on it after the GSoC ends - my
> goal is to get the denoiser into master, after all!
>
> Lukas


-- 
With best regards, Sergey Sharybin