[Bf-committers] Seemingly hugepage-related performance issues

Brecht Van Lommel brechtvanlommel at pandora.be
Thu Nov 22 19:04:26 CET 2012


Hi,

If it starts swapping when transparent huge page support is disabled,
then to me this seems to be an issue that is not specifically related
to huge pages. The memory has become fragmented, which means we're in
trouble either way: the kernel can either compact memory or start
swapping, and both are slow.

We could try to hold onto memory longer, but I doubt it would help,
and it is a bit fishy anyway. It might be nice for this particular
case but not for others, and it's hard to tell when it's a good idea
to do it.

The question then becomes how to avoid such fragmentation. On Windows
we also have issues with memory fragmentation (especially on 32-bit),
and on Linux we now build with jemalloc, which reduces fragmentation.
The real solution might be to look at each module in Blender and see
how we can avoid big allocations there; in this case that would
probably mean tiled grids.
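
A minimal sketch of what a tiled grid could look like, written in Python for brevity (Blender's actual simulation grids are C code). It assumes a 3D float grid split into fixed-size cubic tiles that are allocated lazily on first write; TILE_SIZE, TiledGrid and its methods are hypothetical names for illustration, not existing Blender API. Each tile is a small, separate allocation, so no single request ever needs one huge contiguous block.

```python
from array import array

TILE_SIZE = 16  # hypothetical tile edge length, in voxels


class TiledGrid:
    """3D float grid stored as independently allocated cubic tiles.

    Hypothetical illustration: each tile holds TILE_SIZE**3 floats in its
    own small allocation, so the grid never needs one huge contiguous block.
    """

    def __init__(self, res_x, res_y, res_z, background=0.0):
        self.res = (res_x, res_y, res_z)
        self.background = background
        self.tiles = {}  # (tx, ty, tz) -> array('f') of TILE_SIZE**3 values

    def _locate(self, x, y, z):
        # Split a voxel coordinate into a tile key and an offset inside the tile.
        if not all(0 <= c < r for c, r in zip((x, y, z), self.res)):
            raise IndexError("voxel out of range")
        key = (x // TILE_SIZE, y // TILE_SIZE, z // TILE_SIZE)
        lx, ly, lz = x % TILE_SIZE, y % TILE_SIZE, z % TILE_SIZE
        offset = (lz * TILE_SIZE + ly) * TILE_SIZE + lx
        return key, offset

    def get(self, x, y, z):
        key, offset = self._locate(x, y, z)
        tile = self.tiles.get(key)
        # Tiles that were never written are not allocated; they read as background.
        return self.background if tile is None else tile[offset]

    def set(self, x, y, z, value):
        key, offset = self._locate(x, y, z)
        tile = self.tiles.get(key)
        if tile is None:
            # Allocate just this one tile instead of the whole grid.
            tile = array("f", [self.background]) * (TILE_SIZE ** 3)
            self.tiles[key] = tile
        tile[offset] = value
```

For example, writing one voxel of a 256^3 grid allocates a single 16^3 tile rather than the full domain. Tiles can also be freed or reused one at a time between frames; whether that actually avoids the fragmentation described here would have to be measured.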

I think this is one of many things in Blender that could be optimized;
there are various other opportunities to reduce memory usage, and
avoiding memory fragmentation is one of them. It would be great to
have this solved, but I fear this belongs on the "would be nice to
optimize" list. Maybe one of the fluid sim developers is interested
in implementing a tiled grid to see if that solves it?

Brecht.

On Thu, Nov 22, 2012 at 1:46 PM, Jonas Wielicki
<j.wielicki at sotecware.net> wrote:
> Hi all,
>
> First off, I ran long tests (i.e. baking) with Blender 2.59 when I
> first experienced this issue, and I did a short check to verify it's
> still present in Blender 2.64a. For full system specs see [3].
> [Side note: I'm still using Blender 2.59 because that's what my Linux
> distribution (Fedora 16) ships.]
> I was pointed to this mailing list after asking in the developer IRC
> channel where to discuss this problem.
>
> Description of symptoms
> -----------------------
>
> I've been baking a simulation on a rather decent PC for a few hours
> now. As long as memory use stays in the middle range of the available
> physical memory, everything is fine. However, things start to go wrong
> when I get into the upper range, e.g. Blender using 80% or more of
> physical memory (which should be fine, as the rest isn't used much).
> In that case, other memory-hungry applications often stall without
> any swapping involved.
>
> I've observed Blender with htop (it's like top, just more awesome) and
> did some research on the kernel thread involved, khugepaged. When the
> stalls happen, Blender, the other stalling application and khugepaged
> are using most of the CPU (Blender and the stalling application each
> use 90-99% of a core, and khugepaged totals around 8%). Now, using CPU
> isn't unusual for Blender, but it is spending all of that time (100%)
> in kernel space instead of user space, which is obviously not
> desirable.
>
> khugepaged is related to a Linux kernel feature called Transparent
> Hugepage Support, about which more information is available here [1].
> It seems to boil down to trying to keep the memory of applications
> that use a lot of it as contiguous as possible.
>
> Apparently, this involves some memory compaction and moving around of
> pages, which I am able to observe using
>
>     watch "grep '^compact_' /proc/vmstat"
>
> In particular, compact_fail and compact_pages_moved increase heavily
> (relative to their absolute values); the counters are explained in [1].
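
The same check as the watch command above can be scripted to report how much each counter grows over an interval. A minimal Python sketch, assuming the usual "name value" line format of /proc/vmstat; the exact compact_* counter names vary between kernel versions, so the code simply collects whichever ones are present. read_compaction_counters and compaction_deltas are hypothetical helper names.

```python
def read_compaction_counters(text=None):
    """Return {name: value} for every compact_* counter in /proc/vmstat.

    If `text` is given it is parsed instead of reading /proc/vmstat,
    so the parser can also be used on saved snapshots.
    """
    if text is None:
        with open("/proc/vmstat") as f:
            text = f.read()
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Each line is "<name> <value>"; keep only the compaction counters.
        if len(parts) == 2 and parts[0].startswith("compact_"):
            counters[parts[0]] = int(parts[1])
    return counters


def compaction_deltas(before, after):
    """Increase of each counter between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Usage: take one snapshot with read_compaction_counters(), take another a few seconds later while Blender starts a new frame, and print compaction_deltas(first, second); a fast-growing compact_fail would match the symptom described here.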
>
> Suggested diagnosis
> -------------------
>
> In theory, compaction should be fine, and after a few minutes
> everything should even out: the application doing heavy calculations
> on lots of memory gets its contiguous pages and can crunch the numbers
> happily.
>
> However, things start to go wrong if the application alternately
> releases and allocates large blocks of memory (possibly only on a
> desktop system that is moderately used in the meantime; the first
> question is whether that case is actually of interest to the Blender
> project), especially if the time between allocation and deallocation
> is much shorter than the time compaction needs to converge (which may
> be the case with a complex smoke simulation in Blender 2.5). See
> message [2] for some indication that this might be relevant.
>
> Indicators that the diagnosis may be correct
> --------------------------------------------
>
> Now, Blender does exactly that. For the 256-division smoke sim with
> 2-subdivision high-resolution noise (and some 32k emitter particles
> involved), Blender allocates all of the simulation memory at the
> beginning of each frame and frees it at the end. With transparent
> hugepages, this makes Blender stall for some time during the
> allocation. Other applications trying to allocate larger blocks of
> memory (Firefox, PDF viewers) are also pulled into the vortex and
> stall for some time, though often for less time than Blender.
>
> Inspecting /proc/$pid/stack of the Blender threads also points to the
> kernel's compaction routines (try_to_compact_pages actually appears in
> the call stack).
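
Checking every thread by hand is tedious, so the stack inspection can be scripted. A minimal Python sketch, assuming the per-thread files /proc/<pid>/task/<tid>/stack with lines like "[<addr>] func+0x1a/0x40"; these files are normally readable only by root, and unreadable threads are skipped. threads_in_function and stack_functions are hypothetical helpers; proc_root exists only so the code can be pointed at a copy of /proc.

```python
import os


def stack_functions(stack_text):
    """Function names from a kernel stack dump ("[<addr>] func+0x1a/0x40")."""
    names = []
    for line in stack_text.splitlines():
        entry = line.split("] ", 1)[-1].strip()
        names.append(entry.split("+", 1)[0])
    return names


def threads_in_function(pid, symbol="try_to_compact_pages", proc_root="/proc"):
    """Return IDs of the threads of `pid` whose kernel stack contains `symbol`."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    hits = []
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "stack")) as f:
                stack = f.read()
        except OSError:
            # Thread exited in the meantime, or we may not read its stack.
            continue
        if symbol in stack_functions(stack):
            hits.append(int(tid))
    return hits
```

Run as root against Blender's PID during a stall, a non-empty result would indicate that threads are currently inside the compaction path.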
>
> The specific behaviour of stalling at the start of a frame is _not_
> observed when turning off transparent hugepage support (echo never >
> /sys/kernel/mm/transparent_hugepage/enabled before starting Blender),
> _but_ the system starts swapping, possibly because no contiguous memory
> is available for Blender.
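
To confirm which mode is actually active before and after the echo command above, the sysfs file can be read back: it lists every option and brackets the active one. A minimal Python sketch, assuming that usual format (e.g. "always madvise [never]"); thp_mode is a hypothetical helper, and changing the mode still requires writing the file as root.

```python
def thp_mode(text=None, path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Return the active transparent hugepage mode, e.g. "always" or "never".

    The sysfs file lists all options and marks the active one in brackets,
    for example "always madvise [never]".
    """
    if text is None:
        with open(path) as f:
            text = f.read()
    for option in text.split():
        if option.startswith("[") and option.endswith("]"):
            return option[1:-1]
    raise ValueError(f"no active mode marked in {text!r}")
```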
>
>
> Because this is, as far as I can tell, expected behaviour in the Linux
> kernel (inferred from the discussion of the patch at [2]; the patch
> itself is AFAIK not related to the problem, but the discussion sheds
> light on the purpose and effects of hugepaging), I decided to go ahead
> and report this to the Blender project, as it seems this could be fixed
> by changing Blender's memory usage behaviour.
>
> I'm not sure what further information I can share with you. If you need
> any additional information, please just ask. I tried to limit myself
> to the description of the symptoms and a diagnosis inferred from what
> I've learnt about hugepaging in the last few days.
>
> best regards & looking forward to your replies,
> Jonas
>
>    [1]: http://www.mjmwired.net/kernel/Documentation/vm/transhuge.txt
>    [2]: http://article.gmane.org/gmane.linux.kernel.mm/70032
>    [3]: System specification (possibly relevant parts):
>         blender: 2.59, 2.63a, 2.64a
>         linux: 3.6.6-1.fc16.x86_64
>         graphics (hopefully not relevant): nvidia proprietary 304.60
>         memory (for reference): 8 GB
> _______________________________________________
> Bf-committers mailing list
> Bf-committers at blender.org
> http://lists.blender.org/mailman/listinfo/bf-committers

