It&#39;d be interesting to modify my bucketing system to store actual faces/strands per tile, not just references to them, and see if z-buffering gets faster.<div><br></div><div>BTW, what about reading one stream and using it to write another? &nbsp;That&#39;s basically what DSM does in a lot of its code. &nbsp;This is really interesting; I may need to do this sort of optimization (I&#39;m nearing, or at, the limits of what algorithmic improvements can give me). &nbsp;If you can find that Google Tech Talk, that&#39;d be really great.</div>
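Here is a minimal sketch of what storing actual faces per tile could look like, written in C. It is a counting sort that reads the face array once, front to back (one stream), and writes full copies into a second array grouped by tile (another stream). Each tile's faces then sit contiguously in memory instead of being reached through references. The `Face` struct, `bucket_faces` and its parameters are hypothetical and are not Blender's actual render structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical face record; a real renderer face carries much more data. */
typedef struct {
    float x, y; /* screen-space position used to pick a tile */
    int id;
} Face;

/* Maps a face to its tile index, clamping positions outside the grid. */
static size_t tile_of(const Face *f, int tiles_x, int tiles_y, float tile_size)
{
    int tx = (int)(f->x / tile_size);
    int ty = (int)(f->y / tile_size);
    if (tx < 0) tx = 0;
    if (tx >= tiles_x) tx = tiles_x - 1;
    if (ty < 0) ty = 0;
    if (ty >= tiles_y) ty = tiles_y - 1;
    return (size_t)ty * (size_t)tiles_x + (size_t)tx;
}

/* Counting sort by tile: reads `in` sequentially and writes full Face
 * copies into `out`, so each tile's faces end up contiguous.
 * tile_start must have tiles_x * tiles_y + 1 entries; the faces of tile t
 * are out[tile_start[t]] .. out[tile_start[t + 1] - 1].
 * Returns 0 on success, -1 if the scratch allocation fails. */
int bucket_faces(const Face *in, size_t n, Face *out,
                 int tiles_x, int tiles_y, float tile_size,
                 size_t *tile_start)
{
    assert(tiles_x > 0 && tiles_y > 0 && tile_size > 0.0f);
    size_t ntiles = (size_t)tiles_x * (size_t)tiles_y;

    /* Pass 1: count faces per tile (shifted by one for the prefix sum). */
    memset(tile_start, 0, (ntiles + 1) * sizeof *tile_start);
    for (size_t i = 0; i < n; i++)
        tile_start[tile_of(&in[i], tiles_x, tiles_y, tile_size) + 1]++;

    /* Prefix sum turns the counts into the first output slot of each tile. */
    for (size_t t = 0; t < ntiles; t++)
        tile_start[t + 1] += tile_start[t];

    /* Pass 2: copy each face into the next free slot of its tile. */
    size_t *cursor = malloc(ntiles * sizeof *cursor);
    if (!cursor)
        return -1;
    memcpy(cursor, tile_start, ntiles * sizeof *cursor);
    for (size_t i = 0; i < n; i++)
        out[cursor[tile_of(&in[i], tiles_x, tiles_y, tile_size)]++] = in[i];

    free(cursor);
    return 0;
}
```

Tiles can then be z-buffered one after another, each walking a contiguous slice of `out`. The cost is memory. A face that spans several tiles would have to be copied into each of them, and this sketch does not handle that case because it assigns every face to exactly one tile.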

<div><br></div><div>Joe<br><br><div class="gmail_quote">On Fri, Dec 19, 2008 at 1:09 PM, Timothy Baldridge <span dir="ltr">&lt;<a href="mailto:tbaldridge@gmail.com">tbaldridge@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="Ih2E3d">&gt; I&#39;m not sure how you&#39;d avoid cache misses though. . .we simply have to deal<br>

&gt; with too much data. &nbsp;About the only thing I can think of is sorting<br>

&gt; faces/strands (I actually do this in my DSM branch) per tile and using a<br>

&gt; more optimal render order than simply going over the scanlines. &nbsp;The ray<br>

&gt; tracing traversal could be made more efficient, but optimizing what the<br>

&gt; renderer does in between could be more difficult.<br>

&gt; You know, I think the CodeAnalyst profiling tool from AMD can measure cache<br>

&gt; misses; I&#39;ll have to try and figure out how it works.<br>

<br>
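Scanline order is only one way to traverse the image. A commonly used alternative, which this thread does not mention, is to visit tiles in Z-order (Morton order). With that ordering, tiles processed close together in time are also close together on screen. The following is a minimal sketch that assumes a square grid whose side is a power of two. `zorder_tiles`, `morton2d` and the helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Spreads the low 16 bits of v so that a zero bit sits between each. */
static uint32_t part1by1(uint32_t v)
{
    v &= 0x0000ffffu;
    v = (v | (v << 8)) & 0x00ff00ffu;
    v = (v | (v << 4)) & 0x0f0f0f0fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Inverse of part1by1: gathers the even bits of v into the low 16 bits. */
static uint32_t compact1by1(uint32_t v)
{
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0f0f0f0fu;
    v = (v | (v >> 4)) & 0x00ff00ffu;
    v = (v | (v >> 8)) & 0x0000ffffu;
    return v;
}

/* Morton index of tile (tx, ty): x bits go to even positions, y bits to odd. */
uint32_t morton2d(uint32_t tx, uint32_t ty)
{
    return part1by1(tx) | (part1by1(ty) << 1);
}

/* Fills order[0 .. n*n-1] with row-major tile indices (ty * n + tx) in
 * Z-order, for an n x n grid where n is a power of two. */
void zorder_tiles(uint32_t n, uint32_t *order)
{
    assert(n > 0 && (n & (n - 1)) == 0 && n <= 32768u);
    for (uint32_t m = 0; m < n * n; m++) {
        uint32_t tx = compact1by1(m);
        uint32_t ty = compact1by1(m >> 1);
        order[m] = ty * n + tx;
    }
}
```

For a 4x4 grid, the first eight tiles visited are 0, 1, 4, 5, 2, 3, 6, 7 in row-major numbering. In other words, the traversal finishes one 2x2 block before moving to the next.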

</div>You cannot avoid all cache misses, but it is possible to avoid many<br>

of them. Modern CPUs load memory in 64-byte cache lines. This<br>

means that if you read one byte from memory, the CPU really loads<br>

64 bytes. Thus, if you can arrange data so that it can be<br>

read and processed sequentially, performance will be greatly<br>

enhanced.<br>
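To put numbers on the 64-byte point, the sketch below counts how many distinct cache lines an access pattern touches. It does not time anything. Reading 16 consecutive 4-byte ints touches one line. Reading 16 ints spaced 64 bytes apart touches 16 lines, which can mean up to 16 separate trips to memory for the same amount of useful data. The 64-byte line size and the name `lines_touched` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is common on current x86 CPUs. */
#define CACHE_LINE 64u

/* Counts the distinct cache lines touched when reading `count` elements of
 * `elem_size` bytes, taking every `stride`-th element of an array that
 * starts on a cache-line boundary. Nothing is timed: this only shows how
 * much memory traffic an access pattern implies. */
size_t lines_touched(size_t elem_size, size_t stride, size_t count)
{
    /* Keep it simple: no element may straddle two lines. */
    assert(elem_size > 0 && CACHE_LINE % elem_size == 0);

    size_t lines = 0;
    size_t last_line = SIZE_MAX;
    for (size_t i = 0; i < count; i++) {
        size_t line = (i * stride * elem_size) / CACHE_LINE;
        /* Addresses never decrease, so a new line shows up as a change. */
        if (line != last_line) {
            lines++;
            last_line = line;
        }
    }
    return lines;
}
```

For example, `lines_touched(4, 1, 16)` returns 1 and `lines_touched(4, 16, 16)` returns 16.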

<br>

I wish I could find it, but there is an excellent video on YouTube<br>

from a Google Tech Talk. In the talk the speaker explains these<br>

caches, and goes on to show that reading items from a linked list or<br>

vector can be (IIRC) up to an order of magnitude slower than reading<br>

items from an array, that is, if the entire set does not fit in the cache.<br>

This is because linked lists require a lot of jumping<br>

around in memory, which makes the cache less useful.<br>
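The sketch below illustrates the array versus linked list difference using only standard C. Both sum functions compute the same result. `sum_array` walks memory sequentially, while `sum_list` has to follow a pointer to reach each element and cannot know the next address until the current node is loaded. `build_scattered_list` deliberately links the nodes in shuffled order inside one allocation, to imitate nodes spread around the heap. All names are hypothetical. To measure the size of the gap on a particular machine, time both sums on a data set larger than the cache.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Sequential traversal: neighbouring elements share cache lines and the
 * hardware prefetcher can predict the next address. */
long sum_array(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer chasing: the next address is only known once the current node
 * has been loaded, so every cache miss stalls the loop. */
long sum_list(const Node *head)
{
    long s = 0;
    for (const Node *p = head; p; p = p->next)
        s += p->value;
    return s;
}

/* Builds a list holding a[0..n-1] in order, but places consecutive nodes
 * at shuffled positions in one pool. Returns the head (NULL if n == 0 or
 * allocation fails); the caller frees *pool_out. */
Node *build_scattered_list(const int *a, size_t n, Node **pool_out)
{
    *pool_out = NULL;
    if (n == 0)
        return NULL;
    Node *pool = malloc(n * sizeof *pool);
    size_t *slot = malloc(n * sizeof *slot);
    if (!pool || !slot) {
        free(pool);
        free(slot);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        slot[i] = i;
    /* Fisher-Yates shuffle with a fixed seed so runs are repeatable. */
    srand(12345);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = slot[i];
        slot[i] = slot[j];
        slot[j] = tmp;
    }
    /* Element i lives at pool[slot[i]] and links to element i + 1. */
    for (size_t i = 0; i < n; i++) {
        pool[slot[i]].value = a[i];
        pool[slot[i]].next = (i + 1 < n) ? &pool[slot[i + 1]] : NULL;
    }
    Node *head = &pool[slot[0]];
    free(slot);
    *pool_out = pool;
    return head;
}
```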

<font color="#888888"><br>

Timothy<br>

</font><div><div></div><div class="Wj3C7c">_______________________________________________<br>

Bf-committers mailing list<br>

<a href="mailto:Bf-committers@blender.org">Bf-committers@blender.org</a><br>

<a href="http://lists.blender.org/mailman/listinfo/bf-committers" target="_blank">http://lists.blender.org/mailman/listinfo/bf-committers</a><br>

</div></div></blockquote></div><br></div>