<div dir="ltr"><div><font face="monospace, monospace">Hey Sergey,</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">I am seeing an approximately 15-20% speed increase with the Fermi cards,</font></div><div><font face="monospace, monospace">sm_20 with GTX 580 &amp; 590. I think this speed increase makes it worthwhile switching over from </font><span style="font-family:monospace,monospace">CUDA 7.5 to CUDA 8 for these cards as well.</span></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">Kind Regards</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">Carlo</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">&gt; Hey again,</font></div><div><font face="monospace, monospace">&gt; </font></div><div><font face="monospace, monospace">&gt; Spent the majority of the day trying to solve the regression, without much</font></div><div><font face="monospace, monospace">&gt; success. Even the simplest kernel needed for the BMW scene is about 10% slower.</font></div><div><font face="monospace, monospace">&gt; This is mainly coming from bump nodes. Enabling all other features makes</font></div><div><font face="monospace, monospace">&gt; things even worse performance-wise.</font></div><div><font face="monospace, monospace">&gt; </font></div><div><font face="monospace, monospace">&gt; I did some tweaks again to make sure all functions are inlined in the same</font></div><div><font face="monospace, monospace">&gt; manner by CUDA 8.0 as they used to be before. So now the ptxas output shows</font></div><div><font face="monospace, monospace">&gt; exactly the same function, but for some reason spills are just higher with the new</font></div><div><font face="monospace, monospace">&gt; toolkit and at the same time stack usage is reasonably slower. 
Not sure yet</font></div><div><font face="monospace, monospace">&gt; what&#39;s going on here, and I think we&#39;d better leave this alone until the</font></div><div><font face="monospace, monospace">&gt; official toolkit is released.</font></div><div><font face="monospace, monospace">&gt; </font></div><div><font face="monospace, monospace">&gt; For the time being I&#39;ve switched the buildbots to use a more complicated setup,</font></div><div><font face="monospace, monospace">&gt; using CUDA 7.5 for all kernels except sm_60 and sm_61 (new generation</font></div><div><font face="monospace, monospace">&gt; cards) and using the new toolkit only for the new kernels.</font></div><div><font face="monospace, monospace">&gt; </font></div><div><font face="monospace, monospace">&gt; So hopefully now all Maxwell and lower cards have the same performance as</font></div><div><font face="monospace, monospace">&gt; before. And yet users of new cards can have some degree of GPU rendering.</font></div></div>