<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>I am a little late, but I can confirm that with CUDA 8 on a GTX 1060
      (6GB), raising the register limit from 48 to 64 does not consistently increase performance.</p>

    System:<br>
    Debian SID<br>
    Source compiled with gcc 7.2<br>
    CUDA compiled with clang-3.8<br>
    CUDA compilation tools: 8.0<br>

    <p>

      CUDA_KERNEL_MAX_REGISTERS 48<br>

      BMW: 2:38<br>

      Classroom: 7:43<br>

      Fishy Cat: 7:20<br>

      Koro: 14:03<br>

      Pavillion: 15:12<br>

      <br>

      CUDA_KERNEL_MAX_REGISTERS 64<br>

      BMW: 2:46<br>

      Classroom: 8:03<br>

      Fishy Cat: 7:10<br>

      Koro: 12:46<br>

      Pavillion: 16:06</p>
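    <p>The short Python sketch below puts numbers on "not consistently". It converts the times above to seconds and prints the relative change from 48 to 64 registers for each scene. It is only an illustration: the helper names are made up and are not part of Cycles. With these numbers, Fishy Cat (-2.3%) and Koro (-9.1%) render faster with 64 registers. BMW (+5.1%), Classroom (+4.3%) and Pavillion (+5.9%) render slower.</p>
<pre>
```python
# GTX 1060 / CUDA 8 timings from the table above, as "minutes:seconds".
TIMES_48 = {"BMW": "2:38", "Classroom": "7:43", "Fishy Cat": "7:20",
            "Koro": "14:03", "Pavillion": "15:12"}
TIMES_64 = {"BMW": "2:46", "Classroom": "8:03", "Fishy Cat": "7:10",
            "Koro": "12:46", "Pavillion": "16:06"}


def parse_mmss(text: str) -> int:
    """Convert an "m:ss" render time into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def pct_change(before: int, after: int) -> float:
    """Relative change in render time; negative means 64 registers was faster."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for scene in TIMES_48:
        t48 = parse_mmss(TIMES_48[scene])
        t64 = parse_mmss(TIMES_64[scene])
        print(f"{scene:10s} {t48:4d}s -> {t64:4d}s  {pct_change(t48, t64):+.1f}%")
```
</pre>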

    <p><br>

    </p>

    <p>Greetings Knork<br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 11/16/2017 12:11 AM, Brecht Van

      Lommel wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAKFUgC3DQM+4ZKmLuz642zj+Mv+xS6g2MaV5x_NO71trec=Eqg@mail.gmail.com">

      <div dir="ltr">Still, I suggest committing this change for CUDA 9,
        guarded by a check on __CUDACC_VER_MAJOR__. We can ask NVIDIA to take a
        look and see if there's a way to get back the performance of
        the early CUDA 9.0.102 release (which was a beta, I think). But
        avoiding the major slowdown for now is good.

        <div><br>

        </div>

        <div>Here's a graph relative to CUDA 8 for completeness.</div>

        <div><a href="https://developer.blender.org/F1142667"

            moz-do-not-send="true">https://developer.blender.org/F1142667</a></div>

        <div><br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Wed, Nov 15, 2017 at 10:37 PM,

          Stefan Werner <span dir="ltr"><<a

              href="mailto:stewreo@gmail.com" target="_blank"

              moz-do-not-send="true">stewreo@gmail.com</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">

              <div>It seems to depend not just on the CUDA version but also on the
                chip model. I have now run my benchmarks on a GTX 1060 too;
                there the difference between 48 and 64 registers is close
                to nothing:<br>

                <br>

                64 registers:<br>

                BMW: 2m41s<br>

                Classroom: 8m02s<br>

                Fishy Cat: 6m39s<br>

                Koro: 11m17s<br>

                Pavillion: 13m38s<br>

                <br>

                48 registers:<br>

                BMW: 2m43s<br>

                Classroom: 7m56s<br>

                Fishy Cat: 6m52s<br>

                Koro: 12m17s<br>

                Pavillion: 13m50s<br>

                <br>

              </div>

              Maybe it's the ratio of memory bandwidth to cores that makes
              register spilling less costly on the 1060 than on the
              1080Ti?<br>

              <div><br>

              </div>

              <div>Well, there go my dreams of a one-line commit that
                brings a 10-20% performance boost.</div>

              <span class="HOEnZb"><font color="#888888">

                  <div><br>

                  </div>

                  <div>-Stefan<br>

                  </div>

                </font></span></div>

            <div class="HOEnZb">

              <div class="h5">

                <div class="gmail_extra"><br>

                  <div class="gmail_quote">On Wed, Nov 15, 2017 at 6:28

                    PM, Brecht Van Lommel <span dir="ltr"><<a

                        href="mailto:brechtvanlommel@pandora.be"

                        target="_blank" moz-do-not-send="true">brechtvanlommel@pandora.be</a>></span>

                    wrote:<br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      <div dir="ltr">It seems to be related to the CUDA
                        version: 9.0.176 has a performance regression
                        compared to 9.0.102. Increasing the registers
                        partially compensates for that, but not
                        entirely.

                        <div><a

                            href="https://developer.blender.org/F1141999"

                            target="_blank" moz-do-not-send="true">https://developer.blender.org/<wbr>F1141999</a><br>

                        </div>

                        <div>

                          <div class="m_-5574327464039304880h5">

                            <div class="gmail_extra"><br>

                              <div class="gmail_quote">On Wed, Nov 15,

                                2017 at 12:49 PM, Stefan Werner <span

                                  dir="ltr"><<a

                                    href="mailto:stewreo@gmail.com"

                                    target="_blank"

                                    moz-do-not-send="true">stewreo@gmail.com</a>></span>

                                wrote:<br>

                                <blockquote class="gmail_quote"

                                  style="margin:0 0 0

                                  .8ex;border-left:1px #ccc

                                  solid;padding-left:1ex">

                                  <div dir="ltr">

                                    <div>Wow, those results are almost

                                      the complete opposite of what I'm

                                      seeing. I re-ran the tests on

                                      Linux:<br>

                                      <br>

                                      Nvidia 1080Ti, driver 384.90,

                                      installed as secondary GPU (no

                                      display attached)<br>

                                      Xubuntu 17.04, CUDA 9.0.176, gcc

                                      6.3.0<br>

                                    </div>

                                    master branch,

                                    556b13f03e561b54d4f0186e207f08<wbr>0c786f8b66<br>

                                    <div>

                                      <div><br>

                                        48 registers:<br>

                                        BMW: 1m28s<br>

                                        Classroom: 3m12s<br>

                                        Fishy Cat: 3m07s<br>

                                        Koro: 5m40s<br>

                                        Pavillion: 6m52s<br>

                                        Victor: 15m01s<br>

                                        <br>

                                         64 registers:<br>

                                         BMW: 1m11s<br>

                                         Classroom: 2m59s<br>

                                         Fishy Cat: 2m51s<br>

                                         Koro: 4m39s<br>

                                         Pavillion: 5m32s<br>

                                         Victor: 12m19s</div>

                                      <div><br>

                                      </div>

                                      <div>(Victor used a tile size of
                                        32; all others were the
                                        *_gpu.blend files with their
                                        default tile size of 256.)<br>

                                      </div>

                                      <div><br>

                                      </div>

                                      <div>On Windows, all GTX cards are
                                        treated as display cards,
                                        regardless of whether a monitor
                                        is plugged in or not. Only
                                        Quadro, Tesla and Titan cards
                                        can be set to TCC; that mode is
                                        not available for my GTX.</div>

                                      <div><br>

                                      </div>

                                      <div>I wonder what's behind the
                                        difference we're seeing. The
                                        GPUs themselves shouldn't be
                                        that different: both are based
                                        on GP102, and the 1080Ti only
                                        has two SMs disabled.<span

class="m_-5574327464039304880m_-7340134897853140633m_759512126984442797HOEnZb"><font

                                            color="#888888"><br>

                                          </font></span></div>

                                      <span

class="m_-5574327464039304880m_-7340134897853140633m_759512126984442797HOEnZb"><font

                                          color="#888888">

                                          <div><br>

                                          </div>

                                          <div>-Stefan<br>

                                          </div>

                                        </font></span></div>

                                  </div>

                                  <div

class="m_-5574327464039304880m_-7340134897853140633m_759512126984442797HOEnZb">

                                    <div

class="m_-5574327464039304880m_-7340134897853140633m_759512126984442797h5">

                                      <div class="gmail_extra"><br>

                                        <div class="gmail_quote">On Wed,

                                          Nov 15, 2017 at 1:35 AM,

                                          Brecht Van Lommel <span

                                            dir="ltr"><<a

                                              href="mailto:brechtvanlommel@pandora.be"

                                              target="_blank"

                                              moz-do-not-send="true">brechtvanlommel@pandora.be</a>></span>

                                          wrote:<br>

                                          <blockquote

                                            class="gmail_quote"

                                            style="margin:0 0 0

                                            .8ex;border-left:1px #ccc

                                            solid;padding-left:1ex">

                                            <div dir="ltr">Hi,

                                              <div><br>

                                              </div>

                                              <div>The registers were

                                                set based on benchmarks

                                                with a GTX 1080 on

                                                Linux, when we first

                                                optimized the code for

                                                Pascal. But that was

                                                more than a year ago.

                                                Going from 63 to 64

                                                registers should be fine

                                                if it's faster.</div>

                                              <div><br>

                                              </div>

                                              <div>Here are benchmarks
                                                with a Titan Xp,
                                                Linux, driver 384.90.
                                                The results are not as good
                                                there:</div>

                                              <div>CUDA 8.0.61: <a

                                                  href="https://developer.blender.org/F1137606"

                                                  target="_blank"

                                                  moz-do-not-send="true">https://developer<wbr>.blender.org/F1137606</a></div>

                                              <div>CUDA 9.0.102: <a

                                                  href="https://developer.blender.org/F1137502"

                                                  target="_blank"

                                                  moz-do-not-send="true">https://developer.ble<wbr>nder.org/F1137502</a></div>

                                              <div><br>

                                              </div>

                                              <div>Which driver and CUDA

                                                version are you using?</div>

                                              <div><br>

                                              </div>

                                              <div>One difference
                                                between Windows and
                                                Linux is the compute
                                                preemption support. It
                                                might be useful to test
                                                whether that min_blocks *= 8
                                                change helps on Windows
                                                if your GTX 1080Ti is used
                                                for display.</div>

                                              <div><a

                                                  href="https://developer.blender.org/rBe360d003e"

                                                  target="_blank"

                                                  moz-do-not-send="true">https://developer.blender.org/<wbr>rBe360d003e</a></div>

                                              <div><br>

                                              </div>

                                              <div>Regards,</div>

                                              <div>Brecht.</div>

                                              <div><br>

                                              </div>

                                            </div>

                                            <div class="gmail_extra"><br>

                                              <div class="gmail_quote">

                                                <div>

                                                  <div

class="m_-5574327464039304880m_-7340134897853140633m_759512126984442797m_4961958422127873663h5">On

                                                    Tue, Nov 14, 2017 at

                                                    11:48 PM, Stefan

                                                    Werner <span

                                                      dir="ltr"><<a

                                                        href="mailto:stewreo@gmail.com"

                                                        target="_blank"

moz-do-not-send="true">stewreo@gmail.com</a>></span> wrote:<br>

                                                  </div>

                                                </div>

                                                <blockquote

                                                  class="gmail_quote"

                                                  style="margin:0 0 0

                                                  .8ex;border-left:1px

                                                  #ccc

                                                  solid;padding-left:1ex">

                                                  <div>

                                                    <div

class="m_-5574327464039304880m_-7340134897853140633m_759512126984442797m_4961958422127873663h5">

                                                      <div dir="ltr">Hello,

                                                        <div><br>

                                                        </div>

                                                        <div>Currently
                                                          the CUDA kernel uses the same
                                                          launch bounds for Pascal (SM 6.x)
                                                          as for Maxwell (SM 5.x) hardware:
                                                          63 registers for branched path
                                                          tracing and 48 registers for path
                                                          tracing. Are all of those values
                                                          derived from benchmarks, or is the
                                                          value for Pascal just carried over
                                                          from Maxwell?</div>
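                                                        <p>A register cap trades spilling against how many threads can be resident at once. The Python sketch below estimates that trade-off from the register file alone. It is a simplification that assumes 64K 32-bit registers and at most 2048 resident threads per SM, which are the figures for Maxwell and Pascal. It also assumes a 256-thread block (16x16); that block size is an assumption here and is not taken from the Cycles source. Shared memory, the per-SM block limit and register allocation granularity are ignored, so the result is an upper bound. Under these assumptions, 48 registers per thread allows 5 resident blocks (62.5% occupancy) and 64 registers allows 4 (50%).</p>
<pre>
```python
# Registers-only occupancy estimate for one SM (Maxwell/Pascal class).
# Ignores shared memory, the per-SM block limit and register allocation
# granularity, so treat the result as an upper bound.
REGS_PER_SM = 65536         # 64K 32-bit registers per SM
MAX_THREADS_PER_SM = 2048   # 64 resident warps of 32 threads


def blocks_per_sm(regs_per_thread: int, threads_per_block: int = 256) -> int:
    """Whole thread blocks that fit when registers are the only limit."""
    by_regs = REGS_PER_SM // (regs_per_thread * threads_per_block)
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    return min(by_regs, by_threads)


def occupancy(regs_per_thread: int, threads_per_block: int = 256) -> float:
    """Resident threads as a fraction of the SM maximum."""
    resident = blocks_per_sm(regs_per_thread, threads_per_block) * threads_per_block
    return resident / MAX_THREADS_PER_SM


if __name__ == "__main__":
    for regs in (48, 64):
        print(f"{regs} registers/thread: {blocks_per_sm(regs)} blocks/SM, "
              f"{occupancy(regs):.1%} occupancy")
```
</pre>
                                                        <p>Whether the extra occupancy at 48 registers outweighs the cost of spilling is what the benchmarks in this thread measure. The arithmetic only shows the size of the trade-off.</p>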

                                                        <div><br>

                                                        </div>

                                                        <div>The reason

                                                          I'm asking is

                                                          that I'm

                                                          observing a

                                                          performance

                                                          increase on

                                                          Pascal when I

                                                          increase the

                                                          number of

                                                          registers to

                                                          64 for path

                                                          tracing. Here

                                                          are

                                                          before/after

                                                          benchmarks

                                                          from a GTX

                                                          1080Ti/Win10:</div>

                                                        <div><br>

                                                        </div>

                                                        <div>

                                                          <div>48

                                                          registers (as

                                                          is):</div>

                                                          <div>BMW: 1m52s</div>

                                                          <div>Classroom:

                                                          3m31s</div>

                                                          <div>Fishy

                                                          Cat: 4m33s</div>

                                                          <div>Koro:

                                                          8m30s</div>

                                                          <div>Pavillion:

                                                          7m39s</div>

                                                        </div>

                                                        <div><br>

                                                        </div>

                                                        <div>

                                                          <div>64

                                                          registers:</div>

                                                          <div>BMW:

                                                          1m36s</div>

                                                          <div>Classroom:

                                                          3m34s</div>

                                                          <div>Fishy

                                                          Cat: 3m57s</div>

                                                          <div>Koro:

                                                          6m45s</div>

                                                          <div>Pavillion:

                                                          6m39s</div>

                                                        </div>

                                                        <div><br>

                                                        </div>

                                                        <div>With the

                                                          exception of

                                                          the classroom

                                                          scene, all

                                                          benchmarks

                                                          show

                                                          significantly

                                                          better

                                                          performance.

                                                          If there are

                                                          no objections,

                                                          I'd like to

                                                          commit that

                                                          register

                                                          increase for

                                                          SM 6.x to

                                                          master.</div>
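                                                        <p>The Python sketch below cross-checks "significantly better" by turning the 1080Ti/Win10 timings above into relative changes. It is illustrative only, and the helper names are hypothetical. With these numbers, BMW, Fishy Cat, Koro and Pavillion render roughly 13% to 21% faster with 64 registers, while Classroom is about 1.4% slower. The BMW time is read as 1m52s.</p>
<pre>
```python
import re

# GTX 1080Ti / Windows 10 timings from the tables above.
BEFORE_48 = {"BMW": "1m52s", "Classroom": "3m31s", "Fishy Cat": "4m33s",
             "Koro": "8m30s", "Pavillion": "7m39s"}
AFTER_64 = {"BMW": "1m36s", "Classroom": "3m34s", "Fishy Cat": "3m57s",
            "Koro": "6m45s", "Pavillion": "6m39s"}


def to_seconds(text: str) -> int:
    """Parse an "XmYYs" render time (trailing "s" optional) into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


def percent_change(before: str, after: str) -> float:
    """Change in render time in percent; negative means faster with 64 registers."""
    b, a = to_seconds(before), to_seconds(after)
    return (a - b) / b * 100.0


if __name__ == "__main__":
    for scene in BEFORE_48:
        change = percent_change(BEFORE_48[scene], AFTER_64[scene])
        print(f"{scene:10s} {change:+6.1f}%")
```
</pre>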

                                                        <div><br>

                                                        </div>

                                                        <div>Running the

                                                          same test on a

                                                          Quadro M4000

                                                          (Maxwell)

                                                          shows much

                                                          smaller

                                                          differences,

                                                          so I'd leave

                                                          SM 5.x as is:</div>

                                                        <div><br>

                                                        </div>

                                                        <div>48

                                                          registers (as

                                                          is):</div>

                                                        <div>

                                                          <div>BMW:

                                                          4m38s</div>

                                                          <div>Classroom:

                                                          12m32s</div>

                                                          <div>Fishy

                                                          Cat: 11m18s</div>

                                                          <div>Koro:

                                                          20m38s</div>

                                                          <div>Pavillion:

                                                          21m12s</div>

                                                        </div>

                                                        <div><br>

                                                        </div>

                                                        <div>

                                                          <div>64

                                                          registers:<br>

                                                          </div>

                                                          <div>BMW:

                                                          4m38s</div>

                                                          <div>Classroom:

                                                          13m07s</div>

                                                          <div>Fishy

                                                          Cat: 10m52s</div>

                                                          <div>Koro:

                                                          18m51s</div>

                                                          <div>Pavillion:

                                                          21m32s</div>

                                                        </div>

                                                        <div><br>

                                                        </div>

                                                        <div>Another
                                                          note: 63 registers was
                                                          a hard limit for SM 2.x
                                                          hardware. Is the limit of
                                                          63 rather than 64 registers
                                                          for SM 3.x and higher kernels
                                                          just carried over, or is
                                                          there a reason not to go to
                                                          64 registers?</div>
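                                                        <p>This is one way to check whether 63 versus 64 even matters for occupancy. If the hardware hands out registers to a warp in fixed-size units, a 63-register cap reserves as much as a 64-register one. The Python sketch below assumes an allocation unit of 256 registers per warp. That is the value NVIDIA's CUDA occupancy calculator lists for these architectures, but treat it as an assumption; it has not been verified against Cycles. Under that assumption, 63 and 64 registers per thread both reserve 2048 registers per warp. Capping at 63 would then only take one register away from the compiler without freeing any space.</p>
<pre>
```python
# Per-warp register allocation, assuming registers are reserved for a warp
# in units of 256 (assumed allocation unit size, see the note above).
WARP_SIZE = 32
REG_ALLOC_UNIT = 256


def allocated_regs_per_warp(regs_per_thread: int) -> int:
    """Registers actually reserved for one warp, rounded up to the unit size."""
    requested = regs_per_thread * WARP_SIZE
    units = -(-requested // REG_ALLOC_UNIT)  # ceiling division
    return units * REG_ALLOC_UNIT


if __name__ == "__main__":
    for regs in (48, 63, 64):
        print(f"{regs} regs/thread -> {allocated_regs_per_warp(regs)} regs/warp")
```
</pre>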

                                                        <span

class="m_-5574327464039304880m_-7340134897853140633m_759512126984442797m_4961958422127873663m_-2063734483547205841HOEnZb"><font

color="#888888">

                                                          <div><br>

                                                          </div>

                                                          <div>-Stefan</div>

                                                          </font></span>

                                                        <div>PS: I'd
                                                          love it if someone
                                                          could take the time
                                                          to run 48/64-register
                                                          comparison benchmarks
                                                          on other Pascal hardware
                                                          and/or on Linux.</div>

                                                      </div>

                                                      <br>

                                                    </div>

                                                  </div>

______________________________<wbr>_________________<br>

                                                  Bf-cycles mailing list<br>

                                                  <a

                                                    href="mailto:Bf-cycles@blender.org"

                                                    target="_blank"

                                                    moz-do-not-send="true">Bf-cycles@blender.org</a><br>

                                                  <a

                                                    href="https://lists.blender.org/mailman/listinfo/bf-cycles"

                                                    rel="noreferrer"

                                                    target="_blank"

                                                    moz-do-not-send="true">https://lists.blender.org/mail<wbr>man/listinfo/bf-cycles</a><br>

                                                  <br>

                                                </blockquote>

                                              </div>

                                              <br>

                                            </div>

                                            <br>


                                          </blockquote>

                                        </div>

                                        <br>

                                      </div>

                                    </div>

                                  </div>

                                  <br>


                                </blockquote>

                              </div>

                              <br>

                            </div>

                          </div>

                        </div>

                      </div>

                      <br>


                    </blockquote>

                  </div>

                  <br>

                </div>

              </div>

            </div>

            <br>


          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Jan Scheffczy

w: <a class="moz-txt-link-freetext" href="https://knork.org">https://knork.org</a>

</pre>

  </body>

</html>