[Bf-cycles] Req: Access to new branch for rendering on Intel Xeon Phi

Wed Jul 26 11:20:25 CEST 2017

Hi,

Answers are inline:

>  -          Last time I looked into ISPC compiler examples, they were
> requiring some special hints on loops and such for extra
> vectorization. Is that still the case?
>
> I have to make more tests, because I use OpenMP now. But writing code is
> much easier than writing intrinsics code. We can use this compiler only for
> kernels (like NVCC). I will check how it looks with the future of the ISPC
> compiler - the GCC compiler still has big problems with SIMD vectorization.
>

I don't believe a compiler can produce optimal SIMD code on its own.
We would still need to have math utilities implemented in explicit
intrinsics. And since that's something we need to do anyway, I'm not sure
why SIMD vectorization of GCC is listed as a concern here?

The question was more about whether ISPC-specific syntax constructs are
required in the kernel to get optimal performance. For example, there
are special versions of loops there. Trying to use them in the kernel might
make the code tricky for other platforms. Or you'll end up with some
ISPC-specific kernel sources.

How would that work? Can you elaborate a bit more on this topic?
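
To make the concern concrete, here is a rough sketch of what I imagine would be needed. It is not taken from Cycles: the `KERNEL_FOREACH` macro and the `scale_samples` function are made-up names, and the ISPC expansion mentioned in the comment is only my assumption of how it could be mapped to ISPC's `foreach` loop.

```cpp
// Hypothetical macro, not part of Cycles or ISPC. In a C++ build it expands
// to an ordinary for loop. An ISPC build of the same kernel source could
// instead define it as:  foreach (i = (begin) ... (end))
#ifndef KERNEL_FOREACH
#  define KERNEL_FOREACH(i, begin, end) \
    for (int i = (begin); i < (end); ++i)
#endif

// Toy "kernel": scales every sample in place.
inline void scale_samples(float* samples, int count, float factor)
{
    KERNEL_FOREACH(i, 0, count) {
        samples[i] *= factor;
    }
}
```

Even in this tiny case the function signature would probably need ISPC qualifiers such as `uniform` hidden behind similar macros, which is exactly the kind of trickiness I am worried about.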

Also, while we are on the topic of compiler intrinsics: did you compare
the CPU-side performance of the kernel compiled with ISPC and with GCC? What
would the performance be if we disabled explicit intrinsics in the kernel?
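
By disabling explicit intrinsics I mean a comparison along these lines. This is only a minimal sketch: `KERNEL_NO_INTRINSICS`, `KERNEL_USE_SSE2` and `madd` are names invented for the illustration, not existing Cycles options or functions.

```cpp
#include <cstddef>

// KERNEL_NO_INTRINSICS is a hypothetical switch for this sketch, not an
// existing Cycles build option.
#if defined(__SSE2__) && !defined(KERNEL_NO_INTRINSICS)
#  include <emmintrin.h>
#  define KERNEL_USE_SSE2 1
#else
#  define KERNEL_USE_SSE2 0
#endif

// Computes out[i] = a[i] * b[i] + c[i] for i in [0, count).
inline void madd(const float* a, const float* b, const float* c,
                 float* out, std::size_t count)
{
    std::size_t i = 0;
#if KERNEL_USE_SSE2
    // Explicit path: four floats per iteration, unaligned loads and stores.
    for (; i + 4 <= count; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        const __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
#endif
    // Scalar path: handles the tail, or everything when intrinsics are off.
    // This is the loop the compiler's auto-vectorizer would have to handle.
    for (; i < count; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

Building the same source once with and once without `-DKERNEL_NO_INTRINSICS`, with each compiler, would give the numbers I am asking about.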

>  -          Is it only the Xeon Phi architecture which will benefit from
> ISPC?
>
> ISPC is not only for Phi. It could also benefit new CPUs. It
> currently supports the SSE2, SSE4, AVX1, AVX2, AVX512, and Xeon Phi
> "Knights Corner" instruction sets.
>

Related to the previous question: is it worth using ISPC for the CPU
implementation? How portable is it? Can you limit the instruction set for
ISPC, so we can run on older CPUs?
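
What I have in mind is the usual approach of building the kernel several times and picking a variant at runtime. A rough sketch, assuming GCC or Clang on x86 (the `pick_kernel_variant` function and the variant names are placeholders for the illustration, and whether ISPC-built kernels can be selected this way is exactly my question):

```cpp
#include <string>

// Returns the name of the widest kernel variant the running CPU supports.
// The variant names are placeholders for this sketch.
inline std::string pick_kernel_variant()
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // GCC/Clang builtins; they query CPUID at runtime.
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return "avx2";
    if (__builtin_cpu_supports("avx")) return "avx";
    if (__builtin_cpu_supports("sse4.1")) return "sse4.1";
    if (__builtin_cpu_supports("sse2")) return "sse2";
#endif
    // Other compilers or architectures: fall back to plain scalar code.
    return "generic";
}
```

An older CPU would then simply end up on the "sse2" or "generic" path instead of crashing on an unsupported instruction.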

>  -          How does Xeon Phi performance compare to GTX1080 and RX480?
>
> The Xeon Phi has the same features as a CPU (for example Cosmos Laundromat
> and Agent327 could be rendered on this type of device). I think there is some
> functionality which OpenCL or CUDA does not support. GPUs have less memory.
>
> KNC is slower (OpenMP without KNC vectorization), but KNL will be faster
> (OpenMP without AVX512 vectorization). NVIDIA is now developing OpenACC,
> which is similar to OpenMP directives.
>

It's not a question about feature set. It is a question about speed. The
biggest missing feature is decoupled ray marching on GPU, but it is planned
to fill that gap in. I'm not sure memory is very relevant here. AMD GPUs
have 32GB nowadays, and that figure will only increase. Surely, there
might be more memory on Xeon Phi, but is it economically efficient for
Blender/Cycles users, or would they just stick to CPU-side rendering?

All this depends on the speed benefits users will get. A good estimate will
come from benchmarks. Can you get numbers for the scenes from our current
benchmark bundle [1]?

As for feature set, is it possible to get OSL to work on Xeon Phi?

[1] https://code.blender.org/2016/02/new-cycles-benchmark/

-- 
With best regards, Sergey Sharybin