[Bf-cycles] osx + ati + cycles, looking for insights in debugging

Jens Verwiebe info at jensverwiebe.de
Wed Dec 7 17:42:07 CET 2011

As a hint, I found these two write-ups:

So about that float padding[2]; in the struct definition. This is because of memory alignment in OpenCL. The best explanation I’ve seen so far is by AndreasStahl, which I will briefly summarize in relation to the struct above.
When interpreting a struct, OpenCL accesses the memory in blocks of 16 bytes, which is the same as 4 floats (each 4 bytes). So in our example, if we did not have the padding, we would not be able to access our int, because OpenCL would have interpreted it as the 3rd float out of the first 16 bytes. This can get even more complicated if you have an array of structs, because then the size of your struct will need to be a multiple of 16, as explained in the linked forum post.
The original AMD posting:


Conclusion: There are indeed places where the compiler does not take care of alignment (bug?), so we must do it ourselves.
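
To make the padding idea from the write-up concrete, here is a minimal host-side sketch in C. The struct PaddedItem and all of its field names are hypothetical and only for illustration; they are not taken from kernel_types.h. It assumes 4-byte float and int, and it only checks offsets and sizes on the host compiler; it does not prove that a given OpenCL compiler lays the struct out the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, for illustration only (not from Cycles).
 * Assumes sizeof(float) == 4 and sizeof(int) == 4. */
typedef struct PaddedItem {
    float value[2];    /* 8 bytes of payload */
    float padding[2];  /* manual padding: fills up the first 16-byte block */
    int   id;          /* now starts at offset 16, the next 16-byte block */
    int   padding2[3]; /* pads the total size to a multiple of 16 for arrays */
} PaddedItem;

/* Compile-time checks (C11): the build fails if the layout drifts. */
_Static_assert(offsetof(PaddedItem, id) == 16,
               "id must start on a 16-byte boundary");
_Static_assert(sizeof(PaddedItem) % 16 == 0,
               "struct size must be a multiple of 16 bytes");

/* Runtime version of the same checks, callable from host setup code. */
void padded_item_check_layout(void)
{
    assert(offsetof(PaddedItem, id) == 16);
    assert(sizeof(PaddedItem) % 16 == 0);
}
```

If we mirror such checks for every struct that crosses the C++/OpenCL boundary, a layout mismatch shows up at build time instead of as a broken render.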


On 07.12.2011 at 16:38, Brecht Van Lommel wrote:

> Hi,
> Maybe it would be possible to set up an SSH login on a computer so I
> can try and narrow this down further? There has to be a pattern here,
> but I'm not sure what to suggest you test. What I would do is
> keep removing code until it works, and find out exactly which kind of
> struct variable layout or operation is causing issues.
> On Wed, Dec 7, 2011 at 7:11 AM, Dalai Felinto <dfelinto at gmail.com> wrote:
>> "Good news": I believe both problems may indeed be related and due to alignment.
>> At least the first one.
>> If I change the alignment in kernel_type.h in simple ways [*] I get the Z
>> problem fixed, but the viewport broken in situations it was working before.
>> If I go to camera fly mode the viewport is fine ('dirty' but fine) until I
>> stop then I get the (2) problem. So what are the rules for alignment here?
>> [*] http://www.pasteall.org/27124
> The alignment issue I was thinking of is when passing structs between
> the C++ and OpenCL. If they lay out variables in structs differently,
> copying KernelData to the GPU would go wrong. So what I've tried to do
> is align everything like we do in makesdna, and make all struct sizes
> a multiple of 16 bytes.
> What is quite strange, is that the struct where you added alignment
> does not get passed between C++ and OpenCL, it stays purely in OpenCL,
> and so I wouldn't expect this to be a problem. Maybe there's a
> compiler bug related to nested structs?
> From the specification:
> "The OpenCL compiler is responsible for aligning data items to the
> appropriate alignment as required by the data type. For arguments to a
> __kernel function declared to be a pointer to a data type, the OpenCL
> compiler can assume that the pointee is always appropriately aligned
> as required by the data type."
> So this means that in principle we shouldn't have to worry about
> alignment of structs that stay on the GPU, and only for KernelData we
> need to worry about alignment. But of course compiler bugs can make
> this an issue.
> "For 3-component vector data types, the size of the data type is 4 *
> sizeof(component). This means that a 3-component vector data type will
> be aligned to a 4 * sizeof(component) boundary."
> This seems to indicate that differential3 should in principle be
> aligned already. We can add padding in various structs, but according
> to the spec adding float2 to differential3 shouldn't actually improve
> alignment, so I'm not sure what the rule would be.
>> Regarding (2) viewport problem:
>> ----------------------------------------------
>> Does F12 rendering work (with resolutions that fail in the viewport) or is
>> it just viewport rendering? ...
>>>  http://www.pasteall.org/27116/diff
>> The dumped image is as bad as the viewport one. A note: the image is not
>> 'broken' right away. While the light calculation is still dirty, the image
>> is correct. Only after a few (set_tile() resolution > 8 or so) moments the
>> viewport breaks.
> Ok, so it's not an opengl issue.
>>> Another thing you could test is the workgroup_size in device_opencl.cpp in
>>> path_trace and tonemap.
>> workgroup_size is calculated as 16. If I change it to 1, 2, 4, 8, 16 it
>> produces the same result.
>> If I change it to 32, OpenCL throws an error (OpenCL error (-54):
>> Invalid work group size)
> Ok, so workgroup size is probably not the issue.
> Brecht.
> _______________________________________________
> Bf-cycles mailing list
> Bf-cycles at blender.org
> http://lists.blender.org/mailman/listinfo/bf-cycles
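
Regarding the 3-component vector rule Brecht quotes from the specification (size is 4 * sizeof(component)), a small host-side sketch in C may help when reasoning about which structs actually need padding. The types host_float3 and host_differential3, their field names, and the helper padding_needed() are assumptions made for illustration; they are not the Cycles definitions. The sketch checks sizes only and does not force 16-byte alignment of the start address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side mirror of an OpenCL float3: three components
 * plus one unused slot, so the size is 4 * sizeof(float). */
typedef struct host_float3 {
    float x, y, z;
    float unused; /* explicit fourth slot, never read */
} host_float3;

/* Hypothetical struct holding two such vectors, loosely modelled on the
 * differential3 mentioned in the thread; field names are assumptions. */
typedef struct host_differential3 {
    host_float3 dx;
    host_float3 dy;
} host_differential3;

_Static_assert(sizeof(host_float3) == 4 * sizeof(float),
               "3-component vector must occupy 4 components");
_Static_assert(sizeof(host_differential3) % 16 == 0,
               "size is already a multiple of 16, no extra padding needed");

/* Bytes of padding needed to round `size` up to a multiple of `alignment`.
 * `alignment` must be non-zero. */
size_t padding_needed(size_t size, size_t alignment)
{
    return (alignment - size % alignment) % alignment;
}
```

With these sizes, padding_needed(sizeof(host_differential3), 16) is 0, which matches Brecht's point that, per the spec, adding a float2 to such a struct should not improve its alignment.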


Jens Verwiebe
Allerskehre 44  -  22309 Hamburg

Tel.: +49 40 68 78 50
mobil: +49 172 400 49 07
mailto: info at jensverwiebe.de
web:  http://www.jensverwiebe.de
