[Bf-committers] texture painting blend mode patch

Xavier Thomas xavier.thomas.1980 at gmail.com
Thu Nov 1 16:20:56 CET 2012


>
>
> > For the float version of the blend function, SSE2 will certainly provide a
> > 20 to 40% gain, and (depending on the compiler and the images used) you
> > might want to avoid code like this:
> >
> > if(test)
> >     temp=big float calculation;
> > else
> >     temp=other big float calc;
> >
> > which is not pipeline friendly, and prefer:
> >
> > temp1 = big float calculation;
> > temp2 = other big float calc;
> > temp = test ? temp1 : temp2;
>
> I'm sorry if this is off-topic for the mailing list, but I'm really
> curious as to how the second method can be faster, as it seems really
> unintuitive. Is this something to keep in mind for specific kinds of
> operations/data types or just a good rule of thumb in general?
>

Floating-point calculations take many processor cycles. To compensate,
modern processors execute instructions "in a pipeline", starting a new
instruction every CPU cycle before the previous one has finished. This does
not work well with conditional branches, especially branches that cannot be
predicted reliably: a mispredicted branch triggers a pipeline flush, which is
very slow.

It is absolutely not a good rule of thumb: it depends greatly on your
compiler, compiler options, processor, and input data (images). All modern
processors use pipelining, but sometimes the big float calculation is more
expensive than a pipeline flush anyway, or the same branch is almost always
taken, so branch prediction does its job well. The first thing to check is
whether the compiler (in release/optimized builds) optimizes this line:

temp = test ? temp1 : temp2;

into some arithmetic trick (or a conditional-move instruction) instead of a
conditional jump. A little assembly knowledge is required to check this, but
you might also use a macro to force your own trick.

I think the trick would look like this:

test_bool = (test != 0);
temp = test_bool * temp1 + (1 - test_bool) * temp2;
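A compilable sketch of that trick follows, with the boolean taken so that a non-zero test selects temp1, matching test ? temp1 : temp2. The helper names are hypothetical. The multiply version has a caveat: if the unselected value is infinite or NaN, 0 * inf gives NaN and poisons the result. The bitmask variant avoids that by selecting raw bits (assuming 32-bit IEEE float), which is closer to what SIMD code does.

```c
#include <stdint.h>
#include <string.h>

/* Arithmetic select: temp1 when test != 0, otherwise temp2. */
static float select_arith(int test, float temp1, float temp2)
{
    float t = (float)(test != 0); /* 1.0f if true, 0.0f if false */
    return t * temp1 + (1.0f - t) * temp2;
}

/* Bitmask select: no float multiply, so inf/NaN in the unused value is harmless. */
static float select_mask(int test, float temp1, float temp2)
{
    uint32_t a, b, r;
    uint32_t mask = (uint32_t)0 - (uint32_t)(test != 0); /* all ones if true */
    float out;

    memcpy(&a, &temp1, sizeof a); /* type-pun safely via memcpy */
    memcpy(&b, &temp2, sizeof b);
    r = (a & mask) | (b & ~mask);
    memcpy(&out, &r, sizeof out);
    return out;
}
```

Whether either form beats a plain ternary still has to be measured; a compiler may already emit a branch-free sequence for the ternary on its own.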


This article explains it in quite some detail, except that it is for PPC,
which uses the fsel instruction instead of an arithmetic trick:
http://www.altdevblogaday.com/2011/11/10/optimisation_lessons/
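On x86 with SSE there is no direct fsel, but the usual equivalent is a compare that produces an all-ones or all-zeros mask per lane, followed by and/andnot/or (SSE4.1 later added blendv for the same job). The sketch below applies the overlay-style condition to four floats at once; the function select4 is hypothetical and falls back to a scalar loop when SSE2 is not available.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* out[i] = (a[i] < 0.5f) ? t1[i] : t2[i] for i = 0..3, without branches on SSE2. */
static void select4(const float *a, const float *t1, const float *t2, float *out)
{
#if defined(__SSE2__)
    __m128 va   = _mm_loadu_ps(a);
    __m128 mask = _mm_cmplt_ps(va, _mm_set1_ps(0.5f)); /* all-ones lanes where a < 0.5 */
    __m128 v1   = _mm_loadu_ps(t1);
    __m128 v2   = _mm_loadu_ps(t2);
    /* (mask & v1) | (~mask & v2); _mm_andnot_ps(m, x) computes ~m & x */
    __m128 r    = _mm_or_ps(_mm_and_ps(mask, v1), _mm_andnot_ps(mask, v2));
    _mm_storeu_ps(out, r);
#else
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (a[i] < 0.5f) ? t1[i] : t2[i];
#endif
}
```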

You are right that it is off-topic, sorry about that. I responded anyway
because I think it is something that might interest people on this list.


Xavier

