[Bf-committers] VSE Strip-Wise Rendering

Leo Sutic leo.sutic at gmail.com
Wed Sep 29 09:59:51 CEST 2010


Hi Peter,

On 2010-09-28 20:39, Peter Schlaile wrote:
> Hi Leo,
> 
>> Looking at the code for the VSE it appears solid, but not very modular,
>> nor suitable for effects that need access to more than the current
>> frame. Since the tools I have fall into that category (the anti-shake,
>> for example, needs to compute the optical flow for each pair of frames),
>> it is currently near-impossible to port them over in a way that would
>> give a good user experience or remain modular enough to be maintainable.
> 
> The problem is: the idea behind the VSE is that it should try to do most /
> all things in realtime.
> 
> That doesn't alter the fact that we need optical flow, so my idea was:
> add an optical flow builder, similar to the proxy builder in 2.49, and link
> the generated optical flow files to the strips.
> 
> That makes it possible to:
> 
> a) use optical flow files generated by other software (like icarus
>     tracker)
> b) use optical flow information from scene files or even OpenEXR files
>     (I'd think the vector pass together with the Z-pass could be used for
>     that)
> c) let the optical flow information be calculated in the background,
>     when none is available and reuse it later for realtime display.
> 

I view optical flow much like an alpha channel: it is something that
goes with the image data. Certainly we could use flow data from external
sources, but more on that below.

>>    for each frame:
>>        for each strip:
>>            render
>>        composite
>>
>> gets turned into:
>>
>>    for each strip:
>>        for each frame:
>>            render
>>    composite
> 
> I don't really know how you want to do that in realtime. But maybe I got
> you wrong.

Reading your comments, yes, I think I must have failed to explain what I
was trying to say. Of course, we will only ever render those frames that
are absolutely necessary to produce the requested output frames. We will
also cache intermediate and final results.
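
To make "render only what is needed, and cache it" concrete, here is a
minimal sketch. FrameCache and render_frame are invented names for
illustration, not existing Blender/VSE code:

    class FrameCache:
        def __init__(self, render_frame):
            self._render = render_frame   # render_frame(strip, frame) -> image
            self._cache = {}

        def get(self, strip, frame):
            # A frame is rendered on the first request only; later requests
            # (a re-composite, a scrub back over the same range) hit the cache.
            key = (strip, frame)
            if key not in self._cache:
                self._cache[key] = self._render(strip, frame)
            return self._cache[key]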

> If you want to display one arbitrary frame in the middle of a Sequencer 
> Editing, what exactly does your code actually do?

Short answer: Ignoring caching for the sake of argument, we render that
frame and the frames it depends on. Nothing more.

Let me illustrate the difference. Suppose we have two strips, A and B,
both movie clips (a.avi and b.avi, for example), which are composited
to produce the final output.

Case 1: Render a single frame, say frame 10.

Here both methods work the same. We render:

 1. A 10
 2. B 10
 3. Final output frame from A 10 and B 10

Case 2: Render multiple frames, say frames 10-20.

This is where it gets different. Currently, we'd render:

 1. A 10
 2. B 10
 3. Final 10
 4. A 11
 5. B 11
 6. Final 11
 ...
 n-2. A 20
 n-1. B 20
 n. Final 20

I want to invert that so we render (see the sketch after this list):

 1. A 10
 2. A 11
 3. A 12
 ...
 10. A 20
 11. B 10
 12. B 11
 ...
 20. B 20
 21. Final 10
 22. Final 11
 ...
 30. Final 20
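
In code, the inverted ordering could look roughly like this. It is only a
sketch: render_frame and composite are placeholders for whatever the real
per-strip render and compositing steps end up being.

    def render_strip_wise(strips, frames, render_frame, composite):
        """Strip-wise ordering: all frames of each strip first, then composite.

        render_frame(strip, frame) -> image and composite(images, frame) ->
        image are supplied by the caller; only the ordering matters here.
        """
        rendered = {}
        for strip in strips:              # outer loop: strips
            for frame in frames:          # inner loop: frames
                rendered[(strip, frame)] = render_frame(strip, frame)
        # Composite each output frame from the per-strip results.
        return [composite([rendered[(s, f)] for s in strips], f)
                for f in frames]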

Now, suppose we have one thousand strips with one thousand frames each
to render. In this case, rendering one million frames and then producing
final output from them would be crazy. So I would also allow the system
to break up the processing into smaller chunks. Using the same example
as above, we could (see the sketch after this list):

 1. Render A10-A15
 2. Render B10-B15
 3. Render Final 10-Final 15
 4. Render A16-A20
 5. Render B16-B20
 6. Render Final 16-Final 20
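
That chunked schedule could be sketched as follows, under the same
assumptions as the sketch above (render_frame and composite are
placeholders):

    def render_in_chunks(strips, first, last, chunk_size,
                         render_frame, composite):
        """Process frames [first, last] chunk by chunk: render every strip
        for the chunk, composite the chunk, then move on."""
        output = []
        for start in range(first, last + 1, chunk_size):
            frames = list(range(start, min(start + chunk_size, last + 1)))
            rendered = {(s, f): render_frame(s, f)
                        for s in strips for f in frames}
            output += [composite([rendered[(s, f)] for s in strips], f)
                       for f in frames]
        return output

With chunk_size=6 this reproduces the A10-A15 / B10-B15 / Final 10-15
schedule above.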

Of course, we could use a chunk size of 1 and then we'd be back to the
way things work today, but larger chunk sizes allow us to amortize a
fixed per-chunk cost over several output frames. For example, when
computing optical flow, you need two input frames to produce OF for one
frame. That's twice the number of frames. But to produce OF for two
frames, you need just three input frames, not four. For ten OF frames,
you need eleven input frames. In general, to produce N output frames,
you need N+1 input frames. The cost of that extra frame is amortized
over the N output frames, so we want N, the chunk size, to be as big as
possible.
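
The N+1 relationship is simple enough to express as a tiny "what inputs do
I need" query. The function below is hypothetical, but it is exactly the
kind of question I want effects to be able to answer:

    def optical_flow_input_range(out_start, out_end):
        """Input frames needed to compute OF for outputs [out_start, out_end]:
        each OF frame needs its own frame plus the next one, so N outputs
        need N + 1 inputs, and the extra frame is amortized over the chunk."""
        return (out_start, out_end + 1)

    # optical_flow_input_range(10, 19) -> (10, 20): 10 outputs, 11 inputs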

I hope that explains what I want to do.

> My understanding of your idea is currently: I'd have to render everything 
> from the beginning and that sounds, uhm, sloooow? :)

Yeah, that would suck. But we don't have to do that any more than we do
now. That is, not at all.

So what is the real gain with the kind of processing that I propose?

Your render/bake pass solves one problem - it gives the plugin access to
more than the current frame. But it also introduces problems:

 1. Every time a parameter is changed, the bake/render must re-run.
 2. When it runs, it processes, as you say, the entire strip.
 3. So you:
    a. don't get any real-time processing.
    b. can't split the processing over several nodes in the render farm.

What I propose is basically that we allow the render/bake to process
arbitrary parts of the strip, and we provide a way for the r/b process
to specify the minimum it needs to process in order to produce the
requested chunk of baked data.
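
As a sketch of what that interface could look like (the class and method
names are made up, not an existing Blender interface):

    class ChunkedBakePlugin:
        """A render/bake step that can work on arbitrary sub-ranges."""

        def input_range(self, out_start, out_end):
            # Smallest input frame range needed to bake the output frames
            # [out_start, out_end]. The scheduler uses this to hand out
            # minimal chunks, to a render-farm node or a realtime preview.
            raise NotImplementedError

        def bake_chunk(self, input_frames, out_start, out_end):
            # Produce the baked data (images, fcurves, ...) for the
            # requested output frames from the supplied input frames.
            raise NotImplementedError

    class AntiShakeBake(ChunkedBakePlugin):
        def input_range(self, out_start, out_end):
            return (out_start, out_end + 1)   # one frame of OF look-ahead

        def bake_chunk(self, input_frames, out_start, out_end):
            pass  # compute OF per frame pair, derive translation/rotation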

Then we can:

 1. Split it over several nodes.

 2. Run it in real time, on parameter changes etc.

I would then further argue that with these modifications, the
render/bake process is nothing but a VSE plugin. Which brings me to my
motivation for wanting to do heavy surgery on the VSE.

My experience with development is that the only way to get any kind of
velocity is to architect the system as a spine of connecting code and
then plug in functionality as modules. Of course, this is a
philosophical ideal - in reality, products don't separate cleanly into
modules all the time. The dream of "everything being modular" ignores
some hard facts about software development.

However, media processing is an area where plug-in architectures have
been very successful. They have proven to be a scalable and very
efficient way to add features to the host application.

Yet Blender isn't very modular. In Blender, the speed control strip, for
example, doesn't do any speed control - that is done in the sequencer,
in a special-cased if statement inside the render function. I don't see
how this can work in the long run, in terms of growing the code and
feature set.

First: Today it is optical flow, and yes, we can solve it the way you
propose. But what will we face tomorrow? Is it something that we
absolutely want to do as part of Blender, or would we rather offload it
onto someone else and then just ship the plugin along with Blender?

Second: I think Blender's job is primarily to synchronize and provide a
runtime for video processing plugins locally or in a cluster. Not to
actually *do* effects. That is better handled via plugins. Blender
should provide an "optical flow" channel for images, much like it has an
alpha channel - this is where OF can be stored and where it can be read
from. But I don't think Blender should generate OF data from, for
example, videos. It can generate it as part of the 3D render, but that
is because it is a lot easier there.
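
To illustrate the "OF travels with the image, like alpha" idea, here is a
hypothetical frame container (not Blender's actual ImBuf structure):

    class Frame:
        def __init__(self, rgba, flow=None):
            self.rgba = rgba     # packed RGBA pixel data
            self.flow = flow     # per-pixel (dx, dy) optical flow, if any

        def has_flow(self):
            # Effects that need OF can test for it and either fall back or
            # request a background build when it is missing.
            return self.flow is not None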

Third: Even if we don't support dynamically loadable plugins, we need
clean internal interfaces that allow access to sequences of images, not
just single frames.

I realize that Blender is 15 years old, very complex, and that there is a
lot more to it than just waltzing in and doodling up a plugin system. But
I think it is necessary to try. Like I said, I will develop this in my
own branch, and unless I succeed, you won't hear about it. But I would
like to gauge the interest in such a modification, because if I
succeed, I do want my code merged back into the trunk. Forking, or
maintaining my own build of Blender, is out of the question.

> Regarding the tools you have written, do you think that adding a
> per-effect-strip render/bake would solve your problems? (It could be done
> in such a way that the bake function could request arbitrary frames from
> its input track.)

It would work, but I'd lose the real-time feedback and great UI I was
hoping for, which makes Blender a lot less attractive as a way of
sharing my code with the target audience (artists).

/LS

>> This way, we could do frame rate conversion naturally. We could do
>> speedup/slowdown, interpolation, anti-shake, and everything easily.
>> Effects that only require access to the current frame would still work
>> as a kernel inside a strip.
> 
> Since the common base here is optical flow, I'd think it is better to
> generate optical flow files and use them with the current design.
> 
> Anti-shake or motion tracking sound like tools that should run within a
> separate background rendering process. We could add something to the
> interface that enables an effect track to have a custom render/bake run.
> Like: please render/bake motion tracking data into fcurves (which would
> feed the entire strip into the effect bake function only once, and we'd
> use the fcurves later for the actual frame translation and rotation).
> 
> Since I have to rewrite proxy rendering for 2.5 anyway, we could add
> something like that, too. (The 2.49 proxy builder didn't run in the
> background and was more or less a hack.)
>
> Cheers,
> Peter
> 
> --
> Peter Schlaile