Parallel API

Post by guibou » Tue May 22, 2012 10:20 am

I've just pushed a new task API to LuxRender for SPPM (commit 2d0b3ce1a6f5).

I think it can be used elsewhere in LuxRender, anywhere there is a for loop that is not parallelized (e.g. during warmup).
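For illustration, here is a minimal sketch of what parallelizing such a for loop can look like. It is not the API from the commit: `parallel_for` and `squareAll` are hypothetical names, and the sketch uses only `std::thread`, with one contiguous chunk of the range per thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not the API from the commit): runs body(i) for every
// i in [begin, end), splitting the range into contiguous chunks, one per thread.
inline void parallel_for(std::size_t begin, std::size_t end,
                         const std::function<void(std::size_t)> &body,
                         unsigned threadCount = std::thread::hardware_concurrency()) {
    assert(begin <= end);
    if (threadCount == 0) threadCount = 1;  // hardware_concurrency() may return 0
    const std::size_t total = end - begin;
    const std::size_t chunk = (total + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (std::size_t start = begin; start < end; start += chunk) {
        const std::size_t stop = std::min(end, start + chunk);
        workers.emplace_back([start, stop, &body] {
            for (std::size_t i = start; i < stop; ++i) body(i);
        });
    }
    // Joining is the only synchronization point: when parallel_for returns,
    // every iteration has completed.
    for (std::thread &w : workers) w.join();
}

// Example: a warmup-style loop where each iteration writes only its own slot.
inline std::vector<double> squareAll(const std::vector<double> &in) {
    std::vector<double> out(in.size());
    parallel_for(0, in.size(), [&](std::size_t i) { out[i] = in[i] * in[i]; });
    return out;
}
```

The sketch spawns fresh threads on every call. A real thread pool would keep its workers alive between calls.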

I saw two libraries able to handle this kind of task: OpenMP, which does not work on Clang, and Intel TBB, which works well on the compilers we support (VS, GCC, Clang). The issue with TBB is that it does not allow pausing tasks or changing the number of threads at runtime. Hence the home-made implementation.

But do we need those features?

I think that changing the number of threads in use is not necessary. There are three use cases:

- The user wants to stop or pause the rendering to completely free their system resources. This can be done through a system-level pause (SIGSTOP on Unix; an equivalent may exist on Windows). The only requirement is that the rendering threads live in a different process from the one running the GUI.
- The user wants to render, so full CPU power should be used. There is no reason to use fewer threads than the hardware provides. (In fact there may be some reasons, but users generally don't know them, whereas an API like TBB is able to adapt the number of threads to the system load.)
- The user wants to render while giving more priority to another task on their system. This is a common scheduling/priority issue handled by all the operating systems we target.
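The system-level pause in the first use case can be sketched with POSIX signals. This is only an illustration for Unix-like systems: `pauseProcess`, `resumeProcess`, and `spawnIdleWorker` are hypothetical names, and the idle child stands in for a separate render process.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Suspends every thread of process `pid`. Returns true on success.
inline bool pauseProcess(pid_t pid) {
    assert(pid > 0);  // pid <= 0 would address a process group, not one process
    return kill(pid, SIGSTOP) == 0;
}

// Resumes a process previously stopped with SIGSTOP.
inline bool resumeProcess(pid_t pid) {
    assert(pid > 0);
    return kill(pid, SIGCONT) == 0;
}

// Hypothetical stand-in for a render process: a child that just idles.
inline pid_t spawnIdleWorker() {
    const pid_t pid = fork();
    if (pid == 0) {
        for (;;) pause();  // child: sleep until a signal arrives
    }
    return pid;  // parent: the child's pid, or -1 if fork failed
}
```

SIGSTOP cannot be caught or ignored, so the renderer needs no cooperation code for this to work. The GUI has to run in another process, otherwise it would be frozen as well.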

If we agree on those points, the LuxRender code can be dramatically simplified by using an API such as TBB.

What's your opinion on this?
guibou
Developer
 
Posts: 271
Joined: Fri Dec 04, 2009 10:14 am

Re: Parallel API

Post by SATtva » Tue May 22, 2012 10:46 am

Sometimes I do use the thread number controls on my workstation when I need to run some other multithreaded task in parallel (compiling, for example), but I agree: the same result can be achieved with task prioritization using OS tools. However, there is one use case where thread controls or pause/resume are really needed: when you're rendering a high-resolution image, lens effects and tonemapping operations can be made faster by pausing the rendering or reducing the thread count by 1. Could this somehow be accounted for? Except for that [minor] issue I have no objections to API simplification.
Linux builds packager
聞くのは一時の恥、聞かぬのは一生の恥 (To ask is a moment's shame; not to ask is a lifetime's shame.)
SATtva
Developer
 
Posts: 5546
Joined: Tue Apr 07, 2009 12:19 pm
Location: from Siberia with love

Re: Parallel API

Post by cwichura » Tue May 22, 2012 10:46 pm

Since this is a thread pool to parallelize for() loops, how many places are there where this can come into play? Lux is already creating multiple render threads, so you wouldn't necessarily want to use additional sub-threads within the existing render threads, right? So it strikes me that its benefit will mostly be limited to startup, which isn't really a big win, unless you want to do a major rewrite of Lux's threading model and get rid of the current render thread paradigm. Granted, it can't hurt to add to startup. But I'm curious what other places would benefit from being rewritten to use this.

I will also add that I regularly make use of reducing thread counts in LuxRender. E.g., when I'm testing scene setups, I sometimes have two or three instances going each working on its own test, and reduce the thread counts on each so they don't thrash the cores. They are all of equal priority (which I always set to "low" from an OS perspective so I can keep working in other apps), so just changing their process priorities doesn't work in this case.
cwichura
 
Posts: 375
Joined: Sun Feb 12, 2012 11:31 pm

Re: Parallel API

Post by cwichura » Tue May 22, 2012 11:21 pm

More thoughts:

1) If you intend to convert the threading model in Lux entirely to one of these approaches, that strikes me as a lot of work and something that should probably wait until after 1.0 is released. I do see the merit in it, though. It allows doing parallelization at a much more granular level in the code, which should theoretically allow for better overall concurrency.

2) Does Intel's TBB share the thread pool across all processes using TBB, like Grand Central Dispatch does? Or does each process get its own thread pool? If the thread pool is shared across the entire system, then I am less worried about having user control over the number of threads allocated, since multiple Lux instances will automatically distribute their workload across the available system resources in a fair manner. But if the thread pool is allocated per-process, then you could easily over-subscribe the system by running multiple Lux instances.
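To put numbers on the per-process case: if each instance sizes its own pool to the full machine, three instances on a 16-thread box run 48 threads on 16 hardware threads. The sketch below computes that ratio and a per-instance cap. Both function names are hypothetical, and nothing here describes how TBB or GCD actually size their pools.

```cpp
#include <cassert>

// Software threads competing per hardware thread when every instance owns its
// own pool. A value above 1.0 means the machine is oversubscribed.
inline double oversubscription(unsigned instances, unsigned threadsPerInstance,
                               unsigned hardwareThreads) {
    assert(hardwareThreads > 0);
    return static_cast<double>(instances) * threadsPerInstance / hardwareThreads;
}

// Largest per-instance thread count that keeps the total within the hardware
// thread count (never less than 1).
inline unsigned fairShare(unsigned instances, unsigned hardwareThreads) {
    assert(instances > 0);
    const unsigned share = hardwareThreads / instances;
    return share > 0 ? share : 1;
}
```

With three instances on 16 hardware threads, `fairShare` gives 5 threads each, which is roughly the manual adjustment described in the previous post.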

Wikipedia indicates there are Unix and Windows ports of Grand Central Dispatch available (though I don't know if they're actually any good or not). So if evaluating different thread pool managers, it wouldn't hurt to evaluate GCD as well. (And I say this as someone running Lux on Windows and Linux, not Apple.)
cwichura
 
Posts: 375
Joined: Sun Feb 12, 2012 11:31 pm

Re: Parallel API

Post by jeanphi » Wed May 23, 2012 2:17 am

Hi,

A thread pool API is mostly useful if we completely move to a data parallel architecture. This is the case for hybrid rendering but most of Lux is mostly independent parallel sequences. That's a huge change and I don't know what can come out of it. I think a data parallel architecture might be more difficult to understand and thus maintain.

Jeanphi
jeanphi
Developer
 
Posts: 6624
Joined: Mon Jan 14, 2008 7:21 am

Re: Parallel API

Post by guibou » Wed May 23, 2012 8:00 am

jeanphi wrote:A thread pool API is mostly useful if we completely move to a data parallel architecture. This is the case for hybrid rendering but most of Lux is mostly independent parallel sequences. That's a huge change and I don't know what can come out of it. I think a data parallel architecture might be more difficult to understand and thus maintain.


I'm not planning on rewriting everything, just opening a discussion. I hit the limits of "independent parallel sequences" in SPPM because of the different passes and all the synchronization involved.

Usually a data parallel architecture is simpler. You can take a look at the new SPPM code: it is straightforward and looks like single-threaded code, except that the functions containing a for loop are called differently. But you have a point, in that I'm not sure how this approach can be applied to the current integrator code.

cwichura wrote:Since this is a thread pool to parallelize for() loops, how many places are there where this can come into play?


Unusual integrators with multiple passes such as SPPM, hybrid bidir, or a future merge between bidir and SPPM. The current model was not a good fit for SPPM because it leads to significant drops in CPU usage. In those cases the code is also simpler, because the synchronization points appear only in the API and do not clutter the code. (Please count the barrier->wait() calls in the previous SPPM code to get an idea. Every one of them was important, and any missing one could have led to weird behavior.)
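As an illustration of synchronization living in the API rather than in the pass code, here is a minimal two-pass sketch. It is not the SPPM code: `runPass` and `twoPasses` are hypothetical names, and the only synchronization is the join at the end of each pass.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for i in [0, n) on `threads` workers
// (strided assignment) and returns only once all of them have finished.
template <typename Body>
void runPass(std::size_t n, unsigned threads, Body body) {
    assert(threads > 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t)
        workers.emplace_back([=] {
            for (std::size_t i = t; i < n; i += threads) body(i);
        });
    for (std::thread &w : workers) w.join();  // implicit barrier between passes
}

// Two dependent passes: pass 2 reads neighbouring values written by pass 1.
inline std::vector<int> twoPasses(std::size_t n, unsigned threads) {
    std::vector<int> first(n), second(n);
    runPass(n, threads, [&](std::size_t i) { first[i] = static_cast<int>(i) * 2; });
    // Safe without an explicit barrier: pass 1 has fully completed here.
    runPass(n, threads, [&](std::size_t i) { second[i] = first[i] + first[(i + 1) % n]; });
    return second;
}
```

The pass bodies contain no wait() calls. Forgetting a barrier is not possible in this style because each pass cannot start before the previous call has returned.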

Here we also have some really huge scenes composed of very big models that take time to load. If the PLY files can be loaded in parallel, then on the "war machine" we have here, our users may get feedback about their scene in 10 seconds instead of two minutes.

We also use a lot of photon maps internally to replace lights (one day I should commit that to the main LuxRender repository). Each photon map can be composed of millions of photons, and each photon map must be scaled, projected back, and inserted into a data structure.

Some scenes here take 10 minutes to load, whereas on the dual-Xeon machines with 16 threads we have, it may take only 40 seconds if a parallel_for is used.
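Going from 10 minutes to 40 seconds is a 600 / 40 = 15x speedup, close to the ideal for 16 threads, which assumes the files load independently. A minimal sketch of loading independent files concurrently with `std::async` follows. `loadMesh` is a hypothetical stand-in that only reads the file size, not the LuxRender PLY loader.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a PLY loader: returns the file size in bytes,
// or 0 if the file cannot be opened.
inline std::size_t loadMesh(const std::string &path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return in ? static_cast<std::size_t>(in.tellg()) : 0;
}

// Starts one asynchronous load per file, then collects results in input order.
inline std::vector<std::size_t> loadAll(const std::vector<std::string> &paths) {
    std::vector<std::future<std::size_t>> pending;
    pending.reserve(paths.size());
    for (const std::string &p : paths)
        pending.push_back(std::async(std::launch::async, loadMesh, p));

    std::vector<std::size_t> sizes;
    sizes.reserve(pending.size());
    for (std::future<std::size_t> &f : pending) sizes.push_back(f.get());
    assert(sizes.size() == paths.size());
    return sizes;
}
```

This starts one thread per file, which is fine for a handful of large models. For hundreds of files a bounded pool or a parallel_for over the file list would be the safer choice.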

This is why I initially took a look at TBB and so on and decided to work on improving SPPM, but the main idea behind it is accelerating the loading process.

cwichura wrote:Lux is already creating multiple render threads, so you wouldn't necessarily want to use additional sub-threads within the existing render threads, right? So it strikes me that its benefit will mostly be limited to startup, which isn't really a big win, unless you want to do a major rewrite of Lux's threading model and get rid of the current render thread paradigm. Granted, it can't hurt to add to startup. But I'm curious what other places would benefit from being rewritten to use this.


My point was not to rewrite Lux NOW, but to ask a few questions about what's important for Lux. To be honest, if the answer from this discussion is "We don't care at all about changing the number of threads", I will just change my code in SPPM to use TBB or GCD instead. It will go quicker and be less simple to use.

My point was also to discuss which way to go for the future of the parallel API. Developers may be more interested in using a parallel_for in their code during warmup if one is available in the Lux API.

cwichura wrote:I will also add that I regularly make use of reducing thread counts in LuxRender. E.g., when I'm testing scene setups, I sometimes have two or three instances going each working on its own test, and reduce the thread counts on each so they don't thrash the cores. They are all of equal priority (which I always set to "low" from an OS perspective so I can keep working in other apps), so just changing their process priorities doesn't work in this case.

As far as I know, the main purpose of those APIs (such as TBB and GCD) is to be smart about thread usage. I saw that TBB does not launch new threads if I use a parallel_for inside a thread, and I have read in the documentation that TBB scales itself to the system load (but have not read how).
guibou
Developer
 
Posts: 271
Joined: Fri Dec 04, 2009 10:14 am

Re: Parallel API

Post by jensverwiebe » Wed May 23, 2012 8:48 am

For clarification: on OSX I have no problems with scheduling, so I don't see a need for changes to rendering in general here (I can't speak for other OSes), but for things like parsing, setting up photon maps and loading scene data, a speedup from more parallel tasks is certainly welcome.
So having a generalized parallel API is a good step.

I would also welcome using (or at least initially testing) GCD, since it is built into OSX and just needs an include and you're done (aside from coding the queues, of course ;) )
I can't speak for other OSes here either; I don't know how far libdcd is exposed cross-platform.

My experiences with OpenMP have been quite bad, especially on machines with lots of cores. It wouldn't be worth the effort IMHO (and it is not well supported by anything other than gcc > 4.4 afaik)

My 2 cents... Jens
jensverwiebe
Developer
 
Posts: 2130
Joined: Wed Apr 02, 2008 4:34 pm

Re: Parallel API

Post by guibou » Wed May 23, 2012 9:56 am

jensverwiebe wrote:For clarification: on OSX I have no problems with scheduling, so I don't see a need for changes to rendering in general here (I can't speak for other OSes), but for things like parsing, setting up photon maps and loading scene data, a speedup from more parallel tasks is certainly welcome.
So having a generalized parallel API is a good step.

I would also welcome using (or at least initially testing) GCD, since it is built into OSX and just needs an include and you're done (aside from coding the queues, of course ;) )
I can't speak for other OSes here either; I don't know how far libdcd is exposed cross-platform.

My experiences with OpenMP have been quite bad, especially on machines with lots of cores. It wouldn't be worth the effort IMHO (and it is not well supported by anything other than gcc > 4.4 afaik)

My 2 cents... Jens


Agreed on OpenMP. I should take a look at GCD. For scheduling issues, test a high-resolution scene with SPPM before my last commit, something with a plane everywhere and a really complex object in the corner. You may see that your threads just stop working for a significant part of the time. If only 10% of your threads are used during 15 seconds of every minute, it means that you lose a lot of computing power.
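Putting numbers on that example: with 90% of the threads idle for 15 seconds out of every 60, the lost share of total CPU time is 0.9 × 15 / 60 = 22.5%. A small sketch of the calculation follows. `idleFraction` is a hypothetical helper, and it assumes all threads are fully busy outside the stall.

```cpp
#include <cassert>

// Fraction of total CPU time left idle when, for `stallSeconds` out of every
// `periodSeconds`, only `activeFraction` of the threads are doing work.
// Assumes every thread is fully busy for the rest of the period.
inline double idleFraction(double activeFraction, double stallSeconds,
                           double periodSeconds) {
    assert(periodSeconds > 0.0 && stallSeconds <= periodSeconds);
    return (1.0 - activeFraction) * (stallSeconds / periodSeconds);
}
```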
guibou
Developer
 
Posts: 271
Joined: Fri Dec 04, 2009 10:14 am

Re: Parallel API

Post by jeanphi » Wed May 23, 2012 12:35 pm

Hi,

Let's try the parallel API, there are obvious use cases for it, and we could even try to use it for more stuff (like the imaging pipeline). Doesn't TBB allow thread control by tweaking the scheduler?

Jeanphi
jeanphi
Developer
 
Posts: 6624
Joined: Mon Jan 14, 2008 7:21 am

Re: Parallel API

Post by cwichura » Wed May 23, 2012 12:40 pm

jeanphi wrote:Doesn't TBB allow thread control by tweaking the scheduler?

I've been reading through the docs for TBB, and the only thing I can find is an optional parameter to override the number of worker threads it spins up, with a strong admonishment saying "you really shouldn't override this, though". I can find no documentation on how it "plays nice" with system loads. The only real answer will come from looking through their thread management source code, but I suspect they are just spinning up one thread for each execution unit in the box.
cwichura
 
Posts: 375
Joined: Sun Feb 12, 2012 11:31 pm
