NoneScattering Volume Integrator

Post by Dade » Mon Apr 30, 2012 8:16 am

I use LuxBall5 as a regular test scene and, while working on hybrid rendering, I noticed that a huge amount of time was being spent inside the SingleScattering volume integrator. This was surprising because LuxBall5 has no scattering at all. After a bit of digging in the sources I discovered the 2 main causes:

1) if you have a "world" (i.e. empty space) volume defined, it leads to a huge amount of computation (as if there were a scattering medium) just to produce 0.0 (i.e. nothing, nada, niente);
2) expf() is widely used across the SingleScattering/MultiScattering volume integrators and it is extremely slow. This is a known problem of expf(); however, it is even more annoying to waste time computing expf(0.0), because the result is 1.0 no matter how you turn it.
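A minimal sketch of the expf(0.0) shortcut from point 2, assuming a homogeneous medium whose transmittance follows the Beer-Lambert law. The names `Transmittance`, `sigmaT` and `distance` are hypothetical and are not taken from the LuxRender sources:

```cpp
#include <cmath>

// Hypothetical helper, not LuxRender's actual code: transmittance along a
// segment of length `distance` through a homogeneous medium with extinction
// coefficient `sigmaT` (Beer-Lambert law: exp(-sigmaT * distance)).
inline float Transmittance(float sigmaT, float distance) {
    // Fast path: a clear "world" volume (sigmaT == 0) or an empty segment
    // has zero optical depth, and exp(-0) is exactly 1, so the exp() call
    // can be skipped. Testing sigmaT directly also avoids 0 * infinity
    // when the ray never hits anything.
    if (sigmaT == 0.0f || distance == 0.0f)
        return 1.0f;
    return std::exp(-sigmaT * distance);
}
```

The check costs one or two comparisons per segment, which is cheap next to a slow expf() call.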

You can partially avoid this problem by not defining a "world" volume; however, I think it is handy to have a NoneScattering volume integrator in order to get the best performance no matter how your scene is defined.

This is LuxBall5 rendered with Hybrid BiDir and SingleScattering:

luxball5-single.png


This is LuxBall5 rendered with Hybrid BiDir and NoneScattering:

luxball5-none.png


In this particular case, NoneScattering produces exactly the same output as SingleScattering but it is about 40% faster :shock:

Even non-hybrid rendering benefits from NoneScattering. This is normal BiDir with SingleScattering:

luxball5-normal-single.png


This is normal BiDir with NoneScattering:

luxball5-normal-none.png


Again, it is about 40% faster.

Of course, with NoneScattering you lose support for things like SSS and media scattering, but by using NoneScattering you are sure to achieve max. performance if you don't need those features. As explained, you will observe a large speedup with NoneScattering only if you were defining a "world" volume.
Dade
Developer
 
Posts: 4795
Joined: Sat Apr 19, 2008 6:04 pm
Location: Italy

Re: NoneScattering Volume Integrator

Post by Lord Crc » Mon Apr 30, 2012 8:34 am

Dade wrote: 2) expf() is widely used across the SingleScattering/MultiScattering volume integrators and it is extremely slow.


I read in the Mitsuba release notes that logf() and expf() were extremely slow on Linux x64. Mitsuba worked around it by using the double versions of those calls and casting the result. Perhaps we should do the same?
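The workaround described above could look roughly like this. The helper names `ExpViaDouble` and `LogViaDouble` are made up for illustration; this is not Mitsuba's actual code:

```cpp
#include <cmath>

// Sketch of the workaround: evaluate in double precision and narrow the
// result back to float. Helper names are hypothetical.
inline float ExpViaDouble(float x) {
    return static_cast<float>(std::exp(static_cast<double>(x)));
}

inline float LogViaDouble(float x) {
    return static_cast<float>(std::log(static_cast<double>(x)));
}
```

Whether this is actually faster depends on the C library in use, so it would need to be measured on each platform.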
May contain traces of nuts.
Lord Crc
Developer
 
Posts: 4450
Joined: Sat Nov 17, 2007 2:10 pm

Re: NoneScattering Volume Integrator

Post by J the Ninja » Mon Apr 30, 2012 10:58 am

How is NoneScattering different from the existing "emission" volume integrator? I was under the impression that was what "emission" was for, as a no-scattering integrator.
-Jason

Material DB Admin
J the Ninja
Developer
 
Posts: 2210
Joined: Wed May 19, 2010 9:54 pm
Location: Portland, USA

Re: NoneScattering Volume Integrator

Post by Dade » Mon Apr 30, 2012 1:37 pm

J the Ninja wrote: How is NoneScattering different from the existing "emission" volume integrator? I was under the impression that was what "emission" was for, as a no-scattering integrator.


"emission" suffers from the same 2 problems as the "single" volume integrator:

1) if you define a "world" (i.e. air) volume, "emission" will still be very slow (only about 10% faster than "single", while "none" is more than 40% faster, just try it);
2) it lacks the "expf(0.0)" optimization and uses the very slow expf() in general.

In my opinion, we have to optimize the above cases across all volume integrators (the performance is simply too bad in this very common case). The "none" integrator is just a temporary solution.

@LordCRC: Intel and AMD have published code for fast 4-way expf(), logf(), sinf(), etc. with SSE2 (perfect for SWCSpectrum::Exp() and other SWCSpectrum methods). It is the whole SWCSpectrum class that needs some attention.

Re: NoneScattering Volume Integrator

Post by Lord Crc » Mon Apr 30, 2012 2:42 pm

Dade wrote: @LordCRC: Intel and AMD have published code for fast 4-way expf(), logf(), sinf(), etc. with SSE2 (perfect for SWCSpectrum::Exp() and other SWCSpectrum methods). It is the whole SWCSpectrum class that needs some attention.


Even for unaligned data? Otherwise we'll have to ensure SWCSpectrum is 16-byte aligned.

Re: NoneScattering Volume Integrator

Post by Dade » Tue May 01, 2012 7:55 am

Lord Crc wrote:
Dade wrote: @LordCRC: Intel and AMD have published code for fast 4-way expf(), logf(), sinf(), etc. with SSE2 (perfect for SWCSpectrum::Exp() and other SWCSpectrum methods). It is the whole SWCSpectrum class that needs some attention.


Even for unaligned data? Otherwise we'll have to ensure SWCSpectrum is 16-byte aligned.


The code I was thinking of is available here: http://gruntthepeon.free.fr/ssemath/

The above functions take a __m128 argument, so the data is supposed to be aligned if read from memory; however, we can write a small piece of glue code to work with any alignment. BTW, I think AVX has introduced some gather/scatter instructions for reading data with any alignment.
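A sketch of what such glue might look like, assuming an x86 target with SSE. Plain SSE already provides unaligned loads and stores (`_mm_loadu_ps` / `_mm_storeu_ps`), so the glue only has to wrap the 4-wide function. `ApplyUnaligned4` is a hypothetical name, and `f` stands for any function taking and returning `__m128`, such as the library's exp_ps:

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_storeu_ps

// Hypothetical glue: applies a 4-wide function `f` (__m128 -> __m128) to
// four consecutive floats at `in` and writes the four results to `out`.
// Neither pointer needs to be 16-byte aligned.
template <typename VecFn>
inline void ApplyUnaligned4(const float* in, float* out, VecFn f) {
    const __m128 v = _mm_loadu_ps(in);  // unaligned load
    _mm_storeu_ps(out, f(v));           // unaligned store
}
```

Unaligned accesses can be slower than aligned ones on some CPUs, so this is something to benchmark against making the class 16-byte aligned.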

Re: NoneScattering Volume Integrator

Post by Lord Crc » Tue May 01, 2012 8:32 am

Dade wrote: The above functions take a __m128 argument, so the data is supposed to be aligned if read from memory; however, we can write a small piece of glue code to work with any alignment.


Would indeed be interesting to do some performance tests on that.

Re: NoneScattering Volume Integrator

Post by Dade » Tue May 01, 2012 1:16 pm

Lord Crc wrote:
Dade wrote: The above functions take a __m128 argument, so the data is supposed to be aligned if read from memory; however, we can write a small piece of glue code to work with any alignment.


Would indeed be interesting to do some performance tests on that.


Pencil and paper are faster than Linux expf() for sure :lol:

But is it a Linux-specific problem or does it also happen on Windows?
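One rough way to compare platforms is a small throughput probe like the one below. It is only a sketch, not a rigorous benchmark, and `MillionsOfCallsPerSecond` is a hypothetical helper name:

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Rough throughput probe: returns millions of calls per second for a
// scalar float -> float function.
template <typename Fn>
double MillionsOfCallsPerSecond(Fn f, std::size_t calls) {
    volatile float sink = 0.0f;  // keeps the compiler from dropping the loop
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < calls; ++i) {
        // Vary the argument so the call cannot be hoisted out of the loop.
        sink = f(static_cast<float>(i % 1024) * 0.01f);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return static_cast<double>(calls) / elapsed.count() / 1e6;
}
```

For example, `MillionsOfCallsPerSecond([](float x) { return std::exp(x); }, 10000000)` run on both systems with the same compiler flags would give comparable figures.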

Re: NoneScattering Volume Integrator

Post by Lord Crc » Tue May 01, 2012 4:34 pm

Dade wrote: But is it a Linux-specific problem or does it also happen on Windows?


Here are some numbers using the sse_mathfun_test program from above.

Windows (i7 2700k @ 3.8GHz):
x86:
sinf .. ->   12.9 millions of vector evaluations/second
expf .. ->   17.8 millions of vector evaluations/second
sin_ps .. ->   48.5 millions of vector evaluations/second
exp_ps .. ->   38.4 millions of vector evaluations/second

x64:
sinf .. ->   30.6 millions of vector evaluations/second
expf .. ->   35.1 millions of vector evaluations/second
sin_ps .. ->   47.9 millions of vector evaluations/second
exp_ps .. ->   38.0 millions of vector evaluations/second


Linux (i7 860 @ 2.9GHz)
x64:
sinf .. ->   10.9 millions of vector evaluations/second
expf .. ->    1.0 millions of vector evaluations/second
sin_ps .. ->   37.2 millions of vector evaluations/second
exp_ps .. ->   30.7 millions of vector evaluations/second


Notice the lackluster expf performance on Linux!

In any case, it seems there's a possibly nice improvement even on Windows. I think it's worthwhile to include this (and the glue) and see how it pans out. I think it's only worth using the SSE2 variant (which was slightly faster on my end), controlled by a define so that SSE1 builds can fall back to plain expf etc.
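A sketch of how the define could gate the two paths. `LUX_USE_SSE_MATHFUN` and `Exp4` are hypothetical names. The sketch also assumes the linked library ships a header named sse_mathfun.h that selects its SSE2 path via `USE_SSE2` and provides exp_ps(), which should be checked against the actual download:

```cpp
#include <cmath>

// LUX_USE_SSE_MATHFUN is a hypothetical build flag. When it is defined,
// the build is assumed to provide sse_mathfun.h from the linked library.
#ifdef LUX_USE_SSE_MATHFUN
#define USE_SSE2  // assumption: selects the header's SSE2 code path
#include "sse_mathfun.h"
#endif

// Computes out[i] = exp(in[i]) for four floats; no alignment requirement.
inline void Exp4(const float in[4], float out[4]) {
#ifdef LUX_USE_SSE_MATHFUN
    // Vector path: one 4-wide evaluation with unaligned load and store.
    _mm_storeu_ps(out, exp_ps(_mm_loadu_ps(in)));
#else
    // Fallback for builds without SSE2: plain scalar exp per component.
    for (int i = 0; i < 4; ++i)
        out[i] = std::exp(in[i]);
#endif
}
```

With the flag left undefined, the code builds everywhere and behaves like the current scalar version, so the vector path can be enabled per platform once it has been benchmarked.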

Re: NoneScattering Volume Integrator

Post by cwichura » Tue May 01, 2012 7:00 pm

Dade wrote: In my opinion, we have to optimize the above cases across all volume integrators (the performance is simply too bad in this very common case). The "none" integrator is just a temporary solution.


If the use case for the none integrator is for when you only have a single world volume, why not recognize this situation after parsing the scene files and automatically substitute in the none logic, rather than requiring the creator of the scene file to know about it? Especially if it's temporary and may go away in the future. Adding it as a callable syntax now has a hefty support penalty to pay, in that if people start targeting it specifically, you will always have to provide legacy support for it going forward, even if the things that are making stuff slow (like the expf() call) are eventually improved upon.
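The automatic substitution suggested here could be sketched as follows. `VolumeDesc`, `SceneDesc` and `ChooseVolumeIntegrator` are hypothetical, simplified stand-ins, not LuxRender types, and the sketch assumes the parser can tell whether a volume is completely clear:

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified scene description; these are not LuxRender types.
struct VolumeDesc {
    bool isClear;  // true if the medium neither absorbs, scatters nor emits
};

struct SceneDesc {
    std::string requestedIntegrator;  // e.g. "single" or "emission"
    std::vector<VolumeDesc> volumes;  // every volume defined in the scene
};

// After parsing, keep the requested integrator only if some volume can
// actually affect the light; otherwise substitute "none" automatically.
inline std::string ChooseVolumeIntegrator(const SceneDesc& scene) {
    for (const VolumeDesc& v : scene.volumes) {
        if (!v.isClear)
            return scene.requestedIntegrator;
    }
    return "none";
}
```

With this approach the scene file never has to name the "none" integrator, so it could be dropped later without breaking existing scenes.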

If I was writing an exporter (which I'm not... heh) and going to specifically target the none integrator, would I be better off simply omitting the world volume definition instead and skipping none altogether?
cwichura
 
Posts: 351
Joined: Sun Feb 12, 2012 11:31 pm
