NoneScattering Volume Integrator

I use LuxBall5 as a regular test scene and, while working on hybrid rendering, I noticed that a huge amount of time was spent inside the SingleScattering volume integrator. This was surprising because LuxBall5 has no scattering at all. After a bit of digging in the sources I discovered the two main causes:

1) if you have a "world" (i.e. empty space) volume defined, it leads to a huge amount of computation (as if there were a scattering medium) just to produce 0.0 (i.e. nothing, nada, niente);
2) expf() is widely used across the SingleScattering/MultiScattering volume integrators and it is extremely slow. This is a known problem of expf(); however, it is even more annoying to waste time computing expf(0.0), because the result is 1.0 no matter how you turn it.

You can partially avoid this problem by not defining a "world" volume; however, I think it is handy to have a NoneScattering volume integrator in order to get the best performance no matter how your scene is defined.
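The expf(0.0) case from point 2 can be short-circuited with a simple guard. This is only a minimal sketch, assuming a scalar optical depth; `Transmittance` is a hypothetical helper name for illustration, not the actual LuxRender code:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not LuxRender's API): transmittance along a ray
// segment for a given optical depth tau = sigma_t * distance.
inline float Transmittance(float tau) {
    assert(tau >= 0.f);      // optical depth is never negative
    if (tau == 0.f)          // empty "world" volume: exp(-0) is exactly 1
        return 1.f;          // skip the expensive exponential entirely
    return std::exp(-tau);   // float overload of exp, same role as expf()
}
```

The branch costs one comparison per call, which should be negligible next to a slow expf(); whether it pays off in a real integrator would have to be measured.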

This is LuxBall5 rendered with Hybrid BiDir and SingleScattering:

This is LuxBall5 rendered with Hybrid BiDir and NoneScattering:

In this particular case, NoneScattering produces exactly the same output as SingleScattering, but it is about 40% faster.

Even non-hybrid rendering benefits from NoneScattering. This is normal BiDir with SingleScattering:

This is normal BiDir with NoneScattering:

Again, it is about 40% faster.

Of course, with NoneScattering you lose support for things like SSS and media scattering, but if you don't need those features, NoneScattering guarantees maximum performance. As explained, you will observe a large speedup with NoneScattering only if you were defining a "world" volume.

Posts: 4854
Joined: Sat Apr 19, 2008 6:04 pm
Location: Italy

Re: NoneScattering Volume Integrator

Dade wrote: 2) expf() is widely used across the SingleScattering/MultiScattering volume integrators and it is extremely slow.

I read in the Mitsuba release notes that logf() and expf() were extremely slow on Linux x64. Mitsuba worked around it by using the double versions of those calls and casting the result. Perhaps we should do the same?
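A minimal sketch of that kind of workaround, assuming nothing about Mitsuba's actual code; `ExpViaDouble` and `LogViaDouble` are hypothetical wrapper names:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical wrappers (not Mitsuba's or LuxRender's code): evaluate the
// double-precision routine, then narrow the result back to float.
inline float ExpViaDouble(float x) {
    return static_cast<float>(std::exp(static_cast<double>(x)));
}

inline float LogViaDouble(float x) {
    assert(x > 0.f);  // this sketch only handles the strictly positive domain
    return static_cast<float>(std::log(static_cast<double>(x)));
}
```

Whether this is actually faster depends on the libm in use, so it would need to be benchmarked on each platform.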
May contain traces of nuts.

Lord Crc

Posts: 4518
Joined: Sat Nov 17, 2007 2:10 pm

Re: NoneScattering Volume Integrator

How is NoneScattering different from the existing "emission" volume integrator? I was under the impression that was what "emission" was for, as a no-scattering integrator.
-Jason

J the Ninja

Posts: 2250
Joined: Wed May 19, 2010 9:54 pm
Location: Portland, USA

Re: NoneScattering Volume Integrator

J the Ninja wrote:How is NoneScattering different from the existing "emission" volume integrator? I was under the impression that was what "emission" was for, as a no-scattering integrator.

"emission" suffers from the same two problems as the "single" volume integrator:

1) if you define a "world" (i.e. air) volume, "emission" will still be very slow (only about 10% faster than "single", while "none" is more than 40% faster; just try it);
2) it is missing the "expf(0.0)" optimization and uses the very slow expf() in general.

In my opinion, we have to optimize the above cases across all volume integrators (the performance is simply too bad in this very common case). The "none" integrator is just a temporary solution.

@LordCRC: Intel and AMD have published code for 4-way fast expf(), logf(), sinf(), etc. with SSE2 (perfect for SWCSpectrum::Exp() and the other SWCSpectrum methods). The whole SWCSpectrum class needs some attention.

Posts: 4854
Joined: Sat Apr 19, 2008 6:04 pm
Location: Italy

Re: NoneScattering Volume Integrator

Dade wrote: @LordCRC: Intel and AMD have published code for 4-way fast expf(), logf(), sinf(), etc. with SSE2 (perfect for SWCSpectrum::Exp() and the other SWCSpectrum methods). The whole SWCSpectrum class needs some attention.

Even for unaligned data? Otherwise we'll have to ensure SWCSpectrum is 16-byte aligned.
May contain traces of nuts.

Lord Crc

Posts: 4518
Joined: Sat Nov 17, 2007 2:10 pm

Re: NoneScattering Volume Integrator

Lord Crc wrote:
Dade wrote: @LordCRC: Intel and AMD have published code for 4-way fast expf(), logf(), sinf(), etc. with SSE2 (perfect for SWCSpectrum::Exp() and the other SWCSpectrum methods). The whole SWCSpectrum class needs some attention.

Even for unaligned data? Otherwise we'll have to ensure SWCSpectrum is 16 byte aligned.

The code I was thinking of is available here: http://gruntthepeon.free.fr/ssemath/

The above functions take a __m128 argument, so the data is supposed to be aligned if read from memory; however, we can write a small piece of glue code to work with any alignment. BTW, I think AVX has introduced some gather/scatter instructions for reading data with any alignment.
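A rough sketch of what such glue could look like, using the unaligned load/store intrinsics. `ExpArray` and `ExpPsStandIn` are hypothetical names; the stand-in only marks where a real 4-wide routine such as exp_ps() from the library above would be called, and the `__SSE2__` check assumes a GCC/Clang-style compiler:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__SSE2__)   // GCC/Clang macro; other compilers need their own check
#include <emmintrin.h>

// Stand-in for a real 4-wide routine such as exp_ps(); it goes through
// scalar code only so that this sketch stays self-contained.
inline __m128 ExpPsStandIn(__m128 v) {
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, v);                  // aligned store to a local buffer
    for (float &l : lanes) l = std::exp(l);
    return _mm_load_ps(lanes);               // aligned load from the same buffer
}
#endif

// Hypothetical glue: in-place exp over n floats, no alignment requirement.
inline void ExpArray(float *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        const __m128 v = _mm_loadu_ps(data + i);     // unaligned load
        _mm_storeu_ps(data + i, ExpPsStandIn(v));    // unaligned store
    }
#endif
    for (; i < n; ++i)                 // scalar tail, or full non-SSE fallback
        data[i] = std::exp(data[i]);
}
```

The unaligned loads may cost something compared with aligned ones, so aligning SWCSpectrum to 16 bytes could still be the better option if it turns out to be practical.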

Posts: 4854
Joined: Sat Apr 19, 2008 6:04 pm
Location: Italy

Re: NoneScattering Volume Integrator

Dade wrote: The above functions take a __m128 argument, so the data is supposed to be aligned if read from memory; however, we can write a small piece of glue code to work with any alignment.

Would indeed be interesting to do some performance tests on that.
May contain traces of nuts.

Lord Crc

Posts: 4518
Joined: Sat Nov 17, 2007 2:10 pm

Re: NoneScattering Volume Integrator

Lord Crc wrote:
Dade wrote: The above functions take a __m128 argument, so the data is supposed to be aligned if read from memory; however, we can write a small piece of glue code to work with any alignment.

Would indeed be interesting to do some performance tests on that.

Pencil and paper are faster than Linux expf(), for sure.

But is it a Linux-specific problem, or does it also happen on Windows?

Posts: 4854
Joined: Sat Apr 19, 2008 6:04 pm
Location: Italy

Re: NoneScattering Volume Integrator

Dade wrote: But is it a Linux-specific problem, or does it also happen on Windows?

Here are some numbers using the sse_mathfun_test program from above.

Windows (i7 2700k @ 3.8GHz):
Code:
x86:
sinf .. ->   12.9 millions of vector evaluations/second
expf .. ->   17.8 millions of vector evaluations/second
sin_ps .. ->   48.5 millions of vector evaluations/second
exp_ps .. ->   38.4 millions of vector evaluations/second
x64:
sinf .. ->   30.6 millions of vector evaluations/second
expf .. ->   35.1 millions of vector evaluations/second
sin_ps .. ->   47.9 millions of vector evaluations/second
exp_ps .. ->   38.0 millions of vector evaluations/second

Linux (i7 860 @ 2.9GHz):
Code:
x64:
sinf .. ->   10.9 millions of vector evaluations/second
expf .. ->    1.0 millions of vector evaluations/second
sin_ps .. ->   37.2 millions of vector evaluations/second
exp_ps .. ->   30.7 millions of vector evaluations/second

Notice the lackluster expf performance on Linux!
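For anyone who wants to reproduce this kind of number without the test program, a throughput measurement can be sketched roughly as follows. This is not the sse_mathfun_test source; `BenchScalarExp` is a hypothetical name, and counting four scalar calls as one "vector evaluation" is an assumption about how the figures above are defined:

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical micro-benchmark: returns millions of 4-float "vector
// evaluations" per second for the scalar exponential.
inline double BenchScalarExp(int iterations) {
    assert(iterations > 0);
    volatile float sink = 0.f;  // volatile store keeps the loop from being removed
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        // Vary the input so the compiler cannot hoist the computation.
        const float base = 1e-3f * static_cast<float>(i & 1023);
        float sum = 0.f;
        for (int lane = 0; lane < 4; ++lane)   // one "vector" = 4 floats
            sum += std::exp(base + 0.25f * static_cast<float>(lane));
        sink = sum;
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return iterations / elapsed.count() / 1e6;
}
```

Results from a loop like this depend heavily on compiler flags and the libm version, so they are only meaningful when compared on the same machine and build.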

In any case, it seems there is a potentially nice improvement even on Windows. I think it's worthwhile to include this (and the glue) and see how it pans out. I think it's only worth using the SSE2 version (which was slightly faster on my end), controlled by a define so that SSE1 builds can fall back to plain expf() etc.
May contain traces of nuts.

Lord Crc

Posts: 4518
Joined: Sat Nov 17, 2007 2:10 pm

Re: NoneScattering Volume Integrator

Dade wrote: In my opinion, we have to optimize the above cases across all volume integrators (the performance is simply too bad in this very common case). The "none" integrator is just a temporary solution.

If the use case for the none integrator is for when you only have a single world volume, why not recognize this situation after parsing the scene files and automatically substitute in the none logic, rather than requiring the creator of the scene file to know about it? Especially if it's temporary and may go away in the future. Adding it as a callable syntax now has a hefty support penalty to pay, in that if people start targeting it specifically, you will always have to provide legacy support for it going forward, even if the things that are making stuff slow (like the expf() call) are eventually improved upon.
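To make the idea concrete, here is a rough sketch of such a post-parse check. Everything in it is hypothetical: `VolumeInfo`, `IsInert` and `ChooseVolumeIntegrator` are made-up names, and treating a volume as "empty" when absorption, scattering and emission are all zero is my assumption, not something taken from the LuxRender sources:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-volume summary collected while parsing the scene.
struct VolumeInfo {
    float sigmaA;    // absorption coefficient
    float sigmaS;    // scattering coefficient
    float emission;  // emitted radiance
};

// A volume that neither absorbs, scatters nor emits contributes nothing.
inline bool IsInert(const VolumeInfo &v) {
    return v.sigmaA == 0.f && v.sigmaS == 0.f && v.emission == 0.f;
}

// Keep the requested integrator unless every volume in the scene is inert,
// in which case the cheap "none" logic is substituted automatically.
inline std::string ChooseVolumeIntegrator(const std::string &requested,
                                          const std::vector<VolumeInfo> &volumes) {
    assert(!requested.empty());
    for (const VolumeInfo &v : volumes)
        if (!IsInert(v))
            return requested;
    return "none";
}
```

With something like this, the scene file syntax would not need a new integrator name at all, which avoids the legacy support issue.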

If I were writing an exporter (which I'm not... heh) and were going to specifically target the none integrator, would I be better off simply omitting the world volume definition instead and skipping none altogether?
cwichura

Posts: 375
Joined: Sun Feb 12, 2012 11:31 pm
