LuxRender v0.8
LuxRender v0.8 introduced both the new concept of a Renderer and the first implementation of a hybrid (i.e. CPU+GPU) Renderer. It uses a RayBuffer of 8192 rays for each rendering thread, plus 8192 SurfaceIntegratorState objects to fill that RayBuffer with rays to send to the GPU. For technical reasons, the memory required to store 8192 SurfaceIntegratorState objects for each rendering thread is really huge.
Reducing the size of SurfaceIntegratorState class
The first and most obvious route I took was to reduce the memory footprint of the SurfaceIntegratorState class. With some effort, I reduced its size from 616 to 552 bytes. I soon realized this was a pretty pointless route: there is a vast web of dependencies leading from SurfaceIntegratorState to Sample, BSDF, Sampler, SamplerData and more. To get a noticeable improvement, I would have to reduce the size of tons of classes.
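To put those two sizes in perspective, here is a back-of-the-envelope calculation. It is only a sketch: it counts the bytes of the SurfaceIntegratorState objects themselves (the 616 and 552 figures above), not the Sample, BSDF or Sampler data they depend on, and the thread count of 8 is an assumed example value.

```python
# Rough estimate of what shrinking SurfaceIntegratorState buys.
# Only the per-object sizes (616 -> 552 bytes) and the 8192 states per
# thread come from the text; THREADS is an assumed example value.
STATES_PER_THREAD = 8192
OLD_SIZE = 616  # bytes, before the clean-up
NEW_SIZE = 552  # bytes, after the clean-up
THREADS = 8     # assumption: an 8-thread machine


def state_bytes(size_per_state, states=STATES_PER_THREAD, threads=THREADS):
    """Bytes used by the state objects alone (dependencies not counted)."""
    return size_per_state * states * threads


before = state_bytes(OLD_SIZE)
after = state_bytes(NEW_SIZE)
saved_fraction = (before - after) / before

print(f"before: {before / 2**20:.1f} MiB")
print(f"after:  {after / 2**20:.1f} MiB")
print(f"saved:  {saved_fraction:.1%}")
```

Whatever the thread count, the saving on the objects themselves is 64 bytes out of 616, i.e. roughly 10%, which is why this route did not pay off on its own.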
Reducing the number of SurfaceIntegratorState used
So I took another route, something I was already using in SLG: I start with just a very small number of SurfaceIntegratorState (I start with 2 in SLG) and increase that number only if they are unable to fill the RayBuffer. This works because a single SurfaceIntegratorState can produce multiple rays (e.g. one shadow ray and one ray to estimate the next path vertex), so you don't really need 8192 SurfaceIntegratorState to produce 8192 rays.
This was a successful route: the code raises the number of SurfaceIntegratorState up to about 4500 and then stops, because that is enough to fill the RayBuffer all the time. This effectively cut the amount of RAM used roughly in half.
If you run LuxRender with the "-V" option, DEBUG messages report each dynamic resize of the set of SurfaceIntegratorState:
[Lux 2011-Jun-17 22:40:50 DEBUG : 0] New allocated IntegratorStates: 1024 => 1152 [RayBuffer size = 16384]
[Lux 2011-Jun-17 22:40:50 DEBUG : 0] New allocated IntegratorStates: 1024 => 2176 [RayBuffer size = 16384]
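The grow-on-demand idea described in this section can be sketched as follows. This is a toy model, not the LuxRender code: `rays_from_state`, the 0 to 2 rays per state, and the growth rule (add half the shortfall) are all hypothetical choices made for illustration.

```python
import random


def rays_from_state(rng):
    """Hypothetical stand-in: one state yields 0, 1 or 2 rays per pass
    (e.g. a shadow ray plus a ray towards the next path vertex)."""
    return rng.choice((0, 1, 2, 2))


def grow_until_buffer_full(ray_buffer_size=8192, initial_states=2,
                           passes_per_check=4, seed=1):
    """Start with few states and add more only while the buffer is underfilled."""
    rng = random.Random(seed)
    states = initial_states
    while True:
        # Smallest number of rays produced in any of the sampled passes.
        worst = min(
            sum(rays_from_state(rng) for _ in range(states))
            for _ in range(passes_per_check)
        )
        if worst >= ray_buffer_size:
            return states  # enough states to fill the buffer every time
        # Assumed growth rule: add half the shortfall, at least one state.
        shortfall = ray_buffer_size - worst
        states += max(1, shortfall // 2)


if __name__ == "__main__":
    print(grow_until_buffer_full(ray_buffer_size=1024))
```

Because each state can emit more than one ray, the loop settles on a state count below the buffer size, which is the whole point of the approach.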
Further reducing the number of SurfaceIntegratorState used
Given the success of this route, I further improved the code by adding support for tracing multiple shadow rays and for the ALL_UNIFORM (and AUTO) light strategies. For instance, by using the ALL_UNIFORM light strategy and 4 shadow rays (instead of the default 1), I can reduce the number of SurfaceIntegratorState required from ~4500 to ~1150. Please note, however, that the memory used by a single SurfaceIntegratorState increases slightly in order to store the multiple shadow rays.
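A rough lower bound on the number of states shows why more shadow rays help. The sketch below assumes every state emits its maximum number of rays on every pass and that ALL_UNIFORM traces the shadow rays once per sampled light; `lights_sampled` is a hypothetical parameter, since the number of lights in the test scene is not stated here.

```python
import math


def min_states_needed(ray_buffer_size, shadow_rays, lights_sampled=1, path_rays=1):
    """Lower bound on states needed if every state emitted its maximum rays."""
    rays_per_state = shadow_rays * lights_sampled + path_rays
    return math.ceil(ray_buffer_size / rays_per_state)


# One shadow ray plus one path ray per state.
print(min_states_needed(8192, shadow_rays=1))
# Four shadow rays towards a single light.
print(min_states_needed(8192, shadow_rays=4))
# Four shadow rays towards each of two lights (assumed light count).
print(min_states_needed(8192, shadow_rays=4, lights_sampled=2))
```

The first bound is 4096, a little under the ~4500 observed, which makes sense because not every state produces two rays on every pass. The observed ~1150 falls between the one-light and two-light bounds, so it is at least plausible under these assumptions.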
Using the saved memory to improve performance
Having saved a noticeable amount of RAM, I started adding options that improve performance by using more memory. The first one is "integer raybuffersize", which sets the size of the RayBuffer to a number larger than 8192. This helps to reduce the overhead of transmitting many small buffers over the PCIe bus, etc. I use 64k rays in SLG, but that number doesn't fit the LuxRender architecture; 8k/16k work pretty well on my hardware, but your mileage may vary.
The second step was to add the "integer statebuffercount" option: it allows the use of multiple sets of SurfaceIntegratorState to overlap the CPU and GPU work. This option is quite useful; a value of 2 helps to improve performance.
Let's see the combined result of all the changes. I used a matte LuxBall5 with the following options as a benchmark:
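The overlap that "statebuffercount" enables can be pictured as a small producer/consumer pipeline. The sketch below is a generic Python illustration, not the LuxRender implementation: `run_pipeline`, the queues and the worker thread standing in for the GPU are all hypothetical names.

```python
import queue
import threading


def run_pipeline(num_batches, state_buffer_count=2):
    """CPU fills ray batches while a 'GPU' thread traces earlier ones."""
    free_sets = queue.Queue()  # state sets the CPU may fill
    pending = queue.Queue()    # filled batches waiting for the device
    results = []

    for set_id in range(state_buffer_count):
        free_sets.put(set_id)

    def device_worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work
                break
            set_id, batch = item
            results.append((batch, set_id))  # stand-in for tracing the rays
            free_sets.put(set_id)  # this state set can be reused now

    worker = threading.Thread(target=device_worker)
    worker.start()
    for batch in range(num_batches):
        set_id = free_sets.get()  # blocks only when every set is in flight
        pending.put((set_id, batch))
    pending.put(None)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(4))
```

With a single set, the CPU has to wait for each batch to come back before it can fill the next one; with two sets, it can prepare the next batch while the previous one is still being traced.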
Renderer "hybrid"
"integer opencl.platform.index" [-1]
"string opencl.devices.select" ["100"]
"integer statebuffercount" [2]
"integer raybuffersize" [16000]
SurfaceIntegrator "path"
"integer maxdepth" [8]
"string lightstrategy" ["all"]
"integer shadowraycount" [4]
This is a rendering with standard path tracing (28.29k samples/sec 1.74M contribution/sec):
And this with the new hybrid code (43.62k samples/sec 2.84M contribution/sec):
That is about 54% more samples/sec and 63% more contributions/sec, i.e. roughly 60% faster with hybrid rendering.
Multi-GPU support
I have also added support for multiple GPUs and options to select which GPU to use. However, this is not yet particularly useful, because LuxRender has considerable trouble feeding more than one GPU.
Renderer "hybrid"
"integer opencl.platform.index" [-1]
"string opencl.devices.select" ["100"]
Hybrid BiDir
The good news is that hybrid BiDir is going to be even better than hybrid Path tracing in terms of the number of rays generated by each single state (i.e. not only multiple shadow rays but also all the rays needed to connect the eye and light paths).
