Luxrender and OpenCL
From LuxRender Wiki
This is an older page. If you are looking for a user guide for GPU rendering in LuxRender, please see this page: http://www.luxrender.net/wiki/GPU
This is a brief recap of the topics, results, videos, screenshots, etc. discussed in this thread. The page tracks progress in introducing OpenCL support in Luxrender.
What are GPGPUs?
Quoted from www.gpgpu.org:
GPGPU stands for General-Purpose computation on Graphics Processing Units, also known as GPU Computing. Graphics Processing Units (GPUs) are high-performance many-core processors capable of very high computation and data throughput. Once specially designed for computer graphics and difficult to program, today’s GPUs are general-purpose parallel processors with support for accessible programming interfaces and industry-standard languages such as C. Developers who port their applications to GPUs often achieve speedups of orders of magnitude vs. optimized CPU implementations.
GPGPUs are therefore expected to be a very useful tool for Luxrender.
What is OpenCL?
Quoted from www.khronos.org:
OpenCL™ is the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. OpenCL (Open Computing Language) greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories from gaming and entertainment to scientific and medical software.
The Luxrender developer team wants to keep supporting a wide range of platforms and operating systems (e.g. Linux, Windows, MacOS, ATI GPUs, NVIDIA GPUs). Before OpenCL, each vendor had its own proprietary set of tools for developing GPGPU applications: NVIDIA CUDA, ATI Stream SDK, etc. OpenCL solves this problem, and the Luxrender developer team has seen it as the way to go since the first specification was released.
Introducing OpenCL in Luxrender
This is a list of the steps taken so far to introduce OpenCL support in Luxrender. It is a fairly complex task that requires many tests, experiments and steps.
ATI released OpenCL beta SDK with hardware support
The first step was ATI's release of an OpenCL beta SDK in October, with support for the GPUs of the HD4xxx and new HD5xxx generations. This allowed Luxrender developers to run the first tests.
The first test: MandelCPU Vs MandelGPU
MandelGPU was the very first test, written by Dade to compare the performance obtainable with GPUs and CPUs.
The results were quite impressive: MandelGPU was 62 times faster than MandelCPU.
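Fractals such as the Mandelbrot set make a natural first GPU test because every pixel is computed independently. The following Python sketch shows the per-pixel escape-time loop that a kernel like MandelGPU's would run once per work-item; the function names, the iteration limit and the mapping onto the complex plane are illustrative assumptions, not MandelGPU's actual code.

```python
# Per-pixel escape-time loop for the Mandelbrot set. Every pixel is
# independent, so on a GPU each call of escape_time() would become one
# OpenCL work-item. Names and parameters here are illustrative only.

def escape_time(cr: float, ci: float, max_iter: int = 256) -> int:
    """Return the iteration at which |z| exceeds 2, or max_iter if it never does."""
    zr, zi = 0.0, 0.0
    for i in range(max_iter):
        # z = z*z + c, expanded into real arithmetic as a kernel would do it
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        if zr * zr + zi * zi > 4.0:
            return i
    return max_iter


def render(width: int, height: int, max_iter: int = 256) -> list[list[int]]:
    """Sequential CPU version; a GPU launch would cover all pixels in parallel."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the pixel onto [-2, 1] x [-1.5, 1.5] in the complex plane
            cr = -2.0 + 3.0 * x / width
            ci = -1.5 + 3.0 * y / height
            row.append(escape_time(cr, ci, max_iter))
        image.append(row)
    return image
```

Because no pixel depends on another, the CPU loop in `render` can be replaced wholesale by a parallel launch, which is why this kind of workload shows such large GPU speedups.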
The second test: checking ray tracing performance with SmallptCPU Vs SmallptGPU
Dade developed SmallptGPU to check the kind of performance obtainable with GPUs in our main field of interest, ray tracing.
SmallptGPU is an OpenCL port of Kevin Beason's original smallpt. The performance was again quite impressive: SmallptGPU was about 10 times faster than SmallptCPU. A video of SmallptGPU can be found here or here. It has attracted a noticeable amount of interest and has also been featured on the front page of the Khronos Group website. It has been used as a benchmark at Beyond3D, XtremeSystems and other websites. Lightman has achieved some really impressive results by running SmallptGPU on an overclocked ATI HD5870:
This is about 45 times faster than SmallptCPU running on an Intel Q6600. Talonman has achieved a really high number of samples (on a very simple scene, however) by using one of the two GPUs of an NVIDIA GTX 295:
114,000,000 samples/sec is a really high number.
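smallpt builds its scenes entirely from spheres and, for every ray, searches all of them for the nearest hit. The Python sketch below mirrors smallpt's ray-sphere test (it assumes a normalized ray direction, as smallpt does); it illustrates the per-ray work that SmallptGPU moves to the GPU and is not SmallptGPU's OpenCL kernel.

```python
import math

# Ray-sphere test in the style of smallpt. Vectors are plain 3-tuples and
# the ray direction is assumed to be normalized.

EPS = 1e-4  # minimum hit distance, avoids re-hitting the surface just left


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def intersect_sphere(center, radius, origin, direction):
    """Distance t to the nearest hit in front of the ray, or 0.0 on a miss."""
    op = sub(center, origin)
    b = dot(op, direction)
    det = b * b - dot(op, op) + radius * radius
    if det < 0.0:
        return 0.0  # the ray's line misses the sphere entirely
    det = math.sqrt(det)
    t = b - det  # nearer root first
    if t > EPS:
        return t
    t = b + det  # farther root: the ray starts inside the sphere
    if t > EPS:
        return t
    return 0.0


def nearest_hit(spheres, origin, direction):
    """Linear search over all (center, radius) spheres; returns (index, t) or (-1, inf)."""
    best_i, best_t = -1, math.inf
    for i, (center, radius) in enumerate(spheres):
        t = intersect_sphere(center, radius, origin, direction)
        if 0.0 < t < best_t:
            best_i, best_t = i, t
    return best_i, best_t
```

With only a handful of spheres, this brute-force search is cheap enough that no acceleration structure is needed, which keeps the whole renderer small enough to fit in a single kernel.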
SmallptGPU in Direct Lighting mode
SmallptGPU is a path tracer, but other rendering modes (as in Luxrender) can easily be introduced by using a different OpenCL kernel. This is SmallptGPU running in direct lighting mode (i.e. classic Whitted-style ray tracing):
It is very fast indeed.
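In direct lighting mode, each camera-ray hit is shaded by testing a shadow ray toward the light instead of tracing a full random path. The Python sketch below shows that step for a diffuse surface lit by a single point light in a scene of spheres; the point light, the sphere-only occluders and the function names are assumptions made for brevity, and full Whitted-style ray tracing would also recurse on mirror and glass surfaces.

```python
import math

# Direct lighting for one hit point: one shadow ray toward an assumed point
# light, Lambertian (diffuse) reflection, spheres as the only occluders.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def hits_sphere_before(center, radius, origin, direction, max_t, eps=1e-4):
    """True if the ray hits the sphere at a distance strictly between eps and max_t."""
    op = sub(center, origin)
    b = dot(op, direction)
    det = b * b - dot(op, op) + radius * radius
    if det < 0.0:
        return False
    det = math.sqrt(det)
    return any(eps < t < max_t for t in (b - det, b + det))


def direct_light(point, normal, albedo, light_pos, light_power, spheres):
    """Radiance reflected at `point` from one isotropic point light, 0.0 if shadowed."""
    to_light = sub(light_pos, point)
    dist2 = dot(to_light, to_light)
    dist = math.sqrt(dist2)
    l = normalize(to_light)
    cos_theta = dot(normal, l)
    if cos_theta <= 0.0:
        return 0.0  # light is behind the surface
    # Shadow ray: only occluders closer than the light itself count
    for center, radius in spheres:
        if hits_sphere_before(center, radius, point, l, dist - 1e-4):
            return 0.0
    # Diffuse BRDF (albedo / pi) times irradiance (power / 4pi) * cos / r^2
    return (albedo / math.pi) * light_power * cos_theta / (4.0 * math.pi * dist2)
```

One shadow ray per hit is far cheaper than a converging path-traced estimate, which is why this mode looks clean almost immediately, at the cost of missing indirect illumination.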
SmallptGPU Vs Luxrender
While it is not fair to compare SmallptGPU samples with the ones generated by Luxrender, it is still an interesting test. This is exactly the same scene, rendered with Luxrender for 60 seconds with the Metropolis sampler and the Path surface integrator:
The same scene rendered for 60 seconds with SmallptGPU:
SmallptGPU on Linux, Windows and MacOS
One of the most interesting features of OpenCL is that it is a cross-platform standard. Dade has successfully run SmallptGPU under Linux and Windows, and Jens has been able to test and run it on MacOS:
Some more experience with OpenCL
More experience with OpenCL has been gained by working on two more simple OpenCL ray tracers. JuliaGPU:
SmallptGPU2 sources and binaries are available here. SmallptGPU2 was used as an experiment to evaluate how OpenCL support for multiple devices (i.e. multiple GPUs) works. This is a screenshot of SmallptGPU2 running on a GPU OpenCL device plus a CPU OpenCL device:
NOTE: SmallLuxGPU is now part of LuxRays. It is included there as a demo of the features available.
SmallptGPU was an interesting experiment, but its approach can hardly be replicated in Luxrender: all the rendering is done on the GPU. Porting 300,000 lines of code to OpenCL is not feasible (OpenCL kernels are usually short, in the order of hundreds of lines, not thousands). SmallLuxGPU, instead, has been a great step forward in introducing OpenCL in Luxrender because its approach can be replicated in Luxrender.
The idea is to use the GPU only for ray intersections, in order to minimize the amount of brand new code to write and to avoid losing any of the functionality already available in Luxrender. To test this idea, Dade wrote a very simplified path tracer and ported Luxrender's BVH accelerator to OpenCL. The distinctive feature of the path integrator is that it works on a huge set of paths (300,000+) at the same time, in order to generate a large number of rays to trace and keep the GPU fed.
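The batching idea described above can be sketched as follows. A pool of paths is kept alive; each step gathers one ray per live path into a buffer, intersects the whole buffer in one call, and then advances every path on the CPU. `trace_batch` is a hypothetical stub standing in for the OpenCL BVH kernel (it returns random hits), and the `Path` fields are illustrative only.

```python
import random
from dataclasses import dataclass

# Sketch of a batched ("GPU only for intersections") path integrator.
# trace_batch() is a hypothetical stand-in for the OpenCL intersection
# kernel: it returns one random hit/miss result per ray.


@dataclass
class Path:
    depth: int = 0
    alive: bool = True


def trace_batch(rays, rng):
    """Hypothetical device call: one hit/miss result per ray, in order."""
    return [rng.random() < 0.7 for _ in rays]


def render_step(paths, max_depth, rng):
    """Advance every live path by one bounce; returns the number of rays traced."""
    # 1. Gather: each live path contributes exactly one ray to the buffer
    active = [p for p in paths if p.alive]
    ray_buffer = [(id(p), p.depth) for p in active]  # placeholder ray payloads
    # 2. Intersect the whole buffer at once (one kernel launch on a GPU)
    hits = trace_batch(ray_buffer, rng)
    # 3. Advance: shading and sampling stay on the CPU, in existing code
    for path, hit in zip(active, hits):
        path.depth += 1
        if not hit or path.depth >= max_depth:
            # A pool-based integrator would typically start a new path here
            # to keep the pool full; this sketch simply retires it.
            path.alive = False
    return len(ray_buffer)


def render(num_paths=300_000, max_depth=5, seed=0):
    """Run steps until every path has terminated; returns total rays traced."""
    rng = random.Random(seed)
    paths = [Path() for _ in range(num_paths)]
    rays_traced = 0
    while any(p.alive for p in paths):
        rays_traced += render_step(paths, max_depth, rng)
    return rays_traced
```

The large pool exists for occupancy: a GPU generally needs many thousands of independent work-items per launch to stay busy, which a single path, producing one ray at a time, could never supply.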
SmallLuxGPU has shown good results since the first version, clearly outperforming CPU-only rendering:
Version 1.1 introduced support for multiple OpenCL devices and for rendering with native threads. This is SmallLuxGPU v1.1 running on Dade's i7 860 + HD5870 with 8 native threads and 1 GPU:
This is SmallLuxGPU v1.1 running on Jens's MacPro:
And this is SmallLuxGPU v1.1 running on Talonman's Q6600 with an NVIDIA GTX 295 and a GTX 280 (3 GPUs in total, since the GTX 295 carries 2):
- support for multiple OpenCL devices;
- support for native threads;
- new thread and double buffer architecture for devices;
- "low latency" and "high bandwidth" rendering modes;
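The "double buffer architecture" in the v1.1 feature list can be pictured like this: while the device traces one ray buffer, the host thread prepares the next one, so neither side sits idle. The Python sketch below assumes a single device modeled as a one-worker thread pool; `fill_buffer` and `device_trace` are hypothetical stand-ins, and a real implementation would enqueue OpenCL kernels and memory transfers instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Double buffering between a host thread and one device. The "device" is
# modeled as a single-worker thread pool; fill_buffer() and device_trace()
# are hypothetical stand-ins for ray generation and an intersection kernel.


def fill_buffer(batch_index, size):
    """Host side: generate `size` rays (plain integers here) for one batch."""
    return [batch_index * size + i for i in range(size)]


def device_trace(buffer):
    """Device side: 'intersect' every ray in the buffer (stub: doubles each value)."""
    return [2 * ray for ray in buffer]


def render(num_batches, buffer_size):
    """Process num_batches buffers, overlapping host filling with device tracing."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as device:
        # Prime the pipeline: the device starts on buffer 0
        pending = device.submit(device_trace, fill_buffer(0, buffer_size))
        for b in range(1, num_batches):
            # While the device works, the host fills the other buffer
            next_buffer = fill_buffer(b, buffer_size)
            results.extend(pending.result())  # wait for the device, collect hits
            pending = device.submit(device_trace, next_buffer)
        results.extend(pending.result())  # drain the last buffer
    return results
```

In a scheme like this, buffer size is the natural knob for a latency/throughput trade-off: small buffers return results to the screen quickly, large buffers amortize per-launch overhead. Whether SmallLuxGPU's two rendering modes work exactly this way is an assumption.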
Command line arguments
What follows is a brief description of the SmallLuxGPU command line arguments:
- can be 0 (high bandwidth mode) or 1 (low latency mode). It selects the rendering mode: "low latency" is intended for fast response to user input, while "high bandwidth" is for achieving a high number of samples/sec (at the cost of very slow response to user input);
- number of native rendering threads to spawn (they are meant to consume any spare cycles left by the GPU host threads);
- 1 to use all OpenCL CPU devices if available, 0 to skip them;
- 1 to use all OpenCL GPU devices if available, 0 to skip them;
- used to force the GPU workgroup size; 0 means use the default value;
- width of the rendering window;
- height of the rendering window;
- name of the scene file.
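The argument names were not preserved on this page, so the following Python sketch uses invented field names and assumes the eight values are passed positionally, in the order listed above. It only illustrates how the values could be validated (0/1 flags, a non-negative workgroup size, a positive window size); it is not SmallLuxGPU's own parser.

```python
from dataclasses import dataclass

# Hypothetical parser for the eight SmallLuxGPU arguments, ASSUMING they are
# positional and in the documented order. Field names are invented here.


@dataclass
class Options:
    low_latency: bool      # 1 = low latency mode, 0 = high bandwidth mode
    native_threads: int    # native CPU rendering threads to spawn
    use_cpu_devices: bool  # use all OpenCL CPU devices if available
    use_gpu_devices: bool  # use all OpenCL GPU devices if available
    workgroup_size: int    # forced GPU workgroup size, 0 = default
    width: int             # rendering window width
    height: int            # rendering window height
    scene_file: str        # scene file name


def parse_flag(text: str) -> bool:
    if text not in ("0", "1"):
        raise ValueError(f"expected 0 or 1, got {text!r}")
    return text == "1"


def parse_args(argv: list[str]) -> Options:
    """Parse argv (without the program name) into Options, validating ranges."""
    if len(argv) != 8:
        raise ValueError(f"expected 8 arguments, got {len(argv)}")
    opts = Options(
        low_latency=parse_flag(argv[0]),
        native_threads=int(argv[1]),
        use_cpu_devices=parse_flag(argv[2]),
        use_gpu_devices=parse_flag(argv[3]),
        workgroup_size=int(argv[4]),
        width=int(argv[5]),
        height=int(argv[6]),
        scene_file=argv[7],
    )
    if opts.native_threads < 0 or opts.workgroup_size < 0:
        raise ValueError("thread count and workgroup size must be >= 0")
    if opts.width <= 0 or opts.height <= 0:
        raise ValueError("window size must be positive")
    return opts
```

Under that assumption, the values `1 8 0 1 0 640 480 scene.scn` would request low latency mode, 8 native threads, GPU devices only, the default workgroup size and a 640x480 window; `scene.scn` is a placeholder name.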
Rendering 2,700,000+ triangles in less than 256MB of GPU RAM
SmallLuxGPU has successfully rendered a scene with more than 2,700,000 triangles in near real time:
SmallLuxGPU and BulletPhysics
A new animation by Chiaroscuro is available here. He has also posted some beautiful still renderings:
- added on-screen statistics for OpenCL device load average;
- added batch mode for rendering images;
- added benchmark mode;
- optimized memory access pattern (about 2 times faster on NVIDIA GPUs);
- more stable samples/sec statistic;
- added support for normal interpolation;
- optimization for no visible lights (about 2 times faster on loft.scn);
- added support for vertex colours interpolation;
- added support for configuration file;
- added support for OpenCL platform and devices selection via configuration file;
- new surface integrator architecture, able to generate 2 rays per step;
- replaced BVH accelerator with QBVH (4x faster on CPU, about 30% faster on GPU);
- new architecture allows multiple threads to be assigned to multiple GPUs;
- added support for multiple shadow rays;
- added support for camera rotation/translation with the mouse;
- solved several problems related to light sampling and fireflies;
- added support for Gaussian filtering;
- new preview mode with large Gaussian filter;
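A QBVH is a BVH with four children per node, so a single node visit tests a ray against four bounding boxes at once, which maps well onto 4-wide SIMD on a CPU. The Python sketch below shows that four-box slab test; the list-of-tuples node layout and the function names are illustrative assumptions, not SmallLuxGPU's data format, and a production version would also guard against a ray origin lying exactly on a box plane (which produces NaN here).

```python
import math

# Four-box slab test at the heart of QBVH traversal. A node is represented
# here simply by the min/max corners of its 4 children (illustrative layout).


def inverse_direction(d):
    """Per-axis 1/d, using +inf for zero components (axis-parallel rays)."""
    return tuple(1.0 / c if c != 0.0 else math.inf for c in d)


def ray_vs_four_boxes(origin, inv_dir, t_max, box_min, box_max):
    """Return 4 booleans: does the ray hit child box k within [0, t_max]?

    box_min / box_max are lists of four (x, y, z) tuples, one per child.
    On a CPU the 4 children would be processed in one SIMD lane each.
    """
    hits = []
    for child in range(4):
        t0, t1 = 0.0, t_max
        for axis in range(3):
            # Distances to the two slab planes on this axis
            ta = (box_min[child][axis] - origin[axis]) * inv_dir[axis]
            tb = (box_max[child][axis] - origin[axis]) * inv_dir[axis]
            # Shrink the [t0, t1] interval to the slab overlap
            t0 = max(t0, min(ta, tb))
            t1 = min(t1, max(ta, tb))
        hits.append(t0 <= t1)
    return hits
```

Traversal then descends only into children whose flag is set, so one wide test replaces up to four separate node visits of a binary BVH.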
Version 1.4 and above
SmallLuxGPU is now part of LuxRays. It is included there as a demo of the features available.
While the first development steps have been taken, most of the work is still ahead of us. The current project is highly experimental and not ready for general use. You should expect that the sources may not compile straight away and that modifications will be necessary to get them to compile. Unless some kind soul has the knowledge and time to help you out, you will unfortunately have to figure out the installation process by yourself, since at the moment we prefer spending time on development over writing installation instructions or providing ready-made binaries.
We understand that many of you are very keen on trying out the latest and greatest, but until the code has matured enough to be ready for general testing, we have to ask you to be patient. We would be happy to see this project in a usable state by the end of 2010.
Time table for the introduction of OpenCL support in Luxrender
We aim to have full OpenCL support in v0.8. v0.7 is scheduled for before summer 2010. We should have the first v0.8 code running by the end of the summer and the v0.8 release before the end of the year.
GPGPUs, GPUs and dummy GPUs
Please note that not all GPUs are created equal. You need a fairly recent GPU to obtain a noticeable advantage from using it for rendering. Most old and low-end GPUs lack the basic features required for good, fast OpenCL support.
LuxrenderGPU is an experimental version of Luxrender integrating OpenCL support. It is intended as a final prototype before work starts on Luxrender v0.8, where full OpenCL support is planned. LuxrenderGPU also includes the first prototype of the LuxRays library.
On 2010/02/04, the Luxrender Developer Team presented the very first rendering of LuxrenderGPU. While it represents an important milestone in the process of introducing OpenCL support in Luxrender, we are still far from having a tool usable by end users: this is still a prototype/experiment. This means you should not expect a binary package of LuxrenderGPU to be available soon (however, the sources are available in the Luxrender source repository).
First results on a high-end system
Luxrender Classic Vs LuxrenderGPU on an i7 860 + ATI HD 5870: 57K Samples/sec Vs 202K Samples/sec
First results on a low-end system
Luxrender Classic Vs LuxrenderGPU on a Q6600 + NVIDIA GT 240: 31K Samples/sec Vs 59K Samples/sec
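Expressed as speedups of LuxrenderGPU over Luxrender Classic, the two measurements work out to:

```latex
\text{speedup}_{\text{high-end}} = \frac{202\,\text{K samples/s}}{57\,\text{K samples/s}} \approx 3.5,
\qquad
\text{speedup}_{\text{low-end}} = \frac{59\,\text{K samples/s}}{31\,\text{K samples/s}} \approx 1.9
```

Both the CPU and the GPU differ between the two systems, so these ratios compare whole machines rather than the GPUs alone.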
CPU + GPU + Network Rendering
Thanks to the approach used by LuxrenderGPU (i.e. the GPU is used only for ray intersections), it inherits a large list of features from Luxrender Classic: all materials/textures, all kinds of light sources, etc. have been supported since the very first rendering, including network rendering:
LuxRays is the name we have chosen for the part of Luxrender dedicated to accelerating the ray intersection process by using GPUs. LuxRays is intended as an open source component for accelerating any ray tracing application.
CPU Vs GPU? No thanks, CPU + GPU + Network rendering is better
One of the fundamental goals when introducing OpenCL in Luxrender was not to lose any of the available features. This means GPGPU will simply be an addition to the wide set of features available to reduce rendering time. You will still be able to spawn multiple rendering threads to take advantage of your multi-core CPUs, and to use network rendering to let multiple computers work on the same image. You will finally also be able to take advantage of the GPUs available on a single computer and/or on all the computers in your network.
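Combining CPU threads, GPUs and network nodes on one image works because sample contributions can simply be summed. The Python sketch below assumes each rendering unit keeps per-pixel radiance sums and sample counts; merging two units is then plain addition, and the pixel value is the sum divided by the count. The `Film` class is hypothetical and ignores filtering (a real film would accumulate filter weights rather than integer counts).

```python
# Hypothetical per-unit film: each CPU thread, GPU or network node renders
# into its own Film, and films are merged by adding sums and counts.


class Film:
    def __init__(self, num_pixels):
        self.sums = [0.0] * num_pixels  # accumulated radiance per pixel
        self.counts = [0] * num_pixels  # number of samples per pixel

    def add_sample(self, pixel, radiance):
        self.sums[pixel] += radiance
        self.counts[pixel] += 1

    def merge(self, other):
        """Fold another unit's film (e.g. one received over the network) into this one."""
        for i, (s, c) in enumerate(zip(other.sums, other.counts)):
            self.sums[i] += s
            self.counts[i] += c

    def pixel_value(self, pixel):
        """Average radiance of a pixel, or 0.0 if it has no samples yet."""
        c = self.counts[pixel]
        return self.sums[pixel] / c if c else 0.0
```

Because merging is order-independent addition, a faster unit (such as a GPU) simply contributes more samples, and no unit has to wait for the others before its work counts toward the final image.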
[To be continued...]