Luxrender and OpenCL - LuxRender Wiki
Luxrender GPL Physically Based Renderer

Luxrender and OpenCL

This is an older page. If you are looking for a user guide for GPU rendering in LuxRender, please see this page: http://www.luxrender.net/wiki/GPU


This is a brief recap of the arguments, results, videos and screenshots discussed in this thread. This page tracks the progress made in introducing OpenCL support in Luxrender.

What are GPGPUs ?

Quoted from www.gpgpu.org:

GPGPU stands for General-Purpose computation on Graphics Processing Units, also known as GPU Computing. Graphics Processing Units (GPUs) are high-performance many-core processors capable of very high computation and data throughput. Once specially designed for computer graphics and difficult to program, today’s GPUs are general-purpose parallel processors with support for accessible programming interfaces and industry-standard languages such as C. Developers who port their applications to GPUs often achieve speedups of orders of magnitude vs. optimized CPU implementations.

Indeed GPGPUs are going to be a very useful tool for Luxrender.

What is OpenCL ?

Quoted from www.khronos.org:

OpenCL™ is the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. OpenCL (Open Computing Language) greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories from gaming and entertainment to scientific and medical software.

The Luxrender developer team wants to continue to support a wide range of platforms and operating systems (i.e. Linux, Windows, MacOS, ATI GPUs, NVIDIA GPUs, etc.). Before OpenCL, each vendor had its own (i.e. proprietary) set of tools for developing GPGPU applications: NVIDIA CUDA, ATI Stream SDK, etc. OpenCL solves this problem and has been seen by the Luxrender developer team as the way to go since the release of the first specifications.

Introducing OpenCL in Luxrender

This is a list of the steps taken so far to introduce OpenCL support in Luxrender. It is quite a complex task that requires many tests, experiments and intermediate steps.

ATI released OpenCL beta SDK with hardware support

The first step was ATI's release of an OpenCL beta SDK in October, with support for GPGPUs in the HD4xxx and the new HD5xxx generations. This allowed the Luxrender developers to run the first tests.

The first test: MandelCPU Vs MandelGPU

MandelGPU was the very first test written by Dade to compare the performance obtainable with GPGPUs and CPUs.

OpenCL-MandelGPU.jpg

The results were quite impressive: MandelGPU was 62 times faster than MandelCPU.
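A Mandelbrot renderer suits a GPGPU well because every pixel can be computed independently of all the others. The following is a minimal CPU-side sketch in Python of that per-pixel computation. It is not Dade's MandelGPU code: the function names, the viewport and the iteration limit are illustrative assumptions. In an OpenCL version, the body of `mandel_iterations` would roughly correspond to a kernel executed once per pixel.

```python
def mandel_iterations(cx, cy, max_iter=256):
    """Count iterations until z = z*z + c leaves the circle |z| <= 2."""
    x, y = 0.0, 0.0
    for i in range(max_iter):
        if x * x + y * y > 4.0:
            return i
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return max_iter  # the point never escaped: treat it as inside the set


def render(width, height, max_iter=256):
    """Compute every pixel independently (the part a GPU runs in parallel)."""
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            # Map the pixel to the region [-2, 1] x [-1.5, 1.5] (assumed viewport).
            cx = -2.0 + 3.0 * px / (width - 1)
            cy = -1.5 + 3.0 * py / (height - 1)
            row.append(mandel_iterations(cx, cy, max_iter))
        image.append(row)
    return image
```

No pixel reads the result of another one, so the two loops in `render` can be replaced by one work-item per pixel without any synchronization.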

The second test: checking ray tracing performance with SmallptCPU Vs SmallptGPU

SmallptGPU was developed by Dade to check the kind of performance obtainable with GPGPUs in our main field of interest.

OpenCL-smallptGPU-Gallery01.jpg OpenCL-smallptGPU-Gallery03.jpg OpenCL-smallptGPU-Gallery04.jpg

SmallptGPU is a port of Kevin Beason's original smallpt [1] to OpenCL. The performance was again quite impressive: SmallptGPU was about 10 times faster than SmallptCPU. A video of SmallptGPU can be found here or here. It has attracted a noticeable amount of interest and has also been published on the front page of the Khronos Group website. It has been used as a benchmark at Beyond3D, XtremeSystems and other websites. Some really impressive results were achieved by Lightman running SmallptGPU on an overclocked ATI HD5870:

OpenCL-record-smallptgpu.png

This is about 45 times faster than SmallptCPU running on an Intel Q6600. Talonman has achieved a really high number of samples (on a very simple scene, however) by using one of the 2 GPUs available on an NVIDIA GTX 295:

OpenCL-smallptGPU-high-smaples.jpg

114,000,000 samples/sec is a really high number.

SmallptGPU in Direct Lighting mode

SmallptGPU is a path tracer, but other rendering modes (like those in Luxrender) can easily be introduced by using a different OpenCL kernel. This is SmallptGPU running in direct lighting mode (i.e. classic Whitted-like ray tracing):

OpenCL-smallptGPU-Gallery05.jpg

Indeed it is really fast.

SmallptGPU Vs Luxrender

While it is not fair to compare SmallptGPU samples with the ones generated by Luxrender, it is still an interesting test. This is exactly the same scene, rendered with Luxrender for 60 seconds with the Metropolis sampler and the Path surface integrator:

OpenCL-smallptGPU-Test-lux.jpg

The same scene rendered for 60 seconds with SmallptGPU:

OpenCL-smallptGPU-Test-smallptgpu.jpg

SmallptGPU on Linux, Windows and MacOS

One of the most interesting features of OpenCL is that it is a cross-platform standard. Dade has successfully run SmallptGPU under Linux and Windows. Jens has been able to test and run SmallptGPU on MacOS:

OpenCL-smallptGPU-macos.png

Some more experience with OpenCL

Some more experience with OpenCL has been gained by working on 2 more simple OpenCL ray tracers. JuliaGPU:

OpenCL-juliaGPU-gallery03.jpg

A video of JuliaGPU is available here or here. And MandelbulbGPU:

OpenCL-mandelbulbGPU-Gallery01.jpg

SmallptGPU2

SmallptGPU2 sources and binaries are available here. SmallptGPU2 was used as an experiment to evaluate how the OpenCL support for multiple devices (i.e. multiple GPUs) works. This is a screenshot of SmallptGPU2 running on a GPU OpenCL device + a CPU OpenCL device:

OpenCL-smalptGPU2-multi-devices.jpg

SmallLuxGPU

NOTE: SmallLuxGPU is now part of LuxRays. It is included there as a demo of the features available.

SmallptGPU has been an interesting experiment, but it represents an approach that is hard to replicate in Luxrender: all the rendering is done on the GPGPU. Porting 300,000 lines of code to OpenCL is not possible (OpenCL kernels are usually short, in the order of hundreds of lines, not thousands). SmallLuxGPU has instead been a great step forward in the process of introducing OpenCL in Luxrender, because it introduces an approach that can be replicated in Luxrender.

The idea is to use the GPGPUs only for ray intersections, in order to minimize the amount of brand new code to write and to not lose any of the functionality already available in Luxrender. To test this idea, Dade wrote a very simplified path tracer and ported Luxrender's BVH accelerator to OpenCL. The path integrator has the peculiarity of working on a huge set (i.e. 300,000+) of paths at the same time, in order to generate a large number of rays to trace and to keep the GPGPU fed.
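The split described above can be sketched roughly as follows: the host keeps many paths alive, packs one ray per path into a buffer, hands the whole buffer to an intersection routine, and continues the paths from the returned hit buffer. This is only an illustrative Python sketch under stated assumptions. `trace_batch` is a CPU stand-in for the OpenCL device, the scene is a plain list of spheres rather than a BVH over triangles, and all names (`Ray`, `Sphere`, `trace_batch`, `render_step`) are hypothetical and not taken from SmallLuxGPU.

```python
import math
from dataclasses import dataclass


@dataclass
class Ray:
    origin: tuple
    direction: tuple  # assumed to be normalized


@dataclass
class Sphere:
    center: tuple
    radius: float


def intersect(ray, sphere):
    """Return the nearest positive hit distance, or None on a miss."""
    oc = tuple(o - c for o, c in zip(ray.origin, sphere.center))
    b = sum(d * v for d, v in zip(ray.direction, oc))
    c = sum(v * v for v in oc) - sphere.radius ** 2
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:
            return t
    return None


def trace_batch(rays, scene):
    """Stand-in for the device: take a ray buffer, return a hit buffer."""
    hits = []
    for ray in rays:
        best = None
        for index, sphere in enumerate(scene):
            t = intersect(ray, sphere)
            if t is not None and (best is None or t < best[0]):
                best = (t, index)
        hits.append(best)  # (distance, primitive index) or None
    return hits


def render_step(paths, scene):
    """Host side: one ray per active path goes in, hit records come back."""
    ray_buffer = [path["ray"] for path in paths]
    hit_buffer = trace_batch(ray_buffer, scene)
    for path, hit in zip(paths, hit_buffer):
        path["hit"] = hit  # shading and next-ray generation would stay on the CPU
    return paths
```

With this shape, only `trace_batch` has to be ported to the device; materials, lights and sampling keep running in the existing host code, and a larger `paths` list simply means a larger ray buffer per call.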

OpenCL-smallluxGPU-Gallery01.jpg OpenCL-smallluxGPU-Gallery02.jpg OpenCL-smallluxGPU-Gallery03.jpg OpenCL-smallluxGPU-Gallery04.jpg

SmallLuxGPU has shown good results from the first version, clearly outperforming CPU-only rendering:

OpenCL-smallluxgpu-CPUvsGPU.jpg

A video of SmallLuxGPU in action is available here or here.

Version 1.1

Version 1.1 introduced support for multiple OpenCL devices and for native thread based rendering. This is SmallLuxGPU v1.1 running on Dade's i7 860 + HD5870 with 8 native threads and 1 GPU:

OpenCL-smallluxGPU-v1.1-bigmonkey.jpg

This is SmallLuxGPU v1.1 running on Jens's MacPro:

OpenCL-smallluxGPU-v1.1-macpro.png

And this is SmallLuxGPU v1.1 running on Talonman's Q6600 with a NVIDIA 295 GTX and a 280 GTX (for a total of 3xGPUs):

OpenCL-smallluxGPU-v1.1-loft.jpg

A new video of v1.1 is available here and the sources and binaries of v1.1beta2 are available here.

What's new

  • support for multiple OpenCL devices;
  • support for native threads;
  • new thread and double buffer architecture for devices;
  • "low latency" and "high bandwidth" rendering modes;

Command line arguments

A brief description of the SmallLuxGPU command line arguments follows:

  1. can be 0 (high bandwidth mode) or 1 (low latency mode). It selects the rendering mode: "low latency" is intended for fast response to user input, while "high bandwidth" aims at a high number of samples/sec (and is very slow to respond to user input);
  2. number of native rendering threads to spawn (they should be used to consume any spare cycle left by GPU host threads);
  3. 1 to use all OpenCL CPU devices if available, 0 to skip them;
  4. 1 to use all OpenCL GPU devices if available, 0 to skip them;
  5. forces the GPU workgroup size; 0 means use the default value;
  6. width of the rendering window;
  7. height of the rendering window;
  8. name of the scene file.
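As an illustration of how the eight positional arguments listed above fit together, here is a small Python sketch that validates them and gives each one a name. The original list does not name the arguments, so every field name below is a hypothetical label chosen for readability, and the scene file name in the usage example is made up. This is not how SmallLuxGPU itself parses its command line.

```python
from dataclasses import dataclass


@dataclass
class Options:
    low_latency: bool      # argument 1: 0 = high bandwidth, 1 = low latency
    native_threads: int    # argument 2: native rendering threads to spawn
    use_cpu_devices: bool  # argument 3: use all OpenCL CPU devices
    use_gpu_devices: bool  # argument 4: use all OpenCL GPU devices
    workgroup_size: int    # argument 5: 0 = default workgroup size
    width: int             # argument 6
    height: int            # argument 7
    scene_file: str        # argument 8


def parse_flag(text, position):
    """Accept only '0' or '1' for the on/off arguments."""
    if text not in ("0", "1"):
        raise ValueError("argument %d must be 0 or 1, got %r" % (position, text))
    return text == "1"


def parse_arguments(args):
    """Turn the eight positional strings into an Options record."""
    if len(args) != 8:
        raise ValueError("expected 8 arguments, got %d" % len(args))
    options = Options(
        low_latency=parse_flag(args[0], 1),
        native_threads=int(args[1]),
        use_cpu_devices=parse_flag(args[2], 3),
        use_gpu_devices=parse_flag(args[3], 4),
        workgroup_size=int(args[4]),
        width=int(args[5]),
        height=int(args[6]),
        scene_file=args[7],
    )
    if options.native_threads < 0 or options.workgroup_size < 0:
        raise ValueError("thread count and workgroup size must not be negative")
    if options.width <= 0 or options.height <= 0:
        raise ValueError("window size must be positive")
    return options
```

For example, `parse_arguments(["1", "4", "0", "1", "0", "640", "480", "example.scn"])` describes low latency mode, 4 native threads, GPU devices only, the default workgroup size and a 640x480 window.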

Rendering 2,700,000+ triangles in less than 256MB of GPU RAM

SmallLuxGPU has successfully rendered, in near real-time, a scene with more than 2,700,000 triangles:

OpenCL-smallluxGPU-v1.1-27000000.jpg

SmallLuxGPU and BulletPhysics

Chiaroscuro has posted a link to an animation rendered with SmallLuxGPU and BulletPhysics: http://www.youtube.com/watch?v=33rU1axSKhQ

OpenCL-smallluxGPU-v1.1-bulletphysics.jpg

A new animation by Chiaroscuro is available here. He has also posted some beautiful still renderings:

OpenCL-smallluxGPU-v1.2-StanfordLucy.jpg

Version 1.2

What's new

  • added on screen statistic for OpenCL device load average;
  • added batch mode for rendering images;
  • added benchmark mode;
  • optimized memory access pattern (about 2 times faster on NVIDIA GPUs);
  • more stable samples/sec statistic;
  • added support for normal interpolation;
  • optimization for no visible lights (about 2 times faster on loft.scn);
  • added support for vertex colour interpolation;
  • added support for configuration file;
  • added support for OpenCL platform and devices selection via configuration file;
  • new surface integrator architecture, it is able to generate 2 rays per step.
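For readers unfamiliar with the normal interpolation mentioned in the list above, the usual technique is to blend the three vertex normals of the hit triangle with the barycentric coordinates of the hit point and then renormalize. The Python sketch below shows that standard technique. The function name and argument layout are assumptions made for illustration, and whether SmallLuxGPU implements it in exactly this form is not stated on this page.

```python
import math


def interpolate_normal(n0, n1, n2, b1, b2):
    """Blend three vertex normals with barycentric weights and renormalize.

    b1 and b2 are the barycentric coordinates of the hit point relative to
    the second and third vertex; the first weight is b0 = 1 - b1 - b2.
    """
    b0 = 1.0 - b1 - b2
    blended = tuple(b0 * x + b1 * y + b2 * z for x, y, z in zip(n0, n1, n2))
    length = math.sqrt(sum(v * v for v in blended))
    if length == 0.0:
        raise ValueError("vertex normals cancel out at this point")
    return tuple(v / length for v in blended)
```

The same weighted sum, without the final renormalization, is the usual way to interpolate per-vertex colours across a triangle.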

Version 1.3

What's new

  • replaced BVH accelerator with QBVH (4 times faster on CPU, about 30% faster on GPU);
  • new architecture allows multiple threads to be assigned to multiple GPUs;
  • added support for multiple shadowrays;
  • added support for camera rotation/translation with the mouse;
  • solved several problems related to light sampling and fireflies;
  • added support for gaussian filtering;
  • new preview mode with large gaussian filter;
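To make the Gaussian filtering entries above more concrete, here is a hedged Python sketch of a separable Gaussian pixel filter of the kind commonly used in PBRT-derived renderers, where the Gaussian is shifted so that it reaches exactly zero at the filter radius. The default `radius` and `alpha` values, the function names and the convention that pixel centres sit at integer coordinates are all assumptions; the actual SmallLuxGPU filter may differ. A "large" filter in this sketch simply means a larger `radius`, which spreads each sample over more pixels and hides noise at the cost of sharpness.

```python
import math


def gaussian_weight(dx, dy, radius=2.0, alpha=2.0):
    """Separable Gaussian that falls to exactly zero at the filter radius."""
    def falloff(d):
        return max(0.0, math.exp(-alpha * d * d) - math.exp(-alpha * radius * radius))
    return falloff(dx) * falloff(dy)


def add_sample(pixels, weights, sx, sy, value, radius=2.0, alpha=2.0):
    """Splat one sample into every pixel whose centre lies within the radius."""
    height, width = len(pixels), len(pixels[0])
    x0, x1 = max(0, math.ceil(sx - radius)), min(width - 1, math.floor(sx + radius))
    y0, y1 = max(0, math.ceil(sy - radius)), min(height - 1, math.floor(sy + radius))
    for py in range(y0, y1 + 1):
        for px in range(x0, x1 + 1):
            w = gaussian_weight(px - sx, py - sy, radius, alpha)
            pixels[py][px] += w * value   # weighted sum of sample values
            weights[py][px] += w          # sum of weights, used to normalize
```

The displayed value of a pixel is then `pixels[y][x] / weights[y][x]` wherever the accumulated weight is non-zero.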

Version 1.4 and above

SmallLuxGPU is now part of LuxRays. It is included there as a demo of the features available.

Luxrender

Status

While the first development steps have been taken, the majority of the work is still ahead of us. The current project is highly experimental and is not ready for general use. You should expect that sources may not compile straight away and that modifications are necessary to get it to compile. Unless some kind soul has the knowledge and time to help you out, you will unfortunately have to figure the installation process out by yourself as at the moment we prefer spending time on development over giving installation instructions or ready-made binaries.

We understand that many of you are very keen on trying out the latest and greatest, but until the code has matured enough to be ready for general testing, we have to ask you to be patient. We would be happy to see this project in a usable state by the end of 2010.

Time table for the introduction of OpenCL support in Luxrender

We aim to have full OpenCL support for v0.8. v0.7 is scheduled for before summer 2010. We should have the first v0.8 code running at the end of the summer and a v0.8 release before the end of the year.

GPGPUs, GPUs and dummy GPUs

Please note that not all GPUs are created equal. You need a fairly recent GPU to obtain a noticeable advantage from using it for rendering. Most old and low-end GPUs lack the basic features required for good and fast OpenCL support.

LuxrenderGPU

LuxrenderGPU is an experimental version of Luxrender integrating support for OpenCL. It is intended as a final prototype before the start of work on Luxrender v0.8, where full OpenCL support is planned. LuxrenderGPU also includes the first prototype of the LuxRays library.

On 2010/02/04, the Luxrender Developer Team presented the very first rendering made with LuxrenderGPU. While it represents an important milestone in the process of introducing OpenCL support in Luxrender, we are still far from having a tool usable by an end user, and we are still talking about a prototype/experiment. This means you cannot expect to see a binary package of LuxrenderGPU available soon (however, the sources are available in the Luxrender source repository).

First results on a high-end system

Luxrender Classic Vs LuxrenderGPU on an i7 860 + ATI HD 5870: 57K Samples/sec Vs 202K Samples/sec

LuxrenderGPU-first-rendering.jpg

First results on a low-end system

Luxrender Classic Vs LuxrenderGPU on a Q6600 + NVIDIA 240GT: 31K Samples/sec Vs 59K Samples/sec
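As a quick reading aid, the two pairs of figures above can be turned into speedup factors. The short Python snippet below only restates the arithmetic on the numbers quoted on this page; the function and variable names are illustrative.

```python
def speedup(classic_ksamples, gpu_ksamples):
    """Ratio of LuxrenderGPU throughput to Luxrender Classic throughput."""
    return gpu_ksamples / classic_ksamples


# Figures quoted above, in thousands of samples per second.
high_end = speedup(57, 202)  # i7 860 + ATI HD 5870
low_end = speedup(31, 59)    # Q6600 + NVIDIA 240GT

print("high-end system: %.2fx" % high_end)  # prints "high-end system: 3.54x"
print("low-end system: %.2fx" % low_end)    # prints "low-end system: 1.90x"
```

In other words, these first renderings were roughly 3.5 times faster on the high-end system and a little under 2 times faster on the low-end one.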

CPU + GPU + Network Rendering

Thanks to the approach used by LuxrenderGPU (i.e. the GPU is used only for ray intersections), it inherits a large list of features from LuxrenderClassic: all materials/textures, all kinds of light sources, etc. have been supported since the very first rendering. This includes support for network rendering:

LuxrenderGPU-first-rendering-network.jpg

LuxRays

LuxRays is the name we have chosen for the part of Luxrender dedicated to accelerating the ray intersection process by using GPUs. LuxRays is intended as an open source component for accelerating any ray tracing application.

CPU Vs GPU ? No, thanks, CPU + GPU + Network rendering is better

One of the fundamental goals set when planning the introduction of OpenCL in Luxrender was to not lose any of the features already available. This means GPGPU support will be just one more addition to the wide set of features available for reducing rendering time. You will still be able to spawn multiple rendering threads to take advantage of your multi-core CPUs, and to use network rendering to let multiple computers work on the same image. You will finally be able to take advantage of the GPUs available on a single computer and/or across all the computers available on your network.

[To be continued...]