ATI released OpenCL SDK with hardware support


Re: ATI released OpenCL SDK with hardware support

Postby Dade » Sun Jan 10, 2010 9:07 am

jensverwiebe wrote: In dual mode the CPU is no longer maxed out, and neither is the GPU (a lot of time is spent in sys; again a bloody threading thing?)


True, could be. I use boost::barrier, which is implemented on top of boost::mutex, and we know all the problems boost::mutex has under MacOS ... I cannot believe this problem has not been fixed yet, grrr.

I guess we will have to replace boost::barrier with something implemented on top of LuxRender's lux::fast_mutex for the Mac.
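A minimal sketch of what such a replacement could look like, assuming a barrier that busy-waits on atomics rather than blocking on a mutex. It does not use lux::fast_mutex (its interface is not shown in this thread); it uses std::atomic from the C++ standard library instead. The names SpinBarrier and run_barrier_demo are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical reusable barrier that spins on atomics instead of blocking
// on a mutex/condition variable. Not the LuxRender or SmallptGPU code.
class SpinBarrier {
public:
    explicit SpinBarrier(std::size_t count)
        : threshold_(count), waiting_(0), generation_(0) {}

    // Spins until `count` threads have called wait() for the current round.
    // Returns true for exactly one thread per round (the last to arrive).
    bool wait() {
        const std::size_t gen = generation_.load(std::memory_order_acquire);
        if (waiting_.fetch_add(1, std::memory_order_acq_rel) + 1 == threshold_) {
            // Last arrival: reset the counter first, then release the others.
            waiting_.store(0, std::memory_order_relaxed);
            generation_.fetch_add(1, std::memory_order_release);
            return true;
        }
        while (generation_.load(std::memory_order_acquire) == gen) {
            std::this_thread::yield();  // give up the time slice while spinning
        }
        return false;
    }

private:
    const std::size_t threshold_;
    std::atomic<std::size_t> waiting_;
    std::atomic<std::size_t> generation_;
};

// Demo helper: every worker bumps a shared counter once per round, then
// checks after the barrier that all increments of that round are visible.
// Returns how many checks failed (0 if the barrier works as intended).
int run_barrier_demo(std::size_t threads, std::size_t rounds) {
    SpinBarrier barrier(threads);
    std::atomic<std::size_t> counter(0);
    std::atomic<int> violations(0);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (std::size_t r = 0; r < rounds; ++r) {
                counter.fetch_add(1);
                barrier.wait();
                if (counter.load() != threads * (r + 1)) ++violations;
                barrier.wait();  // keep fast threads out of the next round
            }
        });
    }
    for (auto& th : pool) th.join();
    return violations.load();
}
```

Spinning trades CPU time for latency, so a barrier like this only makes sense when the waits are short and there are no more threads than cores.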

@Eros: your results are quite good. If you download the 2.0 version, you should even be able to double them by using both your GPUs.

Re: ATI released OpenCL SDK with hardware support

Postby jensverwiebe » Sun Jan 10, 2010 10:42 am

Dade, I'm not sure about these short-lived locks. But if the problem remains, we should reuse Lord's fast_mutex.h anyway and do this for Windows too, i.e. use spinlocks on Mac and CRITICAL_SECTION on Windows.
In OS X 10.6 all vanilla mutexes have been fixed, so this might be the wrong guess.

Another way could be to use the better optimized Khronos stuff:

CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, FLOAT 1, DOUBLE 1


These are highly optimized for 32-bit scalar registers, like the dedicated ones NVIDIA uses. This seems, by the way, to be the standard in CL.
Recommended reading from the Khronos Group: http://www.khronos.org/opencl/sdk/1.0/docs/man/xhtml/
Apple OpenCL guide: http://developer.apple.com/mac/library/ ... ction.html

Have to read more about that....

Jens

Re: ATI released OpenCL SDK with hardware support

Postby jensverwiebe » Sun Jan 10, 2010 1:59 pm

Dade

I think the hybrid CPU/GPU mode does not work on other systems at the moment? I only see multi-GPU results around.
Hypercrush also confirmed that "hybrid" is not working for him on Linux, although both of his GPUs work on their own.

So this seems to be simply a question of handling the workload differently.
I played a bit and set the balancing to 70/30 and 80/20 and the other way round ---> the speed is always about the maximum of one device, not the sum.
But if I fiddle with the workload math, I get much higher scores, but only half the picture (I hope this is not the reason for the higher scores).


Example:
Normally I get 1130 with CPU/GPU, here it is 2190. Though I think s/px is not affected by rendering half the size (am I right? s/px is per pixel),
it shows that directing the workload is the crux to solve. Balancing between GPUs seems easier to handle than balancing between CPU and GPU.

Jens
Attachments
test.png

Re: ATI released OpenCL SDK with hardware support

Postby Dade » Sun Jan 10, 2010 3:45 pm

jensverwiebe wrote: Another way could be to use the better optimized Khronos stuff:

CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, FLOAT 1, DOUBLE 1



The OpenCL atomic operations are used in OpenCL C inside the kernel code (i.e. on the GPU) to synchronize the threads inside the same workgroup. We cannot use them to synchronize the execution of normal C++ code compiled by GCC and running on the CPU.

The performance figure is computed from the time spent rendering a pass and the window width * window height ... if you render only half the image, the pass time halves while the pixel count stays the same, so this is going to incorrectly show 2x samples/sec ;)
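The arithmetic can be sketched as follows, assuming one sample per pixel per pass and a rate defined as full-window pixel count divided by pass time. Both function names are hypothetical and are not taken from the SmallptGPU sources.

```cpp
#include <cstddef>

// Rate as reported: pixel count of the FULL window divided by the time one
// pass took (assumes one sample per pixel per pass).
double reported_samples_per_sec(std::size_t width, std::size_t height,
                                double pass_seconds) {
    return static_cast<double>(width * height) / pass_seconds;
}

// Rate based only on the pixels that were actually rendered in the pass.
double actual_samples_per_sec(std::size_t rendered_pixels,
                              double pass_seconds) {
    return static_cast<double>(rendered_pixels) / pass_seconds;
}
```

For a 640x480 window, a full pass taking 0.5 s gives 614400 samples/sec. If only half the pixels are rendered and the pass takes 0.25 s, the reported figure doubles to 1228800 while the rate over the rendered pixels is still 614400.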

Re: ATI released OpenCL SDK with hardware support

Postby Eros » Sun Jan 10, 2010 4:18 pm

Thanks again Dade for the help getting everything compiled etc. With version 2, by comparison, I get up to 30M samples/s in the simple scene and about 8M on the Cornell scene. This is an increase of about 50% in the simple scene and 100% in the Cornell scene.

I guess this is to be expected; there has to be some limit to how quickly you can throw things around, right?

With this kind of speed I'd love to see the Cubo scene with all those lovely caustics rendered :D

Re: ATI released OpenCL SDK with hardware support

Postby jensverwiebe » Mon Jan 11, 2010 1:54 am

Dade wrote:The performances are measured as the time spent to render a pass / the window width * window height ... if you render only half image, this is going to incorrectly show 2x samples/sec ;)


...some dreaming may be allowed? ;)

Another thing I found in Apple's NBody example (v.2):
CPU-GPU hybrid shown at WWDC, is removed for now since it requires careful load balancing.


So this is what I see with v1: 1 CPU = 12 GFlop, 4 CPUs = 44 GFlop, GPU = 190 GFlop, "hybrid" GPU/4 CPUs = 60 GFlop ---> quod erat demonstrandum :twisted:

Jens

Re: ATI released OpenCL SDK with hardware support

Postby Dade » Mon Jan 11, 2010 3:18 am

Jens, the initial profiling part of SmallptGPU is not doing a very good job (probably the profiling run is too short and the scene too simple). People at Beyond3D have suggested that I add the capability to hand-tune the workload % on each device, so we can run some tests by hand-tuning the workload between CPU and GPU and see how the performance changes.
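Why the split percentage matters can be sketched with a simple model. It assumes a fixed amount of work per frame that is divided statically between two devices, and a frame that is finished only when both devices are done. This is an illustrative model, not how SmallptGPU actually schedules work, and both function names are hypothetical.

```cpp
#include <algorithm>

// Effective throughput for one unit of work split statically between a GPU
// and a CPU. gpu_share is the fraction of the work given to the GPU (0..1).
// The frame is done only when the slower of the two parts has finished.
double effective_rate(double gpu_rate, double cpu_rate, double gpu_share) {
    const double gpu_time = gpu_share / gpu_rate;          // GPU part
    const double cpu_time = (1.0 - gpu_share) / cpu_rate;  // CPU part
    return 1.0 / std::max(gpu_time, cpu_time);
}

// Share at which both devices finish at the same time under this model.
double balanced_gpu_share(double gpu_rate, double cpu_rate) {
    return gpu_rate / (gpu_rate + cpu_rate);
}
```

Plugging in the NBody figures Jens quoted above purely as illustrative rates (GPU 190, 4 CPUs 44): a 50/50 split yields about 88, which is less than the GPU alone, while the balanced share of roughly 81% for the GPU yields about 234, the sum of both. A badly chosen split can therefore make the hybrid mode slower than a single device.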

Re: ATI released OpenCL SDK with hardware support

Postby psychotron » Mon Jan 11, 2010 2:07 pm

An interesting reaction from Erwin to radiance's OpenCL report (copied from the Octane thread on blenderartists.org):

What OpenCL extension do you need? Can you please be more specific?

Have you already reported those issues to the OpenCL developers, either publicly using the OpenCL forums (http://www.khronos.org/message_board...forum.php?f=27) or privately to the AMD/NVidia beta feedback?

We are currently preparing towards OpenCL 1.1 in Khronos and it would be good to check to make sure that your required extensions are included.
Thanks,
Erwin


So is there something you guys want to report for the new OpenCL 1.1? :)

Re: ATI released OpenCL SDK with hardware support

Postby DingTo » Mon Jan 11, 2010 2:29 pm

Very nice, keep it up. :)

@psychotron: Really exciting to see that work is being done towards OpenCL 1.1.

Re: ATI released OpenCL SDK with hardware support

Postby mitchde » Mon Jan 11, 2010 3:42 pm

Hi, I have compile problems with 2.0beta1 on Mac OS X:

In file included from displayfunc.h:39,
from smallptGPU.cpp:27:
renderconfig.h:38:25: error: OpenCL/cl.hpp: No such file or directory
renderconfig.h:43:35: error: boost/thread/thread.hpp: No such file or directory
renderconfig.h:44:36: error: boost/thread/barrier.hpp: No such file or directory
In file included from renderconfig.h:48,
from displayfunc.h:39,
from smallptGPU.cpp:27:

Indeed, there is no cl.hpp within the src, and I don't have boost.
Can someone upload the Mac OS X binary (with the pre..kernel) or a source package for Mac OS X that includes all the needed externals (cl.hpp, boost.lib)? Thanks
