ATI released OpenCL SDK with hardware support

Discussions related to GPU Acceleration in LuxRender

Moderators: Dade, jromang, tomb, coordinators

Re: ATI released OpenCL SDK with hardware support

Postby jensverwiebe » Sun Jan 03, 2010 7:24 pm

Hi

At the moment I can only conclude this for myself on OSX/NVIDIA:

All OpenCL apps provided by Apple work (Apple is a bit focused on NVIDIA).
The Julia example, for instance, works well with the Apple code, but Dade's adaptation with the ATI SDK fails here.
All CL examples from other people written for OSX work fine with NVIDIA too (galaxies, julia, opencl_aobench_2, ...).

My conclusion:
- either the code lacks some adaptation strongly needed to get it working with NVIDIA (the buffer read, for example, fails with smallGPU 1.5beta3 with a workgroup size of 16/32; higher values crash the system (kernel?)):
  (Failed to read the OpenCL pixel buffer: -36)
- or smallGPU and smallLuxGPU use one particular function that is buggy in NVIDIA's CL.
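For reference, the numeric error codes seen in this thread (-36 here, -5 in a later post) map to named constants in the OpenCL headers. A minimal lookup sketch, using the values from the OpenCL 1.0 cl.h:

```c
/* Subset of OpenCL error codes, as defined in cl.h (OpenCL 1.0). */
#define CL_SUCCESS                         0
#define CL_MEM_OBJECT_ALLOCATION_FAILURE  -4
#define CL_OUT_OF_RESOURCES               -5   /* the clEnqueueReadBuffer(-5) below */
#define CL_OUT_OF_HOST_MEMORY             -6
#define CL_INVALID_COMMAND_QUEUE         -36   /* the pixel-buffer read failure above */

static const char *cl_err_str(int err) {
    switch (err) {
    case CL_SUCCESS:                        return "CL_SUCCESS";
    case CL_MEM_OBJECT_ALLOCATION_FAILURE:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case CL_OUT_OF_RESOURCES:               return "CL_OUT_OF_RESOURCES";
    case CL_OUT_OF_HOST_MEMORY:             return "CL_OUT_OF_HOST_MEMORY";
    case CL_INVALID_COMMAND_QUEUE:          return "CL_INVALID_COMMAND_QUEUE";
    default:                                return "unknown error";
    }
}
```

So the -36 failure is CL_INVALID_COMMAND_QUEUE, which on some drivers is what you get back after a kernel has already crashed the queue, consistent with the workgroup-size sensitivity described above.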

Here are some tests with the provided NV SDK apps:
Code:
/Developer/GPU Computing/OpenCL/bin/darwin/release/oclBandwidthTest Starting...

WARNING: NVIDIA OpenCL platform not found - defaulting to first platform!

Running on...

 Device GeForce 8800 GT
 Quick Mode

 Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432         2402.2

 Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432         3090.7

 Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432         27204.3


TEST PASSED


Press <Enter> to Quit...
-----------------------------------------------------------

jens-macpro:release jensverwiebe$ /Developer/GPU\ Computing/OpenCL/bin/darwin/release/oclVolumeRender
/Developer/GPU Computing/OpenCL/bin/darwin/release/oclVolumeRender Starting...

 Press '=' and '-' to change density
       ']' and '[' to change brightness
       ';' and ''' to modify transfer function offset
       '.' and ',' to modify transfer function scale

  CL_DEVICE_NAME:          GeForce 8800 GT
  CL_DEVICE_VENDOR:          NVIDIA
  CL_DRIVER_VERSION:          CLH 1.0
  CL_DEVICE_TYPE:         CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:      14
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:   512 / 512 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE:   512
  CL_DEVICE_MAX_CLOCK_FREQUENCY:   1500 MHz
  CL_DEVICE_ADDRESS_BITS:      32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:      128 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:      512 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
  CL_DEVICE_LOCAL_MEM_TYPE:      local
  CL_DEVICE_LOCAL_MEM_SIZE:      16 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:      CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:      1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:   128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:   8

  CL_DEVICE_IMAGE <dim>         2D_MAX_WIDTH    8192
               2D_MAX_HEIGHT    8192
               3D_MAX_WIDTH    2048
               3D_MAX_HEIGHT    2048
               3D_MAX_DEPTH    2048

  CL_DEVICE_EXTENSIONS:         cl_khr_byte_addressable_store
               cl_khr_global_int32_base_atomics
               cl_khr_global_int32_extended_atomics
               cl_APPLE_gl_sharing
               cl_APPLE_SetMemObjectDestructor
               cl_APPLE_ContextLoggingFunctions
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>   CHAR 1, SHORT 1, INT 1, FLOAT 1, DOUBLE 1


 Read '../../../src/oclVolumeRender/data/Bucky.raw', 32768 bytes
 Raw file data loaded...

 Volume Render | W: 512  H: 512
 Volume Render | W: 512  H: 512



Jens

P.S.: unlike what was reported on Linux and Windows, on OSX the CPU device works well multithreaded on 4 cores: no alternating use of a single core, but a good load across all cores!
Comparing this to http://forum.beyond3d.com/showthread.php?s=1ef2bea62a9d0550645006f55d123bc2&t=55913&page=4,
the advantage melts down... take my result and extrapolate it to a modern octo-core 2.93 GHz Xeon MacPro: it would be at least 5 times faster.
Attachment: Bildschirmfoto 2010-01-04 um 02.01.09.png

Re: ATI released OpenCL SDK with hardware support

Postby tomb » Mon Jan 04, 2010 3:56 am

Again; fantastic work Dade! :)

I stumbled upon this http://techreport.com/discussions.x/18201 just now - apparently the 4xxx series has known performance problems in OpenCL, and was not
designed for OpenCL in the first place...

Edit: never mind - this is related to the local/global memory emulation issue noted earlier in the thread (by Dade)

T

Re: ATI released OpenCL SDK with hardware support

Postby psychotron » Mon Jan 04, 2010 4:38 am

What's the conclusion about NVIDIA cards right now? Is it better to wait for the new generation?
I want to stick with NVIDIA because ATI always has some problems with OpenGL and Blender.

Re: ATI released OpenCL SDK with hardware support

Postby Dade » Mon Jan 04, 2010 5:12 am

jensverwiebe wrote: P.S.: unlike what was reported on Linux and Windows, on OSX the CPU device works well multithreaded on 4 cores: no alternating use of a single core, but a good load across all cores!
Comparing this to http://forum.beyond3d.com/showthread.php?s=1ef2bea62a9d0550645006f55d123bc2&t=55913&page=4,
the advantage melts down... take my result and extrapolate it to a modern octo-core 2.93 GHz Xeon MacPro: it would be at least 5 times faster.


Jens, the fact alone that the code works well with OpenCL CPU devices (and all ATI OpenCL GPUs) could be a hint of a problem with the NVIDIA OpenCL GPU drivers. If you have followed all the discussion at http://forum.beyond3d.com/showthread.php?s=1ef2bea62a9d0550645006f55d123bc2&t=55913&page=4, you have probably realized how "delicate" moving code from one hardware platform (i.e. ATI) to another (i.e. NVIDIA) is.

There is also the "problem" that the new generation of ATI GPUs offers far better support for OpenCL than anything else available today. We are at "risk" of comparing the old NVIDIA generation with the new ATI GPUs. It would not be fair: the old generation of GPUs wasn't designed to run OpenCL, and we have to wait for Fermi before we know exactly how NVIDIA GPUs will perform.

The guys at http://forum.beyond3d.com/showthread.php?t=55913&page=4 and http://www.xtremesystems.org/forums/sho ... 904&page=2 are posting some crazy screenshots:

[screenshot]

Look at this! 17,000,000 samples on a 5850!

[screenshot]

77,000,000 samples/sec (again on a 5850)

[screenshot]

114,000,000 samples/sec on a GTX 295 (using just one GPU). So NVIDIA GPUs can run well too. It looks like my code has a "register pressure" problem (it runs out of registers on older NVIDIA GPUs).

Indeed, we are talking about "samples" that are not directly comparable with LuxRender's, but... 114,000,000 samples/sec is a ridiculously high number even for an empty scene :D
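The "register pressure" idea can be sketched with simple arithmetic: each multiprocessor has a fixed register file, so the more registers a kernel needs per work-item, the fewer work-items can be resident at once. A rough sketch (8192 registers per multiprocessor is the G80-generation figure; the per-work-item counts in the usage note are hypothetical):

```c
/* Rough occupancy estimate: how many work-items can be resident on one
 * multiprocessor, given the size of its register file. Illustrative only;
 * real drivers also round allocations and apply other limits. */
static int max_resident_work_items(int regs_per_sm, int regs_per_item) {
    if (regs_per_item <= 0)
        return 0;
    return regs_per_sm / regs_per_item;
}
```

With the G80's 8192 registers, a light kernel needing (say) 10 registers per work-item allows 819 resident work-items, while a heavy path-tracing kernel needing 60 allows only 136; a requested workgroup larger than that limit simply fails to launch, which would match the crashes seen earlier with large workgroup sizes.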

Re: ATI released OpenCL SDK with hardware support

Postby Anthor » Mon Jan 04, 2010 5:37 am

Dade wrote: @Anthor: ciao, the error means you don't have enough hardware resources to run the application. I'm afraid the 8600 GTS isn't powerful enough to run SmallLuxGPU (maybe not enough registers or RAM?).



OK. I tried on the PC in the office with a GTS 250 (1 GB RAM) and it works only with 'simple.scn'; with the other scenes the same error appears:

ERROR: clEnqueueReadBuffer(-5)

Note: smallptgpu works well with all of its scenes.

P.S. I tried with Linux (Debian) and a window appears for a few seconds.

Ciao :)

Re: ATI released OpenCL SDK with hardware support

Postby zsouthboy » Mon Jan 04, 2010 12:55 pm

Dade,

For the realtime camera-movement idea, take a look at what Ian (droid) was doing with Radium before he disappeared ( :( :( :( ). I still have the .jar hanging around here somewhere if you never saw/used it.
It allowed different preview rendering strategies; PT was one of them (and the slowest).
It was sort of an integrated material editor: you could change materials and see the scene update as well.
When you attempted to move the camera, the scene would be rendered at half (IIRC) resolution until you stopped moving the camera, which helped with responsiveness.

I am impressed as hell with what you're doing right now - do you want someone to take up donations to get you a 58xx board and possibly an NVIDIA board?

Re: ATI released OpenCL SDK with hardware support

Postby Poly_Pusher » Mon Jan 04, 2010 1:56 pm

zsouthboy wrote: Dade,
I am impressed as hell with what you're doing right now - do you want someone to take up donations to get you a 58xx board and possibly an NVIDIA board?


If that would help development I would be willing to throw in...

Re: ATI released OpenCL SDK with hardware support

Postby Dade » Tue Jan 05, 2010 4:49 am

zsouthboy wrote: For the realtime camera-movement idea, take a look at what Ian (droid) was doing with Radium before he disappeared ( :( :( :( ). I still have the .jar hanging around here somewhere if you never saw/used it.
It allowed different preview rendering strategies; PT was one of them (and the slowest).
It was sort of an integrated material editor: you could change materials and see the scene update as well.
When you attempted to move the camera, the scene would be rendered at half (IIRC) resolution until you stopped moving the camera, which helped with responsiveness.


I use a similar trick, based on the max path depth: I trace the first few samples with a small path depth in order to update the screen faster. You may also have noticed in the video that, sometimes, I force the max path depth to 1, effectively turning the path integrator into a directlighting-like integrator. Indeed, working on the screen resolution would improve the responsiveness even more.

zsouthboy wrote: I am impressed as hell with what you're doing right now - do you want someone to take up donations to get you a 58xx board and possibly an NVIDIA board?


Thanks, but I don't feel hardware is a limitation for me at the moment. There is sooo much work yet to do on the software side (not only on my side but also on the vendors': the more I work on OpenCL, the more I understand how young the current drivers are and how many problems they still have). I'm going to buy a new PC soon and I'm thinking of buying a cheap NVIDIA card for the old one, so I can have a testing platform for NVIDIA too.

Just to give you an idea, check this post: http://forums.nvidia.com/index.php?showtopic=154710 (it looks like NVIDIA does I/O operations with spin locks (i.e. busy waits), which is quite terrible for anyone like us who is still interested in using CPUs for rendering too).
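The spin-wait issue matters because a busy wait burns a full CPU core that a CPU rendering thread could otherwise use; the usual alternative is a blocking wait on a condition variable, where the waiting thread sleeps until signaled. A minimal pthread sketch of the blocking pattern (this illustrates the general technique only, not the NVIDIA driver internals; the "GPU job" and its result are stand-ins):

```c
#include <pthread.h>

static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int done = 0, result = 0;

/* Stand-in for the asynchronous GPU work completing. */
static void *gpu_job(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    result = 42;                 /* hypothetical render result */
    done = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
    return 0;
}

static int wait_for_job(void) {
    pthread_t t;
    pthread_create(&t, 0, gpu_job, 0);
    pthread_mutex_lock(&mtx);
    while (!done)                /* no busy wait: the thread sleeps here */
        pthread_cond_wait(&cond, &mtx);
    pthread_mutex_unlock(&mtx);
    pthread_join(t, 0);
    return result;
}
```

With a spin lock, the `while (!done)` loop would run flat out and show up as 100% of one core; with `pthread_cond_wait` the core is idle until the signal arrives, which is exactly the difference Jens measures in the next post.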

Re: ATI released OpenCL SDK with hardware support

Postby jensverwiebe » Tue Jan 05, 2010 5:30 am

Uh? http://forums.nvidia.com/index.php?showtopic=154710

On my system I get 0.75% of a possible 100% CPU saturation (one core), so practically none.
Code:
jens-macpro:~ jensverwiebe$ cd /Developer/GPU\ Computing/OpenCL/bin/darwin/release
jens-macpro:release jensverwiebe$ /Developer/GPU\ Computing/OpenCL/bin/darwin/release/oclNbody
/Developer/GPU Computing/OpenCL/bin/darwin/release/oclNbody Starting...

clCreateContextFromType
clGetContextInfo

  CL_DEVICE_NAME:          GeForce 8800 GT
  CL_DEVICE_VENDOR:          NVIDIA
  CL_DRIVER_VERSION:          CLH 1.0
  CL_DEVICE_TYPE:         CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:      14
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:   512 / 512 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE:   512
  CL_DEVICE_MAX_CLOCK_FREQUENCY:   1500 MHz
  CL_DEVICE_ADDRESS_BITS:      32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:      128 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:      512 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
  CL_DEVICE_LOCAL_MEM_TYPE:      local
  CL_DEVICE_LOCAL_MEM_SIZE:      16 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:      CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:      1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:   128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:   8

  CL_DEVICE_IMAGE <dim>         2D_MAX_WIDTH    8192
               2D_MAX_HEIGHT    8192
               3D_MAX_WIDTH    2048
               3D_MAX_HEIGHT    2048
               3D_MAX_DEPTH    2048

  CL_DEVICE_EXTENSIONS:         cl_khr_byte_addressable_store
               cl_khr_global_int32_base_atomics
               cl_khr_global_int32_extended_atomics
               cl_APPLE_gl_sharing
               cl_APPLE_SetMemObjectDestructor
               cl_APPLE_ContextLoggingFunctions
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>   CHAR 1, SHORT 1, INT 1, FLOAT 1, DOUBLE 1


clCreateCommandQueue

AllocateNBodyArrays m_dPos

AllocateNBodyArrays m_dVel


CreateProgramAndKernel _noMT...
Loading Uncompiled kernel from .cl file, using oclNbodyKernel.cl

oclLoadProgSource
clCreateProgramWithSource
clBuildProgram
clCreateKernel

CreateProgramAndKernel _MT...
Loading Uncompiled kernel from .cl file, using oclNbodyKernel.cl

oclLoadProgSource
clCreateProgramWithSource
clBuildProgram
clCreateKernel

Reset Nbody System...

Running standard oclNbody simulation...


Reset Nbody System...
OpenCL for GPU Nbody Demo (30720 bodies)


I was thinking about getting an ATI too, but at the moment Apple does not support the 5980, and anything lower would be a bad jump; I have to wait until the new Macs come out (March perhaps).
I am still curious whether there isn't a solution even on slightly older HW. My 8800GT has 190 GFlops, which would be OK for a start (a GTX 285 has 400 GFlops; afaik the ATI 5980 reaches 1400 GFlops :o )
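For what it's worth, the peak-GFLOPS numbers being compared here come from shader count × clock × FLOPs issued per cycle, and quoted figures for the same card vary because the per-cycle factor is a counting convention. A sketch with that caveat (the inputs in the test are illustrative):

```c
/* Theoretical peak GFLOPS: shader ALUs x clock (GHz) x FLOPs per ALU per
 * cycle. The last factor is a counting convention (2 for a MAD, 3 if a
 * co-issued MUL is also counted), which is one reason published numbers
 * for the same card disagree. */
static double peak_gflops(int alus, double clock_ghz, int flops_per_cycle) {
    return alus * clock_ghz * flops_per_cycle;
}
```

For example, 112 ALUs at 1.5 GHz counting MAD only gives 336 GFLOPS; counting conventions and clock domains explain the spread of figures quoted for the same generation of cards.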

Jens

Re: ATI released OpenCL SDK with hardware support

Postby Dade » Tue Jan 05, 2010 7:44 am

jensverwiebe wrote: On my system I get 0.75% of a possible 100% CPU saturation (one core), so practically none.


Jens, I'm not sure whether Apple's OpenCL includes code from NVIDIA; it includes CPU device support (not available from NVIDIA), so it could be a totally different code base. The link above is about people working under Windows.

OpenCL is an Apple trademark loaned to the Khronos Group; the very first OpenCL draft was proposed by Apple, and the first available implementation was from Apple. Apple's OpenCL driver could be very different from NVIDIA's (even if it runs on the same hardware).
