ATI released OpenCL SDK with hardware support

Discussions related to GPU Acceleration in LuxRender

Post by Dade » Tue Oct 20, 2009 6:14 am

ATI has released a new beta of their SDK with hardware GPU support (older versions worked only on the CPU): http://developer.amd.com/GPU/ATISTREAMS ... fault.aspx (it includes Linux support!)

NVIDIA released their own OpenCL implementation a few weeks ago.

Yum, yum :idea:

Re: ATI released OpenCL SDK with hardware support

Post by tomb » Tue Oct 20, 2009 6:24 am

Cool. Unfortunately, it seems it's not quite ready for prime time yet: http://forums.amd.com/devforum/messagev ... erthread=y

This is an interesting forum too, btw:
http://forum.beyond3d.com/forumdisplay.php?f=42

T

Re: ATI released OpenCL SDK with hardware support

Post by Dade » Tue Nov 24, 2009 4:07 pm

I'm running the ATI OpenCL beta driver on a Linux box (ATI HD4870):

Code:
For test only: Expires on Sun Feb 28 00:00:00 2010
Number of platforms:             1
  Plaform Profile:             FULL_PROFILE
  Plaform Version:             OpenCL 1.0 ATI-Stream-v2.0-beta4
  Plaform Name:                ATI Stream
  Plaform Vendor:             Advanced Micro Devices, Inc.


  Plaform Name:                ATI Stream
Number of devices:             2
  Device Type:                CL_DEVICE_TYPE_CPU
  Device ID:                4098
  Max compute units:             4
  Max work items dimensions:          3
    Max work items[0]:             1024
    Max work items[1]:             1024
    Max work items[2]:             1024
  Max work group size:             1024
  Preferred vector width char:          16
  Preferred vector width short:          8
  Preferred vector width int:          4
  Preferred vector width long:          2
  Preferred vector width float:          4
  Preferred vector width double:       0
  Max clock frequency:             1596Mhz
  Address bits:                64
  Max memeory allocation:          1073741824
  Image support:             No
  Max size of kernel argument:          4096
  Alignment (bits) of base address:       1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                Yes
    Quiet NaNs:                Yes
    Round to nearest even:          Yes
    Round to zero:             No
    Round to +ve and infinity:          No
    IEEE754-2008 fused multiply-add:       No
  Cache type:                Read/Write
  Cache line size:             64
  Cache size:                65536
  Global memory size:             3221225472
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Global
  Local memory size:             32768
  Profiling timer resolution:          1
  Device endianess:             Little
  Available:                Yes
  Compiler available:             Yes
  Execution capabilities:            
    Execute OpenCL kernels:          Yes
    Execute native function:          No
  Queue properties:            
    Out-of-Order:             No
    Profiling :                Yes
  Platform ID:                0
  Name:                   Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
  Vendor:                GenuineIntel
  Driver version:             1.0
  Profile:                FULL_PROFILE
  Version:                OpenCL 1.0 ATI-Stream-v2.0-beta4
  Extensions:                cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store
  Device Type:                CL_DEVICE_TYPE_GPU
  Device ID:                4098
  Max compute units:             10
  Max work items dimensions:          3
    Max work items[0]:             256
    Max work items[1]:             256
    Max work items[2]:             256
  Max work group size:             256
  Preferred vector width char:          16
  Preferred vector width short:          8
  Preferred vector width int:          4
  Preferred vector width long:          2
  Preferred vector width float:          4
  Preferred vector width double:       0
  Max clock frequency:             750Mhz
  Address bits:                32
  Max memeory allocation:          134217728
  Image support:             No
  Max size of kernel argument:          1024
  Alignment (bits) of base address:       2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                No
    Quiet NaNs:                Yes
    Round to nearest even:          Yes
    Round to zero:             No
    Round to +ve and infinity:          No
    IEEE754-2008 fused multiply-add:       No
  Cache type:                None
  Cache line size:             0
  Cache size:                0
  Global memory size:             134217728
  Constant buffer size:             65536
  Max number of constant args:          8
  Local memory type:             Global
  Local memory size:             16384
  Profiling timer resolution:          1
  Device endianess:             Little
  Available:                Yes
  Compiler available:             Yes
  Execution capabilities:            
    Execute OpenCL kernels:          Yes
    Execute native function:          No
  Queue properties:            
    Out-of-Order:             No
    Profiling :                Yes
  Platform ID:                0
  Name:                   ATI RV770
  Vendor:                Advanced Micro Devices, Inc.
  Driver version:             CAL 1.4.467
  Profile:                FULL_PROFILE
  Version:                OpenCL 1.0 ATI-Stream-v2.0-beta4
  Extensions:               


Quite interesting: I have 4 CPU compute units (my Q6600) and 10 GPU compute units (the ATI HD4870). It looks like the units are 4-way SIMD, with a very limited amount of global memory (only 128 MB) and an even more limited local memory (16 KB) ... I guess my old ATI is good for demos and not much else.
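
To put the raw byte counts from the listing into perspective, here is a minimal sketch in C. The helper names are hypothetical, and the three-floats-per-pixel frame layout is an assumption made only for illustration, not something taken from the SDK.

```c
#include <stddef.h>

/* Convert a raw byte count, as printed in the device listing, to mebibytes. */
static double bytes_to_mib(unsigned long long bytes) {
    return (double)bytes / (1024.0 * 1024.0);
}

/*
 * Size in bytes of a frame buffer that stores three floats (R, G, B) per
 * pixel. This layout is an assumption for illustration only.
 */
static unsigned long long frame_bytes(unsigned width, unsigned height) {
    return (unsigned long long)width * height * 3ULL * sizeof(float);
}
```

For example, bytes_to_mib(134217728ULL) converts the GPU's reported global memory size, and frame_bytes(1024, 768) can be compared against that figure to see how much of it a single float frame buffer would take.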

Now I only need to finish porting smallpt (http://www.kevinbeason.com/smallpt/) to OpenCL to get a feeling for how fast or slow the mysterious 10 GPU units are :?:

Re: ATI released OpenCL SDK with hardware support

Post by psor » Tue Nov 24, 2009 9:37 pm

I'm very curious about your port. Looking forward to seeing some benchmarks in the future. ;o))

take care
psor
"The sleeper must awaken"

Re: ATI released OpenCL SDK with hardware support

Post by jeanphi » Wed Nov 25, 2009 3:15 am

Hi,

You can get extreme speeds with a CUDA path tracer, so I guess it's going to be the same with OpenCL. Unfortunately OpenCL apps currently segfault on my PC, so I'm not yet able to test it.

Jeanphi

Re: ATI released OpenCL SDK with hardware support

Post by Dade » Wed Nov 25, 2009 3:48 am

jeanphi wrote: You can get extreme speeds with a CUDA path tracer, so I guess it's going to be the same with OpenCL. Unfortunately OpenCL apps currently segfault on my PC, so I'm not yet able to test it.


You are working under Linux now, right? The process was quite simple in my case: I just downloaded http://developer.amd.com/Downloads/ati- ... a4-lnx.zip, executed the installation script, rebooted, installed the SDK http://developer.amd.com/Downloads/ati- ... -lnx64.tgz, and compiled the demo apps. Maybe it is something related to your new 5xxx board.

psor, I'm very curious too. That is exactly the point of this little scholastic exercise: comparing a CPU and a GPU implementation of the same program ... on my PC, not in some video on YouTube.

Re: ATI released OpenCL SDK with hardware support

Post by jeanphi » Wed Nov 25, 2009 4:09 am

Hi,

That's what I thought too; it might be an issue with my 5770 card.
Have you seen the video of the VRay demo at the last SIGGRAPH? Pretty amazing stuff. I've also seen results from another GPU path tracer, and they confirmed the potential.

Jeanphi

Re: ATI released OpenCL SDK with hardware support

Post by Dade » Wed Nov 25, 2009 10:19 am

jeanphi wrote: Have you seen the video of the VRay demo at the last SIGGRAPH? Pretty amazing stuff. I've also seen results from another GPU path tracer, and they confirmed the potential.


Yup, even today I was looking at http://www.youtube.com/watch?v=RxkBq2_e ... re=channel and http://www.youtube.com/watch?v=z5WyOQe4 ... re=channel

It is impressive, very impressive ... but then I read that it was done with 15 Tesla boards ... I want to see how this stuff runs on a normal PC :?:

Re: ATI released OpenCL SDK with hardware support

Post by Dade » Sun Nov 29, 2009 7:13 am

Dade wrote: I want to see how this stuff runs on a normal PC :?:


Short version

omg....

Long version

I ported smallpt to C, converted it from double to float, changed it from a recursive implementation to an iterative one, etc., but I was too impatient, so I decided to rush out a simpler test while porting smallpt: the Mandelbrot set. Yeah, I have a huge imagination. I wrote the first implementation.
My first impression was "Wow, it works", the second was "wtf, it is slow". It was just a bit faster than a single-core implementation on the CPU. I scratched my head for a while, then I realized that the code I copied from ATI's samples is quite misleading and I was using only one OpenCL "local thread" for each compute unit instead of 256. Flipped the switch and BOOM!
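
The "switch" here is the work-group (local) size used when the kernel is launched. A minimal sketch in C of the arithmetic involved, assuming one work-item per pixel and a launch through clEnqueueNDRangeKernel; the helper names are hypothetical and are not taken from the original code or from ATI's samples.

```c
#include <stddef.h>

/*
 * Round the number of work-items up to the next multiple of the work-group
 * size. OpenCL 1.0 requires the global work size to be evenly divisible by
 * the local work size when a local size is given explicitly.
 */
static size_t round_up_global_size(size_t work_items, size_t local_size) {
    size_t remainder = work_items % local_size;
    return remainder == 0 ? work_items : work_items + (local_size - remainder);
}

/* Number of work-groups the launch is split into. */
static size_t work_group_count(size_t global_size, size_t local_size) {
    return global_size / local_size;
}

/*
 * The two sizes would be passed as the global_work_size and local_work_size
 * arguments of clEnqueueNDRangeKernel(). With a local size of 1, every
 * work-group contains a single work-item; with 256 (the "Max work group
 * size" reported for the GPU device above), each work-group contains 256.
 */
```

For a 1024x768 image, round_up_global_size(1024 * 768, 256) leaves the size unchanged because it is already a multiple of 256, and work_group_count() shows how many groups the device has to schedule for each choice of local size.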

The test

For testing purposes I ran at 1024x768 with an insane value for the maximum number of iterations: 10000.

mandelCPU

This is just a simple single-threaded CPU implementation (no OpenCL involved). Result:

Rendering time: 9.630000 secs (Sample/sec 81665 Max. Iterations 10000)
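
For readers who have not seen one, a single-threaded escape-time Mandelbrot renderer of this general kind can be sketched in C as follows. This is not the mandelCPU source: the function names and the view window (-2.5..1.0 by -1.0..1.0) are assumptions made for illustration.

```c
/*
 * Escape-time iteration count for the point c = (cx, cy): iterate
 * z = z^2 + c from z = 0 and stop once |z|^2 exceeds 4 or the
 * iteration limit is reached.
 */
static int mandel_iterations(float cx, float cy, int max_iterations) {
    float x = 0.0f, y = 0.0f;
    int i;
    for (i = 0; i < max_iterations; ++i) {
        float x2 = x * x;
        float y2 = y * y;
        if (x2 + y2 > 4.0f)
            break;
        y = 2.0f * x * y + cy;  /* imaginary part of z^2 + c */
        x = x2 - y2 + cx;       /* real part of z^2 + c */
    }
    return i;
}

/*
 * Fill out[width * height] with one iteration count per pixel.
 * The mapping from pixels to the complex plane is an assumed view window.
 */
static void render_mandel(int *out, int width, int height, int max_iterations) {
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            float cx = -2.5f + 3.5f * (float)px / (float)width;
            float cy = -1.0f + 2.0f * (float)py / (float)height;
            out[py * width + px] = mandel_iterations(cx, cy, max_iterations);
        }
    }
}
```

Points inside the set run all max_iterations steps, which is why a limit of 10000 makes the test so heavy. Each pixel is independent of the others, so the body of the inner loop maps naturally onto one OpenCL work-item per pixel.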

mandelGPU (on CPU)

This is the OpenCL implementation using only the CPU device. Result:

For test only: Expires on Sun Feb 28 00:00:00 2010
OpenCL Device 0: Type = TYPE_CPU
OpenCL Device 0: Name = Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
OpenCL Device 0: Compute units = 4
Reading file 'rendering_kernel.c' (size 996 bytes)
Rendering time: 9.800000 secs (Sample/sec 80248 Max. Iterations 10000)


It uses all 4 cores but has the same performance as mandelCPU (which uses only one core). I guess CPU devices are useful only for development purposes (i.e. when you don't have a fast GPU available).

mandelGPU (on GPU)

This is the OpenCL implementation using only the GPU device. Result:

For test only: Expires on Sun Feb 28 00:00:00 2010
OpenCL Device 0: Type = TYPE_GPU
OpenCL Device 0: Name = ATI RV770
OpenCL Device 0: Compute units = 10
Reading file 'rendering_kernel.c' (size 996 bytes)
Rendering time: 0.340000 secs (Sample/sec 2313035 Max. Iterations 10000)


What the hell ... it is about 28 times faster than the single-threaded CPU implementation (9.63 s vs. 0.34 s)! MandelGPU is quite amazing to use. I recorded a small video (sorry for the low quality) just to give you an idea of how fast it is: http://vimeo.com/7876686
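
The throughput figures in the logs can be reproduced from the image size and the rendering times. A minimal sketch in C, with hypothetical helper names, that also derives the ratio between two runs:

```c
/* Samples (pixels) rendered per second for a width x height image. */
static double samples_per_second(int width, int height, double seconds) {
    return (double)width * (double)height / seconds;
}

/* How many times faster the new run is compared with the baseline run. */
static double speedup(double baseline_seconds, double new_seconds) {
    return baseline_seconds / new_seconds;
}
```

For example, samples_per_second(1024, 768, 9.63) corresponds to the mandelCPU log line, samples_per_second(1024, 768, 0.34) to the GPU one, and speedup(9.63, 0.34) gives the ratio between the two runs.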

This stuff rocks. I'm looking forward to seeing how smallpt will run :o

Re: ATI released OpenCL SDK with hardware support

Post by SATtva » Sun Nov 29, 2009 7:38 am

I'm watching this thread... closely. ¬__¬
Linux builds packager
"To ask is a moment's shame; not to ask is a lifetime's shame."
