ATI released OpenCL SDK with hardware support

Re: ATI released OpenCL SDK with hardware support

Post by salvation » Mon Jan 11, 2010 3:55 pm

Hi,
I tried to run SmallptGPU 1.6, but I get the error "Failed to get OpenCL platform IDs".
Why is that?

I have a GeForce 8800 GTS 640MB,
driver 190.89 with OpenCL support,
Win7 64-bit.

Re: ATI released OpenCL SDK with hardware support

Post by Dade » Mon Jan 11, 2010 7:38 pm

salvation wrote:
Hi,
I tried to run SmallptGPU 1.6, but I get the error "Failed to get OpenCL platform IDs".
Why is that?

I have a GeForce 8800 GTS 640MB,
driver 190.89 with OpenCL support,
Win7 64-bit.


It usually happens when there is no OpenCL support available. I think you need to upgrade your NVIDIA drivers.
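To narrow down a failure like this, it can help to print the numeric status returned by the platform query, i.e. the value of `clGetPlatformIDs(0, NULL, &count)`. The following is only a sketch: `statusName` is a hypothetical helper, not part of SmallptGPU, and the status values are written out as literals (matching the Khronos headers) so that the snippet compiles without an OpenCL SDK. In real code you would include `CL/cl.h` and use its constants.

```cpp
#include <string>

namespace sketch {

// Translate a few OpenCL status codes into their names from the
// Khronos headers. Only a handful of codes are covered here.
inline std::string statusName(int status) {
    switch (status) {
    case 0:     return "CL_SUCCESS";
    case -1:    return "CL_DEVICE_NOT_FOUND";
    case -2:    return "CL_DEVICE_NOT_AVAILABLE";
    case -6:    return "CL_OUT_OF_HOST_MEMORY";
    case -30:   return "CL_INVALID_VALUE";
    case -32:   return "CL_INVALID_PLATFORM";
    case -1001: return "CL_PLATFORM_NOT_FOUND_KHR";  // reported by ICD loaders
    default:    return "unknown OpenCL status " + std::to_string(status);
    }
}

}  // namespace sketch
```

A status of `CL_PLATFORM_NOT_FOUND_KHR`, or a platform count of zero, would be consistent with a missing or outdated driver, but that is an assumption to verify on the machine in question.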

@Jens, I have uploaded a new version of SmallptGPU with the ability to hand-tune the workload on OpenCL devices at http://davibu.interfree.it/opencl/small ... alpha2.tgz

You can press 'n' or 'm' to select the OpenCL device and 'v' or 'b' to dynamically increase/decrease its workload. This should allow you to run some tests with the Apple CPU+GPU support.
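How a per-device workload setting turns into an amount of work is not spelled out here. As a rough sketch only, assuming each device receives a share of the pixels proportional to a hand-tuned integer weight (`splitWorkload` and the weights are hypothetical, not SmallptGPU's actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Split `totalWork` items (e.g. pixels) across devices in proportion to
// integer weights. Integer arithmetic keeps the result deterministic;
// totalWork * weight is assumed to fit in 64 bits.
std::vector<std::uint64_t> splitWorkload(std::uint64_t totalWork,
                                         const std::vector<std::uint64_t> &weights) {
    const std::uint64_t sum =
        std::accumulate(weights.begin(), weights.end(), std::uint64_t{0});
    assert(sum > 0 && "at least one device needs a non-zero weight");

    std::vector<std::uint64_t> amounts(weights.size());
    std::uint64_t assigned = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        amounts[i] = totalWork * weights[i] / sum;  // rounds down
        assigned += amounts[i];
    }
    // Rounding down can leave a few items over; give them to the last device.
    amounts.back() += totalWork - assigned;
    return amounts;
}
```

For example, `splitWorkload(1000, {1, 99})` assigns 10 items to the first device and 990 to the second. A running sum of the returned amounts gives each device's starting offset into the pixel buffer.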

Re: ATI released OpenCL SDK with hardware support

Post by jensverwiebe » Tue Jan 12, 2010 2:00 am

@mitchde, you need to read a bit further back in this thread; I posted everything for OS X days ago: viewtopic.php?f=21&t=2947&start=260#p29466

This version is C++, though, and needs the C++ bindings from Khronos, which have to be copied into the headers of the OpenCL framework.

@Dade: this version has the same problem as SmallLuxGPU: it performs very poorly on the GPU but OK on the CPU,
so the balancing assigns only 1% to the GPU.
I remember you recoded the way memory objects work in SmallptGPU 1.6, when performance started to go up (i.e. the crashes disappeared).
I will take a look at that later too.
BTW: the Beyond3D forum reports the same.

Jens

Re: ATI released OpenCL SDK with hardware support

Post by Dade » Tue Jan 12, 2010 3:43 am

I know what it is: it is the way memory objects are allocated (the only other change I made). I will roll back to the old method, but it is very strange: it is supposed to be an optimization, and it is explicitly suggested in the NVIDIA documentation :?

Re: ATI released OpenCL SDK with hardware support

Post by jensverwiebe » Tue Jan 12, 2010 6:34 am

OK, I recoded the cl_mem pointers to get the former performance back for now.

SmallptGPU-v2.0alpha2_OSX: http://www.jensverwiebe.de/LuxRender/SmallptGPU-v2.0alpha2_OSX.zip
The new version allows manual balancing of device loads; see the help overlay for how to use it.

Code:
diff /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2/Makefile /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2_OSX/Makefile
8,9c8,9
< CCFLAGS=-O3 -ftree-vectorize -msse -msse2 -msse3 -mssse3 -fvariable-expansion-in-unroller -Wall \
<    -I$(ATISTREAMSDKROOT)/include -L$(ATISTREAMSDKROOT)/lib/x86_64 -lglut -lOpenCL -lboost_thread-gcc43-mt-1_39
---
> #CCFLAGS=-O3 -ftree-vectorize -msse -msse2 -msse3 -mssse3 -fvariable-expansion-in-unroller -Wall \
> #   -I$(ATISTREAMSDKROOT)/include -L$(ATISTREAMSDKROOT)/lib/x86_64 -lglut -lOpenCL -lboost_thread-gcc43-mt-1_39
11,12c11,12
< #CCFLAGS=-O3 -ftree-vectorize -msse -msse2 -msse3 -mssse3 -undefined dynamic_lookup -fvariable-expansion-in-unroller \
< #   -cl-fast-relaxed-math -cl-mad-enable -Wall -framework OpenCL -framework OpenGl -framework Glut
---
> CCFLAGS=-O3 -ftree-vectorize -msse -msse2 -msse3 -mssse3 -undefined dynamic_lookup -fvariable-expansion-in-unroller \
>    -cl-fast-relaxed-math -cl-mad-enable -Wall -framework OpenCL -framework OpenGl -framework Glut -lboost_thread-xgcc40-mt-1_39
Only in /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2_OSX: OSX.txt
Only in /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2_OSX: diff_alpha2.txt
diff /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2/renderdevice.cpp /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2_OSX/renderdevice.cpp
86,88c86,88
<             CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
<             sizeof(Camera),
<             camera);
---
>             CL_MEM_READ_ONLY,
>                 sizeof(Camera));
>
92,94c92,93
<          CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
<          sizeof(Sphere) * sphereCount,
<          spheres);
---
>          CL_MEM_READ_ONLY,
>          sizeof(Sphere) * sphereCount);
180,182c179,180
<          CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
<          sizeof(Vec) * workAmount,
<          colors);
---
>          CL_MEM_READ_WRITE,
>          sizeof(Vec) * workAmount);
186,188c184,185
<          CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,
<          sizeof(unsigned int) * workAmount,
<          &pixels[workOffset]);
---
>          CL_MEM_WRITE_ONLY,
>          sizeof(unsigned int) * workAmount);
199,201c196,197
<          CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
<          sizeof(unsigned int) * workAmount * 2,
<          seeds);
---
>          CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
>          sizeof(unsigned int) * workAmount * 2, seeds);
Common subdirectories: /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2/scenes and /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2_OSX/scenes
Binary files /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2/smallptGPU and /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2_OSX/smallptGPU differ



EDIT: turned the diff the other way round :oops:

I tested various balancings, but as I have only one GPU I can't test balancing between different GPUs. CPU/GPU balancing still falls short as before: I get the same speed as with either device type alone.

Jens
Last edited by jensverwiebe on Tue Jan 12, 2010 8:38 am, edited 2 times in total.

Re: ATI released OpenCL SDK with hardware support

Post by ryan » Tue Jan 12, 2010 7:00 am

Just for info: a new unbiased renderer using the GPU, called Octane Render, is being developed. It works only on NVIDIA CUDA at the moment, but the results look quite impressive (they claim a 10 to 15x speedup with a single GPU). Video here:

http://www.refractivesoftware.com/videos.html

Re: ATI released OpenCL SDK with hardware support

Post by jensverwiebe » Tue Jan 12, 2010 7:09 am

@ryan, this is a LuxRender thread; please don't pollute it with off-topic posts. We only discuss Lux development tasks here.

Jens

Re: ATI released OpenCL SDK with hardware support

Post by Dade » Tue Jan 12, 2010 7:37 am

ryan wrote: Just for info: a new unbiased renderer using the GPU, called Octane Render, is being developed. It works only on NVIDIA CUDA at the moment, but the results look quite impressive (they claim a 10 to 15x speedup with a single GPU).


Ryan, there is already an ongoing discussion about Octane here: viewtopic.php?f=17&t=3317

Re: ATI released OpenCL SDK with hardware support

Post by mitchde » Tue Jan 12, 2010 8:14 am

Thanks, Jens, for OS X 2b2!
It runs well on my 8800GTX.

Re: ATI released OpenCL SDK with hardware support

Post by jensverwiebe » Tue Jan 12, 2010 10:23 am

Dade,

I had success with SmallLuxGPU for OS X :)

It is not optimal yet (maybe take a look), but it now works on the GPU too:

[Attached image: smallLuxGPU_OSX.png]
hehehe :)


The diff so far:
Code:
Only in /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX: OSX.txt
diff /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX_broken/bvh_kernel.cl /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX/bvh_kernel.cl
63,64c63,64
<       __constant Point *verts,
<       __constant Triangle *tris) {
---
>       __global Point *verts,
>       __global Triangle *tris) {
67,69c67,69
<    __constant Point *p0 = &verts[tris[currentIndex].v[0]];
<    __constant Point *p1 = &verts[tris[currentIndex].v[1]];
<    __constant Point *p2 = &verts[tris[currentIndex].v[2]];
---
>    __global Point *p0 = &verts[tris[currentIndex].v[0]];
>    __global Point *p1 = &verts[tris[currentIndex].v[1]];
>    __global Point *p2 = &verts[tris[currentIndex].v[2]];
177,178c177,178
<       __constant Point *verts,
<       __constant Triangle *tris,
---
>       __global Point *verts,
>       __global Triangle *tris,
181c181
<       __constant BVHAccelArrayNode *bvhTree) {
---
>       __global BVHAccelArrayNode *bvhTree) {
Common subdirectories: /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX_broken/core and /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX/core
diff /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX_broken/path.cpp /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX/path.cpp
142c142
<          CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
---
>          CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
147c147
<          CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,
---
>          CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR,
152c152
<          CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
---
>          CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
157c157
<          CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
---
>          CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
162c162
<          CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
---
>          CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
Common subdirectories: /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX_broken/plymesh and /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX/plymesh
Common subdirectories: /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX_broken/scenes and /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX/scenes
Binary files /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX_broken/smallluxGPU and /Volumes/Daten250GB/openCL-testing/smallluxGPU_OSX/smallluxGPU differ


I will now try some variations of the host-pointer flags (CL_MEM_COPY_HOST_PTR, CL_MEM_USE_HOST_PTR, CL_MEM_ALLOC_HOST_PTR).

EDIT: Switching the pointer behaviour from USE to COPY works for SmallptGPU too, without any further changes!
New diff for SmallptGPU (only the renderdevice.cpp part changed):
Code:
diff /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2/renderdevice.cpp /Volumes/Daten250GB/openCL-testing/SmallptGPU-v2.0alpha2_OSX/renderdevice.cpp
86c86
<             CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
---
>             CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
92c92
<          CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
---
>          CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
180c180
<          CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
---
>          CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
186c186
<          CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,
---
>          CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR,
199c199
<          CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
---
>          CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,


Additional info about this:
When the CL_MEM_USE_HOST_PTR flag is set, the OpenCL implementation has the option of caching the data on the OpenCL device, but it keeps the buffers on the two devices synchronized; when that flag is not set, it always allocates the memory on the host device. When the CL_MEM_COPY_HOST_PTR flag is set, on the other hand, the OpenCL implementation allocates the buffer on the device. In either case, it is initialized from the data in host memory pointed to by the fourth parameter. If you set the CL_MEM_USE_HOST_PTR flag, you can force OpenCL to allocate the data on the host device by also specifying the CL_MEM_ALLOC_HOST_PTR option. You can use these options to initialize the memory buffer, to synchronize memory buffers, and to make data accessible to multiple applications. However, keep in mind that transferring data between devices is costly.

It is perfectly acceptable to create a buffer object without specifying a corresponding pointer to data on the host device. By providing the clCreateBuffer function with NULL values for the options and for the host pointer, you create a buffer object that is independent of any pointers on the host. If there is specific host data that you’d like to place in that buffer object, you can do so by enqueuing a command to write to the buffer object using the clEnqueueWriteBuffer function, as discussed in the following section, “Reading, Writing, and Copying Buffer Objects.”
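The flag and host-pointer combinations in the diffs above can be checked against the rules the Khronos specification gives for `clCreateBuffer`. The sketch below is only an illustration: `bufferArgsLookValid` is a hypothetical helper, and the flag bits are redefined locally (with the same values as in `CL/cl.h`) so that it compiles without an OpenCL SDK. Note that the specification lists CL_MEM_USE_HOST_PTR as mutually exclusive with both CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR, while CL_MEM_COPY_HOST_PTR may be combined with CL_MEM_ALLOC_HOST_PTR.

```cpp
#include <cstdint>

namespace sketch {

// Same bit values as the cl_mem_flags constants in CL/cl.h.
constexpr std::uint64_t MEM_READ_WRITE     = 1u << 0;
constexpr std::uint64_t MEM_WRITE_ONLY     = 1u << 1;
constexpr std::uint64_t MEM_READ_ONLY      = 1u << 2;
constexpr std::uint64_t MEM_USE_HOST_PTR   = 1u << 3;
constexpr std::uint64_t MEM_ALLOC_HOST_PTR = 1u << 4;
constexpr std::uint64_t MEM_COPY_HOST_PTR  = 1u << 5;

// Returns true if the flags/host pointer pairing follows the documented
// clCreateBuffer rules. It does not talk to any OpenCL device.
inline bool bufferArgsLookValid(std::uint64_t flags, const void *hostPtr) {
    // At most one access qualifier may be set (none means read-write).
    const std::uint64_t access =
        flags & (MEM_READ_WRITE | MEM_WRITE_ONLY | MEM_READ_ONLY);
    if ((access & (access - 1)) != 0) return false;

    const bool use   = (flags & MEM_USE_HOST_PTR) != 0;
    const bool alloc = (flags & MEM_ALLOC_HOST_PTR) != 0;
    const bool copy  = (flags & MEM_COPY_HOST_PTR) != 0;

    // USE_HOST_PTR excludes both ALLOC_HOST_PTR and COPY_HOST_PTR.
    if (use && (alloc || copy)) return false;

    // A host pointer must be passed exactly when USE or COPY is requested;
    // otherwise clCreateBuffer reports CL_INVALID_HOST_PTR.
    return (hostPtr != nullptr) == (use || copy);
}

}  // namespace sketch
```

Under these rules, a buffer created with only `CL_MEM_READ_ONLY` and no host pointer is valid but starts out uninitialized, so its contents would have to be uploaded separately, for instance with `clEnqueueWriteBuffer`.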







Jens
