Mantis Bug Tracker

ID: 0001005
Project: LuxRender
Category: Core
View Status: public
Date Submitted: 2011-03-30 06:01
Last Update: 2013-05-20 12:54
Reporter: sramij
Assigned To: Dade
Priority: normal
Severity: major
Reproducibility: have not tried
Status: closed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 0.8RC2
Target Version: 1.3
Fixed in Version:
Summary: 0001005: OpenCL local work group size is not calculated correctly
Description: At pathgpu.cpp:957, the benchmark queries the OpenCL SDK for the CL_KERNEL_WORK_GROUP_SIZE of this kernel. Later on, the returned value is used as is for the local work group size parameter of NDRange. This is not right, because the benchmark needs to provide a LocalWorkGroupSize that evenly divides the GlobalWorkSize, and it doesn't validate that whatever the SDK returned does divide the GlobalWorkSize.

In a good scenario, the benchmark needs to take the value X returned by the CL_KERNEL_WORK_GROUP_SIZE query and calculate a new value Y such that:
((Y <= X) and (GLOBAL_WORK_SIZE % Y == 0))
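The condition above could be satisfied with a simple downward search. The following is only a minimal sketch of that idea, not code from LuxMark; the function name LargestDividingLocalSize and its parameters are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper (not from the LuxMark source): returns the largest Y
// such that Y <= maxLocalSize and globalSize % Y == 0.
// maxLocalSize stands for the value X returned by the
// CL_KERNEL_WORK_GROUP_SIZE query. Returns 0 if either argument is 0.
std::size_t LargestDividingLocalSize(std::size_t globalSize,
                                     std::size_t maxLocalSize) {
    if (globalSize == 0 || maxLocalSize == 0)
        return 0;

    // Y can never exceed the global size itself.
    std::size_t y = (maxLocalSize < globalSize) ? maxLocalSize : globalSize;

    // Walk down until Y divides the global size; Y == 1 always does.
    while (globalSize % y != 0)
        --y;
    return y;
}
```

Note that for an awkward global size (a prime number, for instance) this search falls back to 1, which is valid but may perform poorly.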

Tags: No tags attached.
Mercurial Changeset #:
Requires Documentation Update: No
Requires Exporter Update:
Attached Files:

- Relationships

-  Notes
(0002895)
Dade (developer)
2011-04-06 05:30

You are talking about SLG, not LuxRender, right? Or LuxMark?

The workgroup size is always forced to 64 in all the example scenes and in any scene saved with the Blender exporter. The user can change the forced value.

This is because the workgroup size suggested by OpenCL has never proven optimal: in every test I have done, it gave worse performance than 64.

64 is the usual workgroup size for ATI hardware. NVIDIA could work with 32, but 64 works well there too.
(0002914)
sramij (reporter)
2011-04-09 23:59

No, I am talking about the LuxMark benchmark. Actually, I don't know what SLG is.
OpenCL doesn't receive 64 as input to NDRange; it receives the same value previously returned by the CL_KERNEL_WORK_GROUP_SIZE query.
(0003921)
Dade (developer)
2013-05-16 03:33

I fixed this problem by rounding up the GLOBAL_WORK_SIZE to a multiple of WORKGROUP_SIZE. The kernel already includes code to do nothing if the global ID is outside the real GLOBAL_WORK_SIZE.
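As an illustration of the rounding described in this note, here is a minimal host-side sketch. It is not the actual LuxRays code; the name RoundUpToMultiple is hypothetical, and the kernel-side guard appears only as a comment.

```cpp
#include <cstddef>

// Hypothetical helper (not from the LuxRays source): rounds taskCount up to
// the next multiple of workGroupSize, so the padded value can be used as the
// global work size. If workGroupSize is 0, taskCount is returned unchanged.
std::size_t RoundUpToMultiple(std::size_t taskCount,
                              std::size_t workGroupSize) {
    if (workGroupSize == 0)
        return taskCount;
    return ((taskCount + workGroupSize - 1) / workGroupSize) * workGroupSize;
}

// The padding work items must do nothing. In OpenCL C, a kernel would
// typically start with a guard along these lines (taskCount being an
// assumed kernel argument holding the real number of tasks):
//
//     if (get_global_id(0) >= taskCount)
//         return;
```

For example, 1000 tasks with a workgroup size of 64 would be enqueued with a global size of 1024, and the last 24 work items would exit immediately.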
(0003922)
sramij (reporter)
2013-05-16 04:20

So why are you querying for KernelWorkGroupSize from the underlying framework?
(0003923)
jensverwiebe (developer)
2013-05-16 06:57
edited on: 2013-05-16 10:03

Hi Dade

I once forced the CPU workgroups to 1 for a good reason: there is a bug that reports the Intel CPU
as having 1024 possible workgroups. Somehow this always results in wrong work-items, etc. At least,
there was never a benefit from having more than 1 wg.

With your change, I now get a calculated size of 128 for my Xeon CPU, which again results in only
1/10th of the speed I had before.

See some hints here: http://wiki.tiker.net/OpenCLOddities

Quote: Apple, CPU - Only allows one work item per work group. (mapping to one thread per CPU)

This is still a valid fact atm, at least on OSX/Apple. The AMD SDK, even though it allows more workgroups per CPU, does not take any advantage of it.

If you have had other experiences with newer SDKs, we should again use 1 workgroup per CPU on Apple then. (I have a code snippet ready to get the CPU count if needed.)


Jens

(0003927)
Dade (developer)
2013-05-17 01:18
edited on: 2013-05-17 01:23

Sramij, I query the framework for a valid workgroup size, then I round up the task count (i.e. GLOBAL_WORK_SIZE) to a valid number when queuing the kernel execution. What is the problem?

An example of what I did: http://src.luxrender.net/luxrays/file/af1954657805/src/slg/engines/pathocl/pathoclthread.cpp#l1386

(0003929)
Dade (developer)
2013-05-17 01:21

Jens, the Apple driver really shouldn't return a workgroup size that leads to 1/10th of the performance. That said, I agree that we can easily solve this problem with an "#ifdef __APPLE__" so that 1 is the default workgroup size for CPU devices on the Apple platform.
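The conditional suggested here might look roughly like the sketch below. This is an assumption about the shape of such a fix, not the change that was actually committed; DeviceKind and DefaultWorkGroupSize are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical device classification, for illustration only.
enum class DeviceKind { CPU, GPU };

// Hypothetical helper: picks the default workgroup size for a device.
// fallbackSize is whatever the caller would otherwise use (for example the
// size queried from the OpenCL framework). On Apple platforms, CPU devices
// get 1; everything else keeps fallbackSize.
std::size_t DefaultWorkGroupSize(DeviceKind kind, std::size_t fallbackSize) {
#ifdef __APPLE__
    if (kind == DeviceKind::CPU)
        return 1;
#else
    (void)kind;  // Unused on non-Apple platforms.
#endif
    return fallbackSize;
}
```

With this shape, a user-provided setting such as opencl.cpu.workgroup.size could still override the default where one is given.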
(0003932)
jensverwiebe (developer)
2013-05-17 02:49
edited on: 2013-05-17 03:18

Dade,

The huge speed loss can be explained by the fact that slg4-based LuxMark reduced my CPU benchmarks
significantly anyway (around 1/5 slower), but I never complained because it is all WIP.

I should show some numbers here soon ...


EDIT: render-hdr.cfg ( luxball )

- luxmark/slg4 opencl.cpu.workgroup.size = 1 -> 3336
- auto set ( 128 ? ) -> 484

- old luxmark 2.0b2 ( with my fix ): 4225

Jens

(0003937)
jensverwiebe (developer)
2013-05-19 06:15
edited on: 2013-05-19 06:16

I added an Apple conditional and set wg to default 1 again.

Back to around 3300 in the bench again; still wondering where the rest of the
performance gets lost.

From 4200 to 3300 is still a significant speed loss ...

Jens


- Issue History
Date Modified Username Field Change
2011-03-30 06:01 sramij New Issue
2011-03-30 06:02 sramij Severity minor => major
2011-04-06 05:23 Dade Assigned To => Dade
2011-04-06 05:23 Dade Status new => assigned
2011-04-06 05:30 Dade Note Added: 0002895
2011-04-09 23:59 sramij Note Added: 0002914
2011-05-24 12:55 jeanphi Target Version => 1.0
2012-08-21 06:51 jeanphi Target Version 1.0 => 1.1
2012-08-21 10:16 jeanphi Target Version 1.1 =>
2013-02-25 05:22 jeanphi Target Version => 1.3
2013-05-16 03:33 Dade Note Added: 0003921
2013-05-16 03:33 Dade Status assigned => feedback
2013-05-16 04:20 sramij Note Added: 0003922
2013-05-16 04:20 sramij Status feedback => assigned
2013-05-16 04:20 sramij Status assigned => feedback
2013-05-16 06:57 jensverwiebe Note Added: 0003923
2013-05-16 07:00 jensverwiebe Note Edited: 0003923
2013-05-16 07:02 jensverwiebe Note Edited: 0003923
2013-05-16 07:03 jensverwiebe Note Edited: 0003923
2013-05-16 07:09 jensverwiebe Note Edited: 0003923
2013-05-16 07:12 jensverwiebe Note Edited: 0003923
2013-05-16 07:12 jensverwiebe Note Edited: 0003923
2013-05-16 07:14 jensverwiebe Note Edited: 0003923
2013-05-16 07:15 jensverwiebe Note Edited: 0003923
2013-05-16 10:03 jensverwiebe Note Edited: 0003923
2013-05-17 01:18 Dade Note Added: 0003927
2013-05-17 01:21 Dade Note Added: 0003929
2013-05-17 01:23 Dade Note Edited: 0003927
2013-05-17 02:49 jensverwiebe Note Added: 0003932
2013-05-17 02:49 jensverwiebe Note Edited: 0003932
2013-05-17 03:18 jensverwiebe Note Edited: 0003932
2013-05-19 06:15 jensverwiebe Note Added: 0003937
2013-05-19 06:15 jensverwiebe Note Edited: 0003937
2013-05-19 06:16 jensverwiebe Note Edited: 0003937
2013-05-20 12:54 Dade Status feedback => closed
2013-05-20 12:54 Dade Resolution open => fixed

