Dade wrote: KyungSoo wrote: Chiaroscuro wrote: That's interesting... one and two GPUs can be mostly utilized (although it still only says 85%?), but at three the efficiency really starts to drop (individually; together they plateau) from there on (I wonder if the bus also gets busy?). Can you imagine if all 8 could be mostly utilized? You'd be getting over 12 million samples per second on that scene. Seems like it's going to be a challenge.
I think it is possible if the LuxRay developers move some CPU tasks to the GPU, such as rayBuffer feeding.
By removing the CPU dependency, the remaining CPU tasks will be done more efficiently, too.
While this is doable (and I have done it in SmallptGPU; by the way, you could give SmallptGPU2 a spin for a GPU-only vs. CPU+GPU comparison), it is not applicable to Luxrender, so it is a bit out of my scope. I cannot port all the Luxrender code to the GPU. Not only would it require an insane amount of time, but it would not even work (too many material, texture, and light source types, etc.). I'm looking for a solution where I can port 1% of the code (i.e. only the ray intersection code) and still have 99% of the classic Luxrender features.
Please note that your system is highly asymmetrical (1 CPU + 8 GPUs), so it isn't really a surprise if a CPU+GPU architecture doesn't scale well there. Most users have just 1 CPU + 1 GPU or 1 CPU + 2 GPUs. Your system configured with 2 CPUs would produce the awesome results Chiaroscuro was talking about (i.e. 12M samples/sec).
P.S. Are your 8 GPUs installed on the same PCIe 2.0 bus? All 16x? 8x? I wonder too, like Chiaroscuro, if PCI bus performance has some influence. Both ATI and NVIDIA provide a sample application in their SDKs to evaluate PCI performance. It would be quite interesting to patch the application to work with multiple GPUs and to evaluate how the available PCI bus bandwidth changes with multiple GPUs.
I understand your near-future plan, thanks.
As for the PCIe bandwidth issue, of course overall performance depends on PCIe bandwidth, too.
And that is another reason why we should let the GPUs be independent (i.e. work without being fed by the CPU).
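To illustrate the point, here is a rough toy model in Python. It is not LuxRay code, only a sketch of the bottleneck argument: the aggregate rate is capped by the slowest stage (GPU compute, CPU feeding, or bus). The per-GPU rate of 1.5M samples/sec is just the 12M figure from this thread divided by 8; the CPU feed rate of 4M samples/sec is a made-up number chosen so that the plateau appears around three GPUs, and the function names are my own.

```python
# Toy bottleneck model: NOT LuxRay code, all rates are assumptions.

def total_samples_per_sec(n_gpus, gpu_rate, cpu_feed_rate, bus_rate=float("inf")):
    """Aggregate throughput, capped by the slowest of the three stages."""
    return min(n_gpus * gpu_rate, cpu_feed_rate, bus_rate)


def per_gpu_utilization(n_gpus, gpu_rate, cpu_feed_rate, bus_rate=float("inf")):
    """Fraction of each GPU's peak rate that is actually used."""
    total = total_samples_per_sec(n_gpus, gpu_rate, cpu_feed_rate, bus_rate)
    return total / (n_gpus * gpu_rate)


if __name__ == "__main__":
    GPU_RATE = 1.5e6   # samples/sec per GPU (12M / 8, from the thread)
    CPU_FEED = 4.0e6   # samples/sec the CPU can feed (hypothetical)
    for n in range(1, 9):
        total = total_samples_per_sec(n, GPU_RATE, CPU_FEED)
        util = per_gpu_utilization(n, GPU_RATE, CPU_FEED)
        print(f"{n} GPU(s): {total / 1e6:.1f}M samples/sec, {util:.0%} per GPU")
```

In this model the total stops growing once the CPU feed rate is reached, while the per-GPU utilization keeps falling. Removing the CPU cap (or doubling it with a second CPU) lets the total grow until the next limit, which could be the bus.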
The following screen dump was made on a machine where all slots are 16x and which has a high-end CPU.