KyungSoo wrote:I'm afraid I don't agree that the optimal CPU/GPU ratio would be 3:7.
It is 3:7, but you have to factor in that I was talking about a single CPU core and assuming a CPU-to-GPU performance ratio of 1:1 (which usually isn't true). I mean, if we factor in the performance of an average 4-core CPU and an average GPU, it looks like we need 1 CPU to drive 3 GPUs with our current architecture (your tests seem to confirm this number too). I think that fits the average Luxrender user well.
KyungSoo wrote:Could you let me know the result of a profiling session of a single GPU thread, too?
This is the data for the thread running on the CPU:
Code:
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self                 self     total
 time   seconds   seconds      calls   s/call   s/call  name
55.29     34.41     34.41  171963320     0.00     0.00  Path::AdvancePath(Scene*, Sampler*, RayBuffer const*, SampleBuffer*)
22.27     48.26     13.86       4020     0.00     0.00  PathIntegrator::FillRayBuffer(RayBuffer*)
 9.05     53.89      5.63  572215008     0.00     0.00  RandomSampler::GetLazyValue(Sample*)
 4.21     56.51      2.62       4015     0.00     0.01  PathIntegrator::AdvancePaths(RayBuffer const*)
 4.07     59.05      2.54   57712772     0.00     0.00  Path::Init(Scene*, Sampler*)
 3.37     61.15      2.10   57712871     0.00     0.00  RandomSampler::GetNextSample(Sample*)
 0.79     61.64      0.49         45     0.01     0.01  std::deque<RayBuffer*, std::allocator<RayBuffer*> >::_M_reallocate_map(unsigned long, bool)
 0.22     61.78      0.14          1     0.14     0.23  QBVHAccel::BuildTree(unsigned int, unsigned int, unsigned int*, BBox*, Point*, BBox const&, BBox const&, int, int, int)
 0.10     61.84      0.06    5291168     0.00     0.00  Union(BBox const&, Point const&)
 0.05     61.87      0.03   11198604     0.00     0.00  Union(BBox const&, BBox const&)
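A note on reading the table: the s/call columns show 0.00 for the hot functions because the cost of a single call is far below the two decimal places gprof prints. Dividing "self seconds" by "calls" recovers it. The helper below is only a sketch of that arithmetic; the function name is made up for illustration and is not part of gprof or Luxrender.

```cpp
// Hypothetical helper: average self time per call, in nanoseconds,
// computed from the "self seconds" and "calls" columns of a gprof flat profile.
double SelfNanosecondsPerCall(double selfSeconds, unsigned long long calls) {
    return selfSeconds / static_cast<double>(calls) * 1e9;
}
```

With the numbers above, `SelfNanosecondsPerCall(34.41, 171963320ULL)` comes out at roughly 200 ns per `Path::AdvancePath` call, and `SelfNanosecondsPerCall(5.63, 572215008ULL)` at roughly 10 ns per `RandomSampler::GetLazyValue` call.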
Just to give an idea of the times: the 9% spent in "RandomSampler::GetLazyValue(Sample*)" is nothing more than a call to the Tausworthe random number generator.
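For readers who have not seen one, here is a minimal sketch of a combined Tausworthe generator, to show how little work each call does (a handful of shifts, masks and XORs). It uses the well-known three-component "taus88" parameters from L'Ecuyer; this is an assumption for illustration, not the code from the Luxrender sources, and the class name, the seeding scheme and the float conversion are all hypothetical.

```cpp
#include <cstdint>

// Illustrative taus88-style generator (assumed variant, not the Luxrender code).
class Taus88 {
public:
    explicit Taus88(std::uint32_t seed) {
        // Spread the seed with a simple LCG (hypothetical seeding scheme).
        s1_ = Lcg(seed);
        s2_ = Lcg(s1_);
        s3_ = Lcg(s2_);
        // taus88 requires s1 > 1, s2 > 7 and s3 > 15.
        if (s1_ < 2u)  s1_ += 2u;
        if (s2_ < 8u)  s2_ += 8u;
        if (s3_ < 16u) s3_ += 16u;
    }

    // Next 32-bit value: XOR of three Tausworthe components.
    std::uint32_t NextUInt() {
        const std::uint32_t a = Step(s1_, 13, 19, 12, 0xFFFFFFFEu);
        const std::uint32_t b = Step(s2_, 2, 25, 4, 0xFFFFFFF8u);
        const std::uint32_t c = Step(s3_, 3, 11, 17, 0xFFFFFFF0u);
        return a ^ b ^ c;
    }

    // Uniform float in [0, 1), built from the top 24 bits.
    float NextFloat() {
        return static_cast<float>(NextUInt() >> 8) * (1.0f / 16777216.0f);
    }

private:
    static std::uint32_t Lcg(std::uint32_t x) {
        return 1664525u * x + 1013904223u;
    }

    // One Tausworthe component step; updates the state z in place.
    static std::uint32_t Step(std::uint32_t &z, int shift1, int shift2,
                              int shift3, std::uint32_t mask) {
        const std::uint32_t b = ((z << shift1) ^ z) >> shift2;
        z = ((z & mask) << shift3) ^ b;
        return z;
    }

    std::uint32_t s1_, s2_, s3_;
};
```

Usage would look like `Taus88 rng(42u); float u = rng.NextFloat();`. Even something this cheap adds up to several seconds of the profile when it is called about 572 million times.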
