Network Rendering Set-Up Times


Postby Reggie68 » Sun Jun 03, 2012 4:35 am

Since 1.0 RC1, I've noticed that initial scene transmission can take up to 20 minutes. That's on a local gigabit network, where it used to take around 30 seconds to transmit the same scene with a late 0.9 weekly.

The scenes are made up of individual PLY files for each surface, due to the way Reality exports by default. When watching the CLI on the server, it looks like the main node waits a second or so between files. When you have over 500 separate files, this can mean at least a 10-minute transmission and scene set-up time.
Reggie68
 
Posts: 79
Joined: Tue Sep 14, 2010 2:35 pm

Re: Network Rendering Set-Up Times

Postby cwichura » Sun Jun 03, 2012 5:05 am

If you run it with verbose logging, you can see there are a lot of messages that you don't see without verbose (it was the same for 0.9). If you run the slave that way, do you still see a pause between timestamps?

I'm also a Reality user with similar file counts to a scene, and the new setup logic typically takes about 2.5-3.5 minutes on local LAN, and about 7-8 when connected in remotely via VPN (assuming all the files are already cached on the slave, such as after restarting to tweak something). On local LAN in the office, that's similar/faster than what 0.9 was for me, and for VPN, it's WAY faster since it doesn't have to re-transmit all the files (and at this point, I've got a lot of my common textures, IBLs, etc., cached on the slaves from usage).

There is still the random slave crash on startup with the new network logic, though, which I haven't been able to figure out how to reliably reproduce. And it hasn't happened since I asked Lord CRC to include the .pdbs with the testing builds so that we can provide more detailed info when it does crash (assuming you have a debugger on the machine).

FWIW, I'm running the 2012-05-28 build from the weekly testing forum on Windows 7 64-bit.
cwichura
 
Posts: 351
Joined: Sun Feb 12, 2012 11:31 pm

Re: Network Rendering Set-Up Times

Postby Reggie68 » Sun Jun 03, 2012 6:55 am

Are you exporting as Binary PLY files or Lux Native? I've only noticed it with the latest 1.0 dev builds (2205 as IES doesn't work in 2805), I can't remember if it happened with the RC1.

If I export as Lux Native, then apart from the size of the lxo file, it isn't a real problem. It's when there are loads of small PLY files to transfer, from 1 KB to 1 MB in size, and each one takes a second to transfer.
Reggie68
 
Posts: 79
Joined: Tue Sep 14, 2010 2:35 pm

Re: Network Rendering Set-Up Times

Postby Lord Crc » Sun Jun 03, 2012 8:46 am

I added some code to cache files on the slaves just before RC1. This involves more slave <-> master communication, and for some reason it seems it's not flushing properly somewhere. This means that when either the slave or the master (not sure which yet) sends a message to the other, it has to wait for a buffer timeout before the packet is sent. It's only a second or so per message, but this adds up quickly.

I'm hunting it down but it's a bit tricky.
May contain traces of nuts.
Lord Crc
Developer
 
Posts: 4451
Joined: Sat Nov 17, 2007 2:10 pm

Re: Network Rendering Set-Up Times

Postby cwichura » Sun Jun 03, 2012 1:07 pm

Reggie68 wrote:Are you exporting as Binary PLY files or Lux Native? I've only noticed it with the latest 1.0 dev builds (2205 as IES doesn't work in 2805), I can't remember if it happened with the RC1.

I use binary PLYs.
cwichura
 
Posts: 351
Joined: Sun Feb 12, 2012 11:31 pm

Re: Network Rendering Set-Up Times

Postby Lord Crc » Sun Jun 10, 2012 6:05 am

Seems my explicit flushes were not enough. The solution I've come up with is to disable Nagle's algorithm entirely, which does seem to solve the issue.

This will cause a bit of extra packet traffic, hopefully it won't be too much of an issue over slower links.

I'll make a weekly build tonight, if you want to check it out.
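For anyone following along, here is a minimal sketch of what disabling Nagle's algorithm looks like at the socket level. It is written in Python rather than taken from LuxRender's actual networking code, and the helper names `make_low_latency_socket` and `nagle_disabled` are made up for illustration. The only real API it relies on is the standard `TCP_NODELAY` socket option.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks the stack to send small writes immediately
    # instead of holding them back to coalesce with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_low_latency_socket()
    print("Nagle disabled:", nagle_disabled(s))
    s.close()
```

The option is set per socket, so in a master/slave setup both ends would presumably need it for small request and answer messages to go out without waiting.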
May contain traces of nuts.
Lord Crc
Developer
 
Posts: 4451
Joined: Sat Nov 17, 2007 2:10 pm

Re: Network Rendering Set-Up Times

Postby cwichura » Sun Jun 10, 2012 6:51 am

Lord Crc wrote:This will cause a bit of extra packet traffic, hopefully it won't be too much of an issue over slower links

Does the master wait for a confirmation after every message it sends? (I'm leaving aside un-cached file data, which you chunk into 1 MB writes anyway, so Nagle shouldn't matter there.) If it's waiting for confirmation from the client after every message anyway, then I don't think disabling Nagle will add much overhead, since Nagle wouldn't have had multiple packets to glom together in the first place.
cwichura
 
Posts: 351
Joined: Sun Feb 12, 2012 11:31 pm

Re: Network Rendering Set-Up Times

Postby jeanphi » Sun Jun 10, 2012 7:12 am

Hi,

You also have replaced all endl by "\n", was endl causing issues?

Jeanphi
jeanphi
Developer
 
Posts: 6573
Joined: Mon Jan 14, 2008 7:21 am

Re: Network Rendering Set-Up Times

Postby Lord Crc » Sun Jun 10, 2012 8:13 am

jeanphi wrote:You also have replaced all endl by "\n", was endl causing issues?


No, but endl causes an explicit flush, and I saw no point in that as I set up my code to flush when needed. It was Nagle's algorithm that caused the issue, adding a 200 ms delay on each message, i.e. 400 ms for a simple request + answer. I foolishly expected "flush" to, well, flush...
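To put rough numbers on that, here is a back-of-the-envelope sketch. The 200 ms per message and the 500-file count come from this thread. The assumption that each file costs exactly one request plus one answer is mine, and the function name is made up for illustration.

```python
def setup_delay_seconds(file_count: int,
                        messages_per_file: int = 2,
                        delay_ms_per_message: int = 200) -> float:
    """Estimate the total time lost to per-message delays during scene setup.

    Assumes every message is held back by the same fixed delay and that
    messages are exchanged strictly one after another.
    """
    total_ms = file_count * messages_per_file * delay_ms_per_message
    return total_ms / 1000.0


if __name__ == "__main__":
    # 500 files, one request + one answer each, 200 ms per message
    delay = setup_delay_seconds(500)
    print(f"Estimated delay: {delay:.0f} s ({delay / 60:.1f} min)")
```

More round trips per file would scale the total up proportionally, so this is a lower bound under the stated assumption rather than a measurement.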
May contain traces of nuts.
Lord Crc
Developer
 
Posts: 4451
Joined: Sat Nov 17, 2007 2:10 pm

