So with the new weekly build, things are a bit different, but still bad.
Started a slave with 'luxconsole -V -s -W'. Started up a render on the master with 'luxconsole -V -i 300 -u 192.168.41.150 scene.lxs'. Waited for it to get to the point where it was actually transmitting the film over the wire (it takes a while to first write the film to disk). I then pulled the network uplink cable from my switch to the core switch (so neither the slave or the master sees their own network interface change state, but the network link between them is broken).
I didn't get the usual IO exception (failbit set) exception on the master, though that might be because I was using luxconsole for the verbose output where the master is usually running the full LuxRender interface. After waiting a good 30 seconds for the master to report the failure, I gave up and plugged the network cable back in.
On the master, it still had the (now dead) TCP session showing in netstat. On the slave, the TCP session is gone; it's not listed in netstat. The slave console output at the end is (starting from when it gets the initial get film request):
- Code: Select all
[Lux 2012-May-14 10:16:13 DEBUG : 0] Server receiving commands...
[Lux 2012-May-14 10:16:13 DEBUG : 0] ... processing command: 'luxGetFilm'
[Lux 2012-May-14 10:16:13 DEBUG : 0] Validating SID: e3670bcd-58f6-41af-9854-e4359f67d766 = e3670bcd-58f6-41af-9854-e4359f67d766
[Lux 2012-May-14 10:16:13 INFO : 0] Transmitting film samples
[Lux 2012-May-14 10:16:13 DEBUG : 0] Writing film samples to file 'server_resume_18018.flm.temp'
[Lux 2012-May-14 10:16:13 DEBUG : 0] Transmitting film (little endian=true)
[Lux 2012-May-14 10:16:17 DEBUG : 0] Transmitted 2.49542e+007 samples for buffer group 0 (buffer config size: 2)
[Lux 2012-May-14 10:16:18 INFO : 0] 00:05:06 - 16 T: 6.74 S/p 81.51 kS/s 114.52 kC/s 141% Eff
[Lux 2012-May-14 10:16:22 DEBUG : 0] Transmitted 2.52951e+007 samples for buffer group 1 (buffer config size: 2)
[Lux 2012-May-14 10:16:23 INFO : 0] 00:05:11 - 16 T: 6.84 S/p 81.39 kS/s 114.36 kC/s 140% Eff
[Lux 2012-May-14 10:16:26 DEBUG : 0] Transmitted 2.56328e+007 samples for buffer group 2 (buffer config size: 2)
[Lux 2012-May-14 10:16:28 INFO : 0] 00:05:16 - 16 T: 6.94 S/p 81.30 kS/s 114.23 kC/s 141% Eff
[Lux 2012-May-14 10:16:31 DEBUG : 0] Transmitted 2.59779e+007 samples for buffer group 3 (buffer config size: 2)
[Lux 2012-May-14 10:16:33 INFO : 0] 00:05:21 - 16 T: 7.05 S/p 81.21 kS/s 114.10 kC/s 140% Eff
[Lux 2012-May-14 10:16:36 DEBUG : 0] Transmitted 2.63251e+007 samples for buffer group 4 (buffer config size: 2)
[Lux 2012-May-14 10:16:38 INFO : 0] 00:05:26 - 16 T: 7.15 S/p 81.12 kS/s 113.96 kC/s 140% Eff
[Lux 2012-May-14 10:16:40 DEBUG : 0] Transmitted 2.66566e+007 samples for buffer group 5 (buffer config size: 2)
[Lux 2012-May-14 10:16:43 INFO : 0] 00:05:31 - 16 T: 7.25 S/p 81.03 kS/s 113.83 kC/s 140% Eff
[Lux 2012-May-14 10:16:45 DEBUG : 0] Transmitted 2.69997e+007 samples for buffer group 6 (buffer config size: 2)
[Lux 2012-May-14 10:16:48 INFO : 0] 00:05:36 - 16 T: 7.35 S/p 80.94 kS/s 113.70 kC/s 140% Eff
[Lux 2012-May-14 10:16:53 INFO : 0] 00:05:41 - 16 T: 7.45 S/p 80.85 kS/s 113.56 kC/s 140% Eff
[Lux 2012-May-14 10:16:58 INFO : 0] 00:05:46 - 16 T: 7.55 S/p 80.76 kS/s 113.45 kC/s 140% Eff
[Lux 2012-May-14 10:17:03 INFO : 0] 00:05:51 - 16 T: 7.65 S/p 80.68 kS/s 113.32 kC/s 140% Eff
[Lux 2012-May-14 10:17:08 INFO : 0] 00:05:56 - 16 T: 7.76 S/p 80.60 kS/s 113.20 kC/s 140% Eff
[Lux 2012-May-14 10:17:13 INFO : 0] 00:06:01 - 16 T: 7.86 S/p 80.52 kS/s 113.10 kC/s 140% Eff
[Lux 2012-May-14 10:17:18 INFO : 0] 00:06:06 - 16 T: 7.96 S/p 80.43 kS/s 112.98 kC/s 140% Eff
[Lux 2012-May-14 10:17:23 INFO : 0] 00:06:11 - 16 T: 8.06 S/p 80.35 kS/s 112.87 kC/s 140% Eff
[Lux 2012-May-14 10:17:28 INFO : 0] 00:06:16 - 16 T: 8.16 S/p 80.27 kS/s 112.75 kC/s 140% Eff
[Lux 2012-May-14 10:17:33 INFO : 0] 00:06:21 - 16 T: 8.26 S/p 80.20 kS/s 112.66 kC/s 140% Eff
[Lux 2012-May-14 10:17:38 INFO : 0] 00:06:26 - 16 T: 8.36 S/p 80.13 kS/s 112.56 kC/s 140% Eff
[Lux 2012-May-14 10:17:43 DEBUG : 0] Transmitted a film with 1.81841e+008 samples
[Lux 2012-May-14 10:17:43 INFO : 0] Film transmission done (456648 Kbytes sent)
[Lux 2012-May-14 10:17:43 DEBUG : 0] Transmitting film samples from file 'server_resume_18018.flm'
[Lux 2012-May-14 10:17:43 INFO : 0] 00:06:31 - 16 T: 8.46 S/p 80.06 kS/s 112.46 kC/s 140% Eff
[Lux 2012-May-14 10:17:48 INFO : 0] 00:06:36 - 16 T: 8.56 S/p 80.01 kS/s 112.39 kC/s 140% Eff
[Lux 2012-May-14 10:17:53 INFO : 0] 00:06:41 - 16 T: 8.67 S/p 79.96 kS/s 112.31 kC/s 140% Eff
And then it keeps calculating. It never reports an error about the TCP session dying, or anything else besides the statistics updates once every 5 seconds. I let it run for ten minutes this way. The only output on the master was also the statistics updates. I know from experience running the full LuxRender GUI that the mutex the master grabs when fetching from slaves is still held, and blocks pretty much everything, even the tonemapper from updating the framebuffer.
I then manually killed the master (ctrl-C) and the dead TCP session is cleaned up. I cursor'd up to re-launch the master and this is as far as it gets:
- Code: Select all
C:\t>luxconsole -i 300 -V -u 192.168.41.150 reality_scene.lxs
[Lux 2012-May-14 10:29:45 INFO : 0] Lux version 1.0RC1 of May 11 2012 at 20:47:0
5
[Lux 2012-May-14 10:29:45 INFO : 0] Threads: 8
[Lux 2012-May-14 10:29:45 INFO : 0] Connecting server: 192.168.41.150:18018
At this point, a netstat on both the master and the slave shows the TCP session is established. But the slave never responds, and the master is hung. It doesn't even get to start loading the scene itself, since it's hung waiting for that initial response from the slave that is off in lala land.