by cwichura » Wed Feb 22, 2012 2:54 pm
Riddle me this...
I have luxrender running on my (fairly beefy) laptop, as that's where I also use DAZ Studio+Reality to create scenes. At work, we have a server that sits idle a lot of the time, so I installed lux on it as well to be a render slave using luxconsole. Both machines are 64-bit Windows and running the Feb 4th build from the Weekly Test Build forum (most current in there at this time).
My understanding of the networking support is that the master machine contacts each slave on whatever periodic interval is set (I used 30 min, FWIW) and requests (via a pull command of some sort) the current film file from it. It also connects to each slave on a fairly frequent basis to grab its current log file. The first time it talks to a slave, a UUID gets assigned so that the slave knows which master it should accept commands from while a job is actively running. If the wrong UUID is specified for a command, the slave responds "BUSY".
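To make that concrete, here is a toy model of the handshake as I understand it. This is only a sketch of my mental picture, not LuxRender's actual code: the class, the method names, the "getfilm" command string, and the assumption that the slave is the side that generates the UUID are all made up for illustration.

```python
import uuid


class Slave:
    """Toy model of the session check described above (not LuxRender code)."""

    def __init__(self):
        self.session_id = None  # no master attached yet

    def connect(self):
        # First contact: hand out a session ID only if the slave is free.
        # (Assumption: the slave generates the ID; it could equally be the master.)
        if self.session_id is not None:
            return "BUSY"
        self.session_id = str(uuid.uuid4())
        return self.session_id

    def command(self, session_id, name):
        # Any command carrying an ID other than the stored one is refused.
        if session_id != self.session_id:
            return "BUSY"
        return "OK " + name
```

If this picture is right, a master that kept its process (and therefore its UUID) alive across a suspend should still get "OK" back, which is why the BUSY reply described below surprised me.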
So, on the slave machine, I started it with "luxconsole -s -W". I added -W so that I could always grab its film file in the event the networking failed (my experience so far has been that it fails quite frequently, even on the same LAN). Since the master in theory drives the show, polling all the slaves, my assumption was that I could disconnect the laptop while it was not talking to the slave (confirmed via a netstat right before pulling the Ethernet cable), sleep/hibernate it, and take it home (I don't like to leave it at the office, since we've had laptops go missing in the night a few times thanks to sticky-fingered cleaning staff). The next day I would bring it in, plug it back in, and wake it up, and it should be able to reconnect to the slave (which has been running all night) with the correct UUID, since luxrender was never actually shut down (only the OS was suspended).
Well, it didn't work. The next morning the slave (which had done a nice job of cranking through around 800 S/p) reported BUSY when the master tried to talk to it. What's more, the film file is not being written with any real regularity. The last time it was written was at 4:40AM. It's now 1:40PM and it still hasn't updated the film file. I'm hoping that it might update it at 4:40PM (a 12-hour dump cycle), since 4:40PM is about the time it would have last had contact from the master the prior day, when I was packing up to head home. If it's really on a 12-hour dump cycle, then the -W option isn't very useful for film file recovery (via manual luxmerge) in the event that the communications between the master and slave get corrupted. The master's config file is set to write its film file once every 15 minutes. Given the way slaves inherit everything from their masters, I would have thought that setting would also control how often the slave writes its film file.
So 1) why did the slave reject the master when the UUID should have been preserved? 2) Is there some way I can tell the luxconsole process to dump its film file on demand? (If this were Unix, having luxconsole respond to a HUP signal or somesuch would be easy; I'm not sure how best to 'signal' a Windows app.) 3) How often is -W supposed to dump the film file on the slave, and is there any way to influence that? I know from the file last being dumped at 4:40AM (when the master laptop was still at home with no connectivity to the slave) that it must dump it on its own on some periodic basis, and not only when the master sends it a command to retrieve the current samples.
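For question 2, this is roughly the kind of behavior I have in mind, sketched in Python rather than anything luxconsole does today. All the names here (DumpOnSignal, install, the write_film callback) are hypothetical. The handler only sets a flag, and the main loop does the actual write, since doing file I/O inside a signal handler is fragile.

```python
import signal


class DumpOnSignal:
    """Sets a flag when a signal arrives; the main loop performs the write."""

    def __init__(self):
        self.requested = False
        self.dumps = 0

    def handler(self, signum, frame):
        # Keep the handler trivial: just record that a dump was requested.
        self.requested = True

    def poll(self, write_film):
        # Called by the main loop between units of work.
        if self.requested:
            self.requested = False
            write_film()
            self.dumps += 1


def install(dumper):
    # SIGHUP exists only on Unix; SIGBREAK (Ctrl+Break) is its Windows-only
    # counterpart in Python's signal module.
    sig = getattr(signal, "SIGHUP", None) or getattr(signal, "SIGBREAK")
    signal.signal(sig, dumper.handler)
    return sig
```

On Windows a console app would more likely need a console control event or some small command channel instead, so treat the SIGBREAK branch as one possible stand-in rather than a recommendation.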
And a feature suggestion: It would be nice to see a command line option added to luxconsole (and I suppose luxrender as well) to have it automatically drop itself to low priority. As it is now, the first thing I do whenever I start it up is go into Task Manager and manually lower its priority so it doesn't completely destroy the box.
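Until something like that exists, a wrapper can do the job from outside. Below is a minimal sketch in Python, assuming Python 3.7+ is available on the box; the helper names low_priority_kwargs and start_low_priority are my own, and the choice of "below normal" rather than "idle" is arbitrary.

```python
import os
import subprocess
import sys


def low_priority_kwargs():
    """Extra Popen arguments that start a child process at reduced priority."""
    if sys.platform == "win32":
        # Priority-class constants are exposed by subprocess on Windows (3.7+).
        return {"creationflags": subprocess.BELOW_NORMAL_PRIORITY_CLASS}
    # On Unix, raise the niceness in the child just before it execs.
    return {"preexec_fn": lambda: os.nice(10)}


def start_low_priority(cmd):
    # Launch cmd (a list of arguments) with the platform's low-priority setting.
    return subprocess.Popen(cmd, **low_priority_kwargs())
```

For the slave this would be called as start_low_priority(["luxconsole", "-s", "-W"]). From a plain command prompt, "start /low luxconsole -s -W" should achieve much the same thing without any script.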
Thanks!