luxconsole suggestions

Postby mtoivo » Fri Aug 26, 2011 4:59 pm

Hi.

You probably know me from my popular "making network rendering easier" thread on the general discussion board. I've made some arrangements to help with hanging luxconsole processes, but I would like to suggest a few things.

The first is the situation where the master luxconsole process crashes. The slaves happily keep on processing samples and wait for their now-dead master to call back someday. But if the master restarts and tries to connect to the slaves, they just say "I'm BUSY here, can't you see, go away". I have to admit I haven't looked much into the network code (which I should, I know), but I've concluded there are no persistent connections from master to slaves; the master only connects to gather samples etc. Whether the connection should be permanent, I don't know, but if it were up all the time, the slaves could react instantly when the master disappears. At the least, it would be nice if some kind of timer killed the processing when the master hasn't contacted the slaves for a while. Do you agree?

Another issue in network rendering is the "over-sampling" that occurs more or less every time. When the master reaches the halt spp value, it could tell the slaves to stop too. You might think this is no big deal, because the sampling usually stops at the latest once all slaves have been contacted for the last sample gathering. But with a cluster of 64 slaves, that takes a while, I tell you. The rendering itself might be completed in ~10 minutes with enough cores sampling, but the over-sampling can take the same time, if not more. This is not a show stopper, because it can be overcome with different master/slave combinations, but it is an idea worth thinking about.
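To put rough numbers on the over-sampling, here is a minimal sketch. It assumes the master collects from the slaves one at a time and that each slave keeps sampling until its turn comes. The function name, the "stop first" message and the 10 s per transfer are made up for illustration; none of them are taken from luxconsole.

```python
def wasted_slave_seconds(num_slaves: int, gather_seconds: float, stop_first: bool) -> float:
    """Total slave-time spent sampling after the master has hit its halt spp.

    Assumed model (not taken from the luxconsole source): the master
    contacts slaves one after another, each film transfer takes
    `gather_seconds`, and a slave stops sampling only once it is told to.
    """
    if stop_first:
        # Hypothetical "stop now" message sent to every slave before any
        # film is transferred; its own latency is ignored in this model.
        return 0.0
    # Slave i (0-based) keeps sampling while the i slaves before it are served.
    return sum(i * gather_seconds for i in range(num_slaves))


sequential = wasted_slave_seconds(64, 10.0, stop_first=False)
broadcast = wasted_slave_seconds(64, 10.0, stop_first=True)
print(f"sequential gather: {sequential / 60:.0f} slave-minutes of extra sampling")
print(f"stop sent first:   {broadcast / 60:.0f} slave-minutes of extra sampling")
```

With these assumed figures, the last slave in the queue keeps sampling for 63 x 10 s = 630 s after the master is done, which is in the same range as the ~10 minute render.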

The last two things are about flattening an FLM to a PNG. In LuxGUI you can open a plain FLM, but with luxconsole you can't: you have to have the .lxs and the rest of the crew too. Is this absolutely necessary? I mean, if there's no need to sample anymore, just output the PNG. Why is this important? In my case, I'm merging many FLMs into one and then outputting a PNG from the result. Keeping the .lxs and the export directory around just for the final .png output to work seems a bit dull. Also, when rendering the FLMs that are about to be merged into one, there's really no need to have a PNG of those sub-FLMs. But if I turn PNG writing off, the final output will not produce a .png either, since the decision to do that is written in the .lxs somewhere.

Re: luxconsole suggestions

Postby jeanphi » Sat Aug 27, 2011 3:31 am

mtoivo wrote:The first is the situation where the master luxconsole process crashes. The slaves happily keep on processing samples and wait for their now-dead master to call back someday. But if the master restarts and tries to connect to the slaves, they just say "I'm BUSY here, can't you see, go away". I have to admit I haven't looked much into the network code (which I should, I know), but I've concluded there are no persistent connections from master to slaves; the master only connects to gather samples etc. Whether the connection should be permanent, I don't know, but if it were up all the time, the slaves could react instantly when the master disappears. At the least, it would be nice if some kind of timer killed the processing when the master hasn't contacted the slaves for a while. Do you agree?

Worth looking into, but your idea also has drawbacks: imagine you have a network failure (even a very brief disconnect). By reacting instantly you might destroy hours of work.

mtoivo wrote:Another issue in network rendering is the "over-sampling" that occurs more or less every time. When the master reaches the halt spp value, it could tell the slaves to stop too. You might think this is no big deal, because the sampling usually stops at the latest once all slaves have been contacted for the last sample gathering. But with a cluster of 64 slaves, that takes a while, I tell you. The rendering itself might be completed in ~10 minutes with enough cores sampling, but the over-sampling can take the same time, if not more. This is not a show stopper, because it can be overcome with different master/slave combinations, but it is an idea worth thinking about.

That's because haltspp waits for a safe stop point; maybe an option to stop immediately is in order.

mtoivo wrote:The last two things are about flattening an FLM to a PNG. In LuxGUI you can open a plain FLM, but with luxconsole you can't: you have to have the .lxs and the rest of the crew too. Is this absolutely necessary? I mean, if there's no need to sample anymore, just output the PNG. Why is this important? In my case, I'm merging many FLMs into one and then outputting a PNG from the result. Keeping the .lxs and the export directory around just for the final .png output to work seems a bit dull. Also, when rendering the FLMs that are about to be merged into one, there's really no need to have a PNG of those sub-FLMs. But if I turn PNG writing off, the final output will not produce a .png either, since the decision to do that is written in the .lxs somewhere.

That's because opening an FLM from the GUI is meant for adjusting the postprocessing of the render. For that you need either a GUI (which luxrender has) or a whole bunch of command line options (which luxconsole lacks). I think doing that from the command line would be best handled by a completely different tool, or integrated into luxmerger, rather than integrated into luxconsole.

Jeanphi

Re: luxconsole suggestions

Postby Abel » Sat Aug 27, 2011 7:46 am

jeanphi wrote:
mtoivo wrote:The first is the situation where the master luxconsole process crashes. [...] At the least, it would be nice if some kind of timer killed the processing when the master hasn't contacted the slaves for a while. Do you agree?

Worth looking into, but your idea also has drawbacks: imagine you have a network failure (even a very brief disconnect). By reacting instantly you might destroy hours of work.

I do like the idea of a time out for the slaves. On the other hand, in my setup an alternative way to deal with this situation would be to give the master the power to give a "drop whatever you're doing and listen to me" command; that way I wouldn't have to go to all the nodes to manually exit luxconsole and restart it in case something went wrong with the master.

Re: luxconsole suggestions

Postby Lord Crc » Sat Aug 27, 2011 8:27 am

Abel wrote:I do like the idea of a time out for the slaves. On the other hand, in my setup an alternative way to deal with this situation would be to give the master the power to give a "drop whatever you're doing and listen to me" command; that way I wouldn't have to go to all the nodes to manually exit luxconsole and restart it in case something went wrong with the master.


Currently, when a network rendering session is initiated, the master generates a "session key" and sends it to all the slaves each time it connects. If the slave is not currently in a session, it stores the session key; otherwise it compares the keys and rejects the connection if the sent key doesn't match. This causes the issues described by mtoivo.

We could add a command line flag to the slave which changes the behavior when the session key doesn't match. Rather than rejecting the connection request, it could abort the current session and accept the new one.
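As a rough illustration of the key check and the proposed flag, here is a minimal sketch. The class, the `replace_on_mismatch` name and the return strings are hypothetical stand-ins and do not mirror the actual luxconsole network code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slave:
    # Hypothetical flag: when True, a mismatching key aborts the current
    # session instead of rejecting the newcomer.
    replace_on_mismatch: bool = False
    session_key: Optional[str] = None  # None means "not in a session"

    def connect(self, key: str) -> str:
        if self.session_key is None:
            self.session_key = key      # no session yet: store the key
            return "accepted"
        if key == self.session_key:
            return "accepted"           # same master calling back
        if self.replace_on_mismatch:
            self.session_key = key      # drop the old session, adopt the new one
            return "replaced"
        return "busy"                   # behavior described as current


slave = Slave()
print(slave.connect("key-A"))  # first master starts a session
print(slave.connect("key-A"))  # same master reconnects
print(slave.connect("key-B"))  # restarted master arrives with a fresh key

flagged = Slave(replace_on_mismatch=True)
flagged.connect("key-A")
print(flagged.connect("key-B"))  # same situation with the flag enabled
```

The third call is the crashed-master case from the first post: the restarted master generates a new key, so an unflagged slave treats it as a stranger.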
May contain traces of nuts.

Re: luxconsole suggestions

Postby Abel » Sat Aug 27, 2011 9:53 am

Lord Crc wrote:We could add a command line flag to the slave which changes the behavior when the session key doesn't match. Rather than rejecting the connection request, it could abort the current session and accept the new one.

That would be really useful, but could also lead to unwanted scenarios where two masters are both controlling a bunch of slaves: one could hijack the other's slaves, and two masters could try to take control of the same slave.

Proposal:

-the slave has some kind of time out system and is in either of two states: "active" or "apparently abandoned"
-if the state is "apparently abandoned", any new master commands will result in accepting a new session
-if the state is "active", new masters will be ignored (or told something like "tough luck, I got better things to do" :) )
-the master has an "evil" mode, forcing the nodes to start with the new job even if they are in "active" mode. This is useful when restarting jobs with minor modifications, in which case one won't have to wait for the time out period to pass.
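The proposal above can be sketched as a small state check. This is only an illustration under assumptions: the class and parameter names are invented, the extra "idle" state (no session at all) is added for completeness, and the clock is injectable so the timeout can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class SlaveSession:
    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.key: Optional[str] = None   # session key of the current master
        self.last_contact = clock()

    def state(self) -> str:
        if self.key is None:
            return "idle"                # assumption: not part of the proposal
        if self.clock() - self.last_contact > self.timeout_s:
            return "apparently abandoned"
        return "active"

    def connect(self, key: str, force: bool = False) -> bool:
        owner = key == self.key
        # Accept the current master, a forcing ("evil") master, or anyone
        # when the slave is not actively owned.
        if owner or force or self.state() != "active":
            self.key = key
            self.last_contact = self.clock()
            return True
        return False


now = [0.0]                                    # fake clock for the demo
slave = SlaveSession(timeout_s=600.0, clock=lambda: now[0])
slave.connect("master-1")
now[0] = 60.0
print(slave.connect("master-2"))               # another master while active
print(slave.connect("master-2", force=True))   # "evil" mode
now[0] = 60.0 + 601.0                          # no contact for over 10 minutes
print(slave.state(), slave.connect("master-3"))
```

Every accepted contact resets the timer, so a brief network hiccup shorter than `timeout_s` does not hand the slave to someone else.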

Re: luxconsole suggestions

Postby J the Ninja » Sat Aug 27, 2011 9:56 am

Btw, this is a somewhat related issue with the "busy" status: http://www.luxrender.net/mantis/view.php?id=1095
-Jason

Material DB Admin

Re: luxconsole suggestions

Postby Lord Crc » Sat Aug 27, 2011 10:02 am

Abel wrote:That would be really useful, but could also lead to unwanted scenarios where two masters are both controlling a bunch of slaves: one could hijack the other's slaves, and two masters could try to take control of the same slave.


Is this anything but a remote corner case?

Re: luxconsole suggestions

Postby J the Ninja » Sat Aug 27, 2011 10:04 am

Not necessarily. What if you have a studio or lab where multiple workstations have access to the same render farm?

Re: luxconsole suggestions

Postby Lord Crc » Sat Aug 27, 2011 10:17 am

Abel wrote:-the master has an "evil" mode, forcing the nodes to start with the new job even if they are in "active" mode. This is useful when restarting jobs with minor modifications, in which case one won't have to wait for the time out period to pass.


What we could do is optionally have the master add a "force" parameter when connecting; the slave would only abandon the current scene if this "force" parameter is present and forcing is enabled (à la my suggestion above). This could be exposed as a button in the GUI or something ("Connect (forced)", for example). This should prevent "accidental" session takeover.

Re: luxconsole suggestions

Postby J the Ninja » Sat Aug 27, 2011 10:32 am

Lord Crc wrote:
Abel wrote:-the master has an "evil" mode, forcing the nodes to start with the new job even if they are in "active" mode. This is useful when restarting jobs with minor modifications, in which case one won't have to wait for the time out period to pass.


What we could do is optionally have the master add a "force" parameter when connecting; the slave would only abandon the current scene if this "force" parameter is present and forcing is enabled (à la my suggestion above). This could be exposed as a button in the GUI or something ("Connect (forced)", for example). This should prevent "accidental" session takeover.


Isn't that what Abel suggested originally?