Hi,
I've just compiled everything for the first time on my brand-new Arch Linux installation and tried out some gcc switches. Since I have time to lose, I decided to benchmark every combination of the switches -flto, -fPIC, -fdata-sections and -ffunction-sections (timing several demo scenes until each reached 100 S/P). I checked that the output is identical (with a fixed seed) to that of the default/standard build options, although I believe you are on the safe side as long as you don't use -ffast-math or -Ofast.
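For anyone who wants to reproduce this, the full matrix of those four switches is 2^4 = 16 builds. Below is a minimal sketch of a hypothetical helper (not part of the LuxRender build system) that just prints the compile and link flags for each combination. It does not run any builds. It assumes that -flto also has to be passed at link time, and that -Wl,--gc-sections is added whenever one of the *-sections flags is on, since those flags only allow unused code to be dropped when the linker is told to collect unused sections.

```python
from itertools import product

# The four switches benchmarked in the post.
SWITCHES = ["-flto", "-fPIC", "-fdata-sections", "-ffunction-sections"]


def flag_matrix():
    """Yield (cxxflags, ldflags) for every on/off combination of SWITCHES.

    Assumptions (not from the LuxRender build files): -flto is repeated
    at link time, and -Wl,--gc-sections is added whenever a *-sections
    flag is enabled.
    """
    for mask in product([False, True], repeat=len(SWITCHES)):
        cxx = [flag for flag, on in zip(SWITCHES, mask) if on]
        ld = []
        if "-flto" in cxx:
            ld.append("-flto")
        if "-fdata-sections" in cxx or "-ffunction-sections" in cxx:
            ld.append("-Wl,--gc-sections")
        yield " ".join(cxx), " ".join(ld)


if __name__ == "__main__":
    for cxxflags, ldflags in flag_matrix():
        print(f"CXXFLAGS='{cxxflags}' LDFLAGS='{ldflags}'")
```

Each printed line can then be fed to whatever build command you use, with one timed render per configuration.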
First thoughts:
Using gcc 4.6.1 + -march=native (core2) gives me a 2% speedup compared to the regular LuxRender build available from the official site.
Using -fPIC to compile luxconsole and luxrender gives a 3% penalty in most cases (I understand it is mandatory for a shared library such as pylux, but why trade render time in luxconsole for a bit of saved compile time...)
Using -flto + -fuse-linker-plugin + -fwhole-program makes the compilation take ages, especially the link, since the optimizations are performed at the very end (you can even use -flto-partition=none for slightly better code quality...), but the result is impressive: another +2%, and also a 36% decrease in executable size (8.7 MB to 5.7 MB).
Using -fdata-sections and -ffunction-sections when compiling, plus -Wl,--gc-sections at link time (dead code elimination), reduces the size further to 5.3 MB and gives another +1% performance (maybe fewer cache misses??), although in theory it should not have any impact on performance...
Using -funroll-loops or -finline-limit=10000 didn't change anything performance-wise...
To sum up: gcc 4.6.1 + no -fPIC + -flto + -fdata-sections + -ffunction-sections + -Wl,--gc-sections gives me roughly +8% performance and a 40% smaller executable, not bad... but an awful increase in compile and link time (I didn't measure it, but certainly +500%).
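As a quick sanity check on the "+8%" and "40%" figures, here is a small sketch that combines the individual gains reported above. It assumes the gains compound multiplicatively and independently, which is only an approximation since each step was measured on its own. The avoided -fPIC penalty is treated as a plain 3% gain.

```python
from math import prod

# Per-step speedups as reported in the post, assumed to compound.
gains = {
    "gcc 4.6.1 + -march=native": 1.02,
    "no -fPIC": 1.03,
    "-flto": 1.02,
    "-f*-sections + -Wl,--gc-sections": 1.01,
}

total = prod(gains.values())
print(f"combined speedup: {100 * (total - 1):.1f}%")  # prints "combined speedup: 8.2%"

# Executable size: 8.7 MB (default build) -> 5.3 MB (all size options).
size_before, size_after = 8.7, 5.3
reduction = (size_before - size_after) / size_before
print(f"size reduction: {100 * reduction:.0f}%")  # prints "size reduction: 39%"
```

So the rounded "+8%" and "40%" are consistent with the individual measurements.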
BTW, I only tested the non-OpenCL code (no hybrid), only luxconsole/luxrender, with some demos (including the two included in the archives), so I assume not every code path has been exercised. I didn't see any artifacts or fireflies, and the output was 100% identical to the default render, but such problems can show up very late in a render.
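To check that an optimized build produces the same image as the default build, a byte-for-byte comparison of the saved outputs is a simple first test. Below is a minimal sketch. It assumes both renders used the same fixed seed and the same halt condition (e.g. the same samples per pixel), and that the image format embeds no timestamps or other run-dependent metadata. Otherwise the hashes will differ even when the pixels are identical. The file names in the usage comment are hypothetical. Hashing (rather than a direct compare) lets you keep one reference digest of the default render and check each new build against it.

```python
import hashlib
import sys


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_output(path_a, path_b):
    """True if the two files are byte-for-byte identical."""
    return file_digest(path_a) == file_digest(path_b)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare.py default.png optimized.png
    a, b = sys.argv[1], sys.argv[2]
    print("identical" if same_output(a, b) else "DIFFERENT")
```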
I can give you the raw numbers if you wish...
Regards,
