Okay, I've finished the SSE implementation and testing for the base geometric classes: everything works, but in the end it's bad news.
When using it on the Point and Vector classes, realignment and shuffles kill the speedup, so it's useless there.
When using it on matrix muls, I see a 2 to 4% difference on my Athlon X2 in 64-bit mode, sometimes better, sometimes worse, depending on the scene being rendered.
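
To illustrate why the Point/Vector case is hard, here is a rough sketch of a 3-float dot product. It is not the actual Lux code: `Vec3`, `dotScalar`, `dotSSE` and the `HAVE_SSE` macro are made-up names for this example. The SSE version has to repack three floats into a 4-wide register and then shuffle lanes to sum them, all for a single useful multiply. If SSE isn't available at compile time it simply falls back to the scalar path.

```cpp
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Hypothetical 3-float vector (assumption: NOT the real Lux Vector class).
struct Vec3 { float x, y, z; };

// Plain scalar dot product: 3 multiplies, 2 adds.
float dotScalar(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

float dotSSE(const Vec3 &a, const Vec3 &b) {
#if HAVE_SSE
    // Repacking cost: three packed floats are only 12 bytes, so they
    // can't be loaded as one aligned 16-byte vector. Pad with a zero lane.
    __m128 va = _mm_set_ps(0.0f, a.z, a.y, a.x);
    __m128 vb = _mm_set_ps(0.0f, b.z, b.y, b.x);

    // The only "wide" work: (ax*bx, ay*by, az*bz, 0).
    __m128 m = _mm_mul_ps(va, vb);

    // Horizontal sum, which needs shuffles:
    __m128 hi = _mm_movehl_ps(m, m);              // (m2, m3, m2, m3)
    __m128 s  = _mm_add_ps(m, hi);                // lane0 = m0+m2, lane1 = m1+m3
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1));
    s = _mm_add_ss(s, s1);                        // lane0 = m0+m1+m2+m3
    return _mm_cvtss_f32(s);
#else
    return dotScalar(a, b);                       // no SSE: scalar fallback
#endif
}
```

Calling `dotSSE(a, b)` gives the same result as `dotScalar(a, b)`, but most of the instructions in it are data movement rather than arithmetic.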
Conclusion:
- Matrix muls/transformations aren't the main performance bottleneck in Lux
or
- My SSE implementation sucks

(my muls should be 100% faster than the original ones)
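
For reference, this is roughly the kind of 4x4 multiply I mean. It's a minimal sketch, not the code from my patch: `Mat4`, `mulScalar`, `mulSSE` and `HAVE_SSE` are names invented for this example, and it assumes a row-major, 16-byte aligned matrix. Each output row is built as a sum of the rows of `b` scaled by one element of `a`, so no horizontal adds are needed.

```cpp
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Hypothetical row-major 4x4 matrix, aligned so each row can be
// loaded with an aligned 16-byte load (assumption: not Lux's own type).
struct alignas(16) Mat4 { float m[4][4]; };

// Reference scalar version: 64 multiplies, 48 useful adds.
Mat4 mulScalar(const Mat4 &a, const Mat4 &b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a.m[i][k] * b.m[k][j];
            r.m[i][j] = s;
        }
    return r;
}

Mat4 mulSSE(const Mat4 &a, const Mat4 &b) {
#if HAVE_SSE
    Mat4 r;
    // Load the four rows of b once.
    const __m128 b0 = _mm_load_ps(b.m[0]);
    const __m128 b1 = _mm_load_ps(b.m[1]);
    const __m128 b2 = _mm_load_ps(b.m[2]);
    const __m128 b3 = _mm_load_ps(b.m[3]);
    for (int i = 0; i < 4; ++i) {
        // row_i(r) = a[i][0]*b0 + a[i][1]*b1 + a[i][2]*b2 + a[i][3]*b3
        __m128 row = _mm_mul_ps(_mm_set1_ps(a.m[i][0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.m[i][1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.m[i][2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.m[i][3]), b3));
        _mm_store_ps(r.m[i], row);
    }
    return r;
#else
    return mulScalar(a, b);   // no SSE: scalar fallback
#endif
}
```

This does 16 packed multiplies instead of 64 scalar ones, which is why I expected a big win on the multiply itself. Whether that shows up in a full render depends on how much time is actually spent there.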
I will profile Lux this week to see where it REALLY loses time, instead of optimizing where I THINK it loses time.

I will also upload my changes to CVS for testing on other processors (maybe we will see something better on an Intel Core).