I also think it would be interesting to see a comparison between CUDA and OpenCL, both running on Nvidia hardware. OpenCL running on AMD hardware should be included in the comparison too.
In my understanding, OpenCL on Nvidia hardware is first compiled down to PTX, the same intermediate representation CUDA uses, before it is finally compiled down again to machine code. There are far fewer opportunities to fully optimize along this path. So, at the present time, AMD hardware is a great option.
I'm running an Nvidia GTX 580. When I purchased it, I tried to compare it against some AMD hardware, but I couldn't get the drivers to work correctly on my setup. So, ultimately, Nvidia hardware was faster for me.
