As I guess you know, adding contributions to the Film ("splatting") is a pretty bad bottleneck, especially with GPU rendering. Splatted contributions tend to be either very close to each other or a fair distance apart. I think we should exploit this by tiling the buffers, so that each tile can be splatted to separately.
Essentially we divide the film into "tiles", and when calling AddSample we pass the tile index, making sure that the AddSample code only writes to that specific tile. The splatting buffer pool is indexed by tile, with one splatting lock per tile. So far I've used horizontal strips as tiles, which seems to work well. The main sample count is updated via an atomic add.
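To make the idea concrete, here is a minimal sketch of the per-tile locking, not the actual patch. `TiledFilm`, `TileOf`, `Pixel` and `SampleCount` are made-up names for illustration, and this `AddSample` writes a single pixel instead of a full filter footprint.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the film; illustrative only, not the real Film API.
class TiledFilm {
public:
    TiledFilm(std::size_t width, std::size_t height, std::size_t tileCount)
        : width_(width), height_(height),
          // Horizontal strips: each tile covers rowsPerTile_ consecutive rows.
          rowsPerTile_((height + tileCount - 1) / tileCount),
          pixels_(width * height, 0.0f),
          tileLocks_(tileCount),
          sampleCount_(0) {}

    // Tile (strip) index that owns row y.
    std::size_t TileOf(std::size_t y) const { return y / rowsPerTile_; }

    // Adds a value to one pixel. The caller passes the tile index and
    // promises that the pixel lies inside that tile.
    void AddSample(std::size_t tile, std::size_t x, std::size_t y, float value) {
        assert(x < width_ && y < height_ && TileOf(y) == tile);
        {
            // Only this tile is locked, so other tiles stay writable.
            std::lock_guard<std::mutex> guard(tileLocks_[tile]);
            pixels_[y * width_ + x] += value;
        }
        // The global sample count needs no lock, just an atomic add.
        sampleCount_.fetch_add(1, std::memory_order_relaxed);
    }

    float Pixel(std::size_t x, std::size_t y) const { return pixels_[y * width_ + x]; }
    std::uint64_t SampleCount() const { return sampleCount_.load(); }

private:
    std::size_t width_, height_, rowsPerTile_;
    std::vector<float> pixels_;
    std::vector<std::mutex> tileLocks_;   // one splatting lock per tile
    std::atomic<std::uint64_t> sampleCount_;
};
```

With this layout, two threads splatting into different strips never touch the same mutex. They only share the atomic counter.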
The primary difficulty is handling contributions which cross tile boundaries. My current idea is to assign these to a "special tile", which represents the whole film. Due to the double-buffering system in the contribution pool, I think locking the entire film every now and then shouldn't be a huge issue. The alternative is to add such contributions to both tiles. The code makes sure a tile is larger than twice the filter width, so a contribution can't span more than two tiles.
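A rough sketch of how the tile could be picked for a contribution, assuming horizontal strips and a filter footprint of `filterRadius` rows on either side of the sample. `SelectTile` and `kWholeFilmTile` are hypothetical names, and the floor/ceil rounding is a conservative guess at the pixel convention rather than what the film code actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical marker for the "special tile" that represents the whole film.
const std::size_t kWholeFilmTile = static_cast<std::size_t>(-1);

// Returns the strip that fully contains the filter footprint around yCenter,
// or kWholeFilmTile if the footprint crosses a strip boundary.
std::size_t SelectTile(float yCenter, float filterRadius,
                       std::size_t rowsPerTile, std::size_t height) {
    assert(rowsPerTile > 0 && height > 0);
    const float lastRow = static_cast<float>(height - 1);

    // Conservative row range touched by the filter, clamped to the film.
    const float lo = std::min(std::max(std::floor(yCenter - filterRadius), 0.0f), lastRow);
    const float hi = std::min(std::max(std::ceil(yCenter + filterRadius), 0.0f), lastRow);

    const std::size_t firstTile = static_cast<std::size_t>(lo) / rowsPerTile;
    const std::size_t lastTile = static_cast<std::size_t>(hi) / rowsPerTile;

    // Because a tile is taller than the footprint, lastTile is at most
    // firstTile + 1, so "different" always means "exactly two tiles".
    return firstTile == lastTile ? firstTile : kWholeFilmTile;
}
```

For the alternative approach, the same function could return both `firstTile` and `lastTile`, and the caller would splat into each of them in turn.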
As you can see in the image below, it does indeed solve the CPU contention. You'll also notice that I have a small bug near tile borders.