A non-uniform partitioning scheme can give very low latency without being terribly inefficient. Here's a paper with some info on implementation.
Got any links on how to do this digital juju in real time?
It's also possible to combine direct and FFT convolution for zero latency.
That's exactly the trickery I mentioned for low-latency fast convolution - single sample latency is perfectly possible, just not very simple to implement...
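For anyone wondering what the direct + FFT combination looks like, here is a rough sketch in Python/NumPy. It is not taken from the paper mentioned above. The names (`hybrid_convolve`, `block`) and the choice to split the impulse response at `2 * block` taps are my own assumptions for illustration. The first `2 * block` taps are applied directly, sample by sample, so every output sample is available as soon as its input arrives. The rest of the impulse response is applied with an FFT once per completed input block, and that result is only needed one full block later, which is what gives the FFT time to run in a real-time setting.

```python
import numpy as np

def hybrid_convolve(x, h, block=64):
    """Zero-latency convolution sketch: direct head + FFT tail (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    head, tail = h[:2 * block], h[2 * block:]   # assumed split point: 2 * block taps
    n_out = len(x) + len(h) - 1
    y = np.zeros(n_out)
    future = np.zeros(n_out + block)            # tail contributions to later outputs
    history = np.zeros(len(head))               # history[k] holds x[n - k]

    if tail.size:
        seg_len = block + len(tail) - 1
        nfft = 1 << (seg_len - 1).bit_length()  # next power of two >= seg_len
        tail_spec = np.fft.rfft(tail, nfft)     # tail spectrum, computed once

    for n in range(n_out):
        xn = x[n] if n < len(x) else 0.0        # input is zero after it ends
        history = np.roll(history, 1)
        history[0] = xn
        # Direct part uses only the current and past samples: no added latency.
        y[n] = np.dot(head, history) + future[n]

        # When an input block has just completed, convolve it with the tail.
        if tail.size and (n + 1) % block == 0:
            start = n + 1 - block
            if start < len(x):
                blk = np.zeros(block)
                piece = x[start:start + block]
                blk[:len(piece)] = piece
                seg = np.fft.irfft(np.fft.rfft(blk, nfft) * tail_spec, nfft)[:seg_len]
                # The tail starts at tap 2 * block, so the earliest affected
                # output is one whole block after this point.
                future[start + 2 * block:start + 2 * block + seg_len] += seg
    return y
```

This simulates the streaming loop offline and uses a single FFT size for the whole tail, so it only shows the zero-latency idea. For long impulse responses the tail would itself be split into progressively larger partitions, which is where the non-uniform partitioning comes in.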