That's right, the 3 volts is what's needed just to bring the output stage to the point of conduction. In a circuit like the sim, the exact voltage is super critical and depends on the models used.
In a real-world build it is just as critical, but again the exact voltage will be different for each transistor type.
You might like this:
The simulation is set to run for 30 seconds. The bias current is zero for the first ten seconds, steps up to 0.6 milliamps from ten to twenty seconds, and finally increases again from twenty to thirty seconds.
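If it helps to see that schedule written down, here is a rough Python sketch of the three bias steps. It is only an illustration, not how the sim itself is set up. The function name `bias_current_ma` and the `final_ma` parameter are made up for this example, and the default third-stage value is a placeholder because the actual figure isn't stated above.

```python
def bias_current_ma(t, final_ma=1.2):
    """Return the stepped bias current in milliamps at time t (seconds).

    final_ma is a placeholder for the third stage; the real value used
    in the simulation is not given, so treat the default as an assumption.
    """
    if t < 10.0:
        return 0.0      # first ten seconds: no bias at all
    if t < 20.0:
        return 0.6      # ten to twenty seconds: 0.6 mA
    return final_ma     # twenty to thirty seconds: increased again


# Print the bias at the middle of each ten second stage.
for t in (5.0, 15.0, 25.0):
    print(f"t = {t:4.1f} s  bias = {bias_current_ma(t)} mA")
```

Swap in whatever third-stage current you actually use by passing `final_ma`.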
You can see that bias change in the first image. The sim is showing the emitter current of one of the output transistors.
We now do a neat trick... we can record the sine wave into an audio file, and that is posted here in the zipped folder. It is just an MP3 file that plays for 30 seconds, and you can hear the change in the 'tone' as the crossover distortion becomes less and less.
So you can have a listen.
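If you don't have the simulator to hand, something similar can be roughed out in Python using only the standard library. This is a toy sketch, not the sim: it models crossover distortion as a simple dead zone around zero rather than using real transistor models, and it writes a WAV file rather than an MP3. The sample rate, the 1 kHz tone, the dead zone widths, and all the function names are assumptions picked for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100   # assumed sample rate in Hz
TONE_HZ = 1000.0      # assumed test tone frequency in Hz
# Assumed dead zone widths (fraction of full scale), one per stage.
# They are NOT derived from the bias currents in the simulation.
DEAD_ZONES = (0.20, 0.05, 0.0)


def crossover(x, dead_zone):
    """Toy class B model: no output until |x| exceeds the dead zone."""
    if abs(x) <= dead_zone:
        return 0.0
    return x - math.copysign(dead_zone, x)


def render(seconds_per_stage=10.0, amplitude=0.8):
    """Return a list of samples, one distorted sine stage per dead zone."""
    n_stage = int(SAMPLE_RATE * seconds_per_stage)
    samples = []
    for stage, dead_zone in enumerate(DEAD_ZONES):
        for i in range(n_stage):
            t = (stage * n_stage + i) / SAMPLE_RATE
            x = amplitude * math.sin(2.0 * math.pi * TONE_HZ * t)
            samples.append(crossover(x, dead_zone))
    return samples


def write_wav(path, samples):
    """Write samples in the range -1..1 as 16 bit mono PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
```

Calling `write_wav("crossover_demo.wav", render())` should give a 30 second file (the filename is just an example). The tone should sound harsher in the first stage and cleaner in the last, though it won't match the real amplifier exactly.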
The second image shows the audio output and the bias changeover points.
The third, fourth and fifth images show what the audio looks like (as on a scope) at each bias value.