The inclusion of the "current-compensation" resistor (in the single-transistor circuit) causes VCE to decrease as the input current increases. The choice of resistor value controls how steeply VCE falls. In essence, its value should be optimized so that only slight voltage variations occur around the optimal bias voltage setting.
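The post gives no schematic, so the following is only a minimal sketch under an assumed topology: the compensation resistor (called RC here, a hypothetical name) sits in the collector lead, the divider R1/R2 is fed from the input side of RC, and the bias voltage is taken across collector and emitter. Ignoring base current, VCE = (1 + R1/R2)·VBE(IC) - IC·RC, with VBE following the ideal diode law. Because VBE rises logarithmically with current, RC = (1 + R1/R2)·VT/I0 cancels the slope at the idle current I0. VT, IS and all component values are illustrative assumptions, not figures from the post.

```python
import math

# Illustrative constants (assumptions, not from the original post)
VT = 0.02585  # thermal voltage near 300 K, in volts
IS = 1e-14    # hypothetical saturation current, in amperes


def vbe(ic: float) -> float:
    """Ideal diode-law base-emitter voltage for collector current ic (A)."""
    return VT * math.log(ic / IS)


def vce(ic: float, r1: float, r2: float, rc: float) -> float:
    """Bias voltage of the assumed single-transistor VBE multiplier.

    Assumed topology: R1 from the input node to the base, R2 from base to
    emitter, RC between the input node and the collector. Base current is
    neglected, so VCE = n*VBE(ic) - ic*RC with n = 1 + R1/R2.
    """
    n = 1.0 + r1 / r2
    return n * vbe(ic) - ic * rc


def optimal_rc(i0: float, r1: float, r2: float) -> float:
    """RC that makes dVCE/dIc zero at the idle current i0."""
    n = 1.0 + r1 / r2
    return n * VT / i0


if __name__ == "__main__":
    R1, R2, I0 = 1500.0, 500.0, 10e-3  # hypothetical divider and idle current
    rc_opt = optimal_rc(I0, R1, R2)
    for rc in (0.0, rc_opt, 2 * rc_opt):
        volts = [vce(i, R1, R2, rc) for i in (5e-3, 10e-3, 15e-3)]
        print(f"RC = {rc:5.2f} ohm: " + ", ".join(f"{v:.4f} V" for v in volts))
```

Running it prints VCE at 5, 10 and 15 mA for no compensation, for the zero-slope RC, and for twice that value; the last case overcompensates, so VCE falls with current.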
With the addition of a second transistor (usually in the form of a Sziklai pair), the loop gain of the VBE multiplier circuit can be increased. The first transistor serves as the temperature sensor while the second amplifies its effect. This technique can reduce the resistance slope (U versus I) of the multiplier, but, again, substantial results are gained only after a current-compensation resistor is included. The advantages are high linearity regardless of input current and a broader tolerance for the value of the current-compensation resistor.
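Why the second transistor lowers the slope can be sketched with a rough small-signal argument. The topology is an assumption, since the post gives no schematic: Q1 senses temperature and carries the divider R1/R2 (with n = 1 + R1/R2), its collector drives the base of Q2, a hypothetical resistor R3 across Q2's base-emitter junction sets Q1's nearly constant standing current I_C1, and Q2 carries most of the input current I. Q1 base current, divider current and the Early effect are neglected.

```latex
\begin{aligned}
dI &\approx dI_{C1} + dI_{C2} \approx (\beta_2 + 1)\,dI_{C1}, \\
dV &= n\,dV_{BE1} = n\,\frac{V_T}{I_{C1}}\,dI_{C1}, \\
r_{2T} = \frac{dV}{dI} &\approx \frac{n\,V_T}{(\beta_2 + 1)\,I_{C1}},
\qquad r_{1T} \approx \frac{n\,V_T}{I}, \\
\frac{r_{2T}}{r_{1T}} &\approx \frac{I}{(\beta_2 + 1)\,I_{C1}}.
\end{aligned}
```

With hypothetical values I = 10 mA, I_C1 = 1 mA and β2 = 100, the ratio is about 0.1, i.e. the slope drops to roughly a tenth of the single-transistor value. This is only a first-order picture and says nothing about the stability problems of the loop.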
The results of the various configurations approximately follow the curves shown in the attachment. The more linear the curve, the better. The Sziklai pair can provide very linear performance, but unfortunately it is extremely unstable. Miller compensation can mitigate this poor behaviour to some extent, but it is just a band-aid.
I don’t like the two-transistor configuration because any instability in the bias servo can have dreadful consequences. The single-transistor circuit with current compensation works well, although its behaviour depends a lot on device parameters such as beta.
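One mechanism that could explain the beta sensitivity can be sketched under an assumed topology (not given in the post): R1 from the input node to the base, R2 from base to emitter, a compensation resistor (hypothetically named RC) between the input node and the collector, with the bias voltage taken across collector and emitter. The base current IC/β flows through R1 and adds IC·R1/β to VCE, so the slope becomes n·VT/IC + R1/β - RC and the zero-slope RC shifts with beta. The component values below are made up for illustration.

```python
# Hypothetical model: beta sensitivity of a current-compensated VBE multiplier.
# All names and values are illustrative assumptions, not from the original post.
VT = 0.02585  # thermal voltage near 300 K, in volts


def slope(ic: float, r1: float, r2: float, rc: float, beta: float) -> float:
    """Small-signal dVCE/dIc in ohms for the assumed topology.

    VCE = n*VBE(ic) + (ic / beta)*R1 - ic*RC with n = 1 + R1/R2, so the slope
    is n*VT/ic (diode law) + R1/beta (base current through R1) - RC.
    """
    n = 1.0 + r1 / r2
    return n * VT / ic + r1 / beta - rc


def zero_slope_rc(ic: float, r1: float, r2: float, beta: float) -> float:
    """RC that flattens VCE versus current at the operating point ic."""
    return slope(ic, r1, r2, 0.0, beta)


if __name__ == "__main__":
    R1, R2, I0 = 1500.0, 500.0, 10e-3
    rc = zero_slope_rc(I0, R1, R2, beta=100.0)  # tuned for a beta of 100
    print(f"RC tuned for beta = 100: {rc:.2f} ohm")
    for beta in (50.0, 100.0, 200.0, 400.0):
        s = slope(I0, R1, R2, rc, beta)
        print(f"beta = {beta:5.0f}: residual slope {s:+7.2f} ohm")
```

With these assumed values, an RC tuned for β = 100 leaves a clearly positive slope for β = 50 and a negative one for β = 200 or more, which is one way the behaviour can depend on the particular device.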
I don’t know about FETs and MOSFETs, but I remember something along the lines that they are used with MOSFET output transistors because their parameters match those devices somewhat better than ordinary BJTs do. I could be wrong, though.