I'll use an analogy: lens making.
Probably not very many people here are 'up' on this specialized field, but many of its design criteria, taken in general, also apply to building multistage amplifiers.
It has been found that if economics is not much of a consideration, and if maximum light transmission is not an absolute requirement, then the highest resolution achievable with a given compound-lens design is reached, roughly, when the number of individual lenses and lens groups is maximized. When each lens or group bends light the least, the impact of its various aberrations is also minimized, and aberrations kept that small are the easiest to correct downstream.
However, as with amplifiers, real lenses are not 'free'. Neither are their curves: the deeper a curve, the more expensive. The more exotic the glass type (to counteract aberrations, say), the more expensive and rare. The more individual elements, the more expensive … and multiply so, because the whole contraption becomes heavier and heavier, requiring a more robust containment shell.
TAKING THE ANALOGY forward …
The 'best' performing amplifiers (with one very crucial exception) would use close to the maximum number of stages, each amplifying as little as possible, to move the signal from LINE levels (± 2 to 4 volts) to DRIVE levels (± 20 to 70 volts). That way each stage could be optimized with local degenerative feedback (or even enhanced through coupling), so that it is as linear and “non-coloring” as possible. Maybe that gain of 30 to 50 decibels (using 20·log10(VO / VI) as the decibel calculation) could be done in 3 dB steps!
Thing is, again as in the build-a-fine-lens case, stages aren't free. At least in the tube world, they definitely aren't. So we try to eke out a bit more gain from each one than the optimum minimum. Economy. Convention. Prudence. Best practices. Supporting math. All that.
The one exception to this idea is that, unlike lenses (sort of), amplification stages each add a bit of uncorrectable noise to the signal stream. Yet if you do the math, you find that the most important stage for squashing noise is the first one. With reasonable stage-to-stage gain after the first stage, the later stages' TOTAL contribution is typically less than 10% of the overall noise figure. So … designers have gotten accustomed to giving "stage 1" a bunch of gain.
Hence bypassed cathode resistors are so commonly specified: they give the first stage more gain, which in effect minimizes the noise contributions of the subsequent stage(s). UNFORTUNATELY, in the lens analogy this is also like using substantially nonlinear, high-refraction glass … which introduces the maximum single-stage asymmetry into the signal. We lovingly (cheekily?) call this second-order distortion, and upsell it as pleasant on the ear. Musical. Opens the sound stage. Insert weasel-words here.
But the truth is somewhat different: if, as you surmise, the first stage is hobbled not just by intentionally NOT bypassing its cathode resistor, but is further hobbled in gain in exchange for greater linearity through local negative feedback, the improvement in overall signal linearity is substantial. Yes, the subsequent gain stages then have to be more numerous, higher in gain, or both, and more carefully designed for low noise. But so what?
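The decibel bookkeeping in that paragraph can be sketched in a few lines of Python. This is a sketch under simple assumptions: ideal stages whose gains just add in dB, with no loading between stages. The function names are illustrative only.

```python
import math

def gain_db(v_out: float, v_in: float) -> float:
    """Voltage gain in decibels: 20 * log10(Vout / Vin)."""
    return 20.0 * math.log10(v_out / v_in)

def voltage_db_to_ratio(db: float) -> float:
    """Inverse of gain_db: the voltage ratio that a given dB figure represents."""
    return 10.0 ** (db / 20.0)

def stages_needed(total_db: float, step_db: float) -> int:
    """Fewest equal stages of step_db each whose gains (added in dB) reach total_db."""
    return math.ceil(total_db / step_db)

print(f"2 V in, 70 V out: {gain_db(70.0, 2.0):.1f} dB")
print(f"one 3 dB step = x{voltage_db_to_ratio(3.0):.3f} in voltage")
for total in (30.0, 50.0):
    print(f"{total:.0f} dB in 3 dB steps: {stages_needed(total, 3.0)} stages")
```

At 3 dB per stage each stage multiplies the voltage by only about 1.41, so 30 dB takes ten stages and 50 dB takes seventeen, which is exactly where the cost problem bites.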
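The usual "math" for a chain of noisy stages is Friis's cascade formula, F = F1 + (F2 − 1)/G1 + (F3 − 1)/(G1·G2) + …, where F are linear noise factors and G are power gains. The post doesn't say which calculation it has in mind, so treat this as a sketch of the standard textbook version. It assumes equal impedances between stages, so that a stage's power gain in dB equals its voltage gain in dB (a simplification for tube stages), and the example chains are hypothetical.

```python
def power_db_to_ratio(db: float) -> float:
    """Convert a power quantity in dB (noise figure or power gain) to a linear ratio."""
    return 10.0 ** (db / 10.0)

def friis_noise_factor(noise_factors, power_gains):
    """Friis cascade formula: F = F1 + (F2-1)/G1 + (F3-1)/(G1*G2) + ...
    Both lists are linear ratios, one entry per stage, in signal order."""
    total = 0.0
    gain_before = 1.0  # product of the power gains of all earlier stages
    for i, (f, g) in enumerate(zip(noise_factors, power_gains)):
        total += f if i == 0 else (f - 1.0) / gain_before
        gain_before *= g
    return total

def later_stage_share(noise_factors, power_gains):
    """Fraction of the cascade's excess noise (F_total - 1) added by stages 2..n.
    Assumes every noise factor is greater than 1 (real, noisy stages)."""
    total = friis_noise_factor(noise_factors, power_gains)
    return (total - noise_factors[0]) / (total - 1.0)

# Hypothetical chains: every stage has a 3 dB noise figure.
nf = [power_db_to_ratio(3.0)] * 3
high_first = [power_db_to_ratio(g) for g in (30.0, 20.0, 20.0)]
small_steps = [power_db_to_ratio(g) for g in (3.0, 3.0, 3.0)]
print(f"high-gain first stage: {later_stage_share(nf, high_first):.1%} from later stages")
print(f"3 dB steps:            {later_stage_share(nf, small_steps):.1%} from later stages")
```

With a 30 dB first stage the later stages add well under 1% of the excess noise; with 3 dB steps they add over 40% of it, which is why a low-gain first stage pushes the noise burden onto the stages that follow.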
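The gain cost of leaving the cathode resistor unbypassed can be estimated with the standard small-signal formula for a common-cathode triode, A = μ·R_L / (r_p + R_L + (μ + 1)·R_K), where R_K = 0 stands in for a fully bypassed resistor. The values below are assumptions, roughly in 12AX7 territory (μ = 100, r_p = 62.5 kΩ, plate load 100 kΩ, cathode resistor 1.5 kΩ); the sketch ignores loading by the next stage and treats the bypass capacitor as a short at signal frequencies.

```python
import math

def triode_gain(mu, r_p, r_load, r_k=0.0):
    """Small-signal voltage gain magnitude of a common-cathode triode.
    r_k = 0 models a fully bypassed cathode resistor; r_k > 0 an unbypassed one."""
    return mu * r_load / (r_p + r_load + (mu + 1.0) * r_k)

def to_db(ratio):
    """Voltage ratio to decibels."""
    return 20.0 * math.log10(ratio)

# Assumed, roughly 12AX7-like operating values (ohms).
MU, R_P, R_L, R_K = 100.0, 62_500.0, 100_000.0, 1_500.0

bypassed = triode_gain(MU, R_P, R_L)
unbypassed = triode_gain(MU, R_P, R_L, R_K)
print(f"bypassed:   {bypassed:5.1f}x ({to_db(bypassed):.1f} dB)")
print(f"unbypassed: {unbypassed:5.1f}x ({to_db(unbypassed):.1f} dB)")
print(f"local feedback factor: {bypassed / unbypassed:.2f}")
```

With these assumed values the unbypassed stage gives up a little under 6 dB, and the ratio of the two gains (about 1.9 here) is the amount of local feedback the stage is running. A larger R_K, or additional local feedback, trades away more gain in the same way.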
Once, about 20 years and 3 house moves ago, I decided to cobble together an amplifier built on exactly these design criteria. It had more than 8 gain stages, each with only the smallest gain hops: +3 dB to +6 dB was the criterion. It was also a polarity-symmetric amplifier, with the same devices amplifying both the normal and the inverted signal, to make the distortion completely symmetric. No attempt was made to conserve power (so the input triodes and pentodes ran at high currents), and little attempt was made to save money. (Well… not quite: my revered uncle had saved over 2,500 tubes in various giant boxes, culled from hundreds of yard sales and going-out-of-business sales at TV repair shops. So the tubes were free. It is a pity they disappeared during one of the three house moves.)
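Why polarity symmetry matters can be shown with an idealized model: give each amplifying path the same slightly curved transfer characteristic y = x + a2·x² + a3·x³, drive one path with +x and the other with −x, and take half the difference. The even-order (x²) term cancels exactly, while the odd-order term survives. The coefficients below are hypothetical, and real tubes are only approximately matched, so in practice the cancellation is partial.

```python
import math

A2, A3 = 0.05, 0.01  # hypothetical 2nd- and 3rd-order curvature coefficients

def stage(x):
    """One amplifying path with a mildly nonlinear transfer curve."""
    return x + A2 * x * x + A3 * x ** 3

def balanced(x):
    """Same characteristic driven by +x and -x, outputs subtracted and halved."""
    return 0.5 * (stage(x) - stage(-x))

def harmonic(fn, k, n=1024):
    """Amplitude of the k-th harmonic of fn(sin t), from a single DFT bin."""
    re = im = 0.0
    for i in range(n):
        t = 2.0 * math.pi * i / n
        y = fn(math.sin(t))
        re += y * math.cos(k * t)
        im += y * math.sin(k * t)
    return 2.0 * math.hypot(re, im) / n

for name, fn in (("single-ended", stage), ("balanced", balanced)):
    print(f"{name:12s} H2 = {harmonic(fn, 2):.2e}  H3 = {harmonic(fn, 3):.2e}")
```

For a unit sine the single-ended path shows a second harmonic of a2/2 and a third of a3/4; the balanced version keeps the third but its second harmonic drops to rounding noise.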
The result? Actually remarkable: huge dynamic range, linear, musical.
But it was just an experiment. Only one channel was ever built, and I never found the time to work out a second channel of a different, perhaps better, design.
POINT IS? The multi-element lens-building example from optics offers an analogy that may be remarkably applicable to conventional multistage amplifier design.
GoatGuy