does this mean this technique could be used to implement a 'digital volume control'?

if so, would be much easier than what I had planned...

Yes, but only in coarse steps of about 6 dB, since each one-bit shift halves the amplitude. And the more attenuation you want, the more logic you need to get it, plus the logic to select 1-of-N attenuation levels. It gets messy fast.
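To put a number on the step size, here is a minimal Python sketch. The function name `shift_attenuation_db` is just made up for illustration; it only evaluates 20*log10(2^n) for an n-bit shift.

```python
import math

def shift_attenuation_db(shifts):
    """Attenuation in dB from shifting a sample right by `shifts` bits.

    Each one-bit right shift divides the amplitude by 2, so n shifts
    divide it by 2**n.
    """
    return 20 * math.log10(2 ** shifts)

if __name__ == "__main__":
    for n in range(1, 5):
        # One shift is roughly 6.02 dB; the steps simply add up.
        print(n, "bit(s):", round(shift_attenuation_db(n), 2), "dB")
```

So the only levels available are multiples of about 6 dB, with nothing in between.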

still not quite understanding the logic though

You need to understand the 2's complement representation of signed integers. Look it up in Wikipedia if you have to.

In 2's complement, 1 is 1, 0 is 0, etc., but -1 is "all ones", so for an 8 bit number, -1 is b11111111. Since -2 is -1 minus 1, it is b11111110, and -3 is b11111101, etc. Note that the most significant bit becomes the sign bit (1 for negative, 0 for positive).
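If you want to see the 8-bit patterns for yourself, a small Python sketch like this will print them. `twos_complement_bits` is a hypothetical helper name, not something from a library.

```python
def twos_complement_bits(value, width=8):
    """Return the 2's complement bit pattern of `value` as a string."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {width} bits")
    # Masking to `width` bits maps negative values onto their
    # 2's complement pattern (e.g. -1 becomes all ones).
    return format(value & ((1 << width) - 1), f"0{width}b")

if __name__ == "__main__":
    for v in (1, 0, -1, -2, -3):
        print(v, "->", "b" + twos_complement_bits(v))
```

The leftmost character of each string is the sign bit.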

Consider the representation of -1 as a 12 bit number. It is b111111111111. So how do you convert an 8-bit -1 to a 12-bit -1? Well, you can't pad it with (12-8=) 4 zeros, because then you get b000011111111, which is +255. And you can't just always pad it with 4 ones, because then a positive number becomes negative. Instead, you need to copy the sign bit 4 more times.
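The 8-bit to 12-bit case can be sketched in Python on plain bit strings. The names `sign_extend` and `to_signed` are made up for this illustration.

```python
def sign_extend(bits, new_width):
    """Widen a 2's complement bit string by repeating its sign bit."""
    if new_width < len(bits):
        raise ValueError("new_width must not be smaller than the input")
    # bits[0] is the sign bit; copy it into every new MSb position.
    return bits[0] * (new_width - len(bits)) + bits

def to_signed(bits):
    """Interpret a bit string as a 2's complement signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value

if __name__ == "__main__":
    minus_one = "11111111"                          # 8-bit -1
    print(to_signed("0000" + minus_one))            # zero padding: wrong
    print(to_signed(sign_extend(minus_one, 12)))    # sign bit copied
    print(to_signed(sign_extend("01111111", 12)))   # positive stays positive
```

Zero padding turns the 8-bit -1 into +255, while copying the sign bit keeps both negative and positive values intact.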

In fcserei's system, each channel uses a 32-bit 2's complement number per sample. The 16 MSbs are used for the 16 bit samples, with the 16 LSbs essentially unused. To attenuate the samples, he shifted the 16 bits some amount toward the LSb side (by delaying the word clock). To keep the result a valid 2's complement number, he then needed to duplicate the sign bit and fill it into the 2 new MSbs ahead of the 16 bit data.
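Here is a rough software model of that idea in Python. It is only a sketch of the arithmetic, not fcserei's actual logic: the helper names `attenuate_frame` and `top16_signed` are invented, and the frame is assumed to be a 32-bit word with the 16-bit sample in the upper half.

```python
def attenuate_frame(frame, shifts):
    """Shift a 32-bit 2's complement frame right and refill the sign bit.

    `frame` is an unsigned Python int holding the raw 32 bits.
    """
    sign = (frame >> 31) & 1
    shifted = frame >> shifts          # plain shift: new MSbs are zeros
    if sign:
        # Copy the sign bit into the `shifts` vacated MSb positions.
        shifted |= ((1 << shifts) - 1) << (32 - shifts)
    return shifted & 0xFFFFFFFF

def top16_signed(frame):
    """Read the upper 16 bits of the frame as a signed 16-bit sample."""
    value = (frame >> 16) & 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

if __name__ == "__main__":
    frame = 0xFFFC0000                 # 16-bit sample -4 in the 16 MSbs
    print(top16_signed(frame >> 2))                # no sign fill: wrong
    print(top16_signed(attenuate_frame(frame, 2))) # sign fill: -4 / 4
```

A 2-bit shift divides the sample by 4 (about 12 dB). Without the sign fill, a negative sample comes out as a large positive one.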