... been looking at some R2R DIY DACs that mainly just use 74xx chips and resistors to replicate sound from a signal, but if other bits were there too, that should contaminate the sound.
There are many different sources, and a lot of conversion techniques are used.
The I2S format is still frequently used. Converting from USB to I2S is complicated (here I suggest you take a look at the XMOS documentation and sources, there are tons of libraries) ... or just take it as it is and skip right to the I2S signal ...
The simplest device for conversion from USB to I2S at 16 bits/48kHz (or 44.1kHz) is the PCM2706/7:
http://www.ti.com/lit/ds/symlink/pcm2706.pdf
PCM2706 outputs standard I2S data:
48kHz/16bit I2S:
LRCK (WCLK) is the word clock; it runs at the sample rate, 48kHz in this case
1 LRCK period consists of 64 bit slots (32 for LEFT and 32 for RIGHT), but only 16 bits for the left and 16 bits for the right channel are used in this case. Take a look at where the MSB and LSB are located: the MSB is the most significant bit (the one with the largest voltage (or current) weight)
BCK is just the bit clock, in this case 64 bits (in one LRCK period) x 48kHz = 3.072MHz
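To make the frame layout above concrete, here is a minimal Python sketch that computes the bit clock and lays out one stereo frame as 64 bit slots. The function names (`bit_clock_hz`, `to_slot`, `i2s_frame`) are made up for this illustration. It assumes 16-bit two's-complement samples sent MSB first and padded with zeros after the LSB, and it ignores the one-BCK delay between the LRCK edge and the MSB that real I2S has, so check the datasheet timing diagram before relying on it.

```python
SAMPLE_RATE_HZ = 48_000   # LRCK frequency
SLOT_BITS = 32            # BCK periods per channel slot
SAMPLE_BITS = 16          # bits actually used per channel


def bit_clock_hz(sample_rate_hz=SAMPLE_RATE_HZ, slot_bits=SLOT_BITS):
    """BCK frequency: two channel slots per LRCK period."""
    return 2 * slot_bits * sample_rate_hz


def to_slot(sample, sample_bits=SAMPLE_BITS, slot_bits=SLOT_BITS):
    """Signed sample -> list of slot_bits bits, MSB first, zero padded."""
    low, high = -(1 << (sample_bits - 1)), (1 << (sample_bits - 1)) - 1
    if not low <= sample <= high:
        raise ValueError("sample does not fit in sample_bits")
    raw = sample & ((1 << sample_bits) - 1)  # two's-complement encoding
    bits = [(raw >> i) & 1 for i in range(sample_bits - 1, -1, -1)]
    return bits + [0] * (slot_bits - sample_bits)


def i2s_frame(left, right):
    """One LRCK period: left slot followed by right slot (64 bits)."""
    return to_slot(left) + to_slot(right)


if __name__ == "__main__":
    print(bit_clock_hz())          # bit clock in Hz
    print(i2s_frame(1, -1))        # left = +1, right = -1
```

With left = 1 and right = -1, the left slot is fifteen zeros, a one and then 16 padding zeros; the right slot is sixteen ones followed by 16 padding zeros.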
1kHz means 1000 sine periods per second ... x2 (1000 for the left channel and 1000 for the right channel, alternated by LRCK). One SECOND of the left channel consists of 48000 samples (LRCK periods), so one period of a 1kHz sine wave (0.001 second) is built from 48000/1000=48 voltage points, one period of a 10Hz sine wave from 48000/10=4800 voltage points, and one period of 2.4kHz from 48000/2400=20 points, as in this picture, where each step in the sine is 1 LRCK cycle (bits converted to a voltage value):
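The points-per-period arithmetic can be checked with a few lines of Python. This is only a sketch: `samples_per_period` and `sine_period` are names invented here, the full-scale value 32767 assumes signed 16-bit samples, and `sine_period` assumes the sample rate is an integer multiple of the tone frequency.

```python
import math


def samples_per_period(sample_rate_hz, tone_hz):
    """How many samples (LRCK periods) make up one period of the tone."""
    return sample_rate_hz / tone_hz


def sine_period(sample_rate_hz, tone_hz, full_scale=32767):
    """One period of a sine as signed 16-bit sample values."""
    n = round(samples_per_period(sample_rate_hz, tone_hz))
    return [round(full_scale * math.sin(2 * math.pi * i / n)) for i in range(n)]


if __name__ == "__main__":
    for tone in (1000, 10, 2400):
        print(tone, "Hz ->", samples_per_period(48000, tone), "points")
    print(sine_period(48000, 2400))  # the 20 steps of a 2.4kHz period
```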
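Imagine that the two waves are created together:
LRCK period 1, first half = left value 1, second half = right value 1; LRCK period 2, first half = left value 2, second half = right value 2 (in standard I2S the left word is sent while LRCK is low and the right word while it is high). With a buffer, both values can be "latched" at once, so that the left and right channels are not shifted against each other by half an LRCK period (about 10µs at 48kHz) ... but is 10µs negligible, or is it not?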
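One way to judge whether a left/right time offset matters is to convert it into a phase difference at a given audio frequency: phase (degrees) = 360 x frequency x delay. The sketch below uses made-up helper names and compares two example delays, half an LRCK period at 48kHz and a 10ns logic delay; whether the result is audible is a separate question that this arithmetic does not answer.

```python
def half_lrck_period_s(sample_rate_hz):
    """Time between the left and right word of one frame, in seconds."""
    return 1.0 / (2 * sample_rate_hz)


def interchannel_phase_deg(delay_s, tone_hz):
    """Phase offset between channels caused by a fixed time delay."""
    return 360.0 * tone_hz * delay_s


if __name__ == "__main__":
    for tone in (1000, 20000):
        print(tone, "Hz, half LRCK period:",
              interchannel_phase_deg(half_lrck_period_s(48000), tone), "deg")
        print(tone, "Hz, 10 ns:",
              interchannel_phase_deg(10e-9, tone), "deg")
```

Half an LRCK period works out to 3.75 degrees at 1kHz and 75 degrees at 20kHz, while 10ns is only about 0.07 degrees even at 20kHz.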
There are various oversampling or upsampling techniques that try to improve the sound (the wave) by calculating these "missing" points to create smoother transitions between them.
Upsampling vs. Oversampling for Digital Audio | Audioholics
Oversampling just makes room for additional calculation between samples and uses that room to apply different filters and calculations ... The problem arises when the sound is more complex than a plain sine, which in reality it always is: these calculations can then alter the sound (the well-known audible digital ringing in music when a FIR filter is used) ... That's why many people prefer "NOS" (non-oversampling): play it as it is, without editing.
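As a toy illustration of "calculating the missing points", here is the simplest possible interpolator: straight lines between neighbouring samples. This is an assumption chosen for clarity, not what real oversampling DACs do (they normally zero-stuff and run a FIR low-pass, which is where the ringing comes from); `oversample_linear` is a hypothetical name.

```python
def oversample_linear(samples, factor):
    """Insert factor-1 linearly interpolated points between each sample pair."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    if samples:
        out.append(float(samples[-1]))  # keep the final original sample
    return out


if __name__ == "__main__":
    # 4x oversampling of three original points
    print(oversample_linear([0, 4, 8], 4))
```

Every original sample survives unchanged and only the in-between points are invented, which is exactly the "editing" that NOS designs avoid.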
Then there are the different converter architectures (delta-sigma, R2R, SAR (in ADCs), ...), each with its own advantages and disadvantages, discussed a hundred times.