Cics - Cplay and Cmp2

Hi there.

I'd like to refer you to Cics from Audio Asylum if you are still interested in Windows XP-based audio. Cics has just released a new wiki related to cMP2 and cPlay.

He went deep into the jungle of XP and PC hardware to squeeze the best sound out of it.

I think he did a great job - just on the wrong operating system. ;)

Cics Wiki

There is some really good stuff over there.

After all, the whole story is pretty much in line with what I am doing under Linux.
 
I am sorry, but just a quick look at the wiki shows numerous unsupported arguments regarding latency: http://cplay.sourceforge.net/pmwiki.php?n=CPlay.ASIOLatency

Quoting:
From a Jitter viewpoint, when a soundcard's buffer is populated (whilst the other buffer is converted to SPDIF or whatever), there's a burst of electrical activity. The idea is to keep this burst as short as possible thereby reducing interference to the soundcard's XO, i.e. reduce Jpp. We achieve this by setting latency to the lowest possible level. Of course, using such a low latency would mean more frequent buffer loads. This is the ASIO frequency (or ASIO Hz). At 32 samples latency for 96k output, ASIO Hz is 3kHz. This is now periodic in nature and is digitally induced. We now have Periodic Jitter - the worst kind which exists for all digital playback systems. ASIO gives us control over this.

This is not how PCI cards work. ASIO (or alsa) latency has nothing to do with filling the internal card buffer the author talks about, but with how frequently the card is given a new DMA address and the data buffer length.

ice1724.c:

Code:
	val = (8 - substream->runtime->channels) >> 1;
	outb(val, ICEMT1724(ice, BURST));

	outl(substream->runtime->dma_addr, ICEMT1724(ice, PLAYBACK_ADDR));

	size = (snd_pcm_lib_buffer_bytes(substream) >> 2) - 1;
	/* outl(size, ICEMT1724(ice, PLAYBACK_SIZE)); */
	outw(size, ICEMT1724(ice, PLAYBACK_SIZE));
	outb(size >> 16, ICEMT1724(ice, PLAYBACK_SIZE) + 2);
	size = (snd_pcm_lib_period_bytes(substream) >> 2) - 1;
	/* outl(size, ICEMT1724(ice, PLAYBACK_COUNT)); */
	outw(size, ICEMT1724(ice, PLAYBACK_COUNT));
	outb(size >> 16, ICEMT1724(ice, PLAYBACK_COUNT) + 2);

The driver tells Envy24HT the address in RAM where the data ring buffer starts, how long the buffer is (playback_size, corresponding to buffer size in alsa terms) and after how many bytes read from RAM over DMA to generate interrupt (playback_count, corresponding to period size in alsa terms). The interrupt tells the driver/alsa-lib that the current part of the buffer has been read and is available for rewriting with fresh data. Meanwhile, the card continues reading the rest of the ring buffer, i.e. playing the samples stored there, until DMA is stopped when the stream gets stopped by the application.
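For reference, these two values are exactly what an application sets through alsa-lib before playback starts. A minimal sketch (error handling omitted; the access mode, format, and sizes are just example values, not recommendations):

Code:
	#include <alsa/asoundlib.h>

	/* Sketch: configure the DMA ring buffer geometry via alsa-lib.
	 * buffer_size maps to PLAYBACK_SIZE, period_size to PLAYBACK_COUNT. */
	static int setup_pcm(snd_pcm_t *pcm)
	{
		snd_pcm_hw_params_t *hw;
		snd_pcm_uframes_t buffer_size = 16384; /* whole ring buffer, in frames */
		snd_pcm_uframes_t period_size = 4096;  /* frames between interrupts */

		snd_pcm_hw_params_alloca(&hw);
		snd_pcm_hw_params_any(pcm, hw);
		snd_pcm_hw_params_set_access(pcm, hw, SND_PCM_ACCESS_RW_INTERLEAVED);
		snd_pcm_hw_params_set_format(pcm, hw, SND_PCM_FORMAT_S32_LE);
		snd_pcm_hw_params_set_channels(pcm, hw, 2);
		snd_pcm_hw_params_set_rate(pcm, hw, 96000, 0);
		snd_pcm_hw_params_set_buffer_size_near(pcm, hw, &buffer_size);
		snd_pcm_hw_params_set_period_size_near(pcm, hw, &period_size, NULL);
		/* It is here, in the driver's hw_params callback, that the
		 * PLAYBACK_SIZE/PLAYBACK_COUNT registers above get programmed. */
		return snd_pcm_hw_params(pcm, hw);
	}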

As anyone can read in the Envy24HT datasheet, the chip itself has only an internal buffer of 12 samples x 8 channels x 24 bits, i.e. 12 sample periods long. The only parameter which can change the frequency of reading from DMA (the argument the author uses for small latency!) is the burst size (register MT19), which is strictly tied to the number of played channels and absolutely unrelated to asio/alsa latency.
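A back-of-the-envelope check of what that FIFO depth implies (my arithmetic, assuming the chip simply tops the FIFO up as it drains):

Code:
	FIFO depth:        12 frames
	at 96 kHz:         12 / 96000  = 125 us of audio
	refill frequency:  ~96000 / 12 = 8000 bursts/s

	-> the rate of DMA reads from RAM is fixed by the sample rate and
	   the FIFO/burst size, not by the chosen asio/alsa latency.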

In fact, the latency controls the length of the buffer and period, i.e. how long the card's DMA controller will be left to do its job of copying data from RAM to that tiny internal buffer without bothering the CPU for new address/buffer-length information.
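In concrete numbers (interrupt rate = sample rate / period size; the period sizes are my examples):

Code:
	96000 Hz / 32 frames    = 3000 interrupts/s  (the "3kHz ASIO Hz" quoted above)
	96000 Hz / 16384 frames ≈    6 interrupts/s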

I do not believe that just telling the card where it finds its DMA buffer more than 1,000 times a second can have a positive impact on the output jitter. On the contrary, I would rather tell the card a few times a second and let it do its job.

The point about keeping the periodic jitter above the PLL's cutoff would make sense if the information in the first part of the article were correct.

This just shows how many Windows optimizations are based on plain assumptions about how things work. In Linux we have the source code and datasheets available and do not have to just assume.
 
Just a correction - the interrupt from the PCI card does not cause the driver to tell the card new DMA parameters; it just informs the driver that the card has finished reading a particular part of the DMA ring buffer, and the driver then calls snd_pcm_period_elapsed(). E.g. see the snd_vt1724_interrupt method in ice1724.c. It generally means that under steady conditions the interrupt leads to no subsequent extra communication with the card.
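For the curious, the pattern in the driver looks roughly like this - a simplified sketch, not the verbatim kernel code; read_irq_status(), ack_irq() and IRQ_PCM_PERIOD_DONE are hypothetical stand-ins for the chip-specific bits:

Code:
	/* Sketch of the PCM part of an ALSA driver interrupt handler;
	 * see snd_vt1724_interrupt() in ice1724.c for the real thing. */
	static irqreturn_t my_card_interrupt(int irq, void *dev_id)
	{
		struct my_card *chip = dev_id;
		unsigned int status = read_irq_status(chip);   /* hypothetical */

		if (status & IRQ_PCM_PERIOD_DONE) {
			ack_irq(chip, IRQ_PCM_PERIOD_DONE);    /* hypothetical */
			/* No DMA parameters are rewritten here - this call only
			 * tells the ALSA core that one more period was consumed. */
			snd_pcm_period_elapsed(chip->substream);
			return IRQ_HANDLED;
		}
		return IRQ_NONE;
	}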
 
His problem is that he tries to explain everything: what causes this or that change to the sound.

I have no problem with that approach, in fact it is my preferred mode of operation. But the explanation should not be an artificially made-up argument fitting one's opinion, but a plausible and technically correct analysis.

This takes me back to my explanation of the PCI DMA communication above, which was not technically correct - I feel ashamed.

When an application starts playback, it opens the device. Upon opening, the card is told the parameters of the DMA ring buffer in RAM (starting address and buffer length) and how many bytes to read before it generates an interrupt (the period size). The driver fills the ring buffer with more than one period's worth of initial audio data. After that the card is told to start the DMA transfer - the actual playback.

When the card reaches the end of a period, it notifies the driver so that the driver knows the current status of the playback. Meanwhile the card continues to read data from the buffer. When it reaches the end of the buffer (it knows its length), it starts from the beginning automatically.

The driver has to make sure fresh data always reach RAM ahead of the card's reading pointer. Therefore the buffer is almost always at least twice the period size - while the card reads one period segment, the driver has time to refresh the data in the other period segment.
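Seen from user space, the same mechanism is just a blocking write into the ring; a minimal alsa-lib sketch, assuming an already-configured PCM handle:

Code:
	/* snd_pcm_writei() blocks until there is room in the ring buffer,
	 * i.e. until the card has consumed at least one more period. */
	static int play(snd_pcm_t *pcm, const char *buf, long frames, int frame_bytes)
	{
		while (frames > 0) {
			snd_pcm_sframes_t n = snd_pcm_writei(pcm, buf, frames);
			if (n == -EPIPE) {              /* xrun: the card caught up */
				snd_pcm_prepare(pcm);
				continue;
			}
			if (n < 0)
				return n;               /* some other error */
			buf += n * frame_bytes;
			frames -= n;
		}
		return 0;
	}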

Now low latency means the buffer is very small, the driver is only a little ahead of the card, and the card keeps informing the driver very often. A very short delay in filling the buffer causes the card to consume all the fresh data, resulting in audible xruns. That is why the playback chain needs to run with real-time priority on RT kernels.
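The usual counter-measure is to put the playback thread into a real-time scheduling class; a minimal sketch using the POSIX API (the priority value 70 is just an example):

Code:
	#include <pthread.h>
	#include <sched.h>

	/* Move the calling thread to SCHED_FIFO so ordinary desktop load
	 * cannot delay the buffer refills. Needs RT privileges (e.g. root). */
	static int go_realtime(void)
	{
		struct sched_param sp = { .sched_priority = 70 };
		return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
	}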

How such a setup could improve the sound is beyond my understanding (and belief).

The author of pulseaudio wrote a very illuminating description of his glitch-free technology: http://0pointer.de/blog/projects/pulse-glitch-free.html . Please note the overall HW latencies he would like to reach - several seconds. And that is for good reasons - low CPU load, low power consumption, few IRQs. Of course the actual latencies could be much smaller, as the pulseaudio server would "inject" low-latency streams (games, MIDI, audio monitoring/editing) into the ring buffer just ahead of the sound card's reading pointer.

This makes sense to me, not the low-latency push for quality audio.
 
This makes sense to me, not the low-latency push for quality audio.

What in the end makes sense is the solution which delivers the best sound at a given point in time. For now, that is my rt-kernel & ecasound & ramdisk buffering. (More and more people are confirming this.)

If one day in the far future pulseaudio manages to beat my rt-kernel/ecasound setup, I'll be happy to try it (once in a while I do try it) -- until then I remove it (which is quite a difficult task if you run Ubuntu).

If you read what he is saying, you'll realize that the guy is tied to "make it work on all platforms" -- that's bad - it sounds like compromises.

Number two: he is listing several problems which do exist. He is complaining about everything and about the limitations of Alsa. I say: so what - instead of writing such an animal as pulseaudio he could have supported Alsa or Jack to make things better.


If you had listened to ecasound on a real-time kernel you'd know what I am talking about.


Anyhow, thanks for the link. I've been there before.
 
Some more:
He is talking about timer limitations of 100 Hz! Come on - I was running 10 kHz two years ago.
The rest of the audio world has been running at 1000 Hz for quite some time.

>> The glitch-free logic will only be enabled on mmap()-capable ALSA devices and where hrtimers are available.

Perhaps he should talk to the HW manufacturers - the few that support Linux - first.

Last: I am really wondering how accurate a SW timer can be.

The only acceptable way would probably be to sync the chain to an external precision clock.
 
What in the end makes sense is the solution which delivers the best sound at a given point in time. For now, that is my rt-kernel & ecasound & ramdisk buffering. (More and more people are confirming this.)

Any links? People comparing the sound of a non-rt kernel to an rt kernel? People comparing bit-perfect playback of ecasound to e.g. bit-perfect MPD? The ramdisk makes sense - a HDD does produce a lot of noise.


If one day in the far future pulseaudio manages to beat my rt-kernel/ecasound setup, I'll be happy to try it (once in a while I do try it) -- until then I remove it (which is quite a difficult task if you run Ubuntu).

Any pure alsa-based setup surpasses current pulseaudio, at least the one packaged in Ubuntu. I always remove pulseaudio, since from my point of view the technology is not ready for production yet. It has a pretty ambitious goal which takes a lot of effort to reach, on all audio layers.

If you read what he is saying, you'll realize that the guy is tied to "make it work on all platforms" -- that's bad - it sounds like compromises.

Unless you present arguments, it is pure speculation. The sound HW is the same on all platforms, and so are the low-level principles (they follow from the common HW functionality).


Number two: he is listing several problems which do exist. He is complaining about everything and about the limitations of Alsa. I say: so what - instead of writing such an animal as pulseaudio he could have supported Alsa or Jack to make things better.

I believe alsa should stay a lean abstract interface to the various sound cards. Most of its user-space stuff has not been very successful: dmix supports only raw PCI/USB cards (no bluetooth) and has long latency, and the current dmix supports 16-bit resampling only - check the source code. The alsa API is VERY complicated, since it has been extended many times and has to be kept backward-compatible. Sure, fixing alsa is doable, but in my eyes that is one of the goals of pulseaudio.

Jack is a professional tool for a limited group of people. The single common sampling frequency does not make it suitable for a broader range of applications. I would not be surprised if pulseaudio gradually employed the jack core principle - applications at the same sample rate communicating via shared memory, which is latency-free. Sure, you could add resampling to jack - and slowly turn it into a beast similar to pulseaudio.

If you had listened to ecasound on a real-time kernel you'd know what I am talking about.

The RT kernel has stability problems on my home machine as well as on my testing notebook. It is not a well-tested technology; numerous drivers have problems and cause lock-ups. E.g. some people still report crashes of ice1724 midi on RT Ubuntu Studio - search the alsa-devel mailing list. So far nobody has been able to fix it. I used to experience the same problem with ice1724 midi; some hacks of a trial-and-error nature helped, apparently not for everyone.

Properly implemented bit-perfect alsa outputs (ecasound, MPD, sox) all sound the same. Saying one app sounds better in a consistent manner (i.e. on machines with different MBs, PSUs, CPUs, etc.) is voodoo to me.

Unlike mplayer, which is very stubborn about resampling/reformatting the audio data based on the actual sound card parameters instead of the parameters supplied by the upper-layer alsa plugins. That makes it bit-imperfect, so it does not meet the initial condition of bit-perfection.


He is talking about timer limitations of 100 Hz! Come on - I was running 10 kHz two years ago. The rest of the audio world has been running at 1000 Hz for quite some time.

For pure playback, 100 Hz with a long-enough latency is sufficient. It is your own decision to strain your machine with 10 kHz. The audio world you are talking about is crucially dependent on low latency for real-time audio work. Do you think they would find it acceptable to push the play button, wait a few seconds while the data buffers from HDD to RAM, and then try to make up for the lost seconds with a single-millisecond-latency setup?
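The numbers behind that claim (mine, as an illustration):

Code:
	100 Hz timer            -> 10 ms scheduling granularity
	ring buffer of e.g. 2 s -> a wakeup can be late by dozens of ticks
	                           before an xrun is even on the horizon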

>> The glitch-free logic will only be enabled on mmap()-capable ALSA devices and where hrtimers are available.

Perhaps he should talk to the HW manufacturers - the few that support Linux - first.

Last: I am really wondering how accurate a SW timer can be.

The only acceptable way would probably be to sync the chain to an external precision clock.

Well, are you sure you understand what Lennart is talking about in his article? The hrtimers have nothing to do with the audio clock. Since the card is set up for high latency, it reveals its position only a few times a second, and hrtimers have to "simulate" the card's reading pointer so that the upstream knows to which position of the mmapped buffer the low/middle/high-latency data should be copied, so as to stay properly ahead of the HW reading pointer.
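The idea of "simulating" the reading pointer is simple interpolation between the rare HW position reports; a sketch of the arithmetic (not pulseaudio's actual code - real implementations also correct for clock drift):

Code:
	#include <stdint.h>

	/* Estimate the card's read pointer from the last HW report plus
	 * the time elapsed since, at the nominal sample rate. */
	static uint64_t estimated_pos_frames(uint64_t last_hw_pos,
	                                     uint64_t ns_since_report,
	                                     unsigned int rate)
	{
		return last_hw_pos + ns_since_report * rate / 1000000000ULL;
	}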

It is pretty difficult for a user-space application to write to a specific part of the buffer if the buffer is not mmapped.

I suggest we stop this discussion since apparently we both are talking about different things.
 
Well, I should say that I am now using cPlay through some lousy desktop speakers and I am definitely hearing something different than with foobar. In fact I've tried foobar with virtually every output (asio, driver, wasapi) and various sample rates, but I could not make out any significant changes. Maybe when changing to asio and buffering in RAM there were some subtle changes.

But it's nothing compared to what I am hearing with cPlay...

Just for reference, I am using Win7 (32-bit), not XP.
 
SunRa,

did you check if both foobar and cplay are bit perfect? The article http://www.enjoythemusic.com/magazine/viewpoint/0808/aachapter106.htm says:
RAM loading of a 640 mB CD takes about 15 seconds from the hard drive, and the information is then upsampled by the program to the highest allowable amount for your DAC or 192 kHz.

I cannot comment on that, I do not know the internals.

Playback on Windows is a black box to me due to the missing source code for the whole chain, from the player (I could not find complete source code for cplay or foobar2000) through the library layer down to the drivers. Any layer can modify the audio stream, which can be detected only by checking for bit-perfection.
 
Hello,

thanks for the reply. Do you have an easy way to check for bit-perfection? I really don't have time to look into this, but if you have an easy, ready-made method I can check this weekend.
I plan to compare a file upsampled to 192/24 played through both players. Also, I plan to compare the 192/24 file played through foobar with the original 44.1/16 played through cPlay and upsampled in real time with SRC or SoX (cPlay can upsample using different SRC and SoX configurations).

Thanks!
 
Well, there is the simple test of playing a DTS file through the chain and the SPDIF output, and checking whether an AV receiver detects the incoming stream as DTS. I am not very convinced about the conclusiveness of this test, since it is rather simple to check for the DTS header and the SW could reconfigure the chain accordingly.

I guess the only bulletproof test is recording the SPDIF output with the SPDIF input of the same/another card and comparing the results. Time-aligning the two wavs in Audacity takes a few minutes; the rest is just subtracting the samples of the two wavs and checking the result for non-zero samples, e.g. in sox.

This method is certainly not a simple one. That is the reason not many people actually check for bit-perfection.
 
Hello phofman,

thanks for the tips, I'll see what my card can do... the part about subtracting the wavs is not very clear to me; I'll play with Audacity.

Now, does anyone know how I can switch off upsampling in cPlay? Let's say I have 192/24 files: if I set 192 kHz upsampling with sox in cPlay, does it mean that when it detects a 192 file it won't touch it?
 
Hello phofman,
thanks for the tips, I'll see what my card can do... the part about subtracting the wavs is not very clear to me; I'll play with Audacity.

The goal is to compare the outgoing and incoming wavs. They have to be time-aligned first (e.g. in Audacity) to start at the same sample. Then you have to compare them somehow. For me the easiest way is to subtract the two wavs in sox and check the resultant wav's statistics:

Code:
sox -V -m outgoing.wav -v -1 incoming.wav -n stat

Params:
-V - verbose output
-m - merge (mix) the input wavs
-v -1 - volume -1 for the second input wav, i.e. multiply its samples by -1 and thus invert them; mixing then amounts to subtraction
-n - do not create an output wav
stat - calculate and print statistics for the resultant wav (the result of the subtraction). If the two wavs are identical, the max/min/avg samples must be exactly zero.
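If you want to avoid Audacity altogether, sox can also drop the leading samples before the subtraction. An example, assuming you have determined that the recording starts with 1234 samples of extra lead-in (the "s" suffix means samples; the offset itself is yours to find):

Code:
sox incoming.wav aligned.wav trim 1234s
sox -V -m outgoing.wav -v -1 aligned.wav -n stat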

Now, does anyone know how I can switch off upsampling in cPlay? Let's say I have 192/24 files: if I set 192 kHz upsampling with sox in cPlay, does it mean that when it detects a 192 file it won't touch it?

Sox itself detects when the incoming rate is equal to the requested output rate and skips the conversion.
 
SunRa,

did you check if both foobar and cplay are bit perfect? The article http://www.enjoythemusic.com/magazine/viewpoint/0808/aachapter106.htm says:
RAM loading of a 640 mB CD takes about 15 seconds from the hard drive, and the information is then upsampled by the program to the highest allowable amount for your DAC or 192 kHz.

I cannot comment on that, I do not know the internals.

Playback on Windows is a black box to me due to the missing source code for the whole chain, from the player (I could not find complete source code for cplay or foobar2000) through the library layer down to the drivers. Any layer can modify the audio stream, which can be detected only by checking for bit-perfection.


He is not loading a full CD into RAM with cPlay, but I am doing it. This way I can stop the HDD for about 45 minutes.


There is no fixed upsampling - you just select it. You can choose Secret Rabbit or SoX.
 