PeppyMeter

I would suggest - marked as *****

(!meter->running) {
*****
list_for_each(pos, &meter->scopes) {
scope = list_entry(pos, snd_pcm_scope_t, list);
if (scope->enabled)
scope->ops->start(scope);
*****
}
meter->running = 1;
}
***** (1)
list_for_each(pos, &meter->scopes) {
scope = list_entry(pos, snd_pcm_scope_t, list);
if (scope->enabled)
scope->ops->update(scope);
***** (2)

}
*****


But the results are predictable, you should see significant delay (several seconds) between (1) and first pass of (2)


Edit: the format I guess should be checked s16_enable/s16_update
 
Last edited:
I believe I added printf everywhere. I don't see any activity in the thread during the spike - no output.

Code:
static void *snd_pcm_meter_thread(void *data)
{
        snd_pcm_t *pcm = data;
        snd_pcm_meter_t *meter = pcm->private_data;
        snd_pcm_t *spcm = meter->gen.slave;
        struct list_head *pos;
        snd_pcm_scope_t *scope;
        int reset;
        printf("thread called\n");
        list_for_each(pos, &meter->scopes) {
                printf("list_for_each\n");
                scope = list_entry(pos, snd_pcm_scope_t, list);
                snd_pcm_scope_enable(scope);
        }
        printf("meter->closed: %d\n", meter->closed);
        while (!meter->closed) {
                 printf("in while loop\n");
                snd_pcm_sframes_t now;
                snd_pcm_status_t status;
                int err;
                pthread_mutex_lock(&meter->running_mutex);
                err = snd_pcm_status(spcm, &status);
                assert(err >= 0);
                //printf("status.state: %d\n", status.state);
                if (status.state != SND_PCM_STATE_RUNNING &&
                    (status.state != SND_PCM_STATE_DRAINING ||
                     spcm->stream != SND_PCM_STREAM_PLAYBACK)) {
                        printf("status.state\n");
                        if (meter->running) {
                                printf("meter->running\n");
                                 list_for_each(pos, &meter->scopes) {
                                        scope = list_entry(pos, snd_pcm_scope_t, list);
                                        scope->ops->stop(scope);
                                }
                                meter->running = 0;
                        }
                        pthread_cond_wait(&meter->running_cond,
                                          &meter->running_mutex);
                        pthread_mutex_unlock(&meter->running_mutex);
                        continue;
                }
                pthread_mutex_unlock(&meter->running_mutex);
                if (pcm->stream == SND_PCM_STREAM_PLAYBACK) {
                        printf("pcm->stream\n");
                        now = status.appl_ptr - status.delay;
                        if (now < 0)
                                now += pcm->boundary;
                } else {
                        printf("not pcm-<stream\n");
                        now = status.appl_ptr + status.delay;
                        if ((snd_pcm_uframes_t) now >= pcm->boundary)
                                now -= pcm->boundary;
                }
                meter->now = now;
                if (pcm->stream == SND_PCM_STREAM_CAPTURE)
                        reset = snd_pcm_meter_update_scope(pcm);
                else {
                        reset = 0;
                        while (atomic_read(&meter->reset)) {
                                printf("atomic read\n");
                                reset = 1;
                                atomic_dec(&meter->reset);
                        }
                }
                if (reset) {
                        list_for_each(pos, &meter->scopes) {
                                printf("reset list");
                                scope = list_entry(pos, snd_pcm_scope_t, list);
                                if (scope->enabled)
                                        scope->ops->reset(scope);
                        }
                        continue;
                         if (!meter->running) {
                        printf("meter not running\n");
                        list_for_each(pos, &meter->scopes) {
                                printf("not running list for each \n");
                                scope = list_entry(pos, snd_pcm_scope_t, list);
                                if (scope->enabled)
                                        scope->ops->start(scope);
                        }
                        meter->running = 1;
                }
                list_for_each(pos, &meter->scopes) {
                        printf("list 2\n");
                        scope = list_entry(pos, snd_pcm_scope_t, list);
                        if (scope->enabled)
                                scope->ops->update(scope);
                }
                nanosleep(&meter->delay, NULL);
        }
        list_for_each(pos, &meter->scopes) {
                printf("list 3\n");
                scope = list_entry(pos, snd_pcm_scope_t, list);
                if (scope->enabled)
                        snd_pcm_scope_disable(scope);
        }
        return NULL;
 }
 
Rpi: With **** I marked the points to insert your logging with timestamps (the latter us critical).
I can predict points as the CPU spikes I logged on the peppyalsa side should be the same.

So what happens
list_for_each(pos, &meter->scopes) iterates though the scopes.
After alsaplugin is loaded we will have 2 scopes
The first is the native so called “s16” scope,
the second belongs to peppyalsa.

I suspect it happens near printf("list 2\n"); but not before the loop, rather inside the loop.
With your logging point you would hardly visualize the spikes in the log as you didn’t wrap the loop body, i.e. scope->ops->update(scope) calls.

So the loop will call scope->ops->update() for “s16 scope” first, so will call s16_update, that respectively calls snd_pcm_linear_convert(). That is exactly what you found while profiling.
This call converts the buffer, so it takes time and CPU depending on the buffer size
After that the loop will call level_update() from peppy alsa.

So if you log before and after scope->ops->update(scope) you should see the delay to process the first, i.e. s16_update call.


If the above is not clear, please let me know, I will be happy to choose another words to explain.
 
You're right. It's paused (before) when spike starts and continues (after) when no spike:
Code:
if (scope->enabled) {
    printf("before\n");
    scope->ops->update(scope);
    printf("after\n");
}
During normal execution (no spike) I see both of them (before, after) in the terminal.
 
I groped an alsa config based solution and trying to verify.


Edit
The spiking scope->ops->update(scope)
calls s16_update and then calls snd_pcm_linear_convert that loops on conversion.
If the buffer size is large, it takes time.


But the most curious thing is it doesn’t take time always on the same buffer size.
So the conversion is applied conditionally, I suspect it depends on formats matching.


I know that using snd-dummy sound card as the output target doesn’t cause the spikes.
Snd-dummy is a dummy/fake driver (sound card “Dummy”) without a real output, it just silently plays everything directed onto it.
It can be installed as “modeprobe snd-dummy” and autostarted by adding snd-dummy into /etc/modules


The most curious (for me) thing is where/how alsa checks the format matching and why the same
scope->ops->update(scope) → calls s16_update → s snd_pcm_linear_convert sequence is proceeded faster or skipped while playing onto Dummy sound card

It should be in s16* functions of pcm_meter.c


Rpi: Could you try to find that by logging
 
Last edited:
Here is what I have composed and doesn’t produce spikes … for me at least.
Please try and let me know on the results. I actually need a confirmation from you, Rpi

The main logic is splitting the output into 2 equal sub-outputs and direct the first output to real output (sound card), the second one to peppyalsa that downs the stream to Dummy sound card.
The latter supports all the formats natively, so pcm_meter doesn’t need a conversion (of large buffers) so doesn’t cause the spikes.


Install Dummy sound card. The driver should be on your system already
“modprobe snd-dummy”
Auto starting is made by adding snd-dummy into /etc/modules

The asound is the following
-path to libpeppyalsa.so should be changed to yours
-you may want to add softvol to a place depending on whether the volume control has to affect level metering

Code:
pcm.!default {
  type plug
  slave.pcm "quadsplit"
}

ctl.!default {
  type hw
  card Headphones
}

ctl.quadsplit {
  type hw;
  card Headphones;
}


pcm.!DummyPCM {
   type hw
   card Dummy
}

pcm.!realphone {
  type hw
  card Headphones
}

pcm.!peppyfake {
  type meter
  slave.pcm "DummyPCM"
  scopes.0 peppyalsa
}

pcm.quad {
    type multi

    slaves.a.pcm "peppyfake" 
    slaves.a.channels 2

    slaves.b.pcm "realphone" 
    slaves.b.channels 2


    bindings.0 { slave a; channel 0; }
    bindings.1 { slave a; channel 1; }
    bindings.2 { slave b; channel 0; }
    bindings.3 { slave b; channel 1; }
   
}


pcm.quadsplit {
    type route
    slave.pcm "quad"
    ttable.0.0 1
    ttable.1.1 1
    ttable.0.2 1
    ttable.1.3 1
}


pcm_scope.!peppyalsa {
  type peppyalsa
  decay_ms 400
  meter "/home/pi/myfifo"
  meter_max 100
  meter_show 0
  #spectrum "/home/pi/myfifosa"
  #spectrum_max 100
  #spectrum_size 30
}


pcm_scope_type.!peppyalsa {
        lib /home/pi/peppyalsa/src/libpeppyalsa.so
}

#pcm.dsp0 peppyalsa
 
Hi Serge, I confirm that it works without spikes with dummy sound card and that .asoundrc file. So we have one more workaround which is a little bit complicated but doesn't need changes in the players' cache.

Also I tried to debug pcm_meter.c but didn't have any meaningful result. In this function it always falls into the clause with SND_PCM_FORMAT_U32_BE for all radio streams which caused spikes:

alsa-lib/pcm_meter.c at 03b2f3d05653ef1d9de32ce1d9cf251b6fecd279 * alsa-project/alsa-lib * GitHub

This function always comes to the 'default' clause:
alsa-lib/pcm_meter.c at 03b2f3d05653ef1d9de32ce1d9cf251b6fecd279 * alsa-project/alsa-lib * GitHub

I can continue debugging if you tell me where to debug. Hopefully we can find that bottleneck. It's probably just a couple steps away :)
 
> So we have one more workaround

I believe it’s rather a solution, not just a workaround ;-)

>which is a little bit complicated but doesn't need changes in the players' cache.

Why do you think it’s complicated? ;-)
It’s the most easy way. It’s just a text (config) file/
Changing cache size of players is more complex as you need to patch (configurations again) of multiple players without a warranty of success. Or patch sources/rebuild the players to force them using less cache.
Patching alsa has a global effect and looks more complex than just a new .asounderc

>I can continue debugging if you tell me where to debug. Hopefully we can find that bottleneck.

I’m under impression alsa just works as expected, so what we have found is not a bottleneck to fix,
but rather “as by design” with the the expected consequence.

I will try to explain my logic.
So the facts as they seem to me
- The buffer size regulation is beyond alsa, it’s determined by players
- The more data (of mismatching formats), the more CPU is required to process
- The meter plugin works with S16 (signed 16 bit) format only. Remember snd_pcm_scope_s16_open in peppyalsa? There are no s24_open etc, only s16
- Typically files I (and probably you) have been playing are of S16_LE (signed 16 bit little endian).
So the meter plugin has no reasons to convert the formats, they are equal, So it’s buffer size irrelevant at all.
- However the radio streams are often (nearly always) are of S24 format.
- Card “headphones” (bcm2835) is 16 bit only card
- snd-dummy accepts any format

So my “theory” is based on required conversion if the formats are mismatching.
The processing chain is
source→ meter plugin → scope- peppyalsa → slave.pcm (soubd card)

If it is
S16 (as in files) → meter plugin → scope- peppyalsa → S16 no conversion is required and so no spikes
If it is
S24 (radio) → meter plugin → scope- peppyalsa → S16 (bcm2835 Headphones), conversion is required and takes time depending on the buffer size.
If it is
S24 (radio) → meter plugin → scope- peppyalsa → S24 (snd-dummy), no conversion is required and doesn’t takes time independently on the buffer size.

-How to make bcm2836 Headphones S24 compatible? – no way I guess-
-How to make radio S16? - by forcing native (radio S24) to (bcm2835) S16 conversion inside (multiple) players. It’s hardly possible imho.
-Theoretically, without snd-dummy usage, we could find something the conversion related inside pcm_meter.c, but not sure it could help (be possible) as the final target will be S16 bcm2835 anyway.
Anyway it’s alsa internal logic we wouldn’t be able to change it globally. I have an idea for what to try, but just an idea, no warranties. Nobody knows how much time we would have to spend on this.

Taking into account the above, I do think the asound.conf was it’s the easiest and probably the best solution.

Would be happy to know your opinion
 
Format conversion of a typical audio stream is a trivial operation taking almost no CPU load. The actual bug is some looping somewhere in alsa-lib caused by some untested combination of parameters/statuses/timing. Avoiding the format conversion is a workaround. But also a good pointer at the possible cause.
 
phofman Looping was found by perf and the logging, it’s the regular conversion loop.
It’s really trivial, you are right, but big buffers make the trivial operation long and CPU consumptive.
For me if there is an alsa bug, it may be in a redundant conversion, somewhere in s16_* functions of pcm_meter. Would be happy if you propose a more strict hypothesis (logging point). It’s curious.
However the solution beyond the need to fix a third party code is found. Agree?
 
If the processing runs correctly, there are only as many conversion calls as there are samples per second, i.e. e.g. 48k * 2 = 96k conversions per second, that's nothing. Longer buffers just mean the iteration over the buffer does not occur so often.

It's simple to add printf to the actual conversion call and check how many times a second the conversion is called. It is possible the processing is incorrect and the conversion gets looped, converting the same samples over and over again.

But IIRC rpi said the meter thread is not called at all during the spike. Also look at perf output at https://www.diyaudio.com/forums/pc-based/291010-peppymeter-13.html#post6272395 - the sum of the Overhead column is still 100%, yet CPU load was small. That number is not percentage of CPU load, but of all probes tracked by perf on that particular process.
 
Last edited:
>It may use also other criteria which may change.


The sound card, it spikes with snd-bcm2835, but doesn’t with snd-dummy

>Nevertheless debugging the scope->update method should not be difficult.


I’m with Linux for about 3 months only, so it’s difficult for me.
Please realize and don’t get me wrong. I spent more than a week trying to help Rpi and did find a way to avoid spikes. If you suppose debugging/fixing (if possible) is better/easier than just configuring, please go ahead, the “ball” is on your side. I’m curious of the results though.
 
SergeG, I am not by any means undervaluing your effort. But the fact is a trial/error which for an unknown reason happens to work differently (i.e. not hitting the bug) is a brittle workaround. Yet very useful for the actual troubleshooting, no doubts.

rpi, I have an RPi4, a regular monitor, and an empty SD card, please what should I install /configure to hit the problem? Thanks.
 
I hope we all agree that CPU utilization over 100% for about 20 seconds is not normal because that blocks almost all other running tasks for that period of time. Unless we know exactly the reason for that we can only use workarounds. Some of them are easier some not.

Serge, I appreciate very much your help in resolving this issue and your suggested workaround If the root cause will not be found I'll use your workaround.

phofman, you can download/install the Headless disk image (Turner Edition) using this page (turner-headless.zip):
Disk Images * project-owner/PeppyPlayers.doc Wiki * GitHub
To login over SSH (e.g. Putty) use user/pass: pi/Turner
Kill the running player process: kill -9 PYTHON3_PID

Disable the player auto-start by commenting out the line in the /etc/rc.local.
Connect a powered speaker to the 3.5mm audio jack. This is default player output.

In the folder /home/pi you will see several versions of the .asoundrc files. Replace the current .asoundrc by one of them for example .asoundrc-peppyalsa. That will enable the peppyalsa plugin.

Start the player from the /home/pi/Peppy folder: python3 peppy.py
Open another SSH terminal and run either 'top' or 'htop' (thank you Serge)

To see the player UI open web browser on any device in your home network and use URL: http://IP_OF_YOUR_PI:8000 (sorry Serge for 8080 ;) )
Then select Radio in Home menu. It will start playing the first station in the Children genre. You may or may not see the spike at this point. To see the spike just switch to another station. I usually go to the Pop genre and play the first station. In another SSH session you should see that CPU utilization goes over 100% and keeps there for about 20 seconds. UI becomes not so responsive during this period of time.

To build/install the peppyalsa plugin use these steps:
Installation * project-owner/peppyalsa.doc Wiki * GitHub
It should not take more than 5 minutes.

To build alsa-lib use these steps:
Updating ALSA on Raspbian Stretch
The current ALSA version on disk image is 1.1.8 So you can define it for downloading source files in 'wget'. I also tried to use the latest 1.1.9 and didn't see any difference related to the discussed issue.

When you run 'sudo make install' for alsa-lib it will install it in /usr/lib. I usually just copy the file libasound.so.2.0.0 from that folder to
/usr/lib/arm-linux-gnueabihf/ where it should be.

Please let me know if you need any other help.
Thanks!
 
Last edited: