RPi5 Banklow 1 Performance Regression On Non-NUMA Kernels: A Deep Dive

by ADMIN

Introduction

Hey guys, let's dive into a critical performance issue affecting Raspberry Pi 5 users, specifically those running non-NUMA kernels. This article will break down the problem, its causes, and how to mitigate it. We're talking about a significant performance regression when decoding H264 videos, and it's all tied to the SDRAM_BANKLOW setting in the rpi-eeprom configuration. So, if you've noticed your RPi5 struggling with video playback, especially at 4K resolution, this is for you. Keep reading, and we'll get to the bottom of it!

The Bug: Massive Performance Drop

The core issue revolves around the SDRAM_BANKLOW setting in the Raspberry Pi 5's EEPROM configuration. When set to 1 (which is the default in recent rpi-eeprom versions), users are experiencing a massive performance hit when software decoding H264 videos on non-NUMA kernels. To put it simply, the frame rates are almost halved compared to older configurations where SDRAM_BANKLOW was set to 3. This performance drop isn't a minor inconvenience; it's a significant regression that makes smooth playback of 4K videos a real challenge. Previously, many 4Kp24 videos could be decoded without a hitch, but now, the experience is far from ideal. Imagine trying to watch your favorite movie and it's constantly stuttering – not fun, right? This is precisely what users are facing, and it's a problem we need to address.

Understanding the Impact

The performance regression is particularly noticeable when dealing with higher-resolution videos like 4K. While high-definition (HD) videos might still be somewhat manageable, the strain on the system becomes evident with 4K content. This means that users who rely on their Raspberry Pi 5 for media playback, especially those with 4K displays, are feeling the pinch. The software decoding process, which is already resource-intensive, is further hampered by this configuration. It's like trying to run a marathon with weights on your ankles – you can do it, but it's going to be a lot harder and slower. The frustration is real, and it's impacting the user experience significantly. The good news is that there's a workaround, which we'll discuss shortly, but first, let's dig a bit deeper into the technical aspects.

Why This Matters

For those who might not be deeply technical, it's crucial to understand why this performance drop is such a big deal. The Raspberry Pi 5 is a powerful little device, and many users rely on it for various tasks, including media playback, home automation, and even light gaming. When a core function like video decoding suffers, it affects the overall usability of the device. It's like having a sports car that can't go over 30 miles per hour – the potential is there, but it's not being realized. This issue not only impacts the immediate user experience but also raises concerns about future software updates and configurations. Users need to be confident that their devices will perform as expected, and regressions like this can erode that confidence. That's why it's essential to understand the root cause and how to fix it.

Reproducing the Issue: Steps to Take

To really grasp the problem, let's walk through the steps to reproduce the performance regression. This isn't just about complaining; it's about providing concrete evidence and allowing others to verify the issue. The method involves using ffmpeg, a powerful command-line tool for handling multimedia files. We'll use it to software decode a video and measure the decoding time. By observing the reported frame rate and decoding speed, we can quantify the performance difference between different configurations. Think of it as a scientific experiment – we're setting up a controlled environment to isolate the variable that's causing the problem.

The ffmpeg Command

The command we'll use is:

time ffmpeg -i bbb_sunflower_1080p_30fps_normal.mp4 -an -f null /dev/null

Let's break this down:

  • time: This is a Unix utility that measures the execution time of a command.
  • ffmpeg: The multimedia framework we're using for decoding.
  • -i bbb_sunflower_1080p_30fps_normal.mp4: This specifies the input video file. In this case, it's the Big Buck Bunny video, which is a commonly used test file.
  • -an: This disables audio processing, as we're only interested in video decoding performance.
  • -f null /dev/null: This tells ffmpeg to discard the output, as we don't need to save the decoded video. We're just measuring the decoding time.

By running this command, we can get a clear picture of how long it takes to decode the video under different SDRAM_BANKLOW settings.
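To compare runs without reading the numbers off the terminal by hand, the measurement can be wrapped in a small helper. The following is a minimal sketch, not part of the original report: last_fps and bench_decode are hypothetical names, and the parsing assumes ffmpeg's usual progress line (frame=... fps=...), which is written to stderr and updated with carriage returns.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from the original report.

# Read ffmpeg's stderr log on stdin and print the last fps= value it contains.
# ffmpeg rewrites its progress line using carriage returns, so split on those first.
last_fps() {
    tr '\r' '\n' | grep -o 'fps= *[0-9.]*' | tail -n 1 | tr -d 'fps= '
}

# Software-decode the given file, discard the frames, and print the final fps.
# -nostdin keeps ffmpeg from waiting on keyboard input when run from a script.
bench_decode() {
    ffmpeg -nostdin -i "$1" -an -f null /dev/null 2>&1 | last_fps
}
```

Calling bench_decode bbb_sunflower_1080p_30fps_normal.mp4 after each reboot prints one number per configuration. Running it two or three times per configuration helps rule out noise.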

Disabling NUMA (Temporarily)

To further isolate the issue, we need to temporarily disable NUMA (Non-Uniform Memory Access). On the Raspberry Pi 5 this means the kernel's NUMA emulation, which splits memory into fake nodes; it can improve performance in certain situations, but it can also complicate our testing. To disable it, append numa=fake=1 to the existing single line in cmdline.txt, which is located in the boot partition of your Raspberry Pi's SD card (mounted at /boot/firmware on RPiOS Bookworm), and reboot. With only one fake node, the kernel treats the memory as if it were non-NUMA.
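Before benchmarking, it's worth confirming that the parameter actually reached the kernel. A minimal sketch follows; numa_fake_value is a hypothetical helper name, and it prints the last matching token on the assumption that a later value on the command line overrides an earlier one.

```shell
#!/bin/sh
# Hypothetical helper: print N from the last numa=fake=N token in a kernel
# command line passed as the first argument. Prints nothing if no token exists.
numa_fake_value() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^numa=fake=//p' | tail -n 1
}
```

On the running system, numa_fake_value "$(cat /proc/cmdline)" should print 1 once the change is active. On kernels built with NUMA support, the node directories under /sys/devices/system/node/ give a second check: a single node0 entry means the kernel sees one memory node.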

The Baseline: SDRAM_BANKLOW=3 and NUMA Disabled

For a baseline, we'll use the SDRAM_BANKLOW=3 setting with NUMA disabled. This represents the configuration from about half a year ago, before the default was changed to SDRAM_BANKLOW=1. Record the frame rate that ffmpeg reports and the real time printed by time under this configuration. These two numbers serve as our reference point.

The Experiment: Comparing Results

Now, we'll compare the results under different configurations:

  1. SDRAM_BANKLOW=3 and NUMA disabled (baseline): This gives us our starting point.
  2. SDRAM_BANKLOW=3 and NUMA enabled: This shows us the performance improvement we get from NUMA.
  3. Default rpi-eeprom-config (SDRAM_BANKLOW=1) and NUMA enabled: This should give us similar results to the NUMA-enabled case with SDRAM_BANKLOW=3.
  4. Default rpi-eeprom-config (SDRAM_BANKLOW=1) and NUMA disabled: This is where we should see the performance regression. The frame rate should be significantly lower, and the decoding time should be much higher.

By running these tests, you'll see the stark difference in performance when SDRAM_BANKLOW is set to 1 with NUMA disabled. It's a clear demonstration of the bug and its impact.

The Evidence: Performance Numbers Don't Lie

Let's get down to the nitty-gritty and look at the performance numbers. These figures paint a clear picture of the performance regression we're discussing. By comparing the frame rates and decoding times under different configurations, we can see the magnitude of the issue. It's not just a subjective feeling of sluggishness; it's a quantifiable drop in performance.

Baseline Performance

With SDRAM_BANKLOW=3 and NUMA disabled, the user reported the following results:

  • Frame rate (fps): 217
  • Real time: 1 minute 29 seconds

This is our baseline – the performance level we expect under the older configuration. It's a solid starting point, and it allows us to measure the impact of the new setting.

NUMA's Impact

Enabling NUMA with SDRAM_BANKLOW=3 yields a noticeable improvement:

  • Frame rate (fps): 253
  • Real time: 1 minute 16 seconds

That's roughly a 17% increase in frame rate (253 vs. 217 fps), which is about the expected performance boost from NUMA. This shows that NUMA is indeed working as intended, improving memory access and overall performance.

The Default Configuration with NUMA

Using the default rpi-eeprom-config (SDRAM_BANKLOW=1) with NUMA enabled gives us similar results:

  • Frame rate (fps): 253
  • Real time: 1 minute 16 seconds

This is identical to the NUMA-enabled results from the previous test. It shows that when NUMA is active, the SDRAM_BANKLOW setting has no measurable impact on this benchmark.

The Problem: NUMA Disabled and SDRAM_BANKLOW=1

Here's where the issue becomes glaringly obvious. With the default rpi-eeprom-config (SDRAM_BANKLOW=1) and NUMA disabled, the performance plummets:

  • Frame rate (fps): 121
  • Real time: 2 minutes 37 seconds

That's a staggering drop of roughly 44% in frame rate compared to the baseline (121 vs. 217 fps)! The decoding time grows from 1 minute 29 seconds to 2 minutes 37 seconds, about 76% longer, making it clear that something is seriously wrong. This is the performance regression in action, and it's a significant problem for users running non-NUMA kernels.
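The percentages in this section can be rechecked from the raw numbers. A small sketch, with pct_change as a hypothetical helper name, using awk for the floating-point arithmetic and the real times converted to seconds (1:29 = 89 s, 2:37 = 157 s):

```shell
#!/bin/sh
# Hypothetical helper: percentage change from a baseline ($1) to a new value ($2).
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

pct_change 217 253   # fps, NUMA enabled vs. baseline            -> 16.6
pct_change 217 121   # fps, BANKLOW=1 without NUMA vs. baseline  -> -44.2
pct_change 89 157    # real time in seconds, same comparison     -> 76.4
```

So the frame rate falls by a little under half, while wall-clock decoding time grows by about three quarters.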

The Takeaway

These numbers don't lie. They demonstrate that the default SDRAM_BANKLOW=1 setting, when combined with NUMA disabled, causes a massive performance hit. It's not a subtle difference; it's a dramatic regression that cuts software decoding throughput almost in half. This evidence is crucial for understanding the scope of the problem and for justifying the need for a fix.

The Solution: Setting SDRAM_BANKLOW=3

Alright, guys, enough about the problem – let's talk solutions! The good news is that there's a relatively simple workaround for this performance regression. By manually setting SDRAM_BANKLOW=3 in the EEPROM configuration, you can mitigate the issue and restore performance to the levels you'd expect. It's like finding the right key to unlock the full potential of your Raspberry Pi 5.

How to Implement the Fix

To implement this fix, you'll need to edit the bootloader configuration stored in the EEPROM. This isn't a regular file on the boot partition; on Raspberry Pi OS it is read and written with the rpi-eeprom-config tool. The exact steps might vary slightly depending on your operating system and setup, but the general process is as follows:

  1. Check the Current Configuration: Run rpi-eeprom-config without arguments to print the active bootloader configuration.
  2. Edit the Configuration: Run sudo -E rpi-eeprom-config --edit to open the configuration in a text editor.
  3. Set SDRAM_BANKLOW=3: Add the line SDRAM_BANKLOW=3 to the configuration. If the setting already exists, make sure to change its value to 3.
  4. Save the Changes: Save and exit the editor. The tool then schedules a bootloader update with the new configuration.
  5. Reboot Your Raspberry Pi: Reboot your Raspberry Pi so the update is applied and the changes take effect.

After rebooting, the SDRAM_BANKLOW setting will be applied, and you should see a significant improvement in video decoding performance, especially if you're running a non-NUMA kernel.
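For anyone who prefers a non-interactive route, the edit can be sketched as a filter over the configuration text. This is an illustration under assumptions rather than an official procedure: set_banklow is a hypothetical helper, it assumes the configuration has a single [all] section so that appending a line at the end is safe, and the rpi-eeprom-config invocations in the usage note should be checked against the tool's help output on your system.

```shell
#!/bin/sh
# Hypothetical helper: read a bootloader config on stdin, drop any existing
# SDRAM_BANKLOW line, and append the requested value ($1) at the end.
# Assumes a single [all] section; with several sections, the appended line
# would land in the last one, which may not be what you want.
set_banklow() {
    awk -v value="$1" '
        !/^SDRAM_BANKLOW=/ { print }
        END { print "SDRAM_BANKLOW=" value }
    '
}
```

A possible workflow is rpi-eeprom-config | set_banklow 3 > boot.conf, then reviewing boot.conf by hand, then sudo rpi-eeprom-config --apply boot.conf and a reboot.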

A Temporary Fix, Not a Permanent Solution

It's important to note that this workaround is a temporary fix. It addresses the immediate performance regression, but it doesn't solve the underlying issue. Ideally, a future bootloader (rpi-eeprom) release will address this problem more comprehensively. In the meantime, setting SDRAM_BANKLOW=3 is a viable solution for those affected by the issue.

Why This Works

The exact technical reasons why SDRAM_BANKLOW=3 restores performance are a bit complex and beyond the scope of this article. However, the key takeaway is that this bootloader setting affects how the memory is accessed and managed, and the measurements above show that a value of 3 suits non-NUMA kernels far better than a value of 1. It's a bit like adjusting the gears on a bicycle: you're finding the right setting for the terrain you're on.

Affected Devices and Systems

This performance regression primarily affects the Raspberry Pi 5. It's a relatively new device, and issues like this are not uncommon in the early stages of a product's lifecycle. The good news is that the Raspberry Pi community is active and responsive, and issues like this are typically addressed in a timely manner.

Operating Systems and Kernels

The issue has been observed on RPiOS Bookworm lite, but it's likely to affect other distributions as well. The key factor is the kernel configuration. Non-NUMA kernels are more susceptible to this problem, as they don't benefit from the memory access optimizations that NUMA provides. This means that if you're running a distribution that doesn't enable NUMA by default, you're more likely to encounter the performance hit.

LibreELEC and Other Distributions

The user who reported this issue initially noticed it on LibreELEC, a popular media center distribution. LibreELEC currently uses 512MB CMA (Contiguous Memory Allocator), which disables NUMA on 4GB and smaller models to avoid display corruption issues. This makes LibreELEC users particularly vulnerable to the performance regression. However, the issue is not limited to LibreELEC. Other distributions or workloads that don't utilize NUMA may also be affected.

A Note on CMA

The use of CMA is worth mentioning. CMA is a memory allocation technique that reserves a contiguous block of memory for specific purposes. While it can be beneficial in some cases, it can also have unintended side effects, such as disabling NUMA. This highlights the complex interplay between different system configurations and the importance of understanding the potential consequences of each setting.
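To see how much memory CMA reserves on a given system, the kernel reports it in /proc/meminfo. A minimal sketch, assuming the kernel was built with CMA support so that a CmaTotal line is present; cma_total_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/meminfo-style text on stdin and print the
# CMA pool size in MiB. /proc/meminfo reports CmaTotal in kB.
cma_total_mb() {
    awk '/^CmaTotal:/ { printf "%d\n", $2 / 1024 }'
}
```

Running cma_total_mb < /proc/meminfo on a system whose CmaTotal is 524288 kB prints 512.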

Additional Information: System Details

To provide a complete picture of the issue, let's look at some additional system details. This information can be helpful for developers and other users who are trying to diagnose or fix the problem.

Bootloader Configuration

The bootloader configuration is as follows:

[all]
BOOT_UART=1
POWER_OFF_ON_HALT=1
BOOT_ORDER=0xf421
NET_INSTALL_ENABLED=0

These settings control various aspects of the boot process, such as the UART (Universal Asynchronous Receiver/Transmitter), power-off behavior, boot order, and network installation. They're not directly related to the performance regression, but they're included here for completeness. Note that SDRAM_BANKLOW is not set here at all, so the bootloader's default value applies.
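Since the workaround hinges on one key, it can be handy to check whether a configuration sets it explicitly. A small sketch, with banklow_setting as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read a bootloader config on stdin and print the
# SDRAM_BANKLOW value, or "unset" when the key is absent and the
# bootloader's built-in default applies.
banklow_setting() {
    value=$(sed -n 's/^SDRAM_BANKLOW=//p' | tail -n 1)
    printf '%s\n' "${value:-unset}"
}
```

Fed the configuration above, this prints unset. After applying the workaround, rpi-eeprom-config | banklow_setting should print 3.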

System Information

The system information provides details about the operating system, kernel, and bootloader versions. This can be crucial for identifying the specific versions that are affected by the issue.

  • RPiOS Bookworm lite, fully updated
  • /etc/rpi-issue:
    Raspberry Pi reference 2023-10-10
    Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 962bf483c8f326405794827cce8c0313fd5880a8, stage4
    
  • bootloader_version:
    2025/07/17 17:25:12
    version a668b6e6edce3274de221324b93cb8741e4a7f7c (release)
    timestamp 1752769512
    update-time 1754066025
    capabilities 0x0000007f
    
  • Kernel:
    Linux raspberrypi 6.12.34+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.34-1+rpt1~bookworm (2025-06-26) aarch64 GNU/Linux
    

This information shows that the system is running a recent version of RPiOS Bookworm lite with a specific kernel and bootloader version. This helps narrow down the potential causes of the issue and identify any relevant patches or updates.
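If you want to report your own results, the same details can be gathered with a few commands. A minimal sketch, assuming Raspberry Pi OS, where /etc/rpi-issue exists and vcgencmd is installed; collect_sysinfo is a hypothetical helper name, and each step is skipped when its source is missing.

```shell
#!/bin/sh
# Hypothetical helper: print OS image, bootloader, and kernel details.
collect_sysinfo() {
    # Image build information written by pi-gen, if present.
    if [ -r /etc/rpi-issue ]; then
        echo "/etc/rpi-issue:"
        cat /etc/rpi-issue
    fi
    # Bootloader build date, version hash, and timestamps.
    if command -v vcgencmd >/dev/null 2>&1; then
        echo "bootloader_version:"
        vcgencmd bootloader_version
    fi
    # Kernel release and build string.
    echo "Kernel:"
    uname -a
}
```

Pasting the output of collect_sysinfo next to your ffmpeg numbers makes it easier for others to tell which bootloader and kernel versions were tested.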

Conclusion: Addressing the Performance Regression

In conclusion, the performance regression caused by banklow 1 on non-NUMA kernels is a significant issue for Raspberry Pi 5 users. It cuts software video decoding throughput almost in half, which hurts most with demanding content such as 4K video. However, by manually setting SDRAM_BANKLOW=3 in the EEPROM configuration, users can mitigate the problem and restore performance to the expected levels. This workaround is a temporary fix, and we hope that a future bootloader update will address the underlying issue.

The Importance of Community Feedback

This issue highlights the importance of community feedback and bug reporting. By sharing their experiences and providing detailed information, users like the one who reported this issue can help developers identify and fix problems more quickly. The Raspberry Pi community is known for its active and collaborative nature, and this is a prime example of how that collaboration can lead to improvements in the platform.

Staying Informed

If you're affected by this issue, it's a good idea to stay informed about any updates or fixes. Keep an eye on the Raspberry Pi forums, GitHub repositories, and other community channels for news and information. By staying engaged, you can ensure that you're among the first to know when a permanent solution is available.

A Call to Action

If you've experienced this performance regression, we encourage you to try the workaround and share your results. Your feedback can help others who are facing the same issue and can also provide valuable information for developers who are working on a fix. Together, we can make the Raspberry Pi 5 an even better platform for everyone.