nyob zoo tus hlub,
in my blog I am cheating a little for performance as I have heavy Cloudflare caching in front of it, but the default optimization for WordPress is using Memcached, a widely used in-memory cache. And this week’s paper discusses how to make Memcached itself even faster by bypassing the Kernel (kinda) to achieve further performance optimizations.
I finally understood what you need active waiting and CPU pinning for, and you also will get it throughout reading this paper. Also nice, this is another use case for eBPF which we discussed in another week
In-memory key-value stores are critical components that help scale large internet services by providing low-latency access to popular data. Memcached, one of the most popular key-value stores, suffers from performance limitations inherent to the Linux networking stack and fails to achieve high performance when using high-speed network interfaces.
While the Linux network stack can be bypassed using DPDK based solutions, such approaches require a complete redesign of the software stack and induce high CPU utilization even when client load is low. To overcome these limitations, we present BMC, an in-kernel cache for Memcached that serves requests before the execution of the standard network stack. Requests to the BMC cache are treated as part of the NIC interrupts, which allows performance to scale with the number of cores serving the NIC queues. To ensure safety, BMC is implemented using eBPF. Despite the safety constraints of eBPF, we show that it is possible to implement a complex cache service. Because BMC runs on commodity hardware and requires modification of neither the Linux kernel nor the Memcached application, it can be widely deployed on existing systems. BMC optimizes the processing time of Facebook-like small-size requests. On this target workload, our evaluations show that BMC improves throughput by up to 18x compared to the vanilla Memcached
application and up to 6x compared to an optimized version of Memcached that uses the SO_REUSEPORT socket flag. In addition, our results also show that BMC has negligible
overhead and does not deteriorate throughput when treating non-target workloads.