Detecting Android memory leaks in production

Monitoring mobile performance and resource consumption at Lyft

Jan 17, 2023 · 8 min read

Android developers have a number of tools in their arsenal for detecting memory leaks, such as the Android Studio Memory Profiler, LeakCanary, and Perfetto. These tools are useful when analyzing builds locally. In production, however, the app runs on a wide range of devices under varying conditions, and it is hard to predict all of the edge cases while profiling locally.

At Lyft, we were curious about how our apps behave in production on users’ devices, so we decided to bring observability to various runtime performance metrics and see how it could improve the user experience.

We’ve already published a blog post about CPU usage monitoring; in this story we will focus on the memory footprint of mobile apps. While the overall concept of monitoring the memory footprint applies to both Android and iOS, we will focus on the former for implementation details.

Lyft relies on A/B testing when rolling out new features. When a feature is ready for production, it’s covered by a feature flag and is launched as part of an experiment. This experiment is run for a certain group of users in order to compare metrics against the standard version of the app.

When a large and complex feature is released, it is important to make sure it does not bring any regressions in terms of memory usage. This is especially important if the feature includes native C/C++ code which has a higher chance of introducing memory leaks.

Therefore, we wanted to test the following hypothesis: for each feature experiment, we measure its memory footprint across all users who have access to it (by reporting metrics to analytics at runtime) and compare it to the standard version of the app. If the variant shows larger memory usage values, that is an indicator of a regression or a memory leak.
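As a rough illustration, such runtime reporting could look like the sketch below. The class name, the sampling interval, the metric name, and the report callback are all our own illustrative assumptions, not Lyft’s actual pipeline:

```kotlin
import android.os.Handler
import android.os.Looper

// Hypothetical sketch: periodically sample a memory metric and report it to
// analytics. The experiment variant is assumed to be attached by the
// analytics client, so control and treatment can be compared later.
class MemoryReporter(
    private val sampleKb: () -> Long,                      // e.g. a PSS or RSS reader
    private val report: (metric: String, valueKb: Long) -> Unit,
    private val intervalMs: Long = 60_000L                 // illustrative interval
) {
    private val handler = Handler(Looper.getMainLooper())
    private val tick = object : Runnable {
        override fun run() {
            report("memory_footprint_kb", sampleKb())
            handler.postDelayed(this, intervalMs)
        }
    }

    fun start() { handler.post(tick) }
    fun stop() { handler.removeCallbacks(tick) }
}
```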

Memory footprint metrics

First, we needed to identify which memory metrics are available on Android; they are not as trivial to collect as one might think. We are interested in the memory used by the application process, in other words the app’s memory footprint.

Android provides various APIs for retrieving memory usage metrics for apps. However, the hardest part is not retrieving the metrics but making sure they are suitable and provide meaningful data.

Since Android Studio has a built-in memory profiler, we decided to use it as a reference point: if the values we collect match those shown by the profiler, we can consider our data correct.

One of the primary metrics the Android Studio memory profiler shows is called PSS.

PSS

Proportional set size (PSS) is the amount of private and shared memory used by the app, where shared memory is counted proportionally: each shared region is divided evenly among the processes sharing it.

For example, if 3 processes are sharing 3MB, each process gets 1MB in PSS.

Android exposes a Debug API for this data.
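Concretely, Debug.getMemoryInfo() fills a Debug.MemoryInfo struct for the current process. A minimal reader might look like this (the function name is ours; note that this call can be relatively expensive, so it should not run on a hot path):

```kotlin
import android.os.Debug

// Read the current process's PSS via the Debug API.
// Debug.getMemoryInfo() can be relatively expensive, so avoid calling it
// frequently or on a hot path.
fun readTotalPssKb(): Int {
    val memoryInfo = Debug.MemoryInfo()
    Debug.getMemoryInfo(memoryInfo)  // fills stats for the calling process
    return memoryInfo.totalPss       // PSS in kilobytes
}
```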

Example 1 — no regression

In the graph for this experiment, both the control (green line) and treatment (orange line) variants show the same values. This means the feature has not introduced any regressions and is safe to roll out to production from a memory usage perspective.

Example 2 — regression

Now let’s take a look at a feature that introduced a regression.

This is an example of an experiment that adds a feature to the Lyft Android app for drivers.

The new feature has clearly increased the app’s memory footprint at every percentile. This regression is what allowed us to identify a memory leak.

This graph is based on the RSS (resident set size) metric: the total physical memory held by the process, with shared pages counted in full rather than proportionally. To narrow down the root cause of the issue, we used similar graphs for JVM and native heap allocations.
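For reference, all three of these metrics can be read cheaply at runtime. The sketch below shows one way to do it; the function names are ours, and reading RSS from /proc/self/statm is a common approach rather than a confirmed detail of Lyft’s implementation:

```kotlin
import android.os.Debug
import android.system.Os
import android.system.OsConstants
import java.io.File

// RSS: the second field of /proc/self/statm is the resident set size,
// measured in pages.
fun readRssKb(): Long {
    val residentPages = File("/proc/self/statm").readText().split(' ')[1].toLong()
    val pageSizeBytes = Os.sysconf(OsConstants._SC_PAGESIZE)
    return residentPages * pageSizeBytes / 1024
}

// JVM heap currently in use, in kilobytes.
fun readJvmHeapKb(): Long {
    val runtime = Runtime.getRuntime()
    return (runtime.totalMemory() - runtime.freeMemory()) / 1024
}

// Native (malloc) heap currently allocated, in kilobytes.
fun readNativeHeapKb(): Long = Debug.getNativeHeapAllocatedSize() / 1024
```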

Example 3 — memory leak at 99th percentile

The final example is the most important as it demonstrates the biggest advantage of this memory monitoring approach.

In the graph below, both variants show almost the same values except around the 99th percentile, where the treatment variant’s memory footprint is significantly higher.

This led us to a memory leak that occurred only under very specific circumstances for a small number of users, but significantly increased memory usage when it did.

A leak like the one in the second example, which affects every use of the app, is reasonably likely to be caught by local profiling. The third example is a much harder case: the leak was tied to an edge case that is easy to miss locally.

Learnings

One of the challenges of implementing such a memory monitoring tool is picking the right metrics, ones that produce valid data. At the same time, it is important that collecting those metrics is not time- or resource-consuming.

Another task is to visualize the data in an effective report. Comparing average values between the control and treatment variants on a daily basis does not produce meaningful data. A better approach is to use a percentile distribution covering the entire experiment from its launch, as shown in the examples above.
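To illustrate the reporting side, here is a small sketch of how a percentile distribution could be computed from the collected samples; the percentile levels and the nearest-rank method are our assumptions:

```kotlin
// Hypothetical sketch: compute a percentile distribution (nearest-rank style)
// over all memory samples collected for one experiment variant.
fun percentiles(
    samplesKb: List<Long>,
    levels: List<Int> = listOf(50, 75, 90, 95, 99)  // illustrative levels
): Map<Int, Long> {
    require(samplesKb.isNotEmpty()) { "need at least one sample" }
    val sorted = samplesKb.sorted()
    return levels.associateWith { p ->
        val index = (p / 100.0 * (sorted.size - 1)).toInt()
        sorted[index]
    }
}

// Comparing percentiles(controlSamples) against percentiles(treatmentSamples)
// surfaces tail regressions that daily averages would hide, as in example 3.
```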

Overall, the longer the experiment runs, the better the data. It takes a few days after an experiment launches to start getting meaningful data, and the quality of the signal also depends on how many users are exposed to the experiment.

This approach is especially useful when a memory leak appears only around the 99th percentile, meaning it is caused by a specific edge case that would be much harder to detect with local profiling.

Using tools for local memory leak detection remains important to prevent shipping regressions to users, but reporting performance metrics at runtime shines a light on issues that would be much harder to detect otherwise.

Resources

If you’re interested in working on tooling and performance at Lyft, take a look at our careers page.