Avoid memory leaks when caching instance methods in Python classes
While working on a customer's project last week, I stumbled across the problem that caching instance-method results in Python classes can easily create memory leaks, especially in long-running processes.
Although Python comes with built-in caching functionality (see the functools library) and there are third-party caching libraries available like cachetools, I couldn't find a solution that satisfied my needs. So I wrote my own, using a very nice function from the cachetools library and building a proper decorator around it.
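As a side note: the standard library's functools.lru_cache shows the same behavior when applied to instance methods, because its cache lives on the function object and stores a strong reference to self in every key. A minimal sketch of this (the Worker class is a made-up example, not from any library):

```python
import functools
import gc
import weakref

class Worker:
    @functools.lru_cache(maxsize=None)  # the cache lives on the function object
    def double(self, a):
        return a * 2

w = Worker()
w.double(21)                 # fills the cache with the key (w, 21)
ref = weakref.ref(w)
del w
gc.collect()
print(ref() is None)         # False: the lru_cache still references the instance

Worker.double.cache_clear()  # dropping the cache finally releases the instance
gc.collect()
print(ref() is None)         # True
```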
Before using the code shown below, create a virtualenv and install the cachetools package upfront so that you can try things out yourself:
$ virtualenv pyenv
$ . pyenv/bin/activate
$ pip install cachetools
Now let's start implementing. A naive approach to apply caching functionality to a very simple class could be:
from cachetools import cached

mycache = {}

class MyClass:
    @cached(mycache)
    def myfunc(self, a):
        print('myfunc called with', a)
        return a * 2
It uses the cached() decorator from the cachetools library and initializes it with a dictionary instance that serves as the actual container for the caching.
Now let's create an instance of MyClass and call myfunc() twice to see that the cache works:
>>> my = MyClass()
>>> my.myfunc(2)
myfunc called with 2
4
>>> my.myfunc(2)
4
Here we can see that the second call doesn't actually call into the myfunc() function but receives the result of the previous call from the cache. So far so good.
But: What happens if the my instance gets deleted (and hence garbage collected)? Does the cache lose the results cached from those calls above?
Well, let's see. In order to prove that an object has really been deleted by the Python interpreter's garbage collection, I apply a weak reference to the object I want to delete. If the weak reference returns None, the previously referenced object has indeed been deleted.
Note: Garbage collection doesn't necessarily happen immediately after an object has been orphaned. Instead, the Python interpreter calls the garbage collector at certain trigger points, so it can take a while until objects are actually removed. However, the garbage collector can be called manually to enforce the cleanup process, which is what we do in the following example.
Let's demonstrate this:
>>> import weakref, gc
>>> myinst = MyClass()
>>> myinstref = weakref.ref(myinst)
>>> myinstref()
<__main__.MyClass object at 0x7f6565b11a30>
>>> del myinst
>>> gc.collect() # enforce garbage collection
>>> myinstref() is None
True
So, this worked. Now let's go back to the my instance created above and also try to remove it:
>>> myref = weakref.ref(my)
>>> del my
>>> gc.collect()
>>> myref()
<__main__.MyClass object at 0x7f6565a01c40>
Hmm, this didn't work quite as expected: the my instance is still alive. The reason behind this is actually the cache. When we called my.myfunc(2) before, not only the parameter 2 and the result 4 got stored in the cache, but also a reference to self, simply because self is part of the method signature. This can be made visible by inspecting the cache object, which is a plain dictionary:
>>> mycache
{(<__main__.MyClass object at 0x7f6565a01c40>, 2): 4}
As you can see, the cache itself keeps a reference to the my instance, which prevents it from being garbage collected. Now imagine you create thousands or millions of MyClass instances and only ever make the same call my.myfunc(2) on each of them. The result would be a million entries in the cache for parameter 2 and result 4, each with a reference to a different short-lived MyClass instance. Voilà, here is our memory leak.
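The leak is easy to see at scale. Here is a short sketch; the Leaky class and the leaky_cache name are mine, but the mechanics are exactly those of the MyClass example above:

```python
import gc
import weakref
from cachetools import cached

leaky_cache = {}

class Leaky:
    @cached(leaky_cache)
    def myfunc(self, a):
        return a * 2

refs = []
for _ in range(1000):
    inst = Leaky()
    inst.myfunc(2)                  # stores (inst, 2) -> 4 in the shared cache
    refs.append(weakref.ref(inst))  # keep only weak references ourselves

del inst
gc.collect()

alive = sum(1 for r in refs if r() is not None)
print(len(leaky_cache), alive)      # 1000 cache entries, all 1000 instances still alive
```

Every cache key holds a strong reference to a different instance, so none of them can be collected until the cache is cleared.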
The solution is to clear the cache; then our my instance will eventually be garbage collected as well:
>>> mycache.clear()
>>> gc.collect()
>>> myref() is None
True
Constructing a cache that frees instances when they are garbage collected
As we have seen before, the weakref module provides a nice way to keep references to objects without holding a firm grip on them. This approach can be used inside the cache as well. Here is an implementation that I came up with on this nice Sunday afternoon.
In order to understand this implementation, please make sure you are familiar with how decorators work in general. I will only explain details which are specific to this implementation, not the entire principle of decorators.
from cachetools import cachedmethod
from weakref import WeakKeyDictionary

def methodcache(cache_factory):
    def wrapped_methodcache(method):
        def get_instance_cache(self):
            try:
                instance_cache = weak_cache[self]
            except KeyError:
                instance_cache = weak_cache[self] = cache_factory()
            return instance_cache
        weak_cache = WeakKeyDictionary()
        cached_method = cachedmethod(get_instance_cache)(method)
        # Attach the weak dictionary to the decorator so we can access it from outside:
        cached_method.__cache__ = weak_cache
        return cached_method
    return wrapped_methodcache

# Application:
class MyNewClass:
    @methodcache(dict)
    def myfunc(self, a):
        return a * 2
The core of this decorator is the cachetools.cachedmethod() function (which is itself a decorator). It is initialized with a callback function (get_instance_cache() in this case) which is called each time the decorated method is called. It receives a reference to the class instance (a MyNewClass object in the demo case above) as the argument self.
The approach is that each decorated method gets a WeakKeyDictionary instance attached to it. Since get_instance_cache() is defined at the same level at which the weak_cache variable is assigned the WeakKeyDictionary, it has access to this weak dictionary instance via its closure. When it is called, it looks up whether the weak dictionary has already stored a cache for this self. If not, such a cache is created via the cache_factory() callable. The cache is then returned to the cachedmethod decorator function, which itself is responsible for storing and looking up cached values. Since the WeakKeyDictionary holds only weak references to its keys, it no longer keeps an instance alive once all other references to it are gone.
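To confirm that this actually fixes the leak, we can repeat the weak-reference experiment from above against MyNewClass. The sketch below repeats the decorator definition so it is self-contained; note that weak_cache must be an actual WeakKeyDictionary instance, i.e. created with parentheses:

```python
import gc
import weakref
from cachetools import cachedmethod
from weakref import WeakKeyDictionary

def methodcache(cache_factory):
    def wrapped_methodcache(method):
        weak_cache = WeakKeyDictionary()  # maps each instance to its own cache

        def get_instance_cache(self):
            try:
                return weak_cache[self]
            except KeyError:
                instance_cache = weak_cache[self] = cache_factory()
                return instance_cache

        cached_method = cachedmethod(get_instance_cache)(method)
        cached_method.__cache__ = weak_cache
        return cached_method
    return wrapped_methodcache

class MyNewClass:
    @methodcache(dict)
    def myfunc(self, a):
        return a * 2

inst = MyNewClass()
print(inst.myfunc(2))                    # 4, and the result is now cached
print(len(MyNewClass.myfunc.__cache__))  # 1: one per-instance cache exists

ref = weakref.ref(inst)
del inst
gc.collect()
print(ref() is None)                     # True: no strong reference survives
print(len(MyNewClass.myfunc.__cache__))  # 0: the per-instance cache is gone too
```

Because the per-instance caches are stored as values of a WeakKeyDictionary keyed by the instances themselves, an instance and its cache vanish together as soon as the last outside reference disappears.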
I have to admit that this solution is not super easy to understand, but with some experience with decorators in general and a little digging into the code, I hope everyone can profit from it.