16 February 2012 10:12 PM (java | jvm | boehm | concurrency | finalizers | gc | python | guile | threads)
OK kids, quiz time. Spot the bugs in this Python class:
import os class FD: _all_fds = set() def __init__(self, fd): self.fd = fd self._all_fds.add(fd) def close(self): if (self.fd): os.close(self.fd) self._all_fds.remove(self.fd) self.fd = None @classmethod def for_each_fd(self, proc): for fd in self._all_fds: proc(fd) def __del__(self): self.close()
The intention is pretty clear: you have a limited resource (file descriptors, in this case). You would like to make sure they get closed, no matter what happens in your program, so you wrap them in objects known to the garbage collector, and attach finalizers that close the descriptors. You have a for_each_fd procedure that should at least allow you to close all file descriptors, for example when your program is about to exec another program.
Let's start with one: FD._all_fds can't sensibly be accessed from multiple threads at the same time. The file descriptors in the set are logically owned by particular pieces of code, and those pieces of code could be closing them while you're trying to for_each_fd on them.
Well, OK. Let's restrict the problem, then. Let's say there's only one thread. Next bug?
Another bug is that signals cause arbitrary code to run, at arbitrary points in your program. For example, if in the close method, you get a SIGINT after the os.close but before removing the file descriptor from the set, causing an exception to be thrown, you will be left with a closed descriptor in the set. If you swap the operations, you leak an fd. Neither situation is good.
The root cause of the problem here is that asynchronous signals introduce concurrency. Signal handlers are run in another logical thread of execution in your program -- even if they happen to share the same stack (as they do in CPython).
OK, let's mask out signals then. (This is starting to get ugly). What next?
What happens if, during for_each_fd, one of the FD objects becomes unreachable?
The Python language does not guarantee anything about when finalizers (__del__ methods) get called. (Indeed, it doesn't guarantee that they get called at all.) The CPython implementation will immediately finalize objects whose refcount equals zero. Running a finalizer on the method will mutate FD._all_fds, while it is being traversed, in this case.
The implications of this particular bug are either that CPython will throw an exception when it sees that the set was modified while iterating over it, or that the finalizer happens to close the fd being processed. Neither one of these cases are very nice, either.
This is the bug I wanted to get to with this article. Like asynchronous signals, finalizers introduce concurrency: even in languages with primitive threading models like Python.
Incidentally, this behavior of running finalizers from the main thread was an early bug in common Java implementations, 15 years ago. All JVM implementors have since fixed this, in the same way: running finalizers within a dedicated thread. This avoids the opportunity for deadlock, or for seeing inconsistent state. Guile will probably do this in 2.2.
For a more thorough discussion of this problem, Hans Boehm has published a number of papers on this topic. The 2002 Destructors, Finalizers, and Synchronization paper is a good one.