Saturday, June 20, 2009

Incoming, Part 2-- creating speedy receivers

Sorry for the delay in a post; there's been lots going on here including house guests and professional obligations.

So in the last post I talked about the various ways that I wanted to allow users to create receivers that would get messages delivered to them, and how I didn't want them to be constrained in using almost any kind of Python callable to perform the role of message handler. I showed how I created a factory that supported all of these models of receiver creation, and showed the different ways that the resultant classes could be used to manufacture receivers of different kinds. I now want to dive down further into the topic of receiver implementation and examine some of the issues involving how to make them perform well.

Now this section of the extension has undergone the greatest amount of change as I kept coming up with new ways to make the extension faster. The early going was dreadful; while the pure C example programs from 29West could handle over 800K msgs/sec on my testbed, my first attempts with Python analogs using the extension managed just over 40K msgs/sec. After a lot of tinkering I managed to get Python to deal with 400K msgs/sec, and I believe that if 29West were to implement some of the thread interfaces I mentioned in a previous post, it could go even higher. While I won't go into the entire development arc, I will share what I think are the important points I learned along the way.

In the end, there were three main areas whose optimization provided the biggest performance boosts: mapping from C data to Python objects, invoking the user's Python callback, and optimizing the performance of the callback itself.

The first issue to be tackled was how to tie a C callback efficiently back to the relevant Python objects from within Pyrex. That last bit was the crucial part, since Pyrex's type safety can stand in the way of a lot of approaches. Pyrex can only do direct translation between simple Python and C types; it isn't happy to let you cast a Python object reference to a C pointer, even if that's just void *.

To start examining this issue, let's have a look at a fragment of the code from the receiver factory I discussed in the last post:

    with nogil:
        result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                                     _rcvEventCallback,
                                     pyReceiver, eq)

The fourth and fifth parameters here are the keys for tying the LBM message delivery callback to Python. The fourth, here _rcvEventCallback, is the callback function that LBM invokes when a message has arrived. The fifth is the so-called “client data” pointer, a void * to some data opaque to LBM, which is associated with the newly created receiver. This pointer is passed as an argument to the callback function along with the receiver that just received the message.

Now an obvious approach would have been to pass the receiving Python object as the client data pointer, but since Pyrex wouldn't allow me to cast it to a void *, I thought that avenue was closed off to me. At that time I was content to have that restriction, as I admit that I was a little uncomfortable about handing a reference-counted object over to C code that couldn't care less about references.

So at first I tried using the address of the C receiver object (after it was allocated with lbm_rcv_create), recast to an int, as a key into a dict that mapped the address to the extension type receiver object. This worked, although not as fast as I'd hoped, and it was also vulnerable to a race condition: the C receiver object had to be allocated before I could map it, which means that a message could theoretically arrive before I had set up the structures that told me what to do with it.

In the next iteration, I started generating serial integer keys to use for the mapping, and would pass the key as the client data pointer into the lbm_rcv_create call. This allowed me to establish the mapping before the C object was created, thus eliminating the race condition. However, it wasn't any faster, partly due to having to create a Python integer object from a C int for every message that arrived, and partly due to the lookup time in the dictionary. I did get some improvement by changing how the dictionary was initialized in Pyrex from this:

    _mappingDict = {}

to this:

    cdef dict _mappingDict
    _mappingDict = {}

The former approach requires dynamic method lookups when you want to insert a key or look up the value associated with a key. With the latter approach, Pyrex knows that _mappingDict is going to be a Python dictionary and will then use the dictionary's C API, bypassing the costly method lookups.
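As a rough illustration of the serial-key scheme described above, here is a sketch in plain Python rather than Pyrex. The ReceiverRegistry class and its method names are hypothetical and are not part of the extension; the point is only that the mapping is registered before the key is handed to the C library, and that every message costs one dictionary lookup.

```python
import itertools


class ReceiverRegistry:
    """Maps serial integer keys to receiver callables (hypothetical helper)."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._receivers = {}

    def register(self, receiver):
        # Store the mapping first and only then hand the key to the C
        # library, so no message can arrive for a key that is still unknown.
        key = next(self._next_key)
        self._receivers[key] = receiver
        return key

    def dispatch(self, key, payload):
        # Runs once per message: one dict lookup, then the handler call.
        receiver = self._receivers.get(key)
        if receiver is None:
            return False
        receiver(payload)
        return True

    def unregister(self, key):
        self._receivers.pop(key, None)


registry = ReceiverRegistry()
received = []
key = registry.register(received.append)
registry.dispatch(key, b"hello")
```

In the extension the key would travel through the void * client data argument, which is why an integer object has to be built from the C int on every message before the lookup can happen.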

But the creation of the Python integer objects and their use in dictionary lookups was still too costly, so I decided to stop being a wimp and see if I could figure out a way to get Pyrex to accept a Python object as the value of the void * client data pointer. Besides satisfying Pyrex's idea of the types involved, I had to have control over the lifecycle of both the C receiver object and the Python receiver object so that LBM wouldn't call out with a reference to a Python object that no longer existed.

Since I couldn't find a way to tell Pyrex that my Python object could be used as the void *, I decided to lie to Pyrex about what sort of parameters lbm_rcv_create could take. This function's signature in the lbm.h file is:

    int lbm_rcv_create(lbm_rcv_t **rcvp, lbm_context_t *ctx,
                       lbm_topic_t *topic, lbm_rcv_cb_proc proc,
                       void *clientd, lbm_event_queue_t *evq)

But I told Pyrex that the signature is:

    int lbm_rcv_create(lbm_rcv_t **rcvp, lbm_context_t *ctx,
                       lbm_topic_t *topic, lbm_rcv_cb_proc proc,
                       object clientd, lbm_event_queue_t *evq)

So I told Pyrex that the client data argument was simply a Python object. This only matters to Pyrex; you direct it to add the actual lbm.h to the generated C file so that it compiles with the proper signature. Of course, this results in a warning that the pointer types are incompatible, but since the pointer is opaque to LBM it doesn't matter.

Of course, we then have to change the signature of the callback function as well, from:

    int (*lbm_rcv_cb_proc)(lbm_rcv_t *rcv, lbm_msg_t *msg, void *clientd)

to:

    int (*lbm_rcv_cb_proc)(lbm_rcv_t *rcv, lbm_msg_t *msg, object clientd)

Again, this tells Pyrex that the expected parameter is a Python object. Another warning is the result, but again it's harmless.

That shuts up Pyrex, but it hardly makes passing a Python object around safe yet. As long as LBM holds on to this pointer, we have to make sure that the object's reference count accurately reflects this. Fortunately, we can get access to the Python C API from within Pyrex, and so can utilize the functions that manage the reference counts on objects. We still have to cheat a bit and tell Pyrex that Py_INCREF and Py_DECREF take objects rather than the Python C type, but that's fine because Pyrex's “object” actually resolves down to the Python C type. So we need a little bit of special include code at the top of the module to tell Pyrex to bring in Python.h and treat these two functions accordingly:

    cdef extern from "Python.h":
        void Py_INCREF(object)
        void Py_DECREF(object)

And after that, we're free to manage reference counts. Here are the relevant lines from the receiver factory code I last posted:

    Py_INCREF(pyReceiver)
    with nogil:
        result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                                     _rcvEventCallback,
                                     pyReceiver, eq)

So our Python receiver extension type becomes the client data for the callback, and it is this object that understands how to actually call back to user code.
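To see what that extra reference does, here is a small sketch in plain Python that uses ctypes to reach Py_IncRef and Py_DecRef, the exported function forms of the Py_INCREF and Py_DECREF macros. It is CPython-specific, and the Receiver class is only a hypothetical stand-in for the extension type.

```python
import ctypes
import sys

# Function forms of the Py_INCREF / Py_DECREF macros, exported by CPython.
Py_IncRef = ctypes.pythonapi.Py_IncRef
Py_IncRef.argtypes = [ctypes.py_object]
Py_IncRef.restype = None
Py_DecRef = ctypes.pythonapi.Py_DecRef
Py_DecRef.argtypes = [ctypes.py_object]
Py_DecRef.restype = None


class Receiver:
    """Hypothetical stand-in for the receiver extension type."""


receiver = Receiver()
before = sys.getrefcount(receiver)

# Account for the pointer that the C library is about to hold.
Py_IncRef(receiver)
held = sys.getrefcount(receiver)

# Give the reference back once the C side has let go of the pointer.
Py_DecRef(receiver)
after = sys.getrefcount(receiver)
```

While the extra reference is outstanding, the object cannot be collected even if every Python-level name for it goes away, which is exactly the guarantee the C callback needs.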

With that in hand, the callback function itself is now easy and runs very fast:

    cdef int _rcvEventCallback(lbmh.lbm_rcv_t *rcv, lbmh.lbm_msg_t *msg,
                               object clientd) with gil:
        cdef AbstractReceiver receiver
        cdef message.Message pyMessage
        receiver = clientd
        if receiver is not None: #still a little paranoid
            pyMessage = Message()._setMsg(msg)
            receiver._routeMsgCallback(pyMessage)
        return lbmh.LBM_OK

No more mapping shenanigans; we can directly call this object safely as we've taken care of accounting for the reference.
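The same round trip can be sketched in plain Python with ctypes: an object's address goes out as a void * and is turned back into the object inside a C-callable callback. This is only an illustration under stated assumptions, not the extension's code. The CALLBACK prototype, the Receiver class, and its route method are hypothetical, and using id() as the object's address is a CPython implementation detail.

```python
import ctypes

# Hypothetical C callback prototype: int cb(const char *data, void *clientd)
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p)


class Receiver:
    """Hypothetical stand-in for the receiver extension type."""

    def __init__(self):
        self.messages = []

    def route(self, data):
        self.messages.append(data)


@CALLBACK
def on_message(data, clientd):
    # Turn the opaque pointer straight back into the Python object:
    # no integer key and no dictionary lookup.
    receiver = ctypes.cast(clientd, ctypes.py_object).value
    receiver.route(data)
    return 0


# The name `receiver` keeps the object alive for the duration of the call;
# the extension takes an explicit extra reference instead.
receiver = Receiver()
clientd = ctypes.c_void_p(id(receiver))  # CPython: id() is the address

# Invoke the callback through its C function pointer, as a C library would.
status = on_message(b"tick", clientd)
```

The pointer is only valid while something holds a reference to the object, which is why the reference accounting has to be in place before the pointer is handed over.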

What isn't shown here is where the Py_DECREF occurs; that's part of the new functionality in the extension that provides better lifecycle management of extension types fronting the LBM C objects, but that's a topic for another post.

So in the callback we now can quickly use our receiver, create a message object, and ask the receiver to send the message to the user's callback through the use of _routeMsgCallback. This concludes addressing the issue of mapping from C to Python objects in an efficient manner. That takes us to the next issue, invoking the user's callback itself. That'll be the topic of the next post.


  1. Tom,
    Good series on Python and low latency messaging.

  2. Thanks Craig. Now if I can only find the time to keep writing...