Saturday, June 20, 2009

Incoming, Part 2-- creating speedy receivers

Sorry for the delay in a post; there's been lots going on here including house guests and professional obligations.

So in the last post I talked about the various ways that I wanted to allow users to create receivers that would get messages delivered to them, and how I didn't want them to be constrained in using almost any kind of Python callable to perform the role of message handler. I showed how I created a factory that supported all of these models of receiver creation, and showed the different ways that the resultant classes could be used to manufacture receivers of different kinds. I now want to dive down further into the topic of receiver implementation and examine some of the issues involving how to make them perform well.

Now this section of the extension has undergone the greatest amount of change as I kept coming up with new ways to make the extension faster. The early going was dreadful; while the pure C example programs from 29West could handle over 800K msgs/sec on my testbed, my first attempts with Python analogs using the extension managed just over 40K msgs/sec. After a lot of tinkering I managed to get Python to deal with 400K msgs/sec, and I believe that if 29West were to implement some of the thread interfaces I mentioned in a previous post, it could go even higher. While I won't go into the entire development arc, I will share what I think are the important points I learned along the way.

In the end, there were three main areas whose optimization provided the biggest performance boosts: mapping from C data to Python objects, invoking the user's Python callback, and optimizing the performance of the callback itself.

The first issue to be tackled was how to tie a C callback efficiently back to relevant Python objects from within Pyrex. That last bit was the crucial part since Pyrex's type safety can stand in the way of a lot of approaches. Pyrex can only do direct translation between simple Python and C types; it isn't happy to let you cast a Python object reference to a C pointer, even if that's just void *.

To start examining this issue, let's have a look at a fragment of the code from the receiver factory I discussed in the last post:

with nogil:
    result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                                 _rcvEventCallback,
                                 pyReceiver, eq)
The fourth and fifth parameters here are the keys for tying the LBM message delivery callback to Python. The fourth, here _rcvEventCallback, is the callback function that LBM invokes when a message has arrived. The fifth is the so-called “client data” pointer, a void * to some data opaque to LBM, which is associated with the newly created receiver. This pointer is passed as an argument to the callback function along with the receiver that just received the message.

Now an obvious approach would have been to pass the receiving Python object as the client data pointer, but since Pyrex wouldn't allow me to cast it to a void *, I thought that avenue was closed off to me. At that time I was content to have that restriction, as I admit that I was a little uncomfortable about handing a reference-counted object over to C code that couldn't care less about references.

So at first I tried using the address of the C receiver object (after it was allocated with lbm_rcv_create), recast to an int, as a key into a dict that mapped the address to the extension type receiver object. This worked, although not as fast as I'd hoped, and it was also vulnerable to certain race conditions-- the C receiver object had to be allocated before I could map it, which means that a message could theoretically arrive before I had the structures set up that told me what to do with it.

In the next iteration, I started generating serial integer keys to use for the mapping, and would pass the key as the client data pointer into the lbm_rcv_create call. This allowed me to establish the mapping before the C object was created, thus eliminating the race condition. However, it wasn't any faster, partly due to having to create a Python integer object from a C int for every message that arrived, and partly due to the lookup time in the dictionary. I did get some improvement by changing how the dictionary was initialized in Pyrex from this:

_mappingDict = {}
to this:

cdef dict _mappingDict
_mappingDict = {}
The former approach requires dynamic method lookups when you want to insert a key or look up the value associated with a key. With the latter approach, Pyrex knows that _mappingDict is going to be a Python dictionary and will then use the dictionary's C API, bypassing the costly method lookups.
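To make the serial-key idea concrete, here is a minimal pure-Python sketch of that mapping scheme. It is not the extension's code: the names registerReceiver, onMessage, and unregisterReceiver are made up for illustration, and a plain list stands in for the receiver object. The point it shows is that the key is registered before anything that could trigger a callback exists, which is what removes the race.

```python
import itertools

# Hypothetical stand-ins: none of these names come from the extension.
_nextKey = itertools.count(1)   # source of serial integer keys
_mappingDict = {}               # key -> receiver object


def registerReceiver(pyReceiver):
    """Map the receiver under a fresh key BEFORE the C object would be created."""
    key = next(_nextKey)
    _mappingDict[key] = pyReceiver
    return key                  # this is what would travel as the client data


def onMessage(clientKey, msg):
    """What the C-level callback would do: look up the receiver, then deliver."""
    receiver = _mappingDict.get(clientKey)
    if receiver is not None:    # unknown keys are ignored
        receiver.append(msg)    # a list stands in for a real receiver here


def unregisterReceiver(key):
    """Drop the mapping once the receiver is no longer wanted."""
    _mappingDict.pop(key, None)
```

Every delivery in this scheme pays for one integer object and one dictionary lookup, which is the per-message cost described above.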

But the creation of the Python integer objects and their use in dictionary lookups was still too costly, so I decided to stop being a wimp and see if I could figure out a way to get Pyrex to accept me providing a Python object as the value of the void * client data pointer. Besides satisfying Pyrex's idea of the type of things, I had to have control over the lifecycle of both the C receiver object and the Python receiver object so that LBM wouldn't call out with a reference to a Python object that no longer existed.

Since I couldn't find a way to tell Pyrex that my Python object could be used as the void *, I decided to lie to Pyrex about what sort of parameters lbm_rcv_create could take. This function's signature in the lbm.h file is:

int lbm_rcv_create(lbm_rcv_t **rcvp, lbm_context_t *ctx,
                   lbm_topic_t *topic, lbm_rcv_cb_proc proc,
                   void *clientd, lbm_event_queue_t *evq)
But I told Pyrex that the signature is:

int lbm_rcv_create(lbm_rcv_t **rcvp, lbm_context_t *ctx,
                   lbm_topic_t *topic, lbm_rcv_cb_proc proc,
                   object clientd, lbm_event_queue_t *evq)
So I told Pyrex that the client data argument was simply a Python object. This only matters to Pyrex; you direct it to add the actual lbm.h to the generated C file so that it compiles with the proper signature. Of course, this results in a warning that the pointer types are incompatible, but since the pointer is opaque to LBM it doesn't matter.

Of course, we then have to change the signature of the callback function as well from:

int (*lbm_rcv_cb_proc)(lbm_rcv_t *rcv, lbm_msg_t *msg, void *clientd)
...to:

int (*lbm_rcv_cb_proc)(lbm_rcv_t *rcv, lbm_msg_t *msg, object clientd)
Again, telling Pyrex that the expected parameter was a Python object. Another warning is the result, but again it's harmless.

That shuts up Pyrex, but it hardly makes passing a Python object around safe yet. As long as LBM holds on to this pointer, we have to make sure that the object's reference count accurately reflects this. Fortunately, we can get access to the Python C API from within Pyrex, and so can utilize the functions that manage the reference counts on objects. We still have to cheat a bit and tell Pyrex that Py_INCREF and Py_DECREF take objects rather than the Python C type, but that's fine because Pyrex's “object” actually resolves down to the Python C type. So we need a little bit of special include code at the top of the module to tell Pyrex to bring in Python.h and treat these two functions accordingly:

cdef extern from "Python.h":
    void Py_INCREF(object)
    void Py_DECREF(object)
And after that, we're free to manage reference counts. Here are the relevant lines from the receiver factory code I last posted:

Py_INCREF(pyReceiver)
with nogil:
    result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                                 _rcvEventCallback,
                                 pyReceiver, eq)
So our Python receiver extension type becomes the client data for the callback, and it is this object that understands how to actually call back to user code.

With that in hand, the callback function itself is now easy and runs very fast:

cdef int _rcvEventCallback(lbmh.lbm_rcv_t *rcv, lbmh.lbm_msg_t *msg,
                           object clientd) with gil:
    cdef AbstractReceiver receiver
    cdef message.Message pyMessage
    receiver = clientd
    if receiver is not None: #still a little paranoid
        pyMessage = Message()._setMsg(msg)
        receiver._routeMsgCallback(pyMessage)
    return lbmh.LBM_OK
No more mapping shenanigans; we can directly call this object safely as we've taken care of accounting for the reference.

What isn't shown here is where the Py_DECREF occurs; that's part of the new functionality in the extension that provides better lifecycle management of extension types fronting the LBM C objects, but that's a topic for another post.
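For readers who want to see the whole reference lifecycle in one place, here is a rough CPython-only sketch using ctypes rather than Pyrex. It is an illustration of the general technique, not the extension's implementation: FakeReceiver, handToC, callback, and release are hypothetical names, and the integer address stands in for the void * that LBM would hold. It leans on the CPython detail that id() returns the object's address.

```python
import ctypes

# Declare the C API signatures so ctypes passes a real PyObject*.
ctypes.pythonapi.Py_IncRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_IncRef.restype = None
ctypes.pythonapi.Py_DecRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_DecRef.restype = None


class FakeReceiver:
    """Hypothetical stand-in for the extension's receiver type."""
    def __init__(self):
        self.received = []

    def route(self, msg):
        self.received.append(msg)


def handToC(obj):
    """Take an extra reference, then return the address as opaque client data."""
    ctypes.pythonapi.Py_IncRef(ctypes.py_object(obj))
    return id(obj)              # CPython-specific: id() is the address


def callback(clientd, msg):
    """Recover the object from the raw address and deliver the message."""
    receiver = ctypes.cast(clientd, ctypes.py_object).value
    receiver.route(msg)


def release(clientd):
    """Give back the reference taken in handToC once the C side lets go."""
    obj = ctypes.cast(clientd, ctypes.py_object).value
    ctypes.pythonapi.Py_DecRef(ctypes.py_object(obj))
```

The extra reference is what makes it safe to dereference the raw address later; calling callback after release, with no other references alive, would touch freed memory.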

So in the callback we now can quickly use our receiver, create a message object, and ask the receiver to send the message to the user's callback through the use of _routeMsgCallback. This concludes addressing the issue of mapping from C to Python objects in an efficient manner. That takes us to the next issue, invoking the user's callback itself. That'll be the topic of the next post.

Thursday, June 4, 2009

Incoming-- wrapping up message receivers

Given the last post on upcall performance, I thought that instead of talking about the extension beginning with the most fundamental classes (that is, wrapping the context object), it might make more sense to follow up directly with some discussion on how I'm looking at the problem of making the receipt of messages fast and efficient while still flexible. This post is going to assume a bit of knowledge about 29West; specifically, it will be referring to the concepts of topics, event queues, and contexts, although some high-level descriptions will be provided.

I wanted to provide for a number of different ways to receive messages, in essence allowing any Python callable to be used as the target of message upcalls, while at the same time providing some basic classes that could be subclassed to make the process simpler if desired. Across all of these approaches I needed to make the handling of messages as fast as possible.

In LBM, you need to provide a few raw ingredients to create a message receiver:

  • A context object, which establishes the overall logical messaging environment for sending and receiving messages.

  • A topic object, which identifies a named channel to which messages are posted or from which they are received.

  • An optional event queue which decouples the arrival of messages from their processing.

  • Some internal bookkeeping items to tell LBM how to route messages back to your application.
I took the view that in a significant number of applications, the majority of the receivers created would probably be associated with a single context object, a single event queue, and further would be handled in a single fashion, say with a bound method on different instances of a class. With this motivation, I decided to create a receiver factory that would keep these items together for the user to make creation of the receiver simple. The factory's definition from the pxd file is pretty simple:

cdef class ReceiverFactory:
    cdef core.Context ctx
    cdef core.EventQueue defaultEventQueue
    cdef object defaultReceiverClass
The only required piece of data needed to create a factory instance is the context; the event queue and receiver class (more on this below) are optional.

And here are the key pieces of the implementation from the pyx file:

cdef class ReceiverFactory:
    """
    This class takes 1 required and 2 optional arguments. The first arg
    is the core.Context object that the factory will be creating
    receivers for. The second, defaultReceiverClass, is a callable that
    yields instances of subclasses of receiver.AbstractReceiver. It
    defaults to Receiver. The third is an optional event queue you'd
    like to be shared between all receivers created with this factory.

    If a defaultReceiverClass is specified, it must be a callable that
    yields a subclass of receiver.AbstractReceiver (thus it may be a
    class object or other callable). The callable must take 2 arguments:
    the first argument is the calling ReceiverFactory object (self),
    and the second is whatever the current value for eventCallback is.
    If the class you wish to use takes more arguments, consider passing
    a closure that provides the additional args.

    NOTICE: Since the expected class to be produced by
    createReceiverForTopic is one of the extension classes, duck typed
    classes won't work properly here; the class of the object yielded
    by the defaultReceiverClass must be a subclass of AbstractReceiver.
    """
    def __init__(self, core.Context ctx, defaultReceiverClass=Receiver,
                 defaultEventQueue=None):
        #you may override this __init__ as long as you call it from the
        #derived class's __init__
        self.ctx = ctx
        self.defaultEventQueue = defaultEventQueue
        self.defaultReceiverClass = defaultReceiverClass

    def createReceiverForTopic(self, Topic pyTopic, eventCallback=None,
                               core.EventQueue eventq=None,
                               receiverClass=None):
        """
        Create a receiver for the specified topic. If an eventCallback
        is provided, pass it into the receiver class's constructor
        (see below). If an eventq is provided, attach the queue to the
        receiver so it is used for event delivery.

        If a receiverClass is specified, it must be a callable that
        yields a subclass of receiver.AbstractReceiver (thus it may be
        a class object or other callable). The callable must take two
        arguments: the first argument is the calling factory object
        (self), and the second is whatever the current value for
        eventCallback is. If the class you wish to use takes more
        arguments, consider passing a closure that provides the
        additional args. If the receiverClass is specified here, it
        overrides any that was specified at the factory's creation
        with defaultReceiverClass. If none is specified here, then
        the one specified at the factory's creation is used.

        NOTICE: Since the expected class is one of the extension
        classes, duck typed classes won't work properly here; the
        class of the object yielded by the callable must be a
        subclass of AbstractReceiver.
        """
        cdef int result
        cdef object rcvrClass
        cdef lbmh.lbm_context_t *context
        cdef lbmh.lbm_rcv_t **rcv
        cdef lbmh.lbm_topic_t *cTopic
        cdef AbstractReceiver pyReceiver
        cdef lbmh.lbm_event_queue_t *eq
        if receiverClass is None:
            rcvrClass = self.defaultReceiverClass
        else:
            rcvrClass = receiverClass
        pyReceiver = rcvrClass(self, eventCallback)
        if not typecheck(pyReceiver, AbstractReceiver):
            raise LBMWException("The provided callable must yield "
                                "a subclass of AbstractReceiver")
        context = self.ctx._getContext()
        if eventq is None:
            eventq = self.defaultEventQueue
        if eventq is None:
            eq = NULL
        else:
            eq = eventq._getEQ()
        cTopic = pyTopic._getTopic()
        rcv = pyReceiver._getReceiverPtrPtr()
        Py_INCREF(pyReceiver) #key to preventing dangling pointers!
        with nogil:
            result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                            <lbmh.lbm_rcv_cb_proc> _rcvEventCallback,
                            pyReceiver, eq)
        if result == lbmh.LBM_FAILURE:
            raise LBMFailure(fn="lbm_rcv_create")
        return pyReceiver
So a bit of discussion is in order here.

The general approach is to create a factory that works cooperatively with a user-supplied factory to produce the appropriate Receiver instances. By default, the “user-supplied” factory is a class object, and it yields an instance of itself anytime a new receiver is created. However, the user factory can be any callable, as long as it produces an instance that is a derived class of the extension's AbstractReceiver class. This opens up a lot of possibilities as to what you might provide for the user factory.

When the user creates the ReceiverFactory instance, he provides the context that new receivers are to be associated with, and he has the option of supplying the default receiver factory object that will be used to create receiver objects, and an event queue for queuing arrived messages for processing.

When the user calls createReceiverForTopic, the only required argument is the topic that identifies the channel through which the receiver will acquire messages. The other arguments allow overriding the user's default receiver instance factory and event queue that were both provided at ReceiverFactory instantiation time, and to also specify an alternate event callback that will be the target of upcalls when messages arrive. This last bit is key; the default behavior of the ReceiverFactory is to generate a receiver object whose methods get invoked whenever a message for the receiver arrives. The eventCallback keyword argument allows the user to specify an alternative callback target; in this case, the receiver object simply acts as a control construct to manage the routing of messages to the desired upcall target.
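The routing choice described here can be pictured with a small pure-Python sketch. This is only one plausible shape for it, under the assumption that the receiver stores the callback it was constructed with; SketchReceiver and routeMsg are hypothetical names, and the real extension type may well do this differently. The callback signatures mirror the ones used in the examples in this post: a standalone callback gets the receiver and the message, while a method target gets just the message.

```python
class SketchReceiver:
    """Hypothetical pure-Python stand-in for a receiver; not the extension type."""

    def __init__(self, factory, eventCallback=None):
        self.factory = factory
        self.eventCallback = eventCallback

    def handleMsg(self, theMessage):
        # Default upcall target; subclasses override this.
        pass

    def routeMsg(self, theMessage):
        # If an alternate callback was supplied, the receiver only routes;
        # otherwise its own handleMsg method is the target.
        if self.eventCallback is not None:
            self.eventCallback(self, theMessage)
        else:
            self.handleMsg(theMessage)


class CollectingReceiver(SketchReceiver):
    """A subclass whose instances are themselves the callback target."""

    def __init__(self, factory, eventCallback=None):
        super().__init__(factory, eventCallback)
        self.seen = []

    def handleMsg(self, theMessage):
        self.seen.append(theMessage)
```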

A couple of examples of how to use this would be helpful. In each, assume that aContext is an LBM context object, aTopic is an LBM topic object, and anEventQueue is an LBM event queue object.

#Example 1: create a receiver that uses
#an independent function for callbacks
def simpleCallback(aReceiver, theMessage):
    #do stuff with theMessage
    return

receiverFac = ReceiverFactory(aContext)
newReceiver = receiverFac.createReceiverForTopic(aTopic,
                                eventCallback=simpleCallback)

#Example 2: create a receiver subclass whose instances
#are the target of callbacks
class MyReceiver(receiver.Receiver):
    def handleMsg(self, theMessage):
        #do stuff with theMessage
        return

receiverFac = ReceiverFactory(aContext, defaultReceiverClass=MyReceiver)
newReceiver = receiverFac.createReceiverForTopic(aTopic)

#Example 3: like Example 2, but specifying MyReceiver at creation time
receiverFac = ReceiverFactory(aContext)
newReceiver = receiverFac.createReceiverForTopic(aTopic,
                                receiverClass=MyReceiver)

#Example 4: Using a function for the receiverClass callable
class MyReceiver(receiver.Receiver):
    def handleMsg(self, theMessage):
        #do stuff with theMessage
        return

class MyOtherReceiver(receiver.Receiver):
    def handleMsg(self, theMessage):
        #do other stuff
        return

#the factory calls this with itself and the current eventCallback
def vendReceiver(rFac, eventCallback):
    if someTestOnRecvFac(rFac):
        theReceiver = MyReceiver(rFac, eventCallback)
    else:
        theReceiver = MyOtherReceiver(rFac, eventCallback)
    return theReceiver

receiverFac = ReceiverFactory(aContext,
                              defaultReceiverClass=vendReceiver)
newReceiver = receiverFac.createReceiverForTopic(aTopic)
So there are lots of ways to get the factory to generate different sorts of receivers.
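The docstrings suggest passing a closure when a receiver class needs more than the two standard arguments. Here is a self-contained sketch of that idea in plain Python, using functools.partial as the closure. Everything in it is a stand-in: AbstractReceiver, TaggedReceiver, and SketchFactory are hypothetical pure-Python classes that imitate only the class-selection and type-check steps of the factory, with isinstance playing the role of typecheck.

```python
import functools


class AbstractReceiver:
    """Stand-in for the extension's AbstractReceiver (hypothetical)."""
    def __init__(self, factory, eventCallback=None):
        self.factory = factory
        self.eventCallback = eventCallback


class TaggedReceiver(AbstractReceiver):
    """A receiver whose constructor needs a third argument."""
    def __init__(self, factory, eventCallback, tag):
        super().__init__(factory, eventCallback)
        self.tag = tag


class SketchFactory:
    """Imitates only how the factory picks and checks the receiver class."""
    def __init__(self, defaultReceiverClass=AbstractReceiver):
        self.defaultReceiverClass = defaultReceiverClass

    def createReceiver(self, eventCallback=None, receiverClass=None):
        # A per-call receiverClass overrides the factory-wide default.
        if receiverClass is None:
            rcvrClass = self.defaultReceiverClass
        else:
            rcvrClass = receiverClass
        rcvr = rcvrClass(self, eventCallback)
        if not isinstance(rcvr, AbstractReceiver):
            raise TypeError("The provided callable must yield "
                            "a subclass of AbstractReceiver")
        return rcvr


# The partial supplies the extra argument; the factory still passes two.
makeTagged = functools.partial(TaggedReceiver, tag="prices")
fac = SketchFactory(defaultReceiverClass=makeTagged)
tagged = fac.createReceiver()
```

A lambda or a nested function would work equally well; the only contract is that the callable accepts the factory and the event callback and returns an AbstractReceiver subclass instance.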

There are a number of calls of the form “_getTopic()”, “_getEQ()” that are used to acquire the underlying pointer to the managed C object that the Python extension type is wrapping. These methods are only available in the extension itself, and by convention I start them with an underscore to identify that.

The only other magic worth noting here is the use of Py_INCREF() on the newly created receiver instance. This is because we're about to give a reference to this instance to LBM and we want to make sure that this reference is counted properly by Python. This reference is dropped when the user indicates that they no longer want the receiver; I'll show how that works later.

But the above is only part of the story. This takes care of the creation aspect, but I still need to concern myself with the actual delivery of messages. That means we have to focus on the receivers themselves in the next post.