diary of a wrap: 2009

Friday, August 7, 2009

Incoming Revisited-- You're Never Too Old To Learn

While writing so much about receiving messages and discussing performance, I came up with a few ideas about how the Python extension could become a bit faster. I've discussed most of the ideas already in past posts, so there shouldn't really be any surprises here if you've been reading along. This post is going to build on the ideas from those, so you might need to go back to a few of the older “Incoming” posts to fully understand what's being discussed here.

The potential speedups I had in mind were:

Get rid of the sequential search for the proper handling method in the “routing” receiver objects, and start using the message type as a direct index into the handler list.

Replace the routing receiver's dispatch table with a C array. Currently it's a Python list, but that was largely for convenience; the list is a lot slower to access than an array.

Loan a Python thread to LBM to use for message handling. Early on I showed how much faster Python handled upcalls from extension modules if the thread was known to it, and I wanted to see if I could accrue some benefit doing the same in the LBM extension.

I won't keep you in suspense; all changes provided some benefit, although some much more than others.

First off was eliminating the sequential search for the proper receiver method for a particular message type. I wasn't expecting very much improvement from this change since all that was being eliminated was a loop setup and a C comparison, and my expectations were largely met. You may recall from a previous post that the method dispatch table was ordered such that the message types with the highest probability of occurring were at the front of the dispatch table, and thus for the most part the first comparison in the sequential search was all that was needed to find the proper handler, since in the majority of cases the most frequently occurring message was a data message, and that handler lived at index 0 of the table.

It's hard to even say how much improvement was gained; I certainly observed a few thousand more messages a second a lot of the time, but due to measurement noise it wasn't a consistent improvement. However, it gave enough benefit that I decided to leave the change in.
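To make the difference concrete, here's a minimal pure-Python sketch of the two lookup strategies. The message type numbers and handler names are invented for illustration (they aren't LBM's real values), and the real code does this in Pyrex with C ints rather than Python objects:

```python
# Hypothetical message types: small consecutive integers starting at zero.
MSG_DATA, MSG_EOS, MSG_REQUEST = 0, 1, 2


def on_data(msg):
    return "data"


def on_eos(msg):
    return "eos"


def on_request(msg):
    return "request"


# Sequential search: parallel lists, most frequently seen type first.
search_types = [MSG_DATA, MSG_REQUEST, MSG_EOS]
search_handlers = [on_data, on_request, on_eos]


def route_by_search(mtype, msg):
    for i in range(len(search_types)):
        if mtype == search_types[i]:
            return search_handlers[i](msg)
    return None


# Direct indexing: slot N holds the handler for message type N.
direct_handlers = [None] * len(search_types)
for mtype, handler in zip(search_types, search_handlers):
    direct_handlers[mtype] = handler


def route_by_index(mtype, msg):
    return direct_handlers[mtype](msg)
```

When the data handler already sits at index 0 of the search list, the search version costs only one extra comparison, which fits with the small gain I measured.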

A much bigger improvement came from changing the routing receiver's dispatch table from a Python list to a C array. Frankly, I don't know why this didn't occur to me before-- both searching and direct indexing into a Python list were certainly going to entail a lot more processing than simply doing the same with a C array. The StaticRoutingReceiver extension type changed to support this like so:

cdef class StaticRoutingReceiver(AbstractRoutingReceiver):
    cdef void * dispatchList[lbmh.LBM_MSG_UME_REGISTRATION_COMPLETE_EX]

I used the highest-valued message type as the size of the array. I still don't like this to some degree; I wish 29West would add a symbolic constant LBM_MAX_MSG_TYPE that they update from release to release of LBM so that I wouldn't have to make sure that each version of the extension has the proper symbolic name in there.

Setting this thing up involved a little more manual management of Python object reference counts. This is because I was going to put references to Python bound methods into C pointers, and I needed to make sure that the method objects didn't disappear. So __init__() changed to look like the following:

def __init__(self, myFactory, msgCallback=None):
    cdef object meth
    super(StaticRoutingReceiver, self).__init__(myFactory, msgCallback)
    for 0 <= i < lbmh.LBM_MSG_UME_REGISTRATION_COMPLETE_EX:
        meth = getattr(self.msgCallback, _msgTypeToMethodMap[i])
        Py_INCREF(meth)
        self.dispatchList[i] = <void *>meth

It's all pretty straightforward; find the desired method, increment the reference count, and then store it into the dispatchList. There's a similar piece of work to manage decrementing reference counts when the receiver is “shut down”, but covering that is a topic for another post. It's interesting that Pyrex won't actually let you create a C array of “object”; you have to make it an array of void * and do a cast.
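The same pin-then-store-as-a-raw-pointer idea can be tried out from plain Python. This is only a CPython-specific sketch: ctypes stands in for the Pyrex casts, the Handler class and the store()/call()/release() helpers are hypothetical names, and none of it is the extension's actual code:

```python
import ctypes

# CPython-only: borrow the C API's reference count functions through ctypes.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


class Handler(object):
    def dataMsg(self, msg):
        return "data:%s" % msg


def store(slots, index, meth):
    """Pin meth with an extra reference, then keep only its raw address."""
    _incref(meth)
    slots[index] = id(meth)     # in CPython, id() is the object's address


def call(slots, index, msg):
    """Cast the raw address back to an object and call it."""
    meth = ctypes.cast(slots[index], ctypes.py_object).value
    return meth(msg)


def release(slots, index):
    """Drop the extra reference taken by store() and clear the slot."""
    meth = ctypes.cast(slots[index], ctypes.py_object).value
    slots[index] = None
    _decref(meth)


slots = (ctypes.c_void_p * 4)()         # a small array of void *
store(slots, 0, Handler().dataMsg)      # the bound method survives via the pin
result = call(slots, 0, "hello")
release(slots, 0)
```

Without the extra reference taken in store(), the temporary bound method would be freed as soon as the call returned, and the slot would hold a dangling pointer.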

But now, routing an upcall is nothing; here's what _routeMsgCallback() evolved into:

cdef _routeMsgCallback(self, message.Message msg):
    #use the message type as an index into the dispatchList
    cdef object meth
    meth = <object> self.dispatchList[msg.type]
    meth(msg)

Unlike going from a sequential search to direct indexing into a Python list, moving from a list to an array provided significant improvements in the rate at which messages could be received. Roughly 60K more messages/sec could be handled in my test program, and that was a steady improvement.

That takes us to the third speedup, loaning a Python thread to LBM. Back in an earlier post I showed how Python performed much better when it was familiar with the thread that was acquiring the GIL when calling up into Python from a C extension. In that post I wished for an interface in LBM that would allow an app to provide a “thread factory”, a function that LBM could call that would provide it any threads it needed.

In reality, LBM already partially meets this need with what's referred to as the “sequential” mode of operation of an LBM context. By default, an LBM context object operates in what's known as “embedded” mode; it internally creates the thread that's used for calling back into applications. In the case of Python, it's this foreign thread that creates a slowdown. However, contexts can also be created in what's known as “sequential” mode. In this mode, no internal thread is created; instead, an application invokes a function with the context that activates message processing, and the thread that invokes this function is also used for performing callbacks.

The LBM extension supports setting sequential mode and also invoking the processing function via a method on Context objects, processEvents(). This method blocks the calling thread while messages are processed, and returns periodically to allow the thread to do other tasks (like figuring out if it's time to exit). Enabling sequential mode is handled by setting an option on the context attribute object like so:

self.contextAttrs.setOption("operational_mode", "sequential")

This has to be done on the context attribute object because once the context has been created the operational mode can't be changed, so the context has to be created in the proper mode.

Once you have a sequential mode context, you must invoke processEvents() on it in order for receivers to have any messages delivered to them. I wanted my test program to support both embedded and sequential operation, so I not only made setting sequential mode dependent on command line parameters, I changed the test run method to optionally spawn threads for calling processEvents():

    def doProcessEvents(app, ctx):
        #ctx is a Context object
        while not app.quit:
            #process for 1000 millis, then loop again
            ctx.processEvents(1000)
        
    if self.options.sequential:  #run sequentially
        for ctx in self.contexts:
            cthread = threading.Thread(target=doProcessEvents,
                                       args=(self, ctx))
            cthread.start()

    while not self.quit:
        #pretty much just print stats until we're done
        time.sleep(1)
        self.printStats()
        if (self.options.exitFlagFile and 
                os.path.exists(self.options.exitFlagFile)):
            self.quit = True

So doProcessEvents() provides a starting point where new threads can begin execution; it's within this function that the processEvents() method is invoked on the context. The explanation of the 'for' looping over self.contexts is that this program supports using multiple contexts, and each context must have processEvents() invoked on it in a separate thread. So if the program is to run with sequential contexts, a thread is created for each, and they call processEvents() on their respective contexts until the application is told to exit. However, if the program is to run in embedded mode, then nothing extra needs to be done; the processing threads have already been internally created by the context objects, and all the program has to do is wait for updates to flow in.
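The thread-per-context pattern can be exercised without LBM at all. Below is a self-contained sketch in which FakeContext is a made-up stand-in for a sequential-mode Context (its processEvents() just sleeps), and a threading.Event plays the role of the app's quit flag:

```python
import threading
import time


class FakeContext(object):
    """Stand-in for a sequential-mode context; not the real LBM Context."""

    def __init__(self):
        self.calls = 0

    def processEvents(self, millis):
        # Pretend to deliver messages for up to `millis` milliseconds.
        time.sleep(millis / 1000.0)
        self.calls += 1


def run_sequential(contexts, quit_event, slice_millis=10):
    """Start one event-processing thread per context; return the threads."""

    def pump(ctx):
        # Come back to the loop regularly so the quit flag gets noticed.
        while not quit_event.is_set():
            ctx.processEvents(slice_millis)

    threads = [threading.Thread(target=pump, args=(ctx,)) for ctx in contexts]
    for t in threads:
        t.start()
    return threads


contexts = [FakeContext(), FakeContext()]
quit_event = threading.Event()
threads = run_sequential(contexts, quit_event)

# Let every context process at least one slice, then ask the threads to stop.
while any(ctx.calls == 0 for ctx in contexts):
    time.sleep(0.01)
quit_event.set()
for t in threads:
    t.join()
```

The join at the end isn't in my test program's snippet above; it's here only so the sketch shuts down cleanly.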

Of the three speedups, this had the most profound impact. Depending on the message size (and test run), upwards of 100K more messages/sec could be handled by the receiver when the underlying context was running in sequential mode.

It's time to look at a bit of data to see how we're doing. The following graph shows the message handling rates at different message sizes for a Python receiver program running in different modes compared with an equivalent C program (the C program is 29West's stock lbmmrcv program). The message source for all of these programs is 29West's lbmmsrc C program, which is able to produce messages faster than Python can consume them. The idea here is to get some notion as to how the extension is doing compared to C.

The Python receiver program has a number of different ways it can be run, allowing us to see the impact of various speedups. In all cases it uses a routing receiver since those will be able to handle data messages faster (since determining that a message is a data message happens in C). It actually has two equivalent receivers it can use, one in pure Python, and the other a Pyrex extension receiver. It can also run in either embedded or sequential mode. This provides four execution combinations of receiver and operational mode, each of which is run with a list of different message sizes. Additionally, the C receiver is run for each message size as well to provide a comparison. Each test run involved the transmission of 7M messages. The message rates reported here are in the K's of msgs/sec.

One thing I find maddening when running these tests is that the results are only approximately consistent. I know that's to be expected to some degree, but the cause of the variability is too opaque for my liking. For instance, I've seen the C test program running much closer to 900K msgs/sec, and in fact the sequential Python/Pyrex combination routinely runs at 490K msgs/sec. In either case, it's impossible to tell what else might be going on that impacts performance. For some reason in this test Python/Pyrex underperformed a bit, so that's what I charted.

Nonetheless, I'm pretty pleased. The first operating version of the extension could only do 40K msgs/sec, so it's pretty hard to complain about an order of magnitude improvement.

Wednesday, July 8, 2009

Incoming, Part 3-- back to Pythonland

I guess I should just stop apologizing for late posts-- I'm up against various deadlines, and we all know what that means, so I'll just get on with it and hope there's someone still reading.

So the last time around, I talked about building the C receiver that acts as the bridge from C to Python, and putting structures in place to find the correct Python object to call quickly. Now it's time to make it back to Python code, and more importantly user code, and it turns out there's lots to be considered when you hit this realm. I have to say that I was surprised how much the Python code, even trivial stuff, slowed things down, and it underscored for me how important it is that for maximum performance the Python receiver get implemented as a Pyrex object as well. But that was only one of the implementation challenges.

One of the design goals I stated early on was that I wanted to have few restrictions as to what sort of callable could be used by a client to handle message callbacks. This meant that I couldn't call directly into the user's callback from the _rcvEventCallback() C function; there were a number of potential variations to be accounted for, and taking care of them all seemed to be best served in the various Receiver extension types. So I decided to first route callback control through a C method in the Receiver, and then make further decisions as to what to do there.

The entry point for callbacks into the Receiver object is _routeMsgCallback(), and it's defined like this in the AbstractReceiver class in the Pyrex pxd file:

cdef class AbstractReceiver(coption.OptionMgr):
    cdef object msgCallback
    cdef _routeMsgCallback(self, message.Message msg)

And to recap, here's how that method gets invoked from the C callback function:

cdef int _rcvEventCallback(lbmh.lbm_rcv_t *rcv, lbmh.lbm_msg_t *msg,
                           object clientd) with gil:
    cdef AbstractReceiver receiver
    cdef message.Message pyMessage
    receiver = <AbstractReceiver> clientd
    if receiver is not None:                #still a little paranoid
        pyMessage = Message()._setMsg(msg)
        receiver._routeMsgCallback(pyMessage)
    return lbmh.LBM_OK

Finally, here are three relevant methods from the implementation of Receiver, the basic derived class of AbstractReceiver:

cdef class Receiver(AbstractReceiver):
    def __init__(self, myFactory, msgCallback=None):
        if msgCallback is None:
            msgCallback = self.__class__.handleMsg
        if msgCallback is not None and not callable(msgCallback):
            raise LBMWException("msgCallback isn't callable; "
                                "no way to get messages out")
        super(Receiver, self).__init__(myFactory, msgCallback)
        
    cdef _routeMsgCallback(self, message.Message msg):
        self.msgCallback(self, msg)

    def handleMsg(self, message.Message msg):
        return

There are some subtle things going on here to support both user-specified callback functions and derived classes of Receiver. Recall from an earlier post that when working with the receiver factory the user can specify either an eventCallback callable or a receiverClass callable which yields Receiver instances; either of these objects is called when messages arrive. Therefore the Receiver class has to figure out which kind of callback is being used and set up its msgCallback attribute in __init__() to be the function or unbound method to call whenever _routeMsgCallback() is invoked from the C callback function. For subclasses of Receiver, I call the unbound handleMsg method because I can then invoke it just as if I were invoking a callback function, and thus no additional logic is needed to determine what kind of object is being called. It probably has the benefit of being slightly faster since no bound method object has to be created for each callback.
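Here's a pure-Python sketch of that uniform call shape. It leaves out the factory and the extension types entirely, and the Upper subclass, the user_callback function, and the route() method name are made up for the example:

```python
class Receiver(object):
    def __init__(self, msgCallback=None):
        if msgCallback is None:
            # Fetching the method from the class, not the instance, yields a
            # callable that expects the instance as its first argument.
            msgCallback = type(self).handleMsg
        if not callable(msgCallback):
            raise TypeError("msgCallback isn't callable")
        self.msgCallback = msgCallback

    def route(self, msg):
        # Same call shape for a user function and for a subclass method.
        return self.msgCallback(self, msg)

    def handleMsg(self, msg):
        return None


class Upper(Receiver):
    def handleMsg(self, msg):
        return msg.upper()


def user_callback(receiver, msg):
    return "callback got " + msg


via_subclass = Upper().route("hi")
via_function = Receiver(user_callback).route("hi")

# Each lookup through an instance builds a fresh bound method object,
# which is the per-callback cost the unbound approach sidesteps.
r = Upper()
fresh_each_time = r.handleMsg is not r.handleMsg
```

Because both kinds of callback take (receiver, msg), the routing method never has to ask which kind it was given.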

So when _routeMsgCallback() calls self.msgCallback(self, msg), that's a call out to user code. There, the user needs to examine the provided Message object (another extension type) and figure out what they want to do based on the message's type. This was the first point where I encountered pure-Python's slowness; checking the type and extracting the data from the message took a surprising amount of time. When I had a derived class's handleMsg() call do no work and return immediately, things sped up appreciably. I needed to provide more assistance in order for the user's code to perform well.

One thing that I could do for the user is determine what type of message they received and provide a way to more finely route the callback to a method. This resulted in a new abstraction, the AbstractRoutingReceiver. This class establishes a protocol for passing messages to specific methods matched to each type of message. This means that I have the opportunity in C to detect the message type and select the right method to invoke which I could do much faster than in pure Python. Here's a bit of that code to give you an idea of what's going on:

cdef class AbstractRoutingReceiver(AbstractReceiver):
    def __init__(self, myFactory, msgCallback=None):
        if self.__class__ is AbstractRoutingReceiver:
            raise LBMWException("You can't directly make instances of "
                        "AbstractRoutingReceiver, only its subclasses")
        if msgCallback is not None:
            if not typecheck(msgCallback,
                             callbackMixins.RoutingReceiverMixin):
                raise LBMWException("The provided msgCallback isn't an "
                                    "instance of RoutingReceiverMixin")
        else:
            msgCallback = self
        super(AbstractRoutingReceiver, self).__init__(myFactory,
                                                      msgCallback)
        
    def dataMsg(self, msg):
        return
    
    def endOfSourceMsg(self, msg):
        return
    
    def requestMsg(self, msg):
        return

...and so on. The __init__() method here is very similar to the one in Receiver, except that if a msgCallback is provided, it can't be a plain function-- it must be an instance of the RoutingReceiverMixin class or one of its subclasses (the typecheck() function is a Pyrex addition that tests types in a manner similar to isinstance(), and is the preferred way to verify type in Pyrex code). The RoutingReceiverMixin class defines all of the per-message type methods that a callback object for this class must support. The _routeMsgCallback() method (defined in derived classes of AbstractRoutingReceiver) then makes the decision as to which method to invoke.
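For readers without Pyrex handy, the same guard logic can be sketched in pure Python, with isinstance() standing in for typecheck(). The classes below are simplified stand-ins (no factory argument, only one handler method), and the rejects() helper exists only for the demonstration:

```python
class RoutingReceiverMixin(object):
    """Simplified stand-in: the real mixin has one method per message type."""

    def dataMsg(self, msg):
        return None


class AbstractRoutingReceiver(object):
    def __init__(self, msgCallback=None):
        if type(self) is AbstractRoutingReceiver:
            raise TypeError("only subclasses can be instantiated")
        if msgCallback is None:
            msgCallback = self      # the receiver handles its own messages
        elif not isinstance(msgCallback, RoutingReceiverMixin):
            raise TypeError("msgCallback isn't a RoutingReceiverMixin")
        self.msgCallback = msgCallback

    def dataMsg(self, msg):
        return None


class MyReceiver(AbstractRoutingReceiver):
    pass


class Counter(RoutingReceiverMixin):
    pass


def rejects(factory):
    """Return True if calling factory() raises TypeError."""
    try:
        factory()
    except TypeError:
        return True
    return False


own_handler = MyReceiver()
delegating = MyReceiver(Counter())
```

With these definitions, rejects(AbstractRoutingReceiver) and rejects(lambda: MyReceiver(len)) both come back True, while the two instances at the bottom are created without complaint.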

However, finding the right method is another potential time-sink. Not only is method lookup in Python costly, it involves the creation of a new bound method object every time you look up a method. This is well-defined Python behavior, but not the best for high performance apps. So I created two derived classes of AbstractRoutingReceiver to give users a choice regarding flexibility and speed: the first, DynamicRoutingReceiver, fully adheres to the dynamic nature of method lookups in Python, and thus operates more slowly:

cdef class DynamicRoutingReceiver(AbstractRoutingReceiver):
    cdef _routeMsgCallback(self, message.Message msg):
        getattr(self, _msgTypeToMethodMap[msg.type])(msg)

The variable _msgTypeToMethodMap is a dict that maps LBM message types to the name of the corresponding handling method; that name is looked up with getattr() on self, and the message is dispatched to the method that's found. This retains all of Python's method lookup semantics (that is, a method could dynamically change between invocations), at the expense of some performance.
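A small pure-Python sketch shows what "retains all of Python's method lookup semantics" buys you. The message type values, the Msg class, and the DynamicRouter name are invented for the example; only the getattr()-through-a-map idea comes from the code above:

```python
MSG_DATA, MSG_EOS = 0, 1    # hypothetical values, not LBM's constants

_msgTypeToMethodMap = {
    MSG_DATA: "dataMsg",
    MSG_EOS: "endOfSourceMsg",
}


class Msg(object):
    def __init__(self, type, payload):
        self.type = type
        self.payload = payload


class DynamicRouter(object):
    def route(self, msg):
        # Look the handler up by name on every call.
        return getattr(self, _msgTypeToMethodMap[msg.type])(msg)

    def dataMsg(self, msg):
        return "original:" + msg.payload

    def endOfSourceMsg(self, msg):
        return "eos"


router = DynamicRouter()
before = router.route(Msg(MSG_DATA, "a"))

# Swap the handler at runtime; the very next message sees the new one.
router.dataMsg = lambda msg: "replaced:" + msg.payload
after = router.route(Msg(MSG_DATA, "a"))
```

The price of that flexibility is a dict lookup, an attribute lookup, and (for ordinary methods) a new bound method object on every message.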

To provide better fine-grained method dispatching, I created the StaticRoutingReceiver class. This class trades away a bit of dynamism for better lookup and dispatch performance. It does this by computing a dispatch table during __init__(), capturing the bound method at init time for each message handler in a list. The order of the items in the list is arranged so that the most frequently received message types (most importantly, LBM_MSG_DATA) are at the front of the list. The definition of the class in the pxd file is:

cdef class StaticRoutingReceiver(AbstractRoutingReceiver):
    cdef list dispatchList
    cdef _routeMsgCallback(self, message.Message msg)

And the implementation in the pyx looks like this:

cdef class StaticRoutingReceiver(AbstractRoutingReceiver):

    def __init__(self, myFactory, msgCallback=None):
        super(StaticRoutingReceiver, self).__init__(myFactory,
                                                    msgCallback)
        self.dispatchList = []
        for i in range(len(_pyMsgTypeList)):
            self.dispatchList.append( getattr(self.msgCallback,
                        _msgTypeToMethodMap[_pyMsgTypeList[i]]) )
        
    cdef _routeMsgCallback(self, message.Message msg):
        cdef int i
        cdef int mtype
        mtype = msg.type
        for 0 <= i < 12:
            if mtype == _msgTypeList[i]:
                self.dispatchList[i](msg)
                return

The variables _pyMsgTypeList and _msgTypeList contain the prioritized list of LBM messages, one containing Python objects and the other C data. Having the C data type list avoids having to do any Python object creation when looking up values in the list. The handler methods for each message type live at the same index in self.dispatchList; so for the message type stored at _msgTypeList[i], the handling method for that type can be found at self.dispatchList[i] for any specific value of i.
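The flip side of capturing bound methods at init time is that later changes to a handler go unnoticed. Here's a pure-Python sketch of the parallel-list arrangement that shows this; the type values and the StaticRouter name are invented, and a single Python list stands in for both _pyMsgTypeList and _msgTypeList:

```python
MSG_DATA, MSG_EOS, MSG_REQUEST = 0, 1, 2    # hypothetical values

_msgTypeToMethodMap = {
    MSG_DATA: "dataMsg",
    MSG_EOS: "endOfSourceMsg",
    MSG_REQUEST: "requestMsg",
}

# Prioritized: the most frequently received type comes first.
_msgTypeList = [MSG_DATA, MSG_REQUEST, MSG_EOS]


class StaticRouter(object):
    def __init__(self):
        # Capture one bound method per type, in the same order as the types.
        self.dispatchList = [getattr(self, _msgTypeToMethodMap[t])
                             for t in _msgTypeList]

    def route(self, mtype, msg):
        for i in range(len(_msgTypeList)):
            if mtype == _msgTypeList[i]:
                return self.dispatchList[i](msg)
        return None

    def dataMsg(self, msg):
        return "original"

    def endOfSourceMsg(self, msg):
        return "eos"

    def requestMsg(self, msg):
        return "request"


router = StaticRouter()
router.dataMsg = lambda msg: "replaced"     # too late: table already built
result = router.route(MSG_DATA, "x")
```

Here result is still the string from the method captured in __init__(), which is exactly the dynamism being traded away for speed.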

So before people get out the pitchforks and torches to run me out of town, let me address the linear search in _routeMsgCallback(). First of all, the syntax of the for loop is a special Pyrex notation that is supposed to provide the best processing performance. But why the for at all? Why not a lookup structure such as a dict? Conventional wisdom holds that if your search list consists of around 10 items, it's pretty hard to come up with a lookup algorithm that performs better on average than linear search, and measurements with an earlier implementation using dicts bear this out. With a list composed of 12 items, the number of possible message types fits comfortably within range of this rule of thumb. I've further improved the search's performance by placing the most commonly occurring message types at the front of _msgTypeList, so that it only requires a compare or two to find the index of the proper handling method in the majority of cases. This change provided a significant performance increase in the test code that used this as a Receiver base class.

Still, even though I bought significant improvement with this static method routing approach, Python processing in the callbacks was still holding overall performance back. So the final approach was to Pyrex-ize the callback class itself, turning it into an extension type. This would have the benefit of using Python's native C API for the objects it knew the type of, while maintaining “plug compatibility” with the pure Python class it replaced.

Here's a snippet from a Pyrex Receiver class that's used to count messages, bytes, and various other statistics on the received messages:

cdef class StatsCatcherReceiver(RoutingReceiverMixin):
    cdef public int msgCount, byteCount, unrecCount, burstLoss
    cdef public int totalMsgCount, totalByteCount
    cdef public int totalUnrecCount, totalBurstLoss
    cdef object owner
    cdef object verbose
    def __init__(self, owner, verbose=False):
        self.msgCount = 0
        self.byteCount = 0
        self.unrecCount = 0
        self.burstLoss = 0
        self.totalMsgCount = 0
        self.totalByteCount = 0
        self.totalUnrecCount = 0
        self.totalBurstLoss = 0
        self.verbose = verbose
        self.owner = owner
        
    def dataMsg(self, msg):
        self.msgCount += 1
        self.byteCount += msg.len
        if self.verbose is True:
            gwInfo = self.getGWInfo(msg)
            if gwInfo:
                print "[%s][%s][%u], via gateway [%s][%u], %u bytes" \
                      % (msg.topicName, msg.source, msg.seqnum,
                       gwInfo.source, gwInfo.seqnum, msg.len)
            else:
                print "[%s][%s][%u], %u bytes" \
                      % (msg.topicName, msg.source, msg.seqnum, msg.len)

This class was created by porting a pure Python class to Pyrex, giving the Pyrex class the same name, methods, and functionality, thus allowing it to be plug-compatible within the program in which the original class was used. The message receipt rate increased by almost 100% when this extension type was used, putting the overall program's performance levels at about 50% of the equivalent C program's on the same test rig (around 440K msgs/sec). This is in large part because Pyrex turns all self.attr references into struct member accesses, providing the corresponding speed gains. This gets me to a reasonably happy place performance-wise, at least for now.

It's worthwhile to consider a few of the obvious opportunities that still exist for further increases in message receipt performance:

In StaticRoutingReceiver, I could do away with the linear search and look into a direct indexing scheme where the message type itself would index into the list where the proper handling method could be found. This would eliminate the loop setup and a few compares even in the best cases with the current algorithm. The LBM message types lend themselves well to this approach-- they are small consecutive integers that start at zero, and so would make fine array indices. I considered this approach, but didn't like the fact that I was leveraging intimate details regarding the nature of what was otherwise supposed to be an opaque symbol, although to be honest I suppose I still am now. 29West would most likely not change these values (in fact, they apparently go to great pains to remain backwards compatible), but it still rubs me the wrong way. Nonetheless, an experimental implementation to see performance differences is definitely in order.

Again for StaticRoutingReceiver, it would probably turn out to provide a bigger gain if I changed dispatchList from a Python list to a C array of Python objects. This would save considerable time by avoiding the list's C APIs. Coupled with direct indexing, this could provide some significant improvement.

Finally, it might be useful to pass a different object as the clientd to LBM's lbm_rcv_create() function. Instead of handing over a Python object on which I need to do an attribute lookup to find the _routeMsgCallback() method with each received message, the better thing to do would probably be to pass the bound _routeMsgCallback() method itself. That is, instead of calling:
```
result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                <lbmh.lbm_rcv_cb_proc> _rcvEventCallback,
                pyReceiver, eq)
```
It might be better to call it like:
```
result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                <lbmh.lbm_rcv_cb_proc> _rcvEventCallback,
                pyReceiver._routeMsgCallback, eq)
```
And then directly call the object passed into _rcvEventCallback(). Depending on the inheritance structure backing the class of pyReceiver, this could save several dictionary lookups. However, it isn't entirely clear that this will be a win at all in the long run. The _routeMsgCallback() method is defined as a C function in Pyrex, and since we cast the returned object to an AbstractReceiver in _rcvEventCallback(), we may very well get to do a direct call to the underlying C function and bypass Python's attribute lookup in this case. Additionally, it isn't clear what kind of object you get with pyReceiver._routeMsgCallback; it could be a bound method, or it could be a pointer to a function. If a bound method, there's a good chance that it will actually be slower to call than in the current approach, especially since it would entail Python's method invocation protocol, which involves building an argument tuple and calling a Python C API function to invoke the method. If, on the other hand, pyReceiver._routeMsgCallback results in a function pointer, it may be no faster than dereferencing an AbstractReceiver pointer to find the _routeMsgCallback member that's a pointer to a function. The only way to know this, of course, is to look at the generated Pyrex code.

This investigation into improved performance also demonstrated a very worthwhile development process: create pure Python subclasses of StaticRoutingReceiver or RoutingReceiverMixin to hold the data that arrives in the callbacks during development, and when the structures that use the objects have settled down, re-implement these classes using Pyrex in order to get an easy speed boost.

It's not a bad application strategy for an LBM/Python app, either: put the message handling into extension types whose instances can respond rapidly to incoming messages, and use pure Python to organize the operation of these high-bandwidth objects.

So that's the whole of the message receipt stack in my LBM wrapper. As with such things, it's an ever-evolving beast, and even in the course of drafting these posts I've come up with new ideas to try to make the receipt of messages run faster. But that's enough for this part of the system for now; next time it'll be onto topics anew.

Saturday, June 20, 2009

Incoming, Part 2-- creating speedy receivers

Sorry for the delay in a post; there's been lots going on here including house guests and professional obligations.

So in the last post I talked about the various ways that I wanted to allow users to create receivers that would get messages delivered to them, and how I didn't want them to be constrained in using almost any kind of Python callable to perform the role of message handler. I showed how I created a factory that supported all of these models of receiver creation, and showed the different ways that the resultant classes could be used to manufacture receivers of different kinds. I now want to dive down further into the topic of receiver implementation and examine some of the issues involving how to make them perform well.

Now this section of the extension has undergone the greatest amount of change as I kept coming up with new ways to make the extension faster. The early going was dreadful; while the pure C example programs from 29West could handle over 800K msgs/sec on my testbed, my first attempts with Python analogs using the extension managed just over 40K msgs/sec. After a lot of tinkering I managed to get Python to deal with 400K msgs/sec, and I believe that if 29West were to implement some of the thread interfaces I mentioned in a previous post, it could go even higher. While I won't go into the entire development arc, I will share what I think are the important points I learned along the way.

In the end, there were three main areas whose optimization provided the biggest performance boosts: mapping from C data to Python objects, invoking the user's Python callback, and optimizing the performance of the callback itself.

The first issue to be tackled was how to tie a C callback efficiently back to relevant Python objects from within Pyrex. That last bit was the crucial part since Pyrex's type safety can stand in the way of a lot of approaches. Pyrex can only do direct translation between simple Python and C types; it isn't happy to let you cast a Python object reference to a C pointer, even if that's just void *.

To start examining this issue, let's have a look at a fragment of the code from the receiver factory I discussed in the last post:

with nogil:
    result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                                 _rcvEventCallback,
                                 pyReceiver, eq)

The fourth and fifth parameters here are the keys for tying the LBM message delivery callback to Python. The fourth, here _rcvEventCallback, is the callback function that LBM invokes when a message has arrived. The fifth is the so-called “client data” pointer, a void * to some data opaque to LBM, which is associated with the newly created receiver. This pointer is passed as an argument to the callback function along with the receiver that just received the message.

Now an obvious approach would have been to pass the receiving Python object as the client data pointer, but since Pyrex wouldn't allow me to cast it to a void *, I thought that avenue was closed off to me. At that time I was content to have that restriction, as I admit that I was a little uncomfortable about handing a reference counted object over to C code that couldn't care less about references.

So at first I tried using the address of the C receiver object (after it was allocated with lbm_rcv_create), recast to an int, as a key into a dict that mapped the address to the extension type receiver object. This worked, although not as fast as I'd hoped, and it was also vulnerable to certain race conditions-- the C receiver object had to be allocated before I could map it, which means that a message could theoretically arrive before I had the structures set up that told me what to do with it.

In the next iteration, I started generating serial integer keys to use for the mapping, and would pass the key as the client data pointer into the lbm_rcv_create call. This allowed me to establish the mapping before the C object was created, thus eliminating the race condition. However, it wasn't any faster, partly due to having to create a Python integer object from a C int for every message that arrived, and partly due to the lookup time in the dictionary. I did get some improvement by changing how the dictionary was initialized in Pyrex from this:

_mappingDict = {}

to this:

cdef dict _mappingDict
_mappingDict = {}

The former approach requires dynamic method lookups when you want to insert a key or look up a value associated with a key. With the latter approach, Pyrex knows that _mappingDict is going to be a Python dictionary and will then use the dictionary's C API, bypassing the costly method lookups.
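To make the serial-key scheme concrete, here is a minimal pure-Python sketch of the idea, not the extension's actual code. The class name ReceiverRegistry and its method names are hypothetical; the point is only that the key-to-receiver mapping can be set up before any C object exists, and that every incoming message then pays for one dictionary lookup.

```python
import itertools


class ReceiverRegistry(object):
    """Hypothetical helper: maps small integer keys to receiver objects."""

    def __init__(self):
        self._next_key = itertools.count(1)  # serial keys: 1, 2, 3, ...
        self._by_key = {}

    def register(self, receiver):
        # The mapping is established first, so a key can be handed to the
        # C library as client data before the C receiver is created.
        key = next(self._next_key)
        self._by_key[key] = receiver
        return key

    def lookup(self, key):
        # This lookup is the per-message cost described in the text.
        return self._by_key.get(key)

    def unregister(self, key):
        self._by_key.pop(key, None)


registry = ReceiverRegistry()
key = registry.register("receiver-A")  # any object can stand in for a receiver
```

A callback would receive the key as its client data and call registry.lookup(key) to find the Python object to notify.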

But the creation of the Python integer objects and their use in dictionary lookups was still too costly, so I decided to stop being a wimp and see if I could figure out a way to get Pyrex to accept me providing a Python object as the value of the void * client data pointer. Besides satisfying Pyrex's idea of the type of things, I had to have control over the lifecycle of both the C receiver object and the Python receiver object so that LBM wouldn't call out with a reference to a Python object that no longer existed.

Since I couldn't find a way to tell Pyrex that my Python object could be used as the void *, I decided to lie to Pyrex about what sort of parameters lbm_rcv_create could take. This function's signature in the lbm.h file is:

int lbm_rcv_create(lbm_rcv_t **rcvp, lbm_context_t *ctx,
    lbm_topic_t *topic, lbm_rcv_cb_proc proc,
    void *clientd, lbm_event_queue_t *evq)

But I told Pyrex that the signature is:

int lbm_rcv_create(lbm_rcv_t **rcvp, lbm_context_t *ctx,
    lbm_topic_t *topic, lbm_rcv_cb_proc proc,
    object clientd, lbm_event_queue_t *evq)

So I told Pyrex that the client data argument was simply a Python object. This only matters to Pyrex; you direct it to add the actual lbm.h to the generated C file so that it compiles with the proper signature. Of course, this results in a warning that the pointer types are incompatible, but since the pointer is opaque to LBM it doesn't matter.

Of course, we then have to change the signature of the callback function as well from:

int (*lbm_rcv_cb_proc)(lbm_rcv_t *rcv, lbm_msg_t *msg, void *clientd)

...to:

int (*lbm_rcv_cb_proc)(lbm_rcv_t *rcv, lbm_msg_t *msg, object clientd)

Again, telling Pyrex that the expected parameter was a Python object. Another warning is the result, but again it's harmless.

That shuts up Pyrex, but it hardly makes passing a Python object around safe yet. As long as LBM holds on to this pointer, we have to make sure that the object's reference count accurately reflects this. Fortunately, we can get access to the Python C API from within Pyrex, and so can utilize the functions that manage the reference counts on objects. We still have to cheat a bit and tell Pyrex that Py_INCREF and Py_DECREF take objects rather than the Python C type, but that's fine because Pyrex's “object” actually resolves down to the Python C type. So we need a little bit of special include code at the top of the module to tell Pyrex to bring in Python.h and treat these two functions accordingly:

cdef extern from "Python.h":
    void Py_INCREF(object)
    void Py_DECREF(object)

And after that, we're free to manage reference counts. Here are the relevant lines from the receiver factory code I last posted:

Py_INCREF(pyReceiver)
with nogil:
    result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                                  _rcvEventCallback,
                                 pyReceiver, eq)

So our Python receiver extension type becomes the client data for the callback, and it is this object that understands how to actually call back to user code.
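The effect of that extra reference can be observed from plain Python. The following is a small sketch, assuming CPython, that uses ctypes to call Py_IncRef and Py_DecRef (the function forms of the Py_INCREF and Py_DECREF macros). FakeReceiver and refcount_delta_while_loaned are made-up names for illustration only.

```python
import ctypes
import sys

# Function forms of the Py_INCREF / Py_DECREF macros, reached through ctypes.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


class FakeReceiver(object):
    """Stand-in for the receiver extension type (hypothetical)."""


def refcount_delta_while_loaned(obj):
    """Return (change while the extra reference is held, change after release)."""
    before = sys.getrefcount(obj)
    _incref(obj)   # the reference "loaned" to the C library
    during = sys.getrefcount(obj)
    _decref(obj)   # the library no longer holds the pointer
    after = sys.getrefcount(obj)
    return during - before, after - before


print(refcount_delta_while_loaned(FakeReceiver()))
```

While the extra reference is held, the object cannot be collected even if every Python-level name for it goes away, which is exactly the guarantee needed while a C library holds the raw pointer.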

With that in hand, the callback function itself is now easy and runs very fast:

cdef int _rcvEventCallback(lbmh.lbm_rcv_t *rcv, lbmh.lbm_msg_t *msg,
                           object clientd) with gil:
    cdef AbstractReceiver receiver
    cdef message.Message pyMessage
    receiver =  clientd
    if receiver is not None:                #still a little paranoid
        pyMessage = Message()._setMsg(msg)
        receiver._routeMsgCallback(pyMessage)
    return lbmh.LBM_OK

No more mapping shenanigans; we can directly call this object safely as we've taken care of accounting for the reference.
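The same round trip, an object handed out as an opaque void * and recovered inside the callback, can be mimicked in plain Python with ctypes. This is only an analogy under CPython, where id() happens to be the object's address; the callback signature and the FakeReceiver class are invented for the sketch and are not LBM's.

```python
import ctypes


class FakeReceiver(object):
    """Hypothetical stand-in for the receiver extension type."""

    def __init__(self):
        self.received = []

    def route(self, msg):
        self.received.append(msg)


# C-style callback type for the sketch: int callback(int msg, void *clientd)
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_void_p)


def _on_message(msg, clientd):
    # clientd arrives as a bare address; turn it back into the Python object.
    receiver = ctypes.cast(clientd, ctypes.py_object).value
    receiver.route(msg)
    return 0


c_callback = CALLBACK(_on_message)

receiver = FakeReceiver()                 # this name keeps the object alive
clientd = ctypes.c_void_p(id(receiver))   # CPython only: id() is the address
status = c_callback(7, clientd)           # simulate the library calling back
```

Here the local name receiver is what keeps the object alive while the address is in use; in the extension that job is done by the explicit Py_INCREF.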

What isn't shown here is where the Py_DECREF occurs; that's part of the new functionality in the extension that provides better lifecycle management of extension types fronting the LBM C objects, but that's a topic for another post.

So in the callback we now can quickly use our receiver, create a message object, and ask the receiver to send the message to the user's callback through the use of _routeMsgCallback. This concludes addressing the issue of mapping from C to Python objects in an efficient manner. That takes us to the next issue, invoking the user's callback itself. That'll be the topic of the next post.

Thursday, June 4, 2009

Incoming-- wrapping up message receivers

Given the last post on upcall performance, I thought that instead of talking about the extension beginning with the most fundamental classes (that is, wrapping the context object), it might make more sense to follow up directly with some discussion on how I'm looking at the problem of making the receipt of messages fast and efficient while still flexible. This post is going to assume a bit of knowledge about 29West; specifically, it will be referring to the concepts of topics, event queues, and contexts, although some high-level descriptions will be provided.

I wanted to provide for a number of different ways to receive messages, in essence allowing any Python callable to be used as the target of message upcalls, while at the same time providing some basic classes that could be subclassed to make the process simpler if desired. Across all of these approaches I needed to make the handling of messages as fast as possible.

In LBM, you need to provide a few raw ingredients to create a message receiver:

A context object, which establishes the overall logical messaging environment for sending and receiving messages.

A topic object, which identifies a named channel to which messages are posted or from which they are received.

An optional event queue which decouples the arrival of messages from their processing.

Some internal bookkeeping items to tell LBM how to route messages back to your application.

I took the view that in a significant number of applications, the majority of the receivers created would probably be associated with a single context object, a single event queue, and further would be handled in a single fashion, say with a bound method on different instances of a class. With this motivation, I decided to create a receiver factory that would keep these items together for the user to make creation of the receiver simple. The factory's definition from the pxd file is pretty simple:

cdef class ReceiverFactory:
    cdef core.Context ctx
    cdef core.EventQueue defaultEventQueue
    cdef object defaultReceiverClass

The only required piece of data needed to create a factory instance is the context; the event queue and receiver class (more on this below) are optional.

And here are the key pieces of the implementation from the pyx file:

cdef class ReceiverFactory:
    """
    This class takes 1 required and 2 optional arguments. The first arg
    is the core.Context object that the factory will be creating
    receivers for. The second, defaultReceiverClass, is a callable that
    yields instances of subclasses of receiver.AbstractReceiver. It
    defaults to Receiver. The third is an optional event queue you'd
    like to be shared between all receivers created with this factory.
    
    If a defaultReceiverClass is specified, it must be a callable that
    yields a subclass of receiver.AbstractReceiver (thus it may be a
    class object or other callable). The callable must take 2 arguments:
    the first argument is the calling ReceiverFactory object (self),
    and the second is whatever the current value for eventCallback is.
    If the class you wish to use takes more arguments, consider passing
    a closure that provides the additional args.
    
    NOTICE: Since the expected class to be produced by
    createReceiverForTopic is one of the extension classes, duck typed
    classes won't work properly here; the class of the object yielded
    by the defaultReceiverClass must be a subclass of AbstractReceiver.
    """
    def __init__(self, core.Context ctx, defaultReceiverClass=Receiver,
                 defaultEventQueue=None):
        #you may override this __init__ as long as you call it from the
        #derived class's __init__
        self.ctx = ctx
        self.defaultEventQueue = defaultEventQueue
        self.defaultReceiverClass = defaultReceiverClass
        
    def createReceiverForTopic(self, Topic pyTopic, eventCallback=None,
                               core.EventQueue eventq=None,
                               receiverClass=None):
        """
        Create a receiver for the specified topic. If an eventCallback
        is provided, pass it into the receiver class's constructor
        (see below). If an eventq is provided, attach the queue to the
        receiver so it is used for event delivery.
        
        If a receiverClass is specified, it must be a callable that
        yields a subclass of receiver.AbstractReceiver (thus it may be
        a class object or other callable). The callable must take two
        arguments: the first argument is the calling factory object
        (self), and the second is whatever the current value for
        eventCallback is. If the class you wish to use takes more
        arguments, consider passing a closure that provides the
        additional args. If the receiverClass is specified here, it
        overrides any that was specified at the factory's creation
        with defaultReceiverClass. If none is specified here, then
        the one specified at the factory's creation is used.
        
        NOTICE: Since the expected class is one of the extension
        classes, duck typed classes won't work properly here; the
        class of the object yielded by the callable must be a
        subclass of AbstractReceiver.
        """
        cdef int result
        cdef object rcvrClass
        cdef lbmh.lbm_context_t *context
        cdef lbmh.lbm_rcv_t **rcv
        cdef lbmh.lbm_topic_t *cTopic
        cdef AbstractReceiver pyReceiver
        cdef lbmh.lbm_event_queue_t *eq
        if receiverClass is None:
            rcvrClass = self.defaultReceiverClass
        else:
            rcvrClass = receiverClass
        pyReceiver = rcvrClass(self, eventCallback)
        if not typecheck(pyReceiver, AbstractReceiver):
            raise LBMWException("The provided callable must yield "
                                "a subclass of AbstractReceiver")
        context = self.ctx._getContext()
        if eventq is None:
            eventq = self.defaultEventQueue
        if eventq is None:
            eq = NULL
        else:
            eq = eventq._getEQ()
        cTopic = pyTopic._getTopic()
        rcv = pyReceiver._getReceiverPtrPtr()
        Py_INCREF(pyReceiver) #key to preventing dangling pointers!
        with nogil:
            result = lbmh.lbm_rcv_create(rcv, context, cTopic,
                                 <lbmh.lbm_rcv_cb_proc> _rcvEventCallback,
                                 pyReceiver, eq)
        if result == lbmh.LBM_FAILURE:
            raise LBMFailure(fn="lbm_rcv_create")
        return pyReceiver

So a bit of discussion is in order here.

The general approach is to create a factory that works cooperatively with a user-supplied factory to produce the appropriate Receiver instances. By default, the “user-supplied” factory is a class object, and it yields an instance of itself anytime a new receiver is created. However, the user factory can be any callable, as long as it produces an instance that is a derived class of the extension's AbstractReceiver class. This opens up a lot of possibilities as to what you might provide for the user factory.

When the user creates the ReceiverFactory instance, he provides the context that new receivers are to be associated with, and he has the option of supplying the default receiver factory object that will be used to create receiver objects, and an event queue for queuing arrived messages for processing.

When the user calls createReceiverForTopic, the only required argument is the topic that identifies the channel through which the receiver will acquire messages. The other arguments allow overriding the user's default receiver instance factory and event queue that were both provided at ReceiverFactory instantiation time, and to also specify an alternate event callback that will be the target of upcalls when messages arrive. This last bit is key; the default behavior of the ReceiverFactory is to generate a receiver object whose methods get invoked whenever a message for the receiver arrives. The eventCallback keyword argument allows the user to specify an alternative callback target; in this case, the receiver object simply acts as a control construct to manage the routing of messages to the desired upcall target.
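The two delivery styles can be sketched in pure Python. This is a simplified model, not the extension's implementation: SketchReceiver and routeMsg are hypothetical names, and the call shapes (callback(receiver, message) for an explicit callback, handleMsg(message) on the receiver otherwise) mirror the ones used in the examples in this post.

```python
class SketchReceiver(object):
    """Pure-Python stand-in for a receiver; not the extension's real class."""

    def __init__(self, factory, eventCallback=None):
        self.factory = factory
        self.eventCallback = eventCallback

    def handleMsg(self, theMessage):
        # Default upcall target; subclasses are expected to override this.
        raise NotImplementedError

    def routeMsg(self, theMessage):
        # Hypothetical routing step: an explicit callback wins if one was
        # given, otherwise the receiver's own method is the target.
        if self.eventCallback is not None:
            self.eventCallback(self, theMessage)
        else:
            self.handleMsg(theMessage)


seen = []


def simpleCallback(aReceiver, theMessage):
    seen.append(("function", theMessage))


class MyReceiver(SketchReceiver):
    def handleMsg(self, theMessage):
        seen.append(("method", theMessage))


SketchReceiver(None, simpleCallback).routeMsg("m1")  # routed to the function
MyReceiver(None).routeMsg("m2")                      # routed to the method
```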

A couple of examples of how to use this would be helpful. In each, assume that aContext is an LBM context object, aTopic is an LBM topic object, and anEventQueue is an LBM event queue object.

#Example 1: create a receiver that uses
#an independent function for callbacks
def simpleCallback(aReceiver, theMessage):
    #do stuff with theMessage
    return

receiverFac = ReceiverFactory(aContext)
newReceiver = receiverFac.createReceiverForTopic(aTopic,
                                         eventCallback=simpleCallback)

#Example 2: create a receiver subclass whose instances
#are the target of callbacks
class MyReceiver(receiver.Receiver):
    def handleMsg(self, theMessage):
        #do stuff with theMessage
        return

receiverFac = ReceiverFactory(aContext, defaultReceiverClass=MyReceiver)
newReceiver = receiverFac.createReceiverForTopic(aTopic)

#Example 3: like Example 2, but specifying MyReceiver at creation time
receiverFac = ReceiverFactory(aContext)
newReceiver = receiverFac.createReceiverForTopic(aTopic,
                                         receiverClass=MyReceiver)

#Example 4: Using a function for the receiverClass callable
class MyReceiver(receiver.Receiver):
    def handleMsg(self, theMessage):
        #do stuff with theMessage
        return

class MyOtherReceiver(receiver.Receiver):
    def handleMsg(self, theMessage):
        #do other stuff
        return

def vendReceiver(rFac, eventCallback):
    if someTestOnRecvFac(rFac):
        theReceiver = MyReceiver(rFac, eventCallback)
    else:
        theReceiver = MyOtherReceiver(rFac, eventCallback)
    return theReceiver

receiverFac = ReceiverFactory(aContext,
                              defaultReceiverClass=vendReceiver)
newReceiver = receiverFac.createReceiverForTopic(aTopic)

So there are lots of ways to get the factory to generate different sorts of receivers.
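The docstrings also suggest passing a closure when a receiver class needs more than the two arguments the factory supplies. Here is a minimal pure-Python sketch of that idea. SketchFactory, TaggedReceiver and makeTagged are hypothetical names, the AbstractReceiver here is a plain class standing in for the extension type, and isinstance() stands in for Pyrex's typecheck().

```python
class AbstractReceiver(object):
    """Plain-Python stand-in for the extension's AbstractReceiver."""

    def __init__(self, factory, eventCallback=None):
        self.factory = factory
        self.eventCallback = eventCallback


class TaggedReceiver(AbstractReceiver):
    # Needs one more argument than the factory will ever pass.
    def __init__(self, factory, eventCallback, tag):
        AbstractReceiver.__init__(self, factory, eventCallback)
        self.tag = tag


class SketchFactory(object):
    """Models only the receiver-class selection and the type check."""

    def __init__(self, defaultReceiverClass=AbstractReceiver):
        self.defaultReceiverClass = defaultReceiverClass

    def createReceiverForTopic(self, topic, eventCallback=None,
                               receiverClass=None):
        rcvrClass = receiverClass
        if rcvrClass is None:
            rcvrClass = self.defaultReceiverClass
        pyReceiver = rcvrClass(self, eventCallback)  # always two arguments
        if not isinstance(pyReceiver, AbstractReceiver):
            raise TypeError("The provided callable must yield "
                            "a subclass of AbstractReceiver")
        return pyReceiver


def makeTagged(tag):
    # The closure captures the extra argument and exposes a 2-arg callable.
    def vend(rFac, eventCallback):
        return TaggedReceiver(rFac, eventCallback, tag)
    return vend


fac = SketchFactory()
r = fac.createReceiverForTopic("some.topic",
                               receiverClass=makeTagged("orders"))
```

The factory never learns about the extra argument; it only ever sees a callable that takes the factory and the event callback.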

There are a number of calls of the form “_getTopic()”, “_getEQ()” that are used to acquire the underlying pointer to the managed C object that the Python extension type is wrapping. These methods are only available in the extension itself, and by convention I start them with an underscore to identify that.

The only other magic worth noting here is the use of Py_INCREF() on the newly created receiver instance. This is because we're about to give a reference to this instance to LBM and we want to make sure that this reference is counted properly by Python. This reference is dropped when the user indicates that they no longer want the receiver; I'll show how that works later.

But the above is only part of the story. This takes care of the creation aspect, but I still need to concern myself with the actual delivery of messages. That means we have to focus on the receivers themselves in the next post.

Saturday, May 30, 2009

Top gear-- how fast can we drive Python?

In order to get an idea of how well my wrapper is performing, I needed to get some idea of the theoretical limit in performance I could expect with Python. Specifically, I needed to get an idea of how fast I could generate upcalls into Python from an external library via a thread that Python knows nothing about.

Depending on how you use it, LBM can spawn a couple of threads for your process, one of which appears to be in charge of performing upcalls into application code upon receipt of messages. As these threads are outside the set spawned by Python itself, I was concerned as to whether there would be significant cost in Python bestowing the GIL to an alien thread for the upcall. So I figured I'd create a trivial extension module that performs upcalls in a couple of different ways to see how fast we can go.

The test is composed of three parts: the main line code which sets everything in motion and reports performance stats, a simple extension that does upcalls as fast as possible, and a class on which the upcalls will be performed. The method which is the upcall target does nothing, thus giving me a reasonable baseline for comparison.

I decided to have two variables that I'd modify to see what differences emerge:

First, I'll vary what thread we generate upcalls from. The test extension module will be able to perform upcalls from a Python-spawned thread, a thread spawned outside of Python (a raw pthread), and from the main Python thread itself.

Second, we'll vary the kind of object we upcall to. I'll implement a pure Python class, and then a second class that will be implemented as a Pyrex extension type. There will be a callback method on each that will do nothing but return.

Here's the test program's main, which includes the pure Python upcall target:

import time
import cUpcaller

class PyUpcallTarget(object):
   def __init__(self):
       self.kind = "python upcall target"
 
   def callback(self):
       return
 
def doit():
   limit = 5000000
   targets = [PyUpcallTarget(), cUpcaller.CUpcallTarget()]
   ucThreadsDict = {cUpcaller.WITH_PYTHON_THREAD:"python thread",
                    cUpcaller.WITH_ALIEN_THREAD:"alien thread",
                    cUpcaller.WITH_CALLING_THREAD:"calling thread"}
   waysToCall = ucThreadsDict.keys()
   for target in targets:
       for callHow in waysToCall:
           upcaller = cUpcaller.Upcaller(callHow, target.callback, limit)
           print ("Timing for %s from a %s"
                  % (target.kind, ucThreadsDict[callHow]))
           start = time.time()
           upcaller.go()
           upcaller.join()
           elapsed = time.time() - start
           print ("  %d upcalls took %f secs, averaging %f upcalls/sec"
                  % (limit, elapsed, limit / elapsed))
 
if __name__ == "__main__":
   doit()

And the extension Pyrex code, which includes the extension type upcall target:

import threading
cdef extern from "pthread.h" nogil:
   ctypedef unsigned long int pthread_t
   cdef enum:
       __SIZEOF_PTHREAD_ATTR_T = 256 #the value here isn't important
   cdef union pthread_attr_t:
       char __size[__SIZEOF_PTHREAD_ATTR_T]
       long int __align
   int pthread_create(pthread_t *__newthread, pthread_attr_t *__attr,
                      void *(*__start_routine) (object), object __arg)
   int pthread_join(pthread_t tid, void **valPtr)

WITH_PYTHON_THREAD = 1
WITH_ALIEN_THREAD = 2
WITH_CALLING_THREAD = 3

cdef class Upcaller

cdef int upcaller1(object theUpcaller) with gil:
   cdef int result
   cdef Upcaller ucRouter
   ucRouter = <Upcaller> theUpcaller
   result = ucRouter.routeUpcall()
   return result

cdef void *upcaller1Agent(object theUpcaller) nogil:
   cdef int quit
   quit = 0
   while quit == 0:
       quit = upcaller1(theUpcaller)
   return NULL


cdef class Upcaller:
   cdef callback
   cdef callHow
   cdef upcallThread
   cdef pthread_t alienThread
   cdef public long upcallLimit
   cdef public long upcallCount
   def __init__(self, callHow, callback, limit):
       self.callback = callback
       self.callHow = callHow
       self.upcallThread = None
       self.upcallLimit = limit
       self.upcallCount = 0
  
   def go(self):
       cdef int callResult
       cdef pthread_t *alienThread
       if self.callHow == WITH_PYTHON_THREAD:
           self.upcallThread = threading.Thread(target=self._hurtEm,
                                                args=())
           self.upcallThread.start()
       elif self.callHow == WITH_CALLING_THREAD:
           self._hurtEm()
       elif self.callHow == WITH_ALIEN_THREAD:
           alienThread = &self.alienThread
           with nogil:
               callResult = pthread_create(alienThread, NULL,
                                           upcaller1Agent, self)
           if callResult != 0:
               raise Exception("failed to start pthread")
       else:
           raise Exception("unrecognized callHow value")
 
   def _hurtEm(self):
       with nogil:
           upcaller1Agent(self)
             
   cdef int routeUpcall(self):
       #indicate being all done by returning 1
       self.callback()
       self.upcallCount += 1
       if self.upcallCount >= self.upcallLimit:
           return 1
       else:
           return 0

   def join(self):
       if self.callHow == WITH_PYTHON_THREAD:
           self.upcallThread.join()
       elif self.callHow == WITH_CALLING_THREAD:
           pass  #we blocked in go() so there's nothing to join
       else: #must be WITH_ALIEN_THREAD
           with nogil:
               pthread_join(self.alienThread, NULL)


cdef class CUpcallTarget:
   cdef public char *kind
 
   def __init__(self):
       self.kind = "c ext upcall target"
  
   def callback(self):
       return

A few words about the Pyrex code:

The second line, which starts “cdef extern from ...”, tells Pyrex a couple of things: first, that the following declarations can be found in the pthread.h file and therefore Pyrex will need to generate a #include for that header, and second that any functions listed in this section are to be called without the GIL. This acts as a flag to Pyrex that it's acceptable to invoke the function inside a with nogil: block.

The pair of functions “upcaller1()” and “upcaller1Agent()” serve as the stand-ins for the glue code to the external library and the external library itself. The upcaller1() function includes the “with gil” suffix on the function definition to tell Pyrex to generate code that acquires the GIL upon entry to the function. An analog to this function would be the upcall target for LBM in the real extension and would acquire the GIL for each upcall, making it safe to subsequently interact with Python objects. The upcaller1Agent() function in essence acts as the whole of the LBM library; it does whatever it does, and when it needs to call out to user code (for instance, to deliver a recently arrived message), it activates the extension callback function upcaller1(). Since upcaller1Agent() is a stand-in for LBM, it is marked as “nogil” to indicate that it can be called without holding the GIL.

I probably could have gotten a bit more speed using a straight function rather than a bound method for the upcall target, but since I planned on doing away with the old “client data pointer”, a method on an object seemed like a reasonable choice. Anyway, we're really looking for a ballpark figure here, as a real implementation isn't bound to get anywhere near this performance.
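For readers without Pyrex at hand, the shape of the measurement can be approximated in pure Python. This sketch is only an analogue: it never crosses a C boundary or acquires the GIL from foreign code, so its numbers are not comparable with the ones reported here. The names hammer and measure are made up, and the callback counts its calls only so the result can be checked.

```python
import threading
import time


class PyUpcallTarget(object):
    def __init__(self):
        self.calls = 0

    def callback(self):
        # Counts calls so the harness can be verified; the real test's
        # callback does nothing at all.
        self.calls += 1


def hammer(callback, limit):
    # Stand-in for the library loop that performs the upcalls.
    for _ in range(limit):
        callback()


def measure(callback, limit, use_thread):
    """Return upcalls/sec from the calling thread or from a Python thread."""
    start = time.time()
    if use_thread:
        worker = threading.Thread(target=hammer, args=(callback, limit))
        worker.start()
        worker.join()
    else:
        hammer(callback, limit)
    elapsed = time.time() - start
    # Guard against a zero reading from a coarse clock on tiny runs.
    return limit / elapsed if elapsed > 0 else float("inf")


target = PyUpcallTarget()
print("calling thread: %.0f upcalls/sec" % measure(target.callback, 100000, False))
print("python thread:  %.0f upcalls/sec" % measure(target.callback, 100000, True))
```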

I ran the test program five times and averaged the results, which are shown in the following table. These rates are upcalls/sec:

Upcall source                         Pure Python upcall target    Python C extension class upcall target
Upcalling thread known to Python      1,827,283                    3,528,114
Upcalling thread unknown to Python    948,044                      1,329,958
Upcalls from main thread              2,018,659                    3,330,011

The test host contains an Athlon 64 X2 dual core 3800+ processor and 3 GB of RAM.

Pretty interesting numbers, to be sure. The good news is that a user of the LBM extension would have some options if they needed better performance; it's pretty clear that turning your callback objects into Pyrex extension types would give you a significant boost in performance (an interesting data point for Pyrex use in general, too). The bad news is the performance hit encountered when the upcalls are performed by a thread created outside of Python's knowledge (that is, not using threading but rather a raw pthread). I understand Java suffers from a similar effect with alien threads calling up to Java via the JNI.
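To put rough numbers on those two observations, the ratios can be worked out directly from the table. The snippet below uses only the measured rates reported above; the helper names are arbitrary.

```python
# Measured rates from the table, in upcalls/sec:
# thread kind -> (pure Python target, extension type target)
rates = {
    "python thread": (1827283, 3528114),
    "alien thread": (948044, 1329958),
    "main thread": (2018659, 3330011),
}


def ext_speedup(kind):
    """How many times faster the extension type target is than pure Python."""
    pure, ext = rates[kind]
    return float(ext) / pure


def alien_fraction(column):
    """Alien-thread rate as a fraction of the Python-thread rate.

    column 0 is the pure Python target, column 1 the extension type target.
    """
    return float(rates["alien thread"][column]) / rates["python thread"][column]


for kind in sorted(rates):
    print("%-14s extension target speedup: %.2fx" % (kind, ext_speedup(kind)))
print("alien vs python thread, pure Python target: %.0f%%" % (100 * alien_fraction(0)))
print("alien vs python thread, extension target:   %.0f%%" % (100 * alien_fraction(1)))
```

So the extension type target runs roughly 1.4 to 1.9 times faster than the pure Python one, while an alien thread manages only about 52% (pure Python target) or 38% (extension target) of the rate of a thread Python knows about.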

This of course raises an API extension request for 29West. It would be great to expose an optional interface for the user to plug in their own “thread factory”. The default factory would simply be a call to pthread_create() to start up a thread of control at some identified function. However, a user-supplied factory could use whatever means it desired to spawn a thread. In the case of Python, it would be a simple matter to create a new thread with “threading” and have it invoke a “nogil” function that would then execute the 29West thread entry point. This way Python won't have to do all the work it otherwise must face whenever an alien thread tries to acquire the GIL, and thus allow such code to run much faster.
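The shape of such a hook is easy to sketch in pure Python. To be clear, nothing below exists in LBM; this is purely the requested idea in miniature, and SketchLibrary, defaultThreadFactory and namedThreadFactory are all invented names. The default factory here uses threading only because plain Python has no direct pthread_create(); in the real library it would be a native thread.

```python
import threading


def defaultThreadFactory(entryPoint):
    # Stand-in for the library's own thread creation.
    t = threading.Thread(target=entryPoint)
    t.daemon = True
    t.start()
    return t


class SketchLibrary(object):
    """Hypothetical library that lets the caller supply thread creation."""

    def __init__(self, threadFactory=defaultThreadFactory):
        self._threadFactory = threadFactory
        self.delivered = []

    def _deliveryLoop(self):
        # Stand-in for the library's internal thread entry point; it just
        # records which thread it ran on.
        self.delivered.append(threading.current_thread().name)

    def start(self):
        # The library asks the factory for a thread instead of making one.
        return self._threadFactory(self._deliveryLoop)


def namedThreadFactory(entryPoint):
    # User-supplied factory: the thread is created by Python's threading
    # module, so the interpreter knows about it from the start.
    t = threading.Thread(target=entryPoint, name="py-owned")
    t.start()
    return t


lib = SketchLibrary(threadFactory=namedThreadFactory)
lib.start().join()
```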

Now I have some idea of what to aspire to.

Sunday, May 24, 2009

Opt-out-- simplifying LBM options handling P3

Now that I've described an option value abstraction and help in managing the sea of available options, today's post will cover the facilities to apply those options to LBM objects in a simplified way.

What I want is a single small set of entry points that will be available on all objects that can take options. This is in contrast to the LBM API itself, which has some 36 functions for dealing with options across a variety of objects. The functions allow you to set or get options either as fundamental data types or as a string (when sensible to do so). So each object that has options has two setters (one for fundamental data and one for strings), and two getters (again, for data and strings).

Not all of the objects that have getter/setter functions are “operational” objects; that is, not all are directly involved in the business of processing messages. Many of the operational objects in the C API have a corresponding “attribute” object upon which options can be set, and the attribute object can then be used when the associated target object is created in order to set a whole batch of options at once.

While the names of these functions conform to a pretty straightforward algorithm, for example lbm_event_queue_setopt() and lbm_event_queue_str_setopt() for setting options on event queues, there’s really no need to propagate these distinctions up through an object interface since they can easily be abstracted away. What I wanted was to be able to distill down the interface to just a single pair of getter/setters for each object that takes options.

Given the above, I decided to create a base class in Pyrex that established the option handling interface protocol and have the other extension types that front LBM objects subclass this base. The subclasses would then implement the specific machinery for setting options on their underlying LBM object.

Here’s the base option handling class:

cdef class OptionMgr:
    cdef object _setOptionWithStr(self, char *coptName,
                                  char *coptValue)
    cdef object _setOptionWithObject(self, char *coptName,
                                     void *optRef, lbmh.size_t optLen)
    cdef object _getOption(self, char *coptName, void *optRef,
                           lbmh.size_t *optLen)
    cdef object _getOptionStr(self, char *coptName, char *optValue,
                              lbmh.size_t *optLen)

This class defines the internal protocol which all extension types layered over LBM objects must implement in order to provide access to all of the functions that LBM makes available. The implementations of these methods are where the distinction between the various LBM option setting/getting functions is made.
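The pattern is a base class that owns the public entry point and subclasses that supply the object-specific hooks. A pure-Python sketch of it follows, with several loudly labeled assumptions: the Option class with name and value attributes is a hypothetical stand-in for the abstraction from the earlier posts, the LBM_OK and LBM_FAILURE values are placeholders, OptionError is an invented exception, and raising on failure is a guess at what a setter might do. Only the (function name, result) return convention of the hooks is taken from the extension code in this post.

```python
LBM_OK, LBM_FAILURE = 0, -1  # assumed values, for this sketch only


class Option(object):
    """Hypothetical stand-in for the Option abstraction."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class OptionError(Exception):
    """Invented for the sketch; not one of the wrapper's exceptions."""


class OptionMgr(object):
    # Hooks: each returns (name of the function called, result code).
    def _setOptionWithStr(self, name, value):
        return ("not_implemented", LBM_FAILURE)

    def _setOptionWithObject(self, option):
        return ("not_implemented", LBM_FAILURE)

    def setOption(self, opt, optValue=None):
        # Single public entry point; dispatches on the kind of argument.
        if isinstance(opt, str):
            if not isinstance(optValue, str):
                raise OptionError("If opt is a string then optValue "
                                  "must be supplied as a string")
            calling, result = self._setOptionWithStr(opt, optValue)
        elif isinstance(opt, Option):
            calling, result = self._setOptionWithObject(opt)
        else:
            raise OptionError("opt must be a string or an Option")
        if result == LBM_FAILURE:
            raise OptionError("%s failed" % calling)


class FakeEventQueue(OptionMgr):
    """Subclass supplying the object-specific machinery (a dict here)."""

    def __init__(self):
        self.options = {}

    def _setOptionWithStr(self, name, value):
        self.options[name] = value
        return ("lbm_event_queue_str_setopt", LBM_OK)

    def _setOptionWithObject(self, option):
        self.options[option.name] = option.value
        return ("lbm_event_queue_setopt", LBM_OK)


eq = FakeEventQueue()
eq.setOption("some_option", "10")              # string form
eq.setOption(Option("another_option", 5))      # Option instance form
```

A subclass that forgets to override a hook falls through to the base implementation, so the failure surfaces with the "not_implemented" marker rather than silently doing nothing.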

The OptionMgr class in the .pyx file establishes the interface presented to Python itself:

cdef class OptionMgr:
    cdef object _setOptionWithStr(self, char *coptName,
                                  char *coptValue):
        retval = lbmw.lbmh.LBM_FAILURE
        return ("not_implemented", retval)
    
    cdef object _setOptionWithObject(self, char *coptName,
                                     void *optRef,
                                     lbmw.lbmh.size_t optLen):
        retval = lbmw.lbmh.LBM_FAILURE
        return ("not_implemented", retval)

    cdef object _getOption(self, char *coptName, void *optRef,
                           lbmw.lbmh.size_t *optLen):
        retval = lbmw.lbmh.LBM_FAILURE
        return ("not_implemented", retval)
        
    cdef object _getOptionStr(self, char *coptName, char *optValue,
                              lbmw.lbmh.size_t *optLen):
        retval = lbmw.lbmh.LBM_FAILURE
        return ("not_implemented", retval)
    
    def setOption(self, opt, optValue=None):
        """
        The opt argument may either be a non-unicode string containing
        the name of an LBM option, or it may be an instance of an
        Option subclass.
        
        If opt is a string, then optValue must be present and must
        also be a string containing the desired option value. If opt
        is an instance of an Option subclass, then optValue can be
        left out (if supplied it is ignored).
        """
        cdef int result
        cdef char *coptName
        cdef char *coptValue
        cdef void *coptRef
        cdef lbmw.lbmh.size_t coptLen
        cdef Option copt
        
        if type(opt) == str:
            if optValue is None or type(optValue) is not str:
                raise LBMWException("If opt is a string then optValue "
                                    "must be supplied as a string")
            coptName = opt
            coptValue = optValue
            calling, result = self._setOptionWithStr(coptName,
                                                     coptValue)
        elif typecheck(opt, Option):
            copt =

A few words about various Pyrex constructs and some other extension stuff are in order here:

The construction cdef Option copt is Pyrex's way to allow you to define variables of specific Python extension types. While many cases you can treat an extension type just like any other Python object, accessing methods in this manner is slower than if you specify the variable is of a particular extension type. When using cdef, you can directly and efficiently access C members. Also, this is the only way you can get at the object's methods that were defined with cdef.
The construction <SomeClass> is Pyrex notation for a cast. This allows you to use a generic Python object in a context that requires a specific extension type. You can also do straight C casts with the angle brackets as well.
The typecheck() function is a safer version of isinstance() that can be used within extensions. Apparently isinstance() can be fooled occasionally, and typecheck() works properly in those circumstances. This function causes Pyrex to use the Python C API to check the type of the provided object.
The wrapper defines two exceptions, LBMFailure and LBMWException. LBMFailure is used when the underlying LBM libraries return a failure code. The extension raises this exception and includes the name of the LBM function that returned the failure. LBMWException is used when the extension code itself encountered a failure; the encountered failure isn't associated with the LBM libraries themselves.
There's lots of taking data out of Python objects and storing it into local C variables; this is because of the way that Pyrex generates code and temporary Python objects. Not every type requires an explicitly declared local C variable, but to avoid uncertainty I decided to create explicit local copies since Pyrex would be doing it anyway.

The result of all this is that OptionMgr contributes only two Python methods to any extension type that derives from it, setOption() and getOption(). Both can work with either strings or Option instances for setting option values. Any extension class that wants to play in this game needs to implement the four option setting/getting methods, each of which is very brief. As an example of this, here's a pair of methods from the Python EventQueue extension type that implement some of the internal protocol of OptionMgr:

cdef object _setOptionWithObject(self, char *coptName, void *optRef,
                                 lbmh.size_t optLen):
    cdef int result
    with nogil:
        result = lbmh.lbm_event_queue_attr_setopt(&self.eqAttrs,
                                                  coptName, optRef,
                                                  optLen)
    retval = result
    return ("lbm_event_queue_attr_setopt", retval)
    
cdef object _getOption(self, char *coptName, void *optRef,
                       lbmh.size_t *optLen):
    cdef int result
    with nogil:
        result = lbmh.lbm_event_queue_attr_getopt(&self.eqAttrs,
                                                  coptName, optRef,
                                                  optLen)
    retval = result
    return ("lbm_event_queue_attr_getopt", retval)

A word about the with nogil: block; this is another Pyrex construction that tells Pyrex to generate code to release the GIL for the execution of the statements in the block. Generally you'll want to do this when calling out to the library that the extension is wrapping so that other Python threads can run. When the block is complete the method will once again hold the GIL.

Earlier versions of OptionMgr passed the actual Option instance into these methods; however, each one wound up re-implementing a batch of boilerplate code that acquired all the individual fields needed to make the function call, and so the internal interface got refactored to not refer to any Python objects at all. It made each function much shorter.
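The shape of this internal protocol can be sketched in plain Python. This is only an illustration under assumptions, not the extension's code: the real methods are cdef methods that take C pointers and lengths, FakeEventQueueConfig and its dict are made up for the example, and treating any nonzero result as a failure is an assumption about how the return codes get turned into LBMFailure.

```python
class LBMWException(Exception):
    """Raised for failures inside the wrapper itself."""


class LBMFailure(Exception):
    """Raised when the wrapped library call reports a failure."""


class Option(object):
    # Simplified stand-in for the extension's Option type.
    def __init__(self, optName):
        self.optName = optName
        self._option = None

    def setValue(self, value):
        self._option = value
        return self          # returning self allows chained calls

    def getValue(self):
        return self._option


class OptionMgr(object):
    def setOption(self, opt, optValue=None):
        if isinstance(opt, str):
            if not isinstance(optValue, str):
                raise LBMWException("If opt is a string then optValue "
                                    "must be supplied as a string")
            calling, result = self._setOptionWithStr(opt, optValue)
        elif isinstance(opt, Option):
            calling, result = self._setOptionWithObject(opt.optName,
                                                        opt.getValue())
        else:
            raise LBMWException("opt must be a string or an Option")
        if result != 0:      # assumption: nonzero means the call failed
            raise LBMFailure(calling)


class FakeEventQueueConfig(OptionMgr):
    # Hypothetical subclass: a dict stands in for the LBM attribute object.
    def __init__(self):
        self.stored = {}

    def _setOptionWithStr(self, name, value):
        self.stored[name] = value
        return ("fake_str_setopt", 0)

    def _setOptionWithObject(self, name, value):
        self.stored[name] = value
        return ("fake_setopt", 0)
```

Each subclass only supplies the short methods that touch its own underlying object; the type dispatch and error handling live once in the base class.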

What's it like to work with? Well, here's a snippet of an IPython session working with the extension type built on top of the above “event queue attributes” LBM object. The session has been edited to reformat it a bit and remove a stack trace that isn't terribly helpful to understanding what's going on.

tom@slim-linux:~/workspace/lbmw/src/lbmw$ ipython

Python 2.5.2 (r252:60911, Oct  5 2008, 19:24:49) 
Type "copyright", "credits" or "license" for more information.
IPython 0.8.4 -- An enhanced Interactive Python.

In [1]: from lbmw import core

In [2]: from lbmw.option import eventQueueOpts as eqo

In [3]: eqc = core.EventQueueConfig()

In [4]: eqc.setOption(eqo
                      .queue_cancellation_callbacks_enabled
                      .getOptionObj()
                      .setValue(1))

In [5]: opt = eqo.queue_objects_purged_on_close.getOptionObj()

In [6]: opt.setValue(0)
Out[6]: <lbmw.coption.IntOpt object at 0x8baeedc>

In [7]: eqc.setOption(opt)

In [8]: eqc.setOption("queue_delay_warning", 1)
--------------------------------------------------------
LBMWException          Traceback (most recent call last)

<TRACEBACK SNIPPED>

LBMWException: If opt is a string then optValue must be
  supplied as a string

In [9]: eqc.setOption("queue_delay_warning", "1")

As you can see, none of the flexibility provided by the underlying LBM libraries has been lost, and a few different ways of interacting with options have been provided. Line 4 shows how the setValue() method of an option returns the option itself, allowing you to allocate, set the value, and then set the option in a single statement. Lines 5, 6, and 7 show how this might be broken out into multiple statements. Line 8 demonstrates an exception that is raised within the extension itself. Finally, line 9 shows that you can still use strings to set option values.

One of the other differences with LBM is the use of exceptions; every LBM function call requires you to examine a return code to determine if the function completed successfully. Python exceptions allow a much more succinct way to express the same operations with no loss of semantics.

Often, new ideas come up when you describe an interface to someone, and a couple of things have occurred to me while working on these posts:

Legal value checking would be helpful. It would be nice if you got an exception when you tried to set a value on an option and it was outside the allowable range of values. The current classes would support this nicely: the Option subclasses could by default test to ensure that the value provided fit into the underlying C variable, but could allow an option check function to be specified at construction time that could further restrict legal values. The further restriction function could be part of OptObjectKey and passed down into the Option when it is created.
Communicating legal values would be nice. The most helpful form would be some sort of structured information that informs you what legal boundaries are for a given option. This starts to get tricky when you consider options that are actually multi-field structures; trying to do this generally is probably not worth the effort. It may be enough to simply provide a string that contains a text description of the values accepted by the option.
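A rough sketch of how both ideas might fit together, written in plain Python rather than Pyrex. Everything here is hypothetical: CheckedUIntOpt, OptionValueError, the check and legalValues parameters, and the assumption that the underlying C variable is a 32-bit unsigned int.

```python
class OptionValueError(ValueError):
    """Hypothetical exception for values outside an option's legal range."""


class CheckedUIntOpt(object):
    # Hypothetical stand-in for an unsigned-int option type.
    MAX_VALUE = 2 ** 32 - 1     # assumes a 32-bit unsigned int

    def __init__(self, optName, check=None, legalValues=None):
        self.optName = optName
        self._check = check             # optional, further restriction
        self.legalValues = legalValues  # text description of legal values
        self._option = 0

    def setValue(self, value):
        # Default test: the value must fit in the underlying C variable.
        if not (0 <= value <= self.MAX_VALUE):
            raise OptionValueError("%s: %r does not fit in an unsigned int"
                                   % (self.optName, value))
        # Optional test supplied at construction time.
        if self._check is not None and not self._check(value):
            raise OptionValueError("%s: %r is not legal (%s)"
                                   % (self.optName, value, self.legalValues))
        self._option = value
        return self

    def getValue(self):
        return self._option


# Usage: an option restricted to 0 or 1
flag = CheckedUIntOpt("some_option",
                      check=lambda v: v in (0, 1),
                      legalValues="0 or 1")
flag.setValue(1)
```

The text description doubles as the simple way of communicating legal values: it can be shown in the exception message and read directly from the option object.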

Armed with a means to configure the LBM objects I want to interact with, I can now start to work on building these parts of the extension. Onward!

Friday, May 22, 2009

Opt-out-- simplifying LBM options handling P2

In the previous post, I laid out my plans for simplifying option handling in the Python wrapper I'm creating for 29West's LBM low-latency messaging product. I laid out three goals in that post: to hide option data type details when possible, to provide assistance in actually using the options, and finally to minimize the number of entry points involved to actually get/set options. In that post I introduced the use of Pyrex as the tool I selected to help generate the C extension code for Python, and as an aid in talking about Pyrex's use I showed the definition of the basic classes that will hold option values.

This post will focus on the second goal, namely providing assistance in selecting the proper option value class and in general providing support for navigating through the wide variety of options available to use with LBM objects.

While LBM defines a large number of options (184 uniquely named options in the version that I'm using), the reality is that only a subset of these apply depending on the kind of transport you select and the kind of objects you create. The object that the option applies to is the “scope” of the option. Options may apply to more than one object, and hence may be in more than one scope.

However, even though the number of relevant options is significantly reduced once you make these choices, weeding out the options you care about from those you don't is a laborious process involving combing through the documentation. And then you still need to select the correct data types to use when dealing with the options.

Another issue that rubs me a bit the wrong way with the LBM option API is that all options are identified by a string name. There are no symbolic #defines for these names, so even in C you don't know that you have a typo until run time. I suppose I have no business griping about this kind of thing since I'm doing Python, but I still think we can raise the bar a bit in this space to help users avoid late discovery of typos.

I'm guessing here that the simplest version of the solution is pretty obvious: establish a dictionary somewhere whose keys are option name strings and whose values are the appropriate subclasses of Option; something like this:

def getOptionObj(key):
    """
    Return an option object suitable for containing the kind
    of data required for the named option
    """
    if type(key) == str:
        optName = key
    else:
        raise OptionException("unknown key type:%s" % str(type(key)))
    optClass = _optionNameKeyToOptionClassMap.get(key)
    if optClass:
        inst = optClass(optName)
    else:
        inst = None
    return inst

It's a reasonable start; it addresses one of the key goals, namely to hide the details of the specific data types needed when setting options. You of course still need to know if the data values are integers, floats, strings, etc, but specific types are no longer important. What's still missing is help sorting out which options apply to which LBM objects, and providing a means to catch option name typos earlier.

For this, I introduced the notion of an option object key, OptObjectKey, that performs a few different functions to address these issues. Its primary function, however, is to provide a concrete object that can be referenced elsewhere when a user needs an Option subclass instance. The idea here is to provide an object that can be assigned to a variable whose name matches the option's name, thus giving IDEs something to suggest as users type code, and something that will raise an error upon import if there's a typo. In addition, the key object provides additional functionality to help sort out which options apply to which LBM objects.

The best way to explain how this works is by looking at the class and seeing some examples of how it's used. First, the class:

class OptObjectKey(object):
    instancesByScope = {}
    emptySet = frozenset()
    def __new__(cls, optName, scopeName):
        self = _optionKeyInstances.get(optName)
        if self is None:
            # object.__new__ takes no extra arguments; __init__ gets them
            self = super(OptObjectKey, cls).__new__(cls)
            _optionKeyInstances[optName] = self
        return self
    
    def __init__(self, optName, scopeName):
        #since we reuse instances, check to see if self
        #already has optName as an attribute
        if not hasattr(self, "optName"):
            self.optName = optName
            self.scopes = set([scopeName])
        else:
            self.scopes.add(scopeName)
        self.instancesByScope.setdefault(scopeName, set()).add(self)
        
    def __hash__(self):
        return hash(self.optName)
    
    def __eq__(self, other):
        if type(other) == type(""):
            result = False
        elif isinstance(other, OptObjectKey):
            result = self.optName == other.optName
        else:
            result = False
        return result
    
    def getOptionObj(self):
        return getOptionObj(self)
    
    def getOptionClass(self):
        return _optionNameKeyToOptionClassMap.get(self)

    @classmethod
    def getAllScopes(cls):
        return list(cls.instancesByScope.keys())
    
    @classmethod
    def getOptionsForScope(cls, scopeName):
        return cls.instancesByScope.get(scopeName, cls.emptySet)

Instances of this class aren't directly created by modules wishing to publish information about options; a factory function takes care of creating these instances and mapping them to the proper underlying Option subclass:

def addOptionMapping(optNameStr, optType, scopeName):
    #we always create the OptObjectKey instance
    #since it records the scope information even
    #if we've seen this option already
    optObjKey = OptObjectKey(optNameStr, scopeName)
    optFactoryType = _optionNameKeyToOptionClassMap.get(optNameStr)
    if optFactoryType is None:
        _optionNameKeyToOptionClassMap[optNameStr] = optType
        _optionNameKeyToOptionClassMap[optObjKey] = optType
    return optObjKey

Notice that when we add an option mapping we add entries to _optionNameKeyToOptionClassMap using both the option's string name and the OptObjectKey object as keys. This allows the user to use either the string name or the key object (if they have it) when calling getOptionObj(). Of course, getOptionObj() now needs to be changed:

def getOptionObj(key):
    """
    Return an option object suitable for containing the
    kind of data required for the named option
    """
    if type(key) == str:
        optName = key
    elif isinstance(key, OptObjectKey):
        optName = key.optName
    else:
        raise OptionException("unknown key type:%s" % str(type(key)))
    optClass = _optionNameKeyToOptionClassMap.get(key)
    if optClass:
        inst = optClass(optName)
    else:
        inst = None
    return inst
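
Registering under both kinds of key relies on OptObjectKey hashing like its name while never comparing equal to a plain string, so the two entries sit side by side in the dictionary. A cut-down sketch of that behavior, where Key is a hypothetical stand-in for OptObjectKey and int stands in for an Option subclass:

```python
class Key(object):
    # Hypothetical, cut-down key: hashes like its name, never equals a str.
    def __init__(self, optName):
        self.optName = optName

    def __hash__(self):
        return hash(self.optName)

    def __eq__(self, other):
        return isinstance(other, Key) and self.optName == other.optName


classMap = {}
key = Key("queue_delay_warning")
classMap["queue_delay_warning"] = int   # entry keyed by the string name
classMap[key] = int                     # separate entry keyed by the object

# Two distinct entries, both leading to the same class
print(len(classMap))
print(classMap.get("queue_delay_warning") is classMap.get(key))
```

If the key compared equal to its own name, the second assignment would simply overwrite the first entry instead of adding a new one.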

The OptObjectKey class maintains an internal map of scope names to the set of OptObjectKey instances that belong to each scope. This provides a means for dynamically seeing what options fall within a scope. OptObjectKey instances themselves keep a set of the scopes they apply to, thus providing a means to cross-reference options and scopes.

Publishing option mappings is now a matter of calling addOptionMapping(). Here are some examples from eventQueueOpts.py:

scope = "eventQueue"
queue_cancellation_callbacks_enabled = addOptionMapping(
                              "queue_cancellation_callbacks_enabled",
                              coption.IntOpt, scope)
queue_delay_warning = addOptionMapping("queue_delay_warning",
                                       coption.ULongIntOpt, scope)

The return value of addOptionMapping() is an OptObjectKey instance that can subsequently act as a factory for generating appropriate Option subclasses for that option, or as a key to getOptionObj() for doing the same. The getOptionObj() function still takes option string names in case that's a more convenient form for the user, but at the risk of a runtime error due to a typo.
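
The difference in when a typo surfaces can be shown with a small stand-alone sketch. The names here are stand-ins only: optionClassByName plays the role of the name-to-class map, and eqo is a SimpleNamespace imitating a module of option variables.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the option map and an options module
optionClassByName = {"queue_delay_warning": int}
eqo = SimpleNamespace(queue_delay_warning="queue_delay_warning")

# A misspelled string is not noticed at the lookup; it just yields None,
# and the problem shows up later when the None gets used.
inst = optionClassByName.get("queue_delay_warnign")
print(inst)

# A misspelled variable name fails immediately, at the line with the typo.
try:
    eqo.queue_delay_warnign
except AttributeError as exc:
    print("caught: %s" % exc)
```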

But if only OptObjectKeys will be used to look up options, IDEs can provide a lot of help. Here's how it can look in Eclipse with Pydev:

Eclipse helping out

And here's how IPython helps out:

IPython's response to tab

Further, the OptObjectKey class can be interrogated for the options in a scope as shown below:

Looking into a scope
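
In text form, that kind of interrogation might look something like the following. This is a condensed, self-contained sketch rather than the module's code: OptKey, addKey, optionsForScope and allScopes are hypothetical stand-ins for OptObjectKey and its helpers, and the placement of "some_option" in two scopes is made up for illustration.

```python
_keysByName = {}     # option name -> OptKey (one shared instance per name)
_keysByScope = {}    # scope name -> set of OptKey


class OptKey(object):
    # Condensed, hypothetical stand-in for OptObjectKey.
    def __init__(self, optName):
        self.optName = optName
        self.scopes = set()


def addKey(optName, scopeName):
    # Reuse the existing key for a name so its scopes accumulate.
    key = _keysByName.get(optName)
    if key is None:
        key = _keysByName[optName] = OptKey(optName)
    key.scopes.add(scopeName)
    _keysByScope.setdefault(scopeName, set()).add(key)
    return key


def optionsForScope(scopeName):
    return sorted(k.optName for k in _keysByScope.get(scopeName, ()))


def allScopes():
    return sorted(_keysByScope)


queue_delay_warning = addKey("queue_delay_warning", "eventQueue")
some_option = addKey("some_option", "eventQueue")
addKey("some_option", "someScope")      # same option, second scope

print(allScopes())                      # ['eventQueue', 'someScope']
print(optionsForScope("eventQueue"))    # ['queue_delay_warning', 'some_option']
print(sorted(some_option.scopes))       # ['eventQueue', 'someScope']
```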

These latter capabilities not only make it easier for the developer, but lend themselves nicely to creating a GUI tool for managing options for a variety of purposes, config file generation and dynamic option tweaking being two examples.

The final step here is to address the large number of option getting/setting functions in LBM, and that's what I'll cover in the next post.

UPDATE, 8:42PM the same day

It occurs to me that a couple of the screen grabs I included are kind of "so what-- that's what Pydev/IPython/whatever is supposed to do," and that's all true.

When I originally was working on this post, I wasn't explicitly setting instances to variables with the same name as the option. I thought I'd be terribly clever and format up assignment statements and then exec them dynamically in a for loop in order to create the option variables, something like:

newOptKey = addOptionMapping("some_option",
                             SomeOptionSubclass, "someScope")
exec "%s = newOptKey" % newOptKey.optName

Well, it worked just fine, but the IDEs weren't up to knowing how clever I was; they were only parsing the modules, not executing them, and so tools like Pydev could never tell me anything about my option variables when I'd type Ctrl-space after a module name.

So while I was working on the post, I changed the code to be a regular assignment statement so that the IDEs will give me proper hints, but I forgot how pedestrian that was in relation to the capabilities of IDEs. I was still in that "see, dynamically generated!" mindset, and so I included the screen grabs in my misplaced enthusiasm. I did say I might occasionally wind up with a bit of egg on my face...
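
The gap between what a parse-only tool sees and what exists at run time can be reproduced in a few lines. This sketch uses Python 3 and globals() in place of the Python 2 exec statement above, and the two option names are invented for the example.

```python
import ast

source = '''
explicit_option = "explicit_option"
globals()["dynamic_option"] = "dynamic_option"
'''

# What a parse-only tool can discover: names bound by plain assignments
tree = ast.parse(source)
staticNames = [target.id
               for node in tree.body if isinstance(node, ast.Assign)
               for target in node.targets if isinstance(target, ast.Name)]

# What actually exists once the module body has been executed
namespace = {}
exec(source, namespace)
runtimeNames = sorted(n for n in namespace if not n.startswith("__"))

print(staticNames)     # ['explicit_option']
print(runtimeNames)    # ['dynamic_option', 'explicit_option']
```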

Wednesday, May 20, 2009

Opt-out-- simplifying LBM options handling P1

I want to write about the part of the extension that simplifies option handling, but since it’s a pretty straightforward exercise, I’ll also use this as an opportunity to provide a drive-by tutorial in the use of Pyrex for the development of Python extensions, the tool I’m using in binding Python to the LBM library.

Prior to this project I had only used SWIG for building extensions, but I decided to try Pyrex this time around. I have to say, unless I need to expose a library to multiple languages, I’ll probably continue to use Pyrex going forward. Besides being Python-only, its only big drawback is that it doesn’t do any automatic header file parsing, one of the real time-savers with SWIG. But Pyrex really shines: not only does it let you write your extension in a dialect of Python itself (rather than in C), it also has direct support for generating extension types. These are types that are implemented in C just like Python lists and dicts, and therefore have the commensurate speed advantages. Being able to create these easily is a real advantage, and I would highly recommend Pyrex for anyone looking to wrap a C library for use in Python.

There were three areas related to options handling I want to address in the extension: first, eliminating the need to provide the address of a variable of the proper type, cast as a void *, when specifying option values; second, providing assistance in dealing with the extensive set of options supplied by LBM across the various objects upon which options can be set; and third, reducing the large number of function calls available for setting and getting options across LBM objects.

First off, let's deal with the simpler matter of abstracting option values. In the native LBM C API, getting and setting options can be done either with a char * (where possible) or a void pointer to a variable of the correct type for the particular option. From the Python perspective, the LBM API expresses too many details of the underlying implementation for a lot of these options, especially when dealing with various integer values. Further, the majority of options involve a single value, something handled nicely by a common option abstraction. There are a few options that deal with structures so these will need to be handled a bit outside of the standard protocol, but that's not the end of the world.

Given the above, I decided to create a base class in Pyrex that established the base option value interface protocol and have the specific option extension types derive from this base. In Pyrex, extension type definitions that are to be shared by different parts of an extension go into a .pxd file, and their implementations go into a .pyx file with the same base file name. Pyrex allows you to define two kinds of methods on extension types: methods which will be visible to Python programs that use the extension type, and methods that are only visible to other Pyrex extensions that are familiar with the actual underlying C structures that the extension deals with. Pyrex-only methods are defined in the .pxd file, while Python-accessible methods are defined in the corresponding .pyx.

This is the Pyrex extension type base class that defines common protocol for all option value objects, as defined in the .pxd file:

cdef class Option:
    cdef void *option
    cdef readonly optName
    cdef void *optRef(self)
    cdef int optLen(self)
    cdef _setValue(self, value)
    cdef toString(self)

The Pyrex reserved word “cdef” has a few uses in that language: before “class” it identifies a class that should be turned into an extension type. Before the declaration of a variable it identifies what will become an attribute in instances of the extension type, or a local variable in a method or function. And before a method declaration or implementation it identifies methods that will be only accessible from extensions that import the .pxd file. Notice how you can freely mix Python object references (self, value) with C variable declarations; Pyrex takes care of managing each properly. You can also specify argument and return types of a function or method (Pyrex defaults to a Python object).

The above Option class names two data items in the base class: a void * that will refer to the actual underlying option storage attribute, and a read-only variable that holds the option name. The remaining declarations define extension-only methods on this class for getting a direct void pointer to the data item managed by the object, getting the item's length, setting its value through a low-level mechanism, and rendering the option as a string (for calling by the standard __str__ magic method).

The code that implements the class is in the corresponding .pyx file:

cdef class Option:
    """
    The actual option structure's address must be assigned to "option".
    The derived class should create an attribute named "_option" of the
    correct type and implement __cinit__() to set option to the address
    of _option.
    """
    def __init__(self, optName):
        self.optName = optName
        
    def name(self):
        return self.optName
        
    def setValue(self, value):
        self._setValue(value)
        return self
    
    cdef toString(self):
        return self.optName
    
    def __str__(self):
        return self.toString()

    #these next 4 methods are common for many subclasses and so
    #are collected into an include file with the .pxi extension
    cdef _setValue(self, value):
        self._option = value

    cdef int optLen(self):
        return sizeof(self._option)
    
    cdef void *optRef(self):
        return self.option

    def getValue(self):
        return self._option

The code comments mention a “__cinit__” method. This is a special initialization method available to extension types (http://docs.cython.org/docs/special_methods.html) that's invoked before the more familiar __init__ method. At __cinit__ invocation time, you can be sure that all C variables you declared to comprise your extension type's instance attributes have been allocated and initialized to zero, and you are free to perform any dynamic initialization you require. Each derived class is required to implement this method and assign the address of the _option attribute to the option attribute. A couple of examples will illustrate this.

Here are two derived classes that handle specific types of option values. First, the entries in the .pxd file:

cdef class UIntOpt(Option):
    cdef unsigned int _option
    
cdef class InAddrOpt(Option):
    cdef lbmw.lbmh.in_addr _option

And the corresponding implementations from the .pyx file:

cdef class UIntOpt(Option):
    def __cinit__(self, optName):
        self.option = &self._option
        
    cdef toString(self):
        ps = "%s:%d" % (self.optName, self._option)
        return ps

    def __str__(self):
        return self.toString() 
        
    include "coption.pxi"
        
cdef class InAddrOpt(Option):
    def __cinit__(self, optName):
        self.option = &self._option
    
    cdef toString(self):
        ps = "%s:%d" % (self.optName, self._option.s_addr)
        return ps

    def __str__(self):
        return self.toString() 
        
    cdef _setValue(self, value):
        self._option.s_addr = value
    
    cdef int optLen(self):
        return sizeof(self._option)
        
    cdef void *optRef(self):
        return self.option
    
    def getValue(self):
        return self._option.s_addr

Notice that the UIntOpt class includes a file coption.pxi; this file contains the implementation for the _setValue, optRef, optLen and getValue methods for simple option data types. In Pyrex, the include directive is a simple textual inclusion of one file into another at the current scoping level, and gives us a way to bring in duplicate code in situations where inheritance and overriding won't work (the reason why inheritance won't work here is a bit too subtle to cover in this post). These standard methods aren't used in the InAddrOpt class as the option being managed is a struct and special code is needed to set and get the value of the option.
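
For readers without Pyrex at hand, the same protocol can be approximated in plain Python with the standard ctypes module. This is an analogy only, not how the extension is built: ctypes objects stand in for the cdef attributes, the InAddr structure is a simplified stand-in for in_addr with an assumed 32-bit s_addr field, and optRef() returns a ctypes.c_void_p where the Pyrex method returns a real void *.

```python
import ctypes


class Option(object):
    # Pure-Python analogue; subclasses create a ctypes object as _option.
    def __init__(self, optName):
        self.optName = optName

    def setValue(self, value):
        self._setValue(value)
        return self

    def optRef(self):
        # Analogue of the void * that gets handed to the C library
        return ctypes.c_void_p(ctypes.addressof(self._option))

    def optLen(self):
        return ctypes.sizeof(self._option)

    def __str__(self):
        return "%s:%d" % (self.optName, self.getValue())


class UIntOpt(Option):
    def __init__(self, optName):
        Option.__init__(self, optName)
        self._option = ctypes.c_uint(0)

    def _setValue(self, value):
        self._option.value = value

    def getValue(self):
        return self._option.value


class InAddr(ctypes.Structure):
    # Simplified stand-in for struct in_addr
    _fields_ = [("s_addr", ctypes.c_uint32)]


class InAddrOpt(Option):
    def __init__(self, optName):
        Option.__init__(self, optName)
        self._option = InAddr()

    # The value lives in a struct field, so set/get need their own code
    def _setValue(self, value):
        self._option.s_addr = value

    def getValue(self):
        return self._option.s_addr
```

Callers only ever see setValue(), getValue() and str(); the storage type and its size stay inside each subclass, which is the point of the abstraction.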

So far, this doesn't look terribly interesting. If this was all there was to it, it really doesn't get any closer to the design goals I stated earlier, namely that implementation details should be hidden unless they're important. If users had to deal with only these classes, they'd still need to know which class to instantiate for each particular option they wanted to use, thus still expressing the implementation details of the option.

Hiding this bit of detail is the business of the next portion of the option handling system in which the wrapper provides assistance to the user in using the proper type of option and knowing which LBM objects that option can be used with. That's the topic of the next post.
