Wednesday, May 20, 2009

Opt-out-- simplifying LBM options handling P1

I want to write about the part of the extension that simplifies option handling, but since it’s a pretty straightforward exercise, I’ll also use this as an opportunity to provide a drive-by tutorial in the use of Pyrex for the development of Python extensions, the tool I’m using in binding Python to the LBM library.

Prior to this project I had only used SWIG for building extensions, but I decided to try Pyrex this time around. I have to say, unless I need to expose a library to multiple languages, I’ll probably continue to use Pyrex going forward. Besides being Python-only, its only big drawback is that it doesn’t do any automatic header file parsing, one of the real time-savers with SWIG. But Pyrex really shines elsewhere: not only does it let you write your extension in a dialect of Python itself (rather than in C), it also has direct support for generating extension types. These are types that are implemented in C just like Python lists and dicts, and therefore have the commensurate speed advantages. Being able to create these easily is a real advantage, and I would highly recommend Pyrex for anyone looking to wrap a C library for use in Python.

There were three areas related to options handling I wanted to address in the extension: first, eliminating the need to provide the address of a variable of the proper type, cast as a void *, when specifying option values; second, helping users deal with the extensive set of options supplied by LBM across the various objects upon which options can be set; and third, reducing the large number of function calls available for setting and getting options across LBM objects.

First off, let's deal with the simpler matter of abstracting option values. In the native LBM C API, getting and setting options can be done either with a char * (where possible) or a void pointer to a variable of the correct type for the particular option. From the Python perspective, the LBM API exposes too many details of the underlying implementation for a lot of these options, especially when dealing with various integer values. Further, the majority of options involve a single value, something handled nicely by a common option abstraction. There are a few options that deal with structures, so these will need to be handled a bit outside of the standard protocol, but that's not the end of the world.
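To make the void-pointer burden concrete, here is a minimal sketch in plain Python using ctypes. It is not LBM code: fake_setopt and the option name "some_option" are hypothetical stand-ins for a C setter that takes a name, a void *, and a length.

```python
import ctypes


def fake_setopt(name, ref, length):
    """Hypothetical stand-in for a C setter taking (name, void *, length)."""
    # Copy `length` raw bytes from the address, as a C library would.
    raw = ctypes.string_at(ref, length)
    return name, raw


# The caller has to pick the right C type, take its address, cast it to
# void *, and pass the size along as well.
value = ctypes.c_uint(4096)
ref = ctypes.cast(ctypes.pointer(value), ctypes.c_void_p)
name, raw = fake_setopt("some_option", ref, ctypes.sizeof(value))

# Decode the copied bytes to confirm the value made the round trip.
decoded = ctypes.c_uint.from_buffer_copy(raw).value
```

Every one of those steps is an implementation detail that an option object can own on the caller's behalf.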

Given the above, I decided to create a base class in Pyrex that established the base option value interface protocol and have the specific option extension types derive from this base. In Pyrex, extension type definitions that are to be shared by different parts of an extension go into a .pxd file, and their implementations go into a .pyx file with the same base file name. Pyrex allows you to define two kinds of methods on extension types: methods which will be visible to Python programs that use the extension type, and methods that are only visible to other Pyrex extensions that are familiar with the actual underlying C structures that the extension deals with. Pyrex-only methods are defined in the .pxd file, while Python-accessible methods are defined in the corresponding .pyx.

This is the Pyrex extension type base class that defines the common protocol for all option value objects, as declared in the .pxd file:

cdef class Option:
    cdef void *option
    cdef readonly optName
    cdef void *optRef(self)
    cdef int optLen(self)
    cdef _setValue(self, value)
    cdef toString(self)

The Pyrex reserved word “cdef” has a few uses in that language. Before “class” it identifies a class that should be turned into an extension type. Before the declaration of a variable it identifies what will become an attribute in instances of the extension type, or a local variable in a method or function. And before a method declaration or implementation it identifies methods that will be accessible only from extensions that import the .pxd file. Notice how you can freely mix Python object references (self, value) with C variable declarations; Pyrex takes care of managing each properly. You can also specify argument and return types of a function or method (Pyrex defaults to a Python object).

The above Option class names two data items in the base class: a void * that will refer to the actual underlying option storage attribute, and a read-only variable that holds the option name. The remaining declarations define extension-only methods on this class for getting a direct void pointer to the data item managed by the object, getting the item's length, setting its value through a low-level mechanism, and rendering the option as a string (for calling by the standard __str__ magic method).

The code that implements the class is in the corresponding .pyx file:

cdef class Option:
    """
    The actual option structure's address must be assigned to "option".
    The derived class should create an attribute named "_option" of the
    correct type and implement __cinit__() to set option to the address
    of _option.
    """
    def __init__(self, optName):
        self.optName = optName

    def name(self):
        return self.optName

    def setValue(self, value):
        self._setValue(value)
        return self

    cdef toString(self):
        return self.optName

    def __str__(self):
        return self.toString()

    #these next 4 methods are common to many subclasses and so
    #are collected into an include file with the .pxi extension
    cdef _setValue(self, value):
        self._option = value

    cdef int optLen(self):
        return sizeof(self._option)

    cdef void *optRef(self):
        return self.option

    def getValue(self):
        return self._option

The code comments mention a “__cinit__” method. This is a special initialization method available to extension types (http://docs.cython.org/docs/special_methods.html) that's invoked before the more familiar __init__ method. At __cinit__ invocation time, you can be sure that all the C variables you declared as your extension type's instance attributes have been allocated and initialized to zero, and you are free to perform any dynamic initialization you require. Each derived class is required to implement this method and assign the address of the _option attribute to the option attribute. A couple of examples will illustrate this.
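For readers who haven't met __cinit__, the closest plain-Python analogy is the ordering of __new__ and __init__. The sketch below is only that, an analogy: it is not the Pyrex code, it uses ctypes to model the zero-initialized C storage, and the storage_type and calls attributes and the option name are hypothetical additions for illustration.

```python
import ctypes


class Option:
    """Pure-Python stand-in for the Option base type (not the Pyrex code)."""
    storage_type = None  # derived classes name the C type they hold

    def __new__(cls, optName):
        # Plays the role of __cinit__: runs first, sets up zeroed storage,
        # and records the address of that storage.
        self = super().__new__(cls)
        self._option = cls.storage_type()  # ctypes zero-initializes this
        self.option = ctypes.addressof(self._option)
        self.calls = ["__new__"]
        return self

    def __init__(self, optName):
        # Runs second, once the storage is already in place.
        self.optName = optName
        self.calls.append("__init__")


class UIntOpt(Option):
    storage_type = ctypes.c_uint


opt = UIntOpt("some_option")
```

By the time __init__ runs, the storage exists and its address has been captured, which is the same guarantee the derived classes rely on when they set option in __cinit__.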

Here are two derived classes that handle specific types of option values. First, the entries in the .pxd file:

cdef class UIntOpt(Option):
    cdef unsigned int _option

cdef class InAddrOpt(Option):
    cdef lbmw.lbmh.in_addr _option

And the corresponding implementations from the .pyx file:

cdef class UIntOpt(Option):
    def __cinit__(self, optName):
        self.option = &self._option

    cdef toString(self):
        ps = "%s:%d" % (self.optName, self._option)
        return ps

    def __str__(self):
        return self.toString()

    include "coption.pxi"

cdef class InAddrOpt(Option):
    def __cinit__(self, optName):
        self.option = &self._option

    cdef toString(self):
        ps = "%s:%d" % (self.optName, self._option.s_addr)
        return ps

    def __str__(self):
        return self.toString()

    cdef _setValue(self, value):
        self._option.s_addr = value

    cdef int optLen(self):
        return sizeof(self._option)

    cdef void *optRef(self):
        return self.option

    def getValue(self):
        return self._option.s_addr

Notice that the UIntOpt class includes a file, coption.pxi; this file contains the implementation of the _setValue, optRef, optLen and getValue methods for simple option data types. In Pyrex, the include directive is a simple textual inclusion of one file into another at the current scoping level, and gives us a way to bring in duplicate code in situations where inheritance and overriding won't work (the reason why inheritance won't work here is a bit too subtle to cover in this post). These standard methods aren't used in the InAddrOpt class, as the option being managed is a struct and special code is needed to set and get the value of the option.
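To show the difference between the scalar and struct cases from the caller's side, here is a rough model of the two classes in plain Python with ctypes. It is a sketch under assumptions, not the extension itself: the in_addr structure is simplified to a single 32-bit field, and the option names are hypothetical.

```python
import ctypes


class in_addr(ctypes.Structure):
    # Simplified model of the C struct: one 32-bit address field.
    _fields_ = [("s_addr", ctypes.c_uint32)]


class UIntOpt:
    def __init__(self, optName):
        self.optName = optName
        self._option = ctypes.c_uint(0)

    def setValue(self, value):
        self._option.value = value  # scalar: assign the value directly
        return self                 # returning self allows chaining

    def getValue(self):
        return self._option.value

    def optLen(self):
        return ctypes.sizeof(self._option)

    def __str__(self):
        return "%s:%d" % (self.optName, self.getValue())


class InAddrOpt:
    def __init__(self, optName):
        self.optName = optName
        self._option = in_addr()

    def setValue(self, value):
        self._option.s_addr = value  # struct: go through the member
        return self

    def getValue(self):
        return self._option.s_addr

    def optLen(self):
        return ctypes.sizeof(self._option)

    def __str__(self):
        return "%s:%d" % (self.optName, self.getValue())


uint_text = str(UIntOpt("some_uint_option").setValue(7))
addr_text = str(InAddrOpt("some_addr_option").setValue(0x0100007F))
```

Both classes present the same setValue/getValue surface, so the caller never sees whether the storage behind an option is a bare integer or a struct member.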

So far, this doesn't look terribly interesting. If this was all there was to it, it really doesn't get any closer to the design goals I stated earlier, namely that implementation details should be hidden unless they're important. If users had to deal with only these classes, they'd still need to know which class to instantiate for each particular option they wanted to use, thus still expressing the implementation details of the option.

Hiding this bit of detail is the business of the next portion of the option handling system, in which the wrapper helps the user pick the proper type of option and know which LBM objects that option can be used with. That's the topic of the next post.
