Monday, May 18, 2009

Bipolar-- the highs and lows of wrapping

One of the benefits of an exploratory project of this nature is that, because you don’t have a fixed design path to follow, you have ample opportunity to refactor when you have one of those “Ooo! That would be an interesting approach!” moments.

But in my mind, development needs to be informed by an overall view, a philosophy that drives the shape of what you’re creating. This helps encourage consistency and symmetry, qualities that a good API must have.

I needed such a set of design signposts for the extension module I was creating. Of particular concern was establishing the “level” of abstraction that the extension was to express. As befits a product whose primary goal is high performance and low latency, the C API for 29West is fairly low level. There are lots of fine-grained distinctions regarding integer sizes in function call arguments (signed/unsigned, short/int/long), typedefs of functions taking pointers to typedefs containing function pointers which take other typedefs as arguments… that sort of thing.

While understandable in a C API, a lot of this isn’t the kind of thing one expects or wants to encounter in a Python binding. An implicit goal of most Python modules is to provide a reasonably high-level abstraction over the semantics they’re implementing, do lots of useful things for you, and give you hooks to override default behavior.

But too high an abstraction can make the power of the underlying library unavailable or, worse, morph the semantics of the API into something alien to users of the C API. I think there’s a strong argument that extension writers shouldn’t be looking for an opportunity to overhaul an API (unless that API is particularly horrid), but should largely just impedance-match the existing API to the target language. We want the concepts in the extension to be familiar enough that a 29West C programmer can look at it and understand what’s going on, while a Python programmer can pick it up and take full advantage of everything Python offers.

Given this, I developed a loose set of design goals for the extension:
  • Shrink the interface in size. As I mentioned before, 29West’s C API is big, a consequence of their approach to creating an object-like C interface. This approach requires a uniquely named C function for each operation on every object, a sort of pre-mangled C++ method name. Where possible, I wanted to collapse multiple functions into a single polymorphic method call whose implementation would take care of the required differentiation. This way the interface becomes a good deal smaller and easier to keep in your head.

  • Hide implementation details unless they are important. This is an important tenet of Python itself. There’s no reason that users should have to worry about the number of bytes needed to represent various data items; only the allowable range of values matters. This, coupled with the mandate from the point above, will allow the Python API to provide a polymorphic interface that does the right thing in various circumstances, and only require the user to concern themselves with data types when necessary.

  • Minimize the number of new objects introduced. Each new object represents a new concept that needs to be fit into the overall semantic framework that the 29West API already represents. Additions to this set of concepts should be done with care.

  • Automate repetitive tasks. Kind of another standard Python design goal, but well worth making explicit.

  • Replace C idioms with Python idioms. An obvious goal, but in a lot of ways this becomes the most subtle mandate to fulfill.

  • Strive for speed. While I recognize that Python is unlikely to match C’s speed, that’s no reason for complacency. 29West is predicated on high performance, and so this extension should make every effort to run as quickly as possible.
With these design goals in mind, the following initial implementation goals fell out:
  • Provide an objective interface. This one is obvious, and easy to achieve. The 29West API is well structured to facilitate this, with each function in the C API looking a lot like a method of a class. Taking advantage of this helps with partitioning the available functions into smaller, more manageable sets.

  • Reduce the number of methods using keyword arguments. The 29West C API contains similarly named but operationally different versions of a number of functions, I suspect because C provides no mechanism for overloading functions. There’s no need to expose all of these flavors in Python; I’ll simply enable their use by allowing the user to supply optional keyword arguments.

  • Hide specific data types in the options setting interfaces. From a Python user’s perspective, I don’t want to have to worry about which option calls use shorts, ints, longs, or signed/unsigned integers. If I want to set a port, well, I’m a network programmer and I understand the range of valid port values, so let me just supply a number. I want the user to be able to worry less about data types when setting option values, but still have the right thing happen. Having said that, it’s probably worthwhile having some code that guards against out-of-range values.

  • Collapse all the option handling calls into just a few methods. Most “objects” in the C API have various options that can be set on them. These options can be set either directly on the object or on an optional associated attribute object that is used when creating the target object. Again, most likely due to the restrictions imposed by C, this has resulted in a thicket of get value/set value functions that manipulate the options on these objects. I wanted to see these all collapse down into a handful of method calls.

  • Client data pointers go away. These are needed in C but are really unnecessary in objective languages—the callback itself can encompass the data that would otherwise be referenced by a void *. This makes all method and callback signatures a little simpler. Bound methods and closures are two handy Python mechanisms that would support this mandate.

  • Few limitations on the kinds of objects that can be used for callbacks. Lots of kinds of things in Python are “callable”; besides functions, there are static/class methods, bound methods, callable instance objects, even classes themselves. I didn’t want to limit the user to just one of these mechanisms for implementing callbacks.

  • Use exceptions rather than return codes. Return codes are a standard C idiom, but Python’s support of exceptions provides a mechanism for more streamlined code. Exceptions let us write a block of code in which we don’t have to check every operation for success, making the code more succinct and easier to read, while not losing any information when something does go wrong. The downside to exceptions (in general) is that they can rob a program of performance.

  • Avoid the standard wrapping approach. Frequently, creating a Python extension module is handled in a two-tier approach: the underlying C API is directly mapped through the Python C API into Python space, and then a pure-Python module provides the objective face for the extension. This would work, but would probably run much more slowly than I’d like. In the case of the extensions for 29West, it would be much better if the end-user Python classes were implemented as extension types directly in C; these would run much faster.

  • Simplify object lifecycle management. LBM objects have dependency relationships between them at runtime, and because of this they have to be cleaned up according to a specific algorithm. It seems that it would be helpful to the user if the extension could provide some assistance in cleanup operations, making it simpler to dispose of unneeded objects and their dependents.
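Several of these implementation goals (keyword arguments instead of function variants, no client data pointers, any callable as a callback, exceptions instead of return codes, and parents that clean up their dependents) can be illustrated together in a short pure-Python sketch. It is only an illustration under stated assumptions, not the extension itself: `Context`, `Receiver`, `LBMError`, `check`, and the `queue` keyword are hypothetical names, and the real classes would be implemented in C as extension types rather than in Python.

```python
class LBMError(Exception):
    """Hypothetical exception type standing in for a nonzero C return code."""


def check(rc):
    """Raise LBMError for a nonzero C-style return code; 0 means success."""
    if rc != 0:
        raise LBMError(f"call failed with return code {rc}")


class Context:
    """Hypothetical parent object that tracks the objects created from it."""

    def __init__(self):
        self._children = []
        self.closed = False

    def receiver(self, topic, callback, *, queue=None):
        # One factory method with an optional keyword argument stands in for
        # several separately named C creation functions.
        if not callable(callback):
            raise TypeError("callback must be callable")
        rcv = Receiver(self, topic, callback, queue)
        self._children.append(rcv)
        return rcv

    def close(self):
        # Dispose of dependents first (newest first), then the parent itself.
        for child in reversed(self._children):
            child.close()
        self._children.clear()
        self.closed = True


class Receiver:
    def __init__(self, ctx, topic, callback, queue):
        self.ctx = ctx
        self.topic = topic
        self.queue = queue
        self._callback = callback
        self.closed = False

    def deliver(self, message):
        # In a real extension the C-level callback would end up here. No
        # void * client data is passed; any state travels with the callable.
        if self.closed:
            raise LBMError("receiver is closed")
        self._callback(self, message)

    def close(self):
        self.closed = True


class Counter:
    """A callable instance used as a callback; its state replaces client data."""

    def __init__(self):
        self.count = 0

    def __call__(self, rcv, msg):
        self.count += 1


seen = []
ctx = Context()
r1 = ctx.receiver("prices", lambda rcv, msg: seen.append(msg))  # a closure
counter = Counter()
r2 = ctx.receiver("trades", counter, queue="q1")                # callable object
r1.deliver(b"hello")
r2.deliver(b"tick")
ctx.close()  # closes r2, then r1, then marks the context closed
```

After this runs, `seen` is `[b"hello"]` and `counter.count` is 1. Both the closure and the `Counter` instance carry their own state, so nothing resembling a `void *` appears in any signature, and a single `close()` on the context takes care of its dependents.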
The reality here is that some of these implementation goals are the result of those “Ooo!” moments coming along during implementation. The important point is that they all can be traced back to the design goals for the system. This isn't to say that every implementation detail must be traced back to a design goal; good ideas can be left behind if the goals are treated as a straitjacket. But when considering next steps, the design goals provide a way to frame your thinking in order to progress in a way that keeps the extension consistent.
