Antoine Kalmbach's blog

Recutils, GOOPS and virtual slots

For the past month or so I’ve been contributing to GNU Recutils, a set of tools for editing human-readable plain text databases. It’s a cool project in its own right; I’ve been using recutils myself for tracking workouts and storing cooking recipes. The cool part is that it aims to be both human-readable and machine-readable, which makes it easy to work with programmatically as well as with a simple text editor.

The powerful querying facilities of recutils are what turn it into a thing of beauty. In particular, selection expressions are a small language for querying recfiles. For instance, here’s how I would query exercises in my workout log for squats:

recsel -t Exercise -e "Name ~ 'Squat'" workouts.rec

This matches records of type Exercise whose Name field matches the regular expression Squat, so it finds every exercise variety with the word Squat in its name.
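To make the matching rule concrete, here is a minimal sketch in plain C using POSIX regular expressions. It is not recutils code: matches_anywhere is a made-up helper, and it assumes the ~ operator behaves like an unanchored regex search, which is how the example above reads.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true when PATTERN matches anywhere inside TEXT.
   Illustration only: plain POSIX regex, not the recutils implementation.  */
bool
matches_anywhere (const char *pattern, const char *text)
{
  regex_t re;

  /* REG_NOSUB: we only care whether there is a match, not where.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;               /* Treat an invalid pattern as "no match".  */

  bool found = (regexec (&re, text, 0, NULL, 0) == 0);
  regfree (&re);
  return found;
}
```

With this helper, the pattern Squat would match both "Front Squat" and "Squat (high bar)" but not "Deadlift".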

The machine readability makes it easy to write programs or tools that interact with recfiles. I’ve also become the maintainer of rec-mode, the Emacs major mode for recfiles. The major mode makes heavy use of the command line tools of the recutils suite to provide automatic fixing and parsing of recfiles.

if it’s possible to put Lisp in it, someone will

For fun and profit, I’ve also been writing GNU Guile bindings for librec, the library powering recutils itself. The bindings interface with the C library directly using Guile’s amazing C extensions. I was interested in using recfiles in a Guile program, and while it would not have been too difficult to write a parser myself, I thought it was more important to not write one myself. What is more, Guile makes it almost too easy to wrap libraries: I had a functioning Scheme interface for parsing records in less than an hour.

Let’s explore what that interface looks like. We start with the simplest data type in librec, fields.

A recutils record is defined as an ordered collection of fields. Below is a record of three fields:

Book: Structure and Interpretation of Computer Programs
Author: Harold Abelson
Author: Gerald Sussman

The inner field type of librec is defined as rec_field_t, which is an opaque data type wrapping rec_field_s:

typedef struct rec_field_s *rec_field_t;

The underlying rec_field_s structure is a bit more complicated since it includes location data for the field, but for our example imagine it contains just the fields name and value, which are null-terminated strings. You don’t need to know anything about that, since librec offers an extensive API for working with the opaque types.
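For readers who haven’t met the opaque-handle pattern before, here is a minimal, self-contained sketch of how such a type can be built. Everything with a toy_ prefix is hypothetical and exists only for illustration; the real rec_field_s holds more than a name and a value, and librec’s own functions are the ones shown in the rest of this post.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* "Header" part: callers only ever see the handle type.  */
typedef struct toy_field_s *toy_field_t;

/* "Implementation" part: the struct layout stays private to the library.
   Hypothetical stand-in; the real rec_field_s holds more than this.  */
struct toy_field_s
{
  char *name;
  char *value;
};

/* Duplicate a string on the heap, returning NULL when out of memory.  */
static char *
toy_copy (const char *s)
{
  size_t size = strlen (s) + 1;
  char *copy = malloc (size);
  if (copy != NULL)
    memcpy (copy, s, size);
  return copy;
}

void
toy_field_destroy (toy_field_t field)
{
  if (field == NULL)
    return;
  free (field->name);
  free (field->value);
  free (field);
}

toy_field_t
toy_field_new (const char *name, const char *value)
{
  toy_field_t field = calloc (1, sizeof *field);
  if (field == NULL)
    return NULL;

  field->name = toy_copy (name);
  field->value = toy_copy (value);
  if (field->name == NULL || field->value == NULL)
    {
      toy_field_destroy (field);
      return NULL;
    }
  return field;
}

const char *toy_field_name (toy_field_t field) { return field->name; }
const char *toy_field_value (toy_field_t field) { return field->value; }

/* Replace the name with a private copy; false means out of memory.  */
bool
toy_field_set_name (toy_field_t field, const char *name)
{
  char *copy = toy_copy (name);
  if (copy == NULL)
    return false;
  free (field->name);
  field->name = copy;
  return true;
}
```

The point of the pattern is that callers never touch the struct members directly, so the library is free to change the layout later without breaking anyone.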

To make a new field, you would write:

rec_field_t field = rec_field_new("Author", "Harold Abelson");

To get the value and name, you use rec_field_value and rec_field_name:

const char *name = rec_field_name(field); /* "Author" */
const char *value = rec_field_value(field); /* "Harold Abelson" */

To modify its name or value, you can use:

rec_field_set_name(field, "Book");
rec_field_set_value(field, "Structure and Interpretation of Computer Programs");

How do we wrap these into Guile, using C extensions? To start with, we can simply make some Scheme methods that work with plain pointers and pass that pointer value around.

SCM_DEFINE (scm_field_new, "new-field", 2, 0, 0, (SCM scm_name, SCM scm_value),
            "Make a new field from a name and a value.")
{
  SCM_ASSERT_TYPE(scm_is_string(scm_name), scm_name, 1, "new-field", "string");
  SCM_ASSERT_TYPE(scm_is_string(scm_value), scm_value, 2, "new-field", "string");

  /* scm_to_utf8_string returns malloc'd copies that we must free. */
  char *name = scm_to_utf8_string(scm_name);
  char *value = scm_to_utf8_string(scm_value);

  rec_field_t field = rec_field_new (name, value);

  /* Safe as long as librec keeps its own copies of both strings. */
  free(name);
  free(value);

  if (!field)
    return SCM_BOOL_F;

  /* destroy_field runs as the finalizer when the pointer object is collected. */
  return scm_from_pointer(field, destroy_field);
}

This relies on two functions: destroy_field (not shown here), which lets the garbage collector get rid of unused fields, and scm_field_new, defined using the SCM_DEFINE macro. The procedure is straightforward: assert that both parameters are strings, convert them to C strings, create the field and return it if that succeeded, otherwise return Scheme false, #f. The last line creates a pointer object to hold the address and passes destroy_field as the finalizer for the garbage collector.
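The finalizer idea can be sketched without Guile at all. The toy below pairs an address with a cleanup callback, which is roughly the contract that scm_from_pointer offers; all the toy_ names are hypothetical, and Guile’s real collector decides on its own when (and on which thread) a finalizer runs. In the actual bindings, destroy_field presumably hands the pointer to librec’s rec_field_destroy, but that is an assumption since its body isn’t shown here.

```c
#include <stdlib.h>

/* Signature of a cleanup callback, like the finalizer argument above.  */
typedef void (*toy_finalizer_t) (void *);

/* Hypothetical stand-in for a Guile pointer object.  */
struct toy_pointer
{
  void *address;
  toy_finalizer_t finalizer;
};

/* Counts finalizer runs so the effect is observable in this demo.  */
int toy_finalized_count = 0;

/* Plays the role of destroy_field: release the C-side resource.  */
void
toy_destroy (void *ptr)
{
  free (ptr);
  toy_finalized_count++;
}

/* Pair an address with its finalizer; NULL means out of memory.  */
struct toy_pointer *
toy_from_pointer (void *address, toy_finalizer_t finalizer)
{
  struct toy_pointer *p = malloc (sizeof *p);
  if (p != NULL)
    {
      p->address = address;
      p->finalizer = finalizer;
    }
  return p;
}

/* What a collector would do once the pointer object is unreachable.  */
void
toy_collect (struct toy_pointer *p)
{
  if (p == NULL)
    return;
  if (p->finalizer != NULL)
    p->finalizer (p->address);
  free (p);
}
```

The Scheme side never calls the cleanup function itself; it just drops the last reference, and the resource is released whenever the collector gets around to it.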

In the Guile REPL, it looks like this:

scheme@(recutils)> (new-field "foo" "bar")
$2 = #<pointer 0x7fc0654040f0>

OK, it seems to be a pointer all right. Let’s define some helper methods to work with that:

SCM_DEFINE(scm_field_get_name, "field-name", 1, 0, 0, (SCM ptr),
           "Get the name of a field")
{
  rec_field_t field = (rec_field_t)scm_to_pointer(ptr);

  const char *name = rec_field_name(field);
  
  return scm_from_utf8_string(name);
}

SCM_DEFINE(scm_field_get_value, "field-value", 1, 0, 0, (SCM ptr),
           "Get the value of a field")
{
  rec_field_t field = (rec_field_t)scm_to_pointer(ptr);

  const char *value = rec_field_value(field);
  
  return scm_from_utf8_string(value);
}

Loading this extension into the REPL, we get

scheme@(recutils)> (new-field "foo" "bar")
$1 = #<pointer 0x7fa123d0b980>
scheme@(recutils)> (field-name $1)
$2 = "foo"
scheme@(recutils)> (field-value $1)
$3 = "bar"

What about modifying the field? Well, that’s easy:

SCM_DEFINE(scm_field_set_name, "set-field-name!", 2, 0, 0, (SCM ptr, SCM scm_name),
           "Set the name of a field")
{
  /* The string is the second argument, so report position 2 on a type error. */
  SCM_ASSERT_TYPE(scm_is_string(scm_name), scm_name, 2, "set-field-name!", "string");
  rec_field_t field = (rec_field_t)scm_to_pointer(ptr);

  char *name = scm_to_utf8_string(scm_name);

  bool result = rec_field_set_name(field, name);

  /* Release the temporary C string; assumes librec stored its own copy. */
  free(name);

  return scm_from_bool(result);
}


SCM_DEFINE(scm_field_set_value, "set-field-value!", 2, 0, 0, (SCM ptr, SCM scm_value),
           "Set the value of a field")
{
  /* The string is the second argument, so report position 2 on a type error. */
  SCM_ASSERT_TYPE(scm_is_string(scm_value), scm_value, 2, "set-field-value!", "string");
  rec_field_t field = (rec_field_t)scm_to_pointer(ptr);

  char *value = scm_to_utf8_string(scm_value);

  bool result = rec_field_set_value(field, value);

  /* Release the temporary C string; assumes librec stored its own copy. */
  free(value);

  return scm_from_bool(result);
}

Using all this in the REPL yields:

scheme@(recutils)> (new-field "foo" "bar")
$1 = #<pointer 0x7ffcac406530>
scheme@(recutils)> (set-field-name! $1 "Blah")
$2 = #t
scheme@(recutils)> (set-field-value! $1 "Test")
$3 = #t
scheme@(recutils)> (field-name $1)
$4 = "Blah"
scheme@(recutils)> (field-value $1)
$5 = "Test"

There we go!

the smell of raw pointers

OK, this looks great. But somehow it feels funny to pass a raw pointer object around as a parameter. Ideally, I’d like to define some sort of structure that wraps the raw pointer into something less raw. Well, turns out Guile has exactly that in the define-wrapped-pointer-type macro! With the above constructor and procedures, we can go further:

(define-wrapped-pointer-type
  field-ptr field-ptr? wrap-field-ptr unwrap-field-ptr
  (lambda (ptr port)
    (format port "#<field-ptr name=~s value=~s 0x~x>"
            (field-name (unwrap-field-ptr ptr))
            (field-value (unwrap-field-ptr ptr))
            (pointer-address (unwrap-field-ptr ptr)))))

The macro defines a type name (field-ptr), a predicate (field-ptr?), procedures for wrapping and unwrapping, and lastly a printer for pretty printing our pointer. The printer outputs a human-readable representation of the pointer, in which we leverage the procedures defined above, field-name and field-value.

scheme@(recutils)> (wrap-field-ptr (new-field "Author" "Harold Abelson"))
$2 = #<field-ptr name="Author" value="Harold Abelson" 0x7f8a6950b2d0>
scheme@(recutils)> (field-ptr? $2)
$3 = #t
scheme@(recutils)> (unwrap-field-ptr $2)
$4 = #<pointer 0x7f8a6950b2d0>

This makes it a bit easier to pass around field values so that we can treat them like structures, or records in Scheme parlance. That said, constructing the values is still a bit tedious, especially now that our Scheme user would have to constantly wrap and unwrap values if they are to work with a field.

What if we could work with fields as if they were pure Scheme objects and the underlying machinery – pointers and so forth – would be hidden from us? Well, we can use GOOPS, but first let’s digress into the exciting world of FFI.

why not dynamic FFI?

These days the Guile manual recommends using Dynamic FFI when working with a foreign function interface. That is, the above examples are C code, but we could have done the same in regular Scheme using the (system foreign) module. This is what I would do in many other languages (Common Lisp, Python, and so on…). In such a case, I could make my Scheme module completely separate from recutils and librec, since I would just need the dynamic library libguile-recutils.so for it to function. But there are subtle reasons why writing these extensions in C is a good idea.

As I went ahead and wrote the bindings, I had a curious thought: I’m writing functionality for working with recfiles from Guile. But what about adding Guile facilities to recutils? What about letting recutils users extend the programs using Scheme? Wouldn’t it be cool if instead of recutils selection expressions I could pass Scheme programs as the query language? Indeed, this was a topic worth exploring!

The consequence of this was that I was now adding code to recutils itself to link against Guile, which means I would already have a dependency on the Guile C library, libguile. So, since I was already working with the C API of Guile, limiting myself to the strange world of dynamic FFI was starting to feel rather tedious.

From the start I wanted to work with the real deal: the wrapper types of the Guile extensions would be real wrappers. Each field in Scheme would be represented by a librec C struct underneath. This is so that I can leverage the bidirectional design above, and there is no need to parse or convert values twice when crossing language barriers. So, how do we make a Scheme API that is both nice to use and still C structs underneath? Well, the answer is GOOPS and object-oriented programming!

GOOPS, virtual slots, and you

Working with raw pointers and even pointer records can be painful. It would be much better if we could make fields like this:

(make <field> #:name "Author" #:value "Gerald Sussman")

This creates an instance of a GOOPS class named <field>. The constructor takes two keyword arguments, #:name and #:value, for the field’s name and value.

How can we get a class whose getters and setters (in terms of slot-ref and slot-set!) work on the underlying pointer? Easy enough, the answer is virtual slots! If we were to define an ordinary class with slots name and value, Guile would allocate memory for those, and since we also have to juggle the pointer, both the name and the value would live in two places: once behind the pointer (in the C world) and once in Scheme, as a slot in the class.

But first, how do we create a class <field> that wraps a pointer? Easy enough, we can use #:init-form as the slot option:

(define-class <field> ()
  ;; Internal pointer type.
  (ptr #:init-form (wrap-field-ptr (new-field "" ""))
       #:init-keyword #:ptr))

The use of #:init-form causes its expression to be evaluated every time the class is instantiated, creating a field with an empty name and value. To get the constructor signature we want, we need virtual slots. These let us supply our own getter and setter through the #:slot-ref and #:slot-set! options, which operate on the raw pointer instead of occupying memory like a normal slot would. This is achieved using #:allocation #:virtual:

(define-class <field> ()
  (ptr #:init-form (wrap-field-ptr (new-field "" ""))
       #:init-keyword #:ptr)

  (name #:init-keyword #:name #:allocation #:virtual
        #:accessor field-name
        #:slot-ref (lambda (field)
                     (%field-name (unwrap-field-ptr (slot-ref field 'ptr))))
        #:slot-set! (lambda (field name)
                      (set-field-name!
                       (unwrap-field-ptr (slot-ref field 'ptr)) name)))
  
  (value #:init-keyword #:value #:allocation #:virtual
         #:accessor field-value
         #:slot-ref (lambda (field)
                      (%field-value (unwrap-field-ptr (slot-ref field 'ptr))))
         #:slot-set! (lambda (field value)
                       (set-field-value! (unwrap-field-ptr (slot-ref field 'ptr)) value))))

Note that the procedures we defined previously in C were renamed to %field-name and %field-value, since the original names would otherwise conflict with the #:accessor slot option.
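If the idea of a slot without storage feels abstract, the same trick can be sketched in C. This is only an analogy, not how GOOPS is implemented: the toy_ types and functions are hypothetical, and the "slots" here are simply a table of getter and setter callbacks that forward to the struct behind the pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C-side field, standing in for librec's opaque struct.  */
struct toy_field { char name[64]; char value[64]; };

/* The object stores only the pointer, like the ptr slot of <field>.  */
struct toy_object { struct toy_field *ptr; };

/* A virtual slot is a name plus a getter and a setter; it owns no storage.  */
struct toy_virtual_slot
{
  const char *slot_name;
  const char *(*ref) (const struct toy_object *);
  void (*set) (struct toy_object *, const char *);
};

static const char *name_ref (const struct toy_object *o) { return o->ptr->name; }
static const char *value_ref (const struct toy_object *o) { return o->ptr->value; }

/* snprintf truncates safely if the new string is too long.  */
static void
name_set (struct toy_object *o, const char *v)
{
  snprintf (o->ptr->name, sizeof o->ptr->name, "%s", v);
}

static void
value_set (struct toy_object *o, const char *v)
{
  snprintf (o->ptr->value, sizeof o->ptr->value, "%s", v);
}

static const struct toy_virtual_slot toy_slots[] = {
  { "name", name_ref, name_set },
  { "value", value_ref, value_set },
};

static const struct toy_virtual_slot *
find_slot (const char *slot_name)
{
  for (size_t i = 0; i < sizeof toy_slots / sizeof toy_slots[0]; i++)
    if (strcmp (toy_slots[i].slot_name, slot_name) == 0)
      return &toy_slots[i];
  return NULL;
}

/* Counterpart of slot-ref: NULL when the slot does not exist.  */
const char *
toy_slot_ref (const struct toy_object *o, const char *slot_name)
{
  const struct toy_virtual_slot *slot = find_slot (slot_name);
  return slot != NULL ? slot->ref (o) : NULL;
}

/* Counterpart of slot-set!: false when the slot does not exist.  */
bool
toy_slot_set (struct toy_object *o, const char *slot_name, const char *v)
{
  const struct toy_virtual_slot *slot = find_slot (slot_name);
  if (slot == NULL)
    return false;
  slot->set (o, v);
  return true;
}
```

Setting the "name" slot through toy_slot_set writes straight into the struct behind ptr, so there is exactly one copy of the data, which is the property the virtual slots above give us on the Scheme side.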

So using #:virtual lets us write GOOPS classes and not worry about double allocation. It looks like a regular GOOPS class but actually it is modifying a pointer to a C struct underneath using a C API. Moreover, the biggest benefit of this is the ability to pass values in the constructor. If we didn’t have #:virtual, we’d have to write separate accessor methods like this:

(define-method (f-name (field <field>))
  (%field-name (unwrap-field-ptr (slot-ref field 'ptr))))

(define-method ((setter f-name) (field <field>) name)
  (set-field-name! (unwrap-field-ptr (slot-ref field 'ptr)) name))

But the problem with this and any other approach is that you’d still have memory allocated for the slots. All <field>s will have unnecessary name and value slots allocated. I think the only way to get this behaviour if #:virtual were not available would be to create a custom method for initialize. I think the same applies in other CLOS-like systems (and CLOS itself), but I’m not sure.

context is everything, friends

I don’t think many Guile users will find Scheme bindings for recutils that useful in themselves, as a library. Guix uses recfiles in its search output, but its record generation is hand-written, and its usage is not deep enough to warrant using the library.

But I think a case can be made for recutils itself: if recutils were to develop extensibility via Guile, the extension mechanism could load the recutils Scheme module as its base runtime. I discussed the idea on IRC with Jose Marchesi, the Recutils author and maintainer, and he thought it was a good idea as long as there’s someone to maintain it.

Maybe this will fly, I don’t know. I don’t see any big technical barriers to making it work, even if it amounts to just adding Scheme bindings without extending recutils itself. That said, every now and then I’m running into the limitations of selection expressions, so being able to use Scheme as a measure of last resort would be interesting, if nothing else.

As of early December 2020 I have bindings for parsing and creating records and fields, so expect an early release of the Scheme bindings to appear within the next few months.

Have I mentioned I also plan to make Common Lisp bindings? Well, now I have, but that’s another story!
