Cl-Protobufs Enumerations

In the last few posts we discussed family life, and before that we created a toy application using cl-protobufs and the ACE lisp libraries. Today we will dive deeper into the cl-protobufs library by looking at Enumerations. We will first discuss enumerations in Protocol Buffers, then we will discuss Lisp Protocol Buffer enums.

Enums:

Most modern languages have a concept of enums. In C++ enumerations are compiled down to integers and you are free to use integer equality. For example

enum Fish {
 salmon,
 trout,
}

void main {
  std::cout << salmon == 0 << std::endl;
}

Will print true. This is in many ways wonderful: enums compile down to integers and there’s no cost to using them. It is baked into the language! 

Protocol Buffers are available for many languages, not just C++. You can find the documentation for Protocol Buffer enums here: 

https://developers.google.com/protocol-buffers/docs/proto#enum

Each language has its own way to support enumeration types. Languages like C++ and Java, which have built-in support for enumeration types, can treat protobuf enums like any other enum. The above enum could be written (with some caveats) in Protocol Buffer as:

enum Fish {
  salmon = 0;
  trout = 1;
}

You should be careful though, Protoc will give a compile warning that enum 0 should be a default value, so 

enum Fish {
  default = 0;
  salmon = 1;
  trout = 2;
}

Is preferred.

Let’s get into some detail for the two variants of Protocol Buffers in use.

// Example message to use below.
enum Fish {
  default = 0;
  salmon = 1;
  trout = 2;
}

message Meal {
  {optional} Fish fish;
}

The `optional` label will only be written for proto 2.

Proto 2:

In proto 2 we can always tell whether `Meal.fish` was set. If the field has the `required` label then it must be set, by definition. (But the `required` label is considered harmful; don’t use it.) If the field has an `optional` label then we can check if it has been set or not, so again a default value isn’t necessary.

If the enum is updated to:

// Example message to use below.
enum Fish {
  default = 0;
  salmon = 1;
  trout = 2;
  tilapia = 3;
}

and someone sends fish = tilapia to a system where tilapia isn’t a valid entry, the library is allowed to do whatever it wants! In Java it sets it to the first entry, so Meal.fish would be default! 

Proto 3

In proto3 if the value of Meal.fish is not set, calling its accessor will return the default value which is always the zero value. There is no way to check whether the field was explicitly set. A default value (i.e., a name that maps to the value zero) must always be given, else the user will get a compile error.

If the Fish enum was updated to contain tilapia as above, and someone sent a proto message containing tilapia to a system with an older program that had the message not containing tilapia, the deserializer should save the enum value. That is, the underlying data structure should know it received a “3” for the fish field in Meal. How the accessors return this value is language dependent. Re-serializing the message should preserve this “unrecognized” value.

A common example is: A gateway system wants to do something with the message and then forward it to another system. Even though the middle system has an older schema for the Fish message it needs to forward all the data to the downstream system.

Cl-protobufs:

Now that we understand the basics of enumerations, it is important to understand how cl-protobufs records enumeration values

Lisp as a language does not have a concept of enumerations; what it does understand is keywords. Taking fish as above and running protoc we will get (see readme https://github.com/qitab/cl-protobufs/#enums):

(deftype fish ‘(:default :salmon :trout))

(defun fish-to-int (keyword) 
  (ecase keyword
    (:default 0)
    (:salmon 1)
    (:trout 2)))

(defun int-to-fish (int)
  (ecase int
    (0 :default)
    (1 :salmon)
    (2 :trout)))

Looking at the tilapia example, the enum deserializer preserves the unknown field in both proto2 and proto3. Calling an accessor on a field containing an unknown value will return :%undefined-n. So for tilapia we will see :%undefined-3.

Warning: To get this to work properly we have to remove type checks from protocol buffer enumerations. You can set the field value in a lisp protocol buffer message to any keyword you want, but you will get a serialization error when you try to serialize. This was a long discussion internally, but that design discussion could turn into a blog post of its own.

Conclusion:

The enumeration fields in cl-protobufs are fully proto2 and proto3 compliant. To do this we had to remove type checking. As a consumer, it is suggested that you always type check and handle undefined enumeration values in your usage of protocol buffer enums. We give you a deftype to easily check.

I hope you have enjoyed this deep dive into cl-protobuf enums. We strive to remove as many gotchas as possible.


Thanks to Ron and Carl for the continual copy edits and improvements!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s