Saturday, March 26, 2011

Handling protobuf nulls in Scala

Protobuf was invented by Google as a fast binary format for data transfer and storage. It has advantages over XML and JSON; look here and here.

There is one 'feature' of protobuf that feels neither comfortable nor logical: there are no nulls. There are optional fields, but their semantics don't feel natural in Java. Optional in protobuf means "doesn't have to be set" rather than the conventional "doesn't have to have a value". The former is expressed by not calling the setter; the latter is expressed by a special value with "no value" semantics (null in Java). When a field's setter is not called, the getter returns the default value (no nulls there either) and the has method (hasXxx() in the generated Java code) returns false. This is how protobuf is designed and there is no way to change it. The battle for nulls in protobuf for Java seems to be lost.
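
To make the "not set" versus "no value" distinction concrete, here is a minimal sketch. The class ExampleStandIn is hand-written for illustration and is not real protobuf-generated code; the accessor names (setProperty1, getProperty1, hasProperty1) are assumptions that only mimic the behaviour described above.

```scala
// Hand-written stand-in that mimics the accessor behaviour described above.
// It is NOT generated by protoc; all names here are illustrative assumptions.
final class ExampleStandIn {
  private var value: String = ""     // type default for an optional string
  private var isSet: Boolean = false // remembers whether the setter was called

  def setProperty1(v: String): Unit = { value = v; isSet = true }
  def getProperty1: String = value   // never null: falls back to the default
  def hasProperty1: Boolean = isSet
}

object HasGetDemo {
  def main(args: Array[String]): Unit = {
    val untouched = new ExampleStandIn
    val explicit = new ExampleStandIn
    explicit.setProperty1("")

    // Both getters return the same value, so the getter alone cannot tell them apart.
    println(untouched.getProperty1 == explicit.getProperty1) // true
    println(untouched.hasProperty1)                          // false
    println(explicit.hasProperty1)                           // true
  }
}
```

Only the has-style check separates "never set" from "set to the default", which is exactly the information an Option would carry in Scala.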

What about Scala? How can 'optional' in protobuf be matched to Scala's Option? We can live perfectly well without nulls in Scala; it is even encouraged, since there is the Option class. I think we have to distinguish protobuf optional fields with an explicitly provided default value from those without one. Let's look at the following example:


message Example {
  optional string property1 = 1;
  optional string property2 = 2 [default = "whatever"];
}


I would map it to Scala in this way:


class Example {
  var property1: Option[String] = None
  var property2: String = "whatever"
}


Obviously, neither property can hold null. How about mapping in the opposite direction, from a Scala class to a protobuf schema? This doesn't seem to be so obvious. If you look at the Scala code, the first thought is that only property1 is optional. Things get even worse if we decide to implement it as follows:


var property1: Option[String] = Some("value")


What is the default value for property1 then? And how is it different from None? Life is so much easier with null values!

I am thinking about the following solution:
1. Option properties are always translated to optional protobuf fields without explicit default values. The deserialization implementation has to set the None value explicitly if the field is omitted in the message.
2. All other properties are translated to required protobuf fields unless a non-null default value is provided somehow. The default value must be set explicitly on the object property if the field is omitted in the message.
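
The two rules above could look roughly like this on the reading side. This is only a sketch under stated assumptions: the decoded message is represented as a plain Map from field name to value (a stand-in for real protobuf parsing), and ExampleRecord, ExampleReader and property3 are hypothetical names added for illustration.

```scala
// Hypothetical class mirroring the Example message, plus one required property.
final case class ExampleRecord(
  property1: Option[String], // rule 1: optional field, no explicit default
  property2: String,         // rule 2: non-null default "whatever" provided
  property3: String          // rule 2: no default, so the field is required
)

object ExampleReader {
  // `fields` stands in for an already decoded message: field name -> value.
  def read(fields: Map[String, String]): Either[String, ExampleRecord] =
    fields.get("property3") match {
      case None =>
        Left("missing required field: property3")
      case Some(p3) =>
        Right(ExampleRecord(
          property1 = fields.get("property1"),                   // omitted -> None
          property2 = fields.getOrElse("property2", "whatever"), // omitted -> default
          property3 = p3
        ))
    }

  def main(args: Array[String]): Unit = {
    println(read(Map("property3" -> "x"))) // Right(ExampleRecord(None,whatever,x))
    println(read(Map.empty))               // Left(missing required field: property3)
  }
}
```

Note that the reader, not the class initializer, decides what an omitted field means, which is what both rules ask for.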

Can't we guess the default property value (not to be confused with the default protobuf field value! Those are different planets in the Google Universe) by creating a fresh instance and reading the property value? This would save tracking missing fields and calling the property setter explicitly. Well, no. A property doesn't necessarily have to be initialized with a constant expression. Think about id = UUID.randomUUID(), for example. A protobuf default value must be a constant, and I see no way to derive it from an object. It can be provided in another way, via an annotation for example.
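
A small sketch of why the fresh-instance trick fails. The Entity class is a hypothetical example; the point is only that a property initialized with a non-constant expression yields a different "default" on every instantiation.

```scala
import java.util.UUID

// Hypothetical bean: one property has a constant initializer, the other does not.
class Entity {
  var id: UUID = UUID.randomUUID() // non-constant: differs per instance
  var name: String = "whatever"    // constant: the same for every instance
}

object FreshInstanceDemo {
  def main(args: Array[String]): Unit = {
    val a = new Entity
    val b = new Entity
    println(a.name == b.name) // true: a stable value that could serve as a default
    println(a.id == b.id)     // false for random UUIDs: no stable default to read
  }
}
```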

Rule #2 is actually not handy for schema evolution according to the protobuf guidelines. First, required fields are there forever and cannot be removed. Second, new fields must be optional. It is therefore better to define as many fields as optional as possible. To be able to do this, we need a zero value per type. This is not a problem for standard types, but for bean types it gets complicated. First, the zero value must be a constant. Beans are mutable by nature, so we have to create a new instance each time and initialize its properties to default values. Second, during serialization we have to check all bean properties and, if they are equal to the default, omit the property from serialization.

So, we can translate all properties to optional protobuf fields. Here are our rules:
1. All properties are translated to optional fields.
2. For each type a zero() function is defined; it returns the default value for a property in the protobuf sense. For Option it is None, for numbers it is 0, for String it is "", etc.
3. A default field value can be provided explicitly, except for properties of Option type.
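
One possible way to express the per-type zero() function is a type class. The sketch below is an assumption about how this could be done, not an existing library API: Zero, FieldWriter and field are hypothetical names, and a "written" field is represented simply as a name/value pair.

```scala
// Hypothetical type class: one zero value per type (rule 2).
trait Zero[A] { def zero: A }

object Zero {
  implicit val intZero: Zero[Int] = new Zero[Int] { def zero: Int = 0 }
  implicit val stringZero: Zero[String] = new Zero[String] { def zero: String = "" }
  implicit def optionZero[A]: Zero[Option[A]] =
    new Zero[Option[A]] { def zero: Option[A] = None }
}

object FieldWriter {
  // Returns Some(name -> value) if the field should be written,
  // or None if the value equals its default and can be omitted.
  // An explicit default (rule 3) takes precedence over the type's zero.
  def field[A](name: String, value: A, explicitDefault: Option[A] = None)(
      implicit z: Zero[A]): Option[(String, A)] = {
    val default = explicitDefault.getOrElse(z.zero)
    if (value == default) None else Some(name -> value)
  }

  def main(args: Array[String]): Unit = {
    println(field("property1", Option.empty[String]).isDefined)         // false: omitted
    println(field("property1", Option("value")).isDefined)              // true: written
    println(field("property2", "whatever", Some("whatever")).isDefined) // false: omitted
    println(field("count", 5).isDefined)                                // true: written
  }
}
```

This sketch does not enforce the restriction in rule 3 that Option properties cannot carry an explicit default; that check would have to be added separately, for example where the schema is derived.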
