Saturday, March 26, 2011

Handling protobuf nulls in Scala

Protobuf was invented by Google as a fast binary format for data transfer and storage. It has advantages over XML and JSON, look here and here.

There is one 'feature' of protobuf which feels neither comfortable nor logical: there are no nulls. There are optional fields, but their semantics don't feel natural in Java. Optional in protobuf means "doesn't have to be set" instead of the conventional "doesn't have to have a value". The former is recognized by the absence of a setter call, the latter by some special value with "no value" semantics (null in Java). When a field's setter is not called, the getter returns the default value (no nulls there either) and the corresponding has check (a hasXxx() method in the generated Java code) returns false. This is how protobuf is designed and there is no way to change it. The battle for nulls in protobuf for Java seems to be lost.

What about Scala? How can 'optional' in protobuf be mapped to Scala's Option? We can live perfectly well without nulls in Scala; it is even encouraged, since there is the Option class. I think we have to distinguish protobuf optional fields with an explicitly provided default value from those without one. Let's look at the following example:


message Example {
  optional string property1 = 1;
  optional string property2 = 2 [default = "whatever"];
}


I would map it to Scala in this way:


class Example {
  var property1: Option[String] = None
  var property2: String = "whatever"
}
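
As a rough sketch of how this mapping could be filled in when reading a message: the trait ExampleMessage below is a hypothetical stand-in for the accessors that protoc would generate for the Example message (it is not the real generated class), and ExampleMapping is an illustrative name of my own.

```scala
// Hypothetical stand-in for the has/get accessors of a generated Example message.
trait ExampleMessage {
  def hasProperty1: Boolean
  def getProperty1: String
  def hasProperty2: Boolean
  def getProperty2: String
}

// The Scala side of the mapping, as proposed in the text.
class Example {
  var property1: Option[String] = None
  var property2: String = "whatever"
}

object ExampleMapping {
  def fromMessage(m: ExampleMessage): Example = {
    val e = new Example
    // No explicit default: an unset field becomes None.
    e.property1 = if (m.hasProperty1) Some(m.getProperty1) else None
    // Explicit default: the getter already returns "whatever" when unset.
    e.property2 = m.getProperty2
    e
  }
}
```

The only place where the has check matters is the Option property; the field with an explicit default can rely on the getter alone.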


Obviously, neither property can be null. How about mapping in the opposite direction, from a Scala class to a protobuf schema? This doesn't seem to be so obvious. If you look at the Scala code, the first thought is that only property1 is optional. Things get even worse if we decide to implement it as follows:


var property1: Option[String] = Some("value")


What is the default value for property1 then? And how is it different from None? Life is so much easier with null values!

I am thinking about the following solution:
1. Option properties are always translated to optional protobuf fields without explicit default values. The deserialization implementation has to set None explicitly if the field is omitted in the message.
2. All other properties are translated to required protobuf fields unless a non-null default value is provided somehow. The default value must be set explicitly on the object property if the field is omitted in the message.

Can't we guess the default property value (not to be confused with the default protobuf field value! those are different planets in the Google Universe) by creating a fresh instance and reading the property value? This would save us from tracking missing fields and calling the property setter explicitly. Well, no. A property doesn't necessarily have to be initialized with a constant expression. Think about id = UUID.randomUUID() for example. A protobuf default value must be a constant and I see no way to derive it from an object. It can be provided in another way, via an annotation for example.
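
A small sketch of why probing a fresh instance is unreliable. The Entity class and the Probe helper are made up for illustration only.

```scala
import java.util.UUID

// Illustrative bean: one property has a constant initializer, the other does not.
class Entity {
  var id: String = UUID.randomUUID().toString
  var name: String = "unnamed"
}

object Probe {
  // Reads a property from two fresh instances and reports whether the values agree.
  // Agreement is only a hint of a constant initializer, never a proof.
  def looksConstant[A](read: Entity => A): Boolean =
    read(new Entity) == read(new Entity)
}
```

Here Probe.looksConstant(_.name) is true while Probe.looksConstant(_.id) is false, so the value read from a fresh instance cannot be used as a protobuf default for id.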

Rule #2 is actually not handy for schema evolution according to the protobuf guidelines. First, required fields are there forever and cannot be removed. Second, new fields must be optional. It is therefore better to define as many fields as possible as optional. To be able to do this, we need a zero value per type. This is not a problem for standard types, but for bean types it gets complicated. First, it must be a constant. Beans are mutable by their nature, thus we have to create a new instance each time and initialize its properties to default values. Second, during serialization we have to check all bean properties and, if they are equal to the defaults, omit serialization of the property.

So, we can translate all properties to optional protobuf fields. Here are our rules:
1. All properties are translated to optional fields.
2. For each type there is a zero() function defined; it returns the default value for a property in the protobuf sense. For Option it is None, for numbers it is 0, for String it is "" etc.
3. A default field value can be provided explicitly, except for properties of Option type.
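
These rules could be sketched with a type class. Everything below (the Zero trait, its instances and FieldWriter.toWire) is an assumption of mine for illustration, not an existing API; rule 3's restriction on Option properties is not enforced here.

```scala
// Rule 2: a zero value per type, in the protobuf sense.
trait Zero[T] { def zero: T }

object Zero {
  def apply[T](implicit z: Zero[T]): Zero[T] = z

  private def instance[T](value: => T): Zero[T] =
    new Zero[T] { def zero: T = value }

  implicit val intZero: Zero[Int] = instance(0)
  implicit val longZero: Zero[Long] = instance(0L)
  implicit val booleanZero: Zero[Boolean] = instance(false)
  implicit val stringZero: Zero[String] = instance("")
  implicit def optionZero[A]: Zero[Option[A]] = instance(None)
}

object FieldWriter {
  // Returns Some(value) if the field has to be written and None if it can be
  // omitted because it equals the explicit default (rule 3) or the zero value.
  def toWire[T](value: T, explicitDefault: Option[T] = None)(implicit z: Zero[T]): Option[T] = {
    val default = explicitDefault.getOrElse(z.zero)
    if (value == default) None else Some(value)
  }
}
```

With these definitions, FieldWriter.toWire("") and FieldWriter.toWire("whatever", Some("whatever")) both yield None, while FieldWriter.toWire("x") yields Some("x").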

Friday, March 18, 2011

ScalaBeans Roadmap (work in progress)

Scala has good interoperability with Java code. One area where it doesn't work so well is reflection. Not reflection by itself - this is (fortunately) not a problem - but Scala design patterns and the way we model data in Scala don't match very well with JavaBeans, the mainstream Java data modeling framework. Yes, we have the @BeanProperty annotation in Scala, but using it everywhere seems ugly to me, and literally following the JavaBeans specification in Scala doesn't feel natural and is quite limiting. We have all these vals and vars in Scala, the Option class, case classes, a great collection framework - none of this maps precisely to JavaBeans.

Where do we need it? I can name at least 2 areas: persistence and GUI. The absence of ScalaBeans (a la JavaBeans) is, in my humble opinion, the major limiting factor for the development of native Scala frameworks in these areas and for integrating existing Java frameworks with Scala. As an example: wouldn't it be nice if you could code JSF managed beans using vars (without ugly annotations), Options and Scala collections? Think about other great Java libraries which rely on the JavaBeans specification and can discover your data structures at runtime and serialize/deserialize them properly into other forms, suitable for persistence, data transfer or GUI.

At first this absence might look like a non-issue. If you really think so, please answer a couple of questions for me: what is a property in Scala? There are no getters and setters which begin with "get" and "set", so how do you recognize them at run-time? Do you want to deal with nulls or require each optional property to have Option type? Which implementations do you want to use for collection interfaces by default? I faced all these issues when working with Scala and thought that it would be nice to have common ground with other developers when answering them. There is much more, believe me, and the answers are sometimes tricky and not so obvious, especially when you work with reflection. And then, even if you know the answers and agree on them with the rest of your team, there is no standard framework to support them.

Here I want to start with some of these questions and give my answers to them. Feel free to provide your feedback - it is all welcome.

Properties

Let's start with the basic question: what is a property in Scala? When we declare a val, the following things happen: the Scala compiler generates a private field and a method with the same name. For private[this] only the private field is generated, no getter. When we declare a var, the compiler generates a private field, a getter and a setter (here and further I mean Scala getters and setters: the getter has the same name as the property, the setter appends '_$eq' to the property name, which is the encoded form of name_=). We can also declare getter and setter functions explicitly in the code; then no field member is generated (unless we declare it ourselves explicitly).
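
To see what the compiler generates, one can simply list the declared members through Java reflection. Person and Members are illustrative names, and the exact set of extra synthetic members may differ between compiler versions.

```scala
// A val in the constructor and a var in the body.
class Person(val name: String) {
  var age: Int = 0
}

object Members {
  // Names of all methods declared directly on the class (public or not).
  def methodNames(c: Class[_]): Set[String] =
    c.getDeclaredMethods.map(_.getName).toSet

  // Names of all fields declared directly on the class (public or not).
  def fieldNames(c: Class[_]): Set[String] =
    c.getDeclaredFields.map(_.getName).toSet
}
```

For Person, the method names include name, age and age_$eq, and the field names include name and age.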

So far so good. Now the tricky part: do we want to discover non-public properties? From the GUI point of view we do not: we are not interested in the intimate state of the object, we only want to see what it looks like (only public members then). From the persistence (and data transfer) point of view it is the other way around: we do not care what it looks like, we do care about its state and how we can reproduce it later. Most JPA entity code uses fields to discover object properties (it is an option there - you can use either fields or JavaBean getters/setters).

So, we get 2 views (2 discovery strategies) of an object via reflection:
  • private field members
  • union of: matching public getter+setter (read-write property), matching private field + public getter (read-only property), public setter (write-only property)
Both will match if you use only public vals and/or vars. A reflection framework must support both strategies.
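
A minimal sketch of the two strategies using plain Java reflection. Discovery, fieldNames and accessorNames are hypothetical names; filtering out every member whose name contains '$' is a simplification I chose to skip compiler-generated members.

```scala
import java.lang.reflect.Modifier

class Account(val owner: String) {
  var balance: Int = 0
  private var secret: String = "s"
}

object Discovery {
  // The class itself followed by all its superclasses.
  private def hierarchy(c: Class[_]): List[Class[_]] =
    if (c == null) Nil else c :: hierarchy(c.getSuperclass)

  // Strategy 1: every instance field, whatever its visibility.
  def fieldNames(c: Class[_]): Set[String] =
    hierarchy(c)
      .flatMap(_.getDeclaredFields.toList)
      .filterNot(f => f.isSynthetic || Modifier.isStatic(f.getModifiers) || f.getName.contains("$"))
      .map(_.getName)
      .toSet

  // Strategy 2: public getter+setter, field + public getter, or public setter.
  def accessorNames(c: Class[_]): Set[String] = {
    val methods = c.getMethods.filterNot(m => Modifier.isStatic(m.getModifiers))
    val getters = methods
      .filter(m => m.getParameterTypes.isEmpty && m.getReturnType != Void.TYPE)
      .map(_.getName).toSet
    val setters = methods
      .filter(m => m.getParameterTypes.length == 1 && m.getName.endsWith("_$eq"))
      .map(_.getName.stripSuffix("_$eq")).toSet
    (getters intersect setters) ++ (getters intersect fieldNames(c)) ++ setters
  }
}
```

For Account the accessor view yields owner and balance only, while the field view is also expected to see the private state.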

Default constructors

Ok, we discovered all the properties, but how do we instantiate an object? Java took the easy way: a default constructor with no args. In Scala this is a very limiting approach - you will cut off case classes this way, to say the least. It will also limit the possibilities for immutable objects, with all the consequences of this choice. And it is so easy to declare vals and vars directly in the constructor - much more elegant than in Java. I cannot live without all this stuff.

I want the reflection framework to provide me with the names of the properties used in the constructor. However, this raises another issue: if you choose public properties only as your property discovery strategy, then all your constructor parameters must be public vals or vars. This requirement becomes tricky to achieve when you use inheritance. Look at this code:


class This(val p1:String)
class That(myP1:String, val p2:String) extends This(myP1)


Constructor parameter names can be discovered by paranamer (a bytecode parsing library for parameter names, which also works with Scala), but myP1 is not public, so your view of the object is not sufficient to instantiate it. If we use the 'field' strategy, there is another issue: whenever myP1 is also kept as a field (the compiler does this once the parameter is referenced outside the constructor), the same value will appear twice: once as "p1" and once as "myP1".
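
To make the instantiation problem concrete, here is a sketch that builds an object from named values. The parameter names are passed in by hand and are assumed to come from a tool such as paranamer; Instantiate.newInstance is a hypothetical helper, not part of any library, and it assumes top-level classes with a single public constructor.

```scala
class This(val p1: String)
class That(myP1: String, val p2: String) extends This(myP1)

object Instantiate {
  // paramNames: constructor parameter names in declaration order (assumed input).
  // values: property values keyed by those names.
  def newInstance[T](c: Class[T], paramNames: Seq[String], values: Map[String, AnyRef]): T = {
    val ctor = c.getConstructors.head
    val args = paramNames.map(values)
    ctor.newInstance(args: _*).asInstanceOf[T]
  }
}
```

Calling Instantiate.newInstance(classOf[That], Seq("myP1", "p2"), Map("myP1" -> "a", "p2" -> "b")) works, but only because the caller supplies a value under the name myP1; a public property view of That knows the value only as p1.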

If you have ideas on how to deal with this, let me know.

Type information

Reflection was so easy without generic types. When we move to generics we have to deal with the following kinds of java.lang.reflect.Type: Class (a.k.a. erasure, no generics information), ParameterizedType, WildcardType, TypeVariable, GenericArrayType. It is tricky to decide at runtime whether a type is a subtype of another type (not to be confused with inheritance - there are no problems there; subtyping has a more general definition, including covariance and contravariance). I can live with the 'subtype' ambiguity, but I just want to be able to make a good guess about a type parameter value (existential type) in an easy way. Manifest is a good approximation of what I want, but there is no way to create a Manifest from a java.lang.reflect.Type object. The reflection framework must provide this, with a documented 'best effort' strategy to deal with existential type ambiguity at runtime.
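
One possible 'best effort' building block is reducing any java.lang.reflect.Type to a Class. This is only a sketch of my own; the name Erasure and the choice of always taking the first upper bound are assumptions, not an established strategy.

```scala
import java.lang.reflect.{GenericArrayType, ParameterizedType, Type, TypeVariable, WildcardType}

object Erasure {
  // Best-effort reduction of a reflective type to a plain Class.
  def erasure(t: Type): Class[_] = t match {
    case c: Class[_]          => c
    case p: ParameterizedType => erasure(p.getRawType)
    // Wildcards and type variables always report at least Object as upper bound.
    case w: WildcardType      => erasure(w.getUpperBounds.head)
    case v: TypeVariable[_]   => erasure(v.getBounds.head)
    // Build an empty array of the erased component type to obtain the array class.
    case g: GenericArrayType  =>
      java.lang.reflect.Array.newInstance(erasure(g.getGenericComponentType), 0).getClass
    case other =>
      throw new IllegalArgumentException("Unsupported type: " + other)
  }
}
```

For example, the generic superclass of java.util.ArrayList is the ParameterizedType AbstractList<E>; its erasure is AbstractList, and the erasure of its type argument E is Object.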

//TODO: provide examples and more explanation

Optional fields

We do not use nulls in Scala, we have Option. Period. Optional properties must have Option type and use it properly - we all agree about this, don't we? However, the reflection framework has to deal with null and Some(null) by converting them to None on the fly. For easy integration with Java it has to provide at least the following functions for property descriptors: javaType (actual value type), javaGet, javaSet (turn None into null and unwrap Some, and the reverse).
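
The conversions themselves are small. Below is a sketch with hypothetical helper names (OptionInterop, normalize) next to the javaGet and javaSet names from the text; real property descriptors would of course work on reflective values rather than on typed arguments.

```scala
object OptionInterop {
  // Treats both null and Some(null) as None.
  def normalize[A](o: Option[A]): Option[A] =
    if (o == null) None else o.filter(_ != null)

  // Scala to Java: None becomes null, Some(x) becomes x.
  def javaGet[A >: Null <: AnyRef](o: Option[A]): A =
    normalize(o).orNull

  // Java to Scala: null becomes None, anything else is wrapped in Some.
  def javaSet[A](value: A): Option[A] =
    Option(value)
}
```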

Collections

Scala collections neither subclass nor implement Java collections. These are 2 distinct implementations. There is a JavaConverters object in Scala that provides conversions between these libraries using wrappers. I think it is a good idea to have the same functions here as for Option fields: javaType (returning the Java collection type), javaGet, javaSet (converting Java collections to/from Scala collections on the fly).
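
For a list-typed property, the conversions could look like the sketch below. CollectionInterop is a hypothetical name; the code uses scala.collection.JavaConverters as mentioned in the text, which newer Scala versions deprecate in favour of scala.jdk.CollectionConverters.

```scala
import scala.collection.JavaConverters._

object CollectionInterop {
  // Scala to Java: wraps the Scala sequence as a java.util.List view.
  def javaGet[A](xs: Seq[A]): java.util.List[A] = xs.asJava

  // Java to Scala: wraps the Java list and copies it into an immutable List.
  def javaSet[A](xs: java.util.List[A]): List[A] = xs.asScala.toList
}
```

Note that asJava returns a wrapper rather than a copy, which is cheap but means the Java side sees the Scala collection's own mutability rules.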

Another useful thing is getting a Builder instance for the corresponding collection. It seems to be possible to get the companion object at runtime using reflection (yes, I know, ugly and not reliable, too much dependency on how things are actually compiled, but it works for Scala 2.8.1). This doesn't work for sorted collections, however, since they need an Ordering object, which is usually provided via implicits. Another way to get a Builder is to require all collection-typed properties to be initialized with an empty collection. This can be read via reflection, and getting a new builder from it is very easy. This choice has a downside: we cannot get a Builder before instantiating an object.
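
The second approach, getting a builder from an existing (empty) collection, might be sketched as follows. Note the assumption: this uses iterableFactory from the Scala 2.13 collections, which did not exist in 2.8.1; Builders.builderLike is a made-up name, and sorted collections would still need extra care for their Ordering.

```scala
import scala.collection.mutable.Builder

object Builders {
  // Asks the template collection for its factory and creates a fresh builder
  // that produces collections of the same kind as the template.
  def builderLike[A](template: Iterable[A]): Builder[A, Iterable[A]] =
    template.iterableFactory.newBuilder[A]
}
```

Given a property initialized with Vector.empty[Int], Builders.builderLike returns a builder whose result is again a Vector.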

Enumerations

...