Saturday, March 26, 2011

Handling protobuf nulls in Scala

Protobuf was invented by Google as a fast binary format for data transfer and storage. It has advantages over XML and JSON, look here and here.

There is one 'feature' of protobuf which feels neither comfortable nor logical: there are no nulls. There are optional fields, but their semantics don't feel natural in Java. Optional in protobuf means "doesn't have to be set" instead of the conventional "doesn't have to have a value". The former is recognized as the setter not being called, the latter as some special value with "no value" semantics (null in Java). When a field setter is not called, the getter will return the default value (no nulls there either) and isSet will return false. This is how protobuf is designed and there is no way to change it. The battle for nulls in protobuf for Java seems to be lost.

What about Scala? How can 'optional' in protobuf be matched to Scala's Option? We can live perfectly well without nulls in Scala; it is even encouraged, since there is the Option class. I think we have to distinguish protobuf optional fields with an explicitly provided default value from ones without it. Let's look at the following example:


message Example {
  optional string property1 = 1;
  optional string property2 = 2 [default = "whatever"];
}


I would map it to Scala in this way:


class Example {
  var property1: Option[String] = None
  var property2: String = "whatever"
}


Obviously, neither of the two properties can hold null. How about mapping in the opposite direction, i.e. from a Scala class to a protobuf schema? This doesn't seem to be so obvious. If you look at the Scala code, the first thought is that only property1 is optional. Things get even worse if we decide to implement it as follows:


var property1: Option[String] = Some("value")


What is the default value for property1 then? And how is it different from None? Life is so much easier with null values!

I am thinking about the following solution:
1. Option properties are always translated to optional protobuf fields without explicit default values. The deserialization implementation has to set the None value explicitly if the field is omitted in the message (see the sketch below).
2. All other properties are translated to protobuf required fields unless a non-null default value is provided somehow. The default value must be set explicitly on the object property if the field is omitted in the message.
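
As a minimal sketch of rule #1 (ExampleProto stands for the Java class protobuf would generate from the schema above; the name is illustrative):

def readProperty1(message: ExampleProto): Option[String] =
  if (message.hasProperty1) Some(message.getProperty1) else None

def readProperty2(message: ExampleProto): String =
  message.getProperty2 // returns "whatever" when the field is omitted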

Can't we guess the default property value (not to be confused with the default protobuf field value! those are different planets in the Google universe) by creating a fresh instance and reading the property value? This would save us tracking missing fields and calling the property setter explicitly. Well, no. A property doesn't necessarily have to be initialized with a constant expression. Think about id = UUID.randomUUID() for example. A protobuf default value must be a constant and I see no way to derive it from an object. It can be provided in another way, via an annotation for example.

Rule #2 is actually not handy for schema evolution according to the protobuf guidelines. First, required fields are there forever and cannot be removed. Second, new fields must be optional. It is thus better to define as many fields as possible as optional. To be able to do this, we have to have a Zero value per type. This is not a problem for standard types, but for bean types it gets complicated. First, it must be a constant. Beans are mutable by their nature, thus we have to create a new instance each time and initialize its properties to default values. Second, during serialization we have to check all bean properties and, if they are equal to the default, omit their serialization.

So, we can translate all properties to optional protobuf fields. Here are our rules:
1. All properties are translated to optional fields.
2. Per type there is a zero() function defined; it returns the default value for a property in the protobuf sense. For Option it is None, for numbers it is 0, for String it is "" etc. (see the sketch below).
3. A default field value can be provided explicitly, except for properties with an Option type.
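
A minimal sketch of rule #2 as a type class (the Zero name and its instances are illustrative, not from any published library):

trait Zero[T] {
  def zero: T
}

object Zero {
  def zero[T](implicit z: Zero[T]): T = z.zero

  implicit val intZero: Zero[Int] = new Zero[Int] { def zero = 0 }
  implicit val stringZero: Zero[String] = new Zero[String] { def zero = "" }
  implicit def optionZero[T]: Zero[Option[T]] = new Zero[Option[T]] { def zero = None }
}

During serialization a property equal to Zero.zero can be omitted from the message; during deserialization an omitted field is restored from Zero.zero.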

Friday, March 18, 2011

ScalaBeans Roadmap (work in progress)

Scala has good interoperability with Java code. One area where it doesn't work so well is reflection. Not reflection by itself - that is (fortunately) not a problem, but Scala design patterns and the way we model data in Scala don't match very well with JavaBeans - the mainstream Java data modeling framework. Yes, we have the @BeanProperty annotation in Scala, but using it everywhere seems ugly to me, and literally following the JavaBeans specification in Scala doesn't feel natural and is quite limiting. We have all these vals and vars in Scala, the Option class, case classes, a great collection framework - all of this doesn't map precisely to JavaBeans.

Where do we need it? I can name at least 2 areas: persistence and GUI. The absence of ScalaBeans (a la JavaBeans) is in my humble opinion the major limiting factor for the development of native Scala frameworks in these areas and for integrating existing Java frameworks with Scala. As an example: wouldn't it be nice if you could code JSF managed beans using vars (without ugly annotations), Options, Scala collections? Think about other great Java libraries which rely on the JavaBeans specification and can discover your data structures at runtime and serialize/deserialize them properly in other forms, suitable for persistence, data transfer or GUI.

At first this absence might look like a non-issue. If you really think it is not an issue, please answer a couple of questions: what is a property in Scala? There are no getters and setters which begin with "get" and "set", so how do you recognize them at run-time? Do you want to deal with nulls or require each optional property to have an Option type? Which implementations do you want to use for collection interfaces by default? I faced all these issues when working with Scala and thought that it would be nice to have common ground with other developers when answering them. There is much more, believe me, and the answers are sometimes tricky and not so obvious. For sure when you work with reflection. And then, even if you know the answers and agree on them with the rest of your team - there is no standard framework to support it.

Here I want to start with some of these questions and give my answers to them. Feel free to provide your feedback - it is all welcome.

Properties

Let's start with the basic question: what is a property in Scala? When we declare a val, the following things happen: the Scala compiler generates a private field and a method with the same name. For private[this], only the private field is generated, no getter. When we declare a var, the compiler generates a private field, a getter and a setter (here and further I mean Scala getters and setters: the getter has the same name as the property, the setter appends '_=' to the property name, encoded as '_$eq' in the bytecode). We can also declare getter and setter functions explicitly in the code; then no field member will be generated (unless we do it ourselves explicitly).
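
A small illustration of what the compiler generates (the Sample class is made up):

class Sample {
  val a = 1                // private field + public getter a()
  var b = "x"              // private field + getter b() + setter b_=(String)
  private[this] val c = 0  // field only, no accessor is generated
}

// Listing the generated methods via plain Java reflection:
classOf[Sample].getDeclaredMethods.foreach(m => println(m.getName))
// prints (in some order): a, b, b_$eq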

So far so good. Now the tricky part: do we want to discover non-public properties? From the GUI point of view, no: we are not interested in the internal state of the object, we only want to see how it looks (only public members then). From the persistence (and data transfer) point of view it is the other way around: we do not care how it looks, we do care about its state and how we can reproduce it later. Most JPA entity code uses fields to discover object properties (it is an option there - you can use either fields or JavaBean getters/setters).

So, we get 2 views (2 discovery strategies) on an object via reflection:
  • private field members
  • the union of: matching public getter+setter (read-write property), matching private field + public getter (read-only property), public setter (write-only property)
Both will match if you use only public vals and/or vars. A reflection framework must support both strategies. A sketch of the second strategy follows below.
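
A minimal sketch of the read-write part of the second strategy (the helper name is made up); it pairs each '_$eq' setter with a matching getter:

import java.lang.reflect.Method

def readWriteProperties(clazz: Class[_]): Seq[String] = {
  val methods: Seq[Method] = clazz.getMethods // public methods, including inherited ones
  val setters = methods.filter(m => m.getName.endsWith("_$eq") && m.getParameterTypes.length == 1)
  for {
    s <- setters
    name = s.getName.stripSuffix("_$eq")
    if methods.exists(g => g.getName == name && g.getParameterTypes.isEmpty)
  } yield name
}

readWriteProperties(classOf[Sample]) // Seq(b) for the Sample class above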

Default constructors

Ok, we discovered all the properties, but how do we instantiate an object? Java took the easy way: a default constructor with no args. In Scala this is a very limiting approach - you will cut off the case classes this way, to say the least. It will also limit the possibilities for immutable objects, with all the consequences of this choice. And it is so easy to declare vals and vars directly in the constructor - much more elegant than in Java. I cannot live without all this stuff.

I want a reflection framework to provide me the names of the properties used in the constructor. However, this raises another issue: if you choose public properties only as your property discovery strategy, then all your constructor parameters must be public vals or vars. This requirement becomes tricky to achieve when you use inheritance. Look at this code:


class This(val p1:String)
class That(myP1:String, val p2:String) extends This(myP1)


Constructor parameter names can be discovered with Paranamer (a constructor bytecode parsing library that also works with Scala objects), but myP1 is not public, so your view of the object is not sufficient to instantiate it. If we use the 'field' strategy, there is another issue: the same value will appear twice, once as "p1" and once as "myP1".

If you have ideas how to deal with this, let me know.

Type information

Reflection was so easy without generic types. When we move to generics we have to deal with the following type classes: Class (a.k.a. erasure, no generics information), ParameterizedType, WildcardType, TypeVariable, GenericArrayType. It is tricky to decide at runtime if a type is a subtype of another type (not to be confused with inheritance - there are no problems there; subtyping has a more general definition, including covariance and contravariance). I can live with the 'subtype' ambiguity, but I just want to be able to make a good guess about a type parameter value (existential type) in an easy way. Manifest is a good approximation of what I want, but there is no way to create a Manifest from a java.lang.reflect.Type object. A reflection framework must provide it, with a documented 'best effort' strategy to deal with the existential type ambiguity at runtime.

//TODO: provide examples and more explanation
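
As a starter for those examples, here is the kind of ambiguity meant above (a minimal sketch):

// At runtime the generic information is erased:
val a: List[String] = List("x")
val b: List[Int] = List(1)
println(a.getClass == b.getClass) // true - both erase to the same class

// A Manifest captures the static type, but only where the compiler can infer it;
// it cannot be recovered from a java.lang.reflect.Type at runtime:
def describe[T](x: T)(implicit m: Manifest[T]) = m.toString
println(describe(a)) // prints something like: scala.collection.immutable.List[java.lang.String]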

Optional fields

We do not use nulls in Scala, we have Option. Period. Optional properties must have an Option type and use it properly - we all agree about this, don't we? A reflection framework has to deal with null and Some(null), however, by converting them to None on the fly. For easy integration with Java it has to provide at least the following functions on property descriptors: javaType (the actual value type), javaGet and javaSet (nullify None and unbox Some).
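
A minimal sketch of the javaGet/javaSet conversions for a single Option[String] value (standalone functions here; in a framework they would live on a property descriptor):

def javaGet(v: Option[String]): String = v match {
  case Some(null) => null // normalize Some(null) as well
  case Some(x)    => x    // unbox Some
  case None       => null // None becomes null on the Java side
}

def javaSet(v: String): Option[String] = Option(v) // null becomes None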

Collections

Scala collections neither subclass nor implement Java collections; those are 2 distinct implementations. There is the JavaConverters object in Scala that provides conversions between these libraries using wrappers. I think it is a good idea to have the same functions here as for Option fields: javaType (returning the Java collection type), javaGet and javaSet (converting Java collections to/from Scala collections on the fly).

Another useful thing is getting a Builder instance for the corresponding collection. It seems to be possible to get the companion object at runtime using reflection (yes, I know, ugly and not reliable, too much dependency on how things are actually compiled, but it works for Scala 2.8.1). This doesn't work for sorted collections however, since they need an Ordering object which is usually provided via implicits. Another way to get a Builder is to require all collection type properties to be initialized with an empty collection. This can be read via reflection, and getting a new builder from it is very easy. This choice has a downside: we cannot get a Builder before instantiating an object.
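
A sketch of the second approach - reading the initial empty collection and asking it for a fresh Builder:

import scala.collection.mutable.Builder

val prototype: List[String] = List.empty // the value a property was initialized with
val builder: Builder[String, List[String]] = prototype.genericBuilder[String]
builder += "a"
builder += "b"
val copy = builder.result()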

Enumerations

...

Friday, October 29, 2010

Using Scala implicits to limit client's view on the domain model

This is a common problem in Java enterprise applications: how do we limit the client's view on the domain model? I mean, we use JPA entity beans to model our domain, then we use them within a transaction in the Service layer to do some stuff, and then we want to provide the results to the Presentation layer where there is no access to the transaction anymore. The problem is that we actually do not want to give access to the full domain model, but only to a limited part of it. So, generally speaking, you cannot get all employees of a department if no special measures were taken for it in the Service layer (or any other layer beneath).

Wouldn't it be nice if the compiler performed such checks? To me it sounds crazy impossible, while certainly a desirable feature, and I still don't know how to implement it in Java. The commonly used DTO approach quickly leads to several DTOs per entity (like shallow and deep copies + all variations per related entity and their entities and so forth). The code quickly becomes verbose, clumsy and filled with plumbing. In the Java world there is no escape from this lose-lose choice: either a DTO rabbit farm or very careful programming of the (web?) GUI part.

I think I've found a way to let the compiler solve this problem in Scala.

Let's start with our domain model:

class Person(val id:Long, val email:String,
             protected[domain] var dep:Department = null,
             protected[domain] var address:Address = null)
class Department(val code:String, val name:String)
class Address(val street:String)

As you can see, entity relationships are hidden from the code outside the domain package. To provide such access, let's define the following trait:

trait PersonDepartment {
  implicit def toPersonDepartment(p:Person) = new {
    def getDepartment = p.dep
    def setDepartment(d:Department) = p.dep = d
  }
}
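
The PersonAddress trait is essentially the same; a minimal sketch, mirroring PersonDepartment above:

trait PersonAddress {
  implicit def toPersonAddress(p:Person) = new {
    def getAddress = p.address
    def setAddress(a:Address) = p.address = a
  }
}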

And now the tricky part: how do we import these implicits? Scala doesn't allow importing members of traits, only members of objects (and packages, ok). Well, how about this code (a Service layer function, say):

def doThis = {
  val p = new Person(1, "bb")
  val dataView = new PersonDepartment {}
  import dataView._
  p.setDepartment(new Department("aa", "Whatever"))
  p
}

It compiles! There is just one step left: provide the client with our dataView object:

  ...
  (p, dataView)
}

We return our Person object together with an object whose implicits provide access to the protected members of the domain object. The client code will look like this:

val (p, dataView) = doThis
import dataView._
val dep = p.getDepartment

And it compiles as well! But this one doesn't:

val a = p.getAddress

So, type checking works; the compiler does exactly what we want here. Just for fun, let's mix PersonAddress into the dataView in our function:

def doThis = {
  ...
  val dataView = new PersonDepartment with PersonAddress {}
  ...
}

And our p.getAddress on the client side compiles now! Notice that we didn't change the client code, only the function in the Service layer, and still we get a compilation error if the client tries to access parts of the domain model we do not want him to. So, we found a way to manage the client's view of our domain model, and we can do it from the Service layer. We can write another function, define another subset of relationships there and tell our client about the limitations in a type-safe way. All this will then be enforced by the Scala compiler. The traits don't have to provide access to the same entity class, by the way. For example, we can define a DepartmentPerson trait and mix it into the dataView together with PersonAddress, all in the same object; there are no obstacles for that from the type system.

This approach can also be used to provide deep-enough copies of our domain objects that will automatically be limited to the selected relationships. I think that reflection will be needed anyway and it raises other questions, but it is possible to write such code.

Hope you enjoyed it and

May the sources be with you! :)

Thursday, October 28, 2010

Transactional Monad for Scala

Scala is a nice programming language that allows both imperative and functional programming styles. There is however no standard implementation of monads which would allow working with the outside world (like databases or I/O) in a purely functional way. Since there is no such thing, let's build it! Let's take a web shop as an example and step by step build our monad, or whatever we might get along the way. Let's just stay practical and build some useful stuff.

So, how would our database monad look? Monads are type constructors, so we start with a type constructor:

trait Transactional[A]

and this is how we would like to use it:

def findByPK(id:Long):Transactional[Product]

This type signature indicates that we have a function that returns a Product object as the result of a database transaction. In the JPA world it can be a managed instance of the Product entity. It's already useful on its own, because now we can distinguish managed entities from detached ones, which is a good thing. Code which requires managed entities will always use Transactional, and code which can live with detached ones will use the objects directly.

How would we like to use our monad? I mean, encapsulating result of a transaction is good, but what can you do with it? I would say, anything you like. Literally:

trait Transactional[A] {
  def map[B](f:A=>B):Transactional[B]
}

Look at List or Option: they all have this method implemented; even Google wouldn't be the same without such a method, as we know now. Our function thus could take a Product as a parameter and produce... whatever, say HTML:

def render(p:Product):String

Then we could write:

def renderProduct(id:Long) = findByPK(id) map render

Neat and nice. Very simple, and we've got something useful again. We are still stuck with Transactional[String], but that's just because the end result of all our functions depends on the database transaction, so there is nothing to worry about yet; we are still on track.

It is good that we can combine pure functions with our Transactional thing, but what about other functions that also return a Transactional? Say we've got the shopping basket rendered in the same way and want to combine it with the product HTML. In List and Option, the map() function has a brother which does exactly what we want:

trait Transactional[A] {
  def map[B](f:A=>B):Transactional[B]
  def flatMap[B](f:A=>Transactional[B]):Transactional[B]
}

So, now we can pull the following function into our Transactional:

def renderShoppingBasket:Transactional[String]

and it will look like:

renderProduct(id) flatMap {_ => renderShoppingBasket}

Cool, but what happened to our product HTML? How do we plug in functions which take more than 1 parameter? Say we have this function to produce the final output:

def renderProductPage(productHTML:String, shoppingBasket:String):String

How do we plug such things in? This is where Scala for-comprehensions come into play. The code looks like this:

for {
  product <- findByPK(id)
  productHTML = render(product)
  basketHTML <- renderShoppingBasket
} yield renderProductPage(productHTML, basketHTML)

I don't know about you, but I like this code. All these left arrows are actually translated to flatMap and map calls. To support 'if' we also have to add filter(p:A=>Boolean):Transactional[A]. Not a big deal, but handy. We can read these arrows as the extraction of objects (managed entities?) out of a Transactional. Keep in mind that the result will (and should!) always be another Transactional.
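
For the curious, the for-comprehension above desugars roughly into nested flatMap and map calls (a sketch, eliding the tuple plumbing the compiler generates for the '=' line):

findByPK(id) flatMap { product =>
  renderShoppingBasket map { basketHTML =>
    renderProductPage(render(product), basketHTML)
  }
}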

Our page is ready, but we still have a Transactional[String], not the real String. How do we escape from it? We could simply add a get method which gives us this String, but then it would look more like a JavaBean, not a monad. Monads are different. They abstract side-effects away from pure functions. So, to get out of the Transactional, there must be a... transaction, which has to be committed to give us the outcome. Ok, here you are:

trait Transactional[A] {
  def map[B](f:A=>B):Transactional[B]
  def flatMap[B](f:A=>Transactional[B]):Transactional[B]
  def commit:A // commit the transaction and give us the outcome
}

What about the implementation? Let's start from the beginning: how would our findByPK function look? How about this (JPA style):

def findByPK(id:Long) = transactional {em:EntityManager => em.find(classOf[Product], id) }

I cannot think of a simpler function. Let's make it compile with this implementation:

object Transactional {
  def transactional[A](body: EntityManager => A) = new Transactional[A] {
    def atomic = body
  }
}

If you add import Transactional._ to your Scala code, the example above will compile. But wait, we just made a very important design decision, if you ask me. We do not contain the transaction outcome anymore, but a function that takes an EntityManager as an argument and gives us the result of some operation on it. So, one more time, just because this is important: Transactional[A] contains an EntityManager => A function. It has the following consequences:
  • We do not need an EntityManager instance when we construct a Transactional object. Good: no dependency injection, no ThreadLocal anymore.
  • Our Transactional becomes lazy. It doesn't do anything until we really need it. Cool, we will see how that flies.
  • The whole transaction start/commit/rollback can be placed together, in 1 function. Great: no transaction/connection leaks!

Finally, we can implement the map and flatMap buddies to do our magic:

trait Transactional[A] {
  import Transactional._ // brings the transactional factory into scope

  def atomic:EntityManager => A
  def map[B](f: A => B):Transactional[B] = transactional[B](f compose atomic)
  def flatMap[B](f: A => Transactional[B]):Transactional[B] =
    transactional {r => f(atomic(r)).atomic(r)}
}

Back to the commit function now. It doesn't really just commit our transaction, it executes it as a whole. Let's rename it to exec. It needs an instance of EntityManager to get the job done. Let's inject it as an implicit parameter; that might make life easier. I omit the implementation here (a sketch follows below); it is quite trivial and not in a functional style. Yes, all the side-effects happen there, so there is no need to use a functional style. At least, I cannot think of anything functionally-stylish and useful for it at the moment.
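
Since the implementation is omitted above, here is one possible sketch of exec, assuming a resource-local JPA EntityTransaction:

import javax.persistence.EntityManager

trait Transactional[A] {
  def atomic:EntityManager => A
  // map and flatMap as above ...

  // run the whole computation as a single transaction
  def exec(implicit em: EntityManager): A = {
    val tx = em.getTransaction
    tx.begin()
    try {
      val result = atomic(em)
      tx.commit()
      result
    } catch {
      case e: Throwable =>
        if (tx.isActive) tx.rollback() // roll back on any failure, then rethrow
        throw e
    }
  }
}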

Well, it seems like we are done with our Transactional for now. If you ask me whether we've built a real monad, the answer is "not yet". We need to factor EntityManager out as a type parameter and add a nonTransactional constructor that accepts 1 parameter of any type.

Notice that we didn't use any dependency injection, interceptors or ThreadLocals; there is no place for them anymore. In return we've got a way to combine our transactional code, distill pure functions from it, and still be expressive and practical.