Serialisation formats
While all of us understand the need for serialisation to transfer data across network devices, it is often difficult to choose the right serialisation framework for a project. Of the many serialisation formats available, the two most frequently used are Protocol Buffers and Avro. In this blog, I'll summarise both of them and also share my personal preference / opinion based on experience.
Protocol Buffers (often called Protobuf)
This one is from Google and has become really popular in recent times. Here is a very short quickstart guide / summary:
- The schema is defined in .proto files, and the corresponding Java classes can then be generated using the Proto compiler (protoc) or a Maven plugin. Below is an example of a proto file which is the schema definition:
syntax = "proto3";
package model;
option java_package = "com.experiment.protobuf.model";
option java_outer_classname = "StudentProto";
message Student {
int32 student_id = 1;
string student_name = 2;
}
- In proto2, each field can be marked required, optional, or repeated; proto3 drops the required label, makes singular fields optional by default, and keeps repeated
- The number at the end of each field is the field's tag; tags 1-15 encode in a single byte (together with the wire type), so they are best kept for the most frequently set fields
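To get a feel for how compact the encoding is, the generated class exposes toByteArray() and getSerializedSize(); here is a small sketch using the Student message above (the 7-byte figure follows from the varint encoding rules: 2 bytes for the int32 field, 5 for the three-character string):
// Tags 1-15 plus the wire type fit in a single byte,
// so each field costs just 1 byte of overhead here.
StudentProto.Student s = StudentProto.Student.newBuilder()
        .setStudentId(42)
        .setStudentName("Ada")
        .build();
byte[] wire = s.toByteArray();             // compact binary encoding
System.out.println(s.getSerializedSize()); // 7: 2 bytes for student_id, 5 for student_name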
- Some very popular uses of Protobuf in industry:
- Used as an alternative to JSON as the response format for Spring Boot based REST endpoints, for smaller payloads and faster responses (see the sketch after this list)
- Used as the interface definition and wire format in gRPC, which is typically faster than a JSON-over-REST Spring Boot endpoint
- Used as an alternative to Avro as the message format in Kafka
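For the Spring Boot use case, my understanding is that registering Spring's ProtobufHttpMessageConverter is enough to serve generated messages over REST. A minimal sketch, assuming the generated StudentProto class from the schema above:
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.converter.protobuf.ProtobufHttpMessageConverter;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@Configuration
class ProtobufConfig {
    // Lets Spring MVC read/write generated Message types as application/x-protobuf
    @Bean
    ProtobufHttpMessageConverter protobufHttpMessageConverter() {
        return new ProtobufHttpMessageConverter();
    }
}

@RestController
class StudentController {
    // A client sending Accept: application/x-protobuf receives the binary encoding instead of JSON
    @GetMapping(value = "/student", produces = "application/x-protobuf")
    StudentProto.Student getStudent() {
        return StudentProto.Student.newBuilder()
                .setStudentId(1)
                .setStudentName("Soumik")
                .build();
    }
}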
Below is sample Java code showing serialisation and deserialisation using Protobuf:
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Build an immutable Student message via the generated builder
StudentProto.Student student1
        = StudentProto.Student.newBuilder()
                .setStudentId(1)
                .setStudentName("Soumik")
                .build();

// Serialise the message to a file
try (FileOutputStream output = new FileOutputStream("test.txt")) {
    student1.writeTo(output);
}

// Deserialise the bytes back into a Student message
byte[] bytes = Files.readAllBytes(Paths.get("test.txt"));
StudentProto.Student student2 = StudentProto.Student.parseFrom(bytes);
- Protobuf has official, first-class support for more languages than Avro
- Making changes to an existing schema is also easier in Protobuf: fields are matched by tag number, so adding a new field with a fresh tag does not break older readers
Avro
Avro was originally created in the context of Hadoop (the big data framework from Doug Cutting). Again, here is a short summary:
- The schema is usually a JSON file with the .avsc extension
- Usually a Maven plugin (avro-maven-plugin) is used to generate the model classes from the schema.
- Below is an example of a JSON schema:
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
- Avro has long been the de facto serialisation format used with Kafka; nowadays Kafka tooling (e.g. Confluent Schema Registry) also supports Protobuf. Below is sample Java code for serialisation:
import example.avro.User;
import java.io.File;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

// Populate the generated model object (favorite_color is nullable, so it can stay unset)
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);

// Write the record to an Avro container file; the schema travels with the data
String fileNameToStoreSerializedData = "users.avro";
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<>(User.class);
try (DataFileWriter<User> dataFileWriter = new DataFileWriter<>(userDatumWriter)) {
    dataFileWriter.create(user1.getSchema(), new File(fileNameToStoreSerializedData));
    dataFileWriter.append(user1);
}
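For completeness (the Protobuf example above covers both directions), reading the records back uses the mirror-image reader classes; a minimal sketch, assuming the same generated User class and the file written above:
import example.avro.User;
import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

// Read the records back; the reader picks up the schema embedded in the container file
DatumReader<User> userDatumReader = new SpecificDatumReader<>(User.class);
try (DataFileReader<User> dataFileReader =
             new DataFileReader<>(new File("users.avro"), userDatumReader)) {
    while (dataFileReader.hasNext()) {
        User user = dataFileReader.next();
        System.out.println(user);
    }
}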