Serialisation formats

While we all understand the need for serialisation when transferring data between networked devices, choosing the right serialisation framework for a project is often difficult. Of the many formats available, two of the most frequently used are Protocol Buffers and Avro. In this blog, I’ll summarise both of them and also give my personal preference, based on experience.

Protocol Buffers (often called Protobuf)

This one is from Google and has become very popular in recent times. Here is a very short quickstart: first the schema (a .proto file), then the Java code that builds a message, writes it to a file and reads it back:

syntax = "proto3";
package model;

option java_package = "com.experiment.protobuf.model";
option java_outer_classname = "StudentProto";

message Student {
  int32 student_id = 1;
  string student_name = 2;
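}

import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// StudentProto is the class protoc generates from the schema above.
StudentProto.Student student1 = StudentProto.Student.newBuilder()
        .setStudentId(1)
        .setStudentName("Soumik")
        .build();

// Serialise to a file; try-with-resources closes the stream afterwards.
try (FileOutputStream output = new FileOutputStream("test.txt")) {
    student1.writeTo(output);
}

// Read the bytes back and parse them into an equal message.
byte[] bytes = Files.readAllBytes(Paths.get("test.txt"));
StudentProto.Student student2 = StudentProto.Student.parseFrom(bytes);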
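
To get a feel for what actually lands in test.txt, here is a minimal, library-free sketch that hand-encodes the same Student message following the Protobuf wire format: each field is a key, (field number << 3) | wire type, followed by a varint or a length-prefixed payload. The class WireSketch and its helper methods are my own illustration and are not part of the Protobuf API; in real code the generated writeTo() does all of this for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of the protobuf library.
class WireSketch {

    // Base-128 varint: 7 payload bits per byte, high bit set while more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Field key = (field number << 3) | wire type (0 = varint, 2 = length-delimited).
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    static byte[] encodeStudent(int studentId, String studentName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (studentId != 0) {                 // proto3 omits fields holding the default value
            writeTag(out, 1, 0);              // student_id = 1, varint
            writeVarint(out, studentId);
        }
        if (!studentName.isEmpty()) {
            byte[] utf8 = studentName.getBytes(StandardCharsets.UTF_8);
            writeTag(out, 2, 2);              // student_name = 2, length-delimited
            writeVarint(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    // Renders bytes as space-separated lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: 08 01 12 06 53 6f 75 6d 69 6b
        System.out.println(toHex(encodeStudent(1, "Soumik")));
    }
}
```

The whole message is 10 bytes: two for student_id, and two bytes of key and length plus six bytes of text for student_name. The field names never appear in the output, only the field numbers from the schema.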

Avro

Avro was originally created in the context of Hadoop (the big data framework from Doug Cutting). Again, here is a short summary: first the schema (a JSON file), then the Java code that builds a record and writes it to a file:

{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
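}

import java.io.File;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

// User is the class the Avro compiler generates from the schema above.
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// favorite_color is left unset, so it is written as null.

DatumWriter<User> userDatumWriter = new SpecificDatumWriter<>(User.class);
// try-with-resources closes the writer, which also flushes the file.
try (DataFileWriter<User> dataFileWriter = new DataFileWriter<>(userDatumWriter)) {
    dataFileWriter.create(user1.getSchema(), new File(fileNameToStoreSerializedData));
    dataFileWriter.append(user1);
}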
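
To get a feel for what Avro puts on the wire, here is a minimal, library-free sketch of how Avro's binary encoding lays out the body of the user1 record: fields are written in schema order, ints and string lengths are zig-zag varints, and every union value is prefixed with the index of its branch. AvroBodySketch and its helpers are hypothetical names of mine, not Avro API. It covers the record body only; the container file written by DataFileWriter also stores the schema in a header and groups records into blocks, so the file on disk is larger.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of the Avro library.
class AvroBodySketch {

    // Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...; the result is written as a base-128 varint.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Strings are a zig-zag length followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static byte[] encodeUser(String name, Integer favoriteNumber, String favoriteColor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        // ["int", "null"]: branch 0 carries an int, branch 1 (null) has no payload.
        if (favoriteNumber != null) {
            writeLong(out, 0);
            writeLong(out, favoriteNumber);
        } else {
            writeLong(out, 1);
        }
        // ["string", "null"]: branch 0 carries a string, branch 1 is null.
        if (favoriteColor != null) {
            writeLong(out, 0);
            writeString(out, favoriteColor);
        } else {
            writeLong(out, 1);
        }
        return out.toByteArray();
    }

    // Renders bytes as space-separated lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: 0c 41 6c 79 73 73 61 00 80 04 02
        System.out.println(toHex(encodeUser("Alyssa", 256, null)));
    }
}
```

The body is 11 bytes and contains no field names or field numbers at all, which is why an Avro reader always needs the writer's schema to make sense of the data.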