Using Protobuf Codecs

Weave defines simple Marshaller and Persistent interface standards. Any code that protoc generates from protobuf files (using the gogo-protobuf plugin) implements these interfaces automatically. We therefore recommend using .proto files to specify the serialization format for all persistent data in your application: internal state as well as transactions and messages. If you have never worked with protobuf, this might be a bit of a challenge, so we explain a simple workflow that we use in weave-based projects.
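For orientation, the two interfaces can be sketched in Go as follows. The method sets are assumed to match the Marshal and Unmarshal methods that gogo-protobuf generates, which is why generated types satisfy them without extra work. Check weave's source for the authoritative definitions. rawBytes is a hypothetical hand-written type, used only to show the contract.

```go
package main

// Marshaller is anything that can serialize itself to bytes.
// Sketch only: the method set mirrors the generated Marshal method.
type Marshaller interface {
	Marshal() ([]byte, error)
}

// Persistent adds the inverse operation, so a value can be stored
// and later restored from the same bytes.
type Persistent interface {
	Marshaller
	Unmarshal([]byte) error
}

// rawBytes is a hypothetical type that satisfies Persistent by hand.
type rawBytes struct {
	data []byte
}

// Marshal returns a copy so callers cannot mutate internal state.
func (r *rawBytes) Marshal() ([]byte, error) {
	out := make([]byte, len(r.data))
	copy(out, r.data)
	return out, nil
}

// Unmarshal keeps its own copy of the input bytes.
func (r *rawBytes) Unmarshal(bz []byte) error {
	r.data = append([]byte(nil), bz...)
	return nil
}

// Compile-time check that *rawBytes implements Persistent.
var _ Persistent = (*rawBytes)(nil)
```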

Create Proto File

The first step is to decide on the shape of your data, written in proto3 syntax. You can use several integer encodings, byte slices, strings, and nested messages, and any field may be repeated. So forget complex types with methods for now and just focus on the actual data structure. The x/codec.proto file defines the Coin type quite simply. Once you remove the comments, this is all that is left:

syntax = "proto3";

package x;

message Coin {
    int64 whole = 1;  // default: 0
    int64 fractional = 2;  // default: 0
    string ticker = 3;
    string issuer = 4;  // optional
}

Another example is the app/results.proto file, which defines a list of byte slices:

syntax = "proto3";

package app;

// ResultSet contains a list of keys or values
message ResultSet {
    repeated bytes results = 1;
}
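Protobuf has no special array type. A repeated field is simply the same field number written once per element. The sketch below hand-encodes ResultSet to show this on the wire. It is an illustration of the encoding, not the generated code, and the encode method and appendUvarint helper are hypothetical names.

```go
package main

import "encoding/binary"

// ResultSet mirrors app.ResultSet: "repeated bytes" maps to [][]byte.
type ResultSet struct {
	Results [][]byte
}

// appendUvarint appends x in protobuf's base-128 varint encoding.
func appendUvarint(b []byte, x uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], x)
	return append(b, buf[:n]...)
}

// encode writes every element as its own field-1 entry with wire type 2
// (length-delimited): key byte (1<<3)|2 = 0x0a, the length as a varint,
// then the raw bytes.
func (m *ResultSet) encode() []byte {
	var out []byte
	for _, r := range m.Results {
		out = append(out, 0x0a)
		out = appendUvarint(out, uint64(len(r)))
		out = append(out, r...)
	}
	return out
}
```

For the two results "a" and "bc", this yields the bytes 0a 01 61 0a 02 62 63.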

Note that the package defined in the protobuf file must match the package name used by the Go language code in the same directory.

You can also import types from one proto file into another. Make sure to use the full GitHub path so that the generated Go code has working imports. The package name declared in the imported file also serves as a namespace for its definitions, which is why the Coin type is referenced as x.Coin below. This is how x/cash creates a wallet that contains an array of tokens of different currencies.

syntax = "proto3";

package cash;

import "github.com/iov-one/weave/x/codec.proto";

// Set may contain Coin of many different currencies.
// It handles adding and subtracting sets of currencies.
message Set {
    repeated x.Coin coins = 1;
}

Compiling Proto Files

To compile protobuf files, you need the protoc binary installed along with a language-specific plugin (gogo-protobuf in this case). This can be a bit of a pain, especially the first time, so the default weave Makefile (https://github.com/iov-one/weave/blob/master/Makefile) contains some helpers for you.

  • make prototools installs all the needed tools; run it once to set up your machine
  • make protoc compiles all the .proto files in the repo

If you are building a repo based on weave, you can copy the bottom part of the Makefile, taking the make prototools logic verbatim. Let’s take a look at the second target, protoc, as this is the one you will have to modify whenever you add a new protobuf file, either to weave or to your own repo.

protoc:
	protoc --gogofaster_out=. app/*.proto
	protoc --gogofaster_out=. crypto/*.proto
	protoc --gogofaster_out=. orm/*.proto
	protoc --gogofaster_out=. x/*.proto
	protoc --gogofaster_out=. -I=. -I=$(GOPATH)/src x/cash/*.proto
	protoc --gogofaster_out=. -I=. -I=$(GOPATH)/src x/sigs/*.proto

First, note that each line calls the protoc executable that we installed with make prototools. Next, notice the --gogofaster_out=. flag. It tells protoc to generate code with the protoc-gen-gogofaster plugin (installed into $GOBIN during the prototools step) and to write the output relative to the current directory, so each generated file is placed next to its input: app/results.proto produces app/results.pb.go.

The first few lines should make sense now, but what about the -I=. -I=$(GOPATH)/src flags in the last two lines? These .proto files import other .proto files, and protoc needs to know where to find them. Since we want the generated code to use absolute import paths, the imports are written as full paths from the root of the GOPATH, so protoc must search there: -I=$(GOPATH)/src. However, once you pass any -I flag, protoc no longer uses the current directory by default, so it cannot place the input file itself. If you add only that one flag, protoc fails with the following message, which is resolved by adding -I=. as well:

x/sigs/codec.proto: File does not reside within any path specified using
--proto_path (or -I).  You must specify a --proto_path which encompasses
this file.  Note that the proto_path must be an exact prefix of the
.proto file names -- protoc is too dumb to figure out when two paths
(e.g. absolute and relative) are equivalent (it's harder than you think).
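To see why both flags are needed, here is a deliberately simplified model of protoc's path handling. It is a sketch under stated assumptions, not protoc's real algorithm: input file names are matched against the -I directories by plain string prefix (which is why a relative x/sigs/codec.proto never matches an absolute $(GOPATH)/src), imports are looked up by joining the import path to each -I directory in order, and "." is assumed to match any relative path. The function names are hypothetical.

```go
package main

import (
	"path"
	"strings"
)

// protoName returns the file's name relative to the first -I directory
// that is an exact textual prefix of it, or false if none matches.
func protoName(file string, includes []string) (string, bool) {
	file = path.Clean(file)
	for _, dir := range includes {
		dir = path.Clean(dir)
		// Model assumption: "." matches relative paths that stay inside
		// the current directory.
		if dir == "." {
			if !path.IsAbs(file) && !strings.HasPrefix(file, "../") {
				return file, true
			}
			continue
		}
		if strings.HasPrefix(file, dir+"/") {
			return strings.TrimPrefix(file, dir+"/"), true
		}
	}
	return "", false
}

// importCandidates lists, in search order, where an import statement is
// looked up: the import path joined to each -I directory.
func importCandidates(imp string, includes []string) []string {
	out := make([]string, 0, len(includes))
	for _, dir := range includes {
		out = append(out, path.Join(dir, imp))
	}
	return out
}

// outputPath is where --gogofaster_out=. writes the generated code for a
// file with the given proto name: ".proto" is replaced by ".pb.go".
func outputPath(name string) string {
	return strings.TrimSuffix(name, ".proto") + ".pb.go"
}
```

With only "/home/user/go/src" (a hypothetical $(GOPATH)/src) in the list, protoName rejects x/sigs/codec.proto. Adding "." first makes it resolve to x/sigs/codec.proto, whose output lands in x/sigs/codec.pb.go, while the weave import is found under the second directory.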

You are welcome to use plugins other than gogofaster; you can also try the standard Go protobuf compiler. What gogofaster does is auto-generate static serialization and deserialization code for each type. The introspection happens once, at generation time, so no reflection is needed at runtime, which gives roughly a 10x speedup in serialization and deserialization. I like this, but your choice may vary depending on your preference for, or aversion to, auto-generated code.
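To make the idea of static, reflection-free serialization concrete, here is a hand-written sketch of what a Marshal method for Coin has to do. It is not gogofaster's actual output, which also precomputes the size and writes into a preallocated buffer, but it performs the same field-by-field work using the protobuf wire format: a key of (field number << 3) | wire type, varints for integers, and length-prefixed bytes for strings.

```go
package main

import "encoding/binary"

// Coin mirrors the x.Coin message.
type Coin struct {
	Whole      int64
	Fractional int64
	Ticker     string
	Issuer     string
}

// appendUvarint appends x in protobuf's base-128 varint encoding.
func appendUvarint(b []byte, x uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], x)
	return append(b, buf[:n]...)
}

// appendTag writes a field key: (field number << 3) | wire type.
func appendTag(b []byte, field, wireType uint64) []byte {
	return appendUvarint(b, field<<3|wireType)
}

// appendString writes a length-delimited (wire type 2) string field.
func appendString(b []byte, field uint64, s string) []byte {
	b = appendTag(b, field, 2)
	b = appendUvarint(b, uint64(len(s)))
	return append(b, s...)
}

// Marshal handles every field with straight-line code and no
// reflection. proto3 omits fields that hold their zero value.
func (m *Coin) Marshal() ([]byte, error) {
	var b []byte
	if m.Whole != 0 {
		b = appendTag(b, 1, 0) // wire type 0: varint
		// int64 is sent as its two's-complement uint64 bit pattern
		b = appendUvarint(b, uint64(m.Whole))
	}
	if m.Fractional != 0 {
		b = appendTag(b, 2, 0)
		b = appendUvarint(b, uint64(m.Fractional))
	}
	if m.Ticker != "" {
		b = appendString(b, 3, m.Ticker)
	}
	if m.Issuer != "" {
		b = appendString(b, 4, m.Issuer)
	}
	return b, nil
}
```

For Coin{Whole: 123, Ticker: "CASH"} this produces the eight bytes 08 7b 1a 04 43 41 53 48. A negative int64 always takes ten varint bytes, which is one reason proto3 also offers the zigzag-encoded sint64 type.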

Using Autogenerated Structs

The first time through the above process may appear tedious, but once you get the hang of it, you just add a few lines to a .proto file and type make protoc. Et voilà! You have a bunch of fresh *.pb.go files that provide efficient, portable serialization for your code.

But how do you use those structs? Taking Coin from x/codec.proto as an example, we find an x/codec.pb.go file with a type Coin struct {...} that very closely mirrors the content of the codec.proto file, as well as a number of methods. There are some auto-generated getters, which can be useful to fulfill interfaces or to query fields of nil objects without panicking. And then there are some (very long) Marshal and Unmarshal methods. These are the meat of the matter: they fulfill the Persistent interface and let us write code like this:

orig := Coin{Whole: 123, Ticker: "CASH"}
bz, err := orig.Marshal()
if err != nil {
    return err
}
var parsed Coin
return parsed.Unmarshal(bz) // on success, parsed equals orig
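The nil-safe getters mentioned above follow a simple pattern. Here is a minimal hand-written sketch in that style (the generated versions cover every field). Wallet and tickerOf are hypothetical, added only to show a getter called through a nested message that was never set.

```go
package main

// Coin mirrors the generated struct (only two fields shown).
type Coin struct {
	Whole  int64
	Ticker string
}

// Getters in the generated style: pointer receiver, and the zero value
// is returned when the receiver is nil instead of panicking.
func (m *Coin) GetWhole() int64 {
	if m != nil {
		return m.Whole
	}
	return 0
}

func (m *Coin) GetTicker() string {
	if m != nil {
		return m.Ticker
	}
	return ""
}

// Wallet is a hypothetical message with an optional nested Coin.
type Wallet struct {
	Balance *Coin
}

// tickerOf works even when w.Balance is nil.
func tickerOf(w Wallet) string {
	return w.Balance.GetTicker()
}
```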

This is fine, but what happens when I want to add custom logic to my Coin struct, perhaps validation, or code to add two coins? Luckily for us, Go allows you to write methods for your structs in any file in the same package. That means we can keep the generated struct definition and all its serialization logic, and just add the methods we care about in a separate file. coin.go is a great example of extending the functionality, with code like:

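// Excerpt from coin.go. Helpers such as SameType, normalize, IsCC,
// the Err* constructors and the Min/Max bounds are defined elsewhere
// in the same package.
func (c Coin) Add(o Coin) (Coin, error) {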
    if !c.SameType(o) {
        err := ErrInvalidCurrency(c.Ticker, o.Ticker)
        return Coin{}, err
    }
    c.Whole += o.Whole
    c.Fractional += o.Fractional
    return c.normalize()
}

func (c Coin) Validate() error {
    if !IsCC(c.Ticker) {
        return ErrInvalidCurrency(c.Ticker)
    }
    if c.Whole < MinInt || c.Whole > MaxInt {
        return ErrOutOfRange(c)
    }
    if c.Fractional < MinFrac || c.Fractional > MaxFrac {
        return ErrOutOfRange(c)
    }
    // make sure signs match
    if c.Whole != 0 && c.Fractional != 0 &&
        ((c.Whole > 0) != (c.Fractional > 0)) {
        return ErrMismatchedSign(c)
    }

    return nil
}
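The excerpt above relies on helpers defined elsewhere in the package. As a self-contained illustration of the same idea, here is a minimal sketch of Add with its own normalize step. It assumes, for this sketch only, that Fractional counts billionths of a whole unit and that two coins are compatible when their tickers match; weave's real SameType and normalize may differ, and the sketch omits the overflow and range checks a real implementation needs. In a real package, Coin would come from codec.pb.go and these methods would live in a hand-written file beside it.

```go
package main

import "fmt"

// Coin stands in for the generated struct.
type Coin struct {
	Whole      int64
	Fractional int64
	Ticker     string
}

// fracUnit is an assumption: Fractional counts billionths of a unit.
const fracUnit int64 = 1000000000

// Add sums two coins of the same ticker.
func (c Coin) Add(o Coin) (Coin, error) {
	if c.Ticker != o.Ticker {
		return Coin{}, fmt.Errorf("cannot add %s to %s", o.Ticker, c.Ticker)
	}
	c.Whole += o.Whole
	c.Fractional += o.Fractional
	return c.normalize(), nil
}

// normalize carries whole units out of Fractional and then makes the
// signs of Whole and Fractional agree.
func (c Coin) normalize() Coin {
	c.Whole += c.Fractional / fracUnit
	c.Fractional %= fracUnit
	if c.Whole > 0 && c.Fractional < 0 {
		c.Whole--
		c.Fractional += fracUnit
	} else if c.Whole < 0 && c.Fractional > 0 {
		c.Whole++
		c.Fractional -= fracUnit
	}
	return c
}
```

For example, 1.6 CASH plus 2.5 CASH carries one unit out of the fractional part and gives Whole 4, Fractional 100000000.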

This is quite a productive workflow, and I recommend trying it out. If you find it doesn’t work for you, you can try other approaches, such as copying the protobuf-generated structs into custom-written structs you like, and then copying back into the protobuf structs for serialization. You can also experiment with gogo-protobuf’s extension options in your .proto files to shape the autogenerated code into exactly the form you want.
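A minimal sketch of the copy-in/copy-out approach mentioned above. It assumes a hypothetical domain type Amount and a stand-in pbCoin for the generated struct; neither is part of weave.

```go
package main

// pbCoin stands in for the protobuf-generated struct.
type pbCoin struct {
	Whole      int64
	Fractional int64
	Ticker     string
}

// Amount is a hypothetical hand-written domain type, free to have
// whatever field names and methods you like.
type Amount struct {
	Currency string
	Units    int64 // whole units
	Nanos    int64 // fractional part
}

// toProto copies the domain value into the serializable struct.
func (a Amount) toProto() *pbCoin {
	return &pbCoin{Whole: a.Units, Fractional: a.Nanos, Ticker: a.Currency}
}

// amountFromProto copies back after deserialization.
func amountFromProto(p *pbCoin) Amount {
	return Amount{Currency: p.Ticker, Units: p.Whole, Nanos: p.Fractional}
}
```

The price of this freedom is conversion boilerplate that must be kept in sync with the .proto file by hand.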

Notes about oneof

oneof is a powerful feature to produce union/sum types in your protobuf structures. For example, you may have a public key that can use one of many different algorithms; you can define a case for each and switch on them at runtime. We also use this in the transaction, to enumerate the set of possible messages that can be embedded in it. A transaction holds exactly one of them, and it serializes and deserializes properly. Type safety is enforced at compile time, and we can switch on the kind at runtime, which is quite nice. (Example from bcp-demo):

oneof sum {
  cash.SendMsg send_msg = 1;
  namecoin.CreateTokenMsg new_token_msg = 2;
  namecoin.SetWalletNameMsg set_name_msg = 3;
  escrow.CreateMsg create_escrow_msg = 4;
  escrow.ReleaseMsg release_escrow_msg = 5;
  escrow.ReturnMsg return_escrow_msg = 6;
  escrow.UpdatePartiesMsg update_escrow_msg = 7;
}

The only problem is that the generated code is ugly to some people’s eyes. This is because there is no clean way to express sum types in Go: you have to use an interface with a private method in order to close the set of possible types. Although some people have been so revolted by this code that they preferred to write their own serialization library, I would suggest just taking a deep breath and getting to know it. Here are the relevant pieces:

type Tx struct {
    // msg is a sum type over all allowed messages on this chain.
    //
    // Types that are valid to be assigned to Sum:
    //  *Tx_SendMsg
    //  *Tx_CreateTokenMsg
    //  *Tx_SetNameMsg
    //  *Tx_CreateMsg
    //  *Tx_ReleaseMsg
    //  *Tx_ReturnMsg
    //  *Tx_UpdateEscrowMsg
    Sum isTx_Sum `protobuf_oneof:"sum"`
...
}

type isTx_Sum interface {
    isTx_Sum()
    MarshalTo([]byte) (int, error)
    Size() int
}

type Tx_SendMsg struct {
    SendMsg *cash.SendMsg `protobuf:"bytes,1,opt,name=send_msg,json=sendMsg,oneof"`
}
type Tx_CreateTokenMsg struct {
    CreateTokenMsg *namecoin.CreateTokenMsg `protobuf:"bytes,2,opt,name=new_token_msg,json=newTokenMsg,oneof"`
}

These intermediate wrapper structs add a layer of indirection. Because only they implement isTx_Sum, the set of values tx.Sum can hold is closed, and we can safely switch over all possible cases with code like this:

sum := tx.GetSum()
switch t := sum.(type) {
case *Tx_SendMsg:
    return t.SendMsg, nil
case *Tx_SetNameMsg:
    return t.SetNameMsg, nil
case *Tx_CreateTokenMsg:
    return t.CreateTokenMsg, nil
case *Tx_CreateMsg:
    return t.CreateMsg, nil
case *Tx_ReleaseMsg:
    return t.ReleaseMsg, nil
case *Tx_ReturnMsg:
    return t.ReturnMsg, nil
case *Tx_UpdateEscrowMsg:
    return t.UpdateEscrowMsg, nil
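default:
    // sum is nil (no message set) or of an unexpected type
    return nil, fmt.Errorf("unknown message type: %T", sum)
}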
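Stripped of the serialization methods, the pattern fits in a few lines. The sketch below is a self-contained reduction of the generated code with just two cases, using stand-in SendMsg and SetNameMsg types rather than the real cash and namecoin messages; GetMsg is a hypothetical helper name. The unexported marker method is what closes the set, since types from other packages cannot implement it directly.

```go
package main

import "fmt"

// Stand-ins for the real message types.
type SendMsg struct{ Amount int64 }
type SetNameMsg struct{ Name string }

// isTx_Sum is the closed interface: its only method is unexported.
type isTx_Sum interface{ isTx_Sum() }

// One wrapper struct per oneof case, as in the generated code.
type Tx_SendMsg struct{ SendMsg *SendMsg }
type Tx_SetNameMsg struct{ SetNameMsg *SetNameMsg }

func (*Tx_SendMsg) isTx_Sum()    {}
func (*Tx_SetNameMsg) isTx_Sum() {}

// Assigning &SendMsg{} directly to Sum would not compile, because
// SendMsg lacks the marker method; only the wrappers fit.
type Tx struct {
	Sum isTx_Sum
}

// GetSum is nil-safe, like the generated getters.
func (tx *Tx) GetSum() isTx_Sum {
	if tx != nil {
		return tx.Sum
	}
	return nil
}

// GetMsg unwraps whichever case is set; a nil Sum hits the default.
func (tx *Tx) GetMsg() (interface{}, error) {
	switch t := tx.GetSum().(type) {
	case *Tx_SendMsg:
		return t.SendMsg, nil
	case *Tx_SetNameMsg:
		return t.SetNameMsg, nil
	default:
		return nil, fmt.Errorf("unknown or empty message: %T", t)
	}
}
```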