【Protobuf】Protocol Buffers 入门

Posted by 西维蜀黍 on 2020-01-29, Last Modified on 2021-09-21

Protocol Buffers (Protobuf)

Protocol Buffers (Protobuf) is a free and open source cross-platform library used to serialize structured data. It is useful in developing programs to communicate with each other over a network or for storing data.

The method involves an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.

Install

1 安装 protoc

https://github.com/protocolbuffers/protobuf/releases 下载。

bin目录下的protoc.exe拷贝到GOPATH/bin目录下,将include/目录下的google文件夹拷贝GOPATH/src目录下(只有使用protobuf的一些内置结构才需要用到该文件夹内的文件,这次并不会用到这个文件夹)。

2 安装proto-gen-go

安装protobuf的编译器插件protoc-gen-go

protoc程序会调用protoc-gen-go,将.proto文件生成golang代码。可以使用go get命令安装:

$ go get -u github.com/golang/protobuf/proto
$ go install google.golang.org/protobuf/cmd/protoc-gen-go

安装成功后,会在GOPATH/bin下生成protoc-gen-go程序。

The compiler plugin protoc-gen-go will be installed in $GOBIN, defaulting to $GOPATH/bin. It must be in your $PATH for the protocol compiler protoc to find it.

# 然后将环境变量GOPATH定义的目录下的bin目录加入到环境变量PATH中。
$ vi  ~/.bashrc
# 然后在该文件最后加上:export PATH="$PATH:$GOPATH/bin"即可。
$ source ~/.bashrc

Usage

1 定义一个proto文件test.proto

proto2

package example;

enum FOO {
    X = 17;
};

message Test {
    required string label = 1;
    optional int32 type = 2 [default = 77];
    repeated int64 reps = 3;
    optional group OptionalGroup = 4 {
        required string RequiredField = 5;
    }
}

proto3

2 编译 .proto 文件(生成go文件)

$ protoc --go_out=. *.proto

执行 protoc,并使用 --go_out 选项指定输出目录,即可生成 Go 源码文件。因为安装了 protoc-gen-go 之后,--go_out 选项会自动搜索 protoc-gen-go,只要其在 PATH 目录中可以找到即可。

如果这个命令执行过程中出现 protoc-gen-go: program not found or is not executable 这个错误,表示该protoc-gen-go没有被加入到Path环境变量中,应该把该文件的所在目录加入Path变量中。该文件存放在环境变量GOPATH目录下的bin子目录里。

--go_out 支持以下参数

  • plugins=plugin1+plugin2 指定插件,目前只支持 grpc,即:plugins=grpc
  • M 参数 指定导入的.proto文件路径编译后对应的golang包名(不指定本参数默认就是.proto文件中import语句的路径)
  • import_prefix=xxx 为所有 import 路径添加前缀,主要用于编译子目录内的多个 proto 文件,这个参数按理说很有用,尤其适用替代一些情况时的 M 参数。
  • import_path=foo/bar 用于指定未声明 package 或 go_package 的文件的包名,最右面的斜线前的字符会被忽略

https://developers.google.com/protocol-buffers/docs/overview#generating

Protobuf 消息定义

消息由至少一个字段组合而成,类似于C语言中的结构。每个字段都有一定的格式。

字段格式:限定修饰符 | 数据类型 | 字段名称 | = | 字段编码值 | [字段默认值]

Let’s say you want to define a search request message format, where each search request has a query string, the particular page of results you are interested in, and a number of results per page. Here’s the .proto file you use to define the message type.

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data that you want to include in this type of message. Each field has a name and a type.

限定修饰符(Field Rules)

You specify that message fields are one of the following:

  • required: a well-formed message must have exactly one of this field.
  • optional: a well-formed message can have zero or one of this field (but not more than one).
  • repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.

Required

Required表示是一个必须字段,必须相对于发送方,在发送消息之前必须设置该字段的值,对于接收方,必须能够识别该字段的意思。发送之前没有设置required字段或者无法识别required字段都会引发编解码异常,导致消息被丢弃。

Required Is Forever You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.

Optional

Optional 表示是一个可选字段,可选对于发送方,在发送消息时,可以有选择性的设置或者不设置该字段的值。对于接收方,如果能够识别可选字段就进行相应的处理,如果无法识别,则忽略该字段,消息中的其它字段正常处理。—因为optional字段的特性,很多接口在升级版本中都把后来添加的字段都统一的设置为optional字段,这样老的版本无需升级程序也可以正常的与新的软件进行通信,只不过新的字段无法识别而已,因为并不是每个节点都需要新的功能,因此可以做到按需升级和平滑过渡。

Repeated

Repeated 表示该字段可以包含0~N个元素。其特性和optional一样,但是每一次可以包含多个值。可以看作是在传递一个数组的值。

message Result {
  repeated string snippets = 3;
}

Data Types - 标量(scular)

.proto Type Notes C++ Type Java Type Python Type[2] Go Type
double double double float *float64
float float float float *float32
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int *int32
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long[3] *int64
uint32 Uses variable-length encoding. uint32 int[1] int/long[3] *uint32
uint64 Uses variable-length encoding. uint64 long[1] int/long[3] *uint64
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int *int32
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 long int/long[3] *int64
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 228. uint32 int[1] int/long[3] *uint32
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 256. uint64 long[1] int/long[3] *uint64
sfixed32 Always four bytes. int32 int int *int32
sfixed64 Always eight bytes. int64 long int/long[3] *int64
bool bool boolean bool *bool
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String unicode (Python 2) or str (Python 3) *string
bytes May contain any arbitrary sequence of bytes. string ByteString bytes []byte

enum

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3 [default = 10];
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  optional Corpus corpus = 4 [default = UNIVERSAL];
}

See https://developers.google.com/protocol-buffers/docs/proto#enum for more details.

Message Types

You can use other message types as field types. For example, let’s say you wanted to include Result messages in each SearchResponse message – to do this, you can define a Result message type in the same .proto and then specify a field of type Result in SearchResponse:

message SearchResponse {
  repeated Result result = 1;
}

message Result {
  required string url = 1;
  optional string title = 2;
  repeated string snippets = 3;
}

Maps

If you want to create an associative map as part of your data definition, protocol buffers provides a handy shortcut syntax:

map<key_type, value_type> map_field = N;

…where the key_type can be any integral or string type (so, any scalar type except for floating point types and bytes). Note that enum is not a valid key_type. The value_type can be any type except another map.

So, for example, if you wanted to create a map of projects where each Project message is associated with a string key, you could define it like this:

map<string, Project> projects = 3;

The generated map API is currently available for all proto2 supported languages. You can find out more about the map API for your chosen language in the relevant API reference.

Maps Features

  • Extensions are not supported for maps.
  • Maps cannot be repeated, optional, or required.
  • Wire format ordering and map iteration ordering of map values is undefined, so you cannot rely on your map items being in a particular order.
  • When generating text format for a .proto, maps are sorted by key. Numeric keys are sorted numerically.
  • When parsing from the wire or when merging, if there are duplicate map keys the last key seen is used. When parsing a map from text format, parsing may fail if there are duplicate keys.

Comments

To add comments to your .proto files, use C/C++-style // and /* ... */ syntax.

/* SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response. */

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;  // Which page number do we want?
  optional int32 result_per_page = 3;  // Number of results to return per page.
}

Default Values

As mentioned above, elements in a message description can be labeled optional. A well-formed message may or may not contain an optional element. When a message is parsed, if it does not contain an optional element, the corresponding field in the parsed object is set to the default value for that field. The default value can be specified as part of the message description. For example, let’s say you want to provide a default value of 10 for a SearchRequest’s result_per_page value.

optional int32 result_per_page = 3 [default = 10];

If the default value is not specified for an optional element, a type-specific default value is used instead: for strings, the default value is the empty string. For bytes, the default value is the empty byte string. For bools, the default value is false. For numeric types, the default value is zero. For enums, the default value is the first value listed in the enum’s type definition. This means care must be taken when adding a value to the beginning of an enum value list. See the Updating A Message Type section for guidelines on how to safely change definitions.

Others

Updating A Message Type - Compatibility

If an existing message type no longer meets all your needs – for example, you’d like the message format to have an extra field – but you’d still like to use code created with the old format, don’t worry! It’s very simple to update message types without breaking any of your existing code. Just remember the following rules:

  • Don’t change the field numbers for any existing fields.
  • Any new fields that you add should be optional or repeated. This means that any messages serialized by code using your “old” message format can be parsed by your new generated code, as they won’t be missing any required elements. You should set up sensible default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. However, the unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it – so if the message is passed on to new code, the new fields are still available.
  • Non-required fields can be removed, as long as the field number is not used again in your updated message type. You may want to rename the field instead, perhaps adding the prefix “OBSOLETE_”, or make the field number reserved, so that future users of your .proto can’t accidentally reuse the number.
  • A non-required field can be converted to an extension and vice versa, as long as the type and number stay the same.
  • int32, uint32, int64, uint64, and bool are all compatible – this means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility. If a number is parsed from the wire which doesn’t fit in the corresponding type, you will get the same effect as if you had cast the number to that type in C++ (e.g. if a 64-bit number is read as an int32, it will be truncated to 32 bits).
  • sint32 and sint64 are compatible with each other but are not compatible with the other integer types.
  • string and bytes are compatible as long as the bytes are valid UTF-8.
  • Embedded messages are compatible with bytes if the bytes contain an encoded version of the message.
  • fixed32 is compatible with sfixed32, and fixed64 with sfixed64.
  • For string, bytes, and message fields, optional is compatible with repeated. Given serialized data of a repeated field as input, clients that expect this field to be optional will take the last input value if it’s a primitive type field or merge all input elements if it’s a message type field. Note that this is not generally safe for numeric types, including bools and enums. Repeated fields of numeric types can be serialized in the packed format, which will not be parsed correctly when an optional field is expected.
  • Changing a default value is generally OK, as long as you remember that default values are never sent over the wire. Thus, if a program receives a message in which a particular field isn’t set, the program will see the default value as it was defined in that program’s version of the protocol. It will NOT see the default value that was defined in the sender’s code.
  • enum is compatible with int32, uint32, int64, and uint64 in terms of wire format (note that values will be truncated if they don’t fit), but be aware that client code may treat them differently when the message is deserialized. Notably, unrecognized enum values are discarded when the message is deserialized, which makes the field’s has.. accessor return false and its getter return the first value listed in the enum definition, or the default value if one is specified. In the case of repeated enum fields, any unrecognized values are stripped out of the list. However, an integer field will always preserve its value. Because of this, you need to be very careful when upgrading an integer to an enum in terms of receiving out of bounds enum values on the wire.
  • In the current Java and C++ implementations, when unrecognized enum values are stripped out, they are stored along with other unknown fields. Note that this can result in strange behavior if this data is serialized and then reparsed by a client that recognizes these values. In the case of optional fields, even if a new value was written after the original message was deserialized, the old value will be still read by clients that recognize it. In the case of repeated fields, the old values will appear after any recognized and newly-added values, which means that order will not be preserved.
  • Changing a single optional value into a member of a new oneof is safe and binary compatible. Moving multiple optional fields into a new oneof may be safe if you are sure that no code sets more than one at a time. Moving any fields into an existing oneof is not safe.
  • Changing a field between a map<K, V> and the corresponding repeated message field is binary compatible (see Maps, below, for the message layout and other restrictions). However, the safety of the change is application-dependent: when deserializing and reserializing a message, clients using the repeated field definition will produce a semantically identical result; however, clients using the map field definition may reorder entries and drop entries with duplicate keys.

Namespaces

Importing Definitions

In the above example, the Result message type is defined in the same file as SearchResponse – what if the message type you want to use as a field type is already defined in another .proto file?

You can use definitions from other .proto files by importing them. To import another .proto’s definitions, you add an import statement to the top of your file:

import "myproject/other_protos.proto";

By default you can only use definitions from directly imported .proto files. However, sometimes you may need to move a .proto file to a new location. Instead of moving the .proto file directly and updating all the call sites in a single change, now you can put a dummy .proto file in the old location to forward all the imports to the new location using the import public notion. import public dependencies can be transitively relied upon by anyone importing the proto containing the import public statement. For example:

// new.proto
// All definitions are moved here
// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";
// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

The protocol compiler searches for imported files in a set of directories specified on the protocol compiler command line using the -I/--proto_path flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the --proto_path flag to the root of your project and use fully qualified names for all imports.

Packages

You can add an optional package specifier to a .proto file to prevent name clashes between protocol message types.

package foo.bar;
message Open { ... }

You can then use the package specifier when defining fields of your message type:

message Foo {
  ...
  required foo.bar.Open open = 1;
  ...
}

The way a package specifier affects the generated code depends on your chosen language:

  • In C++ the generated classes are wrapped inside a C++ namespace. For example, Open would be in the namespace foo::bar.
  • In Java, the package is used as the Java package, unless you explicitly provide a option java_package in your .proto file.
  • In Python, the package directive is ignored, since Python modules are organized according to their location in the file system.
  • In Go, the package directive is ignored, and the generated .pb.go file is in the package named after the corresponding go_proto_library rule.

Note that even when the package directive does not directly affect the generated code, for example in Python, it is still strongly recommended to specify the package for the .proto file, as otherwise it may lead to naming conflicts in descriptors and make the proto not portable for other languages.

Demo

test.proto

syntax = "proto3";  //指定版本,必须要写(proto3、proto2)  
package proto;

enum FOO 
{ 
    X = 0; 
};

//message是固定的。UserInfo是类名,可以随意指定,符合规范即可
message UserInfo{
    string message = 1;   //消息
    int32 length = 2;    //消息大小
    int32 cnt = 3;      //消息计数
}

client_protobuf.go

package main

import (
    "bufio"
    "fmt"
    "net"
    "os"
    	stProto "proto"
    "time"

    //protobuf编解码库,下面两个库是相互兼容的,可以使用其中任意一个
    "github.com/golang/protobuf/proto"
    //"github.com/gogo/protobuf/proto"
)

func main() {
    strIP := "localhost:6600"
    var conn net.Conn
    var err error

    //连接服务器
    for conn, err = net.Dial("tcp", strIP); err != nil; conn, err = net.Dial("tcp", strIP) {
        fmt.Println("connect", strIP, "fail")
        time.Sleep(time.Second)
        fmt.Println("reconnect...")
    }
    fmt.Println("connect", strIP, "success")
    defer conn.Close()

    //发送消息
    cnt := 0
    sender := bufio.NewScanner(os.Stdin)
    for sender.Scan() {
        cnt++
        stSend := &stProto.UserInfo{
            Message: sender.Text(),
            Length:  *proto.Int(len(sender.Text())),
            Cnt:     *proto.Int(cnt),
        }

        //protobuf编码
        pData, err := proto.Marshal(stSend)
        if err != nil {
            panic(err)
        }

        //发送
        conn.Write(pData)
        if sender.Text() == "stop" {
            return
        }
    }
}

server_protobuf.go

package main

import (
    "fmt"
    "net"
    "os"
    stProto "proto"

    //protobuf编解码库,下面两个库是相互兼容的,可以使用其中任意一个
    "github.com/golang/protobuf/proto"
    //"github.com/gogo/protobuf/proto"
)

func main() {
    //监听
    listener, err := net.Listen("tcp", "localhost:6600")
    if err != nil {
        panic(err)
    }

    for {
        conn, err := listener.Accept()
        if err != nil {
            panic(err)
        }
        fmt.Println("new connect", conn.RemoteAddr())
        go readMessage(conn)
    }
}

//接收消息
func readMessage(conn net.Conn) {
    defer conn.Close()
    buf := make([]byte, 4096, 4096)
    for {
        //读消息
        cnt, err := conn.Read(buf)
        if err != nil {
            panic(err)
        }

        stReceive := &stProto.UserInfo{}
        pData := buf[:cnt]

        //protobuf解码
        err = proto.Unmarshal(pData, stReceive)
        if err != nil {
            panic(err)
        }

        fmt.Println("receive", conn.RemoteAddr(), stReceive)
        if stReceive.Message == "stop" {
            os.Exit(1)
        }
    }
}

Reference