protobuf.kmcd.dev

Basics

Most teams do not hand-write serializers. They generate code from .proto files. The generated code provides typed message constructors, binary serialization, JSON mapping, and service bindings depending on the plugin.

proto/demo/v1/user.proto
syntax = "proto3";

package demo.v1;

message User {
  string id = 1;
  string username = 2;
  bool is_active = 3;
}

From Contract to Runtime API

This schema defines a User message with three fields. Each field has a type, a generated-code name, and a stable field number used by the binary format.

Once code is generated from this schema, you can:

  • Instantiate: Create User objects in your language with type checking and editor support.
  • Serialize: Convert objects into compact binary buffers for transmission or storage.
  • Validate: Apply schema and business rules before data reaches application logic.

Option 1: The Classic protoc

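The protoc compiler is the original tool for Protobuf. It requires manual management of plugins and CLI flags. Passing --es_out makes protoc look for a plugin binary named protoc-gen-es on your system's PATH. A local npm install places that binary in node_modules/.bin, so either add that directory to PATH or point protoc at the binary with the --plugin flag.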

INSTALL PLUGIN
$ npm install --save-dev @bufbuild/protoc-gen-es
GENERATE CODE
$ protoc -I proto --es_out=src/gen --es_opt=target=ts proto/demo/v1/user.proto

Option 2: The Modern buf

buf keeps generation declarative with a buf.gen.yaml file, making the workflow reproducible and easier to share across a team.

buf.gen.yaml
version: v2
plugins:
  - remote: buf.build/bufbuild/es:v2.12.0
    out: src/gen
    opt: target=ts
GENERATE CODE
$ buf generate

Using the Generated Code

With code generated by protobuf-es, the schema becomes a native TypeScript API.

src/main.ts
import { create, toBinary, toJsonString } from "@bufbuild/protobuf";
import { UserSchema } from "./gen/demo/v1/user_pb";

const user = create(UserSchema, {
  id: "usr_123",
  username: "cyber_ninja",
  isActive: true,
});

const bytes = toBinary(UserSchema, user);
const json = toJsonString(UserSchema, user);

Different languages and runtimes

The same schema-first workflow applies across supported languages, but import paths, package names, generated types, and runtime APIs differ by ecosystem.

Messages are the primary logical structure in Protobuf. They act as containers for your data, analogous to a struct in C/Rust or a class in Java/TypeScript.

Think of a message as a strictly-enforced contract. Once defined in a schema, the Protobuf compiler ensures that every system interacting with this data, regardless of the programming language, agrees on its structure.

One of the best features of Protobuf is that it's designed to be evolvable. You can add new fields to messages without breaking existing code: old readers skip fields they do not recognize, and new readers see default values for fields that old writers never set. This lets servers and clients upgrade at their own pace, a level of compatibility that many other binary formats don't offer out of the box.

SCHEMA_DEFINITION
// A simple message definition
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

Every field in a message requires a specific type (e.g., string, int32, bool) and a name.

Because Protobuf is strictly typed, generated code catches many of the data-type errors that would otherwise only show up at runtime with formats like JSON. If a client's schema declares an integer field, its generated code never hands the application a string in that field.

Names keep your code readable, but they never appear in the binary encoding; only field numbers do. This allows you to rename fields in your schema without breaking binary compatibility, though a rename changes the generated code and may break consumers of name-based formats such as JSON.

FIELD_DEFINITIONS
message User {
  string username = 1;
  bool is_active = 2;
  uint32 login_count = 3;
}
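
To see why a rename is invisible to binary consumers, here is a minimal hand-rolled sketch of how a short string field could be laid out on the wire. The helper encodeShortStringField is hypothetical (generated code normally does this for you), and it assumes ASCII text under 128 bytes and a field number below 16 so that the tag and the length each fit in a single byte.

```typescript
// Wire type 2 (LEN) marks a length-delimited payload such as a string.
const WIRE_TYPE_LEN = 2;

// Hypothetical helper: lays out a field as tag, length, payload.
// Assumes ASCII text under 128 bytes and a field number below 16.
function encodeShortStringField(fieldNumber: number, text: string): number[] {
  const out: number[] = [(fieldNumber << 3) | WIRE_TYPE_LEN, text.length];
  for (let i = 0; i < text.length; i++) {
    out.push(text.charCodeAt(i));
  }
  return out;
}

// Field 1 of User is "username", but only the number 1 is passed in.
const usernameBytes = encodeShortStringField(1, "hiro");
// usernameBytes holds 0x0a 0x04 0x68 0x69 0x72 0x6f: tag, length, then "hiro"
```

The field name is not an input to the encoder at all, which is why changing it in the schema leaves existing binary payloads readable.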

Field numbers are the most critical part of a Protobuf message. Instead of sending long string names (like "username") over the wire, Protobuf sends only this integer, combined with a wire type into a short "tag".

This tag is a large part of what makes Protobuf so compact. Because these numbers identify fields, they must never be changed once a message type is in use. Reusing a number for a different field makes old payloads decode into the wrong field and silently corrupts data, so mark retired numbers as reserved to keep them from being reused.

Optimization Tip: Numbers 1 through 15 take 1 byte to encode (including the field number and wire type). Numbers 16 through 2047 take 2 bytes. Use 1-15 for your most frequently sent fields!

WIRE_IDENTITY
message User {
  // "1" is the ID on the wire
  string id = 1;
  
  // Small numbers (1-15) take 1 byte to encode
  string name = 2;
}
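
The 1-byte and 2-byte boundaries fall out of how a tag is built: the field number is shifted left by three bits, the wire type fills the low three bits, and the result is written as a varint. The sketch below is a minimal illustration with hypothetical helper names (encodeVarint, encodeTag), not code taken from any Protobuf runtime.

```typescript
// Encodes a non-negative integer as a base-128 varint (7 payload bits per byte).
function encodeVarint(value: number): number[] {
  const out: number[] = [];
  let v = value;
  while (v >= 0x80) {
    out.push((v % 0x80) | 0x80); // set the continuation bit
    v = Math.floor(v / 0x80);
  }
  out.push(v);
  return out;
}

// A tag is (fieldNumber << 3) | wireType, written as a varint.
function encodeTag(fieldNumber: number, wireType: number): number[] {
  return encodeVarint(fieldNumber * 8 + wireType);
}

const tagSizeField15 = encodeTag(15, 2).length;     // 1 byte
const tagSizeField16 = encodeTag(16, 2).length;     // 2 bytes
const tagSizeField2047 = encodeTag(2047, 2).length; // 2 bytes
const tagSizeField2048 = encodeTag(2048, 2).length; // 3 bytes
```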

Enums allow you to define a restricted set of named constants. This is crucial for states, roles, or configurations.

In proto3, the first constant must always map to zero. This serves as the default value when the field is not explicitly set in the binary payload.

Naming Convention

Protobuf enum values follow C++ scoping rules: they are siblings of the enum type rather than children of it, so two enums in the same package cannot both declare a value named ACTIVE. To avoid such collisions, it is a best practice to prefix values with the enum name.

Open Enums: proto3 enums are defined as "open", meaning that if a server sends a value a client doesn't recognize, the client preserves the numeric value instead of failing.

ENUM_DEFINITIONS
enum Status {
  // Prefixing avoids collisions
  STATUS_UNSPECIFIED = 0;
  STATUS_ACTIVE = 1;
  STATUS_DEFERRED = 2;
}

message User {
  Status current_status = 1;
}
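
As a rough illustration of open-enum handling, the sketch below maps a numeric wire value back to a name and keeps the raw number when the value is unknown. STATUS_NAMES and describeStatus are hypothetical names for this example, and real generated code exposes enums differently depending on the plugin.

```typescript
// Hypothetical mirror of the Status enum above, keyed by wire value.
const STATUS_NAMES: { [wireValue: number]: string } = {
  0: "STATUS_UNSPECIFIED",
  1: "STATUS_ACTIVE",
  2: "STATUS_DEFERRED",
};

// Returns the known name, or a placeholder that still carries the number
// so that an unrecognized value is preserved rather than dropped.
function describeStatus(wireValue: number): string {
  const known = STATUS_NAMES[wireValue];
  return known !== undefined ? known : "UNRECOGNIZED(" + wireValue + ")";
}

const knownStatus = describeStatus(1);  // "STATUS_ACTIVE"
const futureStatus = describeStatus(7); // "UNRECOGNIZED(7)"
```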

As your project grows, you'll likely have many messages with similar names. Protobuf uses package declarations to prevent name clashes.

These packages often map directly to namespaces in C++, packages in Go/Java, or modules in TypeScript. They are essential for organizing large-scale schemas and ensuring that an Account in the billing service doesn't conflict with an Account in the identity service.

PACKAGE_DECLARATION
syntax = "proto3";

// Defines the namespace
package demo.identity.v1;

message Account {
  string id = 1;
}

Protobuf supports complex, hierarchical data structures. You can define messages within other messages, or use previously defined messages as field types.

This composition lets you build reusable data models. For example, a Location message can be used across User, Event, and Office messages.

On the wire, embedded messages are "length-delimited", allowing decoders to skip the entire sub-message if they don't have the schema for it.

COMPOSITIONAL_SCHEMA
message Result {
  string url = 1;
  string title = 2;
}

message SearchResponse {
  // Result is embedded here
  Result top_result = 1;
}
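
The skipping behavior can be sketched with a tiny decoder that walks top-level fields and jumps over length-delimited payloads without interpreting them. This is a simplified illustration: readVarint and topLevelFieldNumbers are hypothetical helpers, the sample bytes are hand-built, and the trailing field 2 is an assumed extra field that is not part of SearchResponse above.

```typescript
// Reads one varint starting at offset; returns its value and the next offset.
function readVarint(bytes: number[], offset: number): { value: number; next: number } {
  let value = 0;
  let multiplier = 1;
  let pos = offset;
  while (true) {
    if (pos >= bytes.length) throw new Error("truncated varint");
    const byte = bytes[pos];
    pos++;
    value += (byte & 0x7f) * multiplier;
    multiplier *= 128;
    if ((byte & 0x80) === 0) break;
  }
  return { value: value, next: pos };
}

// Lists top-level field numbers, skipping every payload unparsed.
function topLevelFieldNumbers(bytes: number[]): number[] {
  const seen: number[] = [];
  let pos = 0;
  while (pos < bytes.length) {
    const tag = readVarint(bytes, pos);
    const wireType = tag.value % 8;
    pos = tag.next;
    if (wireType === 0) {
      pos = readVarint(bytes, pos).next;   // varint value
    } else if (wireType === 2) {
      const len = readVarint(bytes, pos);
      pos = len.next + len.value;          // jump over the whole sub-message
    } else if (wireType === 1) {
      pos += 8;                            // 64-bit fixed
    } else if (wireType === 5) {
      pos += 4;                            // 32-bit fixed
    } else {
      throw new Error("unsupported wire type " + wireType);
    }
    seen.push(Math.floor(tag.value / 8));
  }
  return seen;
}

// Field 1: a 6-byte embedded Result {url: "a", title: "b"}; field 2: varint 7 (assumed).
const responseBytes = [0x0a, 0x06, 0x0a, 0x01, 0x61, 0x12, 0x01, 0x62, 0x10, 0x07];
const fieldNumbers = topLevelFieldNumbers(responseBytes); // [1, 2]
```

The decoder never looks inside the embedded Result; the length prefix alone tells it where the next field starts.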

To represent an array or list of items, use the repeated keyword. These fields can contain zero or more elements of the specified type.

In modern Protobuf, repeated scalar numeric fields (like int32, float, etc.) are "packed" by default. Instead of repeating the field tag for every element, they are stored as one single block with a length prefix. This is significantly more efficient for large arrays.

COLLECTION_SYNTAX
message SearchResponse {
  // A list of strings
  repeated string related_queries = 1;
  
  // A list of messages
  repeated Result results = 2;
}
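
To make the saving from packing concrete, the sketch below encodes the same ten small integers both ways and compares the sizes. It assumes a hypothetical field such as repeated int32 scores = 3, which is not part of the schema above, and the helper names are illustrative only.

```typescript
// Encodes a non-negative integer as a base-128 varint.
function encodeVarint(value: number): number[] {
  const out: number[] = [];
  let v = value;
  while (v >= 0x80) {
    out.push((v % 0x80) | 0x80);
    v = Math.floor(v / 0x80);
  }
  out.push(v);
  return out;
}

// Unpacked: every element carries its own tag (wire type 0, varint).
function encodeUnpacked(fieldNumber: number, values: number[]): number[] {
  let out: number[] = [];
  for (let i = 0; i < values.length; i++) {
    out = out.concat(encodeVarint(fieldNumber * 8 + 0), encodeVarint(values[i]));
  }
  return out;
}

// Packed: one tag (wire type 2, LEN), one length prefix, then all elements.
function encodePacked(fieldNumber: number, values: number[]): number[] {
  let payload: number[] = [];
  for (let i = 0; i < values.length; i++) {
    payload = payload.concat(encodeVarint(values[i]));
  }
  return encodeVarint(fieldNumber * 8 + 2).concat(encodeVarint(payload.length), payload);
}

const scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const unpackedSize = encodeUnpacked(3, scores).length; // 20 bytes
const packedSize = encodePacked(3, scores).length;     // 12 bytes
```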

Protobuf provides native support for associative maps (dictionaries). However, there are strict rules for map keys and values:

  • Keys: Can be any integral type, bool, or string. Messages, enums, floats, doubles, and bytes cannot be keys.
  • Values: Can be any type, including another message, but cannot be another map or a repeated field.

Behind the scenes, maps are actually just repeated messages with key and value fields, ensuring backward compatibility with older decoders.

MAP_STRUCTURE
message Project {
  string name = 1;
  
  // A dictionary of string keys to string values
  map<string, string> labels = 2;
}
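
The "repeated messages" representation can be sketched by hand-encoding a single labels entry: the map field itself is a length-delimited field whose payload is a small message with the key in field 1 and the value in field 2. The helpers below (asciiBytes, lenField, encodeLabelEntry) are hypothetical and assume ASCII strings short enough that every length fits in one byte.

```typescript
// Converts an ASCII string to its byte values.
function asciiBytes(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    out.push(text.charCodeAt(i));
  }
  return out;
}

// Length-delimited field: tag (wire type 2), length, payload.
// Assumes field numbers below 16 and payloads under 128 bytes.
function lenField(fieldNumber: number, payload: number[]): number[] {
  return [(fieldNumber << 3) | 2, payload.length].concat(payload);
}

// One map entry is an embedded message: key = field 1, value = field 2.
// The outer field number 2 matches "labels" in the Project message.
function encodeLabelEntry(key: string, value: string): number[] {
  const entry = lenField(1, asciiBytes(key)).concat(lenField(2, asciiBytes(value)));
  return lenField(2, entry);
}

const labelBytes = encodeLabelEntry("env", "prod");
// 13 bytes: outer tag 0x12, length 11, then the key and value fields
```

A map with several entries would simply repeat this structure once per entry, which is what lets a decoder without map support read it as a repeated message field.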

If you have a message with multiple fields where only one can be set at a time, you can enforce this behavior and save memory using the oneof keyword.

Setting any field within the oneof automatically clears all other fields in that same oneof. This is Protobuf's equivalent to a tagged union or variant.

This is perfect for polymorphism, such as an Event that could be a ClickEvent, HoverEvent, or ScrollEvent.

MUTUAL_EXCLUSION
message ErrorStatus {
  string message = 1;
  
  oneof details {
    string stack_trace = 2;
    int32 error_code = 3;
  }
}
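
In TypeScript, a oneof maps naturally onto a discriminated union. The sketch below is a hand-written model of the ErrorStatus message for illustration; the type names and the case/value shape are assumptions for this example, and the exact shape of generated code depends on the plugin you use.

```typescript
// Exactly one member (or none) can be present at a time.
type ErrorDetails =
  | { case: "stackTrace"; value: string }
  | { case: "errorCode"; value: number }
  | { case: undefined };

// Hypothetical hand-written stand-in for the generated ErrorStatus type.
interface ErrorStatusSketch {
  message: string;
  details: ErrorDetails;
}

// The compiler narrows the value type based on the case label.
function describeDetails(status: ErrorStatusSketch): string {
  const details = status.details;
  switch (details.case) {
    case "stackTrace":
      return "trace: " + details.value;
    case "errorCode":
      return "code: " + details.value;
    default:
      return "no details";
  }
}

const failure: ErrorStatusSketch = {
  message: "boom",
  details: { case: "stackTrace", value: "at main()" },
};
// Assigning a new case replaces the previous one, mirroring how setting
// one oneof member clears the others.
failure.details = { case: "errorCode", value: 503 };
const failureSummary = describeDetails(failure); // "code: 503"
```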

Protobuf's type system is designed for both strictly-enforced contracts and maximum binary efficiency. Basic Types (scalars) map directly to standard primitives in your programming language.

Well-Known Types (WKTs) are specialized schemas standardized by Google. They are assumed to be known by all Protobuf compilers and have specialized JSON mappings to ensure clean, idiomatic integration with web APIs.

Numeric Types

int32 / int64
Signed integers. Uses variable-length encoding (varint).
uint32 / uint64
Unsigned integers. Efficient for positive-only values.
sint32 / sint64
Signed integers. More efficient for negative numbers via ZigZag.
fixed32 / fixed64
Always 4/8 bytes. More compact than a varint when values are often larger than 2^28 (fixed32) or 2^56 (fixed64).
float / double
32-bit and 64-bit IEEE 754 floating point numbers.
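
The 2^28 threshold for fixed32 follows from varints carrying 7 payload bits per byte. As a minimal sketch (varintSize is a hypothetical helper, not a library function), counting the bytes a varint needs shows where it overtakes the constant 4 bytes of a fixed32:

```typescript
// Number of bytes a non-negative integer occupies as a varint.
function varintSize(value: number): number {
  let size = 1;
  let v = value;
  while (v >= 128) {
    v = Math.floor(v / 128);
    size++;
  }
  return size;
}

const FIXED32_SIZE = 4;

const sizeOfSmall = varintSize(1);                         // 1 byte
const sizeBelowThreshold = varintSize(Math.pow(2, 28) - 1); // 4 bytes, ties with fixed32
const sizeAtThreshold = varintSize(Math.pow(2, 28));        // 5 bytes, fixed32 is smaller
```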

Object Types

string
Always UTF-8 encoded text. Limited to 2GB.
bytes
Raw byte sequences. Perfect for arbitrary binary data.
bool
Encoded as a varint 0 or 1.
enum
Predefined set of named integers. Defaults to 0.

Well-Known Types

google.protobuf.Empty
Used to indicate an API takes no parameters or returns nothing.
google.protobuf.Timestamp
A point in time, independent of timezone. Maps to RFC 3339 in JSON.
google.protobuf.Duration
A span of time. Maps to a string ending in 's' in JSON (e.g. '1.5s').
google.protobuf.Value
Represents a dynamically typed value, equivalent to any JSON type.
google.protobuf.Struct
Maps directly to a free-form JSON object.

There are more well-known types in the google.protobuf reference.

Guidelines for Integers

Choosing the right integer type is important for both message size and language compatibility; here are some general guidelines.

DEFAULT_CHOICE
int32 / int64

Use for typical signed integers. int32 covers most use cases; use int64 for large IDs or timestamps.

Best for: General Data

NON_NEGATIVE
uint32 / uint64

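Ideal when you know values will never be negative. In-range positive values encode exactly as they do with int32/int64; the benefits are a doubled positive range and a type that documents the value is never negative.

Best for: Counts & Sizes
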
SIGNED_ZIGZAG
sint32 / sint64

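Best when values are frequently negative. Uses ZigZag encoding to keep small negative numbers compact (1–2 bytes), unlike int32/int64 which require 10 bytes for any negative value. The trade-off is that positive values take slightly more room than they would as int32/int64.

Best for: Negative Values
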
FIXED_PRECISION
fixed32 / fixed64

Always uses 4 or 8 bytes. More efficient than varints ONLY if values are consistently greater than 2^28 (fixed32) or 2^56 (fixed64).

Best for: Large Constants
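
The difference between int32 and sint32 for negative values can be sketched in a few lines. ZigZag interleaves negative and positive numbers (0, -1, 1, -2, ... become 0, 1, 2, 3, ...) before the varint step. The helper names below are hypothetical, and the int32 size function hard-codes the 10-byte cost of a sign-extended negative value rather than encoding it.

```typescript
// ZigZag for 32-bit values: small magnitudes map to small unsigned numbers.
function zigZag32(n: number): number {
  return ((n << 1) ^ (n >> 31)) >>> 0;
}

// Number of bytes a non-negative integer occupies as a varint.
function varintSize(value: number): number {
  let size = 1;
  let v = value;
  while (v >= 128) {
    v = Math.floor(v / 128);
    size++;
  }
  return size;
}

// int32: negative values are sign-extended to 64 bits, which costs 10 bytes.
function int32WireSize(n: number): number {
  return n < 0 ? 10 : varintSize(n);
}

// sint32: ZigZag first, then varint.
function sint32WireSize(n: number): number {
  return varintSize(zigZag32(n));
}

const minusOneAsInt32 = int32WireSize(-1);   // 10 bytes
const minusOneAsSint32 = sint32WireSize(-1); // 1 byte
const plus64AsInt32 = int32WireSize(64);     // 1 byte
const plus64AsSint32 = sint32WireSize(64);   // 2 bytes: the cost of ZigZag for positives
```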

ProtoJSON Mapping

While Protobuf is primarily binary, it defines a canonical ProtoJSON mapping. This gives every message a well-defined representation in JSON.

JSON_MAPPING_RULES
Protobuf to JSON Type Mapping Rules
| Protobuf Type | JSON Type | Example | Note |
| --- | --- | --- | --- |
| message | Object | {"userName": "hiro"} | Serialized as a JSON object. Field names are mapped to lowerCamelCase by default, or the json_name option if set. |
| repeated | Array | ["a", "b"] | Serialized as a JSON array. |
| map<K, V> | Object | {"k": "v"} | Serialized as a JSON object. |
| int32, float, double | Number | 123.45 | Standard JSON numbers. |
| bool | Boolean | true | Standard JSON booleans. |
| int64, uint64 | String | "9007199254740993" | Strings prevent precision loss in JS. |
| enum | String | "ROLE_ADMIN" | Uses the string name of the enum value. |
| bytes | String | "NDI=" | Base64 encoded string. |
| google.protobuf.Timestamp | String | "2023-10-01T12:00:00Z" | RFC 3339 formatted timestamp string. |
| google.protobuf.Duration | String | "1.000340012s" | Seconds with up to 9 fractional digits. |
| google.protobuf.FieldMask | String | "f.a,f.b" | Comma-separated paths as a single string. |
| google.protobuf.Struct | Object | {"foo": "bar"} | Standard representation for a generic JSON object. |
| google.protobuf.Value | Any | "foo" or 123 | Can be any valid JSON value (null, number, string, boolean, struct, or list). |
| google.protobuf.NullValue | null | null | The JSON null value. |
| google.protobuf.Empty | Object | {} | An empty JSON object. |
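
The default lowerCamelCase mapping for field names can be approximated as: drop each underscore and capitalize the character that follows it. The function below is a simplified sketch of that rule with a hypothetical name (toLowerCamelCase); it does not cover the json_name override or every edge case a real runtime handles.

```typescript
// Simplified sketch: "user_name" becomes "userName".
function toLowerCamelCase(protoFieldName: string): string {
  let out = "";
  let upperNext = false;
  for (let i = 0; i < protoFieldName.length; i++) {
    const ch = protoFieldName.charAt(i);
    if (ch === "_") {
      upperNext = true; // drop the underscore, capitalize what follows
    } else {
      out += upperNext ? ch.toUpperCase() : ch;
      upperNext = false;
    }
  }
  return out;
}

const jsonNameForUserName = toLowerCamelCase("user_name"); // "userName"
const jsonNameForIsActive = toLowerCamelCase("is_active"); // "isActive"
```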

64-bit Precision

JavaScript numbers are 64-bit floats, which lose precision for integers above 2^53 - 1.

To prevent data loss, 64-bit integer types (int64, fixed64, uint64, sint64, and sfixed64) are encoded as strings in JSON.
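
The precision problem is easy to reproduce with plain JSON.parse. In this small sketch the two JSON documents are made up for illustration; the first carries a 64-bit ID as a JSON number, the second as a string the way ProtoJSON does.

```typescript
// 2^53 + 1 cannot be represented exactly as a 64-bit float.
const idAsJsonNumber = '{"id": 9007199254740993}';
const idAsJsonString = '{"id": "9007199254740993"}';

const parsedNumberId: number = JSON.parse(idAsJsonNumber).id;
const parsedStringId: string = JSON.parse(idAsJsonString).id;

// The number is silently rounded to the nearest representable value.
const numberRoundTrip = String(parsedNumberId); // "9007199254740992"
// The string survives untouched.
const stringRoundTrip = parsedStringId;         // "9007199254740993"
```

On runtimes that support it, the preserved string can then be converted with BigInt when arithmetic on the value is needed.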