Build

Schemas are the heart of Protobuf. A .proto file defines the shape of your data and assigns each field a stable numeric identity. The names make the schema readable to humans. The numbers are what make the binary format compact and compatible over time.

Generating Code
The contract-first workflow using protoc and buf.
Messages
The primary containers for Protobuf data.
Fields
Strongly-typed data points with unique identifiers.
Field Numbers
The critical IDs used for compact binary encoding.
Enums
Defining a restricted set of named constants.
Packages
Using namespaces to prevent naming collisions.
Composition
Building complex models through nesting and imports.
Collections
Handling arrays and lists of data efficiently.
Maps
Using dictionaries for key-value pairs.
Oneof
Polymorphic fields for mutually exclusive data.
Type Reference
Reference for all scalar and well-known types.

Generating Code

Most teams do not hand-write serializers. They generate code from .proto files. The generated code provides typed message constructors, binary serialization, JSON mapping, and service bindings depending on the plugin.

proto/demo/v1/user.proto

syntax = "proto3";

package demo.v1;

message User {
  string id = 1;
  string username = 2;
  bool is_active = 3;
}

From Contract to Runtime API

This schema defines a User message with three fields. Each field has a type, a generated-code name, and a stable field number used by the binary format.

Once code is generated from this schema, you can:

Instantiate: Create User objects in your language with type checking and editor support.
Serialize: Convert objects into compact binary buffers for transmission or storage.
Validate: Reject malformed payloads at parse time, and layer business rules on top (for example with Protovalidate) before data reaches application logic.

Option 1: The Classic protoc

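The protoc compiler is the original tool for Protobuf. It requires you to manage plugins and CLI flags by hand. The --es_out flag makes protoc look for a binary named protoc-gen-es on your system's PATH. A local npm install places that binary in node_modules/.bin, so either add that directory to PATH or point protoc at it with the --plugin flag.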

INSTALL PLUGIN

$ npm install --save-dev @bufbuild/protoc-gen-es

GENERATE CODE

$ protoc -I proto --es_out=src/gen --es_opt=target=ts proto/demo/v1/user.proto

Option 2: The Modern buf

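buf keeps generation declarative with a buf.gen.yaml file, making the workflow reproducible and easier to share across a team. The layout here assumes a buf.yaml that declares proto as the module path, so generated files land under src/gen/demo/v1.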

buf.gen.yaml

version: v2
plugins:
  - remote: buf.build/bufbuild/es:v2.12.0
    out: src/gen
    opt: target=ts

GENERATE CODE

$ buf generate

Using the Generated Code

With code generated by protobuf-es, the schema becomes a native TypeScript API.

src/main.ts

import { create, toBinary, toJsonString } from "@bufbuild/protobuf";
import { UserSchema } from "./gen/demo/v1/user_pb";

const user = create(UserSchema, {
  id: "usr_123",
  username: "cyber_ninja",
  isActive: true,
});

const bytes = toBinary(UserSchema, user);
const json = toJsonString(UserSchema, user);

Different languages and runtimes

The same schema-first workflow applies across supported languages, but import paths, package names, generated types, and runtime APIs differ by ecosystem.

Official Getting Started Tutorials:

Go Tutorial
Python Tutorial
Java Tutorial
C++ Tutorial
C# Tutorial
Kotlin Tutorial
Dart Tutorial
Rust (prost)

Messages

Messages are the primary logical structure in Protobuf. They act as containers for your data, analogous to a struct in C/Rust or a class in Java/TypeScript.

Think of a message as a strictly-enforced contract. Once defined in a schema, the Protobuf compiler ensures that every system interacting with this data, regardless of the programming language, agrees on its structure.

Protobuf is designed to evolve. You can add new fields to a message without breaking existing code: older readers skip field numbers they do not recognize, and newer readers see default values for fields an older writer never sent. Servers and clients can therefore upgrade at their own pace, a level of compatibility that many binary formats do not provide out of the box.

SCHEMA_DEFINITION

// A simple message definition
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

Fields

Every field in a message requires a specific type (e.g., string, int32, bool) and a name.

Because generated code is statically typed, many data-type errors that JSON would only reveal at runtime are caught at compile time or at parse time. If a field is declared as an integer, the generated API will not hand your code a string in its place.

Names exist for readability in your schema and generated code; they never appear in the binary encoding. You can therefore rename a field without breaking binary compatibility, though the rename changes generated accessors and the ProtoJSON key, so it can break source code and JSON consumers.

FIELD_DEFINITIONS

message User {
  string username = 1;
  bool is_active = 2;
  uint32 login_count = 3;
}

Field Numbers

Field numbers are the most critical part of a Protobuf message. Instead of sending long string names (like "username") over the wire, Protobuf sends only this integer ID, combined with a few bits that describe the wire type.

This "tag" is a large part of what makes Protobuf compact. Because the numbers identify fields, they must never change once a message type is in use. Reusing a number for a different field makes existing payloads decode as the wrong field, which can silently corrupt data. When you delete a field, mark its number as reserved so it cannot be reused.

Optimization Tip: Numbers 1 through 15 take 1 byte to encode (including the field number and wire type). Numbers 16 through 2047 take 2 bytes. Use 1-15 for your most frequently sent fields!

WIRE_IDENTITY

message User {
  // "1" is the ID on the wire
  string id = 1;
  
  // Small numbers (1-15) take 1 byte to encode
  string name = 2;
}
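
To make the byte counts above concrete, here is a minimal sketch of how a tag is built: the field number is shifted left by three bits, combined with the wire type, and written as a varint. The helper names (encodeVarint, encodeTag, encodeStringField) are hypothetical, hand-written for illustration, and not part of any Protobuf runtime. The string helper assumes ASCII input.

```typescript
const WIRE_VARINT = 0; // int32, bool, enum, ...
const WIRE_LEN = 2; // string, bytes, embedded messages

// Encode a non-negative safe integer as a base-128 varint.
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  let rest = value;
  while (rest > 0x7f) {
    bytes.push((rest % 128) | 0x80); // low 7 bits, continuation bit set
    rest = Math.floor(rest / 128);
  }
  bytes.push(rest);
  return bytes;
}

// A tag is (fieldNumber << 3) | wireType, written as a varint.
function encodeTag(fieldNumber: number, wireType: number): number[] {
  return encodeVarint(fieldNumber * 8 + wireType);
}

// Hypothetical helper: tag + length + bytes for an ASCII-only string field.
function encodeStringField(fieldNumber: number, text: string): number[] {
  const payload: number[] = [];
  for (let i = 0; i < text.length; i++) payload.push(text.charCodeAt(i));
  return [...encodeTag(fieldNumber, WIRE_LEN), ...encodeVarint(payload.length), ...payload];
}

function hex(bytes: number[]): string {
  return bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join(" ");
}

// `string id = 1;` holding "usr_123": the name "id" never appears.
const idField = encodeStringField(1, "usr_123");
console.log(hex(idField)); // 0a 07 75 73 72 5f 31 32 33

// Tag sizes at the boundaries mentioned above.
const tagSizes = [1, 15, 16, 2047, 2048].map((n) => encodeTag(n, WIRE_VARINT).length);
console.log(tagSizes.join(",")); // 1,1,2,2,3
```

The first byte, 0a, carries both the field number (1) and the wire type (2) for the id field.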

Enums

Enums allow you to define a restricted set of named constants. This is crucial for states, roles, or configurations.

In proto3, the first constant must always map to zero. This serves as the default value when the field is not explicitly set in the binary payload.

Naming Convention

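Enum value names are scoped as siblings of their enum rather than nested inside it, so two enums in the same package cannot both declare a value named UNSPECIFIED. Prefixing each value with the enum name avoids the collision, both in the schema and in languages like C++ or Go where enum values land in the parent scope.

Open Enums: proto3 enums are "open", meaning that if a server sends a numeric value a client doesn't recognize, the client keeps the number instead of failing to parse. How the unknown value is surfaced varies by language.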

ENUM_DEFINITIONS

enum Status {
  // Prefixing avoids collisions
  STATUS_UNSPECIFIED = 0;
  STATUS_ACTIVE = 1;
  STATUS_DEFERRED = 2;
}

message User {
  Status current_status = 1;
}
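
As a rough illustration of open-enum handling, the sketch below mirrors the Status enum by hand and keeps an unrecognized number rather than rejecting it. StatusName and describeStatus are hypothetical names written for this example; generated code in each language exposes unknown values in its own way.

```typescript
// Hand-written mirror of the Status enum above (not generated code).
const StatusName: Record<number, string> = {
  0: "STATUS_UNSPECIFIED",
  1: "STATUS_ACTIVE",
  2: "STATUS_DEFERRED",
};

// Open-enum handling: an unrecognized number is kept and reported, not rejected.
function describeStatus(value: number): string {
  const name = StatusName[value];
  return name !== undefined ? name : `UNRECOGNIZED(${value})`;
}

console.log(describeStatus(1)); // STATUS_ACTIVE
console.log(describeStatus(7)); // UNRECOGNIZED(7)
```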

Packages

As your project grows, you'll likely have many messages with similar names. Protobuf uses package declarations to prevent name clashes.

Depending on the language, the package maps to a namespace in C++ or a package in Java, while Go takes its import path from the go_package option and TypeScript output usually mirrors the file's directory path. Packages are essential for organizing large-scale schemas and ensuring that an Account in the billing service doesn't conflict with an Account in the identity service.

PACKAGE_DECLARATION

syntax = "proto3";

// Defines the namespace
package demo.identity.v1;

message Account {
  string id = 1;
}

Composition

Protobuf supports complex, hierarchical data structures. You can define messages within other messages, or use previously defined messages as field types.

Composition makes data models reusable. For example, a single Location message can serve as a field type in User, Event, and Office messages.

On the wire, embedded messages are "length-delimited", allowing decoders to skip the entire sub-message if they don't have the schema for it.

COMPOSITIONAL_SCHEMA

message Result {
  string url = 1;
  string title = 2;
}

message SearchResponse {
  // Result is embedded here
  Result top_result = 1;
}
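
The skipping behavior can be sketched with a small reader that walks a message and lists field headers without decoding any payload. This is a simplified illustration, not a real decoder: readVarint and listFields are hypothetical helpers, varints are assumed to fit in a safe JavaScript integer, and the deprecated group wire types are not handled.

```typescript
interface FieldHeader {
  fieldNumber: number;
  wireType: number;
}

// Read a varint starting at `pos`; returns [value, nextPosition].
function readVarint(buf: number[], pos: number): [number, number] {
  let result = 0;
  let scale = 1;
  for (;;) {
    if (pos >= buf.length) throw new Error("truncated varint");
    const byte = buf[pos];
    pos += 1;
    result += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return [result, pos];
    scale *= 128;
  }
}

// Walk a message and list its field headers, skipping every payload unread.
function listFields(buf: number[]): FieldHeader[] {
  const fields: FieldHeader[] = [];
  let pos = 0;
  while (pos < buf.length) {
    const [tag, afterTag] = readVarint(buf, pos);
    const wireType = tag % 8;
    fields.push({ fieldNumber: Math.floor(tag / 8), wireType });
    pos = afterTag;
    if (wireType === 0) {
      pos = readVarint(buf, pos)[1]; // varint: read and discard
    } else if (wireType === 1) {
      pos += 8; // fixed 64-bit
    } else if (wireType === 2) {
      const [length, afterLength] = readVarint(buf, pos);
      pos = afterLength + length; // jump over the whole sub-message or string
    } else if (wireType === 5) {
      pos += 4; // fixed 32-bit
    } else {
      throw new Error(`unsupported wire type ${wireType}`);
    }
  }
  return fields;
}

// SearchResponse { top_result { url: "a", title: "b" } } followed by a
// hypothetical varint field 2 that this reader has no schema for.
const message = [0x0a, 0x06, 0x0a, 0x01, 0x61, 0x12, 0x01, 0x62, 0x10, 0x07];
const fields = listFields(message);
console.log(fields.map((f) => `${f.fieldNumber}:${f.wireType}`).join(" ")); // 1:2 2:0
```

The reader never looks inside top_result: the length prefix alone tells it how many bytes to jump.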

Collections

To represent an array or list of items, use the repeated keyword. These fields can contain zero or more elements of the specified type.

In proto3, repeated fields of scalar numeric types (int32, float, and so on) are "packed" by default. Instead of repeating the field tag for every element, the encoder writes one tag, one length prefix, and then the elements back to back. This is significantly more compact for large arrays.

COLLECTION_SYNTAX

message SearchResponse {
  // A list of strings
  repeated string related_queries = 1;
  
  // A list of messages
  repeated Result results = 2;
}
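
To illustrate the difference between packed and unpacked layouts, the following sketch encodes a hypothetical `repeated int32` field with number 6 both ways. The helpers are hand-written for this example and only handle non-negative values.

```typescript
// Encode a non-negative safe integer as a base-128 varint.
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  let rest = value;
  while (rest > 0x7f) {
    bytes.push((rest % 128) | 0x80);
    rest = Math.floor(rest / 128);
  }
  bytes.push(rest);
  return bytes;
}

// Packed: one tag (wire type 2), one byte length, then the values back to back.
function encodePacked(fieldNumber: number, values: number[]): number[] {
  const payload: number[] = [];
  for (const v of values) payload.push(...encodeVarint(v));
  return [...encodeVarint(fieldNumber * 8 + 2), ...encodeVarint(payload.length), ...payload];
}

// Unpacked: the tag (wire type 0) is repeated before every element.
function encodeUnpacked(fieldNumber: number, values: number[]): number[] {
  const out: number[] = [];
  for (const v of values) out.push(...encodeVarint(fieldNumber * 8), ...encodeVarint(v));
  return out;
}

function hex(bytes: number[]): string {
  return bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join(" ");
}

const sample = [3, 270, 86942];
const packed = encodePacked(6, sample);
const unpacked = encodeUnpacked(6, sample);

console.log(hex(packed)); // 32 06 03 8e 02 9e a7 05
console.log(hex(unpacked)); // 30 03 30 8e 02 30 9e a7 05
```

With three elements the packed form saves a single byte; the saving grows by roughly one tag per additional element.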

Maps

Protobuf provides native support for associative maps (dictionaries). However, there are strict rules for map keys and values:

Keys: Can be any integral type, bool, or string. Messages, enums, floating-point types, and bytes cannot be keys.
Values: Can be any type, including another message, but cannot be another map or a repeated field.

Behind the scenes, maps are actually just repeated messages with key and value fields, ensuring backward compatibility with older decoders.

MAP_STRUCTURE

message Project {
  string name = 1;
  
  // A dictionary of string keys to string values
  map<string, string> labels = 2;
}
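
The "repeated messages" view of a map can be sketched by encoding one labels entry by hand: each entry is a nested message whose key is field 1 and whose value is field 2. The helpers below are hypothetical and deliberately simplified: they assume ASCII strings, field numbers below 16, and payloads under 128 bytes, so every tag and length fits in one byte.

```typescript
// ASCII-only helper for this sketch; real encoders emit UTF-8.
function asciiBytes(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) out.push(text.charCodeAt(i));
  return out;
}

// Length-delimited field: tag (wire type 2), length, payload.
function lenField(fieldNumber: number, payload: number[]): number[] {
  return [fieldNumber * 8 + 2, payload.length, ...payload];
}

// One map entry is a nested message: key = 1, value = 2.
function encodeMapEntry(mapFieldNumber: number, key: string, value: string): number[] {
  const entry = [...lenField(1, asciiBytes(key)), ...lenField(2, asciiBytes(value))];
  return lenField(mapFieldNumber, entry);
}

function hex(bytes: number[]): string {
  return bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join(" ");
}

// Project.labels = { "env": "prod" }, where labels is field 2.
const labelsEntry = encodeMapEntry(2, "env", "prod");
console.log(hex(labelsEntry)); // 12 0b 0a 03 65 6e 76 12 04 70 72 6f 64
```

A decoder that predates map support would read the same bytes as a repeated message field with two string members.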

Oneof

If you have a message with multiple fields where only one can be set at a time, you can enforce this behavior and save memory using the oneof keyword.

Setting any field within the oneof automatically clears all other fields in that same oneof. This is Protobuf's equivalent to a tagged union or variant.

This is perfect for polymorphism, such as an Event that could be a ClickEvent, HoverEvent, or ScrollEvent.

MUTUAL_EXCLUSION

message ErrorStatus {
  string message = 1;
  
  oneof details {
    string stack_trace = 2;
    int32 error_code = 3;
  }
}
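
In TypeScript, the tagged-union idea is often modeled as a discriminated union. The sketch below is a hand-written model of ErrorStatus for illustration only; the type names and the describeDetails helper are assumptions, and the exact shape of generated code depends on the plugin you use.

```typescript
// Hand-written model of the `details` oneof (hypothetical, not generated code).
type Details =
  | { case: "stackTrace"; value: string }
  | { case: "errorCode"; value: number }
  | { case: undefined; value?: undefined };

interface ErrorStatus {
  message: string;
  details: Details;
}

const errorStatus: ErrorStatus = {
  message: "request failed",
  details: { case: "stackTrace", value: "at main.ts:12" },
};

// Assigning a different case replaces the previous one; both can never be set.
errorStatus.details = { case: "errorCode", value: 503 };

// The compiler narrows `value` to the right type inside each branch.
function describeDetails(details: Details): string {
  switch (details.case) {
    case "stackTrace":
      return `trace: ${details.value}`;
    case "errorCode":
      return `code: ${details.value}`;
    default:
      return "no details";
  }
}

console.log(describeDetails(errorStatus.details)); // code: 503
```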

The Type System

Protobuf's type system is designed for both strictly-enforced contracts and maximum binary efficiency. Basic Types (scalars) map directly to standard primitives in your programming language.

Well-Known Types (WKTs) are specialized schemas standardized by Google. They are assumed to be known by all Protobuf compilers and have specialized JSON mappings to ensure clean, idiomatic integration with web APIs.

Numeric Types

int32 / int64

Signed integers. Uses variable-length encoding (varint).

uint32 / uint64

Unsigned integers. Efficient for positive-only values.

sint32 / sint64

Signed integers. More efficient for negative numbers via ZigZag.

fixed32 / fixed64

Always 4/8 bytes. Smaller than a varint when values are usually above 2^28 (fixed32) or 2^56 (fixed64).

float / double

32-bit and 64-bit IEEE 754 floating point numbers.

Object Types

string

Always UTF-8 encoded text. Limited to 2GB.

bytes

Raw byte sequences. Perfect for arbitrary binary data.

bool

Encoded as a varint 0 or 1.

enum

Predefined set of named integers. Defaults to 0.

Well-Known Types

google.protobuf.Empty

Used to indicate an API takes no parameters or returns nothing.

google.protobuf.Timestamp

A point in time, independent of timezone. Maps to RFC 3339 in JSON.

google.protobuf.Duration

A span of time. Maps to a string ending in 's' in JSON (e.g. '1.5s').

google.protobuf.Value

Represents a dynamically typed value, equivalent to any JSON type.

google.protobuf.Struct

Maps directly to a free-form JSON object.

There are more well-known types in the google.protobuf reference.

Guidelines for Integers

Choosing the right integer type is important for both message size and language compatibility; here are some general guidelines.

DEFAULT_CHOICE

int32 / int64

Use for typical signed integers. int32 covers most use cases; use int64 for large IDs or timestamps.

Best for: General Data

NON_NEGATIVE

uint32 / uint64

Ideal when you know values will never be negative. Non-negative values encode exactly as they do with int32/int64; what you gain is double the positive range and a type that documents intent.

Best for: Counts & Sizes

SIGNED_ZIGZAG

sint32 / sint64

Use when values are frequently negative. ZigZag encoding keeps small negative numbers compact (1–2 bytes), whereas int32/int64 need 10 bytes for any negative value. The trade-off is that positive values use one extra bit.

Best for: Negative Values
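
The size difference can be checked with a short sketch. zigzag32 applies the ZigZag mapping for 32-bit values, and the two size helpers (hypothetical names, written for this example) count the varint bytes each type would need.

```typescript
// ZigZag maps signed 32-bit integers to unsigned: 0→0, -1→1, 1→2, -2→3, ...
function zigzag32(n: number): number {
  return ((n << 1) ^ (n >> 31)) >>> 0;
}

// Varint byte count for an unsigned 64-bit value.
function varintSize(value: bigint): number {
  let size = 1;
  let rest = value;
  while (rest >= BigInt(128)) {
    rest = rest / BigInt(128); // bigint division truncates
    size += 1;
  }
  return size;
}

// int32/int64 sign-extend negative values to 64 bits before varint encoding.
function intSize(n: number): number {
  return varintSize(BigInt.asUintN(64, BigInt(n)));
}

// sint32 applies ZigZag first, then encodes the result as a varint.
function sintSize(n: number): number {
  return varintSize(BigInt(zigzag32(n)));
}

console.log(intSize(-1), sintSize(-1)); // 10 1
console.log(intSize(-65), sintSize(-65)); // 10 2
console.log(intSize(64), sintSize(64)); // 1 2
```

The last line shows the cost of ZigZag: the positive value 64 needs two bytes as sint32 but only one as int32.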

FIXED_PRECISION

fixed32 / fixed64

Always uses 4 or 8 bytes. Smaller than a varint ONLY when values are consistently above 2²⁸ (fixed32) or 2⁵⁶ (fixed64).

Best for: Large Values
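
The break-even points come straight from varint arithmetic: each varint byte carries 7 bits of payload, so 28 bits fit in 4 bytes and 56 bits fit in 8. A small sketch, using a hypothetical varintSize helper, confirms where a varint starts to cost more than the fixed-width encoding.

```typescript
// Varint byte count for a non-negative value.
function varintSize(value: bigint): number {
  let size = 1;
  let rest = value;
  while (rest >= BigInt(128)) {
    rest = rest / BigInt(128); // bigint division truncates
    size += 1;
  }
  return size;
}

const FIXED32_SIZE = 4;
const FIXED64_SIZE = 8;

const one = BigInt(1);
const limit32 = one << BigInt(28); // 2^28
const limit64 = one << BigInt(56); // 2^56

// Just below the threshold a varint ties with the fixed encoding...
console.log(varintSize(limit32 - one), FIXED32_SIZE); // 4 4
console.log(varintSize(limit64 - one), FIXED64_SIZE); // 8 8

// ...and from the threshold upward it is one byte larger.
console.log(varintSize(limit32), FIXED32_SIZE); // 5 4
console.log(varintSize(limit64), FIXED64_SIZE); // 9 8
```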

ProtoJSON Mapping

While Protobuf is primarily binary, it defines a canonical ProtoJSON mapping. Given its schema, every message therefore has a well-defined JSON representation.

JSON_MAPPING_RULES

Protobuf to JSON Type Mapping Rules
Protobuf Type	JSON Type(s)	JSON Value Example	Notes
`message`	Object	`{"userName": "hiro"}`	Serialized as a JSON object. Field names are mapped to lowerCamelCase by default, or the json_name option if set.
`repeated`	Array	`["a", "b"]`	Serialized as a JSON array.
`map<K, V>`	Object	`{"k": "v"}`	Serialized as a JSON object.
`int32, float, double`	Number	`123.45`	Standard JSON numbers.
`bool`	Boolean	`true`	Standard JSON booleans.
`int64, uint64`	String	`"9007199254740993"`	Strings prevent precision loss in JS.
`enum`	String	`"ROLE_ADMIN"`	Uses the string name of the enum value.
`bytes`	String	`"NDI="`	Base64 encoded string.
`google.protobuf.Timestamp`	String	`"2023-10-01T12:00:00Z"`	RFC 3339 formatted timestamp string.
`google.protobuf.Duration`	String	`"1.000340012s"`	Seconds with up to 9 fractional digits.
`google.protobuf.FieldMask`	String	`"f.a,f.b"`	Comma-separated paths as a single string.
`google.protobuf.Struct`	Object	`{"foo": "bar"}`	Standard representation for a generic JSON object.
`google.protobuf.Value`	Any	`"foo" or 123`	Can be any valid JSON value (null, number, string, boolean, struct, or list).
`google.protobuf.NullValue`	null	`null`	The JSON null value.
`google.protobuf.Empty`	Object	`{}`	An empty JSON object.
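
The reason 64-bit integers travel as strings can be demonstrated in a few lines. The payload and its accountId field are hypothetical; the value is the one from the table, 2^53 + 1, which a JavaScript number cannot represent exactly.

```typescript
// A ProtoJSON payload carrying an int64 field as a string (hypothetical field name).
const payload = '{"accountId": "9007199254740993"}';

const parsed = JSON.parse(payload) as { accountId: string };

// Converting through Number rounds to the nearest representable double.
const viaNumber = Number(parsed.accountId);

// BigInt keeps every digit.
const viaBigInt = BigInt(parsed.accountId);

console.log(viaNumber.toString()); // 9007199254740992
console.log(viaBigInt.toString()); // 9007199254740993
```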