Schemas are the heart of Protobuf. A .proto file defines the shape of your data and assigns each field a stable numeric identity. The names make the schema readable to humans. The numbers are what make the binary format compact and compatible over time.
Generating Code
The contract-first workflow using protoc and buf.
Messages
The primary containers for Protobuf data.
Fields
Strongly-typed data points with unique identifiers.
Field Numbers
The critical IDs used for compact binary encoding.
Enums
Defining a restricted set of named constants.
Packages
Using namespaces to prevent naming collisions.
Composition
Building complex models through nesting and imports.
Collections
Handling arrays and lists of data efficiently.
Maps
Using dictionaries for key-value pairs.
Oneof
Polymorphic fields for mutually exclusive data.
Type Reference
Reference for all scalar and well-known types.
Most teams do not hand-write serializers. They generate code from .proto files. The generated code provides typed message constructors, binary serialization, JSON mapping, and service bindings depending on the plugin.
    syntax = "proto3";

    package demo.v1;

    message User {
      string id = 1;
      string username = 2;
      bool is_active = 3;
    }
From Contract to Runtime API
This schema defines a User message with three fields. Each field has a type, a generated-code name, and a stable field number used by the binary format.
Once code is generated from this schema, you can:
- Instantiate: Create User objects in your language with type checking and editor support.
- Serialize: Convert objects into compact binary buffers for transmission or storage.
- Validate: Apply schema and business rules before data reaches application logic.
Option 1: The Classic protoc
The protoc compiler is the original tool for Protobuf. It requires manual management of plugins and CLI flags. Using --es_out assumes that a binary named protoc-gen-es is available on your system's PATH.
    $ npm install --save-dev @bufbuild/protoc-gen-es
    $ protoc --es_out=src/gen --es_opt=target=ts proto/demo/v1/user.proto
Option 2: The Modern buf
buf keeps generation declarative with a buf.gen.yaml file, making the workflow reproducible and easier to share across a team.
    version: v2
    plugins:
      - remote: buf.build/bufbuild/es:v2.12.0
        out: src/gen
        opt: target=ts

    $ buf generate
Using the Generated Code
With code generated by protobuf-es, the schema becomes a native TypeScript API.
    import { create, toBinary, toJsonString } from "@bufbuild/protobuf";
    import { UserSchema } from "./gen/demo/v1/user_pb";

    const user = create(UserSchema, {
      id: "usr_123",
      username: "cyber_ninja",
      isActive: true,
    });

    const bytes = toBinary(UserSchema, user);
    const json = toJsonString(UserSchema, user);
Messages are the primary logical structure in Protobuf. They act as containers for your data, analogous to a struct in C/Rust or a class in Java/TypeScript.
Think of a message as a strictly-enforced contract. Once defined in a schema, the Protobuf compiler ensures that every system interacting with this data, regardless of the programming language, agrees on its structure.
Protobuf is designed to evolve. You can add new fields to a message without breaking existing code: older readers skip field numbers they do not recognize, and newer readers see default values for fields that are absent. This lets servers and clients upgrade at their own pace, which many other binary formats do not support out of the box.

    // A simple message definition
    message SearchRequest {
      string query = 1;
      int32 page_number = 2;
      int32 results_per_page = 3;
    }
Every field in a message requires a specific type (e.g., string, int32, bool) and a name.
Since Protobuf is strictly typed, it catches many of the data-type errors that would otherwise only show up at runtime with formats like JSON. A client whose generated code expects an integer will not silently be handed a string.
Field names exist for readability in your code; the binary format does not carry them. This allows you to rename a field in your schema without breaking binary compatibility, though the rename changes generated identifiers and may break JSON consumers, which see field names as keys.

    message User {
      string username = 1;
      bool is_active = 2;
      uint32 login_count = 3;
    }
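To make the point about names concrete, here is a minimal hand-rolled sketch that encodes the username field of the User message by hand. It is not the protobuf-es API; the helpers encodeVarint, encodeStringField, and toHex are hypothetical names introduced only for this illustration.

```typescript
// Hand-rolled sketch for illustration only; real code should use a generated encoder.

// Encode a non-negative integer as a base-128 varint (low 7 bits first).
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  let rest = value;
  while (rest > 0x7f) {
    bytes.push((rest % 128) | 0x80); // set the continuation bit
    rest = Math.floor(rest / 128);
  }
  bytes.push(rest);
  return bytes;
}

// Encode a string field: tag, then byte length, then the UTF-8 bytes.
function encodeStringField(fieldNumber: number, text: string): number[] {
  const payload = Array.from(new TextEncoder().encode(text));
  const tag = (fieldNumber << 3) | 2; // wire type 2 = length-delimited
  return [...encodeVarint(tag), ...encodeVarint(payload.length), ...payload];
}

const toHex = (bytes: number[]): string =>
  bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

// username = 1 in the User message; the word "username" never appears in the output.
const usernameHex = toHex(encodeStringField(1, "hiro"));
console.log(usernameHex); // 0a 04 68 69 72 6f
```

The six bytes hold the tag, the length, and the text "hiro". Renaming the field in the schema would leave them unchanged, while changing the number 1 would not.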
Field numbers are the most critical part of a Protobuf message. Instead of sending long string names (like "username") over the wire, Protobuf sends only this integer ID, combined with a small wire-type code that tells the decoder how to read the value.
This combined "tag" is what makes Protobuf so compact. Because these numbers identify fields, they must never be changed once a message type is in use. Reusing a number for a different field makes old and new readers misinterpret each other's data, which can silently corrupt it.
Optimization Tip: Numbers 1 through 15 take 1 byte to encode (including the field number and wire type). Numbers 16 through 2047 take 2 bytes. Use 1-15 for your most frequently sent fields!
    message User {
      // "1" is the ID on the wire
      string id = 1;
      // Small numbers (1-15) take 1 byte to encode
      string name = 2;
    }
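The byte counts in the optimization tip follow from how the tag is built: the field number is shifted left by three bits, the wire type fills the low three bits, and the result is written as a varint with seven payload bits per byte. The sketch below checks the boundaries; tagSize is a hypothetical helper, not part of any Protobuf library.

```typescript
// Size in bytes of a field's tag, i.e. the varint holding (fieldNumber << 3) | wireType.
function tagSize(fieldNumber: number, wireType: number): number {
  let tag = fieldNumber * 8 + wireType; // arithmetic form of the shift-and-or
  let size = 1;
  while (tag > 0x7f) {
    tag = Math.floor(tag / 128); // each varint byte carries 7 bits
    size++;
  }
  return size;
}

// Wire type 2 (length-delimited) is used here; any wire type gives the same sizes.
const sizes = [1, 15, 16, 2047, 2048].map((n) => tagSize(n, 2));
console.log(sizes.join(" ")); // 1 1 2 2 3
```

Field 15 is the last number whose tag fits in seven bits, and field 2047 is the last that fits in fourteen.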
Enums allow you to define a restricted set of named constants. This is crucial for states, roles, or configurations.
In proto3, the first constant must always map to zero. This serves as the default value when the field is not explicitly set in the binary payload.
Naming Convention
To avoid name collisions in languages like C++ or Go (where enum values are often in the parent scope), it is a best practice to prefix values with the enum name.
Open Enums: Modern Protobuf implementations support "open" enums, meaning if a server sends a value that a client doesn't recognize, the client will still preserve that value instead of crashing.
    enum Status {
      // Prefixing avoids collisions
      STATUS_UNSPECIFIED = 0;
      STATUS_ACTIVE = 1;
      STATUS_DEFERRED = 2;
    }

    message User {
      Status current_status = 1;
    }
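One way to picture open-enum behavior is a reader that keeps the raw number even when it has no name for it. The following is a hand-written sketch, not generated code; STATUS_NAMES, StatusField, and readStatus are hypothetical names used only here.

```typescript
// Hypothetical mirror of the Status enum, keyed by wire value.
const STATUS_NAMES: Record<number, string> = {
  0: "STATUS_UNSPECIFIED",
  1: "STATUS_ACTIVE",
  2: "STATUS_DEFERRED",
};

interface StatusField {
  value: number;
  name: string | undefined;
}

// Open-enum style: keep the raw number even when this client has no name for it.
function readStatus(raw: number): StatusField {
  return { value: raw, name: STATUS_NAMES[raw] };
}

console.log(readStatus(1).name); // STATUS_ACTIVE
console.log(readStatus(7).name); // undefined
console.log(readStatus(7).value); // 7
```

Because the number 7 survives, a client built against the older schema can pass the value along unchanged instead of failing or rewriting it to zero.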
As your project grows, you'll likely have many messages with similar names. Protobuf uses package declarations to prevent name clashes.
These packages often map directly to namespaces in C++, packages in Go/Java, or modules in TypeScript. They are essential for organizing large-scale schemas and ensuring that an Account in the billing service doesn't conflict with an Account in the identity service.
    syntax = "proto3";

    // Defines the namespace
    package demo.identity.v1;

    message Account {
      string id = 1;
    }
Protobuf supports complex, hierarchical data structures. You can define messages within other messages, or use previously defined messages as field types.
Composition lets you build reusable data models. For example, a single Location message can be used as a field type in User, Event, and Office messages.
On the wire, embedded messages are "length-delimited", allowing decoders to skip the entire sub-message if they don't have the schema for it.
    message Result {
      string url = 1;
      string title = 2;
    }

    message SearchResponse {
      // Result is embedded here
      Result top_result = 1;
    }
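The skipping behavior can be sketched with a tiny wire-format walker that reads tags and jumps over length-delimited payloads without parsing them. This is an illustrative sketch that handles only wire types 0 and 2; readVarint and topLevelFieldNumbers are hypothetical helpers, and the varint field 2 in the sample payload is an assumption added to show that decoding continues after the skip.

```typescript
// Minimal wire-format walker, for illustration; it handles only wire types 0 and 2.

interface Varint {
  value: number;
  next: number;
}

// Read a varint starting at pos; returns the value and the position after it.
function readVarint(buf: Uint8Array, pos: number): Varint {
  let value = 0;
  let scale = 1;
  for (;;) {
    if (pos >= buf.length) throw new Error("truncated varint");
    const byte = buf[pos++];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: pos };
    scale *= 128;
  }
}

// Walk the top-level fields, jumping over length-delimited payloads without parsing them.
function topLevelFieldNumbers(buf: Uint8Array): number[] {
  const seen: number[] = [];
  let pos = 0;
  while (pos < buf.length) {
    const tag = readVarint(buf, pos);
    const fieldNumber = Math.floor(tag.value / 8);
    const wireType = tag.value % 8;
    seen.push(fieldNumber);
    if (wireType === 0) {
      pos = readVarint(buf, tag.next).next; // varint value
    } else if (wireType === 2) {
      const len = readVarint(buf, tag.next);
      pos = len.next + len.value; // skip the whole sub-message
    } else {
      throw new Error(`wire type ${wireType} not handled in this sketch`);
    }
  }
  return seen;
}

// SearchResponse with top_result = 1 (a Result holding url "a" and title "b"),
// followed by a hypothetical varint field 2 to show decoding continues afterwards.
const payload = new Uint8Array([
  0x0a, 0x06, 0x0a, 0x01, 0x61, 0x12, 0x01, 0x62,
  0x10, 0x05,
]);
const seenFields = topLevelFieldNumbers(payload);
console.log(seenFields.join(",")); // 1,2
```

The walker never looks inside the six bytes of the embedded Result; the length prefix alone tells it where the next field starts.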
To represent an array or list of items, use the repeated keyword. These fields can contain zero or more elements of the specified type.
In modern Protobuf, repeated scalar numeric fields (like int32, float, etc.) are "packed" by default. Instead of repeating the field tag for every element, they are stored as one single block with a length prefix. This is significantly more efficient for large arrays.
    message SearchResponse {
      // A list of strings
      repeated string related_queries = 1;
      // A list of messages
      repeated Result results = 2;
    }
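The saving from packing can be seen by encoding the same numbers both ways. The sketch below is hand-rolled and assumes a hypothetical numeric field number 4 (for example a repeated int32 field) holding non-negative values; encodeVarint, encodeUnpacked, and encodePacked are illustrative names, not library functions.

```typescript
// Encode a non-negative integer as a base-128 varint (low 7 bits first).
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  let rest = value;
  while (rest > 0x7f) {
    bytes.push((rest % 128) | 0x80);
    rest = Math.floor(rest / 128);
  }
  bytes.push(rest);
  return bytes;
}

// Unpacked layout: one tag (wire type 0) in front of every element.
function encodeUnpacked(fieldNumber: number, values: number[]): number[] {
  const tag = encodeVarint(fieldNumber * 8 + 0);
  return values.flatMap((v) => [...tag, ...encodeVarint(v)]);
}

// Packed layout: one tag (wire type 2) and one length prefix for the whole list.
function encodePacked(fieldNumber: number, values: number[]): number[] {
  const body = values.flatMap((v) => encodeVarint(v));
  return [...encodeVarint(fieldNumber * 8 + 2), ...encodeVarint(body.length), ...body];
}

const values = [1, 2, 3, 150];
const unpacked = encodeUnpacked(4, values);
const packed = encodePacked(4, values);
console.log(unpacked.length, packed.length); // 9 7
```

With four elements the difference is two bytes; the gap grows by roughly one byte per additional element, since the packed form pays for the tag only once.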
Protobuf provides native support for associative maps (dictionaries). However, there are strict rules for map keys and values:
- Keys: Can be any integral type (including bool) or string. Messages, enums, floats, and bytes cannot be keys.
- Values: Can be any type, including another message, but cannot be another map or a repeated field.
Behind the scenes, maps are actually just repeated messages with key and value fields, ensuring backward compatibility with older decoders.
    message Project {
      string name = 1;
      // A dictionary of string keys to string values
      map<string, string> labels = 2;
    }
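To illustrate the "repeated messages" layout, the sketch below encodes the labels field by hand: each entry becomes a small length-delimited message with the key as field 1 and the value as field 2. This is an illustration under those assumptions, not library code; encodeVarint, encodeStringField, and encodeLabels are hypothetical helpers.

```typescript
// Encode a non-negative integer as a base-128 varint (low 7 bits first).
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  let rest = value;
  while (rest > 0x7f) {
    bytes.push((rest % 128) | 0x80);
    rest = Math.floor(rest / 128);
  }
  bytes.push(rest);
  return bytes;
}

// Encode a string field: tag (wire type 2), byte length, UTF-8 bytes.
function encodeStringField(fieldNumber: number, text: string): number[] {
  const payload = Array.from(new TextEncoder().encode(text));
  return [...encodeVarint(fieldNumber * 8 + 2), ...encodeVarint(payload.length), ...payload];
}

// Each map entry is written as its own embedded message: key = 1, value = 2.
function encodeLabels(fieldNumber: number, labels: Record<string, string>): number[] {
  const out: number[] = [];
  for (const [key, value] of Object.entries(labels)) {
    const entry = [...encodeStringField(1, key), ...encodeStringField(2, value)];
    out.push(...encodeVarint(fieldNumber * 8 + 2), ...encodeVarint(entry.length), ...entry);
  }
  return out;
}

// labels = 2 in the Project message.
const labelBytes = encodeLabels(2, { env: "prod" });
const labelHex = labelBytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(labelHex); // 12 0b 0a 03 65 6e 76 12 04 70 72 6f 64
```

A decoder that predates map support would read these bytes as a repeated embedded message in field 2, which is why the layout stays compatible.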
If you have a message with multiple fields where only one can be set at a time, you can enforce this behavior and save memory using the oneof keyword.
Setting any field within the oneof automatically clears all other fields in that same oneof. This is Protobuf's equivalent to a tagged union or variant.
This is perfect for polymorphism, such as an Event that could be a ClickEvent, HoverEvent, or ScrollEvent.
    message ErrorStatus {
      string message = 1;
      oneof details {
        string stack_trace = 2;
        int32 error_code = 3;
      }
    }
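In TypeScript, the tagged-union idea maps naturally onto a discriminated union. The sketch below is a hand-written model of the ErrorStatus message, assuming a case/value shape for the oneof; the exact shape of generated code depends on the plugin, and Details and describeDetails are hypothetical names.

```typescript
// Hypothetical hand-written shape for the ErrorStatus message; generated code varies by plugin.
type Details =
  | { case: "stackTrace"; value: string }
  | { case: "errorCode"; value: number }
  | { case: undefined; value?: undefined };

interface ErrorStatus {
  message: string;
  details: Details;
}

// The switch narrows the union, so each branch sees the right value type.
function describeDetails(details: Details): string {
  switch (details.case) {
    case "stackTrace":
      return `trace: ${details.value}`;
    case "errorCode":
      return `code: ${details.value}`;
    default:
      return "no details";
  }
}

const status: ErrorStatus = {
  message: "upstream failed",
  details: { case: "stackTrace", value: "at main()" },
};

// Assigning a new case replaces the whole union value, so the stack trace is gone.
status.details = { case: "errorCode", value: 503 };
console.log(describeDetails(status.details)); // code: 503
```

Modeling the oneof as a single property means two members can never be set at once, which mirrors the clearing rule described above.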
Protobuf's type system is designed for both strictly-enforced contracts and maximum binary efficiency. Basic Types (scalars) map directly to standard primitives in your programming language.
Well-Known Types (WKTs) are specialized schemas standardized by Google. They are assumed to be known by all Protobuf compilers and have specialized JSON mappings to ensure clean, idiomatic integration with web APIs.
Numeric Types
Object Types
Guidelines for Integers
Choosing the right integer type is important for both message size and language compatibility; here are some general guidelines.
int32 / int64: Use for typical signed integers. int32 covers most use cases; use int64 for large IDs or timestamps.
uint32 / uint64: Ideal when you know values will never be negative. Non-negative values encode the same way as in int32/int64; the gain is the larger positive range.
sint32 / sint64: Crucial when values can be negative. Uses ZigZag encoding to keep small negative numbers compact (1–2 bytes), unlike int32/int64 which require 10 bytes for any negative value.
fixed32 / fixed64: Always uses 4 or 8 bytes. More efficient than varints only if values are consistently greater than 2^28 (for the 32-bit type) or 2^56 (for the 64-bit type).
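The size difference between plain and ZigZag-encoded negative numbers can be checked with a short calculation. The sketch below is illustrative only; zigzag32, varintSize, sint32Size, and int32Size are hypothetical helper names, and the code computes sizes rather than producing real wire bytes.

```typescript
// ZigZag for 32-bit values: 0, -1, 1, -2, 2 ... become 0, 1, 2, 3, 4 ...
function zigzag32(n: number): number {
  return ((n << 1) ^ (n >> 31)) >>> 0;
}

// Number of bytes a non-negative value occupies as a varint (7 payload bits per byte).
function varintSize(value: bigint): number {
  let size = 1;
  let rest = value;
  while (rest > 0x7fn) {
    rest >>= 7n;
    size++;
  }
  return size;
}

// sint32: ZigZag first, then varint.
function sint32Size(n: number): number {
  return varintSize(BigInt(zigzag32(n)));
}

// int32: negative values are sign-extended to 64 bits before varint encoding.
function int32Size(n: number): number {
  return varintSize(BigInt.asUintN(64, BigInt(n)));
}

for (const n of [1, -1, -64, -65]) {
  console.log(n, "int32:", int32Size(n), "sint32:", sint32Size(n));
}
// 1 int32: 1 sint32: 1
// -1 int32: 10 sint32: 1
// -64 int32: 10 sint32: 1
// -65 int32: 10 sint32: 2
```

ZigZag interleaves negative and positive values so that a small magnitude always yields a small unsigned number, which is what keeps the varint short.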
ProtoJSON Mapping
While Protobuf is primarily binary, it defines a canonical ProtoJSON mapping. This ensures that every binary payload has a deterministic representation in JSON.
| Protobuf Type | JSON Type(s) | JSON Value Example | Notes |
|---|---|---|---|
| message | Object | {"userName": "hiro"} | Serialized as a JSON object. Field names are mapped to lowerCamelCase by default, or the json_name option if set. |
| repeated | Array | ["a", "b"] | Serialized as a JSON array. |
| map<K, V> | Object | {"k": "v"} | Serialized as a JSON object. |
| int32, float, double | Number | 123.45 | Standard JSON numbers. |
| bool | Boolean | true | Standard JSON booleans. |
| int64, uint64 | String | "9007199254740993" | Strings prevent precision loss in JS. |
| enum | String | "ROLE_ADMIN" | Uses the string name of the enum value. |
| bytes | String | "NDI=" | Base64 encoded string. |
| google.protobuf.Timestamp | String | "2023-10-01T12:00:00Z" | RFC 3339 formatted timestamp string. |
| google.protobuf.Duration | String | "1.000340012s" | Seconds with up to 9 fractional digits. |
| google.protobuf.FieldMask | String | "f.a,f.b" | Comma-separated paths as a single string. |
| google.protobuf.Struct | Object | {"foo": "bar"} | Standard representation for a generic JSON object. |
| google.protobuf.Value | Any | "foo" or 123 | Can be any valid JSON value (null, number, string, boolean, struct, or list). |
| google.protobuf.NullValue | null | null | The JSON null value. |
| google.protobuf.Empty | Object | {} | An empty JSON object. |
64-bit Precision
JavaScript numbers are 64-bit floats, which lose precision for integers above 2^53 - 1.
To prevent data loss, 64-bit integer types (int64, fixed64, uint64, sint64, and sfixed64) are encoded as strings in JSON.
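The precision loss is easy to reproduce with the example value from the ProtoJSON table. The sketch below uses only built-in JavaScript features; the object key id is a hypothetical field name chosen for the illustration.

```typescript
// The example value from the ProtoJSON table: 2^53 + 1, just past the safe range.
const wireText = "9007199254740993";

// Parsing it as a JSON number silently rounds to the nearest representable double.
const lossy = JSON.parse(`{"id": ${wireText}}`).id as number;
console.log(String(lossy)); // 9007199254740992
console.log(Number.isSafeInteger(lossy)); // false

// Carrying it as a JSON string and converting to bigint keeps every digit.
const exact = BigInt(JSON.parse(`{"id": "${wireText}"}`).id as string);
console.log(exact.toString()); // 9007199254740993
```

The numeric form comes back off by one with no error raised, which is the failure the string encoding is meant to prevent.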