Protobuf offers advanced features for modeling richer APIs and evolving schemas safely over time. Reflection, plugins, custom options, and validation now live in the Tooling section.
Modeling Track
Imports, dynamic payloads, field masks, and services.
Evolution Track
Compatibility, editions, presence, required fields, and limits.
Imports
Managing dependencies and resolving import paths.
Version Editions
The modern evolution of Protobuf feature management.
Services
Defining RPC interfaces for networked communication.
Presence
Distinguishing between default values and missing data.
Required Fields
Handling 'required' fields and business rules in an evolvable way.
Size Limits
Architectural constraints and memory behavior.
03_TRACK_A
Schema Modeling
These sections cover the schema features used to model larger APIs: imports, dynamic payloads, partial updates, and service definitions.
You can use definitions from other .proto files with the import statement. However, managing these paths correctly is one of the most common points of friction in Protobuf development: protoc resolves each import relative to the directories passed via --proto_path (-I), not relative to the importing file, so the same import can work from one working directory and fail from another.
To avoid these issues, always import using the fully qualified path from the root of your project or your --proto_path.
Buf removes this ambiguity by using a buf.yaml to define a deterministic module root, so imports resolve the same way on every machine. It also allows for remote dependencies (similar to npm or Cargo).
// common/v1/user.proto
edition = "2023";

package common.v1;

message User {
  string id = 1;
  string name = 2;
}
// auth/v1/service.proto
edition = "2023";

package auth.v1;

import "common/v1/user.proto";

message LoginResponse {
  common.v1.User user = 1;
  string session_token = 2;
}
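# Set the project root as the import path (-I .)
# This forces imports to use fully qualified paths
# Note: protoc-gen-go also needs a Go import path for each file
# (a go_package option or an M flag), which is omitted here.
protoc -I . \
  --go_out=. \
  auth/v1/service.proto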
The Any type allows you to include messages where the schema isn't known at compile time.
google.protobuf.Any embeds an arbitrary serialized Protobuf message along with a URL that identifies its type (e.g., type.googleapis.com/mypackage.MyMessage).
When serialized to ProtoJSON, this type identifier is rendered as a special @type property alongside the standard JSON fields of the embedded message, allowing parsers to route the payload correctly.
// In Proto:
import "google/protobuf/any.proto";

message Event {
  google.protobuf.Any payload = 1;
}

// In ProtoJSON:
// {
//   "payload": {
//     "@type": "type.googleapis.com/demo.User",
//     "name": "Hiro"
//   }
// }
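To illustrate how a receiver might use the @type property to route a payload, here is a minimal Go sketch that relies only on the standard library. The function name typeName and the handler names are hypothetical; a real service would normally use the Any helpers from its Protobuf runtime rather than parsing the JSON by hand.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// typeName extracts the fully qualified message name from the "@type"
// property of a ProtoJSON-encoded Any (the part after the last slash).
// The function name is illustrative, not part of any Protobuf library.
func typeName(anyJSON []byte) (string, error) {
	var envelope struct {
		Type string `json:"@type"`
	}
	if err := json.Unmarshal(anyJSON, &envelope); err != nil {
		return "", err
	}
	if envelope.Type == "" {
		return "", fmt.Errorf("missing @type")
	}
	return envelope.Type[strings.LastIndex(envelope.Type, "/")+1:], nil
}

func main() {
	payload := []byte(`{"@type": "type.googleapis.com/demo.User", "name": "Hiro"}`)
	name, err := typeName(payload)
	if err != nil {
		panic(err)
	}
	// Dispatch on the message name; the handlers are placeholders.
	switch name {
	case "demo.User":
		fmt.Println("route to the User handler")
	default:
		fmt.Println("no handler for", name)
	}
}
```

Routing on the name after the last slash keeps the dispatch independent of the host portion of the type URL.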
If you are working with dynamic Protobuf messages, use Any. However, if you are working with arbitrary structured JSON data that you don't want to model, or that is completely dynamic (like a schema-less JSON object), use google.protobuf.Value or google.protobuf.Struct.
A Value represents a dynamically typed value which can be either a null, a number, a string, a boolean, a recursive struct (object), or a list of values. It perfectly maps to any valid JSON structure.
Use this sparingly, as it defeats the purpose of Protobuf's strong typing, but it's essential for integrating with schemaless NoSQL databases or passing untyped metadata blocks.
// In Proto:
import "google/protobuf/struct.proto";

message Event {
  // Represents any arbitrary JSON value
  google.protobuf.Value metadata = 1;

  // Represents specifically a JSON object
  google.protobuf.Struct custom_attributes = 2;
}

// In ProtoJSON:
// {
//   "metadata": "simple string or object",
//   "custom_attributes": {
//     "dynamic_key": [1, 2, 3],
//     "enabled": true
//   }
// }
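The mapping between JSON and Value can be made concrete with a small Go sketch. It decodes arbitrary JSON with the standard library and reports which Value kind each result corresponds to. The helpers kindOf and decodeKind are hypothetical names for this illustration; the kind names mirror the fields of google.protobuf.Value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kindOf names the google.protobuf.Value kind that a decoded JSON value
// corresponds to. encoding/json produces exactly these Go types when the
// decode target is an empty interface.
func kindOf(v any) string {
	switch v.(type) {
	case nil:
		return "null_value"
	case float64:
		return "number_value"
	case string:
		return "string_value"
	case bool:
		return "bool_value"
	case map[string]any:
		return "struct_value"
	case []any:
		return "list_value"
	default:
		return "unsupported"
	}
}

// decodeKind parses raw JSON and reports the matching Value kind.
func decodeKind(raw string) (string, error) {
	var v any
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return "", err
	}
	return kindOf(v), nil
}

func main() {
	samples := []string{`null`, `1.5`, `"hi"`, `true`, `{"k": [1, 2, 3]}`, `[1, 2, 3]`}
	for _, raw := range samples {
		kind, err := decodeKind(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-18s -> %s\n", raw, kind)
	}
}
```

Note that every JSON number lands in number_value as a double, which is one reason to prefer typed fields whenever the shape of the data is known.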
google.protobuf.FieldMask is a well-known type used to identify a subset of fields in a request.
It is extremely useful for partial updates (PATCH), allowing a client to send only the modified fields instead of the entire object.
Beyond updates, FieldMask is a powerful tool for tuning read responses. You can design a single List or Get response that supports many optional fields and associations (e.g., user.profile, user.settings). The client passes a read_mask to tell the server exactly which subset of data to return, eliminating "over-fetching" without needing multiple specialized endpoints.
import "google/protobuf/field_mask.proto";

message GetUserRequest {
  string id = 1;

  // Client requests only specific fields
  // e.g. ["name", "email", "metadata.last_login"]
  google.protobuf.FieldMask read_mask = 2;
}

message UpdateUserRequest {
  User user = 1;

  // Client identifies which fields to update
  google.protobuf.FieldMask update_mask = 2;
}
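On the server side, an update mask is typically applied by copying only the listed paths from the request into the stored object. The following Go sketch shows the idea with a plain struct and a list of path strings. The User type, its fields, and the applyUpdate function are assumptions made for this example, not generated code.

```go
package main

import "fmt"

// User is a stand-in for a generated message type (hypothetical fields).
type User struct {
	ID    string
	Name  string
	Email string
}

// applyUpdate copies only the fields named in paths from src into dst,
// leaving everything else in dst untouched. Unknown paths are rejected so
// that typos in a mask do not silently become no-ops.
func applyUpdate(dst *User, src User, paths []string) error {
	for _, p := range paths {
		switch p {
		case "name":
			dst.Name = src.Name
		case "email":
			dst.Email = src.Email
		default:
			return fmt.Errorf("unsupported field mask path %q", p)
		}
	}
	return nil
}

func main() {
	stored := User{ID: "u1", Name: "Old Name", Email: "old@example.com"}
	patch := User{Name: "New Name"} // Email is empty but not in the mask

	if err := applyUpdate(&stored, patch, []string{"name"}); err != nil {
		panic(err)
	}
	fmt.Println(stored.Name, stored.Email) // New Name old@example.com
}
```

Because the mask, not the presence of a value, decides what changes, a client can also clear a field by listing its path while leaving the value empty.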
The service keyword is used to define RPC (Remote Procedure Call) interfaces. Frameworks like gRPC or Connect use these definitions to generate client and server code.
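Services support four types of communication:
- Unary: A single request followed by a single response.
- Server streaming: One request, then a sequence of responses.
- Client streaming: A sequence of requests, then one response.
- Bidirectional streaming: Both sides send sequences of messages in a single call.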
Note: While Protobuf provides the language to define these interfaces, the underlying networking protocols and implementation frameworks (like gRPC or Connect) are a broad topic and are out of scope for this guide.
service UserService {
  // Unary: One request, one response
  rpc GetUser(GetUserRequest) returns (User);

  // Server Stream: One request, many responses
  rpc ListUsers(ListUsersRequest) returns (stream User);

  // Bidirectional Stream: Real-time chat
  rpc Chat(stream Message) returns (stream Message);
}
03_TRACK_B
Schema Evolution
Compatibility is the difficult part of long-lived Protobuf systems. These sections focus on what can change, what must be reserved, and how presence affects API behavior.
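Protobuf is designed for forward and backward compatibility, but only if you respect strict rules about what you cannot change: never change the number or wire type of an existing field, and never reuse the number of a deleted field (mark it as reserved instead).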
As long as you follow the rules, old clients can read new messages (ignoring unknown fields), and new clients can read old messages (using default values for missing fields).
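The reason old readers can ignore new fields is that every field on the wire starts with a tag carrying its field number and wire type, so a parser can skip values it does not recognize. The Go sketch below hand-decodes a message this way using only the standard library. The function decodeOldUser and the sample message are hypothetical, and real parsers retain skipped data as unknown fields rather than just counting it.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated or malformed message")

// decodeOldUser plays the role of a reader built from an older schema that
// only knows field 1 (string id). Every other field is skipped using the
// wire type carried in its tag.
func decodeOldUser(buf []byte) (id string, skipped int, err error) {
	for len(buf) > 0 {
		tag, n := binary.Uvarint(buf)
		if n <= 0 {
			return "", 0, errTruncated
		}
		buf = buf[n:]
		field, wireType := tag>>3, tag&7

		var size int // bytes occupied by this field's value
		switch wireType {
		case 0: // varint
			_, size = binary.Uvarint(buf)
			if size <= 0 {
				return "", 0, errTruncated
			}
		case 1: // fixed 64-bit
			size = 8
		case 5: // fixed 32-bit
			size = 4
		case 2: // length-delimited: a varint length, then that many bytes
			length, n := binary.Uvarint(buf)
			if n <= 0 || length > uint64(len(buf)-n) {
				return "", 0, errTruncated
			}
			buf = buf[n:]
			size = int(length)
		default:
			return "", 0, fmt.Errorf("unsupported wire type %d", wireType)
		}
		if size > len(buf) {
			return "", 0, errTruncated
		}
		if field == 1 && wireType == 2 {
			id = string(buf[:size])
		} else {
			skipped++ // unknown to this schema version
		}
		buf = buf[size:]
	}
	return id, skipped, nil
}

func main() {
	// Written by a newer schema: id = "u1" (field 1), plus two fields the
	// old reader does not know: name = "Hiro" (field 2) and age = 42 (field 3).
	msg := []byte{0x0a, 0x02, 'u', '1', 0x12, 0x04, 'H', 'i', 'r', 'o', 0x18, 0x2a}
	id, skipped, err := decodeOldUser(msg)
	fmt.Println(id, skipped, err) // u1 2 <nil>
}
```

This is also why changing a field's number or wire type is breaking: the tag is the only thing the reader has to identify and size a value.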
Automated Enforcement
Tools like buf breaking automate these checks by comparing your local changes against a previous version (e.g., your main branch) and failing if any wire-breaking changes are detected.
# Check for breaking changes against the main branch
$ buf breaking --against '.git#branch=main'

# Example failure output:
# user.proto:10:3: Field "1" changed type
#   from "string" to "int32".
# user.proto:12:3: Previously present
#   field "3" deleted.
Protobuf Editions unifies proto2 and proto3, allowing features to be toggled individually rather than through major syntax version upgrades.
Editions allows for smooth migrations and fine-grained control over behaviors:
- Field Presence: Choose between IMPLICIT (proto3 default) or EXPLICIT (proto2 default).
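- Enum Type: OPEN enums accept values that are not defined in the schema, while CLOSED enums keep such values out of the field (parsers typically preserve them as unknown fields instead).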
- Repeated Encoding: Standardize on PACKED (for efficiency) or EXPANDED (for compatibility).
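To make the efficiency difference between PACKED and EXPANDED concrete, the Go sketch below encodes the same repeated varint field both ways by hand and compares the sizes. The function names and the choice of field number 4 are assumptions for this illustration; generated code performs this encoding for you.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeExpanded writes one tag + value pair per element (wire type 0).
func encodeExpanded(field uint64, values []uint64) []byte {
	var out []byte
	for _, v := range values {
		out = binary.AppendUvarint(out, (field<<3)|0)
		out = binary.AppendUvarint(out, v)
	}
	return out
}

// encodePacked writes a single tag (wire type 2), a byte length, and then
// all element values back to back.
func encodePacked(field uint64, values []uint64) []byte {
	var body []byte
	for _, v := range values {
		body = binary.AppendUvarint(body, v)
	}
	out := binary.AppendUvarint(nil, (field<<3)|2)
	out = binary.AppendUvarint(out, uint64(len(body)))
	return append(out, body...)
}

func main() {
	values := []uint64{1, 2, 3}
	fmt.Println("expanded:", len(encodeExpanded(4, values)), "bytes") // expanded: 6 bytes
	fmt.Println("packed:  ", len(encodePacked(4, values)), "bytes")   // packed:   5 bytes
}
```

The saving grows with the number of elements, because the packed form pays for the tag once instead of once per element.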
This shift represents a fundamental change in the Protobuf lifecycle. By decoupling features from syntax versions, Editions provides a path for the ecosystem to evolve more rapidly. This approach allows new features to be introduced as optional behaviors without the disruption of a global "proto4" release.
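edition = "2023";

// Set the file-wide default for field presence
// (EXPLICIT is already the default in edition 2023)
option features.field_presence = EXPLICIT;

message User {
  // Explicit presence: "unset" and "" can be told apart
  string name = 1;

  // Mixed behavior in one file: a per-field override
  int32 age = 2 [features.field_presence = IMPLICIT];
}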
Implicit vs. Explicit
Field presence determines whether a receiver can distinguish between a field that was never set and one that was set to its default value (like 0 or ""). In short, implicit presence saves space by never sending default values, while explicit presence includes extra tracking to definitively tell you if a field was populated.
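The difference can be sketched in Go by modeling an implicit-presence field as a plain int32 and an explicit-presence field as a pointer, then encoding each by hand. The function names and the field number 2 are hypothetical; the point is only when a value reaches the wire.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendVarintField appends a tag with wire type 0 followed by the value.
func appendVarintField(out []byte, field, value uint64) []byte {
	out = binary.AppendUvarint(out, field<<3)
	return binary.AppendUvarint(out, value)
}

// encodeImplicit mirrors implicit presence: the default value (0) is never
// written, so the receiver cannot tell "unset" from "set to 0".
func encodeImplicit(views int32) []byte {
	if views == 0 {
		return nil
	}
	return appendVarintField(nil, 2, uint64(views))
}

// encodeExplicit mirrors explicit presence: a set field is written even
// when it holds the default value, and only a nil pointer means "unset".
func encodeExplicit(views *int32) []byte {
	if views == nil {
		return nil
	}
	return appendVarintField(nil, 2, uint64(*views))
}

func main() {
	zero := int32(0)
	fmt.Println(len(encodeImplicit(0)))     // 0: nothing is sent
	fmt.Println(len(encodeExplicit(&zero))) // 2: tag + value, "set to 0"
	fmt.Println(len(encodeExplicit(nil)))   // 0: genuinely unset
}
```

With explicit presence the receiver sees two distinct encodings for "set to 0" and "unset", at the cost of a couple of bytes and some bookkeeping.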
The Modern Solution
Due to widespread demand, the optional keyword was re-introduced in later versions of proto3 (v3.15+). Today, Protobuf Editions provides the most robust solution by allowing you to globally or locally toggle field_presence between IMPLICIT and EXPLICIT.
File-Level Default
edition = "2023";

// Set EXPLICIT presence for the entire file
option features.field_presence = EXPLICIT;

message Profile {
  string bio = 1;   // Explicit (tracked)
  int32 views = 2;  // Explicit (tracked)
}
Field-Level Overrides
message LegacyData {
  // Override to IMPLICIT for specific fields
  int32 raw_id = 1 [features.field_presence = IMPLICIT];

  // Follows file-level default (EXPLICIT)
  string note = 2;
}
The Evolution of Required
The required keyword was famously removed in proto3. This was a deliberate architectural decision to ensure that schemas could evolve safely without breaking backward compatibility.
Why was it removed?
If a field is marked required, every message must contain it, and parsers reject any message that lacks it. If you later decide to stop sending that field, every older client still built against the old schema will fail to decode the new message. This is why required fields are considered harmful for long-term schema evolution.
Modern Best Practices
- Application Validation: Use generated getters that return zero values if the field is missing (e.g., Go's GetField()) and check for empty or zero values in your business logic.
- Metadata Validation: Use extensions like protovalidate to declare constraints (including required) in the IDL without breaking wire compatibility.
import "buf/validate/validate.proto";

message CreateUserRequest {
  // Required at the validation layer
  // but optional at the wire layer.
  string email = 1 [
    (buf.validate.field).string.email = true,
    (buf.validate.field).required = true
  ];
}
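// status and codes come from google.golang.org/grpc/status
// and google.golang.org/grpc/codes.

// Safe access even if req is nil
email := req.GetEmail()
if email == "" {
  return status.Error(codes.InvalidArgument, "email is required")
}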
The Hard Limit
The absolute maximum size of a serialized protobuf message is 2 GiB. This is a hard architectural limit because implementations commonly use 32-bit signed integers for byte lengths and offsets. A payload beyond this size cannot be serialized or parsed by standard implementations.
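The 2 GiB figure follows directly from the range of a signed 32-bit integer, as this short Go sketch shows. The constant and function names are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// maxSerializedSize is the largest length a signed 32-bit integer can
// represent: 2 GiB minus one byte.
const maxSerializedSize int64 = math.MaxInt32

// fitsInOneMessage reports whether a payload of the given size could be
// described by a signed 32-bit length.
func fitsInOneMessage(size int64) bool {
	return size >= 0 && size <= maxSerializedSize
}

func main() {
	fmt.Println(maxSerializedSize)          // 2147483647
	fmt.Println(fitsInOneMessage(1 << 20))  // true  (1 MiB)
	fmt.Println(fitsInOneMessage(3 << 30))  // false (3 GiB)
}
```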
The Typical Size
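Protobuf is optimized for small, fast payloads. The official documentation notes that Protobuf is not designed for large messages and suggests, as a rule of thumb, considering an alternate strategy once individual messages exceed about a megabyte. In practice, the ideal size is typically under 1 MB.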
Once a message grows beyond 10 MB, the CPU and memory costs of parsing become highly noticeable. For moving large datasets, the standard pattern is to chunk the data into a stream of smaller messages.
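The chunking pattern can be sketched in Go as a function that splits a large byte payload into fixed-size pieces, each of which would travel as the body of one small message in a stream. The chunk function and the sizes used here are assumptions for the example; the transport that carries the chunks is not shown.

```go
package main

import "fmt"

// chunk splits data into consecutive slices of at most size bytes.
// The returned slices share memory with data; nothing is copied.
func chunk(data []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var chunks [][]byte
	for len(data) > size {
		chunks = append(chunks, data[:size])
		data = data[size:]
	}
	if len(data) > 0 {
		chunks = append(chunks, data)
	}
	return chunks
}

func main() {
	payload := make([]byte, 10)
	for i, c := range chunk(payload, 4) {
		// Each piece would be wrapped in one small message and sent on a stream.
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

The receiver only ever holds one chunk in memory at a time, which keeps parsing cost bounded regardless of the total dataset size.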
Full Graph Parsing
Protobuf is fundamentally designed around the expectation that you will load the entire message into memory at once.
When you deserialize a payload, the parser reads the entire binary stream and instantiates a complete object graph.
In-Memory Expansion
As with most serialization formats, the resulting in-memory representation is significantly larger than the serialized binary. Pointers, object overhead, and data structure padding can cause memory usage to be several times the size of the original payload.
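A rough feel for this expansion can be had from a small Go sketch that compares the wire size of a tiny message with the approximate footprint of an equivalent struct. The User type and the inMemoryBytes helper are hypothetical, the exact numbers depend on the platform, and real generated message types carry additional internal bookkeeping, so the gap is usually larger.

```go
package main

import (
	"fmt"
	"unsafe"
)

// User is a hand-written stand-in for a decoded message.
type User struct {
	ID  string
	Age int32
}

// inMemoryBytes approximates the footprint of a decoded User: the struct
// itself (string header, int32, padding) plus the string's backing bytes.
func inMemoryBytes(u User) int {
	return int(unsafe.Sizeof(u)) + len(u.ID)
}

func main() {
	// Wire form of id = "u1" (field 1) and age = 7 (field 2): six bytes.
	wire := []byte{0x0a, 0x02, 'u', '1', 0x10, 0x07}
	u := User{ID: "u1", Age: 7}

	fmt.Println("wire bytes:", len(wire))
	fmt.Println("in-memory bytes (approx.):", inMemoryBytes(u))
}
```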