protobuf.kmcd.dev

Reflection & Tooling

Descriptors make schemas machine-readable. This page covers runtime reflection, compiler plugins, custom options, and validation built on top of the descriptor layer.

Schemas Describing Schemas

When you run the Protobuf compiler (protoc), it doesn't just generate code. With the --descriptor_set_out flag, it can also write a binary representation of your schema called a FileDescriptorSet.

Fascinatingly, this FileDescriptorSet is itself a Protobuf message! Google defines a schema (descriptor.proto) that describes how to represent .proto files. This means you can use Protobuf tools to read and analyze Protobuf schemas dynamically at runtime.

Why is this useful?

Dynamic Decoding

Tools like this web explorer use descriptors to decode arbitrary binary data without generating static code.

Validation

Complex rule engines (like protovalidate) use descriptors to apply constraints dynamically.

Code Generation

Protoc plugins (the tools that generate your code) receive these descriptors as input. This is THE way that custom code generators are built.

DESCRIPTOR.PROTO (SNIPPET)
// The schema that describes a schema
message FileDescriptorSet {
  repeated FileDescriptorProto file = 1;
}

message FileDescriptorProto {
  optional string name = 1;
  optional string package = 2;
  repeated DescriptorProto message_type = 4;
  repeated EnumDescriptorProto enum_type = 5;
  // ...
}

message DescriptorProto {
  optional string name = 1;
  repeated FieldDescriptorProto field = 2;
  // ...
}
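
Because a FileDescriptorSet is an ordinary Protobuf message, even a generic wire-format reader can pull data out of it. The Python sketch below hand-encodes a tiny descriptor set (one file called a.proto in package demo.v1, both illustrative values) and reads it back using only the field numbers shown in the snippet above. The helper names are hypothetical, and real code would normally use a Protobuf runtime instead of a hand-written decoder.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def length_delimited(buf: bytes):
    """Yield (field_number, payload) for each length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:      # varint: skip the value
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed 64-bit: skip
            pos += 8
        elif wire_type == 2:    # string, bytes, or nested message
            size, pos = read_varint(buf, pos)
            yield number, buf[pos:pos + size]
            pos += size
        elif wire_type == 5:    # fixed 32-bit: skip
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")


def encode_field(number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (small numbers and payloads only)."""
    if number >= 16 or len(payload) >= 128:
        raise ValueError("this sketch only encodes single-byte keys and lengths")
    return bytes([(number << 3) | 2, len(payload)]) + payload


# FileDescriptorProto: name = 1, package = 2
file_proto = encode_field(1, b"a.proto") + encode_field(2, b"demo.v1")
# FileDescriptorSet: repeated FileDescriptorProto file = 1
descriptor_set = encode_field(1, file_proto)

files = []
for number, payload in length_delimited(descriptor_set):
    if number == 1:
        entry = {n: p.decode() for n, p in length_delimited(payload) if n in (1, 2)}
        files.append({"name": entry.get(1), "package": entry.get(2)})

print(files)  # [{'name': 'a.proto', 'package': 'demo.v1'}]
```

The decoder never needed generated code for FileDescriptorSet; the field numbers from descriptor.proto were enough.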

The example schema below uses a custom JSON name, a protovalidate rule, and a nested enum. Compiling it produces a FileDescriptorSet that records all three.

SCHEMA_EDITOR (.proto)
edition = "2023";

package demo.v1;

import "buf/validate/validate.proto";

message User {
  string id = 1 [json_name = "uid"];
  string name = 2;
  uint32 age = 3 [(buf.validate.field).uint32.lt = 150];
  Role role = 4;

  enum Role {
    ROLE_UNSPECIFIED = 0;
    ROLE_USER = 1;
    ROLE_ADMIN = 2;
  }
}

protoc ships with built-in generators for a handful of languages (C++, Java, and Python among them), but it has no built-in knowledge of others such as Go or TypeScript. For those, the compiler (protoc or buf generate) parses the .proto files and hands the resulting descriptors to a plugin.

This architecture allows anyone to write a plugin to generate a wide range of outputs, such as documentation, client libraries, or even SQL schemas, from a Protobuf definition. For more information, see the plugin.proto file itself.

PLUGIN_ARCHITECTURE
# Example: Running a custom plugin
$ protoc --plugin=protoc-gen-custom=./my-plugin \
         --custom_opt=log_level=debug,other_flag=true \
         --custom_out=./generated \
         schema.proto

I/O Architecture

The compiler starts the plugin program as a subprocess.

  • stdin: The compiler passes a binary serialized CodeGeneratorRequest message.
  • stdout: The plugin must return a binary serialized CodeGeneratorResponse message. The plugin must not modify the filesystem directly; it returns the files to be written in this response.
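  • stderr: Used for logging and for fatal problems such as a request that cannot be parsed. Ordinary generation failures are better reported through the error field of the CodeGeneratorResponse. Any logging should be disabled by default and controlled by a CLI flag to keep the output clean.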
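
The process contract above can be sketched without any Protobuf library. In this Python sketch the "plugin" is a small child script and the payloads are opaque placeholder bytes standing in for the serialized CodeGeneratorRequest and CodeGeneratorResponse; run_plugin and PLUGIN_SOURCE are hypothetical names, and this is not how protoc itself is implemented.

```python
import subprocess
import sys

# Stand-in plugin: reads the whole request from stdin, logs to stderr,
# and writes its response to stdout. It never touches the filesystem.
PLUGIN_SOURCE = r"""
import sys
request = sys.stdin.buffer.read()
sys.stderr.write(f"received {len(request)} request bytes\n")
sys.stdout.buffer.write(b"response-for:" + request)
"""


def run_plugin(request: bytes) -> tuple[bytes, str]:
    """Start the plugin as a subprocess and exchange bytes over stdin/stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", PLUGIN_SOURCE],
        input=request,
        capture_output=True,
        check=True,  # a non-zero exit status raises CalledProcessError
    )
    return proc.stdout, proc.stderr.decode()


response, log = run_plugin(b"fake-request")
print(response)  # b'response-for:fake-request'
```

Keeping log output on stderr matters because anything extra written to stdout would corrupt the binary response.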

Request & Response Details

Flags & Parameters: Any options passed via --<plugin>_opt are provided to the plugin in the parameter field of the CodeGeneratorRequest as a single comma-separated string. The plugin is responsible for parsing and splitting this string.
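
A minimal sketch of that parsing step, in Python. The function name parse_parameter is hypothetical, and treating a bare flag as an empty value is an assumption of this sketch; each plugin defines its own conventions.

```python
def parse_parameter(parameter: str) -> dict[str, str]:
    """Split a plugin parameter string into key/value options.

    A bare flag without '=' is stored with an empty string as its value.
    """
    options = {}
    for item in parameter.split(","):
        if not item:
            continue  # tolerate empty input and trailing commas
        key, _, value = item.partition("=")
        options[key] = value
    return options


opts = parse_parameter("log_level=debug,other_flag=true")
print(opts)  # {'log_level': 'debug', 'other_flag': 'true'}
```

Because the comma is the separator, option values in this scheme cannot themselves contain commas.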

What to generate: The compiler passes descriptors for many files in the proto_file field (including dependencies), but the plugin must only generate code for the files listed in the file_to_generate field of the request.
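
The filtering rule can be sketched as follows. FakeRequest and FakeFile are hypothetical stand-ins that model only the two request fields involved (file_to_generate and proto_file, the field that carries the descriptors); a real plugin would read them from the parsed CodeGeneratorRequest.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFile:
    """Stand-in for a FileDescriptorProto: just a name and message names."""
    name: str
    messages: list[str] = field(default_factory=list)


@dataclass
class FakeRequest:
    """Stand-in for CodeGeneratorRequest with only the fields used here."""
    file_to_generate: list[str]
    proto_file: list[FakeFile]


def files_to_generate(request: FakeRequest) -> list[FakeFile]:
    """Keep only the files the compiler asked for; dependencies are skipped."""
    wanted = set(request.file_to_generate)
    return [f for f in request.proto_file if f.name in wanted]


request = FakeRequest(
    file_to_generate=["billing.proto"],
    proto_file=[
        FakeFile("base.proto", ["UserProfile"]),  # dependency, for lookups only
        FakeFile("billing.proto"),                # the file to generate for
    ],
)
targets = files_to_generate(request)
print([f.name for f in targets])  # ['billing.proto']
```

The dependency descriptors are still useful for resolving imported types; they just must not produce output files.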

Required Features: In the CodeGeneratorResponse, declare the features your plugin supports. supported_features is a bitmask of values such as FEATURE_PROTO3_OPTIONAL and FEATURE_SUPPORTS_EDITIONS, and plugins that support Editions should also set minimum_edition and maximum_edition. Treat these as required in practice: without them, users cannot compile modern Protobuf Editions files with your plugin.
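
As a rough illustration, the declaration amounts to a bitwise OR plus two edition bounds. The numeric values below are copied from plugin.proto and descriptor.proto as I understand them, so verify them against your own copy; the dictionary is a stand-in for the response message, and declare_capabilities is a hypothetical helper.

```python
# CodeGeneratorResponse.Feature values (assumed from plugin.proto)
FEATURE_PROTO3_OPTIONAL = 1
FEATURE_SUPPORTS_EDITIONS = 2

# Edition enum values (assumed from descriptor.proto)
EDITION_PROTO2 = 998
EDITION_2023 = 1000


def declare_capabilities() -> dict[str, int]:
    """Build the capability fields a plugin would set on its response."""
    return {
        # supported_features is a bitmask, so the flags are OR-ed together
        "supported_features": FEATURE_PROTO3_OPTIONAL | FEATURE_SUPPORTS_EDITIONS,
        "minimum_edition": EDITION_PROTO2,
        "maximum_edition": EDITION_2023,
    }


caps = declare_capabilities()
print(caps["supported_features"])  # 3
```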

Protobuf extensions allow you to declare that a message has a range of field numbers reserved for external usage. Third parties can then define new fields for that message without modifying the original file.

How extension support differs across versions:

  • proto2: Allows extensions on any message (both user-defined messages and standard options).
  • proto3: Restricts extensions exclusively to option messages (specifically to define custom options; more on that later).
  • Editions: Restores the ability to extend any message (bringing back general-purpose extensions) while keeping option definitions standard and native.

To use extensions in proto2 or Editions, you must define an extension range in the base message using the extensions keyword. External files can then declare fields targeting that range.

Extension Numbers are Field Numbers

Under the hood, extension numbers are standard field numbers. Because they occupy tag space in the serialized message, you must ensure that no two extensions targeting the same message use the same number, as this would result in collisions and data corruption.
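
A collision check is easy to sketch once the extensions are collected in one place. The tuples below are illustrative data, and find_collisions is a hypothetical helper; real tooling would gather this information from descriptors.

```python
from collections import defaultdict


def find_collisions(extensions):
    """Return {(extendee, number): [names]} for numbers claimed more than once."""
    claimed = defaultdict(list)
    for extendee, number, name in extensions:
        claimed[(extendee, number)].append(name)
    return {key: names for key, names in claimed.items() if len(names) > 1}


declared = [
    ("UserProfile", 100, "stripe_customer_id"),
    ("UserProfile", 100, "loyalty_tier"),   # hypothetical conflicting extension
    ("UserProfile", 101, "referral_code"),  # hypothetical, no conflict
]
collisions = find_collisions(declared)
print(collisions)  # {('UserProfile', 100): ['stripe_customer_id', 'loyalty_tier']}
```

The key includes the extended message because the same number may be reused safely on different messages.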

BASE.PROTO (DEFINITION)
edition = "2023";

message UserProfile {
  string username = 1;

  // Declare range of tags reserved for third-party extensions
  extensions 100 to 199;
}
BILLING.PROTO (EXTENSION)
edition = "2023";
import "base.proto";

// Extend the custom UserProfile message directly
extend UserProfile {
  string stripe_customer_id = 100;
}
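
The range declared in base.proto acts as a contract that extension numbers must satisfy. A small sketch of that check, with hypothetical names: note that the range in .proto source is inclusive on both ends, whereas the descriptor (DescriptorProto.ExtensionRange) stores the end as exclusive.

```python
# Mirrors `extensions 100 to 199;` from base.proto (inclusive on both ends).
USER_PROFILE_RANGES = [(100, 199)]


def in_declared_range(number: int, ranges: list[tuple[int, int]]) -> bool:
    """Return True if the extension number falls inside a declared range."""
    return any(low <= number <= high for low, high in ranges)


print(in_declared_range(100, USER_PROFILE_RANGES))  # True
print(in_declared_range(200, USER_PROFILE_RANGES))  # False
```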

Protobuf options control how code is generated and how data is mapped. They are grouped by the schema element they attach to, such as a file, message, field, or service.

  • option go_package: Defines the Go import path.
  • option java_package: Defines the Java package.
  • option optimize_for = SPEED;: Generates highly optimized (but larger) code. Alternatives: CODE_SIZE, LITE_RUNTIME.
  • [deprecated = true]: Marks a field as deprecated.
  • [json_name = "custom"]: Sets a custom JSON key.
OPTIONS_SNIPPET
edition = "2023";

option go_package = "github.com/example/v1";
option java_multiple_files = true;
option optimize_for = SPEED;

message User {
  string user_id = 1 [json_name = "uid"];
  string old_field = 2 [deprecated = true];
}

You can define custom "options" (annotations) to attach metadata to your schema. Common use cases include defining data validation rules (e.g., protovalidate), field-level data classification (e.g., tagging PII), and service-level access control (e.g., defining required roles for RBAC).

These annotations are preserved in the binary descriptors, which makes them accessible to anything that processes your schema. This includes protoc plugins that generate custom code, systems that configure themselves during startup, or dynamic tools that load and inspect schemas on demand via reflection.

Under the hood, custom options are defined by using the extend keyword to target the built-in option descriptor messages (like FieldOptions or MethodOptions).

Available Scopes

File, Message, Field, Oneof, Enum, EnumValue, Service, Method

Metadata can be attached to any of these points by extending the respective standard descriptor messages.

For more information, see the Editions Custom Options Guide.

Extension Registries

Because extensions are defined globally for a descriptor (like FieldOptions), you must ensure your field numbers don't conflict with others.

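Google maintains a Global Extension Registry for public projects. Numbers from 50000 to 99999 are set aside for internal use within an organization, which is why the example below uses 50001 and 50002.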

OPTIONS.PROTO (DEFINITION)
edition = "2023";
import "google/protobuf/descriptor.proto";

extend google.protobuf.FieldOptions {
  bool is_pii = 50001;
}

extend google.protobuf.MethodOptions {
  string required_role = 50002;
}
SERVICE.PROTO (USAGE)
edition = "2023";
import "options.proto";

service UserService {
  rpc GetSensitiveData(GetRequest) returns (GetResponse) {
    option (required_role) = "ADMIN";
  }
}

// Empty placeholders so the example compiles
message GetRequest {}
message GetResponse {}

message Profile {
  string ssn = 1 [(is_pii) = true];
}
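
Once the options are readable, acting on them is ordinary application code. In this sketch the two dictionaries are hypothetical stand-ins for values that real code would read from the descriptors (the is_pii field option and the required_role method option); redact and is_allowed are illustrative helpers, not part of any library.

```python
# Stand-ins for option values loaded from descriptors at startup.
FIELD_IS_PII = {"Profile.ssn": True}
METHOD_REQUIRED_ROLE = {"UserService.GetSensitiveData": "ADMIN"}


def redact(message_name: str, record: dict) -> dict:
    """Mask every field whose (is_pii) option is set, e.g. before logging."""
    return {
        key: "[REDACTED]" if FIELD_IS_PII.get(f"{message_name}.{key}") else value
        for key, value in record.items()
    }


def is_allowed(method: str, caller_roles: set[str]) -> bool:
    """Allow the call if the method has no required role or the caller holds it."""
    required = METHOD_REQUIRED_ROLE.get(method)
    return required is None or required in caller_roles


print(redact("Profile", {"ssn": "123-45-6789"}))                # {'ssn': '[REDACTED]'}
print(is_allowed("UserService.GetSensitiveData", {"USER"}))     # False
print(is_allowed("UserService.GetSensitiveData", {"ADMIN"}))    # True
```

The schema stays the single place where a field is marked sensitive or a method is restricted; the code only interprets those marks.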

Not all breaking changes are equal. Tools like Buf categorize breaking changes into four distinct levels of severity.

  • WIRE: The most severe level. This includes changing a field number or using an incompatible type (e.g., string to int32). Old and new binaries will misread each other's data, causing decode failures or silent corruption; you should never do this.
  • WIRE_JSON: Breakage in JSON representation. Renaming a field is safe on the binary wire, but clients expecting the old JSON key will fail. You can mitigate this using the [json_name="old_name"] annotation.
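  • PACKAGE: Source code breakage at the package level. Changing a type in a wire-compatible way (e.g., int32 to int64) transmits safely in binary form, but when developers update their generated code, their builds will fail until they update their types. Note that the Protobuf JSON mapping encodes 64-bit integers as strings, so this particular change also alters JSON output.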
  • FILE: The strictest level. This ensures source code compatibility down to the individual file level. Moving a message to another file might break code generation that relies on specific file imports.
BEFORE
edition = "2023";
package api.v1;

message User {
  string id = 1;
  int32 age = 2;
  string display_name = 3;
}
AFTER
edition = "2023";
package api.v1;

message User {
  // [WIRE] breakage: type changed from string to int32
  int32 id = 1;

  // [PACKAGE] breakage: source code type change
  int64 age = 2;

  // [WIRE_JSON] breakage: JSON key changed
  string full_name = 3;
}
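
The comparison behind such checks is keyed by field number, since that is what identifies a field on the wire. The sketch below diffs the BEFORE and AFTER schemas shown above, modeled as plain dictionaries; diff_fields is a hypothetical helper and deliberately does not reproduce Buf's rule set or assign severity levels.

```python
def diff_fields(before: dict, after: dict) -> list[tuple[int, str]]:
    """Compare {number: (name, type)} maps and list changes per field number."""
    changes = []
    for number in sorted(before.keys() | after.keys()):
        old, new = before.get(number), after.get(number)
        if old is None:
            changes.append((number, "added"))
        elif new is None:
            changes.append((number, "removed"))
        else:
            if old[1] != new[1]:
                changes.append((number, f"type {old[1]} -> {new[1]}"))
            if old[0] != new[0]:
                changes.append((number, f"renamed {old[0]} -> {new[0]}"))
    return changes


BEFORE = {1: ("id", "string"), 2: ("age", "int32"), 3: ("display_name", "string")}
AFTER = {1: ("id", "int32"), 2: ("age", "int64"), 3: ("full_name", "string")}

changes = diff_fields(BEFORE, AFTER)
for number, description in changes:
    print(number, description)
# 1 type string -> int32
# 2 type int32 -> int64
# 3 renamed display_name -> full_name
```

A real checker would then map each change to a severity level, which requires knowing which type pairs are compatible on the wire and in JSON.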

Protobuf identifies data on the wire using field numbers rather than names, so deleting a field requires careful handling. If a schema has been used in production, older clients or databases may still hold data serialized with those field numbers. You cannot simply remove a field and reuse its number without risking collisions. Instead, you must manage its lifecycle:

  1. Deprecate: Add [deprecated = true]. This warns developers in their IDEs (via generated code annotations like @Deprecated) not to use it for new features.
  2. Stop Using: Wait until metrics show zero traffic using the field.
  3. Reserve: Remove the field entirely and add its number/name to a reserved block. This prevents future developers from accidentally reusing the number and corrupting old data that might still be in a database.

Step 1: The original schema
message Product {
  int32 price_cents = 1;
}
Step 2: Deprecate the old field, add the new one
message Product {
  int32 price_cents = 1 [deprecated = true];
  int64 price_micros = 2;
}
Step 3: Remove the old field and reserve its number and name
message Product {
  reserved 1, "price_cents";

  int64 price_micros = 2;
}
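
The compiler enforces the reserved block for you, but the rule itself is simple enough to sketch. The sets below mirror the reserved statement in Step 3; check_new_field and the candidate field names are hypothetical.

```python
# Mirrors `reserved 1, "price_cents";` from the Product message.
RESERVED_NUMBERS = {1}
RESERVED_NAMES = {"price_cents"}


def check_new_field(number: int, name: str) -> list[str]:
    """Return the reasons a proposed field would be rejected, if any."""
    problems = []
    if number in RESERVED_NUMBERS:
        problems.append(f"field number {number} is reserved")
    if name in RESERVED_NAMES:
        problems.append(f'field name "{name}" is reserved')
    return problems


print(check_new_field(1, "discount_cents"))  # ['field number 1 is reserved']
print(check_new_field(3, "currency"))        # []
```

Reserving the name as well as the number protects JSON and text-format consumers, which identify fields by name.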

The Source of Truth

Protobuf goes beyond simple types. By using extensions, you can augment your schema with rich metadata. A powerful example is protovalidate, which allows you to embed complex business rules directly into your schema using CEL (the Common Expression Language). The JSON document below is sample input for such rules.

Test Data (JSON)

JSON_INPUT
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Hiro Protagonist",
  "email": "hiro@metaverse.com",
  "age": 30,
  "role": 2,
  "birthDate": {
    "year": 1996,
    "month": 1,
    "day": 1
  }
}
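
To make the idea concrete, here is a hand-rolled check of the one rule from the example schema, (buf.validate.field).uint32.lt = 150 on age. This is not protovalidate and does not evaluate CEL; the RULES table and validate function are hypothetical stand-ins for rules that a real validator would read from the descriptors.

```python
import json

# Hypothetical rule table; only the age rule comes from the example schema.
RULES = {
    "age": [("uint32.lt = 150", lambda v: type(v) is int and 0 <= v < 150)],
}


def validate(record: dict) -> list[str]:
    """Return one violation message per failed rule; an empty list means valid."""
    violations = []
    for field_name, rules in RULES.items():
        if field_name not in record:
            continue  # unset fields are not checked in this sketch
        for label, check in rules:
            if not check(record[field_name]):
                violations.append(f"{field_name}: violates {label}")
    return violations


sample = json.loads('{"name": "Hiro Protagonist", "age": 30}')
print(validate(sample))        # []
print(validate({"age": 200}))  # ['age: violates uint32.lt = 150']
```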

Validation Strategy

By putting validation in the schema, you ensure that every part of your system enforcing the contract applies the exact same rules. This eliminates "validation drift" not just between microservices, but across your entire stack. For instance, you can use the same rules to validate a form on your web frontend (using TypeScript) before the request ever hits your backend (running Go, Java, etc.).