Services
Documentation for GCP Cloud Run microservices
This section contains documentation for the Go microservices deployed to GCP Cloud Run as part of the home lab infrastructure.
Service Architecture
All services follow a consistent architecture pattern:
- Framework: Built using the z5labs/humus framework with OpenAPI-first design
- Runtime: Go 1.24+ deployed to GCP Cloud Run
- Observability: OpenTelemetry metrics, traces, and logs
- Health Checks: Standard /health/startup and /health/liveness endpoints
- Configuration: Embedded config.yaml with OpenAPI specifications
1 - Boot Service
UEFI HTTP boot endpoints and boot profile management
The Boot Service is a custom Go microservice that provides UEFI HTTP boot endpoints for bare metal servers and manages boot profiles. It serves boot scripts, streams kernel/initrd assets, and handles boot profile administration (kernel/initrd upload, storage, and lifecycle management).
Architecture Overview
The Boot Service is deployed on GCP Cloud Run and accessed through a WireGuard VPN tunnel from bare metal servers. It integrates with:
- Machine Service: Retrieves machine hardware profiles by MAC address
- Cloud Storage: Stores and retrieves kernel/initrd blobs
- Firestore: Stores boot profile metadata
- Cloud Monitoring: OpenTelemetry observability with distributed tracing
API Endpoints
UEFI HTTP Boot Endpoints
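Accessed by bare metal servers during the boot process (via WireGuard VPN):
- GET /boot.ipxe - Serves iPXE boot scripts customized for the requesting machine
- GET /asset/{boot_profile_id}/kernel - Streams kernel images from Cloud Storage
- GET /asset/{boot_profile_id}/initrd - Streams initial ramdisk images from Cloud Storage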
Admin API
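Boot profile management endpoints for administrators:
- POST /api/v1/profiles - Create a new boot profile for a machine
- GET /api/v1/boot/{machine_id}/profile - Retrieve the active boot profile for a machine
- PUT /api/v1/boot/{machine_id}/profile - Update (replace) the boot profile for a machine
- DELETE /api/v1/boot/{machine_id}/profile - Delete a machine's boot profile and its associated blobs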
Health Check Endpoints
Standard Cloud Run health endpoints:
- GET /health/startup - Startup probe endpoint
- GET /health/liveness - Liveness probe endpoint
Security Model
VPN-Based Access Control
Since HP DL360 Gen 9 servers do not support client-side TLS certificates for UEFI HTTP boot, all boot traffic is secured via WireGuard VPN:
- Boot Endpoints: Only accessible through WireGuard tunnel (source IP validation)
- Transport Security: WireGuard provides mutual authentication and encryption
Authentication Methods
- UEFI Boot Endpoints: VPN source IP validation (bare metal servers)
- Health Checks: Unauthenticated (used by Cloud Run for liveness/startup probes)
Common Patterns
Error Responses
All API endpoints follow the RFC 7807 Problem Details standard (see ADR-0007):
{
"type": "https://api.example.com/errors/resource-not-found",
"title": "Resource Not Found",
"status": 404,
"detail": "Machine with MAC address aa:bb:cc:dd:ee:ff not found",
"instance": "/api/v1/boot/aa:bb:cc:dd:ee:ff/profile",
"mac_address": "aa:bb:cc:dd:ee:ff"
}
Error responses use Content-Type: application/problem+json.
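As a rough illustration of the error format above, the following Go sketch writes a Problem Details response with the application/problem+json media type. The Problem type, the writeProblem function, and the render helper are hypothetical names chosen for this example; they are not taken from the service or from the humus framework.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Problem holds the RFC 7807 members used in the examples above.
// Extensions carries additional members such as mac_address.
type Problem struct {
	Type       string
	Title      string
	Status     int
	Detail     string
	Instance   string
	Extensions map[string]any
}

// MarshalJSON flattens Extensions into the top-level object.
func (p Problem) MarshalJSON() ([]byte, error) {
	m := make(map[string]any, len(p.Extensions)+5)
	for k, v := range p.Extensions {
		switch k {
		case "type", "title", "status", "detail", "instance":
			continue // standard members cannot be overridden by extensions
		}
		m[k] = v
	}
	m["type"], m["title"], m["status"] = p.Type, p.Title, p.Status
	if p.Detail != "" {
		m["detail"] = p.Detail
	}
	if p.Instance != "" {
		m["instance"] = p.Instance
	}
	return json.Marshal(m)
}

// writeProblem sends p using the problem+json media type and p.Status.
func writeProblem(w http.ResponseWriter, p Problem) {
	w.Header().Set("Content-Type", "application/problem+json")
	w.WriteHeader(p.Status)
	_ = json.NewEncoder(w).Encode(p)
}

// render runs writeProblem against an in-memory recorder (demo helper).
func render(p Problem) (status int, contentType, body string) {
	rec := httptest.NewRecorder()
	writeProblem(rec, p)
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	status, ct, body := render(Problem{
		Type:       "https://api.example.com/errors/resource-not-found",
		Title:      "Resource Not Found",
		Status:     http.StatusNotFound,
		Detail:     "Machine with MAC address aa:bb:cc:dd:ee:ff not found",
		Instance:   "/api/v1/boot/aa:bb:cc:dd:ee:ff/profile",
		Extensions: map[string]any{"mac_address": "aa:bb:cc:dd:ee:ff"},
	})
	fmt.Println(status, ct)
	fmt.Print(body)
}
```

Keeping extension members in a map lets each endpoint attach its own fields (mac_address, boot_profile_id, invalid_fields) without a separate type per error.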
Standard HTTP Status Codes
- 200 OK - Successful request
- 201 Created - Resource created successfully
- 204 No Content - Successful deletion
- 400 Bad Request - Invalid request parameters
- 401 Unauthorized - Missing or invalid authentication
- 403 Forbidden - Insufficient permissions
- 404 Not Found - Resource not found
- 409 Conflict - Resource already exists
- 422 Unprocessable Entity - Validation error
- 500 Internal Server Error - Server error
Content Types
- application/json - JSON responses (admin API)
- application/problem+json - RFC 7807 error responses
- text/plain - iPXE boot scripts
- application/octet-stream - Binary boot assets (kernel, initrd)
- text/cloud-config - Cloud-init configuration files
1.1 - GET /boot.ipxe
Serves iPXE boot scripts customized for the requesting machine based on its MAC address. This endpoint is accessed by bare metal servers (HP DL360 Gen 9) during the UEFI HTTP boot process through the WireGuard VPN tunnel.
Sequence Diagram
sequenceDiagram
participant Client as Bare Metal Server
participant Boot as Boot Service
participant MachineAPI as Machine Service
participant DB as Firestore
Client->>Boot: GET /boot.ipxe?mac=52:54:00:12:34:56
Boot->>Boot: Validate MAC address format
Boot->>MachineAPI: GET /api/v1/machines?mac=52:54:00:12:34:56
MachineAPI->>DB: Query machine by NIC MAC
DB-->>MachineAPI: Machine profile (machine_id)
MachineAPI-->>Boot: Machine profile
Boot->>DB: Query boot profile by machine_id
DB-->>Boot: Boot profile (profile_id, kernel_id, initrd_id, kernel args)
Boot->>Boot: Generate iPXE script with profile_id
Boot-->>Client: 200 OK (iPXE script)
Request
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| mac | string | Yes | MAC address of the requesting machine (format: aa:bb:cc:dd:ee:ff) |
Request Example:
GET /boot.ipxe?mac=52:54:00:12:34:56 HTTP/1.1
Host: boot.internal
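The format check on the mac query parameter could look like the minimal sketch below. Accepting upper-case input and normalizing it to lower case is an assumption, as are the names macPattern and normalizeMAC; the service's actual validation rules are whatever its OpenAPI specification defines.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// macPattern matches six colon-separated hex octets, e.g. aa:bb:cc:dd:ee:ff.
var macPattern = regexp.MustCompile(`^[0-9a-f]{2}(:[0-9a-f]{2}){5}$`)

// normalizeMAC trims and lower-cases raw, then reports whether the result
// matches the aa:bb:cc:dd:ee:ff format.
func normalizeMAC(raw string) (string, bool) {
	mac := strings.ToLower(strings.TrimSpace(raw))
	return mac, macPattern.MatchString(mac)
}

func main() {
	for _, in := range []string{"52:54:00:12:34:56", "52-54-00-12-34-56", "invalid-mac"} {
		mac, ok := normalizeMAC(in)
		fmt.Printf("%q -> %q valid=%t\n", in, mac, ok)
	}
}
```

A strict pattern is used here instead of net.ParseMAC because ParseMAC also accepts hyphen- and dot-separated forms, which the documented format does not mention.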
Response
Response Example (200 OK):
#!ipxe
# Boot configuration for node-01 (52:54:00:12:34:56)
# Boot Profile ID: 018c7dbd-a1b2-7000-8000-987654321def
# Generated: 2025-11-19T06:00:00Z
kernel /asset/018c7dbd-a1b2-7000-8000-987654321def/kernel console=tty0 console=ttyS0 ip=dhcp
initrd /asset/018c7dbd-a1b2-7000-8000-987654321def/initrd
boot
Response Headers:
- Content-Type: text/plain; charset=utf-8
- Cache-Control: no-cache, no-store, must-revalidate
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
400 Bad Request - Missing or invalid MAC address:
{
"type": "https://api.example.com/errors/invalid-mac-address",
"title": "Invalid MAC Address",
"status": 400,
"detail": "MAC address must be in format aa:bb:cc:dd:ee:ff",
"instance": "/boot.ipxe",
"mac_address": "invalid-mac"
}
404 Not Found - No boot configuration found for MAC:
{
"type": "https://api.example.com/errors/machine-not-configured",
"title": "Machine Not Configured",
"status": 404,
"detail": "No boot configuration found for MAC address 52:54:00:12:34:56",
"instance": "/boot.ipxe?mac=52:54:00:12:34:56",
"mac_address": "52:54:00:12:34:56"
}
500 Internal Server Error - Database or template error:
{
"type": "https://api.example.com/errors/internal-error",
"title": "Internal Server Error",
"status": 500,
"detail": "Failed to generate boot script due to an internal error",
"instance": "/boot.ipxe?mac=52:54:00:12:34:56"
}
Boot Script Variables
The iPXE script may include the following dynamic values:
- Machine-specific kernel parameters
- Asset download URLs (using boot profile ID format)
- Network configuration parameters
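One way to produce a script like the response example is Go's text/template package. The sketch below is an assumption about how the dynamic values could be filled in: the bootScriptData type, its field names, and the template text are hypothetical, and the "Generated" timestamp comment is left out to keep the output deterministic.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// bootScriptData holds the dynamic values substituted into the script.
type bootScriptData struct {
	Hostname   string
	MAC        string
	ProfileID  string
	KernelArgs []string
}

// bootScript mirrors the layout of the response example above.
var bootScript = template.Must(template.New("boot.ipxe").
	Funcs(template.FuncMap{"join": strings.Join}).
	Parse(`#!ipxe
# Boot configuration for {{.Hostname}} ({{.MAC}})
# Boot Profile ID: {{.ProfileID}}
kernel /asset/{{.ProfileID}}/kernel {{join .KernelArgs " "}}
initrd /asset/{{.ProfileID}}/initrd
boot
`))

// renderBootScript executes the template and returns the script text.
func renderBootScript(d bootScriptData) (string, error) {
	var sb strings.Builder
	if err := bootScript.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	script, err := renderBootScript(bootScriptData{
		Hostname:   "node-01",
		MAC:        "52:54:00:12:34:56",
		ProfileID:  "018c7dbd-a1b2-7000-8000-987654321def",
		KernelArgs: []string{"console=tty0", "console=ttyS0", "ip=dhcp"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(script)
}
```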
Security Considerations
VPN Source IP Validation
All boot endpoints validate that requests originate from the WireGuard VPN subnet:
- Allowed CIDR: 10.x.x.0/24 (WireGuard VPN network)
- Validation: Performed at Cloud Run ingress or application layer
- Rejection: Requests from outside VPN return 403 Forbidden
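If the check is done in the application layer, it could be sketched with the standard net/netip package as below. The function name newVPNCheck is hypothetical, and 10.0.0.0/24 is only a placeholder for the real WireGuard subnet, which this document does not spell out. Which address to inspect (the connection's remote address or a forwarded header) depends on how traffic reaches the service and is not covered here.

```go
package main

import (
	"fmt"
	"net/netip"
)

// newVPNCheck returns a predicate that reports whether an "ip:port" remote
// address falls inside the given CIDR.
func newVPNCheck(cidr string) (func(remoteAddr string) bool, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, fmt.Errorf("parse VPN CIDR: %w", err)
	}
	return func(remoteAddr string) bool {
		ap, err := netip.ParseAddrPort(remoteAddr)
		if err != nil {
			return false // unparseable addresses are rejected
		}
		// Unmap turns IPv4-mapped IPv6 addresses back into plain IPv4.
		return prefix.Contains(ap.Addr().Unmap())
	}, nil
}

func main() {
	// Placeholder subnet; substitute the actual WireGuard network.
	inVPN, err := newVPNCheck("10.0.0.0/24")
	if err != nil {
		panic(err)
	}
	for _, addr := range []string{"10.0.0.7:40000", "203.0.113.9:40000"} {
		fmt.Printf("%s allowed=%t\n", addr, inVPN(addr))
	}
}
```

A handler or middleware would call the predicate with the request's remote address and answer 403 Forbidden when it returns false.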
Rate Limiting
To prevent abuse, boot endpoints are rate-limited:
- Boot Script: 10 requests/minute per MAC address
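A per-MAC limit of this kind can be approximated with a fixed-window counter, sketched below under several assumptions: the limiter type and its methods are hypothetical, state is held in memory (so each Cloud Run instance would count separately), and old windows are never evicted. The document does not say which algorithm or storage the service actually uses.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// window tracks how many requests a key has made since start.
type window struct {
	start time.Time
	count int
}

// limiter is a fixed-window request counter keyed by a string (here, a MAC).
type limiter struct {
	mu      sync.Mutex
	limit   int
	period  time.Duration
	now     func() time.Time // injectable clock, defaults to time.Now
	windows map[string]*window
}

// newPerMinuteLimiter allows up to limit requests per key per minute.
func newPerMinuteLimiter(limit int) *limiter {
	return &limiter{
		limit:   limit,
		period:  time.Minute,
		now:     time.Now,
		windows: make(map[string]*window),
	}
}

// Allow records one request for key and reports whether it is within the limit.
func (l *limiter) Allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := l.now()
	w, ok := l.windows[key]
	if !ok || now.Sub(w.start) >= l.period {
		// First request for this key, or the previous window has expired.
		l.windows[key] = &window{start: now, count: 1}
		return true
	}
	if w.count >= l.limit {
		return false
	}
	w.count++
	return true
}

func main() {
	l := newPerMinuteLimiter(10)
	allowed := 0
	for i := 0; i < 12; i++ {
		if l.Allow("52:54:00:12:34:56") {
			allowed++
		}
	}
	fmt.Println("allowed requests:", allowed)
}
```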
Observability
All boot endpoint requests are instrumented with OpenTelemetry following HTTP semantic conventions:
- Metrics: OpenTelemetry HTTP server metrics (request count, duration, size)
  - http.server.request.duration - Request duration histogram
  - http.server.request.body.size - Request body size
  - http.server.response.body.size - Response body size
- Traces: End-to-end tracing from request to database retrieval
  - HTTP server span captures request details (method, route, status code)
  - Child spans for database queries and Machine Service API calls
- Logs: Structured logs with MAC address, boot profile ID, response status
1.2 - GET /asset/{boot_profile_id}/kernel
Streams kernel images from Cloud Storage for the boot process. This endpoint is accessed by bare metal servers during UEFI HTTP boot through the WireGuard VPN tunnel.
Sequence Diagram
sequenceDiagram
participant Client as Bare Metal Server
participant Boot as Boot Service
participant Storage as Cloud Storage
participant DB as Firestore
Client->>Boot: GET /asset/018c7dbd-a1b2-7000-8000-987654321def/kernel
Boot->>Boot: Validate UUIDv7 format
Boot->>DB: Query boot profile by ID
DB-->>Boot: Boot profile (kernel_id)
Boot->>Storage: GET gs://bucket/blobs/{kernel_id}
Storage-->>Boot: Kernel data stream
Boot-->>Client: 200 OK (kernel stream)
Request
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| boot_profile_id | string (UUIDv7) | Yes | Boot profile identifier (UUIDv7 format: 018c7dbd-a1b2-7000-8000-987654321def) |
Request Example:
GET /asset/018c7dbd-a1b2-7000-8000-987654321def/kernel HTTP/1.1
Host: boot.internal
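The "Validate UUIDv7 format" step in the sequence diagram could be approximated with a pattern check like the one below. This is a sketch, not the service's code: isUUIDv7 is a hypothetical helper, and it only checks the textual layout, the version nibble (7), and the variant bits, assuming lower-case hex as in the examples.

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidV7Pattern checks the 8-4-4-4-12 layout, version nibble 7, and the
// RFC variant (first hex digit of the fourth group is 8, 9, a, or b).
var uuidV7Pattern = regexp.MustCompile(
	`^[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$`)

// isUUIDv7 reports whether id looks like a lower-case UUIDv7 string.
func isUUIDv7(id string) bool {
	return uuidV7Pattern.MatchString(id)
}

func main() {
	for _, id := range []string{
		"018c7dbd-a1b2-7000-8000-987654321def", // version 7
		"550e8400-e29b-41d4-a716-446655440000", // version 4
		"not-a-uuid",
	} {
		fmt.Printf("%-38s valid=%t\n", id, isUUIDv7(id))
	}
}
```

Rejecting malformed identifiers before the Firestore lookup keeps obviously bad requests from reaching the database.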
Response
Response Example (200 OK):
Binary kernel image streamed from Cloud Storage.
Response Headers:
- Content-Type: application/octet-stream
- Content-Length: 8388608 (actual kernel size in bytes)
- Cache-Control: public, max-age=3600
- ETag: "abc123..."
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
404 Not Found - Kernel image not found:
{
"type": "https://api.example.com/errors/kernel-not-found",
"title": "Kernel Not Found",
"status": 404,
"detail": "Kernel image not found for boot profile 018c7dbd-a1b2-7000-8000-987654321def",
"instance": "/asset/018c7dbd-a1b2-7000-8000-987654321def/kernel",
"boot_profile_id": "018c7dbd-a1b2-7000-8000-987654321def"
}
500 Internal Server Error - Cloud Storage error:
{
"type": "https://api.example.com/errors/storage-error",
"title": "Storage Error",
"status": 500,
"detail": "Failed to retrieve kernel from storage due to an internal error",
"instance": "/asset/018c7dbd-a1b2-7000-8000-987654321def/kernel"
}
- Streaming: File is streamed directly from Cloud Storage (no buffering in memory)
- Target Latency: < 100ms to first byte
- Typical Size: 8-15 MB for Linux kernels
Security Considerations
VPN Source IP Validation
All boot endpoints validate that requests originate from the WireGuard VPN subnet:
- Allowed CIDR: 10.x.x.0/24 (WireGuard VPN network)
- Validation: Performed at Cloud Run ingress or application layer
- Rejection: Requests from outside VPN return 403 Forbidden
Rate Limiting
To prevent abuse, asset download endpoints are rate-limited:
- Asset Downloads: 5 concurrent downloads per MAC address
Asset Integrity
Boot assets are validated for integrity:
- Checksums: SHA-256 checksums stored in Firestore
- Verification: Computed on upload, verified on download (optional)
- ETag Headers: Enable client-side caching and integrity checks
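Computing a SHA-256 checksum does not require buffering the asset: the hash can be fed while the bytes are streamed. The sketch below shows the idea with io.TeeReader. The function names are hypothetical, and io.Discard stands in for the real destination (a Cloud Storage writer on upload, or the HTTP response on download).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// copyWithChecksum streams src to dst and returns the hex-encoded SHA-256
// of the bytes copied, along with the byte count.
func copyWithChecksum(dst io.Writer, src io.Reader) (string, int64, error) {
	h := sha256.New()
	// Every chunk read from src is also written into the hash.
	n, err := io.Copy(dst, io.TeeReader(src, h))
	if err != nil {
		return "", n, err
	}
	return hex.EncodeToString(h.Sum(nil)), n, nil
}

// checksumOfString is a demo helper that hashes an in-memory string.
func checksumOfString(s string) (string, int64, error) {
	return copyWithChecksum(io.Discard, strings.NewReader(s))
}

func main() {
	sum, n, err := checksumOfString("abc")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, sha256=%s\n", n, sum)
}
```

On download, the computed value would be compared with the checksum stored in Firestore. Note that when streaming directly to the client, a mismatch is only known after the last byte has been sent.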
Observability
All boot endpoint requests are instrumented with OpenTelemetry following HTTP semantic conventions:
- Metrics: OpenTelemetry HTTP server metrics
  - http.server.request.duration - Request duration histogram
  - http.server.response.body.size - Response body size (tracks bytes transferred)
- Traces: End-to-end tracing from request to Cloud Storage retrieval
  - HTTP server span captures request details (method, route, status code)
  - Child spans for database queries and Cloud Storage operations
- Logs: Structured logs with boot profile ID, kernel ID, response status
1.3 - GET /asset/{boot_profile_id}/initrd
Streams initial ramdisk (initrd) images from Cloud Storage for the boot process. This endpoint is accessed by bare metal servers during UEFI HTTP boot through the WireGuard VPN tunnel.
Sequence Diagram
sequenceDiagram
participant Client as Bare Metal Server
participant Boot as Boot Service
participant Storage as Cloud Storage
participant DB as Firestore
Client->>Boot: GET /asset/018c7dbd-a1b2-7000-8000-987654321def/initrd
Boot->>Boot: Validate UUIDv7 format
Boot->>DB: Query boot profile by ID
DB-->>Boot: Boot profile (initrd_id)
Boot->>Storage: GET gs://bucket/blobs/{initrd_id}
Storage-->>Boot: Initrd data stream
Boot-->>Client: 200 OK (initrd stream)
Request
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| boot_profile_id | string (UUIDv7) | Yes | Boot profile identifier (UUIDv7 format: 018c7dbd-a1b2-7000-8000-987654321def) |
Request Example:
GET /asset/018c7dbd-a1b2-7000-8000-987654321def/initrd HTTP/1.1
Host: boot.internal
Response
Response Example (200 OK):
Binary initrd image streamed from Cloud Storage.
Response Headers:
- Content-Type: application/octet-stream
- Content-Length: 52428800 (actual initrd size in bytes)
- Cache-Control: public, max-age=3600
- ETag: "def456..."
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
404 Not Found - Initrd image not found:
{
"type": "https://api.example.com/errors/initrd-not-found",
"title": "Initrd Not Found",
"status": 404,
"detail": "Initrd image not found for boot profile 018c7dbd-a1b2-7000-8000-987654321def",
"instance": "/asset/018c7dbd-a1b2-7000-8000-987654321def/initrd",
"boot_profile_id": "018c7dbd-a1b2-7000-8000-987654321def"
}
500 Internal Server Error - Cloud Storage error:
{
"type": "https://api.example.com/errors/storage-error",
"title": "Storage Error",
"status": 500,
"detail": "Failed to retrieve initrd from storage due to an internal error",
"instance": "/asset/018c7dbd-a1b2-7000-8000-987654321def/initrd"
}
- Streaming: File is streamed directly from Cloud Storage (no buffering in memory)
- Target Latency: < 100ms to first byte
- Typical Size: 50-150 MB for Linux initrd images
Security Considerations
VPN Source IP Validation
All boot endpoints validate that requests originate from the WireGuard VPN subnet:
- Allowed CIDR: 10.x.x.0/24 (WireGuard VPN network)
- Validation: Performed at Cloud Run ingress or application layer
- Rejection: Requests from outside VPN return 403 Forbidden
Rate Limiting
To prevent abuse, asset download endpoints are rate-limited:
- Asset Downloads: 5 concurrent downloads per MAC address
Asset Integrity
Boot assets are validated for integrity:
- Checksums: SHA-256 checksums stored in Firestore
- Verification: Computed on upload, verified on download (optional)
- ETag Headers: Enable client-side caching and integrity checks
Observability
All boot endpoint requests are instrumented with OpenTelemetry following HTTP semantic conventions:
- Metrics: OpenTelemetry HTTP server metrics
  - http.server.request.duration - Request duration histogram
  - http.server.response.body.size - Response body size (tracks bytes transferred)
- Traces: End-to-end tracing from request to Cloud Storage retrieval
  - HTTP server span captures request details (method, route, status code)
  - Child spans for database queries and Cloud Storage operations
- Logs: Structured logs with boot profile ID, initrd ID, response status
1.4 - POST /api/v1/profiles
Create a new boot profile for a machine. If the machine already has a boot profile, the request fails with 409 Conflict; use PUT /api/v1/boot/{machine_id}/profile to update it instead.
Cloud Storage Structure
Kernel and initrd binaries are stored in Google Cloud Storage using their UUIDv7 identifiers as object keys:
gs://{bucket}/blobs/{kernel_id}
gs://{bucket}/blobs/{initrd_id}
For example:
gs://boot-server-blobs/blobs/018c7dbd-b100-7000-8000-123456789abc
gs://boot-server-blobs/blobs/018c7dbd-b200-7000-8000-987654321fed
The UUIDv7 identifiers are generated server-side during upload, ensuring:
- Globally unique object keys
- Time-ordered storage (UUIDv7 timestamp prefix)
- No namespace collisions between profiles
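To make the time-ordering property concrete, the sketch below builds a UUIDv7 by hand: a 48-bit Unix millisecond timestamp followed by random bits, with the version and variant fields set. It is illustrative only; newUUIDv7 and blobObjectName are hypothetical names, and a service would more likely rely on a library such as github.com/google/uuid, which provides NewV7.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

// newUUIDv7 returns a UUIDv7 string: 48-bit big-endian Unix milliseconds,
// then random bits, with version 7 and the RFC variant set.
func newUUIDv7() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	ms := uint64(time.Now().UnixMilli())
	b[0] = byte(ms >> 40)
	b[1] = byte(ms >> 32)
	b[2] = byte(ms >> 24)
	b[3] = byte(ms >> 16)
	b[4] = byte(ms >> 8)
	b[5] = byte(ms)
	b[6] = 0x70 | (b[6] & 0x0f) // version 7 in the high nibble
	b[8] = 0x80 | (b[8] & 0x3f) // variant bits 10
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// blobObjectName maps a blob ID to its object name within the bucket,
// following the gs://{bucket}/blobs/{id} layout described above.
func blobObjectName(id string) string {
	return "blobs/" + id
}

func main() {
	id, err := newUUIDv7()
	if err != nil {
		panic(err)
	}
	fmt.Println(blobObjectName(id))
}
```

Because the timestamp occupies the most significant bits, identifiers generated in later milliseconds sort after earlier ones when compared as strings.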
Sequence Diagram
sequenceDiagram
participant Client as Admin Client
participant Boot as Boot Service
participant Storage as Cloud Storage
participant DB as Firestore
Client->>Boot: POST /api/v1/profiles (multipart/form-data)
Boot->>DB: Check if machine already has a boot profile
DB-->>Boot: No existing profile
Boot->>Boot: Generate UUIDv7 for profile
Boot->>Boot: Generate UUIDv7 for kernel blob
Boot->>Boot: Generate UUIDv7 for initrd blob
Boot->>Storage: PUT gs://bucket/blobs/{kernel_id}
Storage-->>Boot: Kernel stored
Boot->>Storage: PUT gs://bucket/blobs/{initrd_id}
Storage-->>Boot: Initrd stored
Boot->>DB: Store profile metadata (profile_id, kernel_id, initrd_id, machine_id)
DB-->>Boot: Profile created
Boot-->>Client: 201 Created (profile metadata with IDs)
Request
Request Body (multipart/form-data):
Form fields:
- machine_id (text): Machine identifier (UUIDv7)
- kernel (file): Kernel image file
- initrd (file): Initrd image file
- kernel_args (JSON array): Kernel command-line arguments
Example Request:
POST /api/v1/profiles HTTP/1.1
Host: boot.example.com
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="machine_id"
018c7dbd-c000-7000-8000-fedcba987654
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="kernel"; filename="vmlinuz"
Content-Type: application/octet-stream
<kernel binary data>
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="initrd"; filename="initrd.img"
Content-Type: application/octet-stream
<initrd binary data>
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="kernel_args"
Content-Type: application/json
["console=tty0", "console=ttyS0", "ip=dhcp"]
------WebKitFormBoundary7MA4YWxkTrZu0gW--
Request Headers:
Content-Type: multipart/form-data
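Because initrd images can be large, a handler would likely read the multipart body part by part rather than load it into memory. The sketch below shows one way to do that with the standard library; it is an assumption about the approach, not the service's implementation. profileForm, parseProfileForm, and buildProfileRequest are hypothetical names, and io.Discard stands in for the Cloud Storage upload.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
)

// profileForm summarizes the form fields listed above.
type profileForm struct {
	MachineID  string
	KernelArgs []string
	KernelSize int64 // bytes read from the "kernel" part
	InitrdSize int64 // bytes read from the "initrd" part
}

// parseProfileForm walks the multipart body one part at a time, so file
// contents are streamed instead of being held in memory.
func parseProfileForm(r *http.Request) (*profileForm, error) {
	mr, err := r.MultipartReader()
	if err != nil {
		return nil, err
	}
	form := &profileForm{}
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			return form, nil
		}
		if err != nil {
			return nil, err
		}
		switch part.FormName() {
		case "machine_id":
			var b []byte
			b, err = io.ReadAll(part)
			form.MachineID = string(b)
		case "kernel_args":
			err = json.NewDecoder(part).Decode(&form.KernelArgs)
		case "kernel":
			form.KernelSize, err = io.Copy(io.Discard, part)
		case "initrd":
			form.InitrdSize, err = io.Copy(io.Discard, part)
		}
		if err != nil {
			return nil, fmt.Errorf("read part %q: %w", part.FormName(), err)
		}
	}
}

// buildProfileRequest assembles a client-side equivalent of the example request.
func buildProfileRequest(machineID string, kernel, initrd []byte, args []string) (*http.Request, error) {
	argsJSON, err := json.Marshal(args)
	if err != nil {
		return nil, err
	}
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	if err := mw.WriteField("machine_id", machineID); err != nil {
		return nil, err
	}
	for _, f := range []struct {
		field, name string
		data        []byte
	}{{"kernel", "vmlinuz", kernel}, {"initrd", "initrd.img", initrd}} {
		w, err := mw.CreateFormFile(f.field, f.name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(f.data); err != nil {
			return nil, err
		}
	}
	if err := mw.WriteField("kernel_args", string(argsJSON)); err != nil {
		return nil, err
	}
	if err := mw.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, "http://boot.example.com/api/v1/profiles", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}

func main() {
	req, err := buildProfileRequest("018c7dbd-c000-7000-8000-fedcba987654",
		[]byte("kernel-bytes"), []byte("initrd"), []string{"console=tty0", "ip=dhcp"})
	if err != nil {
		panic(err)
	}
	form, err := parseProfileForm(req)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *form)
}
```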
Response
Response (201 Created):
{
"id": "018c7dbd-a000-7000-8000-abcdef123456",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654",
"kernel": {
"id": "018c7dbd-b100-7000-8000-123456789abc",
"args": ["console=tty0", "console=ttyS0", "ip=dhcp"]
},
"initrd": {
"id": "018c7dbd-b200-7000-8000-987654321fed"
}
}
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
400 Bad Request - Invalid request body or missing required fields:
{
"type": "https://api.example.com/errors/validation-error",
"title": "Validation Error",
"status": 400,
"detail": "The request body failed validation",
"instance": "/api/v1/profiles",
"invalid_fields": [
{
"field": "machine_id",
"reason": "required field is missing"
}
]
}
409 Conflict - Machine already has a boot profile:
{
"type": "https://api.example.com/errors/boot-profile-exists",
"title": "Boot Profile Already Exists",
"status": 409,
"detail": "Machine 018c7dbd-c000-7000-8000-fedcba987654 already has a boot profile",
"instance": "/api/v1/profiles",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654",
"existing_profile_id": "018c7dbd-a000-7000-8000-abcdef123456"
}
422 Unprocessable Entity - Validation error (file too large, invalid JSON, machine_id not found):
{
"type": "https://api.example.com/errors/file-too-large",
"title": "File Too Large",
"status": 422,
"detail": "Kernel file exceeds maximum allowed size of 100MB",
"instance": "/api/v1/profiles",
"field": "kernel",
"file_size": 125829120,
"max_size": 104857600
}
Data Models
All data models are defined as Protocol Buffer (protobuf) messages and stored in Firestore.
Boot Profile
syntax = "proto3";
message Kernel {
string id = 1; // UUIDv7 blob identifier
repeated string args = 2; // Kernel command-line arguments
}
message Initrd {
string id = 1; // UUIDv7 blob identifier
}
message BootProfile {
string id = 1; // UUIDv7 identifier
string machine_id = 2; // Reference to machine (UUIDv7) - unique constraint
Kernel kernel = 3; // Kernel configuration
Initrd initrd = 4; // Initrd configuration
}
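Note: The machine_id field is unique across boot profiles, ensuring each machine has at most one active boot profile. Firestore does not provide unique field constraints natively, so uniqueness has to be enforced by the service (for example, in a transaction or through the document ID).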
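For readers working on the Go side, the messages above map onto the JSON responses roughly as follows. These structs are hand-written for illustration; in practice the Go types would presumably be generated from the protobuf definitions, and encodeProfile is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Kernel mirrors the Kernel message.
type Kernel struct {
	ID   string   `json:"id"`   // UUIDv7 blob identifier
	Args []string `json:"args"` // kernel command-line arguments
}

// Initrd mirrors the Initrd message.
type Initrd struct {
	ID string `json:"id"` // UUIDv7 blob identifier
}

// BootProfile mirrors the BootProfile message.
type BootProfile struct {
	ID        string `json:"id"`
	MachineID string `json:"machine_id"`
	Kernel    Kernel `json:"kernel"`
	Initrd    Initrd `json:"initrd"`
}

// encodeProfile renders p as compact JSON.
func encodeProfile(p BootProfile) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

func main() {
	out, err := encodeProfile(BootProfile{
		ID:        "018c7dbd-a000-7000-8000-abcdef123456",
		MachineID: "018c7dbd-c000-7000-8000-fedcba987654",
		Kernel: Kernel{
			ID:   "018c7dbd-b100-7000-8000-123456789abc",
			Args: []string{"console=tty0", "console=ttyS0", "ip=dhcp"},
		},
		Initrd: Initrd{ID: "018c7dbd-b200-7000-8000-987654321fed"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```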
1.5 - GET /api/v1/boot/{machine_id}/profile
Retrieve the active boot profile for a specific machine.
Sequence Diagram
sequenceDiagram
participant Client as Admin Client
participant Boot as Boot Service
participant DB as Firestore
Client->>Boot: GET /api/v1/boot/{machine_id}/profile
Boot->>DB: Query active boot profile for machine
DB-->>Boot: Boot profile
Boot-->>Client: 200 OK (boot profile)
Request
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| machine_id | string | Yes | Machine identifier (UUIDv7 format) |
Example Request:
GET /api/v1/boot/018c7dbd-c000-7000-8000-fedcba987654/profile HTTP/1.1
Host: boot.example.com
Response
Response (200 OK):
{
"id": "018c7dbd-a000-7000-8000-abcdef123456",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654",
"kernel": {
"id": "018c7dbd-b100-7000-8000-123456789abc",
"args": ["console=tty0", "console=ttyS0", "ip=dhcp"]
},
"initrd": {
"id": "018c7dbd-b200-7000-8000-987654321fed"
}
}
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
404 Not Found - Machine not found or has no boot profile:
{
"type": "https://api.example.com/errors/boot-profile-not-found",
"title": "Boot Profile Not Found",
"status": 404,
"detail": "No boot profile found for machine 018c7dbd-c000-7000-8000-fedcba987654",
"instance": "/api/v1/boot/018c7dbd-c000-7000-8000-fedcba987654/profile",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654"
}
1.6 - PUT /api/v1/boot/{machine_id}/profile
Update the boot profile for a machine (replaces the existing profile).
Sequence Diagram
sequenceDiagram
participant Client as Admin Client
participant Boot as Boot Service
participant Storage as Cloud Storage
participant DB as Firestore
Client->>Boot: PUT /api/v1/boot/{machine_id}/profile
Boot->>DB: Get current active profile
DB-->>Boot: Current profile (old kernel_id, old initrd_id)
Boot->>Boot: Generate UUIDs for new kernel/initrd
Boot->>Storage: PUT new kernel/initrd blobs
Storage-->>Boot: Blobs stored
Boot->>DB: Update boot profile (replace kernel_id, initrd_id, args)
DB-->>Boot: Profile updated
Boot->>Storage: DELETE old kernel/initrd blobs
Boot-->>Client: 200 OK (updated profile)
Request
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| machine_id | string | Yes | Machine identifier (UUIDv7 format) |
Request Body (multipart/form-data):
Form fields:
- kernel (file): Kernel image file
- initrd (file): Initrd image file
- kernel_args (JSON array): Kernel command-line arguments
Example Request:
PUT /api/v1/boot/018c7dbd-c000-7000-8000-fedcba987654/profile HTTP/1.1
Host: boot.example.com
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="kernel"; filename="vmlinuz"
Content-Type: application/octet-stream
<kernel binary data>
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="initrd"; filename="initrd.img"
Content-Type: application/octet-stream
<initrd binary data>
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="kernel_args"
Content-Type: application/json
["console=tty0", "console=ttyS0", "ip=dhcp"]
------WebKitFormBoundary7MA4YWxkTrZu0gW--
Response
Response (200 OK):
{
"id": "018c7dbd-a000-7000-8000-abcdef123456",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654",
"kernel": {
"id": "018c7dbd-b100-7000-8000-123456789abc",
"args": ["console=tty0", "console=ttyS0", "ip=dhcp"]
},
"initrd": {
"id": "018c7dbd-b200-7000-8000-987654321fed"
}
}
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
404 Not Found - Machine not found or has no boot profile:
{
"type": "https://api.example.com/errors/boot-profile-not-found",
"title": "Boot Profile Not Found",
"status": 404,
"detail": "No boot profile found for machine 018c7dbd-c000-7000-8000-fedcba987654",
"instance": "/api/v1/boot/018c7dbd-c000-7000-8000-fedcba987654/profile",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654"
}
422 Unprocessable Entity - Validation error:
{
"type": "https://api.example.com/errors/file-too-large",
"title": "File Too Large",
"status": 422,
"detail": "Kernel file exceeds maximum allowed size of 100MB",
"instance": "/api/v1/boot/018c7dbd-c000-7000-8000-fedcba987654/profile",
"field": "kernel",
"file_size": 125829120,
"max_size": 104857600
}
1.7 - DELETE /api/v1/boot/{machine_id}/profile
Delete a machine’s boot profile and its associated blobs.
Sequence Diagram
sequenceDiagram
participant Client as Admin Client
participant Boot as Boot Service
participant Storage as Cloud Storage
participant DB as Firestore
Client->>Boot: DELETE /api/v1/boot/{machine_id}/profile
Boot->>DB: Get kernel_id and initrd_id
DB-->>Boot: Blob IDs
Boot->>Storage: DELETE gs://bucket/blobs/{kernel_id}
Boot->>Storage: DELETE gs://bucket/blobs/{initrd_id}
Boot->>DB: Delete boot profile
Boot-->>Client: 204 No Content
Request
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| machine_id | string | Yes | Machine identifier (UUIDv7 format) |
Example Request:
DELETE /api/v1/boot/018c7dbd-c000-7000-8000-fedcba987654/profile HTTP/1.1
Host: boot.example.com
Response
Response (204 No Content):
Empty response body.
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
404 Not Found - Machine not found or has no boot profile:
{
"type": "https://api.example.com/errors/boot-profile-not-found",
"title": "Boot Profile Not Found",
"status": 404,
"detail": "No boot profile found for machine 018c7dbd-c000-7000-8000-fedcba987654",
"instance": "/api/v1/boot/018c7dbd-c000-7000-8000-fedcba987654/profile",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654"
}
1.8 - GET /health/startup
Startup probe endpoint for Cloud Run
Indicates whether the application has completed initialization and is ready to receive traffic.
Request
Request Example:
GET /health/startup HTTP/1.1
Host: boot.example.com
Response
Response (200 OK):
Empty response body with HTTP 200 status code.
Response (503 Service Unavailable):
Empty response body with HTTP 503 status code.
Response Headers:
Cache-Control: no-cache, no-store, must-revalidate
Startup Check Components
- Firestore Connection - Verifies database connectivity
- Cloud Storage Access - Validates access to boot image buckets
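A startup handler that runs these component checks in order might look like the sketch below. The check type, startupHandler, and the probe helper are hypothetical names, and the check functions are stubs; real checks would call Firestore and Cloud Storage.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// errUnavailable is a stand-in error for a dependency that cannot be reached.
var errUnavailable = errors.New("dependency unavailable")

// check is one named startup dependency; run returns nil when it is usable.
type check struct {
	name string
	run  func() error
}

// startupHandler answers 200 only if every check passes, otherwise 503.
// The body is always empty; details go to the log.
func startupHandler(checks ...check) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Cache-Control", "no-cache, no-store, must-revalidate")
		for _, c := range checks {
			if err := c.run(); err != nil {
				log.Printf("startup check %q failed: %v", c.name, err)
				w.WriteHeader(http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	}
}

// probe issues one in-memory GET against h and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health/startup", nil))
	return rec.Code
}

func main() {
	firestore := check{name: "firestore", run: func() error { return nil }}
	storage := check{name: "cloud-storage", run: func() error { return errUnavailable }}

	fmt.Println(probe(startupHandler(firestore)))          // all checks pass
	fmt.Println(probe(startupHandler(firestore, storage))) // one check fails
}
```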
Cloud Run Configuration
startupProbe:
  httpGet:
    path: /health/startup
    port: 8080
  initialDelaySeconds: 0
  timeoutSeconds: 30
  periodSeconds: 10
  failureThreshold: 3
Behavior
- Success (200): Application is fully initialized and ready to serve requests
- Failure (503): Application is still starting up or encountered initialization errors
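- Timeout: A probe attempt that gets no response within 30 seconds (timeoutSeconds) counts as a failure; after 3 failed attempts (failureThreshold), Cloud Run considers startup failed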
Observability
Metrics:
- health_check_total{probe="startup",status="ok"} - Successful startup checks
- health_check_total{probe="startup",status="error"} - Failed startup checks
- health_check_duration_ms{probe="startup"} - Startup check duration
Structured Logs:
{
"severity": "INFO",
"timestamp": "2025-11-19T06:00:00Z",
"message": "Health check completed",
"probe": "startup",
"status": "ok",
"duration_ms": 15
}
Alerts:
- Startup Failure: Alert if startup check fails for > 1 minute
Testing
Manual Testing
curl -v http://localhost:8080/health/startup
Automated Testing
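import (
	"net/http"
	"testing"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)
func TestHealthStartup(t *testing.T) {
	resp, err := http.Get("http://localhost:8080/health/startup")
	require.NoError(t, err)
	defer resp.Body.Close()
	assert.Equal(t, http.StatusOK, resp.StatusCode)
}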
Troubleshooting
Startup Check Never Succeeds
Symptoms:
- Container restarts repeatedly
- Cloud Run shows “unhealthy” status
- Startup probe returns 503
Debugging:
# Check Cloud Run logs for startup errors
gcloud logging read "resource.type=cloud_run_revision AND labels.service_name=boot-server" \
--limit 50 --format json | jq '.[] | select(.jsonPayload.probe=="startup")'
# Test locally with debug logging
DEBUG=true go run main.go
Common Causes:
- Firestore credentials not configured
- Cloud Storage bucket permissions missing
- Network connectivity issues
- Timeout too short for slow dependencies
1.9 - GET /health/liveness
Liveness probe endpoint for Cloud Run
Indicates whether the application is alive and healthy. Used by Cloud Run to detect and restart unhealthy instances.
Request
Request Example:
GET /health/liveness HTTP/1.1
Host: boot.example.com
Response
Response (200 OK):
Empty response body with HTTP 200 status code.
Response (503 Service Unavailable):
Empty response body with HTTP 503 status code.
Response Headers:
Cache-Control: no-cache, no-store, must-revalidate
Liveness Check Components
- HTTP Server Health - Verifies the HTTP server is responsive
- Basic health validation - Ensures the application can handle requests
Cloud Run Configuration
livenessProbe:
  httpGet:
    path: /health/liveness
    port: 8080
  initialDelaySeconds: 0
  timeoutSeconds: 30
  periodSeconds: 10
  failureThreshold: 3
Behavior
- Success (200): Application is healthy and functioning normally
- Failure (503): Application is unhealthy and should be restarted
- Consecutive Failures: After 3 consecutive failures (30 seconds), Cloud Run restarts the instance
Graceful Degradation
The health check is designed with graceful degradation in mind:
- Critical Failures: Return 503 and trigger restart (e.g., database connection lost)
- Non-Critical Failures: Log warnings but return 200 (e.g., temporary Cloud Storage timeout)
- Transient Errors: Retry internally before reporting failure
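The three rules above can be sketched as a small evaluation function: each check is retried a few times, a critical check that keeps failing yields 503, and a non-critical one only logs a warning. The livenessCheck type, the evaluate function, and the retry count are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
)

// errDown is a stand-in error for a failing dependency.
var errDown = errors.New("dependency down")

// livenessCheck is one dependency probe; critical marks failures that
// should make the instance report itself unhealthy.
type livenessCheck struct {
	name     string
	critical bool
	run      func() error
}

// evaluate runs every check up to attempts times and returns the HTTP
// status the liveness endpoint should answer with.
func evaluate(checks []livenessCheck, attempts int) int {
	if attempts < 1 {
		attempts = 1
	}
	status := http.StatusOK
	for _, c := range checks {
		var err error
		for i := 0; i < attempts; i++ {
			if err = c.run(); err == nil {
				break // transient error recovered, or first try succeeded
			}
		}
		switch {
		case err == nil:
			// healthy
		case c.critical:
			log.Printf("ERROR liveness check %q failed: %v", c.name, err)
			status = http.StatusServiceUnavailable
		default:
			log.Printf("WARNING liveness check %q degraded: %v", c.name, err)
		}
	}
	return status
}

func main() {
	checks := []livenessCheck{
		{name: "firestore", critical: true, run: func() error { return nil }},
		{name: "cloud-storage", critical: false, run: func() error { return errDown }},
	}
	fmt.Println(evaluate(checks, 3))
}
```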
Observability
Metrics:
- health_check_total{probe="liveness",status="ok"} - Successful liveness checks
- health_check_total{probe="liveness",status="error"} - Failed liveness checks
- health_check_duration_ms{probe="liveness"} - Liveness check duration
Structured Logs:
{
"severity": "INFO",
"timestamp": "2025-11-19T06:00:00Z",
"message": "Health check completed",
"probe": "liveness",
"status": "ok",
"duration_ms": 15
}
Alerts:
- Liveness Failure: Alert if liveness check fails 3+ times consecutively
- High Restart Rate: Alert if container restarts > 3 times in 5 minutes
Testing
Manual Testing
curl -v http://localhost:8080/health/liveness
Load Testing
Health check endpoints should handle high request rates without degrading application performance:
- Target: 100 requests/second sustained
- Latency: < 10ms average response time
- Resource Impact: < 1% CPU, < 10MB memory overhead
Troubleshooting
Liveness Check Intermittent Failures
Symptoms:
- Occasional container restarts
- Liveness probe returns 503 sporadically
- High request latency
Debugging:
# Check error rate in last 5 minutes
gcloud monitoring time-series list \
--filter='metric.type="custom.googleapis.com/health_check_total" AND metric.labels.status="error"' \
--interval-start-time="5 minutes ago"
# Check for resource exhaustion (Cloud Run)
gcloud run services describe boot-server --region=<region> --format=json | jq '.status'
Common Causes:
- Database connection pool exhausted
- Memory pressure triggering GC pauses
- High request volume overwhelming server
- Dependency timeouts
Security Considerations
Unauthenticated Access
Health check endpoints are intentionally unauthenticated to allow Cloud Run infrastructure to probe without credentials. This is safe because:
- Endpoints return only HTTP status codes (no response body)
- No sensitive data is returned
- Rate limiting prevents abuse
- Endpoints are read-only
Health checks return only HTTP status codes with no response body, ensuring:
- No internal IP addresses disclosed
- No error messages or stack traces exposed
- No database connection strings revealed
- No API keys or secrets leaked
Detailed diagnostics are logged internally (not returned in response):
{
"severity": "ERROR",
"message": "Firestore connection failed",
"error": "rpc error: code = PermissionDenied desc = Missing or insufficient permissions"
}
2 - Machine Service
Service for managing machine hardware profiles
The Machine Service is a REST API that manages machine hardware profiles for the network boot infrastructure. It stores machine specifications (CPUs, memory, NICs, drives, accelerators) in Firestore and is queried by the Boot Service during boot operations and by administrators for configuration management.
Architecture
The service is responsible for:
- Machine Profile Management: Creating, listing, retrieving, updating, and deleting machine hardware profiles
- Hardware Specification Storage: Storing detailed hardware specifications in Firestore
- Machine Lookup: Providing machine profile queries by ID or NIC MAC address
Components
- Firestore: Stores machine hardware profiles
- REST API: HTTP endpoints for machine profile management
Clients
The service is consumed by:
- Boot Service: Queries machine profiles by MAC address during boot operations
- Admin Tools: CLI or web interfaces for managing machine inventory
- Monitoring Systems: Hardware inventory and asset management tools
Deployment
- Platform: GCP Cloud Run
- Scaling: Automatic scaling based on request load
- Availability: Min instances = 1 for low-latency responses
- Region: Same region as Boot Service for minimal latency
API Endpoints
Machine Management
Rate Limiting
Admin API endpoints are rate-limited to prevent abuse:
- Per User/Service Account: 100 requests/minute
- Per IP Address: 300 requests/minute
- Global: 1000 requests/minute
Rate limit headers are included in responses:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1700000000
When the rate limit is exceeded, the API returns 429 Too Many Requests using the RFC 7807 Problem Details format (see ADR-0007):
{
"type": "https://api.example.com/errors/rate-limit-exceeded",
"title": "Rate Limit Exceeded",
"status": 429,
"detail": "Rate limit exceeded. Try again in 30 seconds.",
"instance": "/api/v1/machines",
"retry_after": 30
}
All error responses use Content-Type: application/problem+json.
Versioning
The Admin API uses URL versioning (/api/v1/):
- Current Version: v1
- Deprecation Policy: Minimum 6 months notice before version deprecation
- Version Header: X-API-Version: v1 included in all responses
2.1 - POST /api/v1/machines
Register a new machine with hardware specifications
Sequence Diagram
sequenceDiagram
participant Client as Admin Client
participant API as Machine Service
participant DB as Firestore
Client->>API: POST /api/v1/machines
API->>API: Generate machine id (UUIDv7)
API->>API: Validate machine profile
API->>DB: Insert machine profile
DB-->>API: Machine created
API-->>Client: 201 Created (machine id)
Request
Request Body:
{
"cpus": [
{
"manufacturer": "Intel",
"clock_frequency": 2400000000,
"cores": 8
}
],
"memory_modules": [
{
"size": 17179869184
},
{
"size": 17179869184
}
],
"accelerators": [],
"nics": [
{
"mac": "52:54:00:12:34:56"
}
],
"drives": [
{
"capacity": 500107862016
}
]
}
Response
Response (201 Created):
{
"id": "018c7dbd-c000-7000-8000-fedcba987654"
}
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
400 Bad Request - Invalid request body or missing required fields:
{
"type": "https://api.example.com/errors/validation-error",
"title": "Validation Error",
"status": 400,
"detail": "The request body failed validation",
"instance": "/api/v1/machines",
"invalid_fields": [
{
"field": "nics",
"reason": "at least one NIC is required"
}
]
}
409 Conflict - Machine with the same NIC MAC address already exists:
{
"type": "https://api.example.com/errors/duplicate-mac-address",
"title": "Duplicate MAC Address",
"status": 409,
"detail": "A machine with MAC address 52:54:00:12:34:56 already exists",
"instance": "/api/v1/machines",
"mac_address": "52:54:00:12:34:56",
"existing_machine_id": "018c7dbd-a000-7000-8000-fedcba987650"
}
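The 400 response above reports problems per field. A minimal sketch of how the NIC rules could be checked is shown below; the `validateNICs` helper, the `nics[i].mac` field naming, and every reason string other than "at least one NIC is required" are assumptions. Uniqueness across all machines (the 409 case) needs a Firestore lookup and is not covered here.

```go
package main

import (
	"fmt"
	"net"
)

// NIC mirrors the "nics" entries of the request body.
type NIC struct {
	MAC string `json:"mac"`
}

// InvalidField mirrors one entry of "invalid_fields" in the 400 response.
type InvalidField struct {
	Field  string `json:"field"`
	Reason string `json:"reason"`
}

// validateNICs returns one InvalidField per problem found; an empty result
// means the NIC list passed these local checks.
func validateNICs(nics []NIC) []InvalidField {
	if len(nics) == 0 {
		return []InvalidField{{Field: "nics", Reason: "at least one NIC is required"}}
	}
	var invalid []InvalidField
	seen := make(map[string]bool)
	for i, nic := range nics {
		field := fmt.Sprintf("nics[%d].mac", i)
		hw, err := net.ParseMAC(nic.MAC)
		if err != nil || len(hw) != 6 {
			invalid = append(invalid, InvalidField{Field: field, Reason: "must be a 48-bit MAC address"})
			continue
		}
		key := hw.String() // normalized: lowercase, colon-separated
		if seen[key] {
			invalid = append(invalid, InvalidField{Field: field, Reason: "duplicate MAC address within the request"})
		}
		seen[key] = true
	}
	return invalid
}

func main() {
	fmt.Println(validateNICs(nil))
	fmt.Println(validateNICs([]NIC{{MAC: "52:54:00:12:34:56"}}))
}
```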
Notes
- The machine ID is generated server-side (UUIDv7)
- MAC addresses must be unique across all machines
- All size/capacity values are in bytes
- Clock frequency is in hertz
Data Models
All data models are defined as Protocol Buffer (protobuf) messages and stored in Firestore.
Machine
syntax = "proto3";
message CPU {
string manufacturer = 1;
int64 clock_frequency = 2; // measured in hertz
int64 cores = 3; // number of cores
}
message MemoryModule {
int64 size = 1; // measured in bytes
}
message Accelerator {
string manufacturer = 1;
}
message NIC {
string mac = 1; // mac address
}
message Drive {
int64 capacity = 1; // capacity in bytes
}
message Machine {
string id = 1; // UUIDv7 machine identifier
repeated CPU cpus = 2;
repeated MemoryModule memory_modules = 3;
repeated Accelerator accelerators = 4;
repeated NIC nics = 5;
repeated Drive drives = 6;
}
2.2 - GET /api/v1/machines
List all registered machines
List all registered machines with optional filtering by MAC address.
Sequence Diagram
sequenceDiagram
participant Client as Admin Client
participant API as Machine Service
participant DB as Firestore
Client->>API: GET /api/v1/machines?mac=...
API->>DB: Query machines with filters
DB-->>API: Machine list
API-->>Client: 200 OK (machines list)
Request
Query Parameters:
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| page | integer | No | Page number (1-indexed) | 1 |
| per_page | integer | No | Results per page (1-100) | 20 |
| mac | string | No | Filter by NIC MAC address | - |
Example Request:
GET /api/v1/machines?page=1&per_page=20 HTTP/1.1
Host: machine.example.com
Example Request with MAC filter:
GET /api/v1/machines?mac=52:54:00:12:34:56 HTTP/1.1
Host: machine.example.com
Response
Response (200 OK):
{
"machines": [
{
"id": "018c7dbd-c000-7000-8000-fedcba987654",
"cpus": [
{
"manufacturer": "Intel",
"clock_frequency": 2400000000,
"cores": 8
}
],
"memory_modules": [
{
"size": 17179869184
}
],
"accelerators": [],
"nics": [
{
"mac": "52:54:00:12:34:56"
}
],
"drives": [
{
"capacity": 500107862016
}
]
}
],
"pagination": {
"total": 1,
"page": 1,
"per_page": 20,
"total_pages": 1
}
}
2.3 - GET /api/v1/machines/{id}
Retrieve a specific machine by ID
Sequence Diagram
sequenceDiagram
participant Client as Admin Client
participant API as Machine Service
participant DB as Firestore
Client->>API: GET /api/v1/machines/{id}
API->>DB: Query machine by ID
DB-->>API: Machine profile
API-->>Client: 200 OK (machine profile)
Request
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Machine identifier (UUIDv7 format) |
Example Request:
GET /api/v1/machines/018c7dbd-c000-7000-8000-fedcba987654 HTTP/1.1
Host: machine.example.com
Response
Response (200 OK):
{
"id": "018c7dbd-c000-7000-8000-fedcba987654",
"cpus": [
{
"manufacturer": "Intel",
"clock_frequency": 2400000000,
"cores": 8
}
],
"memory_modules": [
{
"size": 17179869184
},
{
"size": 17179869184
}
],
"accelerators": [],
"nics": [
{
"mac": "52:54:00:12:34:56"
}
],
"drives": [
{
"capacity": 500107862016
}
]
}
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
404 Not Found - Machine with specified ID not found:
{
"type": "https://api.example.com/errors/machine-not-found",
"title": "Machine Not Found",
"status": 404,
"detail": "Machine with ID 018c7dbd-c000-7000-8000-fedcba987654 not found",
"instance": "/api/v1/machines/018c7dbd-c000-7000-8000-fedcba987654",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654"
}
2.4 - PUT /api/v1/machines/{id}
Update a machine’s hardware profile
Sequence Diagram
sequenceDiagram
participant Client as Admin Client
participant API as Machine Service
participant DB as Firestore
Client->>API: PUT /api/v1/machines/{id}
API->>DB: Update machine profile
DB-->>API: Machine updated
API-->>Client: 200 OK (updated profile)
Request
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Machine identifier (UUIDv7 format) |
Request Body:
Full machine profile (same structure as POST /api/v1/machines):
{
"cpus": [
{
"manufacturer": "Intel",
"clock_frequency": 2400000000,
"cores": 8
}
],
"memory_modules": [
{
"size": 17179869184
},
{
"size": 17179869184
}
],
"accelerators": [],
"nics": [
{
"mac": "52:54:00:12:34:56"
}
],
"drives": [
{
"capacity": 500107862016
}
]
}
Response
Response (200 OK):
Full machine profile with updated fields:
{
"id": "018c7dbd-c000-7000-8000-fedcba987654",
"cpus": [
{
"manufacturer": "Intel",
"clock_frequency": 2400000000,
"cores": 8
}
],
"memory_modules": [
{
"size": 17179869184
},
{
"size": 17179869184
}
],
"accelerators": [],
"nics": [
{
"mac": "52:54:00:12:34:56"
}
],
"drives": [
{
"capacity": 500107862016
}
]
}
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
400 Bad Request - Invalid request body:
{
"type": "https://api.example.com/errors/validation-error",
"title": "Validation Error",
"status": 400,
"detail": "The request body failed validation",
"instance": "/api/v1/machines/018c7dbd-c000-7000-8000-fedcba987654",
"invalid_fields": [
{
"field": "nics",
"reason": "at least one NIC is required"
}
]
}
404 Not Found - Machine with specified ID not found:
{
"type": "https://api.example.com/errors/machine-not-found",
"title": "Machine Not Found",
"status": 404,
"detail": "Machine with ID 018c7dbd-c000-7000-8000-fedcba987654 not found",
"instance": "/api/v1/machines/018c7dbd-c000-7000-8000-fedcba987654",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654"
}
2.5 - DELETE /api/v1/machines/{id}
Delete a machine registration
Sequence Diagram
sequenceDiagram
participant Client as Admin Client
participant API as Machine Service
participant DB as Firestore
Client->>API: DELETE /api/v1/machines/{id}
API->>DB: Delete machine by ID
DB-->>API: Machine deleted
API-->>Client: 204 No Content
Request
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Machine identifier (UUIDv7 format) |
Example Request:
DELETE /api/v1/machines/018c7dbd-c000-7000-8000-fedcba987654 HTTP/1.1
Host: machine.example.com
Response
Response (204 No Content):
Empty response body.
Error Responses:
All error responses follow RFC 7807 Problem Details format (see ADR-0007) with Content-Type: application/problem+json.
404 Not Found - Machine with specified ID not found:
{
"type": "https://api.example.com/errors/machine-not-found",
"title": "Machine Not Found",
"status": 404,
"detail": "Machine with ID 018c7dbd-c000-7000-8000-fedcba987654 not found",
"instance": "/api/v1/machines/018c7dbd-c000-7000-8000-fedcba987654",
"machine_id": "018c7dbd-c000-7000-8000-fedcba987654"
}
2.6 - GET /health/startup
Startup probe endpoint for Cloud Run
Indicates whether the application has completed initialization and is ready to receive traffic.
Request
Request Example:
GET /health/startup HTTP/1.1
Host: machine.example.com
Response
Response (200 OK):
Empty response body with HTTP 200 status code.
Response (503 Service Unavailable):
Empty response body with HTTP 503 status code.
Response Headers:
Cache-Control: no-cache, no-store, must-revalidate
Startup Check Components
- Firestore Connection - Verifies database connectivity
- Machine Management Service Readiness - Validates service initialization
Cloud Run Configuration
startupProbe:
  httpGet:
    path: /health/startup
    port: 8080
  initialDelaySeconds: 0
  timeoutSeconds: 30
  periodSeconds: 10
  failureThreshold: 3
Behavior
- Success (200): Application is fully initialized and ready to serve requests
- Failure (503): Application is still starting up or encountered initialization errors
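- Timeout: A probe attempt that receives no response within 30 seconds (timeoutSeconds) counts as a failure; after 3 consecutive failures (failureThreshold), Cloud Run considers startup failed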
Observability
Metrics:
- health_check_total{probe="startup",status="ok"} - Successful startup checks
- health_check_total{probe="startup",status="error"} - Failed startup checks
- health_check_duration_ms{probe="startup"} - Startup check duration
Structured Logs:
{
"severity": "INFO",
"timestamp": "2025-11-24T03:19:00Z",
"message": "Health check completed",
"probe": "startup",
"status": "ok",
"duration_ms": 15
}
Alerts:
- Startup Failure: Alert if startup check fails for > 1 minute
Testing
Manual Testing
curl -v http://localhost:8080/health/startup
Automated Testing
func TestHealthStartup(t *testing.T) {
	resp, err := http.Get("http://localhost:8080/health/startup")
	require.NoError(t, err)
	defer resp.Body.Close()
	assert.Equal(t, http.StatusOK, resp.StatusCode)
}
Troubleshooting
Startup Check Never Succeeds
Symptoms:
- Container restarts repeatedly
- Cloud Run shows “unhealthy” status
- Startup probe returns 503
Debugging:
# Check Cloud Run logs for startup errors
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=machine-service" \
--limit 50 --format json | jq '.[] | select(.jsonPayload.probe=="startup")'
# Test locally with debug logging
DEBUG=true go run main.go
Common Causes:
- Firestore credentials not configured
- Network connectivity issues
- Timeout too short for slow dependencies
2.7 - GET /health/liveness
Liveness probe endpoint for Cloud Run
Indicates whether the application is alive and healthy. Used by Cloud Run to detect and restart unhealthy instances.
Request
Request Example:
GET /health/liveness HTTP/1.1
Host: machine.example.com
Response
Response (200 OK):
Empty response body with HTTP 200 status code.
Response (503 Service Unavailable):
Empty response body with HTTP 503 status code.
Response Headers:
Cache-Control: no-cache, no-store, must-revalidate
Liveness Check Components
- HTTP Server Health - Verifies the HTTP server is responsive
- Basic health validation - Ensures the application can handle requests
Cloud Run Configuration
livenessProbe:
  httpGet:
    path: /health/liveness
    port: 8080
  initialDelaySeconds: 0
  timeoutSeconds: 30
  periodSeconds: 10
  failureThreshold: 3
Behavior
- Success (200): Application is healthy and functioning normally
- Failure (503): Application is unhealthy and should be restarted
- Consecutive Failures: After 3 consecutive failures (30 seconds), Cloud Run restarts the instance
Graceful Degradation
The health check is designed with graceful degradation in mind:
- Critical Failures: Return 503 and trigger restart (e.g., database connection lost)
- Non-Critical Failures: Log warnings but return 200 (e.g., temporary Firestore timeout)
- Transient Errors: Retry internally before reporting failure
Observability
Metrics:
- health_check_total{probe="liveness",status="ok"} - Successful liveness checks
- health_check_total{probe="liveness",status="error"} - Failed liveness checks
- health_check_duration_ms{probe="liveness"} - Liveness check duration
Structured Logs:
{
"severity": "INFO",
"timestamp": "2025-11-24T03:19:00Z",
"message": "Health check completed",
"probe": "liveness",
"status": "ok",
"duration_ms": 15
}
Alerts:
- Liveness Failure: Alert if liveness check fails 3+ times consecutively
- High Restart Rate: Alert if container restarts > 3 times in 5 minutes
Testing
Manual Testing
curl -v http://localhost:8080/health/liveness
Load Testing
Health check endpoints should handle high request rates without degrading application performance:
- Target: 100 requests/second sustained
- Latency: < 10ms average response time
- Resource Impact: < 1% CPU, < 10MB memory overhead
Troubleshooting
Liveness Check Intermittent Failures
Symptoms:
- Occasional container restarts
- Liveness probe returns 503 sporadically
- High request latency
Debugging:
# Check error rate in last 5 minutes
gcloud monitoring time-series list \
--filter='metric.type="custom.googleapis.com/health_check_total" AND metric.labels.status="error"' \
--interval-start-time="5 minutes ago"
# Check for resource exhaustion (Cloud Run)
gcloud run services describe machine-service --region=<region> --format=json | jq '.status'
Common Causes:
- Database connection pool exhausted
- Memory pressure triggering GC pauses
- High request volume overwhelming server
- Dependency timeouts
Security Considerations
Unauthenticated Access
Health check endpoints are intentionally unauthenticated to allow Cloud Run infrastructure to probe without credentials. This is safe because:
- Endpoints return only HTTP status codes (no response body)
- No sensitive data is returned
- Rate limiting prevents abuse
- Endpoints are read-only
Health checks return only HTTP status codes with no response body, ensuring:
- No internal IP addresses disclosed
- No error messages or stack traces exposed
- No database connection strings revealed
- No API keys or secrets leaked
Detailed diagnostics are logged internally (not returned in response):
{
"severity": "ERROR",
"message": "Firestore connection failed",
"error": "rpc error: code = PermissionDenied desc = Missing or insufficient permissions"
}