Resource Identifiers Analysis

This analysis compares three popular distributed identifier strategies for use in modern systems: UUID (particularly v4 and v7), ULID, and Snowflake ID. The comparison focuses on three critical aspects:

  1. URI Safety: Can they be used directly in URLs without encoding?
  2. Database Performance: Storage size and index performance implications
  3. Generation Model: Centralized vs decentralized generation

Quick Comparison Table

| Aspect | UUID v4 | UUID v7 | ULID | Snowflake ID |
| --- | --- | --- | --- | --- |
| Size (binary) | 16 bytes | 16 bytes | 16 bytes | 8 bytes |
| Size (string) | 36 chars | 36 chars | 26 chars | 18-19 chars |
| URI Safe | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Time-Ordered | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Decentralized | ✅ Yes | ✅ Yes | ✅ Yes | ⚠️ Mostly |
| Index Performance | ⚠️ Poor | ✅ Good | ✅ Good | ✅ Excellent |
| Standardized | ✅ RFC 9562 | ✅ RFC 9562 | ❌ Spec only | ❌ Pattern |
| Database Support | ✅ Native | 🆕 Limited | ❌ Custom | ❌ Custom |

Detailed Analyses

Decision Guide

Use UUID v7 when:

  • ✅ RFC standardization is important
  • ✅ Native database support is desired (PostgreSQL 18+)
  • ✅ You need URN compatibility (urn:uuid:...)
  • ✅ You want official vendor support and tooling

Use ULID when:

  • ✅ Human readability is valued (Crockford Base32)
  • ✅ You prefer compact string representation (26 vs 36 chars)
  • ✅ Lexicographic sorting is important
  • ✅ You want case-insensitive identifiers

Use Snowflake ID when:

  • ✅ Storage efficiency is critical (8 vs 16 bytes)
  • ✅ Numeric IDs are required
  • ✅ You can manage worker ID allocation
  • ✅ Maximum database performance is needed
  • ✅ You have a bounded number of generator nodes (at most 1,024)

Use UUID v4 when:

  • ✅ Maximum randomness is required
  • ✅ Session tokens or one-time IDs
  • ✅ Time ordering is unimportant
  • ❌ Avoid for database primary keys

Modern Recommendations (2024-2025)

For new projects with database primary keys:

  1. First choice: UUID v7 or ULID

    • Both offer excellent performance with time-ordering
    • UUID v7: Better standardization and tooling
    • ULID: Better readability and compact format
  2. Storage-constrained systems: Snowflake ID

    • 50% smaller than UUID/ULID
    • Best database performance
    • Requires worker ID coordination
  3. Legacy compatibility: UUID v4

    • Only if required by existing systems
    • Significant performance penalty for databases

Avoid entirely:

  • UUID v1: Privacy concerns (leaks MAC address)
  • UUID v6: Superseded by v7
  • Auto-increment integers: Not distributed-system safe

Key Insights

URI Safety

All three identifier types are completely safe for direct use in URIs without percent-encoding:

  • UUID: Hexadecimal + hyphens (RFC 3986 unreserved characters)
  • ULID: Crockford Base32 alphabet (no confusing characters)
  • Snowflake: Decimal integers (0-9 only)
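As a quick check of the claim above, the following Go sketch (standard library only) tests whether sample identifiers pass through URL escaping unchanged. The helper name isURISafe is hypothetical, and the sample values are the ones used elsewhere in this analysis.

```go
package main

import (
	"fmt"
	"net/url"
)

// isURISafe reports whether s survives both path-segment and query-value
// escaping unchanged, i.e. it needs no percent-encoding in either position.
func isURISafe(s string) bool {
	return url.PathEscape(s) == s && url.QueryEscape(s) == s
}

func main() {
	samples := []struct{ kind, id string }{
		{"UUID", "550e8400-e29b-41d4-a716-446655440000"},
		{"ULID", "01ARZ3NDEKTSV4RRFFQ69G5FAV"},
		{"Snowflake", "175928847299117063"},
	}
	for _, s := range samples {
		fmt.Printf("%-9s %-36s safe=%v\n", s.kind, s.id, isURISafe(s.id))
	}
}
```

A value containing a space or a slash would fail the same check, which is the situation these three formats avoid.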

Database Performance

The critical factor is sequential vs random insertion:

Random insertion (UUID v4):

  • Causes B-tree page splits throughout the index
  • Results in fragmentation and bloat
  • Poor cache utilization
  • 2-5× slower than sequential

Sequential insertion (UUID v7, ULID, Snowflake):

  • Appends to end of B-tree
  • Minimal page splits
  • Better cache locality
  • Comparable to auto-increment integers

Storage comparison:

Snowflake ID:  8 bytes  (baseline)
UUID/ULID:    16 bytes  (2× larger)
UUID string:  36 bytes  (4.5× larger)

Generation Models

Fully decentralized (no coordination):

  • UUID v4: Pure randomness
  • UUID v7: Timestamp + randomness
  • ULID: Timestamp + randomness

Minimal coordination (worker ID only):

  • Snowflake ID: Requires unique worker ID per generator
    • One-time configuration
    • Supports 1,024 workers (10 bits)
    • Challenge: Auto-scaling environments
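To make the coordination requirement concrete, here is a minimal Go sketch of a Snowflake-style generator. It assumes the commonly described layout (41-bit timestamp, 10-bit worker ID, 12-bit sequence) and uses Twitter's original epoch as an assumed default; the type and function names are hypothetical, and real deployments often pick their own epoch and bit split.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

const (
	workerBits   = 10
	sequenceBits = 12
	maxWorkerID  = 1<<workerBits - 1   // 1023
	maxSequence  = 1<<sequenceBits - 1 // 4095
	epochMillis  = 1288834974657       // assumed custom epoch (Twitter's original value)
)

// Generator produces IDs for one worker. The worker ID is the only piece of
// state that must be unique across the fleet.
type Generator struct {
	mu       sync.Mutex
	workerID int64
	lastMs   int64
	sequence int64
}

func NewGenerator(workerID int64) (*Generator, error) {
	if workerID < 0 || workerID > maxWorkerID {
		return nil, errors.New("worker ID out of range")
	}
	return &Generator{workerID: workerID}, nil
}

// Next returns the next ID, or an error if the clock moved backwards.
func (g *Generator) Next() (int64, error) {
	g.mu.Lock()
	defer g.mu.Unlock()

	now := time.Now().UnixMilli()
	if now < g.lastMs {
		return 0, errors.New("clock moved backwards")
	}
	if now == g.lastMs {
		g.sequence = (g.sequence + 1) & maxSequence
		if g.sequence == 0 {
			// 4,096 IDs already issued this millisecond: wait for the next one.
			for now <= g.lastMs {
				now = time.Now().UnixMilli()
			}
		}
	} else {
		g.sequence = 0
	}
	g.lastMs = now
	return (now-epochMillis)<<(workerBits+sequenceBits) | g.workerID<<sequenceBits | g.sequence, nil
}

func main() {
	gen, err := NewGenerator(7)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 3; i++ {
		id, err := gen.Next()
		if err != nil {
			panic(err)
		}
		fmt.Println(id)
	}
}
```

The auto-scaling challenge shows up in NewGenerator: something outside this code has to hand each instance a worker ID that no other live instance holds.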

Performance Benchmarks

From recent studies (2024-2025):

PostgreSQL INSERT operations:

  • Snowflake ID: ~34,000 ops/sec
  • UUID v7: ~34,000 ops/sec (33% faster than v4)
  • ULID: ~34,000 ops/sec (comparable to v7)
  • UUID v4: ~25,000 ops/sec

Index fragmentation (PostgreSQL):

  • UUID v4: 85% larger indexes, 54% larger tables
  • UUID v7/ULID: Minimal fragmentation
  • Snowflake: Minimal fragmentation, 50% smaller indexes

Write-Ahead Log (WAL) generation:

  • UUID v7: 50% reduction vs UUID v4
  • Sequential IDs reduce database write amplification

Collision Resistance

All three approaches provide exceptional collision resistance:

UUID v4:

  • 122 bits of randomness
  • Need ~2.7 × 10¹⁸ IDs for 50% collision probability

UUID v7:

  • 48-bit timestamp + 74-bit random
  • Negligible collision risk even at millions per millisecond

ULID:

  • 48-bit timestamp + 80-bit random
  • 1.21 × 10²⁴ unique IDs per millisecond possible

Snowflake ID:

  • Mathematical uniqueness guarantee
  • No collisions possible if worker IDs are unique and each worker’s clock never moves backwards
  • 4,096 IDs per millisecond per worker

Implementation Considerations

PostgreSQL

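-- UUID v7 (PostgreSQL 18+): built-in uuidv7()
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT uuidv7()
);

-- UUID v7 (PostgreSQL <18): no built-in generator.
-- Generate in the application, or use an extension or custom function;
-- pgcrypto's gen_random_bytes() can supply the random bits.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- ULID (custom type or text/bytea); ulid_generate() is a user-defined function
CREATE TABLE users (
    id BYTEA PRIMARY KEY DEFAULT ulid_generate()
);

-- Snowflake (bigint); snowflake_generate() is a user-defined function
CREATE TABLE users (
    id BIGINT PRIMARY KEY DEFAULT snowflake_generate()
);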

MySQL

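-- UUID (binary storage recommended)
-- Note: MySQL's UUID() returns a version 1 UUID; generate v4/v7 values in the application if needed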
CREATE TABLE users (
    id BINARY(16) PRIMARY KEY DEFAULT (UUID_TO_BIN(UUID()))
);

-- Snowflake
CREATE TABLE users (
    id BIGINT PRIMARY KEY
);

Application-Level Generation

Advantages:

  • No database dependency
  • Works with any database
  • Consistent across different storage systems
  • Better control over implementation

Disadvantages:

  • Requires library/code maintenance
  • Clock synchronization considerations
  • Worker ID management (Snowflake only)

Migration Strategies

Moving from Auto-Increment

Considerations:

  • Foreign key updates required
  • Index rebuilds may be needed
  • Application code changes
  • Dual-write period during migration

Recommended approach:

  1. Add new ID column alongside existing
  2. Generate IDs for existing rows
  3. Update foreign keys progressively
  4. Migrate application code
  5. Remove old ID column

Moving from UUID v4 to v7/ULID

Benefits:

  • Same storage size (16 bytes)
  • Can keep existing IDs
  • Only new records use v7/ULID
  • Gradual performance improvement

Security Considerations

Information Leakage

UUID v4:

  • ✅ Reveals nothing (pure randomness)

UUID v7 / ULID:

  • ⚠️ Reveals creation timestamp (usually acceptable)
  • ⚠️ May reveal approximate volume (via sequence patterns)

Snowflake ID:

  • ⚠️ Reveals exact creation time (41-bit timestamp)
  • ⚠️ Reveals which worker generated it
  • ⚠️ Reveals sequence count within millisecond
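The leakage listed above is easy to demonstrate. The Go sketch below decodes a Snowflake-style ID into its parts, assuming the Twitter layout (41/10/12 bits) and Twitter's epoch; both are assumptions, since each system chooses its own. The sample ID is the one used in this analysis, so the decoded wall-clock time is only meaningful if that ID was actually minted with this epoch.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed layout and epoch (Twitter's original values); other systems differ.
const (
	epochMillis  = 1288834974657
	workerBits   = 10
	sequenceBits = 12
)

// Parts holds everything an outside observer can read from an ID.
type Parts struct {
	CreatedAt time.Time
	WorkerID  int64
	Sequence  int64
}

// Decode splits an ID into timestamp, worker ID, and per-millisecond sequence.
func Decode(id int64) Parts {
	return Parts{
		CreatedAt: time.UnixMilli(id>>(workerBits+sequenceBits) + epochMillis).UTC(),
		WorkerID:  (id >> sequenceBits) & (1<<workerBits - 1),
		Sequence:  id & (1<<sequenceBits - 1),
	}
}

func main() {
	p := Decode(175928847299117063)
	fmt.Println("created:", p.CreatedAt)
	fmt.Println("worker: ", p.WorkerID)
	fmt.Println("seq:    ", p.Sequence)
}
```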

Enumeration Attacks

Random IDs (UUID v4):

  • ✅ Resistant to enumeration
  • Guessing next ID is infeasible

Sequential IDs (v7, ULID, Snowflake):

  • ⚠️ Predictable patterns
  • Can estimate next ID value
  • Mitigation: Use authentication/authorization, don’t rely on ID secrecy

Recommendation

Never rely on ID unpredictability as a security mechanism. Always use proper authentication and authorization regardless of ID type.

Conclusion

The landscape of distributed identifiers has evolved significantly:

2010-2020: UUID v4 was the default distributed identifier despite performance issues

2020-2024: Community alternatives (ULID, Snowflake) gained popularity for performance

2024+: UUID v7 (RFC 9562) provides standardized time-ordered IDs with vendor support

For most modern applications, UUID v7 or ULID represent the optimal balance of performance, standardization, and operational simplicity. Snowflake IDs remain compelling for storage-constrained systems where the 8-byte size and numeric format provide tangible benefits.

The days of suffering UUID v4’s random insertion penalty for database primary keys are over—time-ordered identifiers are now the recommended default.

1 - Universally Unique Identifier (UUID) Analysis

Overview

UUIDs are 128-bit identifiers standardized in RFC 9562 (May 2024), which obsoletes the previous RFC 4122. The latest specification introduces three new versions (v6, v7, v8) while maintaining backward compatibility with existing versions.

URI Safety

✅ Fully URI-Safe

UUIDs are inherently safe for use in URIs without any encoding required.

Standard format:

550e8400-e29b-41d4-a716-446655440000

Characteristics:

  • 36 characters: 32 hexadecimal digits + 4 hyphens
  • Character set: a-f, 0-9, -
  • All characters are in RFC 3986 §2.3 unreserved set
  • Case-insensitive (lowercase recommended per RFC 9562)

Usage in URIs:

/api/users/550e8400-e29b-41d4-a716-446655440000
?id=550e8400-e29b-41d4-a716-446655440000
urn:uuid:550e8400-e29b-41d4-a716-446655440000

Alternative encodings:

  • Base64 URL-safe: 22 characters (optimization, not required)
  • Base62: 22 characters, alphanumeric only (no - or _)
  • These are for compactness, not safety
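For illustration, the 22-character Base64 URL-safe form can be produced with the Go standard library alone. This is a sketch; the function name compactUUID is hypothetical.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// compactUUID converts the canonical 36-character form into unpadded
// Base64 URL-safe text (22 characters for 16 bytes).
func compactUUID(s string) (string, error) {
	raw, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil {
		return "", err
	}
	if len(raw) != 16 {
		return "", errors.New("expected 16 bytes")
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

func main() {
	short, err := compactUUID("550e8400-e29b-41d4-a716-446655440000")
	if err != nil {
		panic(err)
	}
	fmt.Println(short, len(short)) // VQ6EAOKbQdSnFkRmVUQAAA 22
}
```

The compact form saves 14 characters per ID but loses the familiar hyphenated shape, so it tends to fit public URLs better than logs or database tooling.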

Database Storage and Performance

Storage Size

Binary format:

  • 16 bytes (128 bits) - canonical storage format
  • Defined in RFC 9562

String format:

  • 36 characters (CHAR(36))
  • Actual storage: 36-40 bytes depending on database encoding

Storage comparison:

| Format | Size | Overhead |
| --- | --- | --- |
| Binary (BINARY(16)) | 16 bytes | baseline |
| String (CHAR(36)) | 36 bytes | 2.25× |
| String (VARCHAR(36)) | 38-40 bytes | ~2.5× |

Database-Specific Implementations

PostgreSQL:

-- Use native UUID type (16 bytes internally)
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid()
);

-- PostgreSQL 18+ supports UUIDv7 via the built-in uuidv7() function
CREATE TABLE posts (
    id UUID PRIMARY KEY DEFAULT uuidv7()
);

Performance impact:

  • Native UUID type: 16 bytes
  • Text storage: Tables 54% larger, indexes 85% larger

MySQL:

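-- Use BINARY(16) with conversion functions
-- Note: MySQL's UUID() returns a version 1 UUID; generate v4/v7 values in the application if needed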
CREATE TABLE users (
    id BINARY(16) PRIMARY KEY DEFAULT (UUID_TO_BIN(UUID()))
);

-- Retrieve with conversion
SELECT BIN_TO_UUID(id) as id FROM users;

SQL Server:

CREATE TABLE users (
    id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWSEQUENTIALID()
);
  • Note: NEWSEQUENTIALID() generates sequential GUIDs; NEWID() generates random ones

Index Performance

The UUID v4 Problem

Random insertion issues:

  1. Page splits: New UUIDs insert at arbitrary positions in B-tree
  2. Fragmentation: Index becomes scattered across non-contiguous pages
  3. Wasted space: Page splits leave gaps throughout index
  4. Cache inefficiency: Poor locality leads to more cache misses
  5. Write amplification: More disk I/O per insert

Measured impact:

  • Constant page splits during INSERT operations
  • Index bloat (more pages for same data)
  • 2-5× slower than sequential IDs
  • Degraded SELECT performance

The UUID v7 Solution

Sequential insertion benefits:

  1. Append-only writes: New entries go to end of index
  2. Minimal page splits: Only last page splits when full
  3. Low fragmentation: Index remains mostly contiguous
  4. Better caching: Sequential access patterns
  5. Reduced I/O: Fewer disk operations

Measured improvements:

  • 2-5× faster insert performance vs v4
  • 50% reduction in Write-Ahead Log (WAL) rate
  • Page splits reduced to a level comparable to auto-increment
  • Better storage efficiency

Binary vs String Storage

Index size comparison (PostgreSQL):

| Storage Type | Table Size | Index Size |
| --- | --- | --- |
| Binary (UUID) | 100% (baseline) | 100% (baseline) |
| String (TEXT) | 154% | 185% |

Why binary is faster:

  • Smaller indexes (fewer pages)
  • Better cache utilization
  • Faster CPU comparisons (128-bit integers)
  • Reduced I/O (less data transfer)

Generation Approach

✅ Fully Decentralized

One of UUID’s core design goals is decentralized generation without coordination. Multiple systems can generate UUIDs independently with negligible collision risk.

UUID Version Comparison

UUID v1 - Time-based + MAC Address

Structure:

Timestamp (60 bits) + Clock Sequence (14 bits) + MAC Address (48 bits)

Generation:

  • Timestamp: 100-nanosecond intervals since Oct 15, 1582
  • Node ID: System’s MAC address
  • Clock sequence: Random value to prevent duplicates

Pros:

  • Embeds a timestamp (creation time can be recovered)
  • Very low collision risk
  • Decentralized

Cons:

  • Privacy concern: Leaks MAC address (identifies the generating machine)
  • ❌ Timestamp not in sortable byte order
  • ❌ Modern systems avoid for security reasons

Use case: Legacy systems only (prefer v7)

UUID v4 - Random

Structure:

122 random bits + 6 version/variant bits

Generation:

  • Entirely random (cryptographically secure RNG recommended)
  • No coordination needed
  • No sequential ordering

Pros:

  • ✅ Maximum privacy (no identifying information)
  • ✅ Simplest to generate
  • ✅ Works offline
  • ✅ Truly decentralized

Cons:

  • Poor database performance: Random insertion causes fragmentation
  • ❌ No time information
  • ❌ Higher collision probability (still astronomically low)

Collision probability:

  • 122 bits of entropy
  • Need ~2.7 × 10¹⁸ UUIDs for 50% collision chance
  • In practice: negligible
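The 2.7 × 10¹⁸ figure follows from the standard birthday approximation, n ≈ √(2 · N · ln(1/(1−p))) with N = 2¹²² and p = 0.5. A small Go sketch of that arithmetic (the function name is hypothetical):

```go
package main

import (
	"fmt"
	"math"
)

// idsForCollision returns the approximate number of random IDs needed before
// the chance of at least one collision reaches p, given randomBits of entropy.
// Birthday approximation: n ≈ sqrt(2 * 2^bits * ln(1/(1-p))).
func idsForCollision(randomBits int, p float64) float64 {
	space := math.Exp2(float64(randomBits))
	return math.Sqrt(2 * space * math.Log(1/(1-p)))
}

func main() {
	fmt.Printf("UUID v4 (122 bits): %.2e IDs for a 50%% chance\n", idsForCollision(122, 0.5))
	fmt.Printf("ULID (80 bits, per millisecond): %.2e\n", idsForCollision(80, 0.5))
	fmt.Printf("UUID v7 (74 bits, per millisecond): %.2e\n", idsForCollision(74, 0.5))
}
```

For the time-ordered formats the bound applies per millisecond, because only IDs sharing a timestamp can collide.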

Use cases:

  • Session IDs
  • One-time tokens
  • Non-database identifiers
  • When pure randomness is desired

UUID v6 - Reordered Time-based

Structure:

Timestamp (60 bits, big-endian) + Clock Sequence + Node ID

Generation:

  • Like v1 but timestamp bytes reordered for sorting
  • May retain the v1 MAC-address node field (privacy concern); RFC 9562 suggests random node bits instead

Pros:

  • Sortable (better than v1)
  • Sequential insertion performance

Cons:

  • ❌ Can still leak MAC address if the v1-style node field is kept
  • Superseded by v7: RFC 9562 recommends v7

Use case: None - v7 is better

UUID v7 - Unix Timestamp + Random

Structure:

Unix Timestamp (48 bits, millisecond) + Random (74 bits)

Generation:

  • Top 48 bits: Unix epoch milliseconds
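  • Remaining 74 bits (excluding the 4 version and 2 variant bits): Random data
  • No MAC address
  • Time-ordered across milliseconds; ordering within the same millisecond depends on the implementation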
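The layout described above can be sketched with the Go standard library. This is an illustration of the bit layout only, not a replacement for a maintained library; the names newV7 and formatUUID are hypothetical, and the sketch fills all 74 non-fixed bits with random data (the RFC also permits counters or sub-millisecond precision there).

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

// newV7 builds a version 7 UUID from a Unix millisecond timestamp.
func newV7(ms uint64) ([16]byte, error) {
	var u [16]byte
	// Fill bytes 6..15 with random data; version and variant bits are set below.
	if _, err := rand.Read(u[6:]); err != nil {
		return u, err
	}
	// Bytes 0..5: 48-bit big-endian millisecond timestamp.
	for i := 0; i < 6; i++ {
		u[i] = byte(ms >> (8 * (5 - i)))
	}
	u[6] = u[6]&0x0F | 0x70 // version 7 in the high nibble
	u[8] = u[8]&0x3F | 0x80 // RFC variant (binary 10) in the top two bits
	return u, nil
}

// formatUUID renders the canonical 8-4-4-4-12 lowercase hex form.
func formatUUID(u [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	u, err := newV7(uint64(time.Now().UnixMilli()))
	if err != nil {
		panic(err)
	}
	fmt.Println(formatUUID(u))
}
```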

Pros:

  • Excellent database performance: Sequential inserts
  • Privacy-preserving: No MAC address
  • Sortable: Natural time ordering
  • Decentralized: No coordination needed
  • Random component: Prevents collisions from multiple nodes

Performance measured:

  • 2-5× faster inserts than v4
  • 50% reduction in WAL rate
  • Minimal page splits
  • Better cache locality

Cons:

  • ⚠️ Exposes creation timestamp (usually acceptable)
  • Slightly more complex than v4

Use cases:

  • Database primary keys (optimal choice)
  • Distributed systems
  • Event IDs with time ordering
  • Modern applications (default recommendation)

Decentralization Requirements

No central service required for any version:

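// Example: independent generation with github.com/google/uuid.
// NewV7 returns (uuid.UUID, error); uuid.Must panics on error.

// Node A
uuid1 := uuid.Must(uuid.NewV7()) // e.g., 0191e1a6-8b2c-7890-abcd-123456789abc

// Node B (same millisecond): same timestamp prefix, different random bits
uuid2 := uuid.Must(uuid.NewV7()) // e.g., 0191e1a6-8b2c-7c41-9f3e-987654321def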

How v7 avoids collisions:

  1. Time component: Millisecond precision provides separation
  2. Random component: 74 random bits make same-millisecond collisions extremely unlikely
  3. No coordination: Each node generates independently

Collision risk (UUID v7):

  • Within same millisecond: 2⁷⁴ (≈ 1.9 × 10²²) unique values possible
  • At 1 million IDs per millisecond, the chance of any collision within that millisecond is about 2.6 × 10⁻¹¹ (birthday bound n²/2N)

Version Selection Guide

┌─────────────────────────────────────────────────────┐
│ Which UUID Version?                                  │
├─────────────────────────────────────────────────────┤
│                                                      │
│  Database Primary Key? ──YES──> UUID v7            │
│         │                                            │
│         NO                                           │
│         │                                            │
│  Need time ordering? ──YES──> UUID v7              │
│         │                                            │
│         NO                                           │
│         │                                            │
│  Need pure randomness? ──YES──> UUID v4            │
│                                                      │
│  ❌ Avoid: v1 (privacy), v6 (superseded)           │
└─────────────────────────────────────────────────────┘

Go Library Support

✅ Official Google UUID Library

The most widely-used Go library for UUIDs is github.com/google/uuid, which provides full support for UUID versions 1, 3, 4, 5, 6, and 7.

Installation:

go get github.com/google/uuid

Usage examples:

import (
    "fmt"
    "log"

    "github.com/google/uuid"
)

// Generate UUID v4 (random)
v4 := uuid.New()
fmt.Println(v4.String()) // e.g., 550e8400-e29b-41d4-a716-446655440000

// Generate UUID v7 (time-ordered, recommended for databases)
// NewV7 returns (UUID, error); Must panics if generation fails.
v7 := uuid.Must(uuid.NewV7())
fmt.Println(v7.String()) // e.g., 0191e1a6-8b2c-7890-abcd-123456789abc

// Parse existing UUID
parsed, err := uuid.Parse("550e8400-e29b-41d4-a716-446655440000")
if err != nil {
    log.Fatal(err)
}
fmt.Println(parsed.Version()) // prints the version of the parsed UUID

Modern Recommendations (2024-2025)

For new projects:

  1. Default choice: UUID v7

    • Best performance
    • Decentralized generation
    • No privacy concerns
    • Sortable
  2. Special cases: UUID v4

    • Explicit randomness needed
    • Non-database contexts
    • Legacy compatibility
  3. Avoid: v1, v6

    • v1: Privacy issues (MAC address)
    • v6: v7 is better in every way

Recent Developments

RFC 9562 (May 2024)

  • Obsoletes RFC 4122
  • Introduces v6, v7, v8
  • Recommends v7 for database keys

PostgreSQL 18 (2025)

  • Native uuidv7() function
  • Mitigates B-tree fragmentation caused by random UUIDs
  • Built-in time-ordered UUID generation

Industry Adoption

  • Buildkite: “Goodbye to sequential integers, hello UUIDv7”
  • Cloud providers adding native support
  • Database vendors implementing optimizations

Summary

| Aspect | UUID v4 | UUID v7 |
| --- | --- | --- |
| Storage | 16 bytes binary | 16 bytes binary |
| Generation | Fully random | Time + random |
| Decentralized | ✅ Yes | ✅ Yes |
| Coordination required | ✅ None | ✅ None |
| URI safe | ✅ Yes | ✅ Yes |
| DB inserts | ⚠️ Slow (random) | ✅ Fast (sequential) |
| Fragmentation | ⚠️ High | ✅ Low |
| Page splits | ⚠️ Frequent | ✅ Minimal |
| Sortable | ❌ No | ✅ Yes (by time) |
| Privacy | ✅ Maximum | ✅ Good |
| Best for | Tokens, session IDs | Database keys |

Key Takeaways

  1. Always use binary storage in databases (16 bytes vs 36-40 bytes)
  2. UUID v7 is the modern default for database primary keys
  3. UUID v4 still useful for session tokens and random IDs
  4. No coordination required - all versions are fully decentralized
  5. URI-safe by design - use directly in URLs without encoding
  6. RFC standardized - wide vendor support and tooling available

2 - Universally Unique Lexicographically Sortable Identifier (ULID) Analysis

Overview

ULID is a community-driven specification for unique identifiers that combine the decentralized generation of UUIDs with the performance benefits of time-ordered sequential IDs. Created as an alternative to UUID v4’s poor database performance, ULID predates UUID v7 but shares similar design goals.

URI Safety

✅ Completely URI-Safe

ULIDs are designed with URI usage as a primary consideration.

Character set:

  • Uses Crockford’s Base32 alphabet
  • Characters: 0123456789ABCDEFGHJKMNPQRSTVWXYZ
  • Excluded: I, L, O, U (avoid confusion and potential abuse)
  • 32 unique characters

Format:

01ARZ3NDEKTSV4RRFFQ69G5FAV

Characteristics:

  • 26 characters (10 timestamp + 16 randomness)
  • No hyphens (unlike UUID’s 36 chars with hyphens)
  • Case-insensitive (can be normalized)
  • More compact than UUID string representation
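The 26-character form is the 128-bit value written in Crockford Base32, most significant bits first: 26 × 5 = 130 bits, so the first character only ever carries 3 bits and is at most 7. A Go sketch of the encoding using math/big (the function name encodeULID is hypothetical):

```go
package main

import (
	"fmt"
	"math/big"
)

// Crockford Base32 alphabet: digits and uppercase letters without I, L, O, U.
const alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// encodeULID renders 16 bytes as 26 Crockford Base32 characters.
func encodeULID(id [16]byte) string {
	var out [26]byte
	n := new(big.Int).SetBytes(id[:])
	mask := big.NewInt(31)
	digit := new(big.Int)
	// Emit characters from the right, consuming 5 bits at a time.
	for i := len(out) - 1; i >= 0; i-- {
		digit.And(n, mask)
		out[i] = alphabet[digit.Int64()]
		n.Rsh(n, 5)
	}
	return string(out[:])
}

func main() {
	var zero, max [16]byte
	for i := range max {
		max[i] = 0xFF
	}
	fmt.Println(encodeULID(zero)) // 00000000000000000000000000
	fmt.Println(encodeULID(max))  // 7ZZZZZZZZZZZZZZZZZZZZZZZZZ
}
```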

Advantages over UUID:

  • Shorter (26 vs 36 characters)
  • No special characters required
  • More human-readable
  • Case-insensitive (easier to communicate verbally)

Usage in URIs:

/api/users/01ARZ3NDEKTSV4RRFFQ69G5FAV
?id=01ARZ3NDEKTSV4RRFFQ69G5FAV

Database Storage and Performance

Storage Size

Binary representation:

  • 128 bits = 16 bytes
  • Same as UUID

String representation:

  • 26 characters
  • As UTF-8 string: 26 bytes minimum
  • As MySQL CHAR(26) with utf8mb4: up to 104 bytes reserved (4 bytes per character), depending on storage engine and row format
  • Recommendation: Store as binary (16 bytes) for optimal efficiency

Storage comparison:

| Format | Size | Efficiency |
| --- | --- | --- |
| Binary (BYTEA/BINARY(16)) | 16 bytes | Optimal |
| String (CHAR(26)) | 26+ bytes | 1.6× larger |
| UUID string (CHAR(36)) | 36+ bytes | 2.25× larger |

Index Performance

ULIDs provide significant performance advantages over random identifiers:

B-tree Index Benefits

Sequential insertion pattern:

  • ✅ Dramatically reduces page splits vs UUID v4
  • ✅ Minimizes write amplification
  • ✅ Improves cache utilization
  • ✅ Reduces I/O operations
  • ✅ Prevents index fragmentation and bloat

Recent benchmarks (PostgreSQL, 2024-2025):

| ID Type | Ops/Second | Latency | Index Size |
| --- | --- | --- | --- |
| ULID (bytea) | ~34,000 | 58 μs | Baseline |
| UUID v7 | ~34,000 | 58 μs | Similar |
| UUID v4 | ~25,000 | 85 μs | 85% larger |

Key findings:

  • ULID performance comparable to or slightly better than UUID v7
  • Roughly one-third higher insert throughput than UUID v4 (~34,000 vs ~25,000 ops/sec)
  • Significantly more stable performance (lower variance)

Lexicographic Sorting Benefits

Chronological ordering:

  • ULIDs sort lexicographically in timestamp order
  • No need for additional timestamp indexes
  • Natural time-based ordering

Query optimization benefits:

-- Time-range queries are efficient (assumes event_id is stored as a 26-character string)
SELECT * FROM events
WHERE event_id >= '01ARZ3NDEK0000000000000000'
  AND event_id <= '01ARZ3NDEKZZZZZZZZZZZZZZZZ';

Advantages:

  • Efficient range queries on time-based data
  • Simplified debugging (IDs reveal creation time)
  • Better query planner optimization
  • Natural partitioning by time ranges
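Range bounds for such queries can be derived from timestamps alone: encode the millisecond value as the 10-character prefix, then pad with the lowest (0) or highest (Z) character. A Go sketch under the assumption that IDs are stored as 26-character text; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

const alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// encodeTime renders a 48-bit millisecond timestamp as 10 Crockford Base32 characters.
func encodeTime(ms uint64) string {
	var out [10]byte
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = alphabet[ms&31]
		ms >>= 5
	}
	return string(out[:])
}

// ulidBounds returns the smallest and largest possible ULID strings for the
// inclusive millisecond range [fromMs, toMs].
func ulidBounds(fromMs, toMs uint64) (lo, hi string) {
	lo = encodeTime(fromMs) + strings.Repeat("0", 16)
	hi = encodeTime(toMs) + strings.Repeat("Z", 16)
	return lo, hi
}

func main() {
	day := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	lo, hi := ulidBounds(uint64(day.UnixMilli()), uint64(day.Add(24*time.Hour).UnixMilli())-1)
	fmt.Println(lo)
	fmt.Println(hi)
}
```

The two strings can be passed as parameters to a query of the shape shown above.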

Impact on Page Splits and Fragmentation

Dramatically reduced fragmentation compared to UUID v4:

UUID v4 problems:

  • Excessive page splits even before pages are full
  • Random writes throughout B-tree structure
  • Index bloat increases size on disk
  • Temporally related rows spread across index

ULID advantages:

  • Inserts at end of B-tree
  • Minimizes splits to only last page
  • Sequential writes optimize for append-heavy workloads
  • Reduced index maintenance overhead

Storage efficiency:

  • Less wasted space from partial pages
  • More compact indexes
  • Better compression ratios
  • Lower storage costs for write-heavy applications

Sequential Nature and Timestamp Ordering

48-bit timestamp component:

  • Millisecond precision Unix timestamp
  • Representable until the year 10889 AD
  • High-order bits ensure chronological insertion
  • Enables time-based partitioning strategies

Performance characteristics:

  • New records naturally fall at end of B-tree
  • Predictable insertion patterns
  • Optimizes for sequential writes
  • Reduces fragmentation over time

Generation Approach

✅ Fully Decentralized

ULIDs can be generated in a completely decentralized manner with no coordination required.

No centralized service needed:

  • Each system/node generates independently
  • Only requires system clock access
  • Uses a cryptographically secure random number generator (CSPRNG)
  • No network coordination overhead

Structure: Timestamp + Randomness

128 bits total:

 01AN4Z07BY      79KA1307SR9X4MV3
|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits

Timestamp component (48 bits):

  • Milliseconds since Unix epoch
  • First 10 characters in encoded form
  • Provides temporal ordering

Randomness component (80 bits):

  • Cryptographically secure random value
  • Remaining 16 characters
  • Ensures uniqueness within same millisecond

Binary encoding:

  • Most Significant Byte first (network byte order)
  • Each component encoded as octets
  • Total: 16 octets (bytes)

Collision Resistance

Extremely high collision resistance:

  • 1.21 × 10²⁴ unique IDs per millisecond (2⁸⁰ possible values)
  • Collision probability is practically zero
  • Even in distributed systems, likelihood of collision is exceedingly low

Example scale:

  • Would need to generate trillions of IDs per millisecond to see collisions
  • Far exceeds any practical generation rate
  • Safe for production at any realistic scale

Monotonicity Guarantees

Standard Generation (Non-Monotonic)

Default behavior:

  • Each ULID uses fresh random 80 bits
  • Sortable by timestamp (millisecond precision)
  • No guarantee of order within same millisecond

Monotonic Mode (Optional)

Algorithm:

  1. If timestamp same as previous: increment previous random component
  2. If timestamp advanced: generate fresh random component
  3. If the random component would overflow (exceed 2⁸⁰ − 1): wait for next millisecond or fail
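The three steps above can be sketched in Go as follows. This is a standard-library illustration rather than the oklog/ulid implementation: the type name MonotonicULID and the helpers are hypothetical, the sketch fails on overflow instead of waiting, and it does not handle a clock that moves backwards.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"errors"
	"fmt"
	"sync"
	"time"
)

var ErrOverflow = errors.New("random component overflowed within one millisecond")

// MonotonicULID remembers the last timestamp and random component issued.
type MonotonicULID struct {
	mu      sync.Mutex
	started bool
	lastMs  uint64
	random  [10]byte // 80-bit big-endian random component
}

// increment adds 1 to a big-endian 80-bit value and reports false on overflow.
func increment(b *[10]byte) bool {
	for i := len(b) - 1; i >= 0; i-- {
		b[i]++
		if b[i] != 0 {
			return true
		}
	}
	return false
}

// Next returns the 16-byte ULID for the given Unix millisecond timestamp.
func (g *MonotonicULID) Next(ms uint64) ([16]byte, error) {
	g.mu.Lock()
	defer g.mu.Unlock()

	var id [16]byte
	if g.started && ms == g.lastMs {
		// Step 1: same millisecond, so increment the previous random component.
		next := g.random
		if !increment(&next) {
			return id, ErrOverflow // Step 3: fail rather than wrap around.
		}
		g.random = next
	} else {
		// Step 2: new millisecond, so draw fresh randomness.
		if _, err := rand.Read(g.random[:]); err != nil {
			return id, err
		}
		g.started, g.lastMs = true, ms
	}
	for i := 0; i < 6; i++ {
		id[i] = byte(ms >> (8 * (5 - i)))
	}
	copy(id[6:], g.random[:])
	return id, nil
}

// less reports whether a sorts before b in byte order.
func less(a, b [16]byte) bool { return bytes.Compare(a[:], b[:]) < 0 }

func main() {
	var g MonotonicULID
	ms := uint64(time.Now().UnixMilli())
	a, err := g.Next(ms)
	if err != nil {
		panic(err)
	}
	b, err := g.Next(ms)
	if err != nil {
		panic(err)
	}
	fmt.Println("second ID sorts after first:", less(a, b))
}
```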

Benefits:

  • ✅ Guarantees strict ordering even at sub-millisecond generation
  • ✅ No duplicates from a single generator within the same millisecond
  • ✅ Maintains sortability within same timestamp

Trade-offs:

  • ⚠️ Leaks information about IDs generated within same millisecond
  • ⚠️ Potential security concern: enables enumeration attacks
  • ⚠️ Can overflow if the incremented random component exceeds 2⁸⁰ − 1 within one millisecond (extremely unlikely in practice)

Collision probability in monotonic mode:

  • Within one generator, IDs in the same millisecond are distinct by construction
  • Across generators, each millisecond still starts from an independent 80-bit random value
  • Safe to use in production systems

Comparison to UUID v7

Both ULID and UUID v7 solve similar problems with different approaches:

| Aspect | ULID | UUID v7 |
| --- | --- | --- |
| Size | 16 bytes | 16 bytes |
| Timestamp bits | 48 | 48 |
| Random bits | 80 | 74 |
| String format | 26 chars (Base32) | 36 chars (hex + hyphens) |
| Standardization | Community spec | RFC 9562 (official) |
| DB support | Custom | Native (PostgreSQL 18+) |
| Readability | Better (Base32) | Standard (hex) |
| Case sensitivity | Insensitive | Insensitive |
| Hyphens | None | 4 hyphens |

ULID advantages:

  • More compact string representation (26 vs 36)
  • Slightly more random bits (80 vs 74)
  • Better human readability (Crockford Base32)
  • No hyphens (simpler to handle)

UUID v7 advantages:

  • Official RFC standardization
  • Growing native database support
  • URN namespace compatibility (urn:uuid:...)
  • Wider vendor tooling support

2024-2025 Landscape

Current state:

  • UUID v7 (RFC 9562, 2024) now offers similar benefits with standardization
  • ULID remains compelling for human readability and compact representation
  • Both vastly superior to UUID v4 for database performance
  • Choice often: standardization (v7) vs. readability (ULID)

Industry adoption:

  • incident.io uses ULIDs for all identifiers
  • Various startups prefer ULID for API design
  • UUID v7 gaining traction in enterprise systems

Use Cases

ULIDs are excellent for:

  • ✅ Database primary keys (especially write-heavy workloads)
  • ✅ Distributed systems requiring decentralized ID generation
  • ✅ Applications needing URI-safe identifiers
  • ✅ Systems benefiting from time-ordered IDs
  • ✅ Scenarios requiring human-readable identifiers
  • ✅ APIs where compact IDs are valued

Consider alternatives when:

  • ⚠️ Strict RFC/ISO standardization required (use UUID v7)
  • ⚠️ Native database support is priority (UUID v7 has better tooling)
  • ⚠️ Absolute minimal storage (auto-increment or Snowflake)
  • ⚠️ High-security scenarios sensitive to timing information leakage

Implementation Examples

PostgreSQL

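-- Custom function needed (no native support).
-- Sketch using pgcrypto: 48-bit big-endian millisecond timestamp + 80 random bits.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE OR REPLACE FUNCTION ulid_generate()
RETURNS BYTEA AS $$
    SELECT substring(int8send(floor(extract(epoch FROM clock_timestamp()) * 1000)::bigint) FROM 3)
           || gen_random_bytes(10);
$$ LANGUAGE sql VOLATILE;

-- Store as bytea for optimal performance
CREATE TABLE events (
    event_id BYTEA PRIMARY KEY DEFAULT ulid_generate(),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    data JSONB
);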

MySQL

-- Store as BINARY(16)
CREATE TABLE events (
    event_id BINARY(16) PRIMARY KEY,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    data JSON
);

-- Generate in application layer

Application-Level Generation

Go example:

import (
    "fmt"
    "math/rand"
    "time"

    "github.com/oklog/ulid/v2"
)

// Standard generation
id := ulid.Make()
fmt.Println(id.String()) // e.g., 01ARZ3NDEKTSV4RRFFQ69G5FAV

// Monotonic generation (math/rand is fast but not cryptographically secure)
entropy := ulid.Monotonic(rand.New(rand.NewSource(time.Now().UnixNano())), 0)
monotonicID := ulid.MustNew(ulid.Timestamp(time.Now()), entropy)
fmt.Println(monotonicID.String())

Go Library Support

✅ oklog/ulid Library

The canonical Go library for ULIDs is github.com/oklog/ulid/v2, which provides full ULID specification support with both standard and monotonic generation modes.

Installation:

go get github.com/oklog/ulid/v2

Usage examples:

import (
    "crypto/rand"
    "fmt"
    "time"

    "github.com/oklog/ulid/v2"
)

// Simple generation with default entropy
id := ulid.Make()
fmt.Println(id.String()) // e.g., 01ARZ3NDEKTSV4RRFFQ69G5FAV

// Monotonic generation for strict ordering (reuse one entropy source per generator)
entropy := ulid.Monotonic(rand.Reader, 0)
orderedID := ulid.MustNew(ulid.Timestamp(time.Now()), entropy)
fmt.Println(orderedID.String())

Summary

ULID represents an excellent choice for modern distributed systems:

Key strengths:

  1. Fully decentralized - no coordination required
  2. URI-safe and compact - 26 characters, no special chars
  3. Excellent database performance - time-ordered, minimal fragmentation
  4. Human-readable - Crockford Base32 alphabet
  5. High collision resistance - 1.21 × 10²⁴ IDs per millisecond

Key considerations:

  1. Not officially standardized (community spec)
  2. Requires custom database functions (no native support)
  3. Exposes creation timestamp (like UUID v7)
  4. Slightly more complex than UUID v4 generation

Bottom line: ULID is an excellent choice when you value compact, human-readable identifiers and don’t require strict RFC compliance. For official standardization, UUID v7 offers similar performance with growing vendor support.

3 - Snowflake ID Analysis

Overview

Snowflake IDs are 64-bit unique identifiers originally developed by Twitter (now X) in 2010 to replace auto-incrementing integer IDs that became problematic as they scaled across multiple database shards. The format has been widely adopted by other distributed systems including Discord, Instagram, and many platforms requiring globally unique, time-ordered identifiers.

Key differentiator: Half the size of UUIDs/ULIDs while maintaining distributed generation and time-ordering properties.

URI Safety

✅ Completely URI-Safe

Snowflake IDs are inherently URI-safe in their native numeric form.

Native format:

  • 64-bit signed integer
  • Decimal string representation: 18-19 characters
  • Contains only digits: 0-9
  • No URL encoding required

Usage examples:

https://api.twitter.com/tweets/175928847299117063
https://discord.com/api/users/53908232506183680

Alternative Encodings

Decimal String (Default)

175928847299117063
  • Most common format (Twitter, Discord, etc.)
  • No encoding required
  • Human-readable (though not easily interpretable)
  • Length: 18-19 characters
  • Safe for both path parameters and query strings

Base62 Encoding

2BisCQ
  • Often used in URL shorteners
  • Compact, alphanumeric identifiers
  • No special characters requiring URL encoding
  • Length: up to 11 characters for a 64-bit value (smaller values are shorter)
  • Characters: [A-Za-z0-9]

Base64URL Encoding

AJ8CWJ-eR2Q
  • Used by Twitter for media keys
  • URL-safe alphabet: - and _ instead of + and /
  • Padding (=) typically omitted
  • Length: ~11 characters
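Both compact encodings are short to implement. The Go sketch below assumes a digits-uppercase-lowercase Base62 alphabet; there is no single standard ordering, so treat that alphabet and the function names as illustrative choices.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

// Assumed alphabet ordering; other Base62 implementations order it differently.
const base62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

// encodeBase62 renders n without leading zeros (at most 11 characters for 64 bits).
func encodeBase62(n uint64) string {
	if n == 0 {
		return "0"
	}
	var buf [11]byte
	i := len(buf)
	for n > 0 {
		i--
		buf[i] = base62[n%62]
		n /= 62
	}
	return string(buf[i:])
}

// encodeBase64URL renders the 8 big-endian bytes of n as unpadded Base64URL,
// which is always 11 characters.
func encodeBase64URL(n uint64) string {
	var raw [8]byte
	binary.BigEndian.PutUint64(raw[:], n)
	return base64.RawURLEncoding.EncodeToString(raw[:])
}

func main() {
	id := uint64(175928847299117063)
	fmt.Println(encodeBase62(id))
	fmt.Println(encodeBase64URL(id))
}
```

Either form trades the self-evident numeric ID for a shorter string, so the decimal form usually remains the storage and API default.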

Encoding Concerns

None for standard numeric representation. Snowflake IDs as decimal integers naturally comply with URI specifications (RFC 3986) as unreserved characters.

Database Storage and Performance

Storage Size

8 bytes (64 bits) per Snowflake ID

Comparison table:

| ID Type | Storage Size | vs Snowflake |
| --- | --- | --- |
| Snowflake ID | 8 bytes | baseline |
| Auto-increment INT32 | 4 bytes | 0.5× |
| Auto-increment BIGINT | 8 bytes | 1× (same) |
| UUID/ULID (binary) | 16 bytes | 2× larger |
| UUID (string) | 36 bytes | 4.5× larger |

Impact at scale:

  • For Twitter’s billions of tweets, 8-byte advantage over UUIDs saves massive storage
  • Reduced memory footprint for indexes
  • Better cache utilization
  • Lower network transfer costs

Index Performance

Snowflake IDs provide exceptional B-tree index performance due to their time-ordered nature.

Sequential Insert Benefits

Optimal write performance:

  • ✅ Minimal page splits (inserts land at the right-hand end of the index)
  • ✅ No expensive B-tree reorganizations
  • ✅ Minimal I/O (sequential writes minimize disk seeks)
  • ✅ Better cache utilization (hot pages remain in memory)

Comparison to Random IDs

UUID v4 causes:

  • ❌ Random index insertions throughout tree
  • ❌ Frequent page splits and reorganizations
  • ❌ Index fragmentation
  • ❌ Reduced cache efficiency
  • ❌ Higher write amplification

Benchmarks:

  • Snowflake IDs: Lower mean, variance, and standard deviation for ordered operations
  • UUID v4: Very high variance with unstable performance
  • Snowflake: Significantly better for ordered queries

Time-Ordered Nature and Benefits

After the unused sign bit, the 41 most significant bits hold a timestamp (milliseconds since a custom epoch), providing natural time-ordering.

Query Optimization

-- Time-range queries are highly efficient
SELECT * FROM tweets
WHERE tweet_id >= 175928847299117063
  AND tweet_id <= 175928847299999999;

Benefits:

  • Database can use range scans effectively
  • No need for separate created_at timestamp indexes (in many cases)
  • Natural partitioning by time is straightforward
  • Query planner optimizations leverage time-ordering
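Range bounds like the ones in the query above can be derived from wall-clock times. The following Go sketch assumes the Twitter epoch and the 41/10/12 bit layout described in this document; the function names are hypothetical.

```go
package main

import (
    "fmt"
    "time"
)

const (
    epochMs        = 1288834974657 // Twitter epoch, as quoted in this document
    timestampShift = 22            // 10 worker bits + 12 sequence bits
)

// MinIDForUnixMilli returns the smallest ID that can carry the given
// Unix-millisecond timestamp (worker and sequence bits all zero).
func MinIDForUnixMilli(ms int64) int64 {
    return (ms - epochMs) << timestampShift
}

// MaxIDForUnixMilli returns the largest ID that can carry the given
// Unix-millisecond timestamp (worker and sequence bits all one).
func MaxIDForUnixMilli(ms int64) int64 {
    return MinIDForUnixMilli(ms) | ((1 << timestampShift) - 1)
}

func main() {
    from := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
    to := from.Add(24 * time.Hour)
    // Half-open range: every ID created on 2024-01-01 UTC.
    fmt.Printf("WHERE tweet_id >= %d AND tweet_id < %d\n",
        MinIDForUnixMilli(from.UnixMilli()), MinIDForUnixMilli(to.UnixMilli()))
}
```

Using a half-open range (`>=` lower bound, `<` upper bound) avoids having to compute the maximum ID of the final millisecond.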

Sorting Benefits

  • IDs sort numerically by creation time (compare them as integers; decimal strings of different lengths do not sort correctly as text)
  • ORDER BY id implicitly orders by creation time
  • No need for separate sort operations in many scenarios
  • Simpler query plans

Data Partitioning

  • Time-based partitioning schemes align naturally with ID ranges
  • Simplifies archival strategies
  • Facilitates efficient data retention policies
  • Easy to implement hot/cold data separation

Impact on Database Operations

Write operations:

  • INSERT: Exceptional performance (sequential, append-only)
  • Batch inserts: Highly efficient due to sequential nature
  • Index maintenance: Minimal overhead

Read operations:

  • Point queries by ID: Standard B-tree performance (O(log n))
  • Range queries: Excellent for time-based ranges
  • Ordered queries: Superior to UUID-based systems
  • ⚠️ Join operations: Standard performance (64-bit integer comparison)

Storage:

  • Primary key: 8 bytes (optimal for 64-bit systems)
  • Foreign keys: 8 bytes
  • Index size: Keys are 50% smaller than binary UUID keys (the index as a whole shrinks somewhat less because of per-entry overhead)
  • Memory footprint: More cache-efficient than UUIDs

Comparison to Other Numeric IDs

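| ID Type | Size | Time-Ordered | Distributed | Index Perf | Sortable by Time |
|---|---|---|---|---|---|
| Snowflake | 8 bytes | ✅ Yes | ✅ Yes | Excellent | ✅ Yes |
| Auto-increment | 4-8 bytes | ✅ Yes | ❌ No | Excellent | ✅ Yes |
| UUID v4 | 16 bytes | ❌ No | ✅ Yes | Poor | ❌ No |
| UUID v7 | 16 bytes | ✅ Yes | ✅ Yes | Good | ✅ Yes |
| ULID | 16 bytes | ✅ Yes | ✅ Yes | Good | ✅ Yes |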

Unique combination:

  • Distributed generation capability (like UUID)
  • Time-ordered properties (like auto-increment)
  • Compact size (8 bytes)
  • Excellent index performance

Generation Approach

⚠️ Mostly Decentralized

Snowflake IDs can be generated in a mostly decentralized manner with minimal coordination.

Key characteristics:

  • ✅ No centralized coordination during ID generation
  • ✅ No network calls required between generators
  • ✅ No database round-trips for ID allocation
  • ✅ High throughput: Up to 4,096 IDs per millisecond per worker
  • ✅ Low latency: Sub-microsecond generation time
  • ⚠️ Requires one-time worker ID allocation

Structure Breakdown

A Snowflake ID uses 63 bits of a signed 64-bit integer; the sign bit is always 0, so the value is never negative:

┌─────────────────────────────────────────┬──────────────┬──────────────┐
│          Timestamp (41 bits)            │ Worker (10)  │ Sequence (12)│
└─────────────────────────────────────────┴──────────────┴──────────────┘
 ← Most Significant                                  Least Significant →

1. Timestamp Component (41 bits)

Purpose: Milliseconds since custom epoch

Characteristics:

  • Range: ~69 years of unique timestamps
  • Epoch: Configurable (Twitter: 1288834974657, Discord: 1420070400000)
  • Most significant bits ensure chronological sorting
  • Enables time-range queries

Benefits:

  • Provides time-ordering
  • Natural partitioning by time
  • Debugging aid (can decode timestamp)

2. Worker/Machine ID (10 bits)

Purpose: Identifies the generator node

Characteristics:

  • Range: 0-1023 (1,024 unique workers)
  • Often split further:
    • Twitter original: 5-bit datacenter ID + 5-bit worker ID
    • Discord: 5-bit worker ID + 5-bit process ID
    • Custom: Can be adapted to organizational needs

Critical requirement: Each worker MUST have a unique ID

3. Sequence Number (12 bits)

Purpose: Counter for IDs generated in same millisecond

Characteristics:

  • Range: 0-4095 (4,096 IDs per millisecond per worker)
  • Increments for each ID within the same millisecond
  • Resets to 0 when millisecond changes
  • If exhausted: Generator waits until next millisecond

System-wide capacity:

  • Per worker: 4,096,000 IDs per second
  • With 1,024 workers: ~4.2 billion IDs per second theoretical maximum
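The three components can be packed and unpacked with shifts and masks. This Go sketch assumes the 41/10/12 layout and the Twitter epoch given above; `Compose`, `Decompose`, and `Parts` are illustrative names, not part of any library.

```go
package main

import "fmt"

const (
    epochMs      = 1288834974657 // Twitter epoch, as quoted in this document
    workerBits   = 10
    sequenceBits = 12
)

// Parts holds the decoded fields of a Snowflake ID.
type Parts struct {
    UnixMilli int64 // creation time in Unix milliseconds
    Worker    int64 // 0-1023
    Sequence  int64 // 0-4095
}

// Compose packs the three fields into a single 64-bit ID.
func Compose(unixMs, worker, seq int64) int64 {
    return (unixMs-epochMs)<<(workerBits+sequenceBits) | worker<<sequenceBits | seq
}

// Decompose extracts the three fields from an ID.
func Decompose(id int64) Parts {
    return Parts{
        UnixMilli: id>>(workerBits+sequenceBits) + epochMs,
        Worker:    (id >> sequenceBits) & ((1 << workerBits) - 1),
        Sequence:  id & ((1 << sequenceBits) - 1),
    }
}

func main() {
    id := Compose(epochMs+1000, 5, 7)
    fmt.Printf("%d -> %+v\n", id, Decompose(id))
}
```

Because the timestamp occupies the most significant bits, comparing two IDs as integers compares their timestamps first.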

Centralized Coordination Requirements

Coordination is minimal and mostly confined to initial setup:

What Requires Coordination:

  1. Worker ID allocation (one-time, during node provisioning)
  2. Epoch selection (one-time, at system design time)
  3. ⚠️ Clock synchronization (ongoing; generators must also guard against the clock moving backwards)

What Does NOT Require Coordination:

  • ❌ Individual ID generation
  • ❌ Real-time communication between nodes
  • ❌ Distributed locks or consensus
  • ❌ Database queries for next ID

Worker ID Allocation Requirements

This is the primary coordination challenge in Snowflake ID systems.

Static Allocation (Simple)

# Configuration file
servers:
  - host: server-1
    worker_id: 1
  - host: server-2
    worker_id: 2
  - host: server-3
    worker_id: 3

Pros:

  • ✅ Simple to implement
  • ✅ No runtime coordination
  • ✅ Predictable and debuggable

Cons:

  • ❌ Doesn’t work with auto-scaling
  • ❌ Manual reconfiguration needed
  • ❌ Worker ID exhaustion in large deployments

Dynamic Allocation (Complex)

Common strategies for dynamic environments:

1. Zookeeper/etcd Coordination

- Nodes register and receive unique worker IDs
- Lease-based assignment with TTL
- Automatic reclamation of dead workers
  • ✅ Automatic worker ID management
  • ❌ Requires external coordination service
  • ❌ Added operational complexity

2. Database-Based Registry

CREATE TABLE worker_registry (
    worker_id INT PRIMARY KEY,
    instance_id VARCHAR(255),
    last_heartbeat TIMESTAMP
);
  • ✅ No additional infrastructure
  • ❌ Database dependency
  • ❌ Requires heartbeat mechanism
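The lease and heartbeat logic shared by the coordination-service and database-registry strategies can be sketched in Go. The `Registry` type below is a hypothetical in-memory stand-in for the shared store (etcd, ZooKeeper, or the `worker_registry` table); timestamps are passed in as Unix milliseconds so the behaviour is easy to follow.

```go
package main

import (
    "errors"
    "fmt"
    "sync"
)

const maxWorkers = 1024

// ErrExhausted is returned when every worker ID has a live lease.
var ErrExhausted = errors.New("no free worker IDs")

type lease struct {
    instanceID string
    expiresMs  int64
}

// Registry is an in-memory stand-in for a shared lease store.
type Registry struct {
    mu     sync.Mutex
    ttlMs  int64
    leases map[int]lease
}

func NewRegistry(ttlMs int64) *Registry {
    return &Registry{ttlMs: ttlMs, leases: make(map[int]lease)}
}

// Acquire claims the lowest worker ID that is free or whose lease expired.
func (r *Registry) Acquire(instanceID string, nowMs int64) (int, error) {
    r.mu.Lock()
    defer r.mu.Unlock()
    for id := 0; id < maxWorkers; id++ {
        l, held := r.leases[id]
        if !held || l.expiresMs <= nowMs {
            r.leases[id] = lease{instanceID: instanceID, expiresMs: nowMs + r.ttlMs}
            return id, nil
        }
    }
    return 0, ErrExhausted
}

// Heartbeat extends a lease. It returns false if the lease expired or now
// belongs to another instance; the caller must then stop generating IDs.
func (r *Registry) Heartbeat(id int, instanceID string, nowMs int64) bool {
    r.mu.Lock()
    defer r.mu.Unlock()
    l, held := r.leases[id]
    if !held || l.instanceID != instanceID || l.expiresMs <= nowMs {
        return false
    }
    r.leases[id] = lease{instanceID: instanceID, expiresMs: nowMs + r.ttlMs}
    return true
}

func main() {
    reg := NewRegistry(30_000) // 30-second lease, an arbitrary choice
    id, err := reg.Acquire("instance-a", 0)
    fmt.Println(id, err)
}
```

A real deployment would also need the generator to stop issuing IDs as soon as a heartbeat fails, since the worker ID may already have been handed to another instance.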

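3. Hash-Based Assignment

worker_id = hash(node_ip_or_mac) % 1024
  • ✅ No coordination needed
  • ❌ High collision risk even in modest clusters: with only 1,024 slots, the birthday paradox puts the chance of at least one collision above 50% at around 40 nodes
  • ❌ Requires careful hash function selection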
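To make the collision risk of hashing into 1,024 slots concrete, the Go sketch below derives a worker ID with FNV-1a (an arbitrary choice for this example) and computes the birthday-problem probability that at least two nodes share an ID, assuming the hash spreads nodes uniformly.

```go
package main

import (
    "fmt"
    "hash/fnv"
)

const workerSlots = 1024

// WorkerIDFromHost maps a host identifier (IP, MAC, hostname) to 0-1023.
func WorkerIDFromHost(host string) uint32 {
    h := fnv.New32a()
    h.Write([]byte(host))
    return h.Sum32() % workerSlots
}

// CollisionProbability returns the chance that at least two of `nodes`
// uniformly hashed nodes land on the same one of `slots` worker IDs.
func CollisionProbability(nodes, slots int) float64 {
    pUnique := 1.0
    for i := 0; i < nodes; i++ {
        pUnique *= float64(slots-i) / float64(slots)
    }
    return 1 - pUnique
}

func main() {
    fmt.Println(WorkerIDFromHost("10.0.0.17"))
    for _, n := range []int{10, 40, 100} {
        fmt.Printf("%d nodes: %.1f%% chance of a duplicate worker ID\n",
            n, 100*CollisionProbability(n, workerSlots))
    }
}
```

Because a duplicate worker ID silently breaks uniqueness, hashing alone is usually paired with a collision check at startup.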

4. Container Orchestration Integration

- Kubernetes StatefulSets with ordinal indexes
- Cloud provider instance metadata
- Environment variable injection
  • ✅ Integrates with existing infrastructure
  • ❌ Platform-specific
  • ❌ May limit to 1,024 pods/instances
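With Kubernetes StatefulSets, each pod's hostname ends in its ordinal index (for example `id-gen-3`), which can serve directly as the worker ID. A minimal Go sketch, where `WorkerIDFromPodName` is a hypothetical helper and the 0-1023 limit matches the 10-bit worker field:

```go
package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

// WorkerIDFromPodName extracts the trailing ordinal from a StatefulSet
// pod name such as "id-gen-3" and checks that it fits in 10 bits.
func WorkerIDFromPodName(name string) (int64, error) {
    i := strings.LastIndex(name, "-")
    if i < 0 {
        return 0, fmt.Errorf("pod name %q has no ordinal suffix", name)
    }
    ordinal, err := strconv.ParseInt(name[i+1:], 10, 64)
    if err != nil {
        return 0, fmt.Errorf("pod name %q has no numeric ordinal: %w", name, err)
    }
    if ordinal < 0 || ordinal > 1023 {
        return 0, fmt.Errorf("ordinal %d does not fit in a 10-bit worker ID", ordinal)
    }
    return ordinal, nil
}

func main() {
    // Inside a StatefulSet pod, the hostname is the pod name.
    name, err := os.Hostname()
    if err != nil {
        fmt.Println("cannot read hostname:", err)
        return
    }
    id, err := WorkerIDFromPodName(name)
    fmt.Println(id, err)
}
```

This only guarantees uniqueness within one StatefulSet; several StatefulSets or clusters sharing an ID space would need the worker bits split further.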

Challenge in auto-scaling:

“In a dynamic environment with auto-scaling, managing worker IDs becomes challenging. You need a strategy to assign unique worker IDs to new instances.”

Collision Avoidance Mechanisms

Snowflake IDs guarantee uniqueness through multiple layers:

1. Temporal Uniqueness

  • 41-bit timestamp ensures different milliseconds get different IDs
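  • The generator must never reuse an earlier timestamp; system clocks are not guaranteed to be monotonic, so implementations track the last timestamp used and wait or fail if the clock falls behind it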

2. Spatial Uniqueness

  • 10-bit worker ID ensures different nodes generate different IDs
  • Critical requirement: Each worker MUST have a unique ID

3. Sequential Uniqueness

  • 12-bit sequence counter within same millisecond
  • Allows up to 4,096 IDs per worker per millisecond

Mathematical Guarantee

Unique ID = f(timestamp, worker_id, sequence)

As long as:

  • worker_id is unique per node (most critical)
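  • The clock never falls behind the last timestamp a worker has already used (generators should wait or return an error if it does)
  • Sequence doesn’t overflow (wait 1ms if it does)

Then collisions cannot occur, because no two IDs share the same (timestamp, worker_id, sequence) triple.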

Collision Risk Scenarios

Very Low Risk:

  • ⚠️ Clock skew between nodes (IDs remain unique, may not be perfectly ordered)
  • ⚠️ Leap second handling (typically managed by NTP)

High Risk (Configuration Errors):

  • Duplicate worker IDs: Multiple nodes with same worker ID
  • Clock moving backwards: System time reset or NTP correction
  • Worker ID overflow: Attempting to use more than 1,024 workers

Generation Rate Limits

Per worker:

  • Maximum: 4,096 IDs per millisecond
  • Per second: 4,096,000 IDs per worker
  • Typical usage: Far below maximum in most applications

Handling exhaustion:

// Pseudocode
if sequence >= 4096 {
    // Wait until next millisecond
    waitUntil(nextMillisecond)
    sequence = 0
}
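Putting the pieces together, a complete generator might look like the following Go sketch. It assumes the 41/10/12 layout and the Twitter epoch from this document, returns an error when the clock moves backwards (waiting is another common policy), and spins until the next millisecond when the sequence is exhausted. The `Generator` type is illustrative, not the API of any specific library.

```go
package main

import (
    "errors"
    "fmt"
    "sync"
    "time"
)

const (
    epochMs      = 1288834974657 // Twitter epoch, as quoted in this document
    workerBits   = 10
    sequenceBits = 12
    maxWorker    = (1 << workerBits) - 1
    sequenceMask = (1 << sequenceBits) - 1
)

var ErrClockMovedBackwards = errors.New("clock moved backwards")

// Generator issues IDs for a single worker; it is safe for concurrent use.
type Generator struct {
    mu       sync.Mutex
    workerID int64
    lastMs   int64
    sequence int64
}

func NewGenerator(workerID int64) (*Generator, error) {
    if workerID < 0 || workerID > maxWorker {
        return nil, fmt.Errorf("worker ID %d outside 0-%d", workerID, maxWorker)
    }
    return &Generator{workerID: workerID}, nil
}

// Next returns the next ID, or an error if the wall clock is behind
// the timestamp of the previously issued ID.
func (g *Generator) Next() (int64, error) {
    g.mu.Lock()
    defer g.mu.Unlock()

    now := time.Now().UnixMilli()
    if now < g.lastMs {
        return 0, ErrClockMovedBackwards
    }
    if now == g.lastMs {
        g.sequence = (g.sequence + 1) & sequenceMask
        if g.sequence == 0 {
            // All 4,096 sequence values used: wait for the next millisecond.
            for now <= g.lastMs {
                now = time.Now().UnixMilli()
            }
        }
    } else {
        g.sequence = 0
    }
    g.lastMs = now
    return (now-epochMs)<<(workerBits+sequenceBits) | g.workerID<<sequenceBits | g.sequence, nil
}

func main() {
    gen, err := NewGenerator(1)
    if err != nil {
        panic(err)
    }
    id, err := gen.Next()
    fmt.Println(id, err)
}
```

IDs from a single generator are strictly increasing; IDs from different workers are unique but only approximately ordered, depending on clock skew.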

Implementation Considerations

Advantages

  • No single point of failure (after worker ID allocation)
  • Minimal coordination overhead
  • Extremely high throughput
  • Low generation latency
  • Natural load distribution
  • Smallest storage size (8 bytes)
  • Best database performance

Disadvantages

  • ⚠️ Requires unique worker ID management
  • ⚠️ Clock synchronization needed (NTP recommended)
  • ⚠️ Fixed worker limit (1,024 without redesign)
  • ⚠️ Not truly random (predictable structure)
  • ⚠️ Information leakage (creation time, rough volume)
  • ⚠️ Auto-scaling complexity (worker ID allocation)

Security Considerations

Information Leakage

Snowflake IDs reveal more information than UUIDs:

What’s exposed:

  • ⚠️ Exact creation time (41-bit timestamp)
  • ⚠️ Which worker generated it (10-bit worker ID)
  • ⚠️ Sequence count within millisecond (12-bit sequence)

Potential concerns:

  • Business activity levels can be inferred
  • Worker distribution visible
  • Timeline of events can be reconstructed

Enumeration Attacks

Predictable patterns:

  • ⚠️ Can estimate next ID value
  • ⚠️ Can enumerate recent IDs
  • ⚠️ Can probe for existence of IDs in ranges

Mitigation:

  • ✅ Use authentication/authorization (don’t rely on ID secrecy)
  • ✅ Implement rate limiting
  • ✅ Add additional access controls
  • ✅ Consider signing/encrypting IDs if necessary
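As one possible form of the signing mitigation, an ID can be published together with an HMAC tag so that the server can reject guessed or enumerated values. This Go sketch uses HMAC-SHA256 with a hypothetical `id.signature` token format; it does not hide the timestamp, and it complements authorization checks rather than replacing them.

```go
package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/base64"
    "fmt"
    "strconv"
    "strings"
)

// SignID returns "<id>.<tag>", where tag is an unpadded Base64URL
// HMAC-SHA256 of the decimal ID under the given secret key.
func SignID(id int64, key []byte) string {
    payload := strconv.FormatInt(id, 10)
    mac := hmac.New(sha256.New, key)
    mac.Write([]byte(payload))
    return payload + "." + base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// VerifyID checks the tag and returns the ID if the token is authentic.
func VerifyID(token string, key []byte) (int64, bool) {
    payload, sig, found := strings.Cut(token, ".")
    if !found {
        return 0, false
    }
    id, err := strconv.ParseInt(payload, 10, 64)
    if err != nil {
        return 0, false
    }
    got, err := base64.RawURLEncoding.DecodeString(sig)
    if err != nil {
        return 0, false
    }
    mac := hmac.New(sha256.New, key)
    mac.Write([]byte(payload))
    if !hmac.Equal(got, mac.Sum(nil)) { // constant-time comparison
        return 0, false
    }
    return id, true
}

func main() {
    key := []byte("example-secret-key") // placeholder; load real keys from secure storage
    token := SignID(175928847299117063, key)
    fmt.Println(token)
    fmt.Println(VerifyID(token, key))
}
```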

Important: Never rely on ID unpredictability as a security mechanism.

Real-World Implementations

Twitter (Original)

1 bit (unused) + 41 bits (timestamp) + 5 bits (datacenter) +
5 bits (worker) + 12 bits (sequence)
  • Epoch: November 4, 2010, 01:42:54 UTC
  • 32 datacenters, 32 workers per datacenter
  • Up to 4,096 IDs per millisecond per worker

Discord

1 bit (unused) + 41 bits (timestamp) + 5 bits (worker) +
5 bits (process) + 12 bits (sequence)
  • Epoch: January 1, 2015, 00:00:00 UTC
  • Allows multiple processes per worker
  • Later custom epoch extends the usable range: 41 bits of milliseconds counted from 2015 last until about 2084

Instagram

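  • Similar idea to Twitter with a different split: 41-bit timestamp + 13-bit logical shard ID + 10-bit sequence
  • Sharded database architecture
  • IDs are generated inside PostgreSQL, with a per-shard sequence supplying the low bits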

Go Library Support

✅ bwmarrin/snowflake Library

The most popular Go library for Snowflake IDs is github.com/bwmarrin/snowflake, which provides a production-ready implementation with configurable epoch and node ID.

Installation:

go get github.com/bwmarrin/snowflake

Usage example:

package main

import (
    "fmt"
    "log"

    "github.com/bwmarrin/snowflake"
)

func main() {
    // Create a new node with worker ID (must be unique per instance)
    node, err := snowflake.NewNode(1) // Worker ID: 1 (range: 0-1023)
    if err != nil {
        log.Fatal(err)
    }

    // Generate a Snowflake ID
    id := node.Generate()
    fmt.Println(id.Int64())  // e.g., 175928847299117063
    fmt.Println(id.String()) // e.g., "175928847299117063"
}

Alternative: sony/sonyflake

github.com/sony/sonyflake is another option that uses a different bit layout (39-bit time, 8-bit sequence, 16-bit machine ID). It offers a much larger machine ID space (65,536 machines) at the cost of time precision (10 ms units) and per-machine throughput (256 IDs per 10 ms).

Migration Strategies

From Auto-Increment

Considerations:

  • Must provision worker ID allocation system
  • May need to widen integer columns (INT to BIGINT)
  • Application code changes for ID generation
  • Foreign key updates required

Recommended approach:

  1. Add Snowflake ID column alongside auto-increment
  2. Generate Snowflake IDs for existing rows
  3. Update application to use Snowflake IDs for new records
  4. Migrate foreign keys progressively
  5. Eventually remove auto-increment column
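For step 2, one possible approach (an assumption of this example, not a requirement) is to synthesize IDs from each row's existing creation timestamp, so that the new IDs preserve the historical order. The Go sketch below expects timestamps sorted in ascending order and a worker ID reserved exclusively for the backfill.

```go
package main

import "fmt"

const (
    epochMs      = 1288834974657 // Twitter epoch, as quoted in this document
    workerBits   = 10
    sequenceBits = 12
    maxSequence  = (1 << sequenceBits) - 1
)

// BackfillIDs builds one Snowflake ID per creation timestamp (Unix ms,
// sorted ascending). Rows sharing a millisecond get increasing sequences.
func BackfillIDs(createdMs []int64, workerID int64) ([]int64, error) {
    ids := make([]int64, 0, len(createdMs))
    lastMs, seq := int64(-1), int64(0)
    for _, ms := range createdMs {
        switch {
        case ms < epochMs:
            return nil, fmt.Errorf("timestamp %d predates the epoch", ms)
        case ms < lastMs:
            return nil, fmt.Errorf("timestamps must be sorted ascending")
        case ms == lastMs:
            seq++
            if seq > maxSequence {
                return nil, fmt.Errorf("more than %d rows share millisecond %d", maxSequence+1, ms)
            }
        default:
            seq = 0
        }
        lastMs = ms
        ids = append(ids, (ms-epochMs)<<(workerBits+sequenceBits)|workerID<<sequenceBits|seq)
    }
    return ids, nil
}

func main() {
    ids, err := BackfillIDs([]int64{1700000000000, 1700000000000, 1700000000001}, 1023)
    fmt.Println(ids, err)
}
```

Rows created before the chosen epoch cannot be represented this way; they would need an earlier epoch or IDs assigned by another rule.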

From UUID

Considerations:

  • Significant storage reduction (16 → 8 bytes)
  • Different data type (binary/string → bigint)
  • Worker ID allocation system needed
  • May require application changes

Benefits:

  • 50% smaller ID columns compared with binary UUIDs (about 78% smaller than 36-character string UUIDs)
  • Better performance
  • Numeric type easier for some use cases

Summary

Snowflake IDs represent an elegant solution for distributed systems:

Key Strengths:

  1. Compact size: 8 bytes (half of UUID/ULID)
  2. Excellent performance: Sequential insertion, optimal for B-trees
  3. Time-ordered: Natural sorting and partitioning
  4. High throughput: Millions of IDs per second per worker
  5. URI-safe: Decimal integers require no encoding

Key Challenges:

  1. Worker ID management: Requires coordination (one-time)
  2. Auto-scaling complexity: Dynamic worker ID allocation needed
  3. Information leakage: Exposes timestamp and worker information
  4. Fixed limits: 1,024 workers without redesign

Best For:

  • High-scale distributed systems with predictable worker counts
  • Storage-constrained environments
  • Systems requiring time-ordered numeric IDs
  • Applications where 8-byte size matters

Consider Alternatives When:

  • Auto-scaling is critical and worker ID management is complex
  • Strict randomness required (use UUID v4)
  • Official standardization needed (use UUID v7)
  • More than 1,024 concurrent generators needed

Bottom Line: For systems that can manage worker IDs and value storage efficiency, Snowflake IDs offer the best combination of size, performance, and distributed generation capabilities.