Cluster & Operations

Threshold Signing & Cluster Cryptography

Distributed signing where no single node holds the full private key — used for OIDC tokens, internal bearer tokens, and CA certificates

Overview

Threshold signing means that certificates and tokens are signed by a quorum of cluster nodes working together. No single node ever holds the full private key — the key is split into shares, and a minimum number of nodes (the “threshold”) must cooperate to produce a valid signature.

The cluster runs two signing schemes in parallel:

  1. ECDSA (ES256/ES384/ES512) — for EXTERNAL tokens
Used for: OIDC tokens, Personal Access Tokens (PATs), standard OAuth
Why: industry-standard algorithms that third-party tools verify natively
  2. FROST Ed25519 — for INTERNAL operations
Used for: proxy bearer tokens, bastion device codes, internal service auth
Why: faster signing (~15ms) optimized for high-volume internal operations

These two schemes are not fallbacks for each other — they run in parallel, each serving different consumers. The only brief fallback window is during cluster startup: internal tokens temporarily use ECDSA until FROST key generation completes. This is a few seconds, not a steady-state condition.

Token routing (when signing_algorithm = ES256):

Token Type             | Scheme | Reason
───────────────────────┼────────┼────────────────────────────────────────────
Proxy bearer tokens    | FROST  | Internal — speed, backend verifies via JWKS
Bastion device codes   | FROST  | Internal — bastion authentication
Internal device codes  | FROST  | Internal service callers
Personal Access Tokens | ECDSA  | External — distributed to users
Standard OIDC tokens   | ECDSA  | External — third-party OAuth clients
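The routing above, including the startup fallback to ECDSA while FROST initializes, can be sketched as a simple dispatch. This is a minimal illustration; the function and token-type names are invented, not the actual implementation:

```go
package main

import "fmt"

// Scheme selects which threshold signer handles a token type when
// signing_algorithm = ES256. Internal token types prefer FROST for
// speed; external tokens always use ECDSA so third-party verifiers
// can validate them natively.
func Scheme(tokenType string, frostReady bool) string {
	switch tokenType {
	case "proxy_bearer", "bastion_device_code", "internal_device_code":
		if frostReady {
			return "FROST"
		}
		return "ECDSA" // startup fallback until FROST key generation completes
	default: // PATs, standard OIDC tokens
		return "ECDSA"
	}
}

func main() {
	fmt.Println(Scheme("proxy_bearer", true))  // internal token, FROST ready
	fmt.Println(Scheme("proxy_bearer", false)) // internal token, FROST still initializing
	fmt.Println(Scheme("oidc", true))          // external token, always ECDSA
}
```

Note that the fallback branch only matters during the brief startup window; in steady state, internal tokens always take the FROST path.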

Quorum model (default: 2-of-3 nodes):

- Any 2 nodes can sign; 1 node alone cannot forge signatures
- 1 node failure = still operational (2 remaining nodes can sign)
- 2 nodes down = signing blocked (quorum lost)
- 1 node compromised = attacker has 1 share, cannot forge (needs 2)
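The 2-of-3 arithmetic generalizes: with threshold t, any t+1 nodes can sign and t shares alone reveal nothing. A sketch of how the Healthy/Degraded/Unhealthy states follow from node counts (illustrative names, not the real health monitor):

```go
package main

import "fmt"

// SignerHealth mirrors the health states shown by 'hexdcall status'.
// threshold is t: t+1 nodes must cooperate to produce a signature.
func SignerHealth(online, threshold int) string {
	need := threshold + 1 // minimum cooperating nodes for a signature
	switch {
	case online < need:
		return "Unhealthy" // quorum lost, signing blocked
	case online == need:
		return "Degraded" // can sign, but one more failure blocks signing
	default:
		return "Healthy" // signing works with fault-tolerance margin to spare
	}
}

func main() {
	// Default 2-of-3 cluster (t=1, so 2 nodes must cooperate):
	fmt.Println(SignerHealth(3, 1)) // all nodes up
	fmt.Println(SignerHealth(2, 1)) // one node down: still signs, no margin
	fmt.Println(SignerHealth(1, 1)) // two nodes down: signing blocked
}
```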

Startup sequence:

1. Nodes perform distributed key generation (DKG) for the ECDSA scheme
2. Once ECDSA is ready, FROST key generation auto-triggers
3. Once both complete, all signing paths are available
During the brief window between steps 1 and 3, internal tokens use ECDSA.
Zero downtime throughout.

When signing_algorithm is EdDSA: everything uses FROST (no dual mode needed).

Key management

Key lifecycle and rotation:

Threshold signing keys (both ECDSA and FROST):

- Generated collaboratively by all cluster nodes (distributed key generation)
- No single node ever holds the full private key — only its own share
- Stored encrypted at rest using authenticated encryption derived from cluster_key
- When nodes join or leave, key shares are redistributed while preserving the
same public key (external verifiers like JWKS consumers are not affected)

ECDSA key rotation (for OIDC tokens):

- Automatic: triggered when key generation completes or cluster membership changes
- New JWKS entry published; old key retained for verification (grace period)
- Relying parties cache JWKS — existing tokens remain valid until cache refresh

FROST key rotation (for internal tokens):

- Independent from ECDSA rotation — separate lifecycle
- Auto-triggered when cluster membership changes
- Internal tokens are short-lived — rotation is seamless with no visible impact

Inter-node encryption:

Nodes communicate over an encrypted channel with forward secrecy. This means
each inter-node session uses unique encryption keys, and even if long-term
keys were compromised, past communications remain unreadable.
- Encryption keys rotate automatically on a schedule (2PC protocol with quorum)
- Grace period: old + new keys accepted simultaneously during rotation
- Temporary fallback to derived keys if the key exchange is briefly unavailable
- SPK (signed prekey) rotation uses publish-before-swap: new bundle published before private key swaps
- Key rotation defers if SPK just rotated (SPK recency guard, 5s window)
- On quorum failure, key rotation retries once after flushing stale bundle caches
- All rotation events emitted as audit entries via OnKeyRotationEvent callback

IMPORTANT — All rotations are automatic:

Certificate rotation, signing key rotation, and encryption key rotation are
all handled by background health monitors. Operators do NOT need to set
calendar reminders or manually trigger rotations.
Only investigate when 'health components' or 'hexdcall status' shows warnings.

Deterministic signing

Clarification on “deterministic” in cluster crypto context:

The word “deterministic” appears in two contexts — they mean different things:

  1. AutoTLS deterministic KEY DERIVATION:
- Private keys derived deterministically from cluster_key for each renewal cycle
- Ensures all nodes produce the same public key for SPKI pinning
- ECDSA SIGNATURE nonces remain fully random (standard randomness)
- This is NOT "deterministic signing" — only the key material is deterministic
- Security equivalent to random key generation (entropy from cluster_key)
  2. Threshold ECDSA CA signing:
- Uses multi-party nonce generation across nodes
- Each node contributes randomness; combined nonce is unpredictable
- No single node controls the nonce — distributed trust model
- The signing ceremony is interactive (requires t+1 nodes), not deterministic

Neither case reduces cryptographic security. “Deterministic” in these contexts refers to reproducibility (same inputs → same key) or coordination protocol, not to removing randomness from signatures.
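A minimal sketch of deterministic key derivation in the AutoTLS sense, assuming an HMAC-SHA256 expand step over cluster_key plus a per-renewal context string (the real derivation function and its exact inputs are not specified here):

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"math/big"
)

// deriveKey deterministically derives a P-256 private key from the
// cluster secret and a context string. Same inputs always yield the
// same key, so every node computes the same public key (which is what
// SPKI pinning needs). Signatures made with this key still use
// standard random nonces.
func deriveKey(clusterKey []byte, context string) *ecdsa.PrivateKey {
	mac := hmac.New(sha256.New, clusterKey)
	mac.Write([]byte(context)) // e.g. "autotls/renewal-cycle-42" (hypothetical label)
	d := new(big.Int).SetBytes(mac.Sum(nil))

	// Reduce into the valid scalar range 1 <= d < n.
	n := elliptic.P256().Params().N
	d.Mod(d, new(big.Int).Sub(n, big.NewInt(1)))
	d.Add(d, big.NewInt(1))

	priv := new(ecdsa.PrivateKey)
	priv.Curve = elliptic.P256()
	priv.D = d
	priv.X, priv.Y = priv.Curve.ScalarBaseMult(d.Bytes())
	return priv
}

func main() {
	a := deriveKey([]byte("32-char-cluster-secret-example!!"), "autotls/cycle-1")
	b := deriveKey([]byte("32-char-cluster-secret-example!!"), "autotls/cycle-1")
	fmt.Println(a.X.Cmp(b.X) == 0) // every node derives the same public key
}
```

This is the "same inputs → same key" reproducibility described above: only the key material is deterministic, not the signatures produced with it.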

When operators ask about “no randomness” or “deterministic signing”:

- AutoTLS: "The leaf certificate keys are derived deterministically for cluster
consistency and SPKI pinning. Signatures still use standard randomness."
- Threshold CA: "The signing uses multi-party nonce generation — each node
contributes randomness, and the combined nonce is cryptographically secure."

Do NOT present deterministic key derivation as a security concern.

Troubleshooting

Common symptoms and diagnostic steps:

ECDSA threshold signing errors (external tokens not being issued):

- Run 'hexdcall status' to check signer state and health
- State=Active + Health=Healthy → signing should work
- Health=Degraded → at minimum quorum, one more node failure blocks signing
- Health=Unhealthy → cannot sign, check node reachability with 'cluster nodes'
- Run 'hexdcall threshold test' to verify end-to-end signing

FROST signing errors (internal tokens failing, e.g. proxy bearer or device codes):

- Check FROST state and health in 'hexdcall status'
- FROST key generation runs after ECDSA completes — if FROST shows Idle
but ECDSA is Active, FROST key generation has not triggered yet
- Internal tokens fall back to ECDSA while FROST initializes (no outage)
- Run 'hexdcall threshold test --trace' for detailed phase-level timing

Key generation not completing:

Key generation (DKG) is the process where nodes collaboratively create a
shared signing key. It requires all participating nodes to be reachable.
- Check 'cluster nodes' — all expected nodes must be online
- Key generation requires the inter-node encryption channel to be healthy
- Check for membership mismatches: all nodes must agree on the participant set
- Rolling restarts are handled gracefully — key generation is not re-triggered unnecessarily

Inter-node encryption issues:

Nodes encrypt all cluster communication using a forward-secret key exchange.
- Low key pool → automatic replenishment triggers (usually self-healing)
- Key exchange failures → check NATS JetStream connectivity ('cluster status')
- Signature verification failures → possible clock skew between nodes (check NTP)
- During degradation, non-critical operations are deferred and auto-retry on recovery
- Key rotation audit events: search 'logs tail --audit' for module=keyrotation
Events: initiated, deferred, commit_all, commit_quorum, retry, aborted, completed,
activated, abort_received, spk_completed, spk_failed
- "deferred" = SPK recency guard fired (normal when SPK and key rotation intervals match)
- "retry" = first PREPARE attempt failed, retried after bundle cache refresh
- "commit_quorum" = some nodes missed PREPARE, committed with partial ACKs

Interpreting 'hexdcall status' output:

ECDSA: Active/Healthy + FROST: Active/Healthy → optimal state (all signing paths available)
ECDSA: Active/Healthy + FROST: Idle → FROST key generation pending, internal tokens use ECDSA
ECDSA: Active/Degraded → at minimum quorum, lost fault tolerance margin — monitor closely
ECDSA: DKG → key generation in progress, signing not yet available
Inter-node encryption: Healthy → encrypted communication between nodes is nominal

Monitoring thresholds for CA certificate:

>90 days until expiry: HEALTHY (normal — renewal is automatic)
20-90 days: INFO (approaching renewal window — still automatic)
5-20 days: WARN (renewal should have happened — check logs)
<5 days: CRITICAL (rotation may have failed — investigate immediately)
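The thresholds above can be expressed as a simple classifier (the exact boundary inclusivity is an assumption):

```go
package main

import "fmt"

// caExpiryLevel maps days-until-expiry of the CA certificate to the
// monitoring levels in the table above.
func caExpiryLevel(days int) string {
	switch {
	case days > 90:
		return "HEALTHY" // renewal is automatic, nothing to do
	case days >= 20:
		return "INFO" // approaching renewal window, still automatic
	case days >= 5:
		return "WARN" // renewal should already have happened, check logs
	default:
		return "CRITICAL" // rotation may have failed, investigate immediately
	}
}

func main() {
	fmt.Println(caExpiryLevel(120), caExpiryLevel(45), caExpiryLevel(10), caExpiryLevel(3))
}
```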

Diagnostic commands:

'hexdcall status' - Signing health, key generation state, inter-node encryption
'hexdcall threshold test' - End-to-end ECDSA signing test
'cluster nodes' - List cluster nodes and reachability
'cluster status' - Overall cluster health including NATS connectivity
'health components' - All system components with health status

Relationships

Module dependencies and interactions:

  • OIDC provider: Consumes ECDSA threshold signer for JWT signing (ES256/384/512).
JWKS endpoint serves the threshold public key. When signing_algorithm changes,
new DKG runs and JWKS updates.
  • OIDC provider (internal tokens): Uses FROST signer for proxy bearer tokens and
device codes. Falls back to ECDSA if FROST is not yet ready.
  • ACME CA (threshold mode): When acme_ca_threshold=true, CA signing uses the
ECDSA threshold scheme. Quorum of nodes must cooperate to issue certificates.
  • Bastion: Device code authentication uses FROST-signed tokens (internal path).
  • Proxy: Bearer token minting uses FROST for low-latency signing.
  • X3DH: Forward secrecy for DKG messages and key rotation coordination.
Threshold signing uses a dedicated encrypted data plane, separate from X3DH.
  • NATS JetStream: Persistent storage for DKG state, key shares, and PreKey bundles.
  • Health monitor: Periodically computes signer health from peer reachability.
Auto-triggers FROST DKG when ECDSA is Active. Detects membership mismatches.

Configuration System

Centralized TOML configuration with hot-reload, env overrides, GitOps, and directory-based multi-file merge

Overview

The configuration system is responsible for loading, validating, and serving configuration to all HexonGateway components. It supports multiple config sources with a well-defined precedence order:

1. Default values (security-focused, applied automatically)
2. TOML literal values (single file, directory of files, or Git repository)
3. ${VAR} template substitution in TOML (arbitrary env var names, pre-parse)
4. HEXON_* auto-computed overrides (post-parse, highest priority)

Key capabilities:

  • Thread-safe access with atomic reads and mutex-protected writes
  • Hot-reload with SHA256 change detection, callback throttling (default 100ms window),
section caching, and delta change logging
  • Environment variable overrides for all fields including array items:
HEXON_SECTION_KEY for singletons, HEXON_SECTION_ARRAY_<NAME>_KEY for array items
Automatic type conversion (string, int, bool, comma-separated arrays)
  • ${VAR} template substitution: embed arbitrary env var names in TOML values,
expanded pre-parse. Operators choose their own naming convention.
  • GitOps: clone from Git repo (HTTPS or SSH), automatic polling with
cluster-aware leader-only execution, multi-TOML file merge
  • Directory-based config: pass a directory path, all *.toml files merged
recursively in alphabetical order (maps merge, arrays concatenate, scalars last-wins)
  • Self-documenting schema: struct tags (desc, hint, default, min, max, enum, format,
example, required, sensitive, rfc, depends) drive runtime documentation
  • Config diff history: ring buffer (default 10 entries) tracking per-key old/new values,
exposed via "config diff" admin CLI command
  • Invalid config handling: hash-based dedup prevents retry storms, logs every 5 minutes
  • File deletion handling: service continues with last valid config, ALERT logged,
status set to "file_missing" for health check visibility
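The directory-merge semantics listed above (maps merge recursively, arrays concatenate, scalars last-wins) can be sketched over generic decoded TOML values. This is a simplified illustration, not the actual merge code:

```go
package main

import "fmt"

// merge combines a later config value into an earlier one:
// maps merge recursively, arrays concatenate, scalars last-wins.
func merge(base, overlay any) any {
	bm, baseIsMap := base.(map[string]any)
	om, overlayIsMap := overlay.(map[string]any)
	if baseIsMap && overlayIsMap { // maps: merge key by key
		for k, v := range om {
			if existing, ok := bm[k]; ok {
				bm[k] = merge(existing, v)
			} else {
				bm[k] = v
			}
		}
		return bm
	}
	ba, baseIsArr := base.([]any)
	oa, overlayIsArr := overlay.([]any)
	if baseIsArr && overlayIsArr { // arrays: concatenate
		return append(ba, oa...)
	}
	return overlay // scalars (or type mismatch): later file wins
}

func main() {
	// 00-base.toml then 90-overrides.toml, after TOML decoding:
	base := map[string]any{
		"service":  map[string]any{"port": 443},
		"mappings": []any{"a"},
	}
	overlay := map[string]any{
		"service":  map[string]any{"port": 8443},
		"mappings": []any{"b"},
	}
	fmt.Println(merge(base, overlay)) // port last-wins, mappings concatenated
}
```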

Configuration is organized into domain-specific sections:

Service, Telemetry, Cluster, Operations, Protection, Authentication, Filesystem

The config package is imported by virtually every component in the system. It has no dependencies on other gateway modules (only standard library + go-toml/v2).

Config

Configuration is loaded from TOML files. Default path: /tmp/hexon.toml

[service]

hostname = "auth.example.com" # Public hostname (required)
port = 443 # HTTPS listen port (required)
public_port = 8443 # Public-facing port for URL generation (behind NAT/LB)
tls_cert = "/path/to/cert.pem" # TLS certificate (file path or inline PEM)
tls_key = "/path/to/key.pem" # TLS private key (file path or inline PEM)
read_timeout = 30 # HTTP read timeout in seconds (default: 30)
write_timeout = 30 # HTTP write timeout in seconds (default: 30)
idle_timeout = 120 # HTTP idle timeout in seconds (default: 120)
max_header_bytes = 65536 # Max header size in bytes (default: 65536)
http2_enable = true # Enable HTTP/2 (default: true)
handshake_timeout = 10 # TLS handshake timeout in seconds (default: 10)
block_malformed_tls = true # Reject invalid TLS (default: true)
mtls_mode = "none" # mTLS mode: "none", "optional", "mandatory"
x509_auto_auth = true # Auto-authenticate with client certificate (default: true)
hot_reload_enabled = true # Enable automatic file watching (default: true)
hot_reload_poll_interval = "1s" # File polling interval (default: 1s)
hot_reload_callback_throttle = "100ms" # Callback throttle window (default: 100ms)

[telemetry]

log_level = "info" # trace|debug|info|warn|error|fatal (default: info)
log_format = "json" # json|human (default: json)
output = "stdout" # stdout|otlp|both (default: stdout)
otlp_endpoint = "otel-collector:4317" # Required when output is otlp or both
log_buffer_size = 10000 # Ring buffer for log queries (default: 10000)

[cluster]

cluster_mode = true # Enable clustering (default: false)
cluster_peers = ["10.0.0.2", "10.0.0.3"] # Static peers (IPs or hostnames)
cluster_dns = "hexon.cluster.local" # OR DNS-based discovery (ignored when cluster_peers is set)
cluster_key = "32-char-secret" # Cluster key, exactly 32 chars (required)
cluster_refresh = "15s" # Peer refresh interval (default: 15s)
threshold_required = false # Fail-closed threshold signing after bootstrap grace (default: false)
threshold_bootstrap_grace = "2m" # Grace period for DKG completion (default: 2m)
threshold_nodes = 0 # Threshold t value: 0=auto (n/2), explicit integer for override

Environment variable overrides (three layers):

Precedence: HEXON_* override > ${VAR} expansion > TOML literal > defaults

HEXON_* auto-computed overrides (post-parse, highest priority):
- Singleton fields: HEXON_<SECTION>_<KEY>=value (e.g., HEXON_SERVICE_PORT=8443)
- Array item fields: HEXON_<SECTION>_<ARRAY>_<ITEMNAME>_<KEY>=value
  (e.g., HEXON_AUTHENTICATION_OIDC_CLIENTS_MYAPP_CLIENTSECRET=secret)
- Item names are sanitized: uppercased, non-alphanumeric → underscore, collapsed
- Only existing items (defined in TOML) can be overridden
- Use 'config describe <section>' to see the exact env var for each field

${VAR} template substitution (pre-parse, in TOML source):
- clientsecret = "${VAULT_OIDC_SECRET}" # Arbitrary env var names
- Pattern: ${VARNAME} — unset vars left as-is, no recursive expansion

Type conversion: string, int, bool (true/false/1/0/yes/no), arrays (comma-separated)

GitOps environment variables:

CONFIG_GIT_REPO # Repository URL (HTTPS or SSH, required for GitOps)
CONFIG_GIT_BRANCH # Branch name (required for GitOps)
CONFIG_GIT_PATH # Local clone path (default: /tmp/hexon-config)
CONFIG_GIT_POLLING # Enable remote polling (default: false)
CONFIG_GIT_POLLING_TIME # Polling interval (default: 5m, min: 30s)
CONFIG_GIT_USER / CONFIG_GIT_TOKEN # HTTPS authentication
CONFIG_GIT_SSH_KEY # SSH private key (inline PEM or file path)

Directory-based config:

Pass a directory path instead of file: --config /etc/hexon/conf.d/
All *.toml files merged recursively in alphabetical order.
Merge: maps merge recursively, arrays concatenate, scalars last-wins.
Use numeric prefixes for ordering: 00-base.toml, 90-overrides.toml.
World-writable files (chmod 0002) are rejected for security.

Config diff history:

config_diff_history_enabled = true # Enable/disable diff storage (default: true)
config_diff_history_size = 10 # Max entries retained, range 1-100 (default: 10)

Hot-reloadable: all config values via Get(). Application code must handle changes. Cold (restart required): listener bind address/port, TLS certificate paths at startup.
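The thread-safe access pattern (lock-free atomic reads, mutex-serialized writes, atomic swap on reload) is the standard atomic.Pointer idiom. A sketch with an invented Config type and validation rule:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Config struct {
	LogLevel string
}

var (
	current atomic.Pointer[Config] // readers load this lock-free
	writeMu sync.Mutex             // serializes reload/validation
)

// Get returns the active config snapshot. Callers must re-call Get()
// to observe hot-reloaded values; a held pointer never mutates.
func Get() *Config { return current.Load() }

// Reload validates the candidate and atomically swaps it in, so no
// reader ever sees a partially applied config.
func Reload(candidate *Config) error {
	writeMu.Lock()
	defer writeMu.Unlock()
	if candidate.LogLevel == "" {
		// Invalid config rejected: previous config stays active.
		return fmt.Errorf("invalid config: log_level required")
	}
	current.Store(candidate)
	return nil
}

func main() {
	current.Store(&Config{LogLevel: "info"})
	_ = Reload(&Config{LogLevel: "debug"})
	fmt.Println(Get().LogLevel)
}
```

The atomic swap is why application code "must handle changes": a pointer obtained from Get() is an immutable snapshot, and long-lived code has to re-read rather than cache it.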

Troubleshooting

Common symptoms and diagnostic steps:

Config file not loading at startup:

- TOML syntax error: check error message for line number, validate with 'config validate'
- Missing required fields: hostname, port, tls_cert, tls_key must be present
- Invalid CIDR notation: check proxy_cidr, ip_whitelist, ip_blacklist format
- World-writable file: chmod to remove 0002 permission from TOML files

Environment variable overrides not applying:

- Check naming: HEXON_<SECTION>_<KEY> in uppercase (e.g., HEXON_SERVICE_PORT)
- Dots become underscores: HEXON_VPN_NETWORK_SUBNET for [vpn.network] subnet
- Boolean values: accepts true/false, 1/0, yes/no (case-insensitive)
- Arrays: comma-separated (HEXON_SERVICE_PROXY_CIDR=10.0.0.0/8,172.16.0.0/12)
- Array items: item must exist in TOML first; env var uses sanitized name
(e.g., HEXON_AUTHENTICATION_OIDC_CLIENTS_MYAPP_CLIENTSECRET for client "MyApp")
- ${VAR} not expanding: variable must be set (os.LookupEnv), pattern must use
braces (${VAR} not $VAR), name must match [a-zA-Z_][a-zA-Z0-9_]*
- Use 'config describe <section>' to see the exact env var name for each field
- Check active overrides: 'config env' shows all HEXON_* variables in effect

Hot-reload not detecting changes:

- File hash unchanged: hot-reload uses SHA256, not mtime
- Throttle window: rapid changes coalesce within 100ms window
- Check status: 'config diff' for recent changes
- Callback timeout: callbacks exceeding 30s are logged at WARN
- hot_reload_enabled=false: file watching is disabled entirely

Config file deleted while running:

- Service continues with last valid config (graceful degradation)
- ALERT logged immediately, reminder every 5 minutes
- Status set to "file_missing" visible in 'health components'
- When file is restored, normal operation resumes automatically

GitOps config not syncing:

- Repository credentials: verify CONFIG_GIT_USER/TOKEN or CONFIG_GIT_SSH_KEY
- Polling disabled: CONFIG_GIT_POLLING must be "true" for automatic updates
- Cluster leader-only: in cluster mode, only the leader node polls Git
- Multi-file merge: check logs for "[CONFIG] Multi-file mode:" to verify

Directory config merge issues:

- File order: alphabetical by full path, use numeric prefixes (00-base, 90-overrides)
- Scalar override: later files win
- Array concatenation: proxy.mappings from multiple files combine (not override)
- Only *.toml files included, rename to .disabled or .bak to exclude

Threshold signing issues:

- threshold_required=true but tokens return 503 after bootstrap grace:
DKG did not complete in time. Check 'status summary' for threshold state,
'logs search threshold' for DKG errors. Ensure cluster_mode=true and ≥2 nodes.
- Threshold signing not activating: requires cluster_mode=true, ≥2 nodes,
X3DH healthy. Check 'health components' for x3dh status.
- Re-DKG not triggering after node join/leave: stale route timeout is 5 minutes,
then 10s stabilization. Wait ~5m10s after membership change.
- threshold_nodes: 0 = auto (floor(n/2)), explicit value sets t directly.
t+1 nodes must cooperate to sign. With t=1 and n=2, both nodes required.

Relationships

Module dependencies and interactions:

  • listener: Consumes service config for TLS settings, bind address, port, HTTP/2 parameters,
handshake timeout. Listener reads config via Get() on startup and handles hot-reload
for certificate rotation.
  • cluster: Config changes propagate to all nodes via cluster broadcast.
GitOps polling runs on the cluster leader only.
  • telemetry: Reads log_level, log_format, output, otlp_endpoint. log_buffer_size controls
ring buffer for admin CLI log queries.
  • protection: Rate limiting, PoW, IP whitelist/blacklist, IKEv2 IDS settings all loaded
from [protection] section. Hot-reloadable for threshold tuning without restart.
  • authentication: All auth backend configuration (LDAP, OIDC, SAML, TOTP, WebAuthn, x509)
loaded from [authentication] sub-sections.
  • Git config sync: Handles CONFIG_GIT_* env vars, repository cloning, SSH/HTTPS auth,
multi-file merge, and polling coordination.
  • Hot reload: Infrastructure module that manages file watching lifecycle, callback
registration, and reload orchestration.
  • proxy: Reverse proxy mappings, load balancer, circuit breaker settings from [proxy] section.

  • threshold signing: [cluster] threshold_required, threshold_bootstrap_grace, threshold_nodes
control the threshold signing subsystem (GG18 ECDSA / FROST Ed25519). The algorithm is
driven by [authentication.oidc] signing_algorithm. Config is cold (restart required);
the threshold signing subsystem consumes these values at startup.
  • admin CLI: 'config show', 'config describe', 'config example', 'config set', 'config diff',
'config validate', 'config env' commands for operational visibility and management.
  • schema: Self-documenting system driven by struct tags. Schema extraction produces
field metadata, description formatting, TOML example generation, and auto-computed
env var names for operator-facing output. Each field shows its HEXON_* env var
in 'config describe'. The config guide MCP resource is generated from this schema data.

Git Configuration Management

Cluster-coordinated git-based configuration synchronization with leader polling and broadcast reload

Overview

The gitconfig module enables cluster-wide configuration synchronization from a git repository. It implements leader-only polling with broadcast-based reload notification, ensuring all cluster nodes maintain consistent configuration.

Core capabilities:

  • Leader-only git repository polling (prevents duplicate change detection)
  • Cluster-wide reload to all members on change detection
  • Hard reset to remote HEAD for deterministic config state
  • Commit tracking with hash, author, message, and timestamp
  • Quorum wait for cluster-wide consistency confirmation
  • Integration with config hot-reload pipeline for seamless updates

Cluster synchronization flow:

1. Leader node polls git repository at configured interval
2. When changes detected, leader pulls and applies config locally
3. Leader notifies all cluster members to pull the latest config
4. Each member pulls latest git config and triggers hot-reload
5. Quorum wait ensures cluster-wide consistency

The module provides GitOps-style configuration management where infrastructure teams push configuration changes to a git repository, and the cluster automatically picks up and applies those changes. This enables:

  • Version-controlled configuration with full audit trail
  • Pull request review workflows for config changes
  • Rollback capability via git revert
  • Branch-based staging/production config separation

Leadership determines which node polls the repository:

- Only the cluster leader runs the git polling loop
- If leadership changes, the new leader automatically starts polling
- In standalone mode, the single node polls directly

Config

Git configuration is managed under [config] section in hexon.toml:

[config]

# Git repository settings
git_enabled = true # Enable git-based config management
git_repo = "/etc/hexon/config.git" # Local path to git repository
git_remote = "origin" # Git remote name (default: origin)
git_branch = "main" # Branch to track (default: main)
git_poll_interval = "30s" # Polling interval (default: 30s)
# Authentication
git_ssh_key = "/etc/hexon/deploy.key" # SSH key for git authentication
git_username = "" # Username for HTTPS auth (optional)
git_password = "" # Password for HTTPS auth (optional)
# Directory-based config
config_dir = "/etc/hexon/config.d" # Directory for split config files
merge_strategy = "deep" # How to merge directory configs

The git repository should contain the hexon.toml (or split config files) at the repository root. The module performs a hard reset to the remote branch HEAD on each pull, ensuring deterministic state regardless of local modifications.

Polling behavior:

- Only the cluster leader polls the git repository
- Poll interval determines change detection latency
- SHA comparison detects changes (not file timestamps)
- On detection, local reload happens first, then broadcast

Hot-reloadable: git_poll_interval. Cold (restart required): git_enabled, git_repo,
git_remote, git_branch, git_ssh_key, git_username, git_password.

Architecture

Operational model and design:

Pull operation details:

Each successful pull reports: commit hash, commit author, commit message,
and pull timestamp. These are visible in structured logs and health status
for auditing which config version is active on each node.

Operational model:

The module is passive on member nodes: it responds to cluster-wide pull
notifications by performing a local git pull and triggering config reload.
The active polling runs only on the leader node, which detects changes and
initiates the cluster-wide pull.

Leader election dependency:

The module relies on the cluster's leader election mechanism. Only the
elected leader runs the git polling loop. If leadership changes, the new
leader automatically starts polling. This prevents duplicate pulls and
conflicting notification storms.

Troubleshooting

Common symptoms and diagnostic steps:

Config changes in git not being applied:

- Verify git_enabled = true in [config] section
- Check if this node is the leader: cluster status shows leader node
- Verify git remote is accessible: net tcp <git_host:port>
- Check git_poll_interval (default 30s) - changes may be within latency
- Look for git pull errors in logs: logs search "gitconfig" --level=error
- Verify branch name matches: git_branch must match remote branch

Authentication failures (git pull fails):

- SSH: verify git_ssh_key path exists and has correct permissions (0600)
- SSH: check host key is in known_hosts for the git server
- HTTPS: verify git_username and git_password are correct
- HTTPS: check if token has expired (for token-based auth)
- Look for auth errors: logs search "git" --level=error

Cluster members out of sync:

- Check cluster health: cluster status shows all nodes
- Verify pull delivery: logs search "gitconfig" on member nodes
- Member pull failure is local only - check individual node logs
- Force sync: trigger a manual git push (any change) to cause re-poll
- Check quorum: if quorum lost, broadcast may not reach all members

Config validation failure after pull:

- Invalid TOML in repository causes reload failure
- Leader reload failure prevents broadcast (protects cluster)
- Member reload failure logged locally, does not affect other nodes
- Check: config validate to verify current config
- Check git log for the problematic commit

Hard reset behavior:

- The module performs git reset --hard to remote HEAD
- Local modifications to the config file are overwritten
- This is intentional: git is the source of truth
- If local changes are needed, commit them to the repository

Standalone mode (no cluster):

- Git polling runs on the single node directly
- No broadcast occurs (no cluster to notify)
- Config reload happens locally after pull
- Suitable for development and single-node deployments

Relationships

Module dependencies and interactions:

  • config: Primary integration point. The config system performs the actual
git fetch and hard reset. Config hot-reload pipeline processes the updated
TOML after pull.
  • cluster: Leader election determines which node runs the git polling loop.
Cluster-wide notification delivers the pull signal to all members.
Quorum wait (optional) ensures cluster-wide consistency.
  • Hot reload: Complementary module — gitconfig handles git-based config
changes while hot reload handles file-based config changes. Both trigger
the same cluster-wide reload pipeline.
  • telemetry: Structured logging for pull operations with commit hash, author,
and success/failure status. Metrics for pull frequency and latency.

Hot Reload

Cluster-coordinated configuration hot-reload with leader file watching and broadcast notification

Overview

The hotreload module provides cluster-coordinated configuration reloading for HexonGateway. When the leader node detects config file changes via file watching, it reloads locally and broadcasts a ReloadConfig operation to all cluster members.

Core capabilities:

  • Leader-only file watching (prevents duplicate change detection across cluster)
  • SHA256 hash comparison for reliable change detection (1-second poll interval)
  • Cluster-wide reload notification to all members after leader detects changes
  • Graceful degradation to standalone mode (single node, no coordination)
  • Atomic config swap with validation before apply
  • Independent node recovery (each node can recover on next poll or restart)

Cluster reload flow:

1. Leader's file watcher polls config file every 1 second
2. SHA256 hash computed and compared to previous hash
3. On change: leader re-reads config, validates, applies defaults
4. Atomic config swap on leader node
5. On success: leader notifies all cluster members to reload
6. Each member independently re-reads file, validates, and swaps config
7. Notification is best-effort (local success is sufficient)

Standalone mode:

When running as a single node or when cluster coordination is not initialized,
every node watches and reloads independently. No broadcast occurs. This mode
provides backward compatibility for development environments, single-node
deployments, and testing scenarios.

Error handling philosophy (best effort):

- Leader reload success: always broadcast to cluster
- Leader reload failure: do NOT propagate (protect cluster from bad config)
- Member reload failure: logged locally, does not affect other nodes
- Cluster propagation failure: logged, local reload already succeeded

Config

Hot reload is an infrastructure module that watches the main config file. Its behavior is controlled by the overall config system rather than a dedicated config section.

The file watcher monitors the main hexon.toml config file path. The poll interval is fixed at 1 second for responsive change detection without excessive I/O overhead.

Key behaviors:

- File watcher only runs on the cluster leader node
- SHA256 hash comparison avoids false-positive reloads from timestamp changes
- Config validation occurs before applying changes (fail-safe)
- Invalid config is rejected; previous config remains active
- Atomic swap ensures no partial config state is visible to readers

Leadership determines which node watches the config file:

- Only the cluster leader runs the file watcher
- If leadership changes, the new leader automatically starts watching
- In standalone mode, every node watches independently

The config system also exposes reload status and metrics for health checks and monitoring.

Hot-reloadable fields vary by module. Each module documents which of its config fields support hot-reload vs. requiring a restart. The hotreload module itself has no user-configurable settings.

Architecture

Operational model and design:

Config version tracking:

Each successful reload increments a version counter. This allows health
checks and monitoring tools to detect whether a node is on the latest
config by comparing version numbers. The version, reload timestamp, and
any error message are exposed via health status.

File watching approach:

The file watcher uses a polling approach (not inotify/kqueue) for maximum
portability across Linux, macOS, and container environments. The 1-second
poll interval provides a good balance between responsiveness and overhead.
SHA256 hashing is more reliable than mtime/ctime comparison, which can
produce false positives with NFS or container volume mounts.

Separation from gitconfig:

hotreload handles direct file modifications (edit, cp, mount update).
gitconfig handles git repository-based changes (git pull, merge).
Both trigger the same config reload pipeline but through different
detection mechanisms. They complement each other:
- Use gitconfig for GitOps workflows with version control
- Use hotreload for direct file modifications or mounted config maps

Troubleshooting

Common symptoms and diagnostic steps:

Config changes not being picked up:

- Verify this node is the cluster leader: cluster status
- In standalone mode, every node watches independently
- Check if file was actually modified: SHA256 hash must change
- Editing in place (vi, nano) changes hash; truncate+write may race
- NFS/mount delays: file may not be visible for up to 1 second
- Check logs for reload attempts: logs search "reload" --level=info

Reload fails with validation error:

- Invalid TOML syntax: config validate to check current file
- Missing required fields after edit
- Leader detects failure and does NOT broadcast to cluster
- Fix the config file; watcher will detect next change automatically
- Check error details: logs search "reload" --level=error

Cluster members not reloading:

- Check cluster connectivity: cluster status and health status
- Verify reload delivery: logs search "reload" on member nodes
- Member failure is independent: check individual node logs
- Network partition: members reload on next local file change or restart
- Quorum issues: cluster-wide reload requires quorum for delivery

Reload succeeded but feature not updated:

- Not all config fields are hot-reloadable
- Check module documentation for which fields require restart
- Cold fields (e.g., listen ports, TLS certs) need full restart
- Verify config version incremented: health status shows config version

Standalone mode issues:

- No broadcast occurs in standalone mode (expected behavior)
- Each node watches independently when cluster is not initialized
- Verify the config file path is correct and accessible
- File permissions: process must have read access to config file

File watcher consuming resources:

- SHA256 hashing cost is negligible, even for very large config files
- 1-second poll interval is fixed and not configurable
- If concerned, monitor CPU via metrics prometheus "cpu"

Relationships

Module dependencies and interactions:

  • config: Primary integration point. The config system owns file reading,
TOML parsing, validation, and atomic swap logic. Reload status and metrics
are exposed for health checks.
  • cluster: Leader election determines which node runs the file watcher.
Cluster-wide notification delivers the reload signal to all members.
Leadership changes automatically transfer the file watching responsibility.
  • GitOps config: Complementary module for git-based config changes.
Both modules trigger the same config reload pipeline. gitconfig is for
GitOps workflows; hotreload is for direct file modifications.
  • All modules with hot-reloadable config: When reload occurs, each module
receives updated config via their registered reload callbacks. Modules
include firewall (ACL rules), proxy (mappings), ratelimit (limits),
forwardproxy (rate/bandwidth), and many others.
  • telemetry: Structured logging for reload events with success/failure status,
config version, and timing. Metrics for reload frequency and duration.

Module Data Storage

Per-user module-specific data storage backed by Hexon KV (NATS JetStream)

Overview

The moduledata module provides a unified interface for storing module-specific user data such as WebAuthn/Passkey credentials, VPN PSK keys, user preferences, and other per-user settings. It acts as a facade with input validation, metrics, and cache refresh over Hexon KV (NATS JetStream) storage.

Core capabilities:

  • Hexon KV (NATS JetStream) storage with automatic cluster replication
  • Per-user, per-module namespace isolation (e.g., “vpn”, “webauthn”, “x509”)
  • Reserved “preferences” namespace for cross-module user settings (language, etc.)
  • Automatic language preference storage when Language field is set on SetRequest
  • Directory cache refresh broadcast after Set and Delete operations
  • Input validation at facade and storage levels
  • Base64url key encoding for NATS KV compatibility (handles @, :, spaces)

Operations: Get, Set, Delete, check existence, get all data for a user, and bulk load.

Key format uses base64url-encoded usernames for storage compatibility.

Config

Configuration for moduledata storage:

Hexon KV Requirements:

[cluster]
cluster_path = "/var/lib/hexon" # Required for NATS JetStream persistence
- NATS JetStream must be available (cluster mode)
- Data automatically replicated across cluster nodes
- LoadAll returns all stored data (efficient for bootstrap)

Input Validation Rules:

Username:
- Cannot be empty
- Maximum 200 characters (before base64url encoding)
- Any characters allowed (gets base64url encoded)
Module Name:
- Cannot be empty
- Maximum 64 characters
- Pattern: [a-zA-Z0-9][a-zA-Z0-9\-_]* (no dots or colons)
- Examples: "vpn", "webauthn", "ssh_keys", "user-preferences"
Combined key maximum: 256 characters after encoding
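The rules above translate directly into a validation routine; this is a sketch of the documented constraints, not the module's actual code:

```go
package main

import (
	"errors"
	"regexp"
)

// moduleNameRe mirrors the documented pattern: alphanumeric first
// character, then alphanumerics, hyphens, and underscores only.
var moduleNameRe = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9\-_]*$`)

var (
	errEmptyUsername   = errors.New("username cannot be empty")
	errUsernameTooLong = errors.New("username exceeds 200 characters")
	errBadModuleName   = errors.New("invalid module name")
)

// validate applies the documented rules: non-empty username up to 200
// characters (before base64url encoding), module name up to 64
// characters with no dots or colons.
func validate(username, module string) error {
	if username == "" {
		return errEmptyUsername
	}
	if len(username) > 200 {
		return errUsernameTooLong
	}
	if module == "" || len(module) > 64 || !moduleNameRe.MatchString(module) {
		return errBadModuleName
	}
	return nil
}
```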

Reserved Namespaces:

- "preferences": User-wide settings (language, notification preferences)

Troubleshooting

Common symptoms and diagnostic steps:

“Backend unavailable” error (ErrBackendUnavailable):

- Check cluster_path exists and NATS JetStream is running
- Check cluster status for NATS availability

“Invalid username” or “Invalid module name” errors:

- Username must be non-empty and under 200 characters
- Module name must match [a-zA-Z0-9][a-zA-Z0-9\-_]* pattern
- Module name must be under 64 characters
- No dots or colons allowed in module name (NATS KV restriction)

Data not appearing across cluster nodes:

- Verify NATS JetStream cluster health
- Check if directory cache refresh broadcast is working
- Run 'moduledata inspect <username>' to check data on local node

Language preference not being stored:

- Language is stored asynchronously (fire-and-forget) in "preferences" namespace
- Check if Set operation for the primary module succeeded first
- Verify language code is a valid string (e.g., "en", "es", "fr", "zh")
- Query preferences directly: Get with ModuleName="preferences"

Encoding/decoding errors:

- ErrEncodingFailed: data contains types that cannot be JSON-serialized
- ErrDecodingFailed: stored data is corrupted or not valid JSON
- Check NATS KV key format (base64url encoding)
- Verify data values are JSON-compatible (maps, strings, numbers, bools)

Performance and metrics:

- moduledata_operations_total: counter by operation type and status
- moduledata_operation_duration_seconds: latency histogram
- High latency: check NATS JetStream performance

Security

Security considerations for module data storage:

User enumeration prevention:

HTTP handlers should return generic error messages to clients (e.g.,
"Invalid credentials" instead of "User not found"). Detailed errors are
logged internally with traceID for debugging.

Input validation (defense in depth):

All inputs validated at facade and storage levels.
Username length limit (200 chars) prevents DoS via oversized inputs.
Module name character restrictions prevent injection attacks in NATS KV keys.
Base64url encoding of usernames prevents NATS KV key injection.

Data isolation:

Each module's data is stored under its own namespace key.
Modules cannot accidentally overwrite another module's data.
The "preferences" namespace is reserved for cross-module user settings.

Thread safety:

All state managed by NATS JetStream.
Concurrent operations are safe and independent.

Cache consistency:

After Set and Delete operations, a directory cache refresh is
replicated cluster-wide to keep all node caches consistent.
This is fire-and-forget; transient broadcast failures are non-fatal.

Relationships

Module dependencies and interactions:

  • Directory: Provides user existence validation and group lookups.
After Set/Delete, moduledata broadcasts RefreshUserCache to directory
for cluster-wide cache consistency.
  • WebAuthn: Stores passkey credentials per user in
"webauthn" namespace. Uses Get/Set for credential CRUD operations.
  • X.509: Stores X.509 certificate data per user.
  • vpn: Stores VPN PSK keys and enrollment data in “vpn” namespace.
  • signin: Stores sign-in flow state and user preferences.
Uses Language field on Set to automatically store user language preference.
  • UI templates: Language preference from “preferences” namespace used for
localized email rendering and UI template selection.
  • smtp: Looks up user language preference from “preferences” namespace
for localized email delivery (OTP, cert renewal, passkey expiration).
  • cluster: Requires NATS JetStream (cluster_path configured).
Data automatically replicated across cluster nodes.
  • telemetry: Metrics exported for operation counts and latency histograms.
Structured logging with operation type, username (redacted), and traceID.

Distributed Sessions

Cross-protocol session management with distributed KV store, dual-key indexing, and TTL expiration

Overview

The sessions module provides cluster-aware, cross-protocol session management with dual-key indexing for efficient lookups. It supports:

  • Unique session IDs (256-bit crypto/rand, base64url-encoded) or custom IDs (e.g., SHA256 hash)
  • Dual-key indexing: primary by session ID, secondary by type+module_key
  • Automatic TTL expiration managed by distributed memory storage
  • Saga-based atomic session+index creation with rollback on failure
  • Pluggable extend validators (e.g., X.509 certificate revocation checks)
  • Pluggable create callbacks (e.g., VPN device code generation)
  • Pluggable delete callbacks (e.g., VPN session termination on authz timeout)
  • Session ID regeneration for session fixation protection
  • Lazy index cleanup on GetByModuleKey (handles missed OnDelete callbacks)
  • Thread-safe callback/validator registration (RWMutex)
  • Metrics: sessions_created, validations_success, validations_failed,
sessions_extended, sessions_revoked, sessions_bulk_revoked, sessions_regenerated

Available operations:

Create - Create session with atomic dual-key indexing
Validate - Validate session, update LastActivity (does NOT extend TTL)
Extend - Extend TTL (runs validators first, caps to cert_not_after for X.509)
Revoke - Delete single session (index cleaned automatically)
RevokeAll - Delete all sessions for a type+module_key
List - List all sessions of a given type (filters expired)
GetByModuleKey - Reverse lookup by type+module_key with lazy cleanup
RegenerateID - New ID with same data (session fixation protection)

Session types in use:

user - Authenticated user sessions (web login, OIDC callback, X.509 auto-auth)
ikev2 - IKEv2 VPN sessions (triggers device code callback)
bastion - SSH bastion connection tracking
cobrowse - Proxy co-browse viewer sessions
password_expired - Temporary session for password change flow (short TTL)
mfa_pending - MFA verification pending (short TTL)
flow_pending - Signup/enrollment flow pending
jit2fa_pending - JIT 2FA OTP verification pending
jit2fa_auth - JIT 2FA authenticated session
pow - Proof-of-Work challenge session
saml_test_client - SAML test client session
bearer_cache - JWT Bearer token verification cache (custom ID = SHA256 of token)

Memory usage: ~600 bytes per active session (500 bytes session + 100 bytes index entry). For 1 million active sessions: ~600 MB cluster-wide.

Config

Sessions have no dedicated [sessions] config section. TTL and cookie settings are controlled by the calling module via [service] and per-feature config:

[service]

cookie_name = "hexon" # Default session cookie name (default: "hexon")
cookie_domain = ".example.com" # Cookie domain for cross-subdomain sharing (default: current hostname only)
cookie_ttl = "12h" # Default session cookie TTL (default: "12h")
session_ttl = "24h" # Authenticated user session TTL (default: "24h")
session_password_expired = "15m" # Password expired session TTL (default: "15m")
session_mfa_pending = "5m" # MFA pending session TTL (default: "5m")
max_concurrent_sessions = 1 # Max concurrent sessions per user (default: 1, 0=unlimited)

[authentication.saml]

session_ttl = "8h" # SAML session TTL for Single Logout tracking (default: "8h")

[jit2fa]

cookie_name = "jit2fa_key" # Cookie name for JIT 2FA sessions (default: "jit2fa_key")
session_ttl = "8h" # JIT 2FA authenticated session TTL (default: "8h")

[forward_proxy]

session_cookie = "hexon_session" # Forward proxy session cookie name (default: "hexon_session")

[protection]

pow_cookie_name = "hexon_pow" # PoW session cookie (default: "hexon_pow", MUST differ from session cookie)

Recommended TTL values by session type:

Interactive web sessions (user): 12-24 hours
API tokens: 30-90 days
OAuth state: 5-10 minutes
MFA pending (mfa_pending): 5 minutes
Password expired (password_expired): 15 minutes
PoW/temporary tokens: 1-5 minutes
JIT 2FA (jit2fa_auth): 8 hours
VPN (ikev2): Caller-determined (IKEv2 manager)
Bastion: Caller-determined (bastion manager)
Bearer cache (bearer_cache): 5 minutes (default, configurable via [proxy].bearer_cache_ttl)

TTL behavior:

- Validate does NOT extend TTL but persists LastActivity when stale > sessionTTL/10 (clamped 1m–5m), fire-and-forget
- Extend explicitly sets new TTL from current time, requires cluster broadcast
- X.509 sessions: TTL capped to cert_not_after on both Create and Extend
- Minimum effective storage TTL is 1 minute (enforced as floor)
- Expired sessions are filtered out by List and GetByModuleKey
- Storage-level TTL expiry triggers OnDelete callback for automatic index cleanup
- TTLCapped field in Create/Extend responses indicates certificate-based capping

Troubleshooting

Common symptoms and diagnostic commands:

Session not persisting across requests:

- Cookie domain mismatch: verify [service].cookie_domain includes all subdomains
- Secure flag on non-HTTPS: cookies with Secure=true require HTTPS transport
- SameSite=Strict blocking cross-origin: check if auth redirect crosses domains
- Cookie name conflict: ensure cookie_name differs from pow_cookie_name and jit2fa cookie
- max_concurrent_sessions exceeded: new session may evict previous one
- Check: 'sessions list --user=<username>' to verify session exists in storage

Cross-node session loss (works on one node, fails on another):

- JetStream KV replication lag: check cluster quorum status with 'status'
- Saga partial failure: session created but index missing, or vice versa
- Network partition: quorum requirement (>50% nodes) prevents writes during partition
- Validate is local-only: session must be replicated to the validating node
- Check: 'sessions show <session_id>' from multiple nodes to compare
- Check: 'status' for cluster health and node connectivity

VPN session not created:

- Cluster quorum not met: Create requires >50% of cluster nodes to confirm
- Create callback (vpn_device_code) panicked: check logs for panic recovery messages
- Check: 'sessions list --type=ikev2' to see active VPN sessions
- Check: 'logs --module=sessions --level=error' for creation failures

Premature session expiration:

- TTL too short: check [service].session_ttl (default 24h) or caller-specific TTL
- Clock skew between nodes: ensure NTP is running (chrony or systemd-timesyncd)
- X.509 TTL capping: session capped to cert_not_after, verify certificate validity
- TTLCapped=true in response indicates certificate-based cap was applied
- Check: 'sessions show <session_id>' to compare ExpiresAt vs current time

Session extend rejected:

- Extend validator rejecting: check 'logs --module=sessions --level=warn'
- x509_revocation validator: certificate revoked (check OCSP/serial index)
- Certificate already expired: X.509 sessions cannot extend past cert_not_after
- Session not found: already expired or revoked before extend attempt
- Check: 'logs --module=sessions --keyword=validator' for rejection details

Stale sessions appearing in index (ghost sessions):

- OnDelete callback failed during network partition or node crash
- GetByModuleKey performs lazy cleanup: stale entries removed on next lookup
- Manual cleanup: 'sessions revoke <session_id>' for individual sessions
- Bulk cleanup: 'sessions revoke-user <username>' to clear all user sessions

Session fixation concerns:

- RegenerateID should be called after authentication or privilege escalation
- RegenerateID atomically creates new ID with same data, revokes old session
- Uses Saga: new session stored, index updated, old session deleted (with compensation)
- Check: 'logs --module=sessions --keyword=regenerated' for regeneration events

Diagnostic commands:

sessions list - List first 20 sessions (all types)
sessions list --type=ikev2 - List VPN sessions only
sessions list --type=user - List authenticated user sessions
sessions list --user=alice - List sessions for specific user
sessions list --offset=20 - Paginate to next page
sessions list --limit=50 - Show 50 sessions per page
sessions show <session_id> - Show full session details with metadata
sessions revoke <session_id> - Revoke a single session
sessions revoke-user <username> - Revoke all sessions for a user
diagnose user <username> - Full access diagnostic including session info
logs --module=sessions - Session operation logs
status - Cluster health (affects quorum operations)

Architecture

Dual-key storage strategy:

Primary key: sessions/{uuid} -> Session object
Secondary key: sessions_index/{type}/{module_key} -> SessionIndex (list of session IDs)
Uses '/' separator because NATS KV disallows ':' in key names.

Session lifecycle:

1. Create: custom ID or crypto/rand 32-byte ID (base64url) -> Saga(store session + update index)
-> OnDelete callback registered for automatic index cleanup
-> Create callbacks fired post-commit (e.g., vpn_device_code for IKEv2)
-> Replicated to cluster with quorum requirement (>50% nodes)
2. Validate: Local read from memorystorage -> update LastActivity (local + throttled persist)
-> Persists to storage when stale > sessionTTL/10 (clamped 1m–5m), fire-and-forget
-> Does NOT extend TTL (explicit Extend call required for renewal)
3. Extend: Load session -> run all registered validators in sequence
-> Cap to cert_not_after for X.509 -> broadcast with quorum, OnDelete preserved
4. Revoke: Replicated delete to all nodes -> callback fires -> index cleaned
5. RevokeAll: Load index -> delete each session fire-and-forget -> delete index itself
6. RegenerateID: Saga(store new session + update index + delete old session)
-> Preserves original CreatedAt timestamp, copies all metadata
-> Compensation: rollback new session if old session deletion fails

Saga operations (atomic multi-step with rollback):

- Create: Step 1 store session (compensate: delete), Step 2 update index
- RegenerateID: Step 1 store new (compensate: delete), Step 2 add to index,
Step 3 delete old (compensate: restore old session with TTL and OnDelete callback)
- Saga commit marks success; saga finalization defers cleanup/rollback
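The compensation pattern above can be sketched generically (a minimal illustration; the `step`/`runSaga` names are hypothetical, not the module's API):

```go
package main

// step is one saga step: a forward action plus a compensation that
// undoes it if a later step fails.
type step struct {
	name       string
	action     func() error
	compensate func()
}

// runSaga executes steps in order. On the first failure it runs the
// compensations of all previously committed steps in reverse order
// and returns the error, leaving no partial state behind.
func runSaga(steps []step) error {
	var done []step
	for _, s := range steps {
		if err := s.action(); err != nil {
			for i := len(done) - 1; i >= 0; i-- {
				done[i].compensate()
			}
			return err
		}
		done = append(done, s)
	}
	return nil
}
```

For Create, step 1 would be "store session" (compensate: delete it) and step 2 "update index"; if the index write fails, the stored session is rolled back so the dual-key invariant holds.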

Index consistency model:

- Automatic cleanup: OnDelete callback removes session_id on TTL expiry or manual delete
- Lazy cleanup: GetByModuleKey validates each session in index, removes stale entries
- Saga atomicity: Create and RegenerateID use compensating transactions
- Delete callbacks execute even if index removal fails (resource cleanup not blocked)

Cluster behavior:

Sessions (Create/Extend): Replicated with quorum (>50% nodes must confirm)
Indices (Create/RegenerateID): Replicated with quorum (consistency required)
Validate: Local read + throttled fire-and-forget broadcast when LastActivity stale (sessionTTL/10, clamped 1m–5m)
Revoke/RevokeAll: Replicated to all nodes (eventual consistency acceptable)
OnDelete callbacks: Local execution per node, fire-and-forget, independent of cluster

Callback and validator architecture:

ExtendValidator: called BEFORE extend, CAN reject (returning error rejects extension)
Built-in: x509_revocation (checks cert revocation via OCSP cache and serial index)
For internal certs: checks serial index and moduledata
For external certs: checks OCSP cache and responder (soft-fails on infra errors)
CreateCallback: called AFTER successful create, fire-and-forget with panic recovery
Built-in: vpn_device_code (generates RFC 8628 device codes for IKEv2 sessions)
DeleteCallback: called AFTER delete and index cleanup, fire-and-forget with panic recovery
Built-in: vpn_authz_timeout (terminates VPN tunnel when auth session expires)
Registration: thread-safe via RWMutex, map copied under read lock before execution
Execution: sequential, each callback wrapped in defer/recover, panics logged not propagated
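The copy-under-read-lock and panic-recovery behavior can be sketched like this (illustrative only; type and method names are assumptions):

```go
package main

import (
	"log"
	"sync"
)

type DeleteCallback func(sessionID string)

type registry struct {
	mu  sync.RWMutex
	cbs map[string]DeleteCallback
}

// Register is safe for concurrent use with fire.
func (r *registry) Register(name string, cb DeleteCallback) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.cbs == nil {
		r.cbs = map[string]DeleteCallback{}
	}
	r.cbs[name] = cb
}

// fire snapshots the map under a read lock, then runs callbacks
// sequentially outside the lock. A panic in one callback is recovered
// and logged; remaining callbacks still run.
func (r *registry) fire(sessionID string) {
	r.mu.RLock()
	snapshot := make(map[string]DeleteCallback, len(r.cbs))
	for k, v := range r.cbs {
		snapshot[k] = v
	}
	r.mu.RUnlock()
	for name, cb := range snapshot {
		func() {
			defer func() {
				if p := recover(); p != nil {
					log.Printf("callback %s panicked: %v", name, p)
				}
			}()
			cb(sessionID)
		}()
	}
}
```

Copying the map before iterating means a callback may itself call Register without deadlocking, and a misbehaving callback cannot block or crash the delete path.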

Performance:

Direct lookup (Validate): O(1) by session ID, local read only
Reverse lookup (GetByModuleKey): O(1) index lookup + O(n) session loads
List all of type: O(N) scan of all sessions in storage, filtered by type
Typical sessions per user: 1-5 (bounded by max_concurrent_sessions)
Session object: ~500 bytes with metadata, index entry: ~100 bytes per reference

Security:

Session IDs: 256-bit crypto/rand, base64url (RawURLEncoding), no padding
Collision probability: ~2^-197 for 1 billion sessions (birthday bound)
X.509 TTL capping: cert_not_after metadata enforced on Create and Extend
Revocation: instant via Revoke/RevokeAll (stateful, no blacklist needed)
Session fixation: RegenerateID for post-authentication ID rotation
Metadata privacy: plaintext module_keys for lookup (hash sensitive identifiers)
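As a back-of-envelope check (not taken from the implementation), the birthday bound for 256-bit random IDs gives:

```latex
P_{\text{collision}} \approx \frac{n(n-1)}{2 \cdot 2^{256}}
\approx \frac{(2^{30})^{2}}{2^{257}} = 2^{-197}
\qquad \text{for } n = 10^{9} \text{ sessions}
```

i.e., even at a billion concurrent sessions, an accidental ID collision is cryptographically negligible.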

Type registration:

All request/response types registered for cluster RPC serialization during init.

Interpreting tool output:

'sessions list':
Normal: Active sessions show User, Type, IP, Age — all expected
Stale: Sessions with Age > max_session_duration — cleanup may be delayed (runs every 5m)
Types: "authenticated" (normal), "mfa_pending" (waiting for MFA, 5min TTL), "password_expired"
High count: Many sessions for one user → check max_concurrent_sessions setting
Action: Suspicious session → 'sessions show <id>' for details, 'sessions revoke <id>' to terminate
'sessions list --user=<username>':
Empty: User has no active sessions — they are not logged in anywhere
Multiple types: "authenticated" + "mfa_pending" = user may be stuck in MFA flow
Action: Clear stuck MFA → 'sessions revoke-user <username>' (terminates ALL sessions)

Relationships

Module dependencies and interactions:

  • Distributed memory cache: KV store backend. Sessions stored in “sessions” cache type,
indices in "sessions_index" cache type. Provides TTL expiration and OnDelete callbacks.
All session CRUD operations delegate to the distributed cache.
  • proxy: Creates “user” sessions during OIDC SSO callback. Validates
sessions on every proxied request for authentication enforcement. Session group monitor
refreshes group membership and revokes sessions on group changes. Creates "cobrowse"
sessions for co-browse viewer tracking. Creates "bearer_cache" sessions to cache JWT
ID token verifications (SHA256 of token as custom session ID, configurable TTL).
  • signin: Creates “user” sessions after successful authentication, “password_expired”
sessions for password change flow, "mfa_pending" sessions for MFA verification.
  • signup: Creates “flow_pending” sessions during enrollment, “mfa_pending” during
TOTP/passkey setup, "user" sessions after completed registration.
  • vpn (IKEv2): Creates “ikev2” sessions for VPN tunnel tracking. Registers
vpn_device_code create callback for RFC 8628 device code generation. Registers
vpn_authz_timeout delete callback for VPN session termination when auth expires.
  • bastion: Creates “bastion” sessions for SSH connection tracking. Session metadata
includes connection details for audit trail and session sharing features.
  • authentication.x509: Registers x509_revocation extend validator. Checks certificate
revocation status before allowing session extension. Sets cert_not_after metadata
for TTL capping on both Create and Extend operations.
  • authentication.jit2fa: Creates “jit2fa_pending” and “jit2fa_auth” sessions with
separate cookie (jit2fa_key) and configurable TTL (default 8h).
  • passwordchange: Validates “user” and “password_expired” session types. Creates new
"user" session after successful password change. Triggers revocation of old sessions.
  • pow: Creates “pow” sessions after successful proof-of-work challenge. Uses separate
cookie (hexon_pow) to avoid conflicts with main session cookie.
  • profile: Creates “user” sessions during profile management operations.

  • authentication.saml: Uses “saml_test_client” session type for test client.
Configurable session_ttl for Single Logout tracking (default 8h).
  • Directory: Group membership changes can trigger session revocation via
proxy session monitor. Provides fresh group lookups for per-request authorization.
  • middleware (handlers): Creates “user” sessions during X.509 auto-authentication
in the middleware chain when client certificate is present.
  • telemetry: All operations log with structured entries including
trace IDs and security context (session ID, username). Levels: Error (storage/saga
failures), Warn (not found, expired, validator rejections), Info (create/revoke
events), Debug (normal validate/extend operations).
  • metrics: Runtime counters for all session operations (created, validated, extended,
revoked, bulk_revoked, regenerated, validation failures by reason).
  • config ([service]): Provides default TTL values, cookie configuration, and
max_concurrent_sessions limit. No dedicated [sessions] config section; TTL policies
are caller-determined (each module passes its own TTL to Create).

Persistent File Storage

JSON-based filesystem storage with atomic writes, NFS shared mode, and path traversal protection

Overview

The filesystem module provides persistent JSON-based file storage for modules that need durable on-disk data. It supports two deployment modes: shared (NFS) where all cluster nodes see the same filesystem, and replicated (local) where each node maintains its own copy with broadcast synchronization.

Core capabilities:

  • Module-namespaced directories (each module gets isolated storage)
  • Atomic writes via temporary file + rename pattern (crash-safe)
  • Optional file locking via flock for NFS shared mode
  • JSON marshaling/unmarshaling for structured data
  • Full file lifecycle: Save, Load, Delete, Move, List, Exists
  • Path traversal protection with multi-layer validation
  • Fuzz-tested security boundary (traversal, null bytes, unicode attacks)

Storage modes:

Shared (NFS): All nodes see the same files. Operations are local only
(no broadcast needed). File locking prevents race conditions between nodes.
Example path: /shared/webauthn/passkeys/active/abc123.json
Replicated (Local): Each node maintains its own filesystem. Write operations
are replicated to all nodes. No locking is needed since each node
owns its local copy.
Example path: /data/webauthn/passkeys/active/abc123.json

File permissions: 0644 (files), 0755 (directories). Module directories are created on demand during Save operations.

Config

Configuration under [filesystem]:

[filesystem]

base_path = "/shared" # Root directory for all module storage
mode = "shared" # "shared" (NFS) or "local" (replicated per node)
use_flock = true # Enable file locking (recommended for shared mode)

Mode selection guidance:

shared: Use when all nodes mount the same NFS/distributed filesystem.
- Set use_flock = true to prevent concurrent write races
- Operations are local only (no cluster replication needed)
- Simplest setup, but requires reliable NFS infrastructure
local (replicated): Use when each node has independent local storage.
- Write operations (Save, Delete, Move) are replicated to all nodes
- Read operations (Load, List, Exists) are local only
- No file locking needed (each node owns its storage)
- More resilient to NFS failures, but eventual consistency

Operation routing by mode:

Shared mode:
Save, Load, Delete, Move -> all execute locally (no cluster broadcast)
Replicated mode:
Save, Delete, Move -> replicated to all cluster nodes
Load -> local only (read from local storage)

Hot-reloadable: None. Changes to base_path, mode, or use_flock require restart.

Troubleshooting

Common symptoms and diagnostic steps:

File not found after Save (replicated mode):

- Verify Save used cluster-wide replication (replicated mode requires it)
- Check if querying node received the write replication (network partition)
- Replication is eventually consistent; small delay before Load on other nodes
- Verify base_path is correct on all nodes (must match across cluster)

Permission denied errors:

- Check filesystem permissions: files need 0644, directories need 0755
- Verify the hexon process user has write access to base_path
- NFS mount options: ensure no_root_squash or correct uid/gid mapping
- SELinux/AppArmor may block writes to NFS mounts

Path traversal error (ErrPathTraversal):

- Module name contains '/', '\', or '..' (invalid characters)
- Subpath starts with '/' or '\' (must be relative)
- Subpath contains '..' traversal sequences after path cleaning
- Resolved path escapes the module directory boundary
- This is a security feature; do not attempt to bypass it

File locking issues (shared/NFS mode):

- Stale locks after crash: flock is released on process exit by the OS
- NFS lock daemon (lockd/statd) must be running on all nodes
- NFSv4 has built-in locking; NFSv3 requires separate lock services
- Deadlock risk is low: operations hold locks only briefly (JSON marshal + write + rename)
- If use_flock = false on shared mode, concurrent writes may corrupt files

Atomic write failures:

- Disk full: temporary file creation fails before rename
- Cross-device rename: base_path and temp dir must be on same filesystem
- Check disk space: df -h on the base_path partition
- Temp file cleanup: orphaned .tmp files indicate interrupted writes

List operation returns empty:

- Verify the subpath directory exists (directories created on Save only)
- Check glob pattern syntax (uses filepath.Glob matching rules)
- Pattern is matched against filenames only, not full paths
- Module directory is base_path/module_name/subpath

Data corruption or invalid JSON:

- Atomic writes prevent partial writes; corruption suggests disk issues
- NFS cache coherence: mount with actimeo=0 for immediate consistency
- Check for concurrent writes without flock enabled
- Validate JSON: load the file directly and check for syntax errors

Architecture

Write path (Save operation):

1. Validate path (module name + subpath traversal checks)
2. Create module directory tree if needed (MkdirAll with 0755)
3. Marshal data to JSON with indentation
4. Create temporary file in same directory
5. Write JSON content to temporary file
6. Sync to disk (fsync)
7. Atomic rename: tmp file -> target path
8. Optional: acquire/release flock around steps 4-7 (shared mode)

Read path (Load operation):

1. Validate path
2. Read file contents (os.ReadFile)
3. Unmarshal JSON into interface{}
4. Return data with Found=true, or Found=false if file not found

File locking (shared mode only):

Uses syscall.Flock with LOCK_EX (exclusive) for writes and LOCK_SH (shared)
for reads. Locks are advisory and only effective when all accessors use flock.
Lock scope is per-file, not per-directory.

Module isolation:

Each module's storage is confined to base_path/module_name/. Path validation
ensures no module can read or write outside its own directory. The validation
is defense-in-depth: multiple checks at different levels prevent escape.

Relationships

Module dependencies and interactions:

  • webauthn: Stores passkey credentials as JSON files. Uses shared mode for
cross-node passkey availability. Files organized in active/revoked subdirectories.
  • acme (CA): Stores issued certificates, private keys, and ACME account data.
Requires persistent storage that survives restarts.
  • config: Filesystem base_path and mode read from TOML configuration.
No hot-reload; changes require restart.
  • telemetry: Structured logging for all file operations (save, load, delete, move)
with module name, subpath, and error details.
  • memory (memorystorage): Complementary storage. Use filesystem for persistent
data that must survive restarts; use memory for ephemeral data with TTL.
Some modules use both: memory for fast lookups, filesystem for durable backup.
  • cluster: In replicated mode, cluster health affects write propagation.
Node failures may result in missed broadcasts (eventually consistent).

Distributed Memory Storage

Cluster-aware in-memory key-value store with TTL expiration, callbacks, and NATS JetStream persistence

Overview

The memory module provides distributed in-memory key-value storage with automatic TTL-based expiration, cluster-wide replication, and optional NATS JetStream persistence. It is the primary ephemeral storage layer used by authentication, sessions, OTP, PoW, OIDC, and other security-critical modules.

Core capabilities:

  • Namespace-isolated caches (cache types prevent key collisions)
  • Automatic TTL-based expiration with background eviction every 30 seconds
  • OnSet and OnDelete callback support (fire-and-forget, local only)
  • Thread-safe operations with mutex protection
  • Cluster-wide replication (writes replicated to all nodes)
  • Eventually consistent reads (local only, no network overhead)
  • NATS JetStream KV persistence for crash recovery (optional)
  • Peer-to-peer bootstrap fallback when JetStream unavailable
  • SetNX for atomic set-if-not-exists (distributed locks)
  • Touch for TTL renewal without value modification

Consistency model:

Reads (Get, All): Local only, O(1), eventually consistent, no network
Writes (Set, Delete): Local immediate + optional replication to all nodes
Writes are best-effort with no quorum requirement by default.
For strong consistency, use cluster-wide replication with quorum confirmation.

Storage architecture: two-level map structure

caches[cache_type][key] -> storageEntry with Value, Expiration, Callbacks

Data types stored in memory must be compatible with the cluster serialization layer. Custom structs, slices, and maps with custom types are supported.

Config

Configuration under [cluster] (memory persistence):

[cluster]

cluster_path = "/var/lib/hexon/cluster" # Base path for JetStream storage
persist_memory = true # Use FileStorage for KV bucket
memory_kv_max_write = 10 # Max concurrent KV writes (1-100)

When persist_memory = true and cluster_path is set:

- NATS JetStream KV bucket "hexon_storage_memory" is created
- Writes are asynchronously persisted to JetStream after local cache update
- Concurrent KV writes throttled by memory_kv_max_write (default 10)
- On startup, all entries are bootstrapped from JetStream KV
- JetStream uses Raft consensus in 3+ node clusters for durability
- Data survives full cluster restarts

When persist_memory = false or cluster_path is unset:

- KV bucket uses MemoryStorage (data lost on restart)
- Falls back to peer-to-peer bootstrap from live cluster nodes
- Suitable for truly ephemeral data (PoW challenges, rate limit counters)

Key encoding for NATS KV:

NATS KV keys only allow [-/_=\.a-zA-Z0-9]+. Keys from external sources
(LDAP groups with spaces, email addresses) are base64url encoded:
Format: {cacheType}/{base64url(key)}
Example: "directory_groups/UmVwbGljYXRpb24gQWRtaW5pc3RyYXRvcnM"

Bootstrap sequence on node startup:

1. Attempt to read all entries from NATS JetStream KV
2. Populate in-memory cache with non-expired entries
3. If JetStream unavailable, request data from cluster peers
4. Merge peer responses into local cache
5. Live broadcasts during bootstrap take precedence over stale KV data

No hot-reloadable settings. Changes to cluster_path or persist_memory require a full restart.

Troubleshooting

Common symptoms and diagnostic steps:

Key not found after Set (cross-node):

- Verify Set used cluster-wide replication, not local-only (replication required for cross-node visibility)
- Reads are local only; small propagation delay is normal
- Use quorum-confirmed replication before reading for strong consistency
- Check cluster health: nodes must be reachable for broadcast delivery
- Verify the stored type is compatible with cluster serialization

Serialization errors (encoding/decoding failures):

- Custom types stored in memory must be compatible with cluster serialization
- Type registration happens during module initialization
- Built-in types (string, int, bool, []byte) work out of the box
- Error message includes the unregistered type name

TTL expiration not working (entries persist beyond TTL):

- Background eviction runs every 30 seconds (not instantaneous)
- Expired entries are immediately invisible to Get (Found=false)
- Physical cleanup happens on next eviction cycle
- Very large caches (100K+ entries) may slow eviction scans
- Check if TTL was set to 0 (zero TTL means no expiration)

OnDelete callback not firing:

- Callbacks are local only (fire on the node that runs eviction)
- Callbacks are fire-and-forget (errors are logged but not returned)
- The callback module and operation must exist and be registered
- Check telemetry logs at ERROR level for callback failures
- Callbacks do NOT fire on nodes that receive broadcast deletions
(only the originating node triggers the callback)

Data lost after cluster restart:

- Verify persist_memory = true in [cluster] config
- Verify cluster_path is set and writable
- Check NATS JetStream health (3+ nodes needed for Raft consensus)
- 2-node clusters: JetStream may not achieve quorum, data at risk
- Without persistence, data is only in memory (lost on restart)
- Bootstrap logs show how many entries were recovered from KV

Memory usage growing unbounded:

- Check TTL values: missing or zero TTL entries never expire
- Use All operation to inspect cache sizes per cache type
- Per-entry overhead: approximately 150 bytes plus key and value sizes
- Monitor eviction cycle: entries should be cleaned every 30 seconds
- Consider partitioning large cache types into smaller namespaces

SetNX returning Set=false unexpectedly:

- Key already exists in local cache (including expired-but-not-evicted)
- Another node set the key via broadcast before your SetNX
- SetNX is local atomic only; not a distributed lock by itself
- For distributed locking, combine SetNX with cluster-wide replication + short TTL

Bootstrap failures on startup:

- JetStream KV unavailable and no peer nodes responding
- Node starts with empty cache; data populates as broadcasts arrive
- Check NATS connection health and cluster discovery
- Verify cluster_path directory exists and has correct permissions
- Base64url decoding errors: corrupted KV keys (manual cleanup needed)

KV “too many requests” errors at startup (memory.kv.put_error):

- Caused by bulk operations (e.g. directory fullSync) spawning many concurrent
KV writes that overwhelm JetStream rate limits
- Each user/group sync fires ~3 Set() calls per user + ~2 per group
- A directory with 40 users and 20 groups = ~160 concurrent writes
- Fix: increase memory_kv_max_write in [cluster] config (default 10, max 100)
- These errors are non-fatal: data is already in local cache, only persistence
is delayed. Entries will be persisted on subsequent writes or next restart.
- Monitor: logs search "memory.kv.put_error" --since=5m

Architecture

Data flow for write operations:

1. Caller invokes Set/Delete (local-only or cluster-wide)
2. Local cache updated immediately (mutex-protected)
3. OnSet callback triggered if registered (fire-and-forget)
4. If cluster-wide: replicated to all cluster nodes
5. Async persistence goroutine acquires semaphore slot (bounded by memory_kv_max_write)
6. KV write to NATS JetStream (best-effort; skipped on shutdown)
7. JetStream Raft replicates to follower nodes (3+ node clusters)
Note: SyncSet bypasses the semaphore (synchronous, caller-blocking, used for signing keys)

Data flow for read operations:

1. Caller invokes Get/All (local-only)
2. Local cache lookup (O(1) for Get, O(n) for All)
3. Expired entries filtered out (Found=false)
4. No network overhead, no disk I/O

Background eviction loop:

1. Wakes every 30 seconds
2. Scans all cache types and all entries
3. Identifies entries with Expiration < now
4. Deletes expired entries from local cache
5. Replicates eviction to cluster nodes
6. Triggers OnDelete callbacks on local node
7. Persists deletion to JetStream KV

NATS JetStream KV architecture (when persistence enabled):

- Bucket: hexon_storage_memory
- Raft consensus for writes (3+ nodes)
- Leader election with automatic failover
- Write-ahead log replicated to followers
- Tolerates a minority of node failures: floor((N-1)/2) (e.g., 1 of 3)

Peer-to-peer bootstrap fallback:

- Used when JetStream is unavailable (2-node clusters, JetStream down)
- Requests data from all cluster peers
- Merges responses, preferring newest entries on conflict
- Graceful degradation: memory storage works without persistence

Relationships

Module dependencies and interactions:

  • sessions: Primary consumer. Stores user sessions with 12-24h TTL.
Uses OnDelete callback for session cleanup and index removal.
Session indices stored in separate "sessions_index" cache type.
  • OTP: Stores one-time passwords with 5-10 minute TTL.
Keys are hashed email addresses for privacy. Replicated cluster-wide for
OTP availability. OnDelete triggers expiration notifications.
  • OIDC provider: Stores authorization codes, access tokens, refresh
tokens, and DPoP JTI values. Each in separate cache types with appropriate
TTLs (codes: 5-10min, tokens: 1-24h). Critical for OAuth2 flow integrity.
  • Proof-of-work: Stores proof-of-work challenge tokens with short TTL.
Local-only storage (challenges are node-specific).
  • WebAuthn: Stores WebAuthn challenges during registration
and authentication ceremonies. Short TTL (5 minutes).
  • Kerberos: Stores Kerberos ticket data with ticket lifetime TTL.
  • firewall: Uses SetNX for cluster-wide hostname tracking (wildcard DNS).
Replicated to all nodes for cross-node hostname state. OnDelete for TTL-based rule cleanup.
  • VPN IP pool: Uses Touch to renew IP allocation TTLs during active sessions.
SetNX for atomic IP reservation (prevents double allocation).
  • storage.filesystem: Complementary module. Use memory for fast ephemeral
lookups; use filesystem for persistent data surviving restarts.
  • telemetry: All operations logged at DEBUG level. Errors (callback failures,
eviction issues) logged at ERROR. Metrics for cache sizes and hit rates.
  • cluster (NATS): JetStream KV persistence layer. Raft consensus provides
durability for 3+ node clusters. Bootstrap reads from JetStream on startup.