Connectivity
Client Access (HexonClient)
Transparent L3 network access via QUIC tunnels for CLI tools and native applications
Overview
The Client Access subsystem enables end users (DBAs, developers, operators) to transparently access internal resources through a lightweight QUIC tunnel. The HexonClient binary captures IP packets via TUN + gVisor netstack, extracts TCP flows, and dials each flow as a QUIC stream to the gateway.
The gateway side (this module) handles:
- QUIC listener on a dedicated port with ALPN “hexon-client” and TLS 1.3
- Two authentication paths: server-side device code (RFC 8628) for interactive use, JWT with RFC 5705 channel binding for reconnect/automation
- Per-user route derivation from firewall ACL rules (CIDR + Site routes)
- Virtual IP allocation from a dedicated subnet (default 100.64.208.0/22)
- Per-stream firewall ACL check before dialing backends
- Direct dial or connector tunnel routing based on HostAlias Site field
- Bidirectional splice with 32KB pooled buffers and half-close propagation
- DNS resolution on the control stream for split DNS
- DNS defense-in-depth: per-session O(1) rate limiting + ACL enforcement (RFC 8914)
- Token refresh with group-change detection and mid-session route updates
- Cluster-wide session tracking
This mirrors the connector architecture but reversed: the client opens streams, the gateway accepts and dials backends.
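The bidirectional splice mentioned above (32KB buffers, half-close propagation) can be sketched with plain OS sockets; this is a minimal illustration of the relay pattern using threads, not the gateway's actual QUIC stream implementation:

```python
import socket
import threading

BUF_SIZE = 32 * 1024  # 32KB, matching the buffer size described above


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy one direction until EOF, then propagate the half-close."""
    try:
        while True:
            data = src.recv(BUF_SIZE)
            if not data:  # peer sent FIN
                break
            dst.sendall(data)
    finally:
        try:
            # Half-close: signal "no more writes" without tearing down
            # the other direction, which may still be flowing.
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def splice(a: socket.socket, b: socket.socket) -> None:
    """Bidirectional relay: run both directions, return when both hit EOF."""
    t = threading.Thread(target=_pump, args=(b, a))
    t.start()
    _pump(a, b)
    t.join()
```

Each direction shuts down independently, so a client that finishes sending can still receive the backend's remaining response, which is the half-close behavior the gateway propagates across QUIC streams.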
Configuration
Configuration uses the [client_access] TOML section:
[client_access]
enabled = true
port = 8445
# network_interface = ""        # Bind to specific interface (falls back to service.network_interface)
# cert = ""                     # Dedicated TLS cert (falls back to SNI/auto-TLS)
# key = ""                      # Dedicated TLS key
subnet = "100.64.208.0/22"      # Virtual IP pool for clients (1022 addresses)
gateway_ip = "100.64.208.1"     # Gateway IP within subnet (excluded from pool)
dns_upstream = ["10.0.0.53"]    # DNS resolvers for client queries
dns_domains = []                # Additional DNS domains pushed to all clients
# cidrs = ["10.0.0.0/22"]       # Additional CIDR routes pushed to all clients
heartbeat_interval = "30s"      # Heartbeat frequency (session TTL = 3x this)
token_refresh_interval = "45m"  # Client token refresh interval
max_idle_timeout = "5m"         # QUIC idle timeout
max_clients = 1000              # Maximum concurrent client connections
max_streams_per_client = 100    # Maximum concurrent TCP streams per client
dns_rate_limit = 100            # Maximum DNS queries per second per client
# required_groups = ["engineers", "operators"]  # Empty = any authenticated user

The subnet must not overlap with the IKEv2 VPN subnet (default 100.64.0.0/22). Each connected client gets one virtual IP from the pool.
Routes pushed to clients come from two sources:
- Firewall host aliases: CIDRs and IPs from aliases matched by user groups
- Config-level cidrs: pushed to all clients regardless of group membership Both are merged (deduplicated) before sending in ClientAck.
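The merge described above can be sketched as follows; normalization via `ipaddress` is an assumption, as the gateway may aggregate CIDRs differently:

```python
import ipaddress


def merge_routes(alias_routes: list[str], config_cidrs: list[str]) -> list[str]:
    """Merge per-user alias routes with config-level cidrs, dropping duplicates.

    Alias-derived routes come first, config-level cidrs second, and any
    route already seen is skipped, so the client receives each CIDR once.
    """
    seen: set[ipaddress._BaseNetwork] = set()
    merged: list[str] = []
    for raw in list(alias_routes) + list(config_cidrs):
        net = ipaddress.ip_network(raw, strict=False)  # normalize notation
        if net not in seen:
            seen.add(net)
            merged.append(str(net))
    return merged
```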
Admin commands
Admin CLI commands:
clients list [--user=X]          List connected hexonclient sessions (cluster-wide)
clients show <session_id>        Show full session details (device, network, streams, traffic, timing)
clients disconnect <user> [id]   Disconnect all sessions for user, or a specific session [WRITE]

Bastion shell commands (self-service, filtered to own sessions):

clients                          List your active hexonclient sessions
clients list                     Same as above
clients disconnect [session_id]  Disconnect your own session(s)

Security
Two authentication paths (determined by whether client sends a token):
Device code flow (interactive — RFC 8628, same as bastion SSH):
- Client establishes QUIC/TLS 1.3 connection (ALPN: “hexon-client”)
- Client sends ClientAuth with empty token (signals device code request)
- Gateway initiates device code authorization (server-side, no HTTP from client)
- Gateway sends DeviceCodeChallenge: verification URI, user code, expiry
- Client displays QR code + clickable URL + user code
- Gateway polls the device code service until authorized, denied, or expired
- On authorization: gateway extracts claims (username, email, groups) from poll response
- Gateway checks required_groups, derives routes, allocates VIP
- Gateway sends ClientAck with VIP, routes, DNS, and JWT tokens for reconnection

Reconnected sessions use the JWT path below (no re-authentication needed).
JWT flow (reconnect / automation):
- Client establishes QUIC/TLS 1.3 connection (ALPN: “hexon-client”)
- Client sends ClientAuth: JWT + HMAC-SHA256(token, TLS exporter) proof
- Gateway validates JWT (extracts username, groups)
- Gateway verifies channel binding proof (RFC 5705 prevents token replay)
- Gateway checks required_groups (if configured): user must have ANY listed group
- Client sends ClientRegister with device metadata
- Gateway derives per-user routes from firewall ACL rules
- Gateway sends ClientAck with VIP, routes, DNS, token refresh interval
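The channel binding step above can be sketched as follows. The exporter bytes stand in for RFC 5705 Exported Keying Material, which both endpoints derive from the TLS session; the key/message ordering and exporter label are assumptions, not the documented wire format:

```python
import hashlib
import hmac


def channel_binding_proof(token: str, ekm: bytes) -> bytes:
    """Client side: bind the JWT to this specific TLS connection.

    `ekm` is the RFC 5705 exporter output; since it is unique per TLS
    session, a proof captured on one connection is useless on another.
    """
    return hmac.new(ekm, token.encode(), hashlib.sha256).digest()


def verify_proof(token: str, ekm: bytes, proof: bytes) -> bool:
    """Gateway side: recompute from its own exporter output and compare
    in constant time. A replayed token fails because the attacker's TLS
    session yields different keying material."""
    return hmac.compare_digest(channel_binding_proof(token, ekm), proof)
```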
Per-stream access control:
- Each QUIC stream carries a DialHeader (host, port, protocol)
- Gateway checks firewall access control (user groups vs target host/port/protocol)
- Denied streams get DialStatusDenied response immediately
- Only allowed streams proceed to backend dial
DNS defense-in-depth (same model as IKEv2 VPN DNS):
- Per-session rate limiting: O(1) time-bucketed rolling window (dns_rate_limit qps)
- DNS ACL: after resolve, firewall checks user groups vs host aliases
- ACL-denied queries return DNSStatusDenied (RFC 8914 REFUSED) — prevents information leakage
- ACL call failure fails open (dial-time ACL is the authoritative control)
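The O(1) time-bucketed rolling window above can be sketched like this; the bucket count and reset strategy are assumptions about one common way to implement it, not the gateway's internals:

```python
class BucketRateLimiter:
    """Rolling one-second window split into fixed time buckets.

    Each bucket records a count for one slice of the window. On every
    query we reset only the (single) stale slot we land in, so the work
    per query is bounded by the fixed bucket count: O(1).
    """

    def __init__(self, limit_qps: int, buckets: int = 10):
        self.limit = limit_qps
        self.n = buckets
        self.width = 1.0 / buckets      # bucket width in seconds
        self.counts = [0] * buckets
        self.stamps = [-1] * buckets    # absolute bucket index held by each slot

    def allow(self, now: float) -> bool:
        idx = int(now / self.width)     # absolute bucket index for `now`
        slot = idx % self.n
        if self.stamps[slot] != idx:    # slot holds an expired bucket: reset it
            self.stamps[slot] = idx
            self.counts[slot] = 0
        # Sum only slots still inside the rolling 1s window (fixed-size scan)
        total = sum(c for s, c in zip(self.stamps, self.counts) if idx - s < self.n)
        if total >= self.limit:
            return False
        self.counts[slot] += 1
        return True
```

With `dns_rate_limit = 100`, a session would get a `BucketRateLimiter(100)` and every DNS query would pass through `allow()` before resolution.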
Token refresh:
- Client sends TokenRefresh with new JWT + proof before token expires
- Gateway re-validates JWT and channel binding
- Gateway re-checks required_groups: if user lost membership, connection is terminated
- If groups changed: re-derive routes, send RouteUpdate with add/remove entries
- Bad token on refresh kills the connection (security boundary)
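The route re-derivation step above only needs to push a delta; a sketch of computing the RouteUpdate add/remove sets (field names here are illustrative, not the actual wire format):

```python
def route_update(old_routes: set[str], new_routes: set[str]) -> dict:
    """Diff the routes derived before and after a group change.

    Only the delta is sent, so routes the user keeps stay untouched
    on the client's TUN device mid-session.
    """
    return {
        "add": sorted(new_routes - old_routes),
        "remove": sorted(old_routes - new_routes),
    }
```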
Troubleshooting
Common symptoms and diagnostic steps:
Client cannot connect:
- Check listener: 'status summary' shows clientaccess listener state
- Check config: 'config show client_access' (enabled, port, subnet)
- Check required_groups: 'config show client_access' — user must be in listed groups (empty = any)
- Check certs: 'certs list' or 'diagnose domain <hostname>'
- Max clients reached: 'logs search clientaccess --level=warn'
- Group denied: 'logs search clientaccess --level=warn' shows "group access denied"

Client connected but cannot reach services:
- Check pushed routes: 'config show client_access' — cidrs must include destination subnet
- Check firewall rules: user's groups must match rule sources
- Check HostAlias: destination alias must have matching hosts (CIDRs for TUN routes, wildcards for DNS only)
- Check connector: if Site is set, connector must be connected
- 'logs search clientaccess-dial --level=warn' for denied dials

DNS not resolving:
- Check dns_upstream config: must point to reachable resolvers
- Check dns_domains: domains must be in the pushed list for split DNS
- 'logs search clientaccess-dns' for resolution errors
- DNS ACL denied (REFUSED): check user's groups match firewall rules for the hostname
- DNS rate limited (SERVFAIL): check dns_rate_limit setting (default 100 qps)

Token refresh failures:
- 'logs search clientaccess-refresh --level=warn'
- Invalid token: OIDC provider may have rotated keys
- Channel binding failure: possible MITM or TLS session change

Relationships
Module dependencies:
- devicecode: Server-side device code authorization (RFC 8628) for interactive authentication
- oidc: JWT validation for reconnect/automation authentication
- firewall: Per-stream access control, DNS ACL enforcement, host alias route derivation
- dns: DNS resolution for client split DNS queries
- sessions: Cluster-wide session tracking (create, validate, revoke)
- connectors: Site-based routing through connector tunnels
- IP pool: Virtual IP allocation from dedicated subnet
- listener: QUIC listener with TLS 1.3 and idle timeout
- telemetry: Structured logging and Prometheus metrics
QUIC Connector
Identity-aware tunnels to remote sites via QUIC for proxy, bastion, and forward proxy traffic
Overview
The Connector subsystem enables identity-aware access to services running at remote sites (Kubernetes clusters, data centers, VPCs) without VPN tunnels, IPsec, or network-level connectivity.
A lightweight binary (hexonconnect) deployed at the remote site establishes an outbound QUIC connection to Hexon. Hexon then sends “dial” commands through this tunnel whenever a proxy mapping, bastion session, forward proxy rule, or firewall policy references that site via the “site” parameter.
Key capabilities:
- Zero-trust remote access: connector dials only what Hexon asks, nothing else
- Opaque site namespace: same IPs and DNS names across sites are irrelevant
- Stateless token auth: HMAC-derived tokens validated without storage
- Channel-bound authentication: RFC 5705 TLS Exported Keying Material prevents replay and MITM attacks — the token never travels on the wire
- Multi-instance HA: multiple connectors per site with adaptive load balancing
- Cross-node routing: any cluster node can route to any connector via adaptive inter-node forwarding — requests arriving at a node without connector instances are transparently forwarded to a node that has them
- Auto-reconnect: connector never gives up, exponential backoff on disconnect
- CDN-compatible: optional dedicated hostname and TLS certificate for direct access
Configuration
Configuration uses the [connector] TOML section:
[connector]
enabled = true
port = 8444
# hostname = "connector.example.com"  # Optional: dedicated hostname (CDN bypass)
# cert = "/path/to/cert.pem"          # Optional: file path or inline PEM
# key = "/path/to/key.pem"            # Optional: file path or inline PEM

[[connector.sites]]
id = "prod-asia-a8f3c1"
name = "Production Asia"
cidrs = ["203.0.113.0/24"]
max_instances = 3
rebalance = true       # Distribute across cluster nodes (default: true)
rebalance_retries = 5  # Accept after N soft-rejects (default: 5, 1-10)

TLS certificate resolution:
1. connector.cert/key when set (static certificate)
2. SNI callback: auto-TLS (ACME), certmanager, wildcard, or service certificate

If connector.hostname is set and no cert/key is provided, ACME will automatically provision a certificate for the connector hostname.

Usage across subsystems — add “site” parameter:

[[proxy.mapping]]
app = "API Asia"
host = "api-asia.example.com"
service = "http://api.default.svc.cluster.local:8080"
site = "prod-asia-a8f3c1"

# Shadow targets can also route through connectors:
[[proxy.mapping.shadow]]
name = "staging-mirror"
service = "https://staging.internal:8443"
site = "staging-eu"

# Circuit breaker fallback can use a different connector site:
[proxy.mapping.circuit_breaker]
fallback_mode = "service"
fallback_service = ["http://dr-backend:8080"]
fallback_site = "dr-europe"

# SSH cert rules — route bastion SSH through connector:
[[bastion.ssh_cert.rules]]
name = "remote-dc-ssh"
groups = ["devops"]
destinations = ["*.internal"]
site = "prod-asia-a8f3c1"

# SQL bastion — route database connections through connector:
[[sql_bastion.sites]]
name = "postgres-remote"
type = "postgres"
host = "pg.internal"
port = 5432
site = "prod-asia-a8f3c1"

# Firewall host aliases — route forward proxy traffic through connector:
[[firewall.aliases.hosts]]
name = "remote_services"
hosts = ["gitlab.internal", "jenkins.internal"]
site = "prod-asia-a8f3c1"
# Aliases with site skip nft rules — traffic goes through userspace QUIC tunnel

Token generation is deterministic from the cluster key — any node can validate.
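Deterministic derivation can be sketched with an HMAC over the site ID; the label and hex encoding are assumptions, not the real scheme:

```python
import hashlib
import hmac


def derive_site_token(cluster_key: bytes, site_id: str) -> str:
    """Derive a site token from the cluster key and site ID.

    Because the derivation is deterministic, any node holding the
    cluster key can recompute the expected token and validate a
    connector without a token store ("stateless token auth").
    The "hexon-connector:" label is illustrative.
    """
    mac = hmac.new(cluster_key, b"hexon-connector:" + site_id.encode(), hashlib.sha256)
    return mac.hexdigest()


def validate_site_token(cluster_key: bytes, site_id: str, token: str) -> bool:
    """Recompute and compare in constant time."""
    return hmac.compare_digest(derive_site_token(cluster_key, site_id), token)
```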
Admin commands
Admin CLI:
connector list                 List configured sites and live connections
connector show <site-id>       Show site config, token, and connected instances (includes platform, origin with geo/ASN, system labels)
connector create <site-id>     Create new site (generates token)
connector revoke <site-id>     Block site, disconnect active QUIC tunnels
connector instances <site-id>  List connected instances with metrics

The “connector show” output includes per-instance details: platform (OS/arch), origin IP with country and ASN (via geo module), and system labels reported by the connector binary (kernel, OS version, runtime environment, memory, virtualization, PID, UID/GID).
The “connector revoke” command disconnects active QUIC tunnels in addition to revoking cluster sessions, causing connectors to reconnect (and be rejected).
Config reload cleanup: when a site is removed from config (via GitOps or hot reload), active QUIC connections for that site are automatically disconnected. The connector binary will reconnect but be rejected because the site is no longer in config. This prevents stale sessions from lingering in JetStream KV.
Security
Trust boundaries:
- Hexon Cluster (full trust): policy enforcement, identity, routing
- Connector (minimal trust): dials only what Hexon asks, no autonomous access
Authentication flow:
- QUIC/TLS 1.3 connection established (server cert, ECDHE, forward secrecy)
- Both sides compute TLS exporter keying material (RFC 5705) with an application-specific label
- Connector sends: site_id + HMAC of token bound to the TLS channel
- Hexon validates by recomputing from cluster key
Additional protections:
- Optional CIDR allowlist per site restricts connector source IPs
- max_instances limit prevents token abuse
- Instance selection uses epsilon-greedy adaptive algorithm with circuit breaker
- QUIC relay loop prevention: relay handler only dispatches locally, preventing infinite forwarding loops between nodes
- Cluster-wide rebalancing: soft-rejects excess connectors so they redistribute across gateway nodes (configurable per site, default 5 retries before accepting)

Inter-node forwarding
All cluster nodes can route to any connector site through QUIC relay.
When a request arrives at a node without local connector instances (or after local retries are exhausted), the dispatcher transparently relays through a peer node. The relay uses QUIC on the same connector port (8444) with ALPN “hexon-relay” and mTLS for peer authentication. Each relay request opens a QUIC stream, sends a dispatch header, and the peer dispatches locally through its QUIC connector tunnel.
All traffic types converge through the same dispatch path — this covers reverse proxy, forward proxy, client access (TCP/UDP), SSH bastion, SQL bastion, shadow targets, and probes.
Remote node IPs are cached (5s refresh) from cluster-wide connector sessions. Failed nodes are tracked by the cluster discovery health checks.
Loop prevention:
- The relay handler only dispatches locally (never relays further)
- A peer with no local instances returns an immediate error
Troubleshooting relay:
- Client-side metrics: relay_total (attempts), relay_success_total, relay_errors_total
- Server-side metrics: relay_served (requests handled), relay_rejected_total (auth failures)
- Relay rejected with “no_certificate”: peer isn’t presenting its service cert
- Relay rejected with “not_peer”: source IP not in cluster discovery peer list
- Relay “no_instances”: the peer node also has no local connectors for the site
- Check logs: 'logs search connectors.relay --level=warn'
QUIC tuning
QUIC performance tuning applied to both gateway and connector sides:
Flow control windows (tuned for database and bulk transfer workloads):
- Stream: 2MB initial, 8MB max
- Connection: 4MB initial, 20MB max
- Stream-to-connection ratio: 2:5

Larger initial windows reduce round-trips for big responses (SQL results, file transfers).

Persistent QUIC transport (connector side):
- hexonconnect reuses one UDP socket across reconnections
- Avoids per-connection socket allocation and kernel offload state loss
- Enables future QUIC connection migration if network interface changes

Stream error handling:
- Error paths immediately release QUIC stream resources instead of graceful close
- Frees resources under load without waiting for peer acknowledgment

Max concurrent streams:
- Gateway: configurable per listener (default 100)
- Connector: 1024 (high concurrency for multiplexed tunnel streams)

Rebalancing
When multiple connector replicas start simultaneously (e.g., Kubernetes Deployment with 3 replicas), they may all connect to the same gateway node via DNS or a load balancer. The rebalance mechanism redistributes them:
- First connector for a site on a node is always accepted
- Subsequent connectors check cluster distribution: if this node has more instances than the least-loaded remote node, the registration is soft-rejected
- The connector reconnects with a short backoff (2 seconds) — DNS/LB randomness typically sends it to a different node
- After N soft-rejects (configurable, default 5), the node accepts anyway
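The accept/soft-reject decision above can be sketched as a pure function; the inputs (local and per-peer instance counts, retry count) are assumptions about what the real dispatcher tracks:

```python
def should_soft_reject(local_count: int,
                       remote_counts: list[int],
                       retries_so_far: int,
                       retry_budget: int = 5) -> bool:
    """Decide whether to soft-reject a connector registration.

    Mirrors the rules above: the first instance for a site on a node is
    always accepted; otherwise reject while this node holds more instances
    than the least-loaded remote node, until the retry budget is spent.
    """
    if local_count == 0:
        return False  # first connector for the site on this node: accept
    if retries_so_far >= retry_budget:
        return False  # budget exhausted: accept anyway, never strand the connector
    if not remote_counts:
        return False  # no peers to redistribute to
    return local_count > min(remote_counts)
```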
Per-site configuration:
rebalance = true       # Enable cluster-wide load distribution (default: true)
rebalance_retries = 5  # Max soft-rejects before accepting (1-10, default: 5)

Rebalance is best-effort — sticky load balancers may prevent redistribution, so the retry budget ensures connectors are never stuck. Metrics: rebalance_reject_total and rebalance_accept_total track distribution activity per site.
Forward Proxy
HTTP CONNECT and CONNECT-UDP service layer with bearer token auth, MASQUE UDP proxying, PAC endpoints, and CDN bypass
Overview
The forward proxy provides VPN-like access to internal resources directly from the browser — no client software needed. It processes HTTP CONNECT requests for TCP tunneling (RFC 9114) and CONNECT-UDP requests for UDP proxying via MASQUE (RFC 9298).
Core capabilities:
- HTTP CONNECT handling for TCP proxy tunneling
- CONNECT-UDP handling for UDP proxy tunneling (MASQUE/QUIC)
- PAC file endpoint serving at configurable path (default /proxy.pac)
- Browser extension config endpoint at /proxy/config
- Browser extension setup/login endpoint at /proxy/setup
- CONNECT rejected on main service port (421 Misdirected) — proxy port only
- Geo-IP and time-based restriction enforcement before tunneling
- DNS resolution with system DNS fallback
- Bidirectional TCP relay with idle timeout and max connection duration
- HTTP/2+ full duplex CONNECT stream support (RFC 8441)
- HTTP/1.1 connection hijacking for classic CONNECT tunneling
- Connection tracking and byte-level metrics recording
The service runs on a dedicated port (forward_proxy.port) separate from the main service port for security isolation. CONNECT requests on the main port receive 421 Misdirected Request, directing clients to the correct proxy port.
TCP CONNECT request flow:
1. Extract client IP (CDN bypass mode uses RemoteAddr directly)
2. Check geo-IP and time-based restrictions
3. Validate target host:port format (RFC 1035 hostname length limit)
4. Extract bearer token from Proxy-Authorization header
5. Authenticate token and check user is not disabled
6. Check ACL (firewall group rules for target destination)
7. Check per-user rate limit (fail-closed)
8. Resolve hostname via DNS module (system DNS fallback)
9. Establish backend TCP connection with configurable timeout
10. Start bidirectional relay with idle timeout and max duration
11. Record metrics (bytes sent/recv, duration, success)

CONNECT-UDP request flow:
1-7. Same as TCP (restrictions, auth, ACL, rate limit)
8. MASQUE UDP proxying (capsule protocol, socket management)
9. Record metrics after session completes

Bearer token authentication supports two formats:
- "Bearer <token>" header (direct bearer token)
- "Basic <base64>" header where username is "_bearer_" and password is the token (Chrome's onAuthRequired format for Proxy-Authorization)

Config
Service-level configuration under [forward_proxy] in hexon.toml:
[forward_proxy]
enabled = true                   # Enable forward proxy (default: false)
port = 8443                      # Dedicated proxy port (must differ from service.port)
public_port = 8443               # External port for PAC URLs (NAT/LB scenarios)
hostname = "proxy.example.com"   # Separate hostname for CDN bypass (optional)
enable_tcp = true                # Enable TCP CONNECT handling (default: true)
enable_udp = true                # Enable CONNECT-UDP/MASQUE handling (default: true)
udp_proxy_path = "/masque"       # URI path for CONNECT-UDP requests (default: /masque)
auth_mode = "bearer"             # Authentication mode for CONNECT requests
buffer_size = "32KB"             # TCP relay buffer size (default: 32KB)
connect_timeout = "10s"          # Backend connection timeout
idle_timeout = "5m"              # Idle connection timeout (no data flowing)
max_connection_duration = "24h"  # Maximum connection duration (hard limit)
preserve_client_port = true      # Use client's port in Alt-Svc header

# Token settings (used by /proxy/config endpoint)
token_ttl = "5m"                 # Token validity duration (default: 5m, min: 30s)
token_refresh_interval = "60s"   # Extension refresh interval (default: 60s, min: 5s)

# TLS certificate for the proxy hostname (when hostname differs from service)
cert = "/path/to/cert.pem"       # File path or inline PEM
key = "/path/to/key.pem"         # File path or inline PEM

# Geo-IP restrictions (overrides [service] if set)
geo_enabled = true                  # Enable geo-IP restrictions
geo_allow_countries = ["US", "CA"]  # Allowed country codes (ISO 3166-1 alpha-2)
geo_deny_countries = []             # Denied country codes
geo_bypass_cidr = ["10.0.0.0/8"]    # CIDR ranges that bypass geo checks
geo_deny_code = 403                 # HTTP status for geo denial
geo_deny_message = "Access denied from your location"

# Time-based restrictions (overrides [service] if set)
time_enabled = true                  # Enable time-based restrictions
time_timezone = "America/New_York"   # Timezone for time checks
time_allow_days = ["Mon","Tue","Wed","Thu","Fri"]
time_allow_hours = "09:00-18:00"     # Allowed hours range
time_deny_code = 403                 # HTTP status for time denial
time_deny_message = "Access not permitted at this time"

PAC file settings

[forward_proxy.pac]
enabled = true               # Enable PAC endpoint (default: true)
path = "/proxy.pac"          # PAC file URL path
cache_ttl = "15m"            # PAC response Cache-Control max-age
group = "proxy-users"        # Required group for PAC/config/setup access (optional)
use_firewall_targets = true  # Derive PAC targets from firewall rules

Endpoints registered by the service:

GET /proxy.pac     - PAC file (requires auth, optional group)
GET /proxy/config  - JSON: PAC + token + refresh interval + username + server_time
GET /proxy/setup   - Login trigger page for browser extensions

CDN bypass mode: when forward_proxy.hostname differs from service.hostname, the proxy accepts direct connections (no CDN in between). Client IP is extracted from RemoteAddr instead of X-Forwarded-For. This is typical because CDNs do not support HTTP CONNECT.

Hot-reloadable: token_ttl, token_refresh_interval, geo/time restrictions, PAC settings, rate_limit_per_user, bandwidth_limit_per_user, buffer_size, idle_timeout, max_connection_duration.

Cold (restart required): enabled, port, hostname, enable_tcp, enable_udp, udp_proxy_path, preserve_client_port.

Troubleshooting
Common symptoms and diagnostic steps:
CONNECT requests returning 421 Misdirected Request:
- Client is sending CONNECT to the main service port instead of the proxy port
- The forward proxy middleware rejects CONNECT on the main port by design
- Verify client is configured to use forward_proxy.port (or public_port)
- Check error message for the correct proxy hostname:port

407 Proxy Authentication Required:
- Missing Proxy-Authorization header on CONNECT request
- Token format not recognized (must be "Bearer <token>" or "Basic <base64>")
- For Chrome extension: username must be "_bearer_" in Basic auth format
- Token exceeds max length (8192 bytes) — check token generation
- Verify token is being refreshed before expiry: check /proxy/config response

403 Forbidden on CONNECT:
- ACL denied: user's groups do not match firewall rules for the target
- Check: 'forwardproxy check <user> <target>' for ACL evaluation
- Check: 'forwardproxy targets <user>' to see allowed destinations
- Check: 'firewall check <user>' for firewall rule details
- Geo-IP denial: 'geo lookup <client_ip>' and 'geo check <client_ip>'
- Time-based denial: verify time_timezone and time_allow_hours in config

429 Too Many Requests:
- Per-user rate limit exceeded: check rate_limit_per_user setting
- Per-user bandwidth limit exceeded: check bandwidth_limit_per_user
- Retry-After header in response indicates when to retry
- Monitor: 'forwardproxy metrics' for per-user rate limit stats
- Consider increasing limits for legitimate high-volume users

502 Bad Gateway on CONNECT:
- DNS resolution failed: 'dns test <target_hostname>'
- Backend unreachable: 'net tcp <target_host:port>'
- Connect timeout too short: check forward_proxy.connect_timeout
- All resolved IPs failed (tries IPv4 first, then IPv6)
- DNS module failure with system DNS fallback also failing

Connection drops or timeouts during tunnel:
- Idle timeout: no data flowing for forward_proxy.idle_timeout (default 5m)
- Max duration exceeded: forward_proxy.max_connection_duration hard limit
- Check relay buffer_size: default 32KB, increase for high-throughput tunnels
- HTTP/2 full duplex not supported by server: check error logs for full duplex support errors
- Intermediate firewall blocking long-lived connections or UDP (QUIC)

PAC file returns DIRECT for all traffic:
- PAC endpoint requires authentication; verify session cookie is sent
- Check forward_proxy.pac.enabled = true
- Check use_firewall_targets = true and user has firewall rules
- Unauthenticated PAC intentionally returns DIRECT-only (security by design)
- Inspect PAC: curl -b session=<cookie> https://host/proxy.pac

/proxy/config returns 401 or 403:
- 401: session cookie missing or expired; trigger re-login via /proxy/setup
- 403: user not in required group (forward_proxy.pac.group)
- Verify group membership: 'directory user <username>'

Extension not refreshing token:
- Verify token_refresh_interval < token_ttl in config
- Check /proxy/config endpoint accessibility from extension
- Look for clock skew between client and server (server_time in response)
- Monitor: 'forwardproxy metrics' for token generation counts

CONNECT-UDP/MASQUE failures:
- QUIC port (UDP) blocked by intermediate firewall
- forward_proxy.enable_udp = false in config
- URI template mismatch: check udp_proxy_path setting
- MASQUE parse error: malformed CONNECT-UDP request
- Verify: 'net tcp <proxy_hostname:port> --tls' for TLS connectivity

Geo/time restriction inconsistencies:
- Forward proxy has its own geo/time config that overrides [service] settings
- Check both forward_proxy.geo_enabled and service.geo_enabled
- Restrictions on /proxy/config and CONNECT may behave differently
- CONNECT restrictions fail-open if the cluster is not ready

Metrics and monitoring:
- 'forwardproxy metrics' — cluster-wide connection counts and byte totals
- 'forwardproxy metrics <user>' — per-user breakdown
- Bytes sent/recv recorded per TCP connection; UDP records duration and success only (MASQUE library limitation)

Relationships
Dependencies and interactions:
- Forward proxy module: All authentication, ACL, rate limiting, PAC generation, metrics, and restriction checks handled cluster-wide.
- DNS: Hostname resolution for CONNECT targets. Falls back to system DNS if the DNS module is unavailable. IPv4 preferred over IPv6 in resolution order.
- Firewall: ACL rules determine which groups can access which destination host:port. Firewall rules also drive PAC file generation (use_firewall_targets).
- Directory: User disabled status checked during authentication. Group membership resolved server-side from the directory memory index during ACL evaluation (not embedded in the bearer token).
- Geo/Time access: Location and time-based access checks on both the /proxy/config endpoint and CONNECT requests. Forward proxy can override [service] geo/time settings with its own configuration.
- Sessions: Session cookies used for /proxy/config, /proxy/setup, and /proxy.pac. Browser extension first authenticates via session, then receives a bearer token for subsequent CONNECT requests.
- Reverse proxy: Complementary service — reverse proxy handles inbound traffic to backends, forward proxy handles outbound traffic from users. Both share the same TLS listener and session subsystem.

Forward Proxy Engine
Authentication, ACL evaluation, rate limiting, and PAC generation engine for the forward proxy
Overview
The forward proxy module provides browser-native VPN-like access using the MASQUE protocol (RFC 9298) over QUIC. It enables authenticated, policy-controlled tunneling of TCP and UDP traffic through the Hexon gateway without requiring a traditional VPN client.
Core capabilities:
- Bearer token authentication using HMAC-SHA256 signed tokens with configurable TTL
- Firewall ACL integration for group-based destination access control
- Per-user rate limiting (requests/sec) and bandwidth limiting (bytes/sec)
- PAC (Proxy Auto-Configuration) file generation for browser proxy setup
- JA4/JA4Q fingerprint binding for session-based authentication
- Geo-IP and time-based access restrictions (fail-closed)
- Active connection tracking with per-user and per-target metrics
- DNS resolution via the DNS module (prevents DNS poisoning)
- Separate proxy hostname and TLS certificate support for CDN bypass
- Token refresh mechanism for long-lived browser sessions
Transport security model:
The PAC file returns "HTTPS host:port", so the browser always connects to the proxy over TLS. The forward proxy listener only speaks TLS.

HTTPS target (e.g. https://example.com):
  Browser --TLS--> Proxy --TLS--> Target
  CONNECT tunnel (end-to-end encrypted) + token (raw bytes, no proxy headers)

Plain HTTP target (e.g. http://ifconfig.io):
  Browser --TLS--> Proxy --plain--> Target
  GET http://... (content visible on last hop) + token (token STRIPPED before forwarding)

The bearer token only travels on the encrypted browser-to-proxy leg. Hop-by-hop headers (Proxy-Authorization, Connection, etc.) are removed before forwarding. The token never reaches the target server.

Authentication flow (bearer token):
1. User logs in via any method, receives session cookie
2. Browser extension fetches /proxy/config with session cookie
3. Service generates HMAC-SHA256 signed token with user/groups/expiry
4. Extension sends Proxy-Authorization: Bearer <token> on CONNECT
5. Token validated locally (no round-trip for validation)
6. User disabled status checked against directory
7. CheckAccess enforces firewall ACL rules
8. Connection established and traffic relayed
9. Extension periodically refreshes token via /proxy/config

Config
Core configuration under [forward_proxy] section in hexon.toml:
[forward_proxy]
enabled = true                        # Enable forward proxy module
port = 8443                           # Dedicated proxy port (must differ from service.port)
public_port = 8443                    # External port for PAC URLs (for NAT/LB scenarios)
preserve_client_port = true           # Use client's port in Alt-Svc header
hostname = "proxy.example.com"        # Separate hostname for CDN bypass (optional)
fingerprint_binding = true            # Enable JA4/JA4Q fingerprint-to-session binding
fingerprint_binding_ttl = "8h"        # Fingerprint binding TTL (match session TTL)
rate_limit_per_user = 1000            # Max requests per second per user
bandwidth_limit_per_user = "100mbps"  # Max bandwidth per user

# Token settings
token_ttl = "5m"                      # Token validity duration (default: 5m)
token_refresh_interval = "60s"        # Extension refresh interval (default: 60s)

# TLS certificate for the proxy hostname (optional)
# Only needed when hostname differs from service.hostname
# Value can be a file path or inline PEM content
# If not set, uses ACME (add hostname to acme.additional_domains) or service cert
cert = "/path/to/cert.pem"
key = "/path/to/key.pem"

# Geo-IP restrictions (optional, falls back to [service] if not set)
geo_enabled = true                  # Enable geo-IP restrictions
geo_allow_countries = ["US", "CA"]  # Allowed country codes (ISO 3166-1 alpha-2)
geo_deny_countries = []             # Denied country codes
geo_bypass_cidr = ["10.0.0.0/8"]    # CIDR ranges that bypass geo checks
geo_deny_code = 403                 # HTTP status code for geo-denied requests
geo_deny_message = "Access denied from your location"

# Time-based restrictions (optional, falls back to [service] if not set)
time_enabled = true                 # Enable time-based restrictions
time_timezone = "America/New_York"  # Timezone for time checks
time_allow_days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
time_allow_hours = "09:00-18:00"    # Allowed hours range
time_deny_code = 403                # HTTP status code for time-denied requests
time_deny_message = "Access not permitted at this time"

PAC file configuration
[forward_proxy.pac]
enabled = true               # Enable PAC endpoint
path = "/proxy.pac"          # PAC file URL path
cache_ttl = "15m"            # PAC response cache TTL
use_firewall_targets = true  # Derive PAC targets from firewall rules

PAC authentication requirement: unauthenticated requests receive a minimal PAC that routes all traffic directly. Authenticated users get a PAC with targets derived from their firewall rules.
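To illustrate the two PAC shapes, here is a minimal Python sketch of a hypothetical generator: a DIRECT-only stub for unauthenticated requests and proxy-routed targets for authenticated users. The `HTTPS` directive matches the TLS-only proxy transport; the helper name, host-matching logic, and target format are assumptions, not the actual generator.

```python
# Hypothetical sketch: how an authenticated vs. unauthenticated PAC differs.
def build_pac(authenticated: bool, targets: list[str],
              proxy_host: str = "proxy.example.com:8443") -> str:
    if not authenticated or not targets:
        # No information leak: unauthenticated clients route everything directly.
        return 'function FindProxyForURL(url, host) { return "DIRECT"; }'
    # Targets derived from the user's firewall rules (shape assumed here).
    conditions = " || ".join(f'shExpMatch(host, "{t}")' for t in targets)
    return (
        "function FindProxyForURL(url, host) {\n"
        f'  if ({conditions}) return "HTTPS {proxy_host}";\n'
        '  return "DIRECT";\n'
        "}"
    )
```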
Hot-reloadable: rate_limit_per_user, bandwidth_limit_per_user, geo/time restrictions, PAC settings, token_ttl, token_refresh_interval. Cold (restart required): enabled, port, hostname, fingerprint_binding.
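The stateless bearer token in the flow above can be sketched as HMAC-SHA256 over a claims payload with embedded user, groups, and expiry. This is a minimal Python illustration; the exact claim layout, encoding, and secret handling are assumptions, not the documented wire format.

```python
import base64, hashlib, hmac, json, time

SECRET = b"cluster-wide-secret"  # stands in for the cluster-wide secret key

def sign_token(user: str, groups: list[str], ttl_s: int = 300) -> str:
    # Payload carries identity, groups, and expiry; signature makes it stateless.
    payload = json.dumps({"user": user, "groups": groups,
                          "exp": int(time.time()) + ttl_s}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "." +
            base64.urlsafe_b64encode(sig).decode())

def verify_token(token: str):
    # Local validation: recompute the HMAC, no server round-trip or storage.
    p64, s64 = token.split(".")
    payload = base64.urlsafe_b64decode(p64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(s64)):
        return None  # signature mismatch: tampered or wrong key
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        return None  # expired: short TTL limits the exposure window
    return claims
```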
Security
Security layers and hardening measures:
Bearer token security:
Tokens signed with HMAC-SHA256 using the cluster-wide secret key. Short TTL (default 5 minutes) limits exposure window for stolen tokens. Token contains user ID, groups, and expiry; validated locally without round-trip for minimal latency. Tokens are not stored server-side (stateless validation via signature). Token transport is always encrypted: the browser-to-proxy connection is TLS (PAC returns "HTTPS"), and the token is stripped (hop-by-hop header) before forwarding to the target. Even for plain HTTP targets, the token never leaves the TLS tunnel.

Fingerprint binding:
JA4/JA4Q TLS fingerprint bound to session via BindFingerprint operation. Prevents token replay from a different client/browser. Binding has its own TTL that should match the session TTL for consistent expiry.

Access control (multi-layer):
1. Bearer token authentication (identity verification)
2. User disabled check via directory.IsUserDisabled (account status)
3. Firewall ACL via CheckAccess (group-based destination control)
4. Rate limiting per user (abuse prevention)
5. Bandwidth limiting per user (network saturation prevention)
6. Geo-IP restrictions (location-based access, fail-closed)
7. Time-based restrictions (schedule-based access, fail-closed)
8. DNS resolution via the DNS module (prevents DNS poisoning)

Geo-IP and time restrictions:
Both use fail-closed semantics: if the check cannot be performed (e.g., GeoIP database unavailable), access is denied. Forward proxy has its own geo/time config that overrides [service] defaults, allowing different policies for proxy vs. web access.

PAC file security:
PAC endpoint requires authentication to return proxy-routed targets. Unauthenticated PAC returns DIRECT-only routing (no information leak). Username embedded in PAC for browser extension display only.

Rate and bandwidth limiting:
Per-user rate limiting prevents connection flooding. Per-user bandwidth limiting prevents single-user network saturation. Both return RetryAfter hints for well-behaved clients.

Troubleshooting
Common symptoms and diagnostic steps:
User cannot connect through forward proxy:
- Verify forward_proxy.enabled = true and port is correct
- Check bearer token: token_ttl may have expired, verify refresh is working
- Check user disabled status: directory user <username>
- Verify firewall rules allow the target: forwardproxy check <user> <target>
- Check geo restrictions: geo lookup <client_ip> and geo check <client_ip>
- Check time restrictions: ensure current time is within allowed window
- DNS resolution: verify target hostname resolves via dns test <hostname>

PAC file returns DIRECT for all traffic:
- PAC requires authentication; check session cookie is being sent
- Verify forward_proxy.pac.enabled = true
- Check use_firewall_targets = true and firewall rules exist for the user
- Inspect PAC content: curl -b session=<cookie> https://host/proxy.pac

Token refresh failing (extension shows expired):
- Check token_refresh_interval is shorter than token_ttl
- Verify /proxy/config endpoint is accessible with session cookie
- Check for clock skew between client and server
- Monitor token generation metrics via forwardproxy metrics

Rate limited (429 responses):
- Check rate_limit_per_user setting (requests/sec)
- Check bandwidth_limit_per_user setting
- Monitor per-user metrics: forwardproxy metrics <username>
- RetryAfter header indicates when to retry

Fingerprint binding failures:
- Verify fingerprint_binding = true in config
- Check fingerprint_binding_ttl matches session TTL
- JA4 fingerprint changes between requests indicate client switching
- Browser updates can change JA4 fingerprint (rebind needed)

Connection drops or timeouts:
- Check backend connectivity: net tcp <target_host:port>
- Check QUIC port (UDP) is not blocked by intermediate firewalls
- Verify TLS certificate: net tls <proxy_hostname:port>
- Check active connections: forwardproxy metrics to see connection counts

Geo-IP or time-based denial (403/451):
- Geo denial: geo lookup <ip> shows country, geo check <ip> shows policy
- Time denial: verify time_timezone is correct, check time_allow_hours
- Bypass CIDR: add client network to geo_bypass_cidr for exemption
- Forward proxy geo/time overrides [service] config if set

Metrics and monitoring:
- Active connections: forwardproxy metrics (cluster-wide)
- Per-user breakdown: forwardproxy metrics <username>
- Connection success/failure rates tracked via RecordMetrics
- Bytes sent/received per user for bandwidth accounting

Relationships
Module dependencies and interactions:
- Firewall: ACL rule evaluation determines which destinations each user group can reach. Firewall rules also drive PAC file generation when use_firewall_targets is enabled.
- Directory: User disabled check on every authentication call. Group membership embedded in token for ACL evaluation.
- Forward proxy service: Service layer handles HTTP CONNECT (TCP tunneling), CONNECT-UDP (UDP tunneling), and absolute-form HTTP requests (plain HTTP forwarding), plus HTTP endpoints (/proxy/config, /proxy/setup, /proxy.pac). Service calls this engine for auth, ACL, metrics.
- DNS: Hostname resolution for target destinations, with system DNS fallback.
- Rate limiting: Per-user request throttling and bandwidth controls.
- Geo-IP: Location-based access restrictions. Forward proxy can override [service] geo config with its own settings.
- Sessions: Session cookie used for initial token generation. Fingerprint binding ties proxy session to TLS fingerprint.
- Configuration: Hot-reload of rate limits, bandwidth limits, geo/time restrictions, PAC settings. Token TTL changes apply to new tokens only.
- Telemetry: Structured logging for authentication, ACL decisions, rate limit events. Metrics for active connections, bytes transferred, token generation.
- Auto TLS: ACME certificate for proxy hostname when using a separate hostname (add to acme.additional_domains).

Network Listener
High-performance network listeners with composite client fingerprinting, session affinity, and TLS security
Overview
The listener module manages all network interfaces for HexonGateway, providing high-performance connection handling with built-in security features. It supports:
- TCP with optional TLS, HTTP/1.1, HTTP/2, HTTPS, UDP, and gRPC over HTTP/2
- Composite client fingerprinting combining JA4 (TLS), HTTP/2, and TCP/IP stack layers
- Session affinity routing based on composite fingerprint hash for cluster-wide persistence
- Malformed TLS blocking (enabled by default) to reject invalid ClientHello messages
- Graceful shutdown with configurable connection draining timeout
- Platform-specific TCP optimizations: Fast Open (RFC 7413) and Window Scaling
- Per-SNI mTLS with dynamic CA rotation
- Proxy mode for deployment behind CDN/load balancer with header-based client identification
- QUIC/HTTP/3 fingerprinting with multi-packet reassembly and replay protection
- Connection metrics with batched flush (every 100ms or on close) for low overhead
- HXEP (Hexon Edge Protocol) for real client IP through edge proxies and SNAT
- Correlation ID propagation for end-to-end distributed tracing
- HTTP middleware chain: security headers, geo restriction, time restriction, rate limiting
- Proof-of-Work challenge middleware for bot protection
- Configurable Server header (HexonGateway/<version>, can be disabled)
Fingerprint Components:
JA4 (TLS)      : t13004d_[cipher_hash]_[ext_hash] — extracted at Accept() before TLS handshake
HTTP/2         : h2_[settings_hash] — SETTINGS frame parameters and pseudo-header ordering
TCP/IP Stack   : tcp_[window_mss_ttl_hash] — p0f-style OS identification
Composite Hash : SHA256(ja4|http2|tcp) truncated to 32 hex chars
JA4Q (QUIC)    : QUIC transport parameters fingerprint for HTTP/3 clients

Fingerprint data is stored in a unified structure across all protocols (HTTP/1.1, HTTP/2, HTTP/3), providing a consistent interface for rate limiting, session affinity, and client identification.
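The composite hash construction above (SHA256 over the three layer fingerprints joined with "|", truncated to 32 hex characters) can be sketched directly. This Python illustration assumes the documented formula; the input fingerprint strings are placeholders.

```python
import hashlib

def composite_hash(ja4: str, http2_fp: str, tcp_fp: str) -> str:
    # SHA256(ja4|http2|tcp), truncated to 32 hex chars (128 bits),
    # as described in the fingerprint components table.
    digest = hashlib.sha256(f"{ja4}|{http2_fp}|{tcp_fp}".encode()).hexdigest()
    return digest[:32]
```

The same hash drives rate limiting and session affinity, so two layers changing while one stays fixed still yields a distinct client identity.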
Config
Core configuration under [service] in config TOML:
[service]
hostname = "auth.example.com"        # Service hostname
tls_cert = "/path/to/cert.pem"       # TLS certificate path
tls_key = "/path/to/key.pem"         # TLS private key path
handshake_timeout = 10               # TLS handshake timeout in seconds (default: 10)
block_malformed_tls = true           # Reject invalid TLS ClientHello (default: true)
max_header_bytes = 65536             # Max ClientHello size in bytes (default: 64KB)
disable_server_header = false        # Suppress HexonGateway/<version> header (default: false)
correlation_id_header = "X-Hexon-ID" # Correlation ID header name (default: "X-Hexon-ID")
cookie_name = "hexon"                # Session cookie name (default: "hexon")

# Mutual TLS
mtls_mode = "none"                   # "none", "optional", "mandatory" (default: "none")

# HTTP/2 settings
http2_enable = true                  # Enable HTTP/2 (default: true)
http2_maxstreams = 1000              # Max concurrent streams per connection
http2_maxframesize = 1048576         # Max frame payload size (default: 1MB)
http2_idletimeout = 120              # Idle timeout in seconds
http2_keepalive = true               # Enable HTTP/2 keepalive
http2_keepaliveseconds = 30          # Keepalive interval in seconds

# Fingerprint cache
fingerprint_max_entries = 10000      # Max entries in addr fingerprint map (default: 10000)
fingerprint_ttl_seconds = 300        # Base TTL in seconds (default: 5 min)
fingerprint_cleanup_seconds = 30     # Cleanup sweep interval (default: 30s)
fingerprint_max_entries_per_ip = 10  # Max fingerprints per IP, anti-abuse (default: 10)

# JA4 parsing security limits
ja4_max_extensions = 200             # Max TLS extensions to parse (default: 200, typical: 10-30)
ja4_max_sigalgs = 100                # Max signature algorithms to parse (default: 100)

# HTTP/2 fingerprint cache
http2_fingerprint_cache_size = 10000     # Max entries (default: 10000)
http2_fingerprint_cache_evict_pct = 10   # % of oldest entries to evict when full (1-50)

# QUIC fingerprint reassembly
quic_fingerprint_reassembly_max_packets = 10  # Max packets for reassembly (default: 10)
quic_fingerprint_reassembly_max_bytes = 15360 # Max reassembly buffer (default: 15KB)
quic_fingerprint_reassembly_timeout_s = 5     # State timeout (default: 5s)
quic_max_crypto_frame_offset = 65536          # Max CRYPTO frame offset (default: 64KB)

# Proxy mode (behind CDN/LB)
proxy = false                                   # Enable proxy mode (default: false)
proxy_cidr = ["10.0.0.0/8"]                     # Trusted proxy IPs (REQUIRED when proxy=true)
proxy_header_clientip = "X-Forwarded-For"       # Real client IP header (REQUIRED when proxy=true)
proxy_header_clientcert = "SSL_CLIENT_CERT"     # Client certificate header (optional)
proxy_header_clientfingerprint = "CF-Ray"       # Client fingerprint header (optional)
proxy_header_traceid = "X-Request-ID"           # Trace ID header for distributed tracing (optional)

# Geo restriction (router-level middleware)
geo_enabled = false                  # Enable geo restrictions (default: false)
geo_database = "GeoLite2-Country.mmdb"
geo_asn_database = "GeoLite2-ASN.mmdb"
geo_allow_countries = []             # ISO 3166-1 alpha-2 codes (empty = all)
geo_deny_countries = []              # Deny takes precedence over allow
geo_allow_asn = []                   # ASN allow list
geo_deny_asn = []                    # ASN deny list
geo_bypass_cidr = []                 # CIDRs that skip geo checks
geo_deny_code = 403                  # HTTP status for denials
geo_deny_message = ""                # Custom denial message

# Time restriction (router-level middleware)
time_enabled = false                 # Enable time restrictions (default: false)
time_bypass_cidr = []                # CIDRs that skip time checks
time_default_timezone = "UTC"        # Default timezone (IANA format)

[protection]
rate_limit = "100/1m"            # Requests per interval (empty = disabled)
rate_limit_type = "fingerprint"  # "fingerprint" or "ip" (default: "ip")
rate_limit_bantime = "5m"        # Ban duration when limit exceeded

Fingerprint adaptive TTL (based on cache utilization):
Normal (<60%): base TTL (default 5 min)
Medium (60-80%): base TTL / 2 (min 2 min)
High (>80%): base TTL / 5 (min 1 min)
LRU eviction triggers when TTL cleanup is insufficient.

# HXEP (Hexon Edge Protocol)
hexon_edge_protocol = false  # Enable HXEP header parsing (default: false)
hexon_edge_cidr = [          # Trusted CIDRs for HXEP (default: trust all)
  "10.244.0.0/16",           # Kubernetes pod network
]

HXEP (Hexon Edge Protocol) — real client IP through edge proxies:
When traffic flows: External Client → Edge Proxy → Gateway (via k8s Service/LB), the edge proxy prepends a binary header with the original client IP and port.

Format: Magic "HXEP" (4B) + Type (1B: 0x04=IPv4, 0x06=IPv6) + IP (4/16B) + Port (2B)

Required for: geo-IP accuracy, rate limiting, IDS, and RADIUS NAS identification when the gateway sits behind an edge proxy or Kubernetes service with SNAT.

Config:
- service.hexon_edge_protocol = true → enables HXEP parsing on all listeners
- service.hexon_edge_cidr = [...] → only these source CIDRs are trusted for HXEP
  Default: ["0.0.0.0/0", "::/0"] (trust all) — restrict to pod CIDR in production
- Packets from untrusted CIDRs: HXEP header stripped, socket address used
- Set automatically via Helm when edge.enabled=true

Protocols: TCP (parsed on first read, before TLS handshake), UDP (PacketConn wrapper), HTTP/3 QUIC (HXEP wrapping applied transparently, GSO/ECN/GRO OOB data preserved).

Used by: reverse proxy, VPN (IKEv2), RADIUS (RADSEC + UDP), SSH bastion.

Hot-reloadable: TLS certificates, mTLS CA pool, proxy mappings, geo/time rules, rate limit settings, fingerprint cache limits. Cold (restart required): listen addresses, HTTP/2 enable, proxy mode toggle, HXEP settings.
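The HXEP header layout described above can be exercised with a small Python sketch. The magic, type byte, and field sizes follow the documented format; network byte order for the port, the helper names, and the trusted/untrusted stripping behavior as modeled here are assumptions.

```python
import ipaddress, struct

MAGIC = b"HXEP"

def wrap_hxep(client_ip: str, port: int, payload: bytes) -> bytes:
    # Magic (4B) + Type (1B: 0x04=IPv4, 0x06=IPv6) + IP (4/16B) + Port (2B).
    ip = ipaddress.ip_address(client_ip)
    typ = 0x04 if ip.version == 4 else 0x06
    return MAGIC + bytes([typ]) + ip.packed + struct.pack("!H", port) + payload

def unwrap_hxep(data: bytes, trusted: bool):
    if not data.startswith(MAGIC) or len(data) < 11:
        return None, data  # no HXEP header: caller falls back to socket address
    ip_len = 4 if data[4] == 0x04 else 16
    header_len = 4 + 1 + ip_len + 2
    if not trusted:
        # Untrusted source CIDR: strip the header, keep the socket address.
        return None, data[header_len:]
    ip = ipaddress.ip_address(data[5:5 + ip_len])
    (port,) = struct.unpack("!H", data[5 + ip_len:header_len])
    return (str(ip), port), data[header_len:]
```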
Troubleshooting
Common symptoms and diagnostic steps:
TLS handshake failures:
- Malformed ClientHello blocked: check 'logs search "Malformed TLS"' for details
- block_malformed_tls=true rejects missing SNI, invalid TLS version, oversized ClientHello
- ClientHello too large: check max_header_bytes setting (default 64KB)
- TLS version rejected: only 0x0301-0x0304 (TLS 1.0-1.3) accepted
- mTLS certificate popup on proxy routes: check per-SNI mTLS config, set mtls=false on mapping
- CA rotation issues: 'certs list' to verify CA bundle, check 'logs search "CA rotation"'
- Start with: 'diagnose domain <hostname>' for cross-subsystem check

Fingerprint cache exhaustion:
- High memory from fingerprint storage: check fingerprint_max_entries setting
- Adaptive TTL kicking in too aggressively: increase fingerprint_ttl_seconds
- Per-IP abuse: 'logs search "fingerprint limit exceeded"' to identify attackers
- fingerprint_max_entries_per_ip controls anti-abuse threshold (default: 10)
- LRU eviction warnings: 'logs search "evict"' to monitor cache pressure
- Check: 'metrics prometheus fingerprint' for cache utilization metrics

Session affinity not working:
- Verify cluster_affinity=true in global config
- Loopback connections (127.0.0.1, ::1) bypass affinity by design
- VPN clients bypass affinity (checked against vpn.network.subnet CIDR)
- Circuit breaker open for target node: 'proxy circuits' to check breaker states
- No TLS = no fingerprint = no affinity: ensure clients connect via HTTPS
- Check: 'cluster status' for node health, 'health components' for listener status

Proxy mode issues (behind CDN/LB):
- 403 Forbidden: source IP not in proxy_cidr, check 'logs search "CIDR"'
- 400 Bad Request: missing client IP header, verify proxy_header_clientip config
- Rate limiting all users as one: JA4 unavailable in proxy mode, use proxy_header_clientfingerprint
- Wrong client IP: X-Forwarded-For uses FIRST IP only (original client, not proxy chain)
- Header injection: ensure proxy_cidr is restricted to actual proxy IPs
- Distributed tracing broken: configure proxy_header_traceid for end-to-end correlation
- mTLS through proxy: set proxy_header_clientcert and mtls_mode="optional" or "mandatory"

QUIC/HTTP/3 fingerprint failures:
- Large ClientHello spanning packets: check quic_fingerprint_reassembly_max_packets
- Reassembly timeout: increase quic_fingerprint_reassembly_timeout_s for slow networks
- CRYPTO frame offset too large: quic_max_crypto_frame_offset default 64KB should suffice
- Connection ID too long (>20 bytes): RFC 9000 violation, likely malicious traffic

Rate limiting misbehavior:
- All clients sharing one rate bucket: check rate_limit_type ("fingerprint" vs "ip")
- Composite fingerprint unavailable: falls back to IP automatically
- Per-route bypass not working: verify disable_rate_limit=true on the proxy mapping
- Cluster-wide consistency: rate limits use distributed memory cache
- Check: 'ratelimit stats' for current rate limiting state, 'metrics ratelimit' for counters

HXEP (Hexon Edge Protocol) issues:
- HXEP not resolving real client IP: verify service.hexon_edge_protocol = true
- Wrong client IP after HXEP: verify source IP falls within service.hexon_edge_cidr
- "HXEP header stripped": source IP is outside trusted CIDRs — add pod/edge CIDR
- Geo/rate limiting sees edge proxy IP instead of client: HXEP not enabled or CIDR mismatch
- RADIUS NAS rejected after HXEP: real NAS IP doesn't match any [[radius.client]] CIDR
- VPN IKEv2 sees wrong source: same HXEP config applies — check hexon_edge_cidr
- Default trust-all CIDRs in production: security risk — restrict to actual pod network CIDR
- Config: 'config show service' and check hexon_edge_protocol + hexon_edge_cidr fields
- Helm sets HXEP automatically when edge.enabled=true in values.yaml

Connection metrics missing:
- Metrics batched (flush every 100ms or on close): short-lived connections may lag
- Check: 'health components' for listener health status
- 'metrics prometheus listener' for per-listener connection counters

Geo/time restriction issues:
- Geo blocking wrong country: verify MaxMind database is current
- Bypass CIDR not working: geo_bypass_cidr checked before country/ASN rules
- Time window mismatch: verify IANA timezone spelling (e.g., "America/New_York")
- Overnight ranges supported: "22:00-06:00" spans midnight correctly
- Check: 'geo lookup <ip>' to verify classification, 'geo timecheck <ip>' for time rules

Architecture
Connection lifecycle:
- Client connects to TCP socket
- First bytes peeked to detect TLS, extract JA4 fingerprint + SNI
- TCP fingerprint extracted (window size, TTL, MSS, options ordering)
- Session affinity check: fingerprint hash maps to a cluster node
- If affinity target is a remote node: forward connection to that node
- If local: proceed with TLS handshake (per-SNI mTLS selection)
- If HTTP/2: extract HTTP/2 fingerprint from SETTINGS frame
- Compute composite hash: SHA256(ja4|http2|tcp) truncated to 32 hex chars
- Assign correlation ID, begin connection tracking
- HTTP middleware chain: telemetry -> client identification -> connection info -> security headers -> geo restriction -> time restriction -> rate limit -> handler
- Handler processes request, correlation ID propagates as trace_id across modules
- Metrics flushed on connection close
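The middleware ordering in the lifecycle above amounts to function composition: each middleware wraps the next, with the first-listed stage running first. A minimal Python sketch (illustrative names, not the actual implementation):

```python
# Sketch: compose a handler with middlewares so the first listed runs first.
def chain(handler, *middlewares):
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler

def make_mw(name, trace):
    # Each middleware records its name, then delegates to the next stage.
    def mw(next_handler):
        def wrapped(request):
            trace.append(name)
            return next_handler(request)
        return wrapped
    return mw

trace = []
order = ["telemetry", "client_identification", "connection_info",
         "security_headers", "geo_restriction", "time_restriction", "rate_limit"]
app = chain(lambda req: "ok", *[make_mw(n, trace) for n in order])
```

A denying stage (geo, time, rate limit) would simply return a response instead of calling `next_handler`, short-circuiting the rest of the chain.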
Fingerprint extraction pipeline:
Accept-level (before TLS): JA4 from ClientHello peek (zero-copy, buffered I/O)
TLS callback: per-SNI mTLS mode selection
Post-handshake: HTTP/2 SETTINGS fingerprint from connection preface
TCP layer: p0f-style OS fingerprint from socket options (window, MSS, TTL)
QUIC path: JA4Q from Initial packet, transport params fingerprint, multi-packet reassembly

GSO/ECN/GRO preservation:
All UDP wrappers (HXEP edge protocol and JA4Q fingerprint) preserve kernel offload capabilities so that QUIC can use:
- GSO (Generic Segmentation Offload): send 64KB in one syscall, kernel splits into MTU packets
- GRO (Generic Receive Offload): kernel coalesces packets, fewer syscalls on receive
- ECN (Explicit Congestion Notification): congestion signals via IP header bits
Without these, QUIC silently falls back to one syscall per packet. This affects both HTTP/3 reverse proxy and QUIC connector listeners.

Fingerprint memory protection:
Address fingerprint map: configurable max entries (default 10,000) with adaptive TTL
Per-IP limit: configurable (default 10), oldest replaced on overflow
LRU eviction: sorts by timestamp, evicts oldest when TTL cleanup insufficient
HTTP/2 cache: configurable size with percentage-based LRU eviction (1-50%)
All maps use lock-free concurrent reads for performance

Proxy mode flow:
Step 1: Validate source IP against configured proxy_cidr
Step 2: Extract trace ID from proxy header, update correlation context
Step 3: Extract and sanitize client IP (first IP from comma-separated list)
Step 4: Fingerprint priority: dedicated header > client cert hash > client IP
Step 5: Update context with real client identifiers for downstream modules

mTLS CA rotation flow:
1. ACME CA rotates, triggers listener update
2. CA pool rebuilt atomically (config CA + ACME CA merged)
3. HTTPS listeners gracefully restarted
4. Existing connections drain gracefully, new connections get fresh CA pool

Graceful shutdown sequence:
1. Stop accepting new connections on all listeners
2. Close all listener sockets
3. Wait for active connections up to configurable timeout
4. Cancel contexts for remaining connections
5. Force-close any connections still open after timeout

Performance characteristics:
- Pooled slice allocations reduce GC pressure during fingerprint extraction
- Buffered I/O to minimize syscalls
- Metrics batched to reduce overhead (flush every 100ms)
- TCP Fast Open: 15-30% latency reduction for repeat clients (Linux 3.7+, macOS)
- TCP Window Scaling: 20-40% throughput improvement for large transfers
- SO_REUSEPORT on Linux for load balancing across cores

Relationships
Module dependencies and interactions:
- Proxy: Provides per-SNI mTLS lookup. Listener provides fingerprint and client IP context consumed by proxy for rate limiting, identity headers, and session affinity.
- Sessions: Listener middleware manages session cookie extraction. Session validation uses correlation IDs propagated through listener context.
- Certificates: TLS termination uses certificates from the cert module. Per-mapping certificates loaded via SNI callback. CA pool for mTLS verification rebuilt atomically on ACME CA rotation.
- WAF: WAF rules applied in middleware chain after listener accepts connection. Fingerprint available in context for WAF correlation.
- X.509 authentication: mTLS mode controls TLS client auth level. In proxy mode, client certificates injected from HTTP header. Certificate validation uses dynamic CA pool.
- Rate limiting: Middleware reads composite fingerprint or client IP from context. Composite fingerprint (JA4+HTTP/2+TCP) or IP-based, configurable per route.
- Geo restriction: Middleware at router level uses client IP from context with MaxMind GeoLite2 databases for country/ASN lookup.
- Time restriction: Middleware after geo restriction uses client country for timezone-aware time window matching.
- VPN: VPN clients identified by subnet CIDR to bypass session affinity. Prevents VPN tunnel connections from being forwarded to other cluster nodes.
- Cluster affinity: Fingerprint hash selects cluster node for session routing. Node health checked before forwarding. Forwarded connections use inter-node communication for transparent routing.
- DNS: Listener does not directly use DNS, but proxy backends resolved via DNS module.
- Distributed tracing: Correlation IDs generated at listener level propagate as trace_id through all operations, enabling end-to-end tracing across cluster nodes.
- Connection pool: Backend connection management operates downstream of listener. Listener handles inbound connections; connection pool handles outbound to backends.

TCP/TLS Proxy
TCP/TLS proxy with mTLS authentication, passthrough mode, protocol-aware health checks, and geo/time-based access control
Overview
The TCP proxy service enables secure access to private TCP services such as databases (MySQL, PostgreSQL), caches (Redis, Memcached), and message queues. It operates in two modes: mTLS-authenticated mode for identity-aware access control, and passthrough mode for pure TCP load balancing.
mTLS Mode (auth = true, default):
- Server presents Hexon TLS certificate (per-mapping or global)
- Client presents an enrolled X.509 certificate
- Certificate validation: chain verification, expiration, OCSP/CRL checks
- Instant revocation via serial number index (no CRL propagation delay)
- Group-based authorization with allow/deny semantics
- Per-user rate limiting and connection limits
- Backend can be plain TCP or TLS (configured separately)
Passthrough Mode (auth = false):
- Pure TCP relay without TLS termination or protocol knowledge
- Raw bytes relayed between client and backend
- No authentication or group-based authorization
- Rate limiting applies per client IP instead of per user
- Useful for services that handle their own auth and TLS
Load balancing capabilities:
- Strategies: round_robin, weighted, least_connections, hash, random, maglev
- Hash keys: cert_serial (default), cn (username), ip (client address)
- Protocol-aware health checks: TCP, MySQL, PostgreSQL, Redis
- Circuit breaker with configurable error threshold and recovery
- Outlier detection with automatic backend ejection
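A minimal Python sketch of the hash strategy: a stable digest of the configured key (cert_serial, cn, or ip) pins a client to the same backend across connections. The real implementation also offers maglev and health-aware selection; this only illustrates the stable-mapping idea, and the modulo scheme here is an assumption.

```python
import hashlib

def pick_backend(backends: list[str], key_value: str) -> str:
    # Stable hash of the lb_hash_key value (e.g. the client cert serial)
    # so the same client always lands on the same backend while the
    # backend set is unchanged.
    h = int.from_bytes(hashlib.sha256(key_value.encode()).digest()[:8], "big")
    return backends[h % len(backends)]
```

Note that plain modulo hashing reshuffles most clients when a backend is added or removed; that is the problem maglev-style consistent hashing addresses.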
Additional access control:
- Geo-IP restrictions per mapping (country, ASN allow/deny with bypass CIDRs)
- Time-based restrictions per mapping (day/hour windows with timezone support)
- Time windows with per-country/CIDR overrides for global deployments
- Geo and time checks execute BEFORE TLS handshake for fast rejection
Hot-reload support:
- New mappings created immediately
- Removed mappings gracefully drained then stopped
- Updated mappings drain existing connections, restart with new config
- Unchanged mappings preserve existing connections
- Config change detection via hash comparison (no-op for identical config)
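The hash-comparison step above can be sketched as canonicalize-then-digest: serialize the mapping with a stable key order so that a reload with identical content produces an identical hash and becomes a no-op. A minimal Python illustration; the actual hashing scheme is not documented here.

```python
import hashlib, json

def mapping_hash(mapping: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) so field order in
    # the source file does not affect the digest.
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```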
Connection draining during shutdown or config updates:
- Stop accepting new connections on affected mapping
- Wait for existing connections to complete (configurable timeout)
- Force-close remaining connections after timeout
Config
Core configuration under [tcp_proxy] and [[tcp_proxy.mapping]]:
[tcp_proxy]
enabled = true                    # Enable TCP proxy service
cert = "/etc/hexon/tcp-proxy.crt" # Default TLS certificate (path or inline PEM)
key = "/etc/hexon/tcp-proxy.key"  # Default TLS private key (path or inline PEM)
buffer_size = 32768               # TCP relay buffer size in bytes (default: 32KB)
connect_timeout = "10s"           # Backend connection timeout (default: 10s)
idle_timeout = "5m"               # Idle connection timeout (default: 5m)
max_connection_duration = "24h"   # Maximum connection lifetime (default: 24h)
max_connections_per_user = 100    # Max concurrent connections per user (0 = unlimited)

[[tcp_proxy.mapping]]
name = "mysql-prod"                  # Display name for the mapping
listen_port = 3306                   # TCP port to listen on
auth = true                          # mTLS mode (default: true); false = passthrough
cert = "/etc/hexon/mysql-proxy.crt"  # Per-mapping TLS certificate (overrides global)
key = "/etc/hexon/mysql-proxy.key"   # Per-mapping TLS private key
protocol_hint = "mysql"              # Protocol hint for logging and metrics
backends = ["mysql-1:3306", "mysql-2:3306"]  # Backend addresses

Load balancing options:
lb_strategy = "round_robin"  # round_robin, weighted, least_connections, hash, random, maglev
lb_weights = [5, 3, 2]       # Weights for weighted strategy (must match backends count)
lb_hash_key = "cert_serial"  # Hash key: cert_serial (default), cn, ip

Backend TLS options (three modes):
backend_tls = false                      # Plain TCP to backend (default)
backend_tls = true                       # TLS to backend (encrypted)
backend_tls_verify = true                # Verify backend certificate (default: false)
backend_tls_ca = "/path/to/ca.pem"       # Custom CA for backend verification
backend_tls_sni = "db.internal"          # SNI for backend cert validation
backend_tls_cert = "/path/to/client.pem" # Client cert for mTLS to backend
backend_tls_key = "/path/to/client.key"  # Client key for mTLS to backend
backend_tls_min_version = "1.3"          # Min TLS version to backend (default: 1.3)

Authorization options (mTLS mode only):
allowed_groups = ["dba", "developers"]  # Allow these groups (OR logic, empty = all)
denied_groups = ["contractors"]         # Deny these groups (takes precedence over allow)
allowed_subnets = ["10.0.0.0/8"]        # Client IP CIDR restrictions

Health check options:
health_check_enabled = true    # Enable health checks (default: true)
health_check_interval = "10s"  # Check interval (default: 10s)
health_check_timeout = "5s"    # Check timeout (default: 5s)
health_check_type = "tcp"      # tcp, mysql, postgresql, redis

Circuit breaker options:
circuit_breaker_enabled = true
circuit_breaker_error_threshold = 0.5  # Trip at 50% failure rate (0.0-1.0)
circuit_breaker_window = "10s"         # Error tracking window
circuit_breaker_fallback_time = "30s"  # Wait before half-open state

Outlier detection options:
outlier_detection_enabled = true
outlier_detection_interval = "10s"      # Analysis interval
outlier_detection_failure_rate = 50     # Eject when failure rate exceeds 50%
outlier_detection_min_requests = 10     # Minimum requests before analysis
outlier_detection_ejection_time = "30s" # Base ejection duration
outlier_detection_max_ejection = 50     # Max percentage of backends to eject

Rate limiting:
rate_limit = "100/1m"  # Connections per minute per user (mTLS) or per IP (passthrough)
max_connections = 500  # Max concurrent connections for this mapping (0 = unlimited)

Per-mapping overrides:
buffer_size = 65536             # Override global buffer size
connect_timeout = "5s"          # Override global connect timeout
idle_timeout = "1m"             # Override global idle timeout
max_connection_duration = "1h"  # Override global max duration

Geo-IP restrictions (both modes):
geo_enabled = true                  # Enable geo-IP restrictions
geo_allow_countries = ["US", "CA"]  # Allow only these countries (ISO 3166-1 alpha-2)
geo_deny_countries = ["CN", "RU"]   # Deny these countries (takes precedence)
geo_allow_asn = ["AS15169"]         # Allow specific ASNs
geo_deny_asn = ["AS12345"]          # Deny specific ASNs
geo_bypass_cidr = ["10.0.0.0/8"]    # Skip geo checks for these CIDRs

Time-based restrictions (both modes):
time_enabled = true                 # Enable time restrictions
time_timezone = "America/New_York"  # Default timezone (IANA format)
time_allow_days = ["Mon","Tue","Wed","Thu","Fri"]  # Allowed days
time_deny_days = ["Sat", "Sun"]     # Denied days (takes precedence)
time_allow_hours = "09:00-18:00"    # Allowed hours (24h format)
time_deny_hours = "00:00-06:00"     # Denied hours (takes precedence)
time_bypass_cidr = ["10.0.0.0/8"]   # Skip time checks for these CIDRs

Time windows (per-country/CIDR overrides within a mapping):
[[tcp_proxy.mapping.time_windows]]
countries = ["US"]             # Apply this window to US clients
timezone = "America/New_York"
allow_days = ["Mon","Tue","Wed","Thu","Fri"]
allow_hours = "09:00-21:00"
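Time window evaluation, including overnight ranges that span midnight (e.g. "22:00-06:00"), can be sketched in Python. The timezone conversion and range split follow the config shown above; treating the end of the range as exclusive is an assumption.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

def in_window(now: datetime, tz: str, allow_hours: str) -> bool:
    # Convert to the window's timezone, then compare against "HH:MM-HH:MM".
    start_s, end_s = allow_hours.split("-")
    start, end = time.fromisoformat(start_s), time.fromisoformat(end_s)
    local = now.astimezone(ZoneInfo(tz)).time()
    if start <= end:
        return start <= local < end        # same-day window, e.g. 09:00-18:00
    return local >= start or local < end   # overnight window, e.g. 22:00-06:00
```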
Client CA sources (mTLS mode):
Same as HTTP proxy for consistent PKI behavior:
- authentication.x509.ca_pem (if configured) for external PKI certificates
- ACME CA bundle (always) for Hexon-enrolled certificates
No separate client_ca setting; use authentication.x509.ca_pem for external CAs.

Troubleshooting
Common symptoms and diagnostic steps:
Connection refused on listen port:
- TCP proxy not enabled: check [tcp_proxy] enabled = true
- Port conflict: another process already bound to listen_port
- Mapping not loaded: check config syntax with 'config validate'
- Firewall blocking: 'firewall rules' to check network-level rules

mTLS handshake failure:
- Client certificate not enrolled: must be enrolled via Hexon X.509 enrollment
- Certificate expired: check certificate validity dates
- Wrong CA: client cert must be signed by ACME CA or configured external CA
- Certificate revoked: instant revocation via serial index; check 'certs x509 list'
- TLS version mismatch: mTLS mode requires TLS 1.3
- Start with: 'diagnose user <username>' for cross-subsystem check

Connection denied after successful TLS:
- Group authorization failed: user is missing a required group in allowed_groups
- Denied group match: user is in a denied_groups group (takes precedence)
- Subnet restriction: client IP not in allowed_subnets
- Rate limit exceeded: check 'metrics prometheus tcp_proxy_denied_total'
- Per-user connection limit: max_connections_per_user reached
- Check user access: 'directory user <username>' for group membership

Backend connection failures:
- Backend unreachable: 'net tcp <backend:port>' to verify connectivity
- All backends unhealthy: 'proxy health' for health check status
- Circuit breaker open: 'proxy circuits' for breaker states
- Backend TLS issues: verify backend_tls_ca and backend_tls_sni settings
- DNS resolution: 'dns test <backend-hostname>' to verify

Geo-IP access denied:
- Client country not in geo_allow_countries: 'geo lookup <ip>' to check
- Client ASN in geo_deny_asn: deny takes precedence over allow
- Geo checks run BEFORE the TLS handshake: the connection is dropped at the TCP level
- Internal networks: add to geo_bypass_cidr to skip geo checks

Time-based access denied:
- Outside allowed hours/days: 'geo timecheck <ip>' for current status
- Timezone mismatch: verify time_timezone is a correct IANA timezone
- Time window overrides: check per-country/CIDR time_windows config
- Time checks run BEFORE the TLS handshake: the connection is dropped at the TCP level

Slow connections or high latency:
- Buffer size too small: increase buffer_size for high-throughput workloads
- Backend health degrading: check health check metrics
- Connection pool exhaustion: check the max_connections limit
- Outlier detection ejecting healthy backends: review outlier thresholds
- Circuit breaker flapping: check circuit_breaker_window and error_threshold

Connection drops after idle period:
- idle_timeout too short: increase it for long-running database sessions
- max_connection_duration reached: increase it for persistent connections
- Backend idle timeout: the backend may close idle connections independently

Passthrough mode issues:
- No TLS termination: Hexon relays raw bytes and cannot inspect traffic
- No user identity: rate limiting uses the client IP, not a username
- Health checks: only TCP health checks work in passthrough mode
- Hash strategy: uses the client IP since no certificate serial is available

Client connectivity (mTLS mode requires a TLS tunnel):
- Standard clients (mysql, psql) lack mTLS support in the required format
- Use socat for ad-hoc tunnels: socat TCP-LISTEN:local,fork OPENSSL:host:port,...
- Use stunnel for persistent daemon-style tunnels
- The client certificate must be in PEM format from Hexon X.509 enrollment

Relationships
Module dependencies and interactions:
- loadbalancer: Pool management, backend selection, health checks (TCP, MySQL, PostgreSQL, Redis), circuit breakers, outlier detection. Multi-algorithm support (round-robin, weighted, least-connections, hash, random, Maglev).
- x509 (authentication): Client certificate validation in mTLS mode. Chain verification, expiration, OCSP/CRL checks, instant revocation via serial index. Uses the same client CA sources as the HTTP proxy (authentication.x509.ca_pem + ACME CA).
- directory: User information retrieval after successful mTLS authentication. Group membership lookup for allow/deny group authorization.
- geoaccess: Geo-IP restriction enforcement per mapping. Country and ASN allow/deny with bypass CIDRs. Checks execute before the TLS handshake for fast rejection of unauthorized connections.
- timeaccess: Time-based restriction enforcement per mapping. Day/hour windows with timezone support and per-country/CIDR overrides. Also checked before the TLS handshake.
- firewall: Network-level access rules applied before TCP proxy routing.
- certificates: TLS certificate management for proxy listeners. Per-mapping or global certificate selection. Backend TLS configuration for encrypted upstream connections.
- hotreload: Configuration change detection via file watcher or SIGHUP. Mappings are diffed by hash to detect actual changes. Unchanged mappings preserve existing connections; changed mappings are drained and restarted.
- sessions: No direct dependency. The TCP proxy uses mTLS certificate-based authentication, not session cookies.
- proxy (HTTP): Shares client CA trust configuration for consistent PKI behavior. No runtime dependency; they operate on different ports/protocols.

Architecture
Connection flow for mTLS mode:
- Client initiates TCP connection to listen_port
- Geo-IP restrictions checked BEFORE TLS handshake (if geo_enabled)
- Time-based restrictions checked BEFORE TLS handshake (if time_enabled)
- TLS 1.3 handshake with mutual authentication (client certificate required)
- Certificate validated (chain verification, expiry, OCSP/CRL, revocation)
- User info retrieved from directory module (username, groups)
- ACL evaluated: allowed_groups (OR), denied_groups (precedence), allowed_subnets
- Rate limit checked per user via loadbalancer module
- Backend selected via loadbalancer module (strategy-based)
- Backend connection established (plain TCP, TLS, or mTLS depending on config)
- Bidirectional TCP relay started with configurable buffer_size
- Metrics and audit logs recorded on connection close
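The ACL step in the flow above can be sketched as follows. This is a minimal illustration of the documented precedence (denied_groups wins, allowed_groups is an OR-match, then allowed_subnets); the function and parameter names are hypothetical, not the gateway's actual API:

```python
import ipaddress

def acl_allows(user_groups, client_ip, *, allowed_groups, denied_groups, allowed_subnets):
    """Evaluate the per-mapping ACL in the documented order.

    denied_groups takes precedence; allowed_groups is an OR-match
    (empty list = any authenticated user); allowed_subnets restricts
    by client source IP (empty list = any subnet).
    """
    groups = set(user_groups)
    # 1. Deny wins over allow.
    if groups & set(denied_groups):
        return False
    # 2. allowed_groups is an OR: membership in any one listed group suffices.
    if allowed_groups and not (groups & set(allowed_groups)):
        return False
    # 3. Subnet restriction on the client source address.
    if allowed_subnets:
        ip = ipaddress.ip_address(client_ip)
        if not any(ip in ipaddress.ip_network(net) for net in allowed_subnets):
            return False
    return True
```

Note that a user in both an allowed and a denied group is rejected, matching the "denied_groups takes precedence" rule from the troubleshooting section.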
Connection flow for passthrough mode:
- Client connects via raw TCP to listen_port
- Geo-IP restrictions checked (if geo_enabled)
- Time-based restrictions checked (if time_enabled)
- Rate limit checked per client IP
- Backend selected via loadbalancer module (hash by client IP)
- Bidirectional TCP relay started
- Metrics recorded (no user identity, only client IP)
Security properties:
- TLS 1.3 only for mTLS mode client connections
- Post-quantum cryptography support (ML-KEM-768 + X25519)
- Certificate validation with OCSP stapling
- Instant revocation via serial index lookup (no CRL propagation delay)
- Per-user connection limits enforced at proxy level
- Geo and time checks before TLS handshake avoid expensive crypto for denied clients
Hot-reload mechanism:
- Config change detected (fsnotify or SIGHUP)
- New config parsed and validated
- Each mapping compared by hash to detect changes
- Unchanged mappings: no action, existing connections preserved
- New mappings: listener created, starts accepting connections
- Removed mappings: stop accepting, drain existing, close after timeout
- Updated mappings: drain existing connections, restart with new config
- All changes are atomic per-mapping (no partial updates)
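The diff step above can be sketched by hashing each mapping's canonical form. This is a minimal illustration under the assumption that mappings are plain key/value configs; the gateway's actual hashing scheme is not specified here:

```python
import hashlib
import json

def diff_mappings(old: dict, new: dict):
    """Classify mappings as added, removed, changed, or unchanged.

    Each mapping config (a plain dict here) is hashed over its
    canonical JSON form, so key ordering does not cause false diffs.
    """
    def h(cfg):
        return hashlib.sha256(json.dumps(cfg, sort_keys=True).encode()).hexdigest()

    added = [name for name in new if name not in old]
    removed = [name for name in old if name not in new]
    changed = [name for name in new if name in old and h(new[name]) != h(old[name])]
    unchanged = [name for name in new if name in old and h(new[name]) == h(old[name])]
    return added, removed, changed, unchanged
```

Only the "changed" set needs drain-and-restart; "unchanged" mappings keep their listeners and connections untouched.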