Connectivity
Client Access (HexonClient)
Transparent L3 network access via QUIC tunnels for CLI tools and native applications
Overview
The Client Access subsystem enables end users (DBAs, developers, operators) to transparently access internal resources through a lightweight QUIC tunnel. The HexonClient binary captures IP packets via TUN + gVisor netstack, extracts TCP flows, and dials each flow as a QUIC stream to the gateway.
The gateway side (this module) handles:
- QUIC listener on a dedicated port with ALPN “hexon-client” and TLS 1.3
- Two authentication paths: server-side device code (RFC 8628) for interactive use, JWT with RFC 5705 channel binding for reconnect/automation
- Per-user route derivation from firewall ACL rules (CIDR + Site routes)
- Virtual IP allocation from a dedicated subnet (default 100.64.208.0/22)
- Per-stream firewall ACL check before dialing backends
- Direct dial or connector tunnel routing based on HostAlias Site field
- Bidirectional splice with 32KB pooled buffers and half-close propagation
- DNS resolution on the control stream for split DNS
- DNS defense-in-depth: per-session O(1) rate limiting + ACL enforcement (RFC 8914)
- Token refresh with group-change detection and mid-session route updates
- Cluster-wide session tracking
This mirrors the connector architecture but reversed: the client opens streams, the gateway accepts and dials backends.
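The bidirectional splice mentioned above (32KB buffers, half-close propagation) can be sketched with plain OS sockets; this is a minimal illustration of the relay pattern using threads, not the gateway's actual QUIC stream implementation:

```python
import socket
import threading

BUF_SIZE = 32 * 1024  # 32KB, matching the buffer size described above


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy one direction until EOF, then propagate the half-close."""
    try:
        while True:
            data = src.recv(BUF_SIZE)
            if not data:  # peer sent FIN
                break
            dst.sendall(data)
    finally:
        try:
            # Half-close: signal "no more writes" without tearing down
            # the other direction, which may still be flowing.
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def splice(a: socket.socket, b: socket.socket) -> None:
    """Bidirectional relay: run both directions, return when both hit EOF."""
    t = threading.Thread(target=_pump, args=(b, a))
    t.start()
    _pump(a, b)
    t.join()
```

Each direction shuts down independently, so a client that finishes sending can still receive the backend's remaining response, which is the half-close behavior the gateway propagates across QUIC streams.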
Configuration
Configuration uses the [client_access] TOML section:
[client_access]
enabled = true
port = 8445
# network_interface = ""        # Bind to specific interface (falls back to service.network_interface)
# cert = ""                     # Dedicated TLS cert (falls back to SNI/auto-TLS)
# key = ""                      # Dedicated TLS key
subnet = "100.64.208.0/22"      # Virtual IP pool for clients (1022 addresses)
gateway_ip = "100.64.208.1"     # Gateway IP within subnet (excluded from pool)
dns_upstream = ["10.0.0.53"]    # DNS resolvers for client queries
dns_domains = []                # Additional DNS domains pushed to all clients
# cidrs = ["10.0.0.0/22"]       # Additional CIDR routes pushed to all clients
heartbeat_interval = "30s"      # Heartbeat frequency (session TTL = 3x this)
token_refresh_interval = "45m"  # Client token refresh interval
max_idle_timeout = "5m"         # QUIC idle timeout
max_clients = 1000              # Maximum concurrent client connections
max_streams_per_client = 100    # Maximum concurrent TCP streams per client
dns_rate_limit = 100            # Maximum DNS queries per second per client
# required_groups = ["engineers", "operators"]  # Empty = any authenticated user

The subnet must not overlap with the IKEv2 VPN subnet (default 100.64.0.0/22). Each connected client gets one virtual IP from the pool.
Routes pushed to clients come from two sources:
- Firewall host aliases: CIDRs and IPs from aliases matched by user groups
- Config-level cidrs: pushed to all clients regardless of group membership Both are merged (deduplicated) before sending in ClientAck.
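The merge described above can be sketched as follows; normalization via `ipaddress` is an assumption, as the gateway may aggregate CIDRs differently:

```python
import ipaddress


def merge_routes(alias_routes: list[str], config_cidrs: list[str]) -> list[str]:
    """Merge per-user alias routes with config-level cidrs, dropping duplicates.

    Alias-derived routes come first, config-level cidrs second, and any
    route already seen is skipped, so the client receives each CIDR once.
    """
    seen: set[ipaddress._BaseNetwork] = set()
    merged: list[str] = []
    for raw in list(alias_routes) + list(config_cidrs):
        net = ipaddress.ip_network(raw, strict=False)  # normalize notation
        if net not in seen:
            seen.add(net)
            merged.append(str(net))
    return merged
```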
Admin commands
Admin CLI commands:
clients list [--user=X]          List connected hexonclient sessions (cluster-wide)
clients show <session_id>        Show full session details (device, network, streams, traffic, timing)
clients disconnect <user> [id]   Disconnect all sessions for user, or a specific session [WRITE]

Bastion shell commands (self-service, filtered to own sessions):

clients                          List your active hexonclient sessions
clients list                     Same as above
clients disconnect [session_id]  Disconnect your own session(s)

Security
Two authentication paths (determined by whether client sends a token):
Device code flow (interactive — RFC 8628, same as bastion SSH):
- Client establishes QUIC/TLS 1.3 connection (ALPN: “hexon-client”)
- Client sends ClientAuth with empty token (signals device code request)
- Gateway initiates device code authorization (server-side, no HTTP from client)
- Gateway sends DeviceCodeChallenge: verification URI, user code, expiry
- Client displays QR code + clickable URL + user code
- Gateway polls the device code service until authorized, denied, or expired
- On authorization: gateway extracts claims (username, email, groups) from poll response
- Gateway checks required_groups, derives routes, allocates VIP
- Gateway sends ClientAck with VIP, routes, DNS, and JWT tokens for reconnection

Reconnected sessions use the JWT path below (no re-authentication needed).
JWT flow (reconnect / automation):
- Client establishes QUIC/TLS 1.3 connection (ALPN: “hexon-client”)
- Client sends ClientAuth: JWT + HMAC-SHA256(token, TLS exporter) proof
- Gateway validates JWT (extracts username, groups)
- Gateway verifies channel binding proof (RFC 5705 prevents token replay)
- Gateway checks required_groups (if configured): user must have ANY listed group
- Client sends ClientRegister with device metadata
- Gateway derives per-user routes from firewall ACL rules
- Gateway sends ClientAck with VIP, routes, DNS, token refresh interval
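The channel binding step above can be sketched as follows. The exporter bytes stand in for RFC 5705 Exported Keying Material, which both endpoints derive from the TLS session; the key/message ordering and exporter label are assumptions, not the documented wire format:

```python
import hashlib
import hmac


def channel_binding_proof(token: str, ekm: bytes) -> bytes:
    """Client side: bind the JWT to this specific TLS connection.

    `ekm` is the RFC 5705 exporter output; since it is unique per TLS
    session, a proof captured on one connection is useless on another.
    """
    return hmac.new(ekm, token.encode(), hashlib.sha256).digest()


def verify_proof(token: str, ekm: bytes, proof: bytes) -> bool:
    """Gateway side: recompute from its own exporter output and compare
    in constant time. A replayed token fails because the attacker's TLS
    session yields different keying material."""
    return hmac.compare_digest(channel_binding_proof(token, ekm), proof)
```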
Per-stream access control:
- Each QUIC stream carries a DialHeader (host, port, protocol)
- Gateway checks firewall access control (user groups vs target host/port/protocol)
- Denied streams get DialStatusDenied response immediately
- Only allowed streams proceed to backend dial
DNS defense-in-depth (same model as IKEv2 VPN DNS):
- Per-session rate limiting: O(1) time-bucketed rolling window (dns_rate_limit qps)
- DNS ACL: after resolve, firewall checks user groups vs host aliases
- ACL-denied queries return DNSStatusDenied (RFC 8914 REFUSED) — prevents information leakage
- ACL call failure fails open (dial-time ACL is the authoritative control)
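The O(1) time-bucketed rolling window above can be sketched like this; the bucket count and reset strategy are assumptions about one common way to implement it, not the gateway's internals:

```python
class BucketRateLimiter:
    """Rolling one-second window split into fixed time buckets.

    Each bucket records a count for one slice of the window. On every
    query we reset only the (single) stale slot we land in, so the work
    per query is bounded by the fixed bucket count: O(1).
    """

    def __init__(self, limit_qps: int, buckets: int = 10):
        self.limit = limit_qps
        self.n = buckets
        self.width = 1.0 / buckets      # bucket width in seconds
        self.counts = [0] * buckets
        self.stamps = [-1] * buckets    # absolute bucket index held by each slot

    def allow(self, now: float) -> bool:
        idx = int(now / self.width)     # absolute bucket index for `now`
        slot = idx % self.n
        if self.stamps[slot] != idx:    # slot holds an expired bucket: reset it
            self.stamps[slot] = idx
            self.counts[slot] = 0
        # Sum only slots still inside the rolling 1s window (fixed-size scan)
        total = sum(c for s, c in zip(self.stamps, self.counts) if idx - s < self.n)
        if total >= self.limit:
            return False
        self.counts[slot] += 1
        return True
```

With `dns_rate_limit = 100`, a session would get a `BucketRateLimiter(100)` and every DNS query would pass through `allow()` before resolution.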
Token refresh:
- Client sends TokenRefresh with new JWT + proof before token expires
- Gateway re-validates JWT and channel binding
- Gateway re-checks required_groups: if user lost membership, connection is terminated
- If groups changed: re-derive routes, send RouteUpdate with add/remove entries
- Bad token on refresh kills the connection (security boundary)
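The route re-derivation step above only needs to push a delta; a sketch of computing the RouteUpdate add/remove sets (field names here are illustrative, not the actual wire format):

```python
def route_update(old_routes: set[str], new_routes: set[str]) -> dict:
    """Diff the routes derived before and after a group change.

    Only the delta is sent, so routes the user keeps stay untouched
    on the client's TUN device mid-session.
    """
    return {
        "add": sorted(new_routes - old_routes),
        "remove": sorted(old_routes - new_routes),
    }
```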
Troubleshooting
Common symptoms and diagnostic steps:
Client cannot connect:
- Check listener: 'status summary' shows clientaccess listener state
- Check config: 'config show client_access' (enabled, port, subnet)
- Check required_groups: 'config show client_access' — user must be in listed groups (empty = any)
- Check certs: 'certs list' or 'diagnose domain <hostname>'
- Max clients reached: 'logs search clientaccess --level=warn'
- Group denied: 'logs search clientaccess --level=warn' shows "group access denied"

Client connected but cannot reach services:
- Check pushed routes: 'config show client_access' — cidrs must include destination subnet
- Check firewall rules: user's groups must match rule sources
- Check HostAlias: destination alias must have matching hosts (CIDRs for TUN routes, wildcards for DNS only)
- Check connector: if Site is set, connector must be connected
- 'logs search clientaccess-dial --level=warn' for denied dials

DNS not resolving:
- Check dns_upstream config: must point to reachable resolvers
- Check dns_domains: domains must be in the pushed list for split DNS
- 'logs search clientaccess-dns' for resolution errors
- DNS ACL denied (REFUSED): check user's groups match firewall rules for the hostname
- DNS rate limited (SERVFAIL): check dns_rate_limit setting (default 100 qps)

Token refresh failures:
- 'logs search clientaccess-refresh --level=warn'
- Invalid token: OIDC provider may have rotated keys
- Channel binding failure: possible MITM or TLS session change

Relationships
Module dependencies:
- devicecode: Server-side device code authorization (RFC 8628) for interactive authentication
- oidc: JWT validation for reconnect/automation authentication
- firewall: Per-stream access control, DNS ACL enforcement, host alias route derivation
- dns: DNS resolution for client split DNS queries
- sessions: Cluster-wide session tracking (create, validate, revoke)
- connectors: Site-based routing through connector tunnels
- IP pool: Virtual IP allocation from dedicated subnet
- listener: QUIC listener with TLS 1.3 and idle timeout
- telemetry: Structured logging and Prometheus metrics
QUIC Connector
Identity-aware tunnels to remote sites via QUIC for proxy, bastion, and forward proxy traffic
Overview
The Connector subsystem enables identity-aware access to services running at remote sites (Kubernetes clusters, data centers, VPCs) without VPN tunnels, IPsec, or network-level connectivity.
A lightweight binary (hexonconnect) deployed at the remote site establishes an outbound QUIC connection to Hexon. Hexon then sends “dial” commands through this tunnel whenever a proxy mapping, bastion session, forward proxy rule, or firewall policy references that site via the “site” parameter.
Key capabilities:
- Zero-trust remote access: connector dials only what Hexon asks, nothing else
- Opaque site namespace: same IPs and DNS names across sites are irrelevant
- Stateless token auth: HMAC-derived tokens validated without storage
- Channel-bound authentication: RFC 5705 TLS Exported Keying Material prevents replay and MITM attacks — the token never travels on the wire
- Multi-instance HA: multiple connectors per site with adaptive load balancing
- Cross-node routing: any cluster node can route to any connector via adaptive inter-node forwarding — requests arriving at a node without connector instances are transparently forwarded to a node that has them
- Auto-reconnect: connector never gives up, exponential backoff on disconnect
- CDN-compatible: optional dedicated hostname and TLS certificate for direct access
Configuration
Configuration uses the [connector] TOML section:
[connector]
enabled = true
port = 8444
# hostname = "connector.example.com"  # Optional: dedicated hostname (CDN bypass)
# cert = "/path/to/cert.pem"          # Optional: file path or inline PEM
# key = "/path/to/key.pem"            # Optional: file path or inline PEM

[[connector.sites]]
id = "prod-asia-a8f3c1"
name = "Production Asia"
cidrs = ["203.0.113.0/24"]
max_instances = 3
rebalance = true       # Distribute across cluster nodes (default: true)
rebalance_retries = 5  # Accept after N soft-rejects (default: 5, 1-10)

TLS certificate resolution:
1. connector.cert/key when set (static certificate)
2. SNI callback: auto-TLS (ACME), certmanager, wildcard, or service certificate

If connector.hostname is set and no cert/key is provided, ACME will automatically provision a certificate for the connector hostname.

Usage across subsystems — add “site” parameter:

[[proxy.mapping]]
app = "API Asia"
host = "api-asia.example.com"
service = "http://api.default.svc.cluster.local:8080"
site = "prod-asia-a8f3c1"

# Shadow targets can also route through connectors:
[[proxy.mapping.shadow]]
name = "staging-mirror"
service = "https://staging.internal:8443"
site = "staging-eu"

# Circuit breaker fallback can use a different connector site:
[proxy.mapping.circuit_breaker]
fallback_mode = "service"
fallback_service = ["http://dr-backend:8080"]
fallback_site = "dr-europe"

# SSH cert rules — route bastion SSH through connector:
[[bastion.ssh_cert.rules]]
name = "remote-dc-ssh"
groups = ["devops"]
destinations = ["*.internal"]
site = "prod-asia-a8f3c1"

# SQL bastion — route database connections through connector:
[[sql_bastion.sites]]
name = "postgres-remote"
type = "postgres"
host = "pg.internal"
port = 5432
site = "prod-asia-a8f3c1"

# Firewall host aliases — route forward proxy traffic through connector:
[[firewall.aliases.hosts]]
name = "remote_services"
hosts = ["gitlab.internal", "jenkins.internal"]
site = "prod-asia-a8f3c1"
# Aliases with site skip nft rules — traffic goes through userspace QUIC tunnel

Token generation is deterministic from the cluster key — any node can validate.
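Deterministic derivation can be sketched with an HMAC over the site ID; the label and hex encoding are assumptions, not the real scheme:

```python
import hashlib
import hmac


def derive_site_token(cluster_key: bytes, site_id: str) -> str:
    """Derive a site token from the cluster key and site ID.

    Because the derivation is deterministic, any node holding the
    cluster key can recompute the expected token and validate a
    connector without a token store ("stateless token auth").
    The "hexon-connector:" label is illustrative.
    """
    mac = hmac.new(cluster_key, b"hexon-connector:" + site_id.encode(), hashlib.sha256)
    return mac.hexdigest()


def validate_site_token(cluster_key: bytes, site_id: str, token: str) -> bool:
    """Recompute and compare in constant time."""
    return hmac.compare_digest(derive_site_token(cluster_key, site_id), token)
```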
Admin commands
Admin CLI:
connector list                 List configured sites and live connections
connector show <site-id>       Show site config, token, and connected instances (includes platform, origin with geo/ASN, system labels)
connector create <site-id>     Create new site (generates token)
connector revoke <site-id>     Block site, disconnect active QUIC tunnels
connector instances <site-id>  List connected instances with metrics

The “connector show” output includes per-instance details: platform (OS/arch), origin IP with country and ASN (via geo module), and system labels reported by the connector binary (kernel, OS version, runtime environment, memory, virtualization, PID, UID/GID).
The “connector revoke” command disconnects active QUIC tunnels in addition to revoking cluster sessions, causing connectors to reconnect (and be rejected).
Config reload cleanup: when a site is removed from config (via GitOps or hot reload), active QUIC connections for that site are automatically disconnected. The connector binary will reconnect but be rejected because the site is no longer in config. This prevents stale sessions from lingering in JetStream KV.
Security
Trust boundaries:
- Hexon Cluster (full trust): policy enforcement, identity, routing
- Connector (minimal trust): dials only what Hexon asks, no autonomous access
Authentication flow:
- QUIC/TLS 1.3 connection established (server cert, ECDHE, forward secrecy)
- Both sides compute TLS exporter keying material (RFC 5705) with an application-specific label
- Connector sends: site_id + HMAC of token bound to the TLS channel
- Hexon validates by recomputing from cluster key
Additional protections:
- Optional CIDR allowlist per site restricts connector source IPs
- max_instances limit prevents token abuse
- Instance selection uses epsilon-greedy adaptive algorithm with circuit breaker
- QUIC relay loop prevention: relay handler only dispatches locally, preventing infinite forwarding loops between nodes
- Cluster-wide rebalancing: soft-rejects excess connectors so they redistribute across gateway nodes (configurable per site, default 5 retries before accepting)

Inter-node forwarding
All cluster nodes can route to any connector site through QUIC relay.
When a request arrives at a node without local connector instances (or after local retries are exhausted), the dispatcher transparently relays through a peer node. The relay uses QUIC on the same connector port (8444) with ALPN “hexon-relay” and mTLS for peer authentication. Each relay request opens a QUIC stream, sends a dispatch header, and the peer dispatches locally through its QUIC connector tunnel.
All traffic types converge through the same dispatch path — this covers reverse proxy, forward proxy, client access (TCP/UDP), SSH bastion, SQL bastion, shadow targets, and probes.
Remote node IPs are cached (5s refresh) from cluster-wide connector sessions. Failed nodes are tracked by the cluster discovery health checks.
Loop prevention:
- The relay handler only dispatches locally (never relays further)
- A peer with no local instances returns an immediate error
Troubleshooting relay:
- Client-side metrics: relay_total (attempts), relay_success_total, relay_errors_total
- Server-side metrics: relay_served (requests handled), relay_rejected_total (auth failures)
- Relay rejected with “no_certificate”: peer isn’t presenting its service cert
- Relay rejected with “not_peer”: source IP not in cluster discovery peer list
- Relay “no_instances”: the peer node also has no local connectors for the site
- Check logs: 'logs search connectors.relay --level=warn'
QUIC tuning
QUIC performance tuning applied to both gateway and connector sides:
Flow control windows (tuned for database and bulk transfer workloads):
- Stream: 2MB initial, 8MB max
- Connection: 4MB initial, 20MB max
- Stream-to-connection ratio: 2:5

Larger initial windows reduce round-trips for big responses (SQL results, file transfers).

Persistent QUIC transport (connector side):
- hexonconnect reuses one UDP socket across reconnections
- Avoids per-connection socket allocation and kernel offload state loss
- Enables future QUIC connection migration if network interface changes

Stream error handling:
- Error paths immediately release QUIC stream resources instead of graceful close
- Frees resources under load without waiting for peer acknowledgment

Max concurrent streams:
- Gateway: configurable per listener (default 100)
- Connector: 1024 (high concurrency for multiplexed tunnel streams)

Rebalancing
When multiple connector replicas start simultaneously (e.g., Kubernetes Deployment with 3 replicas), they may all connect to the same gateway node via DNS or a load balancer. The rebalance mechanism redistributes them:
- First connector for a site on a node is always accepted
- Subsequent connectors check cluster distribution: if this node has more instances than the least-loaded remote node, the registration is soft-rejected
- The connector reconnects with a short backoff (2 seconds) — DNS/LB randomness typically sends it to a different node
- After N soft-rejects (configurable, default 5), the node accepts anyway
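The accept/soft-reject decision above can be sketched as a pure function; the inputs (local and per-peer instance counts, retry count) are assumptions about what the real dispatcher tracks:

```python
def should_soft_reject(local_count: int,
                       remote_counts: list[int],
                       retries_so_far: int,
                       retry_budget: int = 5) -> bool:
    """Decide whether to soft-reject a connector registration.

    Mirrors the rules above: the first instance for a site on a node is
    always accepted; otherwise reject while this node holds more instances
    than the least-loaded remote node, until the retry budget is spent.
    """
    if local_count == 0:
        return False  # first connector for the site on this node: accept
    if retries_so_far >= retry_budget:
        return False  # budget exhausted: accept anyway, never strand the connector
    if not remote_counts:
        return False  # no peers to redistribute to
    return local_count > min(remote_counts)
```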
Per-site configuration:
rebalance = true       # Enable cluster-wide load distribution (default: true)
rebalance_retries = 5  # Max soft-rejects before accepting (1-10, default: 5)

Rebalance is best-effort — sticky load balancers may prevent redistribution, so the retry budget ensures connectors are never stuck. Metrics: rebalance_reject_total and rebalance_accept_total track distribution activity per site.
Forward Proxy
HTTP CONNECT and CONNECT-UDP service layer with bearer token auth, MASQUE UDP proxying, PAC endpoints, and CDN bypass
Overview
The forward proxy provides VPN-like access to internal resources directly from the browser — no client software needed. It processes HTTP CONNECT requests for TCP tunneling (RFC 9114) and CONNECT-UDP requests for UDP proxying via MASQUE (RFC 9298).
Core capabilities:
- HTTP CONNECT handling for TCP proxy tunneling
- CONNECT-UDP handling for UDP proxy tunneling (MASQUE/QUIC)
- PAC file endpoint serving at configurable path (default /proxy.pac)
- Browser extension config endpoint at /proxy/config
- Browser extension setup/login endpoint at /proxy/setup
- CONNECT rejected on main service port (421 Misdirected) — proxy port only
- Geo-IP and time-based restriction enforcement before tunneling
- DNS resolution with system DNS fallback
- Bidirectional TCP relay with idle timeout and max connection duration
- HTTP/2+ full duplex CONNECT stream support (RFC 8441)
- HTTP/1.1 connection hijacking for classic CONNECT tunneling
- Connection tracking and byte-level metrics recording
The service runs on a dedicated port (forward_proxy.port) separate from the main service port for security isolation. CONNECT requests on the main port receive 421 Misdirected Request, directing clients to the correct proxy port.
TCP CONNECT request flow:
1. Extract client IP (CDN bypass mode uses RemoteAddr directly)
2. Check geo-IP and time-based restrictions
3. Validate target host:port format (RFC 1035 hostname length limit)
4. Extract bearer token from Proxy-Authorization header
5. Authenticate token and check user is not disabled
6. Check ACL (firewall group rules for target destination)
7. Check per-user rate limit (fail-closed)
8. Resolve hostname via DNS module (system DNS fallback)
9. Establish backend TCP connection with configurable timeout
10. Start bidirectional relay with idle timeout and max duration
11. Record metrics (bytes sent/recv, duration, success)

CONNECT-UDP request flow:
1-7. Same as TCP (restrictions, auth, ACL, rate limit)
8. MASQUE UDP proxying (capsule protocol, socket management)
9. Record metrics after session completes

Bearer token authentication supports two formats:
- "Bearer <token>" header (direct bearer token)
- "Basic <base64>" header where username is "_bearer_" and password is the token (Chrome's onAuthRequired format for Proxy-Authorization)

Config
Service-level configuration under [forward_proxy] in hexon.toml:
[forward_proxy]
enabled = true                   # Enable forward proxy (default: false)
port = 8443                      # Dedicated proxy port (must differ from service.port)
public_port = 8443               # External port for PAC URLs (NAT/LB scenarios)
hostname = "proxy.example.com"   # Separate hostname for CDN bypass (optional)
enable_tcp = true                # Enable TCP CONNECT handling (default: true)
enable_udp = true                # Enable CONNECT-UDP/MASQUE handling (default: true)
udp_proxy_path = "/masque"       # URI path for CONNECT-UDP requests (default: /masque)
auth_mode = "bearer"             # Authentication mode for CONNECT requests
buffer_size = "32KB"             # TCP relay buffer size (default: 32KB)
connect_timeout = "10s"          # Backend connection timeout
idle_timeout = "5m"              # Idle connection timeout (no data flowing)
max_connection_duration = "24h"  # Maximum connection duration (hard limit)
preserve_client_port = true      # Use client's port in Alt-Svc header

# Token settings (used by /proxy/config endpoint)
token_ttl = "5m"                 # Token validity duration (default: 5m, min: 30s)
token_refresh_interval = "60s"   # Extension refresh interval (default: 60s, min: 5s)

# TLS certificate for the proxy hostname (when hostname differs from service)
cert = "/path/to/cert.pem"       # File path or inline PEM
key = "/path/to/key.pem"         # File path or inline PEM

# Geo-IP restrictions (overrides [service] if set)
geo_enabled = true                  # Enable geo-IP restrictions
geo_allow_countries = ["US", "CA"]  # Allowed country codes (ISO 3166-1 alpha-2)
geo_deny_countries = []             # Denied country codes
geo_bypass_cidr = ["10.0.0.0/8"]    # CIDR ranges that bypass geo checks
geo_deny_code = 403                 # HTTP status for geo denial
geo_deny_message = "Access denied from your location"

# Time-based restrictions (overrides [service] if set)
time_enabled = true                  # Enable time-based restrictions
time_timezone = "America/New_York"   # Timezone for time checks
time_allow_days = ["Mon","Tue","Wed","Thu","Fri"]
time_allow_hours = "09:00-18:00"     # Allowed hours range
time_deny_code = 403                 # HTTP status for time denial
time_deny_message = "Access not permitted at this time"

PAC file settings

[forward_proxy.pac]
enabled = true               # Enable PAC endpoint (default: true)
path = "/proxy.pac"          # PAC file URL path
cache_ttl = "15m"            # PAC response Cache-Control max-age
group = "proxy-users"        # Required group for PAC/config/setup access (optional)
use_firewall_targets = true  # Derive PAC targets from firewall rules

Endpoints registered by the service:

GET /proxy.pac     - PAC file (requires auth, optional group)
GET /proxy/config  - JSON: PAC + token + refresh interval + username + server_time
GET /proxy/setup   - Login trigger page for browser extensions

CDN bypass mode: when forward_proxy.hostname differs from service.hostname, the proxy accepts direct connections (no CDN in between). Client IP is extracted from RemoteAddr instead of X-Forwarded-For. This is typical because CDNs do not support HTTP CONNECT.

Hot-reloadable: token_ttl, token_refresh_interval, geo/time restrictions, PAC settings, rate_limit_per_user, bandwidth_limit_per_user, buffer_size, idle_timeout, max_connection_duration.

Cold (restart required): enabled, port, hostname, enable_tcp, enable_udp, udp_proxy_path, preserve_client_port.

Troubleshooting
Common symptoms and diagnostic steps:
CONNECT requests returning 421 Misdirected Request:
- Client is sending CONNECT to the main service port instead of the proxy port
- The forward proxy middleware rejects CONNECT on the main port by design
- Verify client is configured to use forward_proxy.port (or public_port)
- Check error message for the correct proxy hostname:port

407 Proxy Authentication Required:
- Missing Proxy-Authorization header on CONNECT request
- Token format not recognized (must be "Bearer <token>" or "Basic <base64>")
- For Chrome extension: username must be "_bearer_" in Basic auth format
- Token exceeds max length (8192 bytes) — check token generation
- Verify token is being refreshed before expiry: check /proxy/config response

403 Forbidden on CONNECT:
- ACL denied: user's groups do not match firewall rules for the target
- Check: 'forwardproxy check <user> <target>' for ACL evaluation
- Check: 'forwardproxy targets <user>' to see allowed destinations
- Check: 'firewall check <user>' for firewall rule details
- Geo-IP denial: 'geo lookup <client_ip>' and 'geo check <client_ip>'
- Time-based denial: verify time_timezone and time_allow_hours in config

429 Too Many Requests:
- Per-user rate limit exceeded: check rate_limit_per_user setting
- Per-user bandwidth limit exceeded: check bandwidth_limit_per_user
- Retry-After header in response indicates when to retry
- Monitor: 'forwardproxy metrics' for per-user rate limit stats
- Consider increasing limits for legitimate high-volume users

502 Bad Gateway on CONNECT:
- DNS resolution failed: 'dns test <target_hostname>'
- Backend unreachable: 'net tcp <target_host:port>'
- Connect timeout too short: check forward_proxy.connect_timeout
- All resolved IPs failed (tries IPv4 first, then IPv6)
- DNS module failure with system DNS fallback also failing

Connection drops or timeouts during tunnel:
- Idle timeout: no data flowing for forward_proxy.idle_timeout (default 5m)
- Max duration exceeded: forward_proxy.max_connection_duration hard limit
- Check relay buffer_size: default 32KB, increase for high-throughput tunnels
- HTTP/2 full duplex not supported by server: check error logs for full duplex support errors
- Intermediate firewall blocking long-lived connections or UDP (QUIC)

PAC file returns DIRECT for all traffic:
- PAC endpoint requires authentication; verify session cookie is sent
- Check forward_proxy.pac.enabled = true
- Check use_firewall_targets = true and user has firewall rules
- Unauthenticated PAC intentionally returns DIRECT-only (security by design)
- Inspect PAC: curl -b session=<cookie> https://host/proxy.pac

/proxy/config returns 401 or 403:
- 401: session cookie missing or expired; trigger re-login via /proxy/setup
- 403: user not in required group (forward_proxy.pac.group)
- Verify group membership: 'directory user <username>'

Extension not refreshing token:
- Verify token_refresh_interval < token_ttl in config
- Check /proxy/config endpoint accessibility from extension
- Look for clock skew between client and server (server_time in response)
- Monitor: 'forwardproxy metrics' for token generation counts

CONNECT-UDP/MASQUE failures:
- QUIC port (UDP) blocked by intermediate firewall
- forward_proxy.enable_udp = false in config
- URI template mismatch: check udp_proxy_path setting
- MASQUE parse error: malformed CONNECT-UDP request
- Verify: 'net tcp <proxy_hostname:port> --tls' for TLS connectivity

Geo/time restriction inconsistencies:
- Forward proxy has its own geo/time config that overrides [service] settings
- Check both forward_proxy.geo_enabled and service.geo_enabled
- Restrictions on /proxy/config and CONNECT may behave differently
- CONNECT restrictions fail-open if the cluster is not ready

Metrics and monitoring:
- 'forwardproxy metrics' — cluster-wide connection counts and byte totals
- 'forwardproxy metrics <user>' — per-user breakdown
- Bytes sent/recv recorded per TCP connection; UDP records duration and success only (MASQUE library limitation)

Relationships
Dependencies and interactions:
- Forward proxy module: All authentication, ACL, rate limiting, PAC generation, metrics, and restriction checks handled cluster-wide.
- DNS: Hostname resolution for CONNECT targets. Falls back to system DNS if the DNS module is unavailable. IPv4 preferred over IPv6 in resolution order.
- Firewall: ACL rules determine which groups can access which destination host:port. Firewall rules also drive PAC file generation (use_firewall_targets).
- Directory: User disabled status checked during authentication. Group membership resolved server-side from the directory memory index during ACL evaluation (not embedded in the bearer token).
- Geo/Time access: Location and time-based access checks on both the /proxy/config endpoint and CONNECT requests. Forward proxy can override [service] geo/time settings with its own configuration.
- Sessions: Session cookies used for /proxy/config, /proxy/setup, and /proxy.pac. Browser extension first authenticates via session, then receives a bearer token for subsequent CONNECT requests.
- Reverse proxy: Complementary service — reverse proxy handles inbound traffic to backends, forward proxy handles outbound traffic from users. Both share the same TLS listener and session subsystem.

Forward Proxy Engine
Authentication, ACL evaluation, rate limiting, and PAC generation engine for the forward proxy
Overview
The forward proxy module provides browser-native VPN-like access using the MASQUE protocol (RFC 9298) over QUIC. It enables authenticated, policy-controlled tunneling of TCP and UDP traffic through the Hexon gateway without requiring a traditional VPN client.
Core capabilities:
- Bearer token authentication using HMAC-SHA256 signed tokens with configurable TTL
- Firewall ACL integration for group-based destination access control
- Per-user rate limiting (requests/sec) and bandwidth limiting (bytes/sec)
- PAC (Proxy Auto-Configuration) file generation for browser proxy setup
- JA4/JA4Q fingerprint binding for session-based authentication
- Geo-IP and time-based access restrictions (fail-closed)
- Active connection tracking with per-user and per-target metrics
- DNS resolution via the DNS module (prevents DNS poisoning)
- Separate proxy hostname and TLS certificate support for CDN bypass
- Token refresh mechanism for long-lived browser sessions
Transport security model:
The PAC file returns "HTTPS host:port", so the browser always connects to the proxy over TLS. The forward proxy listener only speaks TLS.

HTTPS target (e.g. https://example.com):
  Browser --TLS--> Proxy --TLS--> Target
  CONNECT tunnel (end-to-end encrypted) + token (raw bytes, no proxy headers)

Plain HTTP target (e.g. http://ifconfig.io):
  Browser --TLS--> Proxy --plain--> Target
  GET http://... (content visible on last hop) + token (token STRIPPED before forwarding)

The bearer token only travels on the encrypted browser-to-proxy leg. Hop-by-hop headers (Proxy-Authorization, Connection, etc.) are removed before forwarding. The token never reaches the target server.

Authentication flow (bearer token):
1. User logs in via any method, receives session cookie
2. Browser extension fetches /proxy/config with session cookie
3. Service generates HMAC-SHA256 signed token with user/groups/expiry
4. Extension sends Proxy-Authorization: Bearer <token> on CONNECT
5. Token validated locally (no round-trip for validation)
6. User disabled status checked against directory
7. CheckAccess enforces firewall ACL rules
8. Connection established and traffic relayed
9. Extension periodically refreshes token via /proxy/config

Config
Core configuration under [forward_proxy] section in hexon.toml:
[forward_proxy]
enabled = true                        # Enable forward proxy module
port = 8443                           # Dedicated proxy port (must differ from service.port)
public_port = 8443                    # External port for PAC URLs (for NAT/LB scenarios)
preserve_client_port = true           # Use client's port in Alt-Svc header
hostname = "proxy.example.com"        # Separate hostname for CDN bypass (optional)
fingerprint_binding = true            # Enable JA4/JA4Q fingerprint-to-session binding
fingerprint_binding_ttl = "8h"        # Fingerprint binding TTL (match session TTL)
rate_limit_per_user = 1000            # Max requests per second per user
bandwidth_limit_per_user = "100mbps"  # Max bandwidth per user

# Token settings
token_ttl = "5m"                      # Token validity duration (default: 5m)
token_refresh_interval = "60s"        # Extension refresh interval (default: 60s)

# TLS certificate for the proxy hostname (optional)
# Only needed when hostname differs from service.hostname
# Value can be a file path or inline PEM content
# If not set, uses ACME (add hostname to acme.additional_domains) or service cert
cert = "/path/to/cert.pem"
key = "/path/to/key.pem"

# Geo-IP restrictions (optional, falls back to [service] if not set)
geo_enabled = true                  # Enable geo-IP restrictions
geo_allow_countries = ["US", "CA"]  # Allowed country codes (ISO 3166-1 alpha-2)
geo_deny_countries = []             # Denied country codes
geo_bypass_cidr = ["10.0.0.0/8"]    # CIDR ranges that bypass geo checks
geo_deny_code = 403                 # HTTP status code for geo-denied requests
geo_deny_message = "Access denied from your location"

# Time-based restrictions (optional, falls back to [service] if not set)
time_enabled = true                 # Enable time-based restrictions
time_timezone = "America/New_York"  # Timezone for time checks
time_allow_days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
time_allow_hours = "09:00-18:00"    # Allowed hours range
time_deny_code = 403                # HTTP status code for time-denied requests
time_deny_message = "Access not permitted at this time"

PAC file configuration
[forward_proxy.pac]
enabled = true               # Enable PAC endpoint
path = "/proxy.pac"          # PAC file URL path
cache_ttl = "15m"            # PAC response cache TTL
use_firewall_targets = true  # Derive PAC targets from firewall rules

PAC authentication requirement: unauthenticated requests receive a minimal PAC that routes all traffic directly. Authenticated users get a PAC with targets derived from their firewall rules.
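To illustrate the two PAC shapes, here is a minimal Python sketch of a hypothetical generator: a DIRECT-only stub for unauthenticated requests and proxy-routed targets for authenticated users. The `HTTPS` directive matches the TLS-only proxy transport; the helper name, host-matching logic, and target format are assumptions, not the actual generator.

```python
# Hypothetical sketch: how an authenticated vs. unauthenticated PAC differs.
def build_pac(authenticated: bool, targets: list[str],
              proxy_host: str = "proxy.example.com:8443") -> str:
    if not authenticated or not targets:
        # No information leak: unauthenticated clients route everything directly.
        return 'function FindProxyForURL(url, host) { return "DIRECT"; }'
    # Targets derived from the user's firewall rules (shape assumed here).
    conditions = " || ".join(f'shExpMatch(host, "{t}")' for t in targets)
    return (
        "function FindProxyForURL(url, host) {\n"
        f'  if ({conditions}) return "HTTPS {proxy_host}";\n'
        '  return "DIRECT";\n'
        "}"
    )
```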
Hot-reloadable: rate_limit_per_user, bandwidth_limit_per_user, geo/time restrictions, PAC settings, token_ttl, token_refresh_interval. Cold (restart required): enabled, port, hostname, fingerprint_binding.
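The stateless bearer token in the flow above can be sketched as HMAC-SHA256 over a claims payload with embedded user, groups, and expiry. This is a minimal Python illustration; the exact claim layout, encoding, and secret handling are assumptions, not the documented wire format.

```python
import base64, hashlib, hmac, json, time

SECRET = b"cluster-wide-secret"  # stands in for the cluster-wide secret key

def sign_token(user: str, groups: list[str], ttl_s: int = 300) -> str:
    # Payload carries identity, groups, and expiry; signature makes it stateless.
    payload = json.dumps({"user": user, "groups": groups,
                          "exp": int(time.time()) + ttl_s}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "." +
            base64.urlsafe_b64encode(sig).decode())

def verify_token(token: str):
    # Local validation: recompute the HMAC, no server round-trip or storage.
    p64, s64 = token.split(".")
    payload = base64.urlsafe_b64decode(p64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(s64)):
        return None  # signature mismatch: tampered or wrong key
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        return None  # expired: short TTL limits the exposure window
    return claims
```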
Security
Security layers and hardening measures:
Bearer token security:
Tokens signed with HMAC-SHA256 using the cluster-wide secret key. Short TTL (default 5 minutes) limits exposure window for stolen tokens. Token contains user ID, groups, and expiry; validated locally without round-trip for minimal latency. Tokens are not stored server-side (stateless validation via signature). Token transport is always encrypted: the browser-to-proxy connection is TLS (PAC returns "HTTPS"), and the token is stripped (hop-by-hop header) before forwarding to the target. Even for plain HTTP targets, the token never leaves the TLS tunnel.

Fingerprint binding:
JA4/JA4Q TLS fingerprint bound to session via BindFingerprint operation. Prevents token replay from a different client/browser. Binding has its own TTL that should match the session TTL for consistent expiry.

Access control (multi-layer):
1. Bearer token authentication (identity verification)
2. User disabled check via directory.IsUserDisabled (account status)
3. Firewall ACL via CheckAccess (group-based destination control)
4. Rate limiting per user (abuse prevention)
5. Bandwidth limiting per user (network saturation prevention)
6. Geo-IP restrictions (location-based access, fail-closed)
7. Time-based restrictions (schedule-based access, fail-closed)
8. DNS resolution via the DNS module (prevents DNS poisoning)

Geo-IP and time restrictions:
Both use fail-closed semantics: if the check cannot be performed (e.g., GeoIP database unavailable), access is denied. Forward proxy has its own geo/time config that overrides [service] defaults, allowing different policies for proxy vs. web access.

PAC file security:
PAC endpoint requires authentication to return proxy-routed targets. Unauthenticated PAC returns DIRECT-only routing (no information leak). Username embedded in PAC for browser extension display only.

Rate and bandwidth limiting:
Per-user rate limiting prevents connection flooding. Per-user bandwidth limiting prevents single-user network saturation. Both return RetryAfter hints for well-behaved clients.

Troubleshooting
Common symptoms and diagnostic steps:
User cannot connect through forward proxy:
- Verify forward_proxy.enabled = true and port is correct
- Check bearer token: token_ttl may have expired, verify refresh is working
- Check user disabled status: directory user <username>
- Verify firewall rules allow the target: forwardproxy check <user> <target>
- Check geo restrictions: geo lookup <client_ip> and geo check <client_ip>
- Check time restrictions: ensure current time is within allowed window
- DNS resolution: verify target hostname resolves via dns test <hostname>

PAC file returns DIRECT for all traffic:
- PAC requires authentication; check session cookie is being sent
- Verify forward_proxy.pac.enabled = true
- Check use_firewall_targets = true and firewall rules exist for the user
- Inspect PAC content: curl -b session=<cookie> https://host/proxy.pac

Token refresh failing (extension shows expired):
- Check token_refresh_interval is shorter than token_ttl
- Verify /proxy/config endpoint is accessible with session cookie
- Check for clock skew between client and server
- Monitor token generation metrics via forwardproxy metrics

Rate limited (429 responses):
- Check rate_limit_per_user setting (requests/sec)
- Check bandwidth_limit_per_user setting
- Monitor per-user metrics: forwardproxy metrics <username>
- RetryAfter header indicates when to retry

Fingerprint binding failures:
- Verify fingerprint_binding = true in config
- Check fingerprint_binding_ttl matches session TTL
- JA4 fingerprint changes between requests indicate client switching
- Browser updates can change JA4 fingerprint (rebind needed)

Connection drops or timeouts:
- Check backend connectivity: net tcp <target_host:port>
- Check QUIC port (UDP) is not blocked by intermediate firewalls
- Verify TLS certificate: net tls <proxy_hostname:port>
- Check active connections: forwardproxy metrics to see connection counts

Geo-IP or time-based denial (403/451):
- Geo denial: geo lookup <ip> shows country, geo check <ip> shows policy
- Time denial: verify time_timezone is correct, check time_allow_hours
- Bypass CIDR: add client network to geo_bypass_cidr for exemption
- Forward proxy geo/time overrides [service] config if set

Metrics and monitoring:
- Active connections: forwardproxy metrics (cluster-wide)
- Per-user breakdown: forwardproxy metrics <username>
- Connection success/failure rates tracked via RecordMetrics
- Bytes sent/received per user for bandwidth accounting

Relationships
Module dependencies and interactions:
- Firewall: ACL rule evaluation determines which destinations each user group can reach. Firewall rules also drive PAC file generation when use_firewall_targets is enabled.
- Directory: User disabled check on every authentication call. Group membership embedded in token for ACL evaluation.
- Forward proxy service: Service layer handles HTTP CONNECT (TCP tunneling), CONNECT-UDP (UDP tunneling), and absolute-form HTTP requests (plain HTTP forwarding), plus HTTP endpoints (/proxy/config, /proxy/setup, /proxy.pac). Service calls this engine for auth, ACL, metrics.
- DNS: Hostname resolution for target destinations, with system DNS fallback.
- Rate limiting: Per-user request throttling and bandwidth controls.
- Geo-IP: Location-based access restrictions. Forward proxy can override [service] geo config with its own settings.
- Sessions: Session cookie used for initial token generation. Fingerprint binding ties proxy session to TLS fingerprint.
- Configuration: Hot-reload of rate limits, bandwidth limits, geo/time restrictions, PAC settings. Token TTL changes apply to new tokens only.
- Telemetry: Structured logging for authentication, ACL decisions, rate limit events. Metrics for active connections, bytes transferred, token generation.
- Auto TLS: ACME certificate for proxy hostname when using a separate hostname (add to acme.additional_domains).

Network Listener
High-performance network listeners with composite client fingerprinting, session affinity, and TLS security
Overview
The listener module manages all network interfaces for HexonGateway, providing high-performance connection handling with built-in security features. It supports:
- TCP with optional TLS, HTTP/1.1, HTTP/2, HTTPS, UDP, and gRPC over HTTP/2
- Composite client fingerprinting combining JA4 (TLS), HTTP/2, and TCP/IP stack layers
- Session affinity routing based on composite fingerprint hash for cluster-wide persistence
- Malformed TLS blocking (enabled by default) to reject invalid ClientHello messages
- Graceful shutdown with configurable connection draining timeout
- Platform-specific TCP optimizations: Fast Open (RFC 7413) and Window Scaling
- Per-SNI mTLS with dynamic CA rotation
- Proxy mode for deployment behind CDN/load balancer with header-based client identification
- QUIC/HTTP/3 fingerprinting with multi-packet reassembly and replay protection
- Connection metrics with batched flush (every 100ms or on close) for low overhead
- HXEP (Hexon Edge Protocol) for real client IP through edge proxies and SNAT
- Correlation ID propagation for end-to-end distributed tracing
- HTTP middleware chain: security headers, geo restriction, time restriction, rate limiting
- Proof-of-Work challenge middleware for bot protection
- Configurable Server header (HexonGateway/<version>, can be disabled)
Fingerprint Components:
JA4 (TLS)      : t13004d_[cipher_hash]_[ext_hash] — extracted at Accept() before TLS handshake
HTTP/2         : h2_[settings_hash] — SETTINGS frame parameters and pseudo-header ordering
TCP/IP Stack   : tcp_[window_mss_ttl_hash] — p0f-style OS identification
Composite Hash : SHA256(ja4|http2|tcp) truncated to 32 hex chars
JA4Q (QUIC)    : QUIC transport parameters fingerprint for HTTP/3 clients

Fingerprint data is stored in a unified structure across all protocols (HTTP/1.1, HTTP/2, HTTP/3), providing a consistent interface for rate limiting, session affinity, and client identification.
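The composite hash construction above (SHA256 over the three layer fingerprints joined with "|", truncated to 32 hex characters) can be sketched directly. This Python illustration assumes the documented formula; the input fingerprint strings are placeholders.

```python
import hashlib

def composite_hash(ja4: str, http2_fp: str, tcp_fp: str) -> str:
    # SHA256(ja4|http2|tcp), truncated to 32 hex chars (128 bits),
    # as described in the fingerprint components table.
    digest = hashlib.sha256(f"{ja4}|{http2_fp}|{tcp_fp}".encode()).hexdigest()
    return digest[:32]
```

The same hash drives rate limiting and session affinity, so two layers changing while one stays fixed still yields a distinct client identity.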
Config
Core configuration under [service] in config TOML:
[service]
hostname = "auth.example.com"        # Service hostname
tls_cert = "/path/to/cert.pem"       # TLS certificate path
tls_key = "/path/to/key.pem"         # TLS private key path
handshake_timeout = 10               # TLS handshake timeout in seconds (default: 10)
block_malformed_tls = true           # Reject invalid TLS ClientHello (default: true)
max_header_bytes = 65536             # Max ClientHello size in bytes (default: 64KB)
disable_server_header = false        # Suppress HexonGateway/<version> header (default: false)
correlation_id_header = "X-Hexon-ID" # Correlation ID header name (default: "X-Hexon-ID")
cookie_name = "hexon"                # Session cookie name (default: "hexon")

# Mutual TLS
mtls_mode = "none"                   # "none", "optional", "mandatory" (default: "none")

# HTTP/2 settings
http2_enable = true                  # Enable HTTP/2 (default: true)
http2_maxstreams = 1000              # Max concurrent streams per connection
http2_maxframesize = 1048576         # Max frame payload size (default: 1MB)
http2_idletimeout = 120              # Idle timeout in seconds
http2_keepalive = true               # Enable HTTP/2 keepalive
http2_keepaliveseconds = 30          # Keepalive interval in seconds

# Fingerprint cache
fingerprint_max_entries = 10000      # Max entries in addr fingerprint map (default: 10000)
fingerprint_ttl_seconds = 300        # Base TTL in seconds (default: 5 min)
fingerprint_cleanup_seconds = 30     # Cleanup sweep interval (default: 30s)
fingerprint_max_entries_per_ip = 10  # Max fingerprints per IP, anti-abuse (default: 10)

# JA4 parsing security limits
ja4_max_extensions = 200             # Max TLS extensions to parse (default: 200, typical: 10-30)
ja4_max_sigalgs = 100                # Max signature algorithms to parse (default: 100)

# HTTP/2 fingerprint cache
http2_fingerprint_cache_size = 10000     # Max entries (default: 10000)
http2_fingerprint_cache_evict_pct = 10   # % of oldest entries to evict when full (1-50)

# QUIC fingerprint reassembly
quic_fingerprint_reassembly_max_packets = 10  # Max packets for reassembly (default: 10)
quic_fingerprint_reassembly_max_bytes = 15360 # Max reassembly buffer (default: 15KB)
quic_fingerprint_reassembly_timeout_s = 5     # State timeout (default: 5s)
quic_max_crypto_frame_offset = 65536          # Max CRYPTO frame offset (default: 64KB)

# Proxy mode (behind CDN/LB)
proxy = false                                   # Enable proxy mode (default: false)
proxy_cidr = ["10.0.0.0/8"]                     # Trusted proxy IPs (REQUIRED when proxy=true)
proxy_header_clientip = "X-Forwarded-For"       # Real client IP header (REQUIRED when proxy=true)
proxy_header_clientcert = "SSL_CLIENT_CERT"     # Client certificate header (optional)
proxy_header_clientfingerprint = "CF-Ray"       # Client fingerprint header (optional)
proxy_header_traceid = "X-Request-ID"           # Trace ID header for distributed tracing (optional)

# Geo restriction (router-level middleware)
geo_enabled = false                  # Enable geo restrictions (default: false)
geo_database = "GeoLite2-Country.mmdb"
geo_asn_database = "GeoLite2-ASN.mmdb"
geo_allow_countries = []             # ISO 3166-1 alpha-2 codes (empty = all)
geo_deny_countries = []              # Deny takes precedence over allow
geo_allow_asn = []                   # ASN allow list
geo_deny_asn = []                    # ASN deny list
geo_bypass_cidr = []                 # CIDRs that skip geo checks
geo_deny_code = 403                  # HTTP status for denials
geo_deny_message = ""                # Custom denial message

# Time restriction (router-level middleware)
time_enabled = false                 # Enable time restrictions (default: false)
time_bypass_cidr = []                # CIDRs that skip time checks
time_default_timezone = "UTC"        # Default timezone (IANA format)

[protection]
rate_limit = "100/1m"            # Requests per interval (empty = disabled)
rate_limit_type = "fingerprint"  # "fingerprint" or "ip" (default: "ip")
rate_limit_bantime = "5m"        # Ban duration when limit exceeded

Fingerprint adaptive TTL (based on cache utilization):
Normal (<60%): base TTL (default 5 min)
Medium (60-80%): base TTL / 2 (min 2 min)
High (>80%): base TTL / 5 (min 1 min)
LRU eviction triggers when TTL cleanup is insufficient.

# HXEP (Hexon Edge Protocol)
hexon_edge_protocol = false  # Enable HXEP header parsing (default: false)
hexon_edge_cidr = [          # Trusted CIDRs for HXEP (default: trust all)
  "10.244.0.0/16",           # Kubernetes pod network
]

HXEP (Hexon Edge Protocol) — real client IP through edge proxies:
When traffic flows: External Client → Edge Proxy → Gateway (via k8s Service/LB), the edge proxy prepends a binary header with the original client IP and port.

Format: Magic "HXEP" (4B) + Type (1B: 0x04=IPv4, 0x06=IPv6) + IP (4/16B) + Port (2B)

Required for: geo-IP accuracy, rate limiting, IDS, and RADIUS NAS identification when the gateway sits behind an edge proxy or Kubernetes service with SNAT.

Config:
- service.hexon_edge_protocol = true → enables HXEP parsing on all listeners
- service.hexon_edge_cidr = [...] → only these source CIDRs are trusted for HXEP
  Default: ["0.0.0.0/0", "::/0"] (trust all) — restrict to pod CIDR in production
- Packets from untrusted CIDRs: HXEP header stripped, socket address used
- Set automatically via Helm when edge.enabled=true

Protocols: TCP (parsed on first read, before TLS handshake), UDP (PacketConn wrapper), HTTP/3 QUIC (HXEP wrapping applied transparently, GSO/ECN/GRO OOB data preserved).

Used by: reverse proxy, VPN (IKEv2), RADIUS (RADSEC + UDP), SSH bastion.

Hot-reloadable: TLS certificates, mTLS CA pool, proxy mappings, geo/time rules, rate limit settings, fingerprint cache limits. Cold (restart required): listen addresses, HTTP/2 enable, proxy mode toggle, HXEP settings.
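The HXEP header layout described above can be exercised with a small Python sketch. The magic, type byte, and field sizes follow the documented format; network byte order for the port, the helper names, and the trusted/untrusted stripping behavior as modeled here are assumptions.

```python
import ipaddress, struct

MAGIC = b"HXEP"

def wrap_hxep(client_ip: str, port: int, payload: bytes) -> bytes:
    # Magic (4B) + Type (1B: 0x04=IPv4, 0x06=IPv6) + IP (4/16B) + Port (2B).
    ip = ipaddress.ip_address(client_ip)
    typ = 0x04 if ip.version == 4 else 0x06
    return MAGIC + bytes([typ]) + ip.packed + struct.pack("!H", port) + payload

def unwrap_hxep(data: bytes, trusted: bool):
    if not data.startswith(MAGIC) or len(data) < 11:
        return None, data  # no HXEP header: caller falls back to socket address
    ip_len = 4 if data[4] == 0x04 else 16
    header_len = 4 + 1 + ip_len + 2
    if not trusted:
        # Untrusted source CIDR: strip the header, keep the socket address.
        return None, data[header_len:]
    ip = ipaddress.ip_address(data[5:5 + ip_len])
    (port,) = struct.unpack("!H", data[5 + ip_len:header_len])
    return (str(ip), port), data[header_len:]
```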
Troubleshooting
Common symptoms and diagnostic steps:
TLS handshake failures:
- Malformed ClientHello blocked: check 'logs search "Malformed TLS"' for details
- block_malformed_tls=true rejects missing SNI, invalid TLS version, oversized ClientHello
- ClientHello too large: check max_header_bytes setting (default 64KB)
- TLS version rejected: only 0x0301-0x0304 (TLS 1.0-1.3) accepted
- mTLS certificate popup on proxy routes: check per-SNI mTLS config, set mtls=false on mapping
- CA rotation issues: 'certs list' to verify CA bundle, check 'logs search "CA rotation"'
- Start with: 'diagnose domain <hostname>' for cross-subsystem check

Fingerprint cache exhaustion:
- High memory from fingerprint storage: check fingerprint_max_entries setting
- Adaptive TTL kicking in too aggressively: increase fingerprint_ttl_seconds
- Per-IP abuse: 'logs search "fingerprint limit exceeded"' to identify attackers
- fingerprint_max_entries_per_ip controls anti-abuse threshold (default: 10)
- LRU eviction warnings: 'logs search "evict"' to monitor cache pressure
- Check: 'metrics prometheus fingerprint' for cache utilization metrics

Session affinity not working:
- Verify cluster_affinity=true in global config
- Loopback connections (127.0.0.1, ::1) bypass affinity by design
- VPN clients bypass affinity (checked against vpn.network.subnet CIDR)
- Circuit breaker open for target node: 'proxy circuits' to check breaker states
- No TLS = no fingerprint = no affinity: ensure clients connect via HTTPS
- Check: 'cluster status' for node health, 'health components' for listener status

Proxy mode issues (behind CDN/LB):
- 403 Forbidden: source IP not in proxy_cidr, check 'logs search "CIDR"'
- 400 Bad Request: missing client IP header, verify proxy_header_clientip config
- Rate limiting all users as one: JA4 unavailable in proxy mode, use proxy_header_clientfingerprint
- Wrong client IP: X-Forwarded-For uses FIRST IP only (original client, not proxy chain)
- Header injection: ensure proxy_cidr is restricted to actual proxy IPs
- Distributed tracing broken: configure proxy_header_traceid for end-to-end correlation
- mTLS through proxy: set proxy_header_clientcert and mtls_mode="optional" or "mandatory"

QUIC/HTTP/3 fingerprint failures:
- Large ClientHello spanning packets: check quic_fingerprint_reassembly_max_packets
- Reassembly timeout: increase quic_fingerprint_reassembly_timeout_s for slow networks
- CRYPTO frame offset too large: quic_max_crypto_frame_offset default 64KB should suffice
- Connection ID too long (>20 bytes): RFC 9000 violation, likely malicious traffic

Rate limiting misbehavior:
- All clients sharing one rate bucket: check rate_limit_type ("fingerprint" vs "ip")
- Composite fingerprint unavailable: falls back to IP automatically
- Per-route bypass not working: verify disable_rate_limit=true on the proxy mapping
- Cluster-wide consistency: rate limits use distributed memory cache
- Check: 'ratelimit stats' for current rate limiting state, 'metrics ratelimit' for counters

HXEP (Hexon Edge Protocol) issues:
- HXEP not resolving real client IP: verify service.hexon_edge_protocol = true
- Wrong client IP after HXEP: verify source IP falls within service.hexon_edge_cidr
- "HXEP header stripped": source IP is outside trusted CIDRs — add pod/edge CIDR
- Geo/rate limiting sees edge proxy IP instead of client: HXEP not enabled or CIDR mismatch
- RADIUS NAS rejected after HXEP: real NAS IP doesn't match any [[radius.client]] CIDR
- VPN IKEv2 sees wrong source: same HXEP config applies — check hexon_edge_cidr
- Default trust-all CIDRs in production: security risk — restrict to actual pod network CIDR
- Config: 'config show service' and check hexon_edge_protocol + hexon_edge_cidr fields
- Helm sets HXEP automatically when edge.enabled=true in values.yaml

Connection metrics missing:
- Metrics batched (flush every 100ms or on close): short-lived connections may lag
- Check: 'health components' for listener health status
- 'metrics prometheus listener' for per-listener connection counters

Geo/time restriction issues:
- Geo blocking wrong country: verify MaxMind database is current
- Bypass CIDR not working: geo_bypass_cidr checked before country/ASN rules
- Time window mismatch: verify IANA timezone spelling (e.g., "America/New_York")
- Overnight ranges supported: "22:00-06:00" spans midnight correctly
- Check: 'geo lookup <ip>' to verify classification, 'geo timecheck <ip>' for time rules

Architecture
Connection lifecycle:
- Client connects to TCP socket
- First bytes peeked to detect TLS, extract JA4 fingerprint + SNI
- TCP fingerprint extracted (window size, TTL, MSS, options ordering)
- Session affinity check: fingerprint hash maps to a cluster node
- If affinity target is a remote node: forward connection to that node
- If local: proceed with TLS handshake (per-SNI mTLS selection)
- If HTTP/2: extract HTTP/2 fingerprint from SETTINGS frame
- Compute composite hash: SHA256(ja4|http2|tcp) truncated to 32 hex chars
- Assign correlation ID, begin connection tracking
- HTTP middleware chain: telemetry -> client identification -> connection info -> security headers -> geo restriction -> time restriction -> rate limit -> handler
- Handler processes request, correlation ID propagates as trace_id across modules
- Metrics flushed on connection close
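The middleware ordering in the lifecycle above amounts to function composition: each middleware wraps the next, with the first-listed stage running first. A minimal Python sketch (illustrative names, not the actual implementation):

```python
# Sketch: compose a handler with middlewares so the first listed runs first.
def chain(handler, *middlewares):
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler

def make_mw(name, trace):
    # Each middleware records its name, then delegates to the next stage.
    def mw(next_handler):
        def wrapped(request):
            trace.append(name)
            return next_handler(request)
        return wrapped
    return mw

trace = []
order = ["telemetry", "client_identification", "connection_info",
         "security_headers", "geo_restriction", "time_restriction", "rate_limit"]
app = chain(lambda req: "ok", *[make_mw(n, trace) for n in order])
```

A denying stage (geo, time, rate limit) would simply return a response instead of calling `next_handler`, short-circuiting the rest of the chain.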
Fingerprint extraction pipeline:
Accept-level (before TLS): JA4 from ClientHello peek (zero-copy, buffered I/O)
TLS callback: per-SNI mTLS mode selection
Post-handshake: HTTP/2 SETTINGS fingerprint from connection preface
TCP layer: p0f-style OS fingerprint from socket options (window, MSS, TTL)
QUIC path: JA4Q from Initial packet, transport params fingerprint, multi-packet reassembly

GSO/ECN/GRO preservation:
All UDP wrappers (HXEP edge protocol and JA4Q fingerprint) preserve kernel offload capabilities so that QUIC can use:
- GSO (Generic Segmentation Offload): send 64KB in one syscall, kernel splits into MTU packets
- GRO (Generic Receive Offload): kernel coalesces packets, fewer syscalls on receive
- ECN (Explicit Congestion Notification): congestion signals via IP header bits
Without these, QUIC silently falls back to one syscall per packet. This affects both HTTP/3 reverse proxy and QUIC connector listeners.

Fingerprint memory protection:
Address fingerprint map: configurable max entries (default 10,000) with adaptive TTL
Per-IP limit: configurable (default 10), oldest replaced on overflow
LRU eviction: sorts by timestamp, evicts oldest when TTL cleanup insufficient
HTTP/2 cache: configurable size with percentage-based LRU eviction (1-50%)
All maps use lock-free concurrent reads for performance

Proxy mode flow:
Step 1: Validate source IP against configured proxy_cidr
Step 2: Extract trace ID from proxy header, update correlation context
Step 3: Extract and sanitize client IP (first IP from comma-separated list)
Step 4: Fingerprint priority: dedicated header > client cert hash > client IP
Step 5: Update context with real client identifiers for downstream modules

mTLS CA rotation flow:
1. ACME CA rotates, triggers listener update
2. CA pool rebuilt atomically (config CA + ACME CA merged)
3. HTTPS listeners gracefully restarted
4. Existing connections drain gracefully, new connections get fresh CA pool

Graceful shutdown sequence:
1. Stop accepting new connections on all listeners
2. Close all listener sockets
3. Wait for active connections up to configurable timeout
4. Cancel contexts for remaining connections
5. Force-close any connections still open after timeout

Performance characteristics:
- Pooled slice allocations reduce GC pressure during fingerprint extraction
- Buffered I/O to minimize syscalls
- Metrics batched to reduce overhead (flush every 100ms)
- TCP Fast Open: 15-30% latency reduction for repeat clients (Linux 3.7+, macOS)
- TCP Window Scaling: 20-40% throughput improvement for large transfers
- SO_REUSEPORT on Linux for load balancing across cores

Relationships
Module dependencies and interactions:
- Proxy: Provides per-SNI mTLS lookup. Listener provides fingerprint and client IP context consumed by proxy for rate limiting, identity headers, and session affinity.
- Sessions: Listener middleware manages session cookie extraction. Session validation uses correlation IDs propagated through listener context.
- Certificates: TLS termination uses certificates from the cert module. Per-mapping certificates loaded via SNI callback. CA pool for mTLS verification rebuilt atomically on ACME CA rotation.
- WAF: WAF rules applied in middleware chain after listener accepts connection. Fingerprint available in context for WAF correlation.
- X.509 authentication: mTLS mode controls TLS client auth level. In proxy mode, client certificates injected from HTTP header. Certificate validation uses dynamic CA pool.
- Rate limiting: Middleware reads composite fingerprint or client IP from context. Composite fingerprint (JA4+HTTP/2+TCP) or IP-based, configurable per route.
- Geo restriction: Middleware at router level uses client IP from context with MaxMind GeoLite2 databases for country/ASN lookup.
- Time restriction: Middleware after geo restriction uses client country for timezone-aware time window matching.
- VPN: VPN clients identified by subnet CIDR to bypass session affinity. Prevents VPN tunnel connections from being forwarded to other cluster nodes.
- Cluster affinity: Fingerprint hash selects cluster node for session routing. Node health checked before forwarding. Forwarded connections use inter-node communication for transparent routing.
- DNS: Listener does not directly use DNS, but proxy backends resolved via DNS module.
- Distributed tracing: Correlation IDs generated at listener level propagate as trace_id through all operations, enabling end-to-end tracing across cluster nodes.
- Connection pool: Backend connection management operates downstream of listener. Listener handles inbound connections; connection pool handles outbound to backends.

TCP/TLS Proxy
TCP/TLS proxy with mTLS authentication, passthrough mode, protocol-aware health checks, and geo/time-based access control
Overview
The TCP proxy service enables secure access to private TCP services such as databases (MySQL, PostgreSQL), caches (Redis, Memcached), and message queues. It operates in two modes: mTLS-authenticated mode for identity-aware access control, and passthrough mode for pure TCP load balancing.
mTLS Mode (auth = true, default):
- Server presents Hexon TLS certificate (per-mapping or global)
- Client presents an enrolled X.509 certificate
- Certificate validation: chain verification, expiration, OCSP/CRL checks
- Instant revocation via serial number index (no CRL propagation delay)
- Group-based authorization with allow/deny semantics
- Per-user rate limiting and connection limits
- Backend can be plain TCP or TLS (configured separately)
Passthrough Mode (auth = false):
- Pure TCP relay without TLS termination or protocol knowledge
- Raw bytes relayed between client and backend
- No authentication or group-based authorization
- Rate limiting applies per client IP instead of per user
- Useful for services that handle their own auth and TLS
Load balancing capabilities:
- Strategies: round_robin, weighted, least_connections, hash, random, maglev
- Hash keys: cert_serial (default), cn (username), ip (client address)
- Protocol-aware health checks: TCP, MySQL, PostgreSQL, Redis
- Circuit breaker with configurable error threshold and recovery
- Outlier detection with automatic backend ejection
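A minimal Python sketch of the hash strategy: a stable digest of the configured key (cert_serial, cn, or ip) pins a client to the same backend across connections. The real implementation also offers maglev and health-aware selection; this only illustrates the stable-mapping idea, and the modulo scheme here is an assumption.

```python
import hashlib

def pick_backend(backends: list[str], key_value: str) -> str:
    # Stable hash of the lb_hash_key value (e.g. the client cert serial)
    # so the same client always lands on the same backend while the
    # backend set is unchanged.
    h = int.from_bytes(hashlib.sha256(key_value.encode()).digest()[:8], "big")
    return backends[h % len(backends)]
```

Note that plain modulo hashing reshuffles most clients when a backend is added or removed; that is the problem maglev-style consistent hashing addresses.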
Additional access control:
- Geo-IP restrictions per mapping (country, ASN allow/deny with bypass CIDRs)
- Time-based restrictions per mapping (day/hour windows with timezone support)
- Time windows with per-country/CIDR overrides for global deployments
- Geo and time checks execute BEFORE TLS handshake for fast rejection
Hot-reload support:
- New mappings created immediately
- Removed mappings gracefully drained then stopped
- Updated mappings drain existing connections, restart with new config
- Unchanged mappings preserve existing connections
- Config change detection via hash comparison (no-op for identical config)
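The hash-comparison step above can be sketched as canonicalize-then-digest: serialize the mapping with a stable key order so that a reload with identical content produces an identical hash and becomes a no-op. A minimal Python illustration; the actual hashing scheme is not documented here.

```python
import hashlib, json

def mapping_hash(mapping: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) so field order in
    # the source file does not affect the digest.
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```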
Connection draining during shutdown or config updates:
- Stop accepting new connections on affected mapping
- Wait for existing connections to complete (configurable timeout)
- Force-close remaining connections after timeout
Config
Core configuration under [tcp_proxy] and [[tcp_proxy.mapping]]:
[tcp_proxy]
enabled = true                    # Enable TCP proxy service
cert = "/etc/hexon/tcp-proxy.crt" # Default TLS certificate (path or inline PEM)
key = "/etc/hexon/tcp-proxy.key"  # Default TLS private key (path or inline PEM)
buffer_size = 32768               # TCP relay buffer size in bytes (default: 32KB)
connect_timeout = "10s"           # Backend connection timeout (default: 10s)
idle_timeout = "5m"               # Idle connection timeout (default: 5m)
max_connection_duration = "24h"   # Maximum connection lifetime (default: 24h)
max_connections_per_user = 100    # Max concurrent connections per user (0 = unlimited)

[[tcp_proxy.mapping]]
name = "mysql-prod"                  # Display name for the mapping
listen_port = 3306                   # TCP port to listen on
auth = true                          # mTLS mode (default: true); false = passthrough
cert = "/etc/hexon/mysql-proxy.crt"  # Per-mapping TLS certificate (overrides global)
key = "/etc/hexon/mysql-proxy.key"   # Per-mapping TLS private key
protocol_hint = "mysql"              # Protocol hint for logging and metrics
backends = ["mysql-1:3306", "mysql-2:3306"]  # Backend addresses

Load balancing options:
lb_strategy = "round_robin"  # round_robin, weighted, least_connections, hash, random, maglev
lb_weights = [5, 3, 2]       # Weights for weighted strategy (must match backends count)
lb_hash_key = "cert_serial"  # Hash key: cert_serial (default), cn, ip

Backend TLS options (three modes):
backend_tls = false                      # Plain TCP to backend (default)
backend_tls = true                       # TLS to backend (encrypted)
backend_tls_verify = true                # Verify backend certificate (default: false)
backend_tls_ca = "/path/to/ca.pem"       # Custom CA for backend verification
backend_tls_sni = "db.internal"          # SNI for backend cert validation
backend_tls_cert = "/path/to/client.pem" # Client cert for mTLS to backend
backend_tls_key = "/path/to/client.key"  # Client key for mTLS to backend
backend_tls_min_version = "1.3"          # Min TLS version to backend (default: 1.3)

Authorization options (mTLS mode only):
allowed_groups = ["dba", "developers"]  # Allow these groups (OR logic, empty = all)
denied_groups = ["contractors"]         # Deny these groups (takes precedence over allow)
allowed_subnets = ["10.0.0.0/8"]        # Client IP CIDR restrictions

Health check options:
health_check_enabled = true    # Enable health checks (default: true)
health_check_interval = "10s"  # Check interval (default: 10s)
health_check_timeout = "5s"    # Check timeout (default: 5s)
health_check_type = "tcp"      # tcp, mysql, postgresql, redis

Circuit breaker options:
circuit_breaker_enabled = true
circuit_breaker_error_threshold = 0.5  # Trip at 50% failure rate (0.0-1.0)
circuit_breaker_window = "10s"         # Error tracking window
circuit_breaker_fallback_time = "30s"  # Wait before half-open state

Outlier detection options:
outlier_detection_enabled = true
outlier_detection_interval = "10s"      # Analysis interval
outlier_detection_failure_rate = 50     # Eject when failure rate exceeds 50%
outlier_detection_min_requests = 10     # Minimum requests before analysis
outlier_detection_ejection_time = "30s" # Base ejection duration
outlier_detection_max_ejection = 50     # Max percentage of backends to eject

Rate limiting:
rate_limit = "100/1m"  # Connections per minute per user (mTLS) or per IP (passthrough)
max_connections = 500  # Max concurrent connections for this mapping (0 = unlimited)

Per-mapping overrides:
buffer_size = 65536             # Override global buffer size
connect_timeout = "5s"          # Override global connect timeout
idle_timeout = "1m"             # Override global idle timeout
max_connection_duration = "1h"  # Override global max duration

Geo-IP restrictions (both modes):
geo_enabled = true                  # Enable geo-IP restrictions
geo_allow_countries = ["US", "CA"]  # Allow only these countries (ISO 3166-1 alpha-2)
geo_deny_countries = ["CN", "RU"]   # Deny these countries (takes precedence)
geo_allow_asn = ["AS15169"]         # Allow specific ASNs
geo_deny_asn = ["AS12345"]          # Deny specific ASNs
geo_bypass_cidr = ["10.0.0.0/8"]    # Skip geo checks for these CIDRs

Time-based restrictions (both modes):
time_enabled = true                 # Enable time restrictions
time_timezone = "America/New_York"  # Default timezone (IANA format)
time_allow_days = ["Mon","Tue","Wed","Thu","Fri"]  # Allowed days
time_deny_days = ["Sat", "Sun"]     # Denied days (takes precedence)
time_allow_hours = "09:00-18:00"    # Allowed hours (24h format)
time_deny_hours = "00:00-06:00"     # Denied hours (takes precedence)
time_bypass_cidr = ["10.0.0.0/8"]   # Skip time checks for these CIDRs

Time windows (per-country/CIDR overrides within a mapping):
[[tcp_proxy.mapping.time_windows]]
countries = ["US"]             # Apply this window to US clients
timezone = "America/New_York"
allow_days = ["Mon","Tue","Wed","Thu","Fri"]
allow_hours = "09:00-21:00"
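Time window evaluation, including overnight ranges that span midnight (e.g. "22:00-06:00"), can be sketched in Python. The timezone conversion and range split follow the config shown above; treating the end of the range as exclusive is an assumption.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

def in_window(now: datetime, tz: str, allow_hours: str) -> bool:
    # Convert to the window's timezone, then compare against "HH:MM-HH:MM".
    start_s, end_s = allow_hours.split("-")
    start, end = time.fromisoformat(start_s), time.fromisoformat(end_s)
    local = now.astimezone(ZoneInfo(tz)).time()
    if start <= end:
        return start <= local < end        # same-day window, e.g. 09:00-18:00
    return local >= start or local < end   # overnight window, e.g. 22:00-06:00
```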
Client CA sources (mTLS mode):
Same as HTTP proxy for consistent PKI behavior:
- authentication.x509.ca_pem (if configured) for external PKI certificates
- ACME CA bundle (always) for Hexon-enrolled certificates
No separate client_ca setting; use authentication.x509.ca_pem for external CAs.

Troubleshooting
Common symptoms and diagnostic steps:
Connection refused on listen port:
- TCP proxy not enabled: check [tcp_proxy] enabled = true
- Port conflict: another process already bound to listen_port
- Mapping not loaded: check config syntax with 'config validate'
- Firewall blocking: 'firewall rules' to check network-level rules

mTLS handshake failure:
- Client certificate not enrolled: must be enrolled via Hexon X.509 enrollment
- Certificate expired: check certificate validity dates
- Wrong CA: client cert must be signed by ACME CA or configured external CA
- Certificate revoked: instant revocation via serial index; check 'certs x509 list'
- TLS version mismatch: mTLS mode requires TLS 1.3
- Start with: 'diagnose user <username>' for cross-subsystem check

Connection denied after successful TLS:
- Group authorization failed: user is missing a required group in allowed_groups
- Denied group match: user is in a denied_groups group (takes precedence)
- Subnet restriction: client IP not in allowed_subnets
- Rate limit exceeded: check 'metrics prometheus tcp_proxy_denied_total'
- Per-user connection limit: max_connections_per_user reached
- Check user access: 'directory user <username>' for group membership

Backend connection failures:
- Backend unreachable: 'net tcp <backend:port>' to verify connectivity
- All backends unhealthy: 'proxy health' for health check status
- Circuit breaker open: 'proxy circuits' for breaker states
- Backend TLS issues: verify backend_tls_ca and backend_tls_sni settings
- DNS resolution: 'dns test <backend-hostname>' to verify

Geo-IP access denied:
- Client country not in geo_allow_countries: 'geo lookup <ip>' to check
- Client ASN in geo_deny_asn: deny takes precedence over allow
- Geo checks run BEFORE the TLS handshake: the connection is dropped at the TCP level
- Internal networks: add to geo_bypass_cidr to skip geo checks

Time-based access denied:
- Outside allowed hours/days: 'geo timecheck <ip>' for current status
- Timezone mismatch: verify time_timezone is a correct IANA timezone
- Time window overrides: check per-country/CIDR time_windows config
- Time checks run BEFORE the TLS handshake: the connection is dropped at the TCP level

Slow connections or high latency:
- Buffer size too small: increase buffer_size for high-throughput workloads
- Backend health degrading: check health check metrics
- Connection pool exhaustion: check the max_connections limit
- Outlier detection ejecting healthy backends: review outlier thresholds
- Circuit breaker flapping: check circuit_breaker_window and error_threshold

Connection drops after idle period:
- idle_timeout too short: increase it for long-running database sessions
- max_connection_duration reached: increase it for persistent connections
- Backend idle timeout: the backend may close idle connections independently

Passthrough mode issues:
- No TLS termination: Hexon relays raw bytes and cannot inspect traffic
- No user identity: rate limiting uses the client IP, not a username
- Health checks: only TCP health checks work in passthrough mode
- Hash strategy: uses the client IP since no certificate serial is available

Client connectivity (mTLS mode requires a TLS tunnel):
- Standard clients (mysql, psql) lack mTLS support in the required format
- Use socat for ad-hoc tunnels: socat TCP-LISTEN:local,fork OPENSSL:host:port,...
- Use stunnel for persistent daemon-style tunnels
- The client certificate must be in PEM format from Hexon X.509 enrollment

Relationships
Module dependencies and interactions:
- loadbalancer: Pool management, backend selection, health checks (TCP, MySQL, PostgreSQL, Redis), circuit breakers, outlier detection. Multi-algorithm support (round-robin, weighted, least-connections, hash, random, Maglev).
- x509 (authentication): Client certificate validation in mTLS mode. Chain verification, expiration, OCSP/CRL checks, instant revocation via serial index. Uses the same client CA sources as the HTTP proxy (authentication.x509.ca_pem + ACME CA).
- directory: User information retrieval after successful mTLS authentication. Group membership lookup for allow/deny group authorization.
- geoaccess: Geo-IP restriction enforcement per mapping. Country and ASN allow/deny with bypass CIDRs. Checks execute before the TLS handshake for fast rejection of unauthorized connections.
- timeaccess: Time-based restriction enforcement per mapping. Day/hour windows with timezone support and per-country/CIDR overrides. Also checked before the TLS handshake.
- firewall: Network-level access rules applied before TCP proxy routing.
- certificates: TLS certificate management for proxy listeners. Per-mapping or global certificate selection. Backend TLS configuration for encrypted upstream connections.
- hotreload: Configuration change detection via file watcher or SIGHUP. Mappings are diffed by hash to detect actual changes. Unchanged mappings preserve existing connections; changed mappings are drained and restarted.
- sessions: No direct dependency. The TCP proxy uses mTLS certificate-based authentication, not session cookies.
- proxy (HTTP): Shares client CA trust configuration for consistent PKI behavior. No runtime dependency; they operate on different ports/protocols.

Architecture
Connection flow for mTLS mode:
- Client initiates TCP connection to listen_port
- Geo-IP restrictions checked BEFORE TLS handshake (if geo_enabled)
- Time-based restrictions checked BEFORE TLS handshake (if time_enabled)
- TLS 1.3 handshake with mutual authentication (client certificate required)
- Certificate validated (chain verification, expiry, OCSP/CRL, revocation)
- User info retrieved from directory module (username, groups)
- ACL evaluated: allowed_groups (OR), denied_groups (precedence), allowed_subnets
- Rate limit checked per user via loadbalancer module
- Backend selected via loadbalancer module (strategy-based)
- Backend connection established (plain TCP, TLS, or mTLS depending on config)
- Bidirectional TCP relay started with configurable buffer_size
- Metrics and audit logs recorded on connection close
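The ACL step in the flow above can be sketched as follows. This is a minimal illustration of the documented precedence (denied_groups wins, allowed_groups is an OR-match, then allowed_subnets); the function and parameter names are hypothetical, not the gateway's actual API:

```python
import ipaddress

def acl_allows(user_groups, client_ip, *, allowed_groups, denied_groups, allowed_subnets):
    """Evaluate the per-mapping ACL in the documented order.

    denied_groups takes precedence; allowed_groups is an OR-match
    (empty list = any authenticated user); allowed_subnets restricts
    by client source IP (empty list = any subnet).
    """
    groups = set(user_groups)
    # 1. Deny wins over allow.
    if groups & set(denied_groups):
        return False
    # 2. allowed_groups is an OR: membership in any one listed group suffices.
    if allowed_groups and not (groups & set(allowed_groups)):
        return False
    # 3. Subnet restriction on the client source address.
    if allowed_subnets:
        ip = ipaddress.ip_address(client_ip)
        if not any(ip in ipaddress.ip_network(net) for net in allowed_subnets):
            return False
    return True
```

Note that a user in both an allowed and a denied group is rejected, matching the "denied_groups takes precedence" rule from the troubleshooting section.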
Connection flow for passthrough mode:
- Client connects via raw TCP to listen_port
- Geo-IP restrictions checked (if geo_enabled)
- Time-based restrictions checked (if time_enabled)
- Rate limit checked per client IP
- Backend selected via loadbalancer module (hash by client IP)
- Bidirectional TCP relay started
- Metrics recorded (no user identity, only client IP)
Security properties:
- TLS 1.3 only for mTLS mode client connections
- Post-quantum cryptography support (ML-KEM-768 + X25519)
- Certificate validation with OCSP stapling
- Instant revocation via serial index lookup (no CRL propagation delay)
- Per-user connection limits enforced at proxy level
- Geo and time checks before TLS handshake avoid expensive crypto for denied clients
Hot-reload mechanism:
- Config change detected (fsnotify or SIGHUP)
- New config parsed and validated
- Each mapping compared by hash to detect changes
- Unchanged mappings: no action, existing connections preserved
- New mappings: listener created, starts accepting connections
- Removed mappings: stop accepting, drain existing, close after timeout
- Updated mappings: drain existing connections, restart with new config
- All changes are atomic per-mapping (no partial updates)
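The diff step above can be sketched by hashing each mapping's canonical form. This is a minimal illustration under the assumption that mappings are plain key/value configs; the gateway's actual hashing scheme is not specified here:

```python
import hashlib
import json

def diff_mappings(old: dict, new: dict):
    """Classify mappings as added, removed, changed, or unchanged.

    Each mapping config (a plain dict here) is hashed over its
    canonical JSON form, so key ordering does not cause false diffs.
    """
    def h(cfg):
        return hashlib.sha256(json.dumps(cfg, sort_keys=True).encode()).hexdigest()

    added = [name for name in new if name not in old]
    removed = [name for name in old if name not in new]
    changed = [name for name in new if name in old and h(new[name]) != h(old[name])]
    unchanged = [name for name in new if name in old and h(new[name]) == h(old[name])]
    return added, removed, changed, unchanged
```

Only the "changed" set needs drain-and-restart; "unchanged" mappings keep their listeners and connections untouched.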