
End-to-Origin Encryption

AES-256-GCM over ECDH P-256 for all browser traffic — CDN/WAF/proxy protection

Overview

E2OE encrypts browser-to-server traffic, protecting against CDN/WAF/proxy plaintext exposure.

Architecture:

Crypto layer: Pure ECDH, HKDF, AES-GCM, sequence validation
Runtime layer: Channel state, middleware, HTTP handlers, Service Worker
Proxy integration: Interception of E2OE endpoints on proxied hosts

Channel lifecycle:

1. Browser loads page → channel.js injected (before app JS)
2. ECDH key exchange → channel established in session
3. Service Worker registered → transparently decrypts responses
4. All fetch/XHR traffic encrypted between browser and server
5. Middleware decrypts request, encrypts response — handlers see plaintext
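The crypto layer above (pure ECDH, HKDF, AES-GCM) can be sketched with the Go standard library alone. This is a hedged illustration, not the actual Hexon API: the single SHA-256 in deriveKey stands in for the real HKDF step, and all names are illustrative.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// deriveKey stands in for the HKDF step: the real channel uses
// HKDF-SHA256; a single SHA-256 keeps this sketch stdlib-only.
func deriveKey(shared []byte) []byte {
	sum := sha256.Sum256(shared)
	return sum[:]
}

func main() {
	curve := ecdh.P256()

	// Each side generates an ephemeral P-256 key pair.
	server, _ := curve.GenerateKey(rand.Reader)
	browser, _ := curve.GenerateKey(rand.Reader)

	// Both ends compute the same ECDH shared secret.
	ss, _ := server.ECDH(browser.PublicKey())
	ss2, _ := browser.ECDH(server.PublicKey())
	fmt.Println("secrets match:", string(ss) == string(ss2))

	// AES-256-GCM seals a request body under the derived key.
	blk, _ := aes.NewCipher(deriveKey(ss))
	gcm, _ := cipher.NewGCM(blk)
	nonce := make([]byte, gcm.NonceSize())
	rand.Read(nonce)
	ct := gcm.Seal(nil, nonce, []byte(`{"path":"/api/profile"}`), nil)

	pt, err := gcm.Open(nil, nonce, ct, nil)
	fmt.Println("roundtrip ok:", err == nil && string(pt) == `{"path":"/api/profile"}`)
}
```

The same derivation runs in channel.js on the browser side, which is why both ends can seal and open each other's traffic without ever sending the key.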

Multi-channel support:

- Sessions support multiple concurrent channels (one per tab/origin)
- Multiple tabs work without conflicts (no 421 ping-pong)

Service Worker:

- Transparently decrypts encrypted responses for navigations and API calls
- First page load after init is plaintext (SW activating), subsequent encrypted
- Falls back gracefully on any error

Session-backed channels:

- Key stored in session metadata (JetStream KV replicated, cluster-aware)
- TTL = session TTL (PoW or auth session)
- No separate channel store or expiry management

Access gate: valid PoW cookie (pre-auth) or session cookie (post-auth). Config: e2oe = true requires protection.pow = true.

Injection coverage:

- Service pages (login, console, profile): via server-rendered page templates
- Proxied apps with rewrite_host=true: via HTML rewriter (script injection)
- Proxied apps with rewrite_host=false: NOT injected (zero-copy streaming mode)
These apps can add channel.js manually if needed. No injection = no encryption
overhead, no breakage. E2OE is transparent: no channel header = passthrough.

Tiers:

- Tier 2 (baseline): Standard ECDH, passive MitM resistance. Automatic for all browsers.
- Tier 1 (WebAuthn): ECDH commitment in WebAuthn challenge, hardware MitM resistance.
Auto-upgrades when user logs in with passkey. Persists across page loads via rebind proof.
Inherits across origins via session secret in metadata.

Endpoints

POST /_hexon/e2oe/init - ECDH key exchange (PoW or session cookie required)
GET /_hexon/e2oe/channel.js - Browser-side encryption JS (SRI hash, cache-busted)
GET /_hexon/e2oe/sw.js - Service Worker for response decryption

Config

[service]

e2oe = false # Enable E2OE (requires protection.pow = true)
e2oe_strict = false # Reject ALL requests without E2OE channel (no degradation)

Troubleshooting

Common issues:

- E2OE not working: verify e2oe = true AND protection.pow = true
- 421 errors: channel expired (session restart) or e2oe_strict enabled without channel
- Proxied app not encrypted: check rewrite_host=true (required for JS injection)
- Channel expired: parent session expired (check session_ttl / pow_session_ttl)
- Tier 1 not activating: user must log in with passkey (WebAuthn), check audit logs
- Multi-tab: each tab gets own channel, no conflicts
- First page after init: plaintext (Service Worker activating), refresh to encrypt

Network Firewall

nftables-based firewall with per-peer VPN isolation and group-based policies

Overview

The firewall module provides a generic nftables abstraction layer for VPN implementations (IKEv2, WireGuard, OpenVPN). It manages per-peer chain isolation, flexible policy modes, group-based ACL, and cluster-wide operations.

Core capabilities:

  • Per-peer nftables chain isolation (dedicated chain per VPN client)
  • Three policy modes: restricted (captive portal), full (group-based ACL), custom
  • Group-based ACL with host aliases, port aliases, and LDAP group matching
  • Service DNAT for redirecting VPN traffic to the Hexon web portal
  • Wildcard DNS dynamic rule injection for pattern-based ACL entries
  • DNS background refresh with TTL-based re-resolution and automatic peer updates
  • Dual-stack IPv4/IPv6 with deterministic address mapping (ULA fd7a:ec0a::/48)
  • NAT64 for IPv6 VPN clients accessing IPv4 services
  • nftables connection pooling for reduced netlink socket overhead
  • DoS prevention via max rules per chain and wildcard hostname limits
  • Audit logging for all ACL changes (create, update, remove)
  • Input validation: ASCII-only peer IDs, interface name regex, CIDR validation

Policy modes:

Restricted: Captive portal mode. VPN clients can only reach the Hexon web
portal for authentication. All other traffic is dropped. Applied on initial
VPN connect before the user authenticates.
Full: Group-based ACL with internet access. Three-tier evaluation:
1. Block RFC1918 and reserved networks (10/8, 172.16/12, 192.168/16,
169.254/16, 127/8, 224/4, 240/4)
2. Allow ACL exceptions based on user LDAP group memberships
3. Allow all internet traffic (non-RFC1918)
Applied after successful authentication via UpdatePeerChain.
Custom: Operator-defined rules only. No automatic internet access or
RFC1918 blocking. Full control over allowed destinations.

Platform: requires Linux with nftables (kernel 3.13+) and CAP_NET_ADMIN. Non-Linux platforms use stub implementations that return errors rather than failing the service.

Config

Core configuration under [firewall] and related VPN sections:

[firewall]

enabled = true # Enable firewall module
blocked_networks = [ # Networks blocked in PolicyFull (RFC1918 default)
"10.0.0.0/8",
"172.16.0.0/12",
"192.168.0.0/16",
]
nft_pool_size = 5 # nftables connection pool size (range: 1-100)
max_rules_per_chain = 1000 # Max rules per peer chain (0 = unlimited, NOT recommended)
# DNS Background Refresh for ACL hostnames
dns_refresh_enabled = true # Enable automatic hostname re-resolution (default: true)
dns_refresh_min_interval = "1m" # Min refresh interval / zero-TTL fallback
dns_refresh_max_interval = "1h" # Max refresh interval (TTL clamped to this)
dns_refresh_jitter = 10 # Jitter percentage 0-100 (default: 10, prevents thundering herd)
dns_refresh_init_timeout = "10s" # Startup timeout for initial hostname resolution

Host aliases — reusable destination groups

[[firewall.aliases.hosts]]

name = "hosts_ipa" # Alias name referenced by rules
hosts = [
"192.168.11.0/24", # CIDR notation
"192.168.11.101", # Single IP (auto /32 or /128)
"ipa.hexon.private", # Hostname (DNS resolved at rule generation, 5s timeout)
]

[[firewall.aliases.hosts]]

name = "hosts_remote_dc"
hosts = ["gitlab.internal", "jenkins.internal"]
site = "prod-dc-a8f3c1" # Route through connector (no nft rules, remote DNS)

Port aliases — reusable protocol+port combinations

[[firewall.aliases.ports]]

name = "ports_web"
entries = [
{ proto = "tcp", ports = [80, 443, 8080, 8443] },
]

[[firewall.aliases.ports]]

name = "ports_ldap"
entries = [
{ proto = "tcp", ports = [389, 636] },
{ proto = "udp", ports = [389] },
]

[[firewall.aliases.ports]]

name = "icmp"
entries = [{ proto = "icmp", ports = [] }]

ACL rules — map LDAP groups to allowed resources

[[firewall.rules]]

rule = "allow_ipa_access" # Rule name (used in metrics and audit logs)
src = ["admins", "ipa_users"] # LDAP group names (case-insensitive, OR matching)
dst = ["hosts_ipa"] # Host alias names
ports = ["ports_web", "icmp"] # Port alias names ("any" = all traffic)

Wildcard DNS limits (under VPN network config)

[vpn.network]

wildcard_max_hosts_per_domain = 100 # Max hostnames per wildcard pattern (default: 100)
wildcard_max_hosts_total = 1000 # Max total tracked hostnames (default: 1000)
ipv6_enabled = false # Enable dual-stack IPv4+IPv6 (default: false)
ipv6_prefix = "fd7a:ec0a::/48" # ULA prefix for VPN client IPv6 addresses

Service DNAT redirects VPN client traffic to the Hexon web portal for authentication. It is auto-configured from the VPN and service settings at startup:

- The service IP is auto-detected from the network interface
- ServicePublicPort (VPN-facing, e.g., 8443) maps to the actual service port (e.g., 443)
- Set ServicePublicPort = 0 to use the same port as the service

Hot-reloadable: ACL rules, host/port aliases, DNS refresh settings, blocked_networks. New rules apply to new VPN connections only. Existing VPN sessions require a reconnect or explicit peer chain update to pick up changes. Cold (restart required): firewall.enabled, nft_pool_size.

Troubleshooting

Common symptoms and diagnostic steps:

User cannot reach internal services after VPN connect:

- Check group membership: 'directory user <username>' to verify LDAP groups
- Check ACL rule matching: user groups must match at least one src group (OR, case-insensitive)
- Verify peer chain policy: 'nft list chain inet hexon_ikev2 hexon_ikev2_peer_<user>_<ip>'
- Check if still in PolicyRestricted (pre-auth): look for limited port rules only
- DNS ACL blocking: unauthorized domains are refused (RCODE 5) by the DNS ACL check
- Verify host alias resolution: hostnames have 5s DNS timeout, check dns module health

Peer isolated unexpectedly (all traffic dropped):

- Check per-peer chain exists: 'nft list chains inet hexon_ikev2 | grep peer'
- Verify jump rule in forward chain: 'nft list chain inet hexon_ikev2 hexon_ikev2_forward'
- Check max_rules_per_chain limit: error message includes "reached maximum rule limit"
- Group sync timing: groups fetched when peer chain is created/updated, not continuously
- DNS refresh failure: hostname resolution failed, rules removed (check metrics)

Rules not applying after config change:

- Config changes only apply to NEW peer chain operations
- Existing VPN sessions need a reconnect or explicit peer chain update
- Verify config loaded: check structured log for firewall config reload
- Host alias DNS resolution: hostnames resolved at rule generation, not config load

Service DNAT not working (connection refused via VPN):

- Check DNAT rule: 'nft list chain inet hexon_ikev2_nat hexon_ikev2_prerouting'
Expected: iifname "hexon0" tcp dport 8443 dnat ip to 192.168.21.10:443
- Check peer chain service acceptance: 'nft list chain inet hexon_ikev2 <peer_chain>'
Expected: tcp dport 443 ip daddr 192.168.21.10 accept
- Service IP changed: IP detected at Initialize time only, restart VPN or re-initialize
- Port conflict: ServicePublicPort conflicts with another service on VPN interface

Timeout after DNAT (packet reaches service but no response):

- Verify service acceptance rule exists in peer chain (all policy types get it)
- Check NAT masquerade: return traffic must be NATed back through VPN interface
- Conntrack state: established/related rule must precede service acceptance rule

Wildcard DNS rules not being injected:

- Verify wildcard pattern in ACL host aliases (e.g., "*.example.com")
- Check wildcard limits: wildcard_max_hosts_per_domain (default 100),
wildcard_max_hosts_total (default 1000)
- Monitor metrics: firewall.wildcard_limit_hit_total indicates limit reached
- TTL expiry: rules removed after DNS TTL expires (min 5min, max 24h)
- Rules injected synchronously BEFORE DNS response (no race condition)

DNS background refresh not updating rules:

- Verify dns_refresh_enabled = true (default)
- Check refresh cycle metrics: firewall.dns_refresh_cycle_total
- Hostname change detection: firewall.dns_hostname_change_total
- Rate limiting: peer updates limited to 1/sec burst 5 per peer
- DNS module must be enabled and healthy

IPv6 connectivity issues:

- Verify ipv6_enabled = true in [vpn.network]
- Check dual-stack jump rules: 'nft list chain inet hexon_ikev2 hexon_ikev2_forward'
Should show both IPv4 (@nh,96,32) and IPv6 (@nh,64,128) jump rules per peer
- Verify IPv6 gateway rules: peer chain should include ip6 daddr fd7a:ec0a::... rules
- Test: 'ping6 fd7a:ec0a::6440:cc01' from VPN client
- NAT64 service access: 'curl -6 https://[fd7a:ec0a::6440:cc01]:8443/'
- "No route to host": check XFRM policies include IPv6 selectors
- "Connection timeout": IPv6 jump rule missing in forward chain

nftables errors or operation failures:

- Verify Linux with nftables: 'nft --version'
- Check CAP_NET_ADMIN capability: required for nftables operations
- Thread safety: all ops protected by mutex with deadlock detection
- Connection pool exhaustion: increase nft_pool_size for high-concurrency
- Non-Linux: stub returns errors, VPN works without firewall (insecure)

Firewall audit and metrics:

- Audit events: telemetry log at INFO level with action, peer_id, chain_name, rule_count
- ACL metrics: firewall.acl_rules_generated_total, firewall.acl_rule_errors_total
- DNS metrics: firewall.dns_resolution_total, firewall.dns_resolution_duration
- CIDR validation: firewall.cidr_validation_errors_total (labels: source)
- Wildcard metrics: firewall.wildcard_hostnames_total (gauge), dynamic_rules_injected_total

Security

Input validation and security hardening:

Peer ID validation (anti-homoglyph):

Only ASCII printable characters (33-126) allowed. Blocks Unicode homoglyph attacks
where lookalike characters (Cyrillic 'e' vs Latin 'e') could spoof identities.
Valid: "user@example.com", "alice_123", "device-01"
Invalid: "useг@example.com" (Cyrillic), spaces, tabs, control characters

Interface name validation:

Strict regex: ^[a-zA-Z0-9_-]+$ with 15-char max (Linux kernel limit).
Prevents command injection via crafted interface names.
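A minimal sketch of the two validation rules above; the function names are illustrative. Peer IDs must be ASCII printable (33-126), which rejects homoglyphs, spaces, and control characters, and interface names must match the strict regex within the 15-character kernel limit:

```go
package main

import (
	"fmt"
	"regexp"
)

// Interface names: strict charset, 15-char Linux kernel limit.
var ifaceRe = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)

func validPeerID(id string) bool {
	if id == "" {
		return false
	}
	for _, b := range []byte(id) {
		// ASCII printable 33-126 only. Multi-byte runes (e.g. a
		// Cyrillic lookalike) encode as bytes >= 128 and are
		// rejected here, as are spaces, tabs, and control chars.
		if b < 33 || b > 126 {
			return false
		}
	}
	return true
}

func validIface(name string) bool {
	return len(name) <= 15 && ifaceRe.MatchString(name)
}

func main() {
	fmt.Println(validPeerID("user@example.com"), validPeerID("useг@example.com")) // true false
	fmt.Println(validIface("hexon0"), validIface("eth0; rm -rf /"))               // true false
}
```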

Custom rule validation:

SourceIP/DestIP: max 45 characters (IPv6 length), validated with net.ParseCIDR()
Comment: max 256 characters
Protocol: must be "tcp", "udp", "icmp", or "all"
Action: must be "accept", "drop", or "reject"
Ports: must be valid 0-65535
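The field checks above can be sketched as a single validator. The Rule type and validate function are illustrative (a single Port field stands in for the port list), but the constraints mirror the list: CIDR length and net.ParseCIDR for addresses, closed enums for protocol and action, and bounded port and comment values.

```go
package main

import (
	"fmt"
	"net"
)

// Rule is an illustrative stand-in for a custom firewall rule.
type Rule struct {
	DestIP  string // CIDR, max 45 chars (IPv6 length)
	Proto   string // tcp|udp|icmp|all
	Action  string // accept|drop|reject
	Port    int    // 0-65535
	Comment string // max 256 chars
}

func validate(r Rule) error {
	if len(r.DestIP) > 45 {
		return fmt.Errorf("dest ip too long")
	}
	if _, _, err := net.ParseCIDR(r.DestIP); err != nil {
		return fmt.Errorf("dest ip: %w", err)
	}
	switch r.Proto {
	case "tcp", "udp", "icmp", "all":
	default:
		return fmt.Errorf("bad proto %q", r.Proto)
	}
	switch r.Action {
	case "accept", "drop", "reject":
	default:
		return fmt.Errorf("bad action %q", r.Action)
	}
	if r.Port < 0 || r.Port > 65535 {
		return fmt.Errorf("bad port %d", r.Port)
	}
	if len(r.Comment) > 256 {
		return fmt.Errorf("comment too long")
	}
	return nil
}

func main() {
	ok := Rule{DestIP: "192.168.11.0/24", Proto: "tcp", Action: "accept", Port: 443}
	bad := Rule{DestIP: "192.168.11.0/24", Proto: "gre", Action: "accept", Port: 443}
	fmt.Println(validate(ok), validate(bad))
}
```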

Network segmentation (PolicyFull):

RFC1918 and reserved ranges blocked FIRST, then ACL exceptions applied.
No group memberships = no ACL rules = internet-only access (fail-safe).
ACL rules cannot override internet blocks, only RFC1918 blocks.
Invalid config blocks peer chain creation entirely (fail-closed).

Service DNAT security:

Service acceptance rule cannot be removed via custom rules (always present).
DNAT does not bypass application-level authentication/authorization.
Service IP auto-detected from interface (prevents IP spoofing in config).
Interface-based matching: all traffic on VPN interface port goes to service.

DNS refresh security:

Jitter uses crypto/rand (not math/rand) to prevent timing prediction attacks.
5-second DNS resolution timeout prevents DoS via slow DNS responses.
NXDOMAIN removes hostname from rules immediately (fail-safe).
Transient DNS errors keep last known good IPs (fail-open for availability).

Wildcard DNS DoS prevention:

Per-domain limit (default 100) and total limit (default 1000).
Rules injected synchronously before DNS response (prevents race condition).
TTL-based expiry ensures eventual cleanup (min 5min, max 24h).
Dynamic rules are tagged for deduplication and audit trail.

Thread safety:

All nftables operations are protected against concurrent access.
Deadlock detection warns if the same operation attempts to re-acquire a lock.
Connection pool is designed for safe concurrent use.

Audit logging:

All ACL changes logged at INFO with structured fields:
action (create/update/remove), peer_id, chain_name, rule_count, timestamp.
Flows through telemetry module to log files, SIEM, compliance reporting.

Interpreting tool output:

'firewall rules':
Normal: Rules listed per group with allow/deny targets and ports
Empty: No firewall rules configured — all traffic allowed by default
Action: Check specific user → 'firewall check <username>' for effective permissions
'firewall check <username>':
Allowed targets: List of host:port patterns the user can access (via group membership)
No rules match: User has no firewall-granted access — check group membership
Action: Missing access → verify user groups with 'directory user <username>',
then check which groups have rules in 'firewall rules'

Relationships

Module dependencies and interactions:

  • vpn.ikev2: Primary consumer. Creates restricted peer chains on connect, upgrades to
full policy after web authentication. Passes LDAP groups for ACL evaluation.
Service DNAT enables auth flow over VPN tunnel (captive portal pattern).
  • vpn.wireguard: Same firewall API as IKEv2. Different VPNType and table name.
  • vpn.openvpn: Same firewall API. VPNType=openvpn.
  • Directory: Group membership drives ACL rule matching (case-insensitive,
OR semantics). Groups fetched at UpdatePeerChain time, not cached in firewall.
  • sessions: Session revocation or VPN disconnect triggers peer chain removal.
Session upgrade (post-auth) triggers peer chain update to full policy mode.
  • dns: Hostname resolution for ACL host aliases (5s timeout). DNS background refresh
uses Hexon DNS module for DNSSEC, distributed cache (80-95% hit rate), adaptive
resolver selection. DNS layer enforces ACL permissions (returns REFUSED RCODE 5 on
denial). Wildcard DNS queries trigger dynamic rule injection for matching patterns.
  • distributed cache: Cluster-wide hostname tracking for wildcard DNS with
automatic TTL expiry. Local counters for efficient limit checks.
  • config: Hot-reload of ACL rules, host/port aliases, DNS refresh settings.
Config is read directly on each operation (never cached internally) so changes
take effect within one refresh interval.
  • telemetry: Structured logging (Info/Debug/Warn/Error) with vpn_type, peer_id,
peer_ip fields. Metrics for ACL operations, DNS resolution, wildcard tracking.
  • Rate limiting: Connection-level integration for VPN traffic.
  • ippool: IPv4-to-IPv6 deterministic mapping (fd7a:ec0a::/48 ULA prefix) for
dual-stack peer chain creation.

Protection

Multi-layered request and protocol protection: WAF, rate limiting, geo/time access, PoW, size limiting, IDS, and password policy

Overview

The protection subsystem provides defense-in-depth security across HTTP, IKEv2/VPN, and process layers. Each module targets a specific threat domain and operates independently.

Subsystems:

waf - Web Application Firewall (Coraza v3 with OWASP CRS). Inspects HTTP
requests and responses for SQL injection, XSS, path traversal, command
injection, and other application-layer attacks. Supports anomaly scoring
and self-contained blocking modes with four OWASP paranoia levels.
ratelimit - Distributed rate limiting with client fingerprinting. Tracks
request counts per JA4 TLS fingerprint or IP address using a token bucket
algorithm. Automatically bans clients exceeding thresholds. Cluster-wide
protection via distributed storage with per-host isolation.
geoaccess - Geo/IP and ASN access control using MaxMind databases. Evaluates
client IP against country and ASN allow/deny lists. Supports CDN geo header
trust, CIDR bypass rules, and IP lookup caching.
timeaccess - Time-based access control with IANA timezone awareness. Enforces
day-of-week and hour-of-day restrictions per country or CIDR range. Supports
overnight hour ranges, deny rule overrides, and default fallback windows.
pow - Proof-of-Work challenge-response anti-abuse. SHA-256 based challenges
with configurable difficulty, anti-automation honeypot fields, random form
field names, and timing validation to prevent bot submissions.
sizelimit - HTTP request body size enforcement. Configurable default limit
with per-host/path exceptions using exact, wildcard, or regex matching.
ikev2ids - IKEv2 intrusion detection system for VPN traffic. Protocol
validation, signature-based detection, statistical anomaly analysis,
and DoS flood prevention. Inline inspection at sub-50 microsecond latency.
password - Password strength validation using the zxcvbn algorithm. Pattern
detection, dictionary matching, and entropy analysis rather than simple
character rules.
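The pow subsystem's SHA-256 challenge-response can be sketched as follows. Counting leading zero bits as the difficulty measure is an assumption for illustration, not necessarily Hexon's exact scheme, and the function names are invented:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts zero bits at the front of a SHA-256 digest.
func leadingZeroBits(sum [32]byte) int {
	n := 0
	for _, b := range sum {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

// solve brute-forces a nonce so that H(challenge || nonce) has at
// least `difficulty` leading zero bits (the browser-side work).
func solve(challenge []byte, difficulty int) uint64 {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	for nonce := uint64(0); ; nonce++ {
		binary.BigEndian.PutUint64(buf[len(challenge):], nonce)
		if leadingZeroBits(sha256.Sum256(buf)) >= difficulty {
			return nonce
		}
	}
}

// verify is the cheap server-side check: one hash per submission.
func verify(challenge []byte, nonce uint64, difficulty int) bool {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	binary.BigEndian.PutUint64(buf[len(challenge):], nonce)
	return leadingZeroBits(sha256.Sum256(buf)) >= difficulty
}

func main() {
	ch := []byte("server-issued-challenge")
	n := solve(ch, 12) // low difficulty so the sketch runs instantly
	fmt.Println("nonce:", n, "valid:", verify(ch, n, 12))
}
```

The asymmetry is the point: solving costs the client ~2^difficulty hashes on average, while verification costs the server exactly one.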

HTTP middleware execution order:

1. ratelimit - Block abusive clients first (cheapest check)
2. sizelimit - Enforce body size limits
3. pow - Proof-of-Work challenge for allowed clients
4. waf - Application-layer attack detection
5. geoaccess - Geographic restrictions
6. timeaccess - Temporal access policy

Relationships

Cross-subsystem interactions:

  • Listener: Chains ratelimit, sizelimit, pow, and waf middleware in order
before routing. Geo and time checks also integrated at the listener level.
  • VPN: IKEv2 IDS inspects every incoming UDP packet on ports 500 and 4500
before protocol state machine processing.
  • Proxy: WAF wraps the reverse proxy handler. Per-mapping overrides allow
disabling rate limiting or size limiting on specific routes.
  • Password change: Validates new passwords before LDAP update during
password change and reset flows.
  • Configuration: Most subsystems read from [protection] or [service] config.
WAF, ratelimit, geo, and time settings are hot-reloadable.
  • Admin CLI: Exposes diagnostics via metrics ratelimit, metrics sizelimit,
metrics waf, metrics ids, metrics pow, geo lookup, geo check, geo timecheck.

Geo/IP and ASN Access Control

MaxMind-based geographic and ASN access restrictions with CDN header support and CIDR bypass

Overview

The geoaccess module provides geographic and network-based access control using MaxMind GeoLite2/GeoIP2 databases. It evaluates client IP addresses against country and ASN allow/deny lists to block or permit requests before they reach application logic.

Core capabilities:

  • Country-based allow/deny lists using ISO 3166-1 alpha-2 codes
  • ASN-based allow/deny lists for blocking hosting providers and VPN networks
  • CIDR-based bypass rules for trusted internal networks
  • CDN geo header integration (Cloudflare, AWS CloudFront, Fastly)
  • IP lookup caching for high-throughput performance
  • Dual operation modes: access control (Check) and informational (Lookup)
  • Graceful degradation when MaxMind databases are missing or invalid

Two operation modes:

Check - Validate request against geo/ASN restrictions (returns allowed/blocked)
Lookup - Retrieve geo information without access control (informational only)

Evaluation priority (first match wins):

1. Bypass CIDR check (skip all checks if client IP matches)
2. ASN deny check (block if ASN is in deny list)
3. ASN allow check (block if ASN is NOT in allow list, when allow list is set)
4. Country deny check (block if country is in deny list)
5. Country allow check (block if country is NOT in allow list, when allow list is set)
6. Allow (default - permit if no rules matched)

Database requirements:

- GeoLite2-Country.mmdb (required for country filtering)
- GeoLite2-ASN.mmdb (optional, required only for ASN filtering)

If database files are missing or invalid, the module falls back to an embedded database (if available) or disables itself with an error log. The service continues running without geo restrictions rather than failing completely (fail-open for availability).

CDN geo header support: When deployed behind a CDN, the country code can be provided via HTTP header instead of performing a MaxMind database lookup. This is faster and often more accurate since CDNs have extensive IP intelligence databases.

Common CDN headers:

- CF-IPCountry (Cloudflare)
- CloudFront-Viewer-Country (AWS CloudFront)
- Fastly-Client-GeoIP-Country (Fastly)

When CDNCountry is set and valid (2-letter ISO code):

- MaxMind country lookup is skipped entirely
- ASN lookup still occurs if ASN rules are configured (CDNs do not provide ASN)
- The CDN-provided country is used for all country-based checks

Common ASN examples for blocking:

Cloud/Hosting: 14061 (DigitalOcean), 16509 (AWS), 15169 (Google Cloud),
8075 (Azure), 13335 (Cloudflare), 20473 (Vultr), 63949 (Linode)
VPN providers: 55967 (NordVPN), 9009 (M247), 212238 (ExpressVPN)

Config

Configuration in hexon.toml under [service]:

[service]

geo_enabled = true # Enable geo access control
geo_database = "/etc/hexon/GeoLite2-Country.mmdb" # Path to country database
geo_asn_database = "/etc/hexon/GeoLite2-ASN.mmdb" # Path to ASN database (optional)
geo_allow_countries = ["US", "CA", "GB"] # ISO codes to allow (empty = all)
geo_deny_countries = [] # ISO codes to deny
geo_allow_asn = [] # ASN numbers to allow (empty = all)
geo_deny_asn = ["14061", "16509", "15169"] # ASN numbers to deny
geo_bypass_cidr = ["10.0.0.0/8", "100.64.0.0/10"] # CIDRs that skip all checks
geo_deny_code = 403 # HTTP status code for blocked requests
geo_deny_message = "" # Custom deny message (empty = default)
# CDN geo header (requires proxy = true and proxy_cidr set)
proxy = true # Required to trust proxy/CDN headers
proxy_cidr = ["173.245.48.0/20"] # Trusted proxy IP ranges
geo_country_header = "CF-IPCountry" # CDN header containing country code

Configuration notes:

  • Country codes must be ISO 3166-1 alpha-2 (e.g., “US”, “GB”, “DE”)
  • ASN numbers are strings without the “AS” prefix (e.g., “14061” not “AS14061”)
  • When both allow and deny lists are set, deny takes precedence (checked first)
  • Empty allow list means “allow all” for that category
  • CIDR bypass is checked before any country/ASN evaluation
  • geo_country_header requires proxy = true and valid proxy_cidr
  • Hot-reloadable: all geo settings can be changed without restart
  • Database file changes require restart (loaded at startup only)

Troubleshooting

Common symptoms and diagnostic steps:

Legitimate users blocked by geo restrictions:

- Check user's detected country: use 'geo lookup <ip>' in admin CLI
- Verify allow_countries includes the user's country code
- MaxMind accuracy varies by region; consider adding nearby countries
- VPN users may show the VPN exit country, not their actual country
- CDN header may override MaxMind: check geo_country_header setting
- Country code case: codes are normalized to uppercase internally

Users from blocked countries still getting through:

- Check bypass CIDR: user IP may match geo_bypass_cidr
- CDN header spoofing: ensure proxy = true and proxy_cidr is restrictive
- IPv6 addresses: verify MaxMind database covers IPv6 ranges
- Cache hit returning stale allow: cache entries expire, wait for refresh

ASN blocking not working:

- Verify geo_asn_database path is correct and file exists
- ASN database is optional: if missing, ASN checks are silently skipped
- Cloud provider IPs change: MaxMind ASN data may be stale
- Shared hosting: multiple ASNs may serve the same IP range

CDN geo header issues:

- Header not present: CDN may not send header for all requests
- Invalid country code: non-2-letter codes fall back to MaxMind lookup
- proxy = false: CDN headers are ignored when proxy is not enabled
- proxy_cidr mismatch: request not from trusted proxy range
- Header name case: HTTP headers are case-insensitive (handled automatically)

Performance concerns:

- Check cache hit rate: geoaccess.cache metric (hit vs miss)
- High miss rate: increase cache TTL or check for IP diversity
- MaxMind lookup latency: typically sub-millisecond per lookup
- CDN header mode skips MaxMind lookup entirely (faster)

Geo module not loading:

- Missing database file: check error log for "geoaccess" messages
- Invalid mmdb format: re-download from MaxMind
- File permissions: hexon process must have read access to database files
- Module disabled: verify geo_enabled = true in config

Metrics for diagnostics:

- geoaccess.requests_total (status=allowed|blocked, reason=...)
- geoaccess.blocked_by_country (country label)
- geoaccess.blocked_by_asn (asn label)
- geoaccess.cache (result=hit|miss)
- geoaccess.cdn_country_used (country label)

Security

Security considerations and hardening:

CDN header trust model:

CDN geo headers are only trusted when all conditions are met:
- proxy = true is configured (required)
- proxy_cidr defines trusted proxy IP ranges
- Connection originates from within proxy_cidr ranges
Without these safeguards, attackers can spoof CDN headers to bypass geo blocks.

Input validation:

- Country codes must be exactly 2 ASCII letters (a-z, A-Z)
- Codes are normalized to uppercase (e.g., "us" becomes "US")
- Invalid codes (numeric, symbols, unicode) fall back to MaxMind lookup
- Whitespace is trimmed from header values
- ASN numbers validated as numeric strings
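The country-code rules above (trim whitespace, require exactly 2 ASCII letters, uppercase, otherwise fall back to MaxMind) can be sketched as follows; normalizeCountry is an illustrative name:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeCountry returns the normalized code and whether it is
// usable; when it is not, the caller falls back to a MaxMind lookup.
func normalizeCountry(h string) (string, bool) {
	h = strings.TrimSpace(h)
	if len(h) != 2 { // also rejects multi-byte unicode lookalikes
		return "", false
	}
	for _, r := range h {
		if !(r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z') {
			return "", false
		}
	}
	return strings.ToUpper(h), true
}

func main() {
	fmt.Println(normalizeCountry(" us ")) // normalized to "US"
	fmt.Println(normalizeCountry("U1"))   // rejected -> MaxMind fallback
	fmt.Println(normalizeCountry("USA"))  // rejected -> MaxMind fallback
}
```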

Evaluation order security:

Deny lists are always evaluated before allow lists within each category.
This ensures that explicitly denied entries cannot be bypassed by being
in an allow list. CIDR bypass is checked first to ensure internal
networks always have access regardless of geo restrictions.

Fail-open behavior:

If MaxMind databases are missing or corrupt, the module disables itself
and allows all traffic. This is intentional for availability but means
geo restrictions silently stop working. Monitor the error log for
database loading failures.

IP spoofing prevention:

When behind a reverse proxy, the module uses the client IP extracted by
the trusted proxy chain (X-Forwarded-For validated against proxy_cidr),
not the raw connection IP. Direct connections use the TCP source address.

Rate limiting interaction:

Geo checks happen before rate limiting in the request pipeline. A blocked
geo request never reaches the rate limiter, so geo-blocked IPs do not
consume rate limit tokens.

Relationships

Module dependencies and interactions:

  • Request pipeline: Primary consumer. Geo checks are performed early in the
pipeline before routing, authentication, or application logic.
Uses the extracted client IP from trusted proxy headers.
  • Rate limiting: Geo checks precede rate limiting. Blocked requests
do not consume rate limit tokens. Both modules share the client IP extraction.
  • Proof-of-work: PoW challenges may be served before geo checks depending
on configuration order. Typically geo blocks first, then PoW for allowed regions.
  • config: All geo settings are hot-reloadable and are read on each request
(no stale cache). Database paths are cold config (restart required to
reload mmdb files).
  • telemetry: Structured logging for blocked requests with country, ASN, reason.
Metrics exported for monitoring dashboards and alerting.
  • dns: MaxMind lookups are IP-based (no DNS dependency). However, CDN header
trust depends on proxy_cidr which may include CDN IP ranges that change.
  • Directory: No direct dependency. Geo checks are pre-authentication
and identity-independent. Applied uniformly to all requests.
  • sessions: No session dependency. Each request is evaluated independently
against current geo rules (stateless check).
  • vpn.ikev2: VPN connections can be geo-checked at the IKE_SA_INIT stage
using the client's source IP before tunnel establishment.
  • Admin CLI: Exposes ‘geo lookup’, ‘geo check’, and ‘geo timecheck’ commands
for diagnostics and testing.

IKEv2 Intrusion Detection System

Inline IDS for IKEv2/IPsec with protocol validation, signature detection, anomaly analysis, and DoS prevention

Overview

The ikev2ids module provides a comprehensive intrusion detection system specifically designed for IKEv2/IPsec VPN traffic. It integrates directly into the IKEv2 packet processing pipeline for low-latency inline inspection.

Four layers of threat detection:

  1. Protocol Validation (RFC 7296 compliance):
- IKE version checking (must be 2.0)
- Valid exchange type verification
- Message length consistency checks
- Reserved flag validation
  2. Signature-Based Detection (10 built-in signatures):
- SIG-001: CVE-2018-5389 INVALID_KE_PAYLOAD DoS
- SIG-002: Malformed SA Proposals
- SIG-003: Weak cipher detection (DES, 3DES, MD5)
- SIG-004: Excessive re-keying DoS
- SIG-005: Weak DH groups (less than 2048-bit)
- SIG-006: Oversized messages (buffer overflow attempts)
- SIG-007: Message ID manipulation (replay attacks)
- SIG-008: NULL authentication attempts
- SIG-009: IKEv2 fragmentation attacks
- SIG-010: DELETE payload floods
  3. Anomaly Detection (statistical analysis):
- Packet size anomaly detection using exponential moving averages
- Connection rate anomaly tracking per client IP
- Configurable sensitivity threshold (0.0 to 1.0)
  4. DoS Detection (connection flood prevention):
- Connection flood detection with configurable threshold per IP per minute
- Authentication failure flood detection
- Automatic state cleanup every 5 minutes (30-minute TTL)
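The packet-size anomaly layer can be sketched with an exponential moving average of size and variance per client. This is a hedged illustration: the alpha, warm-up count, and mapping from anomaly_sensitivity to a ~3-sigma threshold are assumptions, not Hexon's exact math.

```go
package main

import (
	"fmt"
	"math"
)

// ema tracks a running mean and variance of observed packet sizes.
type ema struct {
	mean, variance float64
	n              int
}

// observe returns true when the size is a statistical outlier
// relative to traffic seen so far, then folds it into the averages.
func (e *ema) observe(size float64) bool {
	const alpha = 0.1
	if e.n == 0 {
		e.mean, e.n = size, 1
		return false
	}
	sd := math.Sqrt(e.variance)
	// Warm-up: never flag the first few packets; afterwards flag
	// anything beyond ~3 standard deviations from the mean.
	anomalous := e.n >= 10 && sd > 0 && math.Abs(size-e.mean)/sd > 3
	dev := size - e.mean
	e.mean += alpha * dev
	e.variance = (1-alpha)*e.variance + alpha*dev*dev
	e.n++
	return anomalous
}

func main() {
	var e ema
	for i := 0; i < 50; i++ {
		e.observe(500 + float64(i%7)) // typical IKE packet sizes
	}
	fmt.Println("normal flagged:", e.observe(503))
	fmt.Println("oversized flagged:", e.observe(60000))
}
```

Because state is per client IP, this kind of tracker is what drives the ~500 bytes of memory overhead per tracked client noted below.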

Performance characteristics:

- Expected latency: less than 50 microseconds per packet
- Memory overhead: approximately 500 bytes per tracked client
- Automatic state cleanup every 5 minutes with 30-minute TTL

Operational modes:

- block_malicious = true: detected threats cause packet to be dropped (inline prevention)
- block_malicious = false: threats are logged but packets are allowed (detection only)

Config

Configuration in hexon.toml under [protection.ikev2ids]:

[protection.ikev2ids]

enabled = true # Enable/disable the IDS module
block_malicious = true # Block detected threats (false = log only / detection mode)
log_level = "info" # Logging verbosity for IDS events
dos_threshold = 100 # Maximum connections per minute per client IP
anomaly_sensitivity = 0.95 # Statistical anomaly threshold (0.0-1.0, higher = more sensitive)

Configuration notes:

  • enabled = false completely disables packet inspection (zero overhead)
  • block_malicious = false is useful for initial deployment to assess false positives
  • dos_threshold applies per unique client IP address per minute window
  • anomaly_sensitivity of 0.95 means traffic outside 95th percentile is flagged
  • Lower anomaly_sensitivity reduces false positives but may miss subtle attacks
  • All settings are evaluated at packet inspection time (hot-reloadable)

Recommended configurations by environment:

Standard deployment:

enabled = true
block_malicious = true
dos_threshold = 100
anomaly_sensitivity = 0.95

High-security (stricter, more false positives):

enabled = true
block_malicious = true
dos_threshold = 50
anomaly_sensitivity = 0.99

Initial rollout (detection only):

enabled = true
block_malicious = false
dos_threshold = 200
anomaly_sensitivity = 0.90

Troubleshooting

Common symptoms and diagnostic steps:

Legitimate VPN clients being blocked:

- Set block_malicious = false temporarily to confirm IDS is the cause
- Check metrics: ikev2ids_threats_detected_total with type and severity labels
- Anomaly detection may flag unusual but legitimate traffic patterns
- Lower anomaly_sensitivity (e.g., 0.90) to reduce false positives
- Check if client VPN software sends non-standard IKEv2 extensions
- SIG-003 (weak cipher): client may be offering DES/3DES/MD5 in proposals
- SIG-005 (weak DH): client may propose DH groups below 2048-bit

High false positive rate on anomaly detection:

- Reduce anomaly_sensitivity from 0.95 to 0.90 or 0.85
- Anomaly baselines adapt over time; new deployments have higher false positives
- Exponential moving averages need time to converge to normal patterns
- Consider running in detection-only mode (block_malicious = false) initially

DoS threshold too aggressive:

- Increase dos_threshold if legitimate users trigger connection flood detection
- Mobile users may reconnect frequently due to network changes
- NAT environments: multiple users behind same IP may exceed per-IP threshold
- Monitor ikev2ids_threats_detected_total with type=dos label

IDS not detecting known attacks:

- Verify enabled = true in configuration
- Check that packet reaches IDS: inspect ikev2ids_packets_inspected_total
- Signature detection is pattern-based; zero-day attacks need anomaly layer
- Protocol validation requires well-formed IKE headers to parse
- Severely malformed packets may be dropped before reaching IDS

Performance impact concerns:

- Monitor ikev2ids_inspection_duration histogram for latency
- Expected: less than 50 microseconds per packet
- High latency indicates too many tracked clients (memory pressure)
- State cleanup runs every 5 minutes; 30-minute TTL for inactive clients
- If latency exceeds 1ms, check total tracked client count

IDS metrics not appearing:

- Verify module is enabled and processing packets
- Check telemetry module health
- Metrics are exported as Prometheus-compatible counters and histograms
- ikev2ids_packets_inspected_total should increment for any VPN traffic

Specific signature troubleshooting:

- SIG-001 (CVE-2018-5389): legitimate if client sends INVALID_KE_PAYLOAD
during normal negotiation retries (rare but possible)
- SIG-004 (re-keying): high re-key rate may be legitimate under high traffic;
adjust threshold if needed
- SIG-006 (oversized): legitimate if using certificate-based auth with
large certificate chains
- SIG-009 (fragmentation): legitimate with large payloads; check if client
uses IKEv2 fragmentation (RFC 7383)
- SIG-010 (DELETE floods): may occur during mass session cleanup events

Security

Security design and considerations:

Fail-open design:

If the IDS module is disabled or encounters an internal error during
inspection, packets are allowed through. This prioritizes VPN availability
over security enforcement. Monitor the module health to ensure continuous
protection.

Inline vs passive inspection:

The IDS operates inline in the packet processing pipeline. When
block_malicious = true, detected threats cause immediate packet drop
before the IKEv2 state machine processes them. This prevents exploitation
but means false positives cause connection failures.

Signature coverage:

The 10 built-in signatures cover known CVEs and common IKEv2 attack
patterns. They do not cover application-layer attacks that occur after
tunnel establishment. Post-tunnel traffic is handled by the WAF and
firewall modules.

State tracking security:

Per-client state (connection counts, anomaly baselines) is stored in
memory with automatic 30-minute TTL cleanup. An attacker rotating source
IPs can evade per-IP DoS detection. Consider combining with network-level
rate limiting for comprehensive protection.

Cipher policy enforcement:

SIG-003 detects weak cipher proposals (DES, 3DES, MD5) but does not
enforce cipher policy. The IKEv2 negotiation module handles actual cipher
selection. IDS detection provides visibility into clients offering weak
ciphers even when the server rejects them.

Memory exhaustion prevention:

State cleanup runs every 5 minutes with 30-minute TTL for inactive
entries. Under extreme conditions (millions of unique IPs), memory usage
grows linearly at approximately 500 bytes per tracked client. Monitor
memory usage in high-traffic deployments.

Relationships

Module dependencies and interactions:

  • vpn.ikev2: Primary consumer. Calls InspectPacket for every incoming UDP
packet on ports 500 and 4500 before IKEv2 state machine processing.
Block response causes immediate packet drop with no IKE response sent.
  • Firewall: Complementary protection. Firewall handles
post-authentication network ACL; IDS handles pre-authentication protocol
threats. No direct dependency between modules.
  • Rate limiting: IDS DoS detection is IKEv2-specific (protocol-aware).
Rate limiting is generic connection-level. Both can trigger independently.
IDS provides deeper protocol insight; ratelimit provides broader coverage.
  • telemetry: All threat detections logged with structured fields including
threat type, severity, signature ID, client IP, and packet metadata.
Metrics exported for monitoring and alerting.
  • config: Settings are hot-reloadable. Settings read dynamically at inspection time
so changes take effect immediately without restarting VPN service.
  • Admin CLI: Exposes 'metrics ids' command for viewing IDS statistics
including packets inspected, threats detected, and packets blocked.
  • Proof-of-work: No direct interaction. PoW operates at HTTP layer while
IDS operates at UDP/IKEv2 layer. Different protocol domains.
  • sessions: No direct interaction. IDS operates before session establishment.
Session-level security is handled by the IKEv2 authentication module.

Proof-of-Work Challenge

SHA-256 proof-of-work challenges with anti-automation features for bot detection and abuse prevention

Overview

The PoW module provides SHA-256 proof-of-work challenge-response protection to prevent automated abuse without requiring third-party CAPTCHA services.

Core capabilities:

  • SHA-256 challenges with configurable difficulty (leading zero bits)
  • Anti-automation: randomized form field names to defeat hardcoded bots
  • Honeypot decoy fields that catch bots filling all form fields
  • Timing validation to detect pre-computed solutions
  • One-time-use challenges with TTL expiration (prevents replay attacks)
  • Session-based validation (solve once, access for session duration)
  • POST body preservation across the challenge flow
  • Distributed challenge storage via cluster storage

Challenge-response flow:

1. Client request arrives without valid PoW session
2. Middleware intercepts and renders challenge page inline
3. Client receives: challenge ID, SHA-256 challenge bytes, difficulty
4. Client JavaScript brute-forces a nonce where
SHA-256(challenge + nonce) has N leading zero bits
5. Client submits nonce along with form values
6. Server validates: timing, honeypots, hash correctness, expiry
7. On success: PoW session cookie set, original request proceeds
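The asymmetry between steps 4 and 6 can be sketched in Python: the client performs on the order of 2^difficulty hashes, while the server verifies with a single hash. The nonce encoding here (decimal string appended to the challenge bytes) is an assumption for illustration, not the actual wire format.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits of a hash digest."""
    bits = 0
    for b in digest:
        if b == 0:
            bits += 8
        else:
            bits += 8 - b.bit_length()  # zero bits before the first 1
            break
    return bits

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce (what the challenge JS does)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash confirms the work was done."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```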

Difficulty recommendations:

16 bits: ~65K hashes, ~0.1 seconds (light protection)
20 bits: ~1M hashes, ~1 second (default, good balance)
24 bits: ~16M hashes, ~15 seconds (high protection)
28 bits: ~256M hashes, ~4 minutes (extreme, may frustrate users)

Runs third in the HTTP middleware chain (after ratelimit and sizelimit).

Config

Configuration under the [protection] section:

[protection]

pow = true # Enable proof-of-work challenges
pow_difficulty = 20 # Leading zero bits required (higher = harder)
pow_difficulty_time = "5m" # Challenge token TTL (time to solve)
pow_session_ttl = "30m" # PoW session TTL after successful challenge
pow_cookie_name = "hexon_pow" # Cookie name for PoW sessions
pow_random_fields = true # Randomize form field names per challenge
pow_decoy_fields = 5 # Number of honeypot decoy fields
pow_min_render_time = "200ms" # Minimum time before submission is accepted
pow_body_ttl = "5m" # TTL for stored encrypted POST bodies
pow_body_max_size = "1MB" # Maximum POST body size to preserve

Difficulty tuning:

Each additional bit doubles the expected computation time:
16 bits: ~0.1s | 20 bits: ~1s | 24 bits: ~15s | 28 bits: ~4min

Anti-automation settings:

pow_random_fields: Randomized form field names per challenge defeat bots
that hardcode field names like "nonce" or "solution".
pow_decoy_fields: Hidden honeypot fields that legitimate users never see.
Bots filling all fields are detected and rejected.
pow_min_render_time: Minimum elapsed time between challenge generation and
submission. Prevents pre-computed or instant bot responses.

POST body preservation:

When a POST triggers a PoW challenge, the original body is encrypted and
stored, then replayed after the challenge is solved.

Hot-reloadable: pow_difficulty, pow_difficulty_time, pow_random_fields,
pow_decoy_fields, pow_min_render_time, pow_body_ttl, pow_body_max_size.

Cold (restart required): pow (enable/disable), pow_cookie_name.

Troubleshooting

Common symptoms and diagnostic steps:

Challenge page not appearing:

- Verify [protection] pow = true
- Check if client already has a valid PoW session cookie
- Check 'metrics pow' for challenges_issued counter

Users cannot solve the challenge (timeout):

- Difficulty too high: reduce pow_difficulty (20 is default)
- TTL too short: increase pow_difficulty_time
- Client JavaScript disabled: PoW requires JavaScript execution
- Mobile devices are slower: consider lower difficulty

Bots bypassing the challenge:

- Enable honeypot decoys: set pow_decoy_fields > 0
- Enable random field names: set pow_random_fields = true
- Increase difficulty: raise pow_difficulty
- Check timing: bots solving faster than pow_min_render_time are rejected

Timing validation rejecting legitimate users:

- pow_min_render_time too high: lower to 200ms (default)
- Clock skew between nodes: check NTP synchronization

Honeypot false positives:

- Auto-fill in some browsers may populate hidden honeypot fields
- Reduce pow_decoy_fields to 2-3 for fewer false positives

POST body lost after challenge:

- Body exceeds pow_body_max_size: increase limit or reduce POST size
- Body TTL expired: increase pow_body_ttl
- Large file uploads: consider disabling PoW for upload routes

Relationships

Module dependencies and interactions:

  • Listener: Third middleware in the protection chain (after ratelimit
and sizelimit).
  • Rate limiting: Runs before PoW, preventing challenge generation
resource exhaustion from abusive clients.
  • Distributed storage: Challenge records and PoW sessions stored cluster-wide
with TTL-based automatic cleanup.
  • Configuration: Reads [protection] section. Most settings hot-reloadable.
  • Admin CLI: 'metrics pow' shows challenges issued, solved, and failed.

Rate Limiting

Distributed token bucket rate limiting with client fingerprinting, automatic banning, and per-host isolation

Overview

The ratelimit module provides distributed rate limiting and automatic client banning across the cluster. It protects all HTTP endpoints against request flooding, brute-force attacks, and automated abuse.

Core capabilities:

  • Token bucket algorithm with burst support (1.5x capacity for brief spikes)
  • Client identification via TLS fingerprint (JA4) or IP address
  • Automatic banning when rate limits are exceeded
  • Manual ban/unban operations via admin CLI
  • Per-host rate limit isolation (independent limits per proxy mapping)
  • Per-route custom rate limits (override global setting per proxy mapping)
  • Cluster-wide protection via distributed storage
  • Atomic per-node statistics (allowed, blocked, banned counts)

Token bucket algorithm:

- Bucket capacity is 1.5x the configured rate limit (allows brief bursts)
- Refill rate equals limit / interval (tokens per second)
- New clients start with a full bucket
- Each request consumes one token
- When bucket is empty the client is automatically banned
- Banned clients are blocked immediately, without consuming further resources
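The bucket behavior described above can be sketched in a few lines of Python. Names and structure are illustrative, not the module's code; the real implementation is distributed and per-client.

```python
import time

class TokenBucket:
    """Token bucket with burst support (illustrative sketch)."""

    def __init__(self, limit: int, interval_s: float, now: float = None):
        self.capacity = limit * 1.5                 # burst headroom
        self.refill_rate = limit / interval_s       # tokens per second
        self.tokens = self.capacity                 # new clients start full
        self.last = time.monotonic() if now is None else now

    def allow(self, now: float = None) -> bool:
        """Consume one token; False means the client would be banned."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For "100/1m", the bucket holds 150 tokens and refills at roughly 1.67 tokens/second: a brief spike of 150 requests passes, but sustained traffic above 100/minute drains the bucket.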

Runs first in the HTTP middleware chain (before sizelimit, PoW, and WAF).

Config

Configuration under the [protection] section:

[protection]

rate_limit = "100/1m" # Requests per interval (e.g., "100/1m", "5000/1h")
rate_limit_type = "fingerprint" # Client identification: "fingerprint" (JA4) or "ip"
rate_limit_bantime = "5m" # Ban duration when limit is exceeded

Rate limit format: "{count}/{interval}" where interval uses Go duration suffixes: s (seconds), m (minutes), h (hours).

Examples:

"100/1m" - 100 requests per minute (token bucket capacity: 150)
"5/1m" - 5 requests per minute (strict, for sensitive endpoints)
"5000/1h" - 5000 requests per hour (generous, for API gateways)
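Parsing this format can be sketched as below. Illustrative only: the real parser accepts Go duration syntax, while this simplified regex handles a single number-plus-suffix interval.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

def parse_rate_limit(spec: str):
    """Parse "{count}/{interval}" (e.g. "100/1m") into (count, seconds)."""
    m = re.fullmatch(r"(\d+)/(\d+)([smh])", spec)
    if not m:
        raise ValueError(f"invalid rate limit: {spec!r}")
    count, n, unit = int(m.group(1)), int(m.group(2)), m.group(3)
    return count, n * _UNIT_SECONDS[unit]
```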

Per-route overrides via [[proxy.mapping]]:

disable_rate_limit = false # Bypass rate limiting for this route
rate_limit = "200/1m" # Custom rate limit for this route

Per-host isolation:

When proxy routes provide a hostname, rate limits are tracked independently.
A client can have separate counters for different applications. Bans are
also per-host: being banned on one app does not block other apps.

Fingerprint types:

"fingerprint" (default, recommended):
Uses JA4 TLS fingerprint. Identifies clients by TLS handshake
characteristics. Resistant to IP spoofing and NAT traversal.
"ip":
Uses client IP address. Simpler but affected by NAT and shared IPs.

Hot-reloadable: rate_limit, rate_limit_type, rate_limit_bantime.

Troubleshooting

Common symptoms and diagnostic steps:

Legitimate users getting 429 Too Many Requests:

- Check current rate limit: 'metrics ratelimit' shows cluster-wide stats
- Rate limit too low: add per-route rate_limit override
- Shared IP (NAT/office): switch rate_limit_type to "fingerprint"
- Token bucket burst is 1.5x limit; sustained traffic above base drains it
- Temporarily increase rate_limit or set disable_rate_limit on the route

Users banned unexpectedly:

- Check ban status: 'ratelimit stats' shows active bans
- Short rate_limit_bantime causes frequent ban/unban cycling
- Per-host bans: user may be banned on one app but not others
- Unban manually: 'ratelimit unban <fingerprint>'

Rate limiting not enforcing:

- Verify [protection] rate_limit is not empty (empty = disabled)
- Check if route has disable_rate_limit = true
- Counters are per-node with eventual consistency; a few extra requests
may slip through during cluster propagation

Ban not taking effect across cluster:

- Bans propagate via broadcast; check cluster health
- Verify all nodes can communicate: 'cluster status' and 'ping'
- Ban propagation typically completes within 100ms

JA4 fingerprint issues:

- Some clients produce identical fingerprints (e.g., same curl version)
- Requires TLS termination at Hexon (not upstream LB)
- Fall back to "ip" type if fingerprinting is unreliable

All state is in-memory with TTL:

- Full cluster restart clears all counters and bans
- No persistent state survives complete cluster outage (by design)

Relationships

Module dependencies and interactions:

  • Listener: First middleware in the HTTP protection chain. Runs before
sizelimit, PoW, and WAF.
  • JA4 fingerprinting: TLS fingerprint extracted during TLS handshake,
available on request context for rate_limit_type "fingerprint".
  • Configuration: Reads [protection] section. Hot-reloadable settings.
  • Distributed storage: Counters and bans stored cluster-wide with TTL.
Bans are replicated to all nodes (typically under 100ms).
  • Proxy: Per-route overrides via disable_rate_limit and custom rate_limit.
  • Admin CLI: 'ratelimit stats', 'ratelimit ban <fp>', 'ratelimit unban <fp>',
and 'metrics ratelimit' commands.

Request Size Limiting

HTTP request body size enforcement with per-host/path exceptions and multiple matching strategies

Overview

The sizelimit module prevents abuse by enforcing maximum request body sizes on all HTTP endpoints. It supports a configurable default limit with per-host and per-path exceptions for endpoints that require larger payloads.

Core capabilities:

  • Default max request body size with human-readable format (e.g., "10MB")
  • Per-host/path exceptions with custom size limits
  • Three path matching strategies: exact, wildcard (suffix /*), regex
  • Regex validation at init time with graceful skip on invalid patterns
  • Enforcement via http.MaxBytesReader (immune to faked Content-Length headers)
  • Automatic statistics tracking (allowed vs blocked request counts)
  • Routes can opt out individually via DisableSizeLimit flag

Middleware execution order in the request pipeline:

1. Rate limiting - Block abusive clients first
2. Size limiting - Enforce body size limits
3. Proof-of-Work challenge
4. Session management
5. Handler - Process request

Size format supports: B, KB, MB, GB, TB (case-insensitive). Values are binary-based (1 KB = 1024 bytes, 1 MB = 1048576 bytes).
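The binary interpretation of these units can be sketched as follows (illustrative; the actual parser lives in the module):

```python
import re

_UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def parse_size(spec: str) -> int:
    """Parse a human-readable size like "10MB" into bytes (binary units,
    case-insensitive, no space between number and unit)."""
    m = re.fullmatch(r"(\d+)([kmgt]?b)", spec, re.IGNORECASE)
    if not m:
        raise ValueError(f"invalid size: {spec!r}")
    return int(m.group(1)) * _UNIT_BYTES[m.group(2).lower()]
```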

The module returns HTTP 413 Payload Too Large immediately when a request exceeds the applicable size limit. It wraps the request body reader so the actual bytes read are measured, not the Content-Length header value.

Config

Configuration under the [protection] section in hexon.toml:

[protection]

max_bytes = "10MB" # Default limit for all endpoints (empty = disabled)

Per-host/path exceptions (checked in order, first match wins)

[[protection.max_bytes_exceptions]]

host = "upload.example.com" # Optional: restrict to specific host
path = "/api/upload/*" # Path pattern (exact, wildcard, or regex)
bytes = "100MB" # Custom limit for this exception

[[protection.max_bytes_exceptions]]

path = "/bulk/*" # All hosts, wildcard path
bytes = "500MB"

[[protection.max_bytes_exceptions]]

path = "^/api/v[0-9]+/upload$" # Regex pattern
regex = true # Must be set for regex matching
bytes = "200MB"

Path matching strategies:

1. Exact: path = "/upload" matches only /upload
2. Wildcard: path = "/upload/*" matches /upload/file, /upload/x/y/z
3. Regex: path = "^/pattern$" with regex = true
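The three strategies can be sketched as a single matcher. One detail is an assumption here: whether a wildcard pattern matches its bare prefix (e.g. "/upload/*" vs "/upload") — the text above only guarantees matches under the prefix.

```python
import re

def match_path(pattern: str, path: str, is_regex: bool = False) -> bool:
    """Apply exact, wildcard (suffix /*), or regex matching (sketch)."""
    if is_regex:
        return re.search(pattern, path) is not None
    if pattern.endswith("/*"):
        # Wildcard matches any depth under the prefix
        # (assumed: not the bare prefix itself).
        return path.startswith(pattern[:-2] + "/")
    return path == pattern          # exact match is literal
```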

Exception evaluation:

- Checked in config order (first match wins)
- Host field is optional (empty = match all hosts)
- Invalid regex patterns are logged as WARN and skipped at init time
- Valid exceptions logged at INFO with match type and human-readable size

Disabling:

- Set max_bytes = "" to disable size limiting entirely
- Individual routes can opt out via DisableSizeLimit: true in RouteConfig

Hot-reloadable: No. Changes require restart. Init logging shows: default limit, exception count, valid/invalid breakdown.

Troubleshooting

Common symptoms and diagnostic steps:

Uploads failing with 413 Payload Too Large:

- Check if the endpoint has an exception configured
- Verify exception path matches: exact vs wildcard vs regex
- Check exception order: first match wins, reorder if needed
- Verify host field matches the request Host header (if specified)
- Check size units: "100MB" = 104857600 bytes (binary, not decimal)

Size limit not enforced (large uploads succeeding):

- Verify max_bytes is not empty (empty = module disabled)
- Check if route has DisableSizeLimit: true
- Verify size limit middleware is active in the request chain
- Check init logs for "DISABLED via config" or "INVALID config" messages

Regex exceptions not working:

- Check init logs for "Invalid regex in size limit exception - SKIPPED"
- Verify regex = true is set in the exception config
- Test regex pattern independently for validity
- Common errors: unclosed brackets, unescaped special characters

Exception not matching expected requests:

- Wildcard requires /* suffix: "/upload/*" not "/upload*"
- Exact match is literal: "/upload" does not match "/upload/"
- Host matching is exact (no wildcard support for hosts)
- Check exception_index in init logs to verify load order

Statistics show unexpected blocked count:

- Check 'metrics sizelimit' for allowed and blocked request counts
- High blocked count may indicate: limit too low, missing exceptions,
or actual abuse attempts
- Check application logs for specific blocked requests

Module init shows INVALID config:

- Verify size format: must be number + unit (e.g., "10MB")
- Supported units: B, KB, MB, GB, TB (case-insensitive)
- No spaces between number and unit
- Must be positive value

Security

Security design and enforcement model:

Body size enforcement:

Uses http.MaxBytesReader which wraps the request body reader at the
transport level. This prevents attacks using:
- Faked Content-Length headers (actual bytes read are measured)
- Chunked transfer encoding abuse (reader counts all chunks)
- Slow-drip attacks (reader enforces absolute byte limit)
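The principle — count the bytes actually read, never trust Content-Length — can be illustrated with a Python analogue of Go's http.MaxBytesReader. This is a sketch of the idea, not a faithful port of the stdlib behavior.

```python
import io

class LimitedBodyReader:
    """Wraps a body reader and fails once more than `limit` bytes are read."""

    def __init__(self, body, limit: int):
        self.body = body
        self.remaining = limit

    def read(self, n: int = -1) -> bytes:
        # Reading one byte past the limit is enough to detect overflow,
        # regardless of what the Content-Length header claimed.
        want = self.remaining + 1 if n < 0 else n
        chunk = self.body.read(want)
        if len(chunk) > self.remaining:
            self.remaining = 0
            raise ValueError("413: request body too large")
        self.remaining -= len(chunk)
        return chunk
```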

Authorization model:

The sizelimit module requires authorization for all operations.
Default policy restricts size checking to the TLS listener middleware only.
This prevents unauthorized callers from bypassing size restrictions.

Middleware ordering:

Size limiting runs AFTER rate limiting. This ensures that abusive clients
are blocked by rate limits before consuming resources on body reading.
The order prevents resource exhaustion attacks where an attacker sends
many large payloads to overwhelm the size checking logic itself.

Regex safety:

Regex patterns are compiled once at init time. Invalid patterns are
rejected with a warning and skipped entirely. This prevents:
- Runtime compilation failures during request handling
- ReDoS attacks via pathological regex patterns in config
- Performance degradation from repeated regex compilation

Relationships

Module dependencies and interactions:

  • TLS listener: Primary consumer. The size limit middleware calls
CheckRequest for every incoming HTTP request. Only authorized caller.
  • Rate limiting: Runs before sizelimit in the middleware chain.
Rate limiting blocks abusive clients before size checking begins.
  • Proof-of-work: Runs after sizelimit. Proof-of-Work challenges are only
issued after the request passes size validation.
  • config: Reads [protection] section at init time for default limit and
exceptions. Not hot-reloadable (restart required for changes).
  • telemetry: Structured logging at init (config summary, exception details)
and at runtime (blocked requests). Metrics for allowed/blocked counts.
  • Admin CLI: Statistics exposed via the "metrics sizelimit" admin command.

Time-Based Access Control

Timezone-aware access restrictions with day/hour windows, country matching, and CIDR bypass rules

Overview

The timeaccess module enforces time-based access restrictions on incoming requests. It evaluates whether a request is allowed based on the current day of week, hour of day, client country, and IP address.

Core capabilities:

  • Day-of-week filtering (Mon through Sun)
  • Hour-of-day filtering in HH:MM-HH:MM format (24-hour clock)
  • Overnight hour ranges supported (e.g., "22:00-06:00")
  • Multiple time windows per country or CIDR range
  • IANA timezone-aware evaluation per window
  • CIDR-based bypass rules (skip all time checks)
  • Country-based window matching via geo lookup
  • Deny rules take precedence over allow rules within each window
  • Default fallback window when no country/CIDR match

Evaluation priority (first match wins):

1. Bypass CIDR check: if client IP matches any bypass CIDR, request is allowed
2. CIDR-based window match: most specific, checked by IP range
3. Country-based window match: matched via geo lookup country code
4. Default window: fallback using DefaultTimezone, DefaultAllowDays, DefaultAllowHours

Within each window, deny rules override allow rules:

- DenyDays takes precedence over AllowDays
- DenyHours takes precedence over AllowHours
- Empty AllowDays list means all days are allowed

The response includes diagnostic information: which timezone was used, the current day and time in that timezone, what matched (cidr/country/default), and the reason if the request was blocked.
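The priority order can be sketched as a first-match dispatcher. All names and the dict shape below are illustrative assumptions, and the hour checks (plus the real CheckRequest/CheckResponse types) are omitted for brevity.

```python
import ipaddress

def check_window(window: dict, day: str) -> str:
    """Deny rules take precedence over allow rules within one window."""
    if day in window.get("deny_days", []):
        return "blocked"
    allow = window.get("allow_days", [])
    # An empty allow_days list means all days are allowed.
    return "allowed" if not allow or day in allow else "blocked"

def evaluate(client_ip, country, windows, bypass_cidrs, default_window, day):
    """First match wins: bypass CIDR, CIDR window, country window, default."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in ipaddress.ip_network(c) for c in bypass_cidrs):
        return "allowed", "bypass"       # bypass skips all time checks
    for w in windows:
        if any(ip in ipaddress.ip_network(c) for c in w.get("cidr", [])):
            return check_window(w, day), "cidr"
    for w in windows:
        if country in w.get("countries", []):
            return check_window(w, day), "country"
    return check_window(default_window, day), "default"
```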

Config

Configuration under the [service] section in hexon.toml:

[service]

time_enabled = true # Enable time-based access control
time_bypass_cidr = ["10.0.0.0/8", "100.64.0.0/10"] # CIDRs that skip all time checks
time_deny_code = 403 # HTTP status code for denied requests
time_deny_message = "" # Custom denial message (empty = default)
# Default window (used when no country/CIDR window matches)
time_default_timezone = "UTC" # IANA timezone for default window
time_default_allow_days = ["Mon", "Tue", "Wed", "Thu", "Fri"] # Allowed days
time_default_allow_hours = "08:00-18:00" # Allowed hours (HH:MM-HH:MM)

Country-specific time windows

[[service.time_windows]]

countries = ["US", "CA"] # ISO 3166-1 alpha-2 country codes
timezone = "America/New_York" # IANA timezone for this window
allow_days = ["Mon", "Tue", "Wed", "Thu", "Fri"] # Weekdays only
allow_hours = "08:00-18:00" # Business hours Eastern

[[service.time_windows]]

countries = ["GB", "DE", "FR"]
timezone = "Europe/London"
allow_days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
allow_hours = "09:00-17:30" # UK/EU business hours

CIDR-specific time windows (takes precedence over country windows)

[[service.time_windows]]

cidr = ["192.168.100.0/24"] # Match by IP range
timezone = "UTC"
allow_days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"] # 24/7 access
allow_hours = "00:00-23:59"

Deny rules (override allow rules within the same window)

[[service.time_windows]]

countries = ["US"]
timezone = "America/New_York"
allow_days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
allow_hours = "08:00-18:00"
deny_days = ["Wed"] # Block Wednesdays (maintenance)
deny_hours = "12:00-13:00" # Block lunch hour

Hour range format:

"08:00-18:00" - 8 AM to 6 PM
"22:00-06:00" - 10 PM to 6 AM (overnight, wraps around midnight)
"00:00-23:59" - All day (24/7)
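The overnight wrap-around can be sketched as a minutes-since-midnight comparison (illustrative only):

```python
def in_hour_range(spec: str, hhmm: str) -> bool:
    """True if `hhmm` falls within "HH:MM-HH:MM", wrapping past midnight
    when the range starts later than it ends."""
    def minutes(t: str) -> int:
        h, m = t.split(":")
        return int(h) * 60 + int(m)
    start_s, end_s = spec.split("-")
    start, end, now = minutes(start_s), minutes(end_s), minutes(hhmm)
    if start <= end:
        return start <= now <= end       # same-day window
    return now >= start or now <= end    # overnight window wraps midnight
```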

Day names: Mon, Tue, Wed, Thu, Fri, Sat, Sun (case-sensitive, 3-letter).

Hot-reloadable: Yes. Window changes apply to new requests immediately.

Troubleshooting

Common symptoms and diagnostic steps:

Users blocked outside expected hours:

- Check timezone configuration: IANA timezone string must be valid
- Verify the window that matched: CheckResponse.MatchedBy shows cidr/country/default
- Check CheckResponse.CurrentDay and CurrentTime for the evaluated timezone
- Country code mismatch: verify geo lookup returns expected country code
- Overnight ranges: "22:00-06:00" is valid and should wrap around midnight

Users not blocked when they should be:

- Check bypass CIDR list: client IP may match a bypass range
- CIDR windows take precedence over country windows
- Verify time_enabled = true in config
- Check deny rules: DenyDays/DenyHours must be set to override allow rules
- Empty AllowDays means all days allowed (not no days)

Wrong timezone applied:

- Check window matching order: CIDR first, then country, then default
- Multiple country windows: first match wins
- Verify IANA timezone string (e.g., "America/New_York" not "EST")
- Invalid timezone falls back to UTC silently

Bypass not working for internal IPs:

- Verify CIDR notation: "10.0.0.0/8" not "10.0.0.0"
- Check time_bypass_cidr is a list, not a single string
- Client IP must be the actual source IP (check proxy headers)
- IPv6 addresses need proper CIDR notation

Deny rules not taking effect:

- Deny rules only work within a matched window
- deny_days takes precedence over allow_days in the SAME window
- deny_hours takes precedence over allow_hours in the SAME window
- The default window does not support deny rules; define a country or CIDR
window with deny_days/deny_hours instead

Metrics and diagnostics:

- timeaccess.requests_total{status="allowed|blocked"} for traffic patterns
- timeaccess.windows_checked{matched_by="cidr|country|default"} for match distribution
- CheckResponse includes full diagnostic: Timezone, CurrentDay, CurrentTime,
MatchedBy, and Reason (if blocked)

Relationships

Module dependencies and interactions:

  • Geo access: Provides country code for each client IP via geo lookup.
The country code is passed in CheckRequest.Country field. Without geo
module, only CIDR-based and default windows are evaluated.
  • TLS listener: Invokes time access checks as part of the protection
middleware chain. Passes client IP and geo-resolved country.
  • config: Reads [service] section for time windows, bypass CIDRs, default
timezone, and deny code. Hot-reloadable for window changes.
  • telemetry: Metrics for allowed/blocked counts and window match distribution.
Structured logging for blocked requests with reason and timezone context.
  • Rate limiting: Complementary protection. Rate limiting handles
request volume; timeaccess handles temporal access policy.
  • Directory: Indirect relationship. User group membership determines
which proxy mappings a user can access; timeaccess adds temporal constraints
on top of identity-based access control.

Web Application Firewall

Coraza WAF v3 with embedded OWASP Core Rule Set for HTTP request/response inspection

Overview

The WAF module provides Web Application Firewall protection using Coraza WAF v3 with the embedded OWASP Core Rule Set (CRS). It inspects HTTP requests and responses against security rules to detect and block application-layer attacks.

Core capabilities:

  • SQL injection detection and blocking (95% coverage at paranoia level 1)
  • Cross-site scripting (XSS) detection (90% coverage)
  • Path traversal, command injection, SSRF, LFI/RFI, XXE detection
  • Scanner and bot detection (nikto, sqlmap, nmap, etc.)
  • Two blocking modes: anomaly scoring (recommended) and self-contained
  • Four OWASP paranoia levels (1=basic to 4=maximum security)
  • Detection-only mode for safe deployment and tuning
  • Per-route WAF bypass via context keys
  • Custom rules via TOML configuration or .conf files
  • Request body inspection with configurable size limits
  • Optional response body inspection (disabled by default for performance)
  • User-friendly block pages with correlation ID for incident tracking
  • All logging and metrics via telemetry module (no separate WAF log files)

Architecture:

  • WAF Engine: Single shared instance initialized once with embedded CRS
  • Middleware: HTTP middleware for request/response inspection pipeline
  • Per-Route Control: Per-route WAF bypass via configuration
  • CRS Rules: Embedded in binary via git submodule (no external dependencies)
  • Rules Location: bundled CRS rules directory

WAF inspection pipeline (HTTP middleware):

1. Check whether the WAF is disabled for the route via per-mapping configuration; bypass if so
2. Create a Coraza transaction tagged with the request's correlation ID
3. Phase 1: inspect URI, method, protocol, headers, and query parameters
4. Check for a rule interruption; block if triggered
5. Phase 2: inspect the request body (if enabled and a body is present)
6. Check for a rule interruption; block if triggered
7. Request passed: continue to the backend handler
8. Record metrics and log transaction details via telemetry

Important limitation: per-route paranoia levels are NOT supported in Coraza v3. The paranoia level is set globally during WAF initialization and applies to all routes uniformly. Use per-route WAF bypass if certain routes need no protection.

Config

Configuration under [waf] section:

[waf]

enabled = true # Enable WAF protection
paranoia = 1 # OWASP paranoia level (1-4)
detection_only = false # true = log only, false = block requests
self_contained = false # false = anomaly scoring (recommended), true = immediate block
max_body_size = "1MB" # Maximum request body to inspect
inspect_body = true # Inspect POST/PUT request bodies
inspect_response = false # Inspect response bodies (performance impact)
# Rule exclusions (for tuning false positives)
disabled_rules = [942100] # Disable specific OWASP CRS rule IDs
disabled_tags = ["attack-sqli"] # Disable all rules with specific tags

Custom rules (operator-defined, use IDs 10000+ to avoid CRS conflicts)

[[waf.custom_rule]]

id = 10001 # Rule ID (10000+ recommended)
name = "Block Security Scanners" # Human-readable rule name
severity = "CRITICAL" # CRITICAL, WARNING, NOTICE, etc.
phase = 1 # 1=headers, 2=body, 3=resp headers, 4=resp body
variable = "REQUEST_HEADERS:User-Agent" # Variable to inspect
operator = "rx" # rx=regex, eq=equals, contains=contains
pattern = "(?i:sqlmap|nikto|nmap)" # Match pattern
transform = ["lowercase"] # Transformations before matching
action = "deny" # deny, redirect, log
status = 403 # HTTP status code for deny action
message = "Security scanner detected" # Log message on match
tags = ["hexon-custom", "scanner-detection"] # Rule tags

Paranoia levels control rule sensitivity:

Level 1 (default): Basic protection, minimal false positives
Level 2: Increased security, moderate false positives
Level 3: High security, higher false positives (needs tuning)
Level 4: Maximum security, highest false positives (extensive tuning required)

Blocking modes:

Anomaly scoring (self_contained = false, recommended):
Multiple rules contribute to an anomaly score; a request is blocked only when
the total score reaches the threshold (default: 5). Fewer false positives;
this is the industry-standard approach.
Self-contained (self_contained = true):
Each matched rule blocks immediately. More false positives, but simpler to
debug. Suited to high-security environments.

Hot-reloadable: disabled_rules, disabled_tags, detection_only, custom rules. Cold (restart required): enabled, paranoia, self_contained, max_body_size.

Troubleshooting

Common symptoms and diagnostic steps:

WAF not loading or initializing:

- Check CRS rules exist in the binary (embedded via git submodule)
- Look for "waf.init" in application logs for initialization errors
- Verify [waf] enabled = true in configuration
- Check for Coraza initialization errors in startup logs

Rules not matching expected attack payloads:

- Enable trace-level logging: [telemetry] level = "trace"
- Check waf.pass and waf.block events in logs for inspection details
- Verify paranoia level is sufficient for the attack type
- Test with known payloads: curl "http://host/api?id=1' OR '1'='1"
- Check if rule ID is in disabled_rules list

False positives blocking legitimate traffic:

- Identify triggering rule ID from waf.block log event (rule_id field)
- Temporarily add rule to disabled_rules list for immediate relief
- Switch to detection_only = true for non-blocking investigation
- Consider lowering paranoia level if too many false positives
- Use per-route WAF bypass for endpoints that trigger false positives
- For anomaly scoring: check if multiple low-score rules accumulate

WAF bypass not working for specific routes:

- Verify WAF bypass is configured on the proxy mapping
- Check configuration propagation: per-route WAF bypass must be set in mapping config
- Look for waf.bypass events in debug logs (event with path field)
- Ensure WAF middleware wraps the correct handler chain

Performance degradation with WAF enabled:

- Expected overhead: headers-only adds 100-200us; a 1KB body adds 500us-1ms; a 100KB body adds 5-10ms
- Reduce paranoia level (fewer rules evaluated)
- Disable body inspection for large upload endpoints (inspect_body = false)
- Lower max_body_size to skip inspection of large payloads
- Disable response inspection if enabled (inspect_response = false)
- Bypass WAF for high-throughput internal endpoints (metrics, health)
- Check waf.duration_ms histogram for actual inspection times

Blocked requests missing correlation ID:

- Verify correlation ID middleware runs before WAF middleware
- Check correlation_id field in waf.block log events
- Block pages should display the correlation ID for the user to report

Custom rules not taking effect:

- Verify rule ID does not conflict with CRS rules (use 10000+)
- Check rule syntax: variable, operator, pattern must be valid
- Verify phase is correct for the data being inspected
- Look for rule loading errors in initialization logs

Recommended deployment process:

Week 1: Enable with detection_only = true, paranoia = 1 (monitor logs)
Week 2: Tune false positives with disabled_rules, test attack payloads
Week 3: Switch to detection_only = false (blocking mode)
Week 4+: Gradually increase paranoia level, repeat tuning cycle
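Assuming the [waf] keys documented above, a week-1 starting configuration for this rollout might look like:

```toml
[waf]
enabled = true          # WAF on from day one
detection_only = true   # weeks 1-2: log matches, never block
paranoia = 1            # start at the lowest paranoia level
inspect_body = true
max_body_size = "1MB"
# Week 2: add noisy rule IDs observed in waf.block log events
disabled_rules = []
# Week 3: switch detection_only to false once false positives are tuned out
```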

Security

Security coverage and protection details:

OWASP CRS coverage at paranoia level 1:

SQL Injection: 95% detection rate
Cross-Site Scripting (XSS): 90% detection rate
Path Traversal: 95% detection rate
Command Injection: 85% detection rate
Server-Side Request Forgery (SSRF): 80% detection rate
Local/Remote File Inclusion (LFI/RFI): 90% detection rate
XML External Entity (XXE): 85% detection rate
Protocol Attacks: 90% detection rate
Scanner Detection: 95% detection rate
Bot Detection: 80% detection rate

Higher paranoia levels increase coverage but require tuning to manage false positives. Custom rules provide additional Hexon-specific coverage.

Anomaly scoring provides defense-in-depth: a single indicator may not block, but multiple suspicious indicators in the same request will trigger blocking. This significantly reduces false positives compared to self-contained mode while maintaining strong detection of actual attacks.

Request body inspection limits:

Bodies exceeding max_body_size are blocked, and the waf.body_too_large metric
is recorded. This prevents memory exhaustion from oversized payloads while
ensuring attack payloads in request bodies are inspected up to the configured limit.

Correlation ID tracking:

Every blocked request includes a correlation ID in the block page.
Users can report this ID for incident investigation.
Correlation IDs link WAF events to upstream request tracing.

Limitations to be aware of:

- HTTP-only protection (does not inspect TCP/UDP/VPN traffic)
- CRS rules embedded at compile time (updates require recompilation)
- Detection-only mode has the same performance overhead as blocking mode
- No separate WAF audit log (all logging via telemetry to stdout)
- Per-route paranoia levels not supported (Coraza v3 limitation)

Relationships

Module dependencies and interactions:

  • TLS listener: Provides correlation IDs for request tracking.
Correlation ID middleware must run before WAF middleware.
Correlation IDs appear in all WAF log events and block pages.
  • Configuration system: WAF configuration from [waf] section.
Config changes for disabled_rules and detection_only are hot-reloadable.
Paranoia level and enabled state require restart.
  • Metrics subsystem: Exports counters (waf.requests, waf.blocked, waf.passed,
waf.bypassed, waf.body_too_large) and histograms (waf.duration_ms).
Labels include method, path, blocked, rule_id, action.
  • telemetry: Structured logging for all WAF events at appropriate levels.
WARN for blocks, TRACE for passes, DEBUG for bypasses.
No separate WAF log file; all events flow through telemetry.
  • Error page service: Provides user-friendly error/block pages with correlation ID.
Block pages shown to users when requests are denied by WAF rules.
  • proxy: WAF middleware wraps the reverse proxy handler chain.
Per-route WAF bypass configured via proxy mapping context.
WAF inspects proxied requests before they reach backend servers.
  • Rate limiting: Complementary protection layer.
Rate limiting operates at connection level, WAF at application level.
Both modules contribute to overall request protection pipeline.
  • Size limiting: Body size limits complement WAF max_body_size.
Size limiting may reject oversized requests before WAF inspection.