Agent is Offline
Question
The Xshield agent is lightweight software that runs on customer servers and endpoints. After the agent registers, it periodically sends a HeartBeat message to the Xshield platform (every two minutes by default). The platform marks an agent as offline and unreachable if four consecutive HeartBeat messages are missed. Keeping the agent online is critical: each HeartBeat request carries visibility information, and the response carries the information needed to apply security policies.
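The offline rule can be approximated with simple arithmetic: with a two-minute interval and a limit of four missed messages, an agent would be treated as offline after roughly eight minutes of silence. The sketch below is illustrative only. The names `HEARTBEAT_INTERVAL`, `MISSED_LIMIT`, `missed_heartbeats`, and `is_offline` are hypothetical, and the platform's actual bookkeeping may differ.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the description above; names are hypothetical.
HEARTBEAT_INTERVAL = timedelta(minutes=2)  # default HeartBeat interval
MISSED_LIMIT = 4                           # consecutive misses before "offline"


def missed_heartbeats(last_seen, now, interval=HEARTBEAT_INTERVAL):
    """Count whole HeartBeat intervals that have elapsed since last_seen."""
    elapsed = now - last_seen
    if elapsed < timedelta(0):
        return 0  # clock skew: last_seen is in the future
    return elapsed // interval  # timedelta // timedelta yields an int


def is_offline(last_seen, now, interval=HEARTBEAT_INTERVAL, limit=MISSED_LIMIT):
    """Approximate the offline rule: 'limit' or more intervals without a HeartBeat."""
    return missed_heartbeats(last_seen, now, interval) >= limit
```

Under this sketch, an agent last heard from seven minutes ago has missed three intervals and still counts as online, while one silent for eight minutes or more counts as offline.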
An operator needs to understand why an agent is offline and address it as early as possible, because agent connectivity is critical in microsegmentation and Zero Trust environments. The primary reasons the Xshield agent can go offline or lose connectivity with the Xshield security platform are listed below.
Answer
1. Network Segmentation or Policy Misconfiguration
- Overly restrictive microsegmentation policies can block agent traffic to the SaaS control plane (e.g., HTTPS to API endpoints).
- ACLs, firewalls, or NSGs might block required ports (e.g., 443, 8443) or destination IPs/FQDNs.
- East-west traffic rules may inadvertently isolate agents from required gateways or proxies.
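A quick way to test for blocked ports is to attempt a plain TCP connection from the affected host. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the ports are the examples mentioned above, and the hostname you pass in must be the platform address configured for your tenant (not shown here).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks,
        # and name-resolution failures.
        return False


def check_ports(host, ports=(443, 8443)):
    """Map each port to whether a TCP connection could be opened."""
    return {port: can_connect(host, port) for port in ports}
```

A `False` result only shows that the TCP connection failed. It does not say whether a firewall, an NSG, or a segmentation policy is responsible, so it is best read alongside the relevant policy and firewall logs.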
2. DNS Resolution Issues
- SaaS platforms often use FQDNs (e.g., api.vendor.com). If DNS isn't working or is restricted, agents can't resolve their endpoint.
- Split-horizon DNS or misconfigured resolvers can return incorrect IPs or fail silently.
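To see what the local resolver actually returns, the addresses can be listed and compared against the ones you expect. This is a sketch under assumptions: `resolve` and `unexpected_addresses` are hypothetical helper names, and the expected address list has to come from your own environment.

```python
import socket


def resolve(hostname, port=443):
    """Return the sorted, de-duplicated IP addresses the local resolver gives."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []  # resolution failed (e.g., NXDOMAIN or resolver unreachable)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def unexpected_addresses(resolved, expected):
    """Addresses that were resolved but are not in the expected set."""
    return sorted(set(resolved) - set(expected))
```

An empty list from `resolve` points to a resolution failure, while a non-empty result from `unexpected_addresses` may indicate split-horizon DNS or a misconfigured resolver returning the wrong IPs.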
3. TLS/SSL or Certificate Problems
- Certificate pinning, expired certs, or mismatched CA chains can prevent TLS handshakes.
- Middleboxes (SSL interceptors or proxies) can interfere with certificate trust validation.
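A TLS handshake from the affected host can reveal expired certificates or an intercepting proxy. The sketch below uses Python's `ssl` module with the system trust store, which is an assumption: the agent may validate certificates differently, so treat the result as a hint. The function names are hypothetical.

```python
import socket
import ssl
import time


def fetch_peer_cert(host, port=443, timeout=5.0):
    """Perform a verified TLS handshake and return the peer certificate dict.

    Raises ssl.SSLCertVerificationError if the chain or hostname does not
    validate, which is what an untrusted interception proxy typically causes.
    """
    context = ssl.create_default_context()  # system CAs, hostname checking on
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter timestamp (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)  # e.g. "Jan  1 00:00:00 2030 GMT"
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0
```

The `notAfter` field of the dictionary returned by `fetch_peer_cert` can be passed to `days_until_expiry`. Comparing the certificate's `issuer` field with the expected CA can also show whether a middlebox is re-signing traffic.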
4. Proxy or Egress Configuration
- Misconfigured HTTP/S proxies (e.g., authentication required, incorrect allow list) can prevent outbound traffic.
- Egress controls may not allow the agent to reach the public internet/SaaS endpoint.
- Proxy exceptions may not be set for agent traffic.
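Proxy environment variables are a common source of surprises. The sketch below reports which proxy, if any, Python's standard library would use for a given host based on `https_proxy` and `no_proxy` style settings. It is an assumption that the agent honors the same variables, and `effective_proxy` is a hypothetical helper name.

```python
import urllib.request


def effective_proxy(host, scheme="https"):
    """Return the proxy URL that applies to host, or None for a direct connection."""
    if urllib.request.proxy_bypass(host):
        return None  # host matches a no_proxy style exception
    # getproxies() reads *_proxy environment variables (and, on Windows and
    # macOS, system proxy settings when no such variables are set).
    return urllib.request.getproxies().get(scheme)
```

If this returns a proxy URL for the platform hostname when a direct connection was intended, or `None` when egress is only permitted through a proxy, the proxy settings on the host are worth reviewing.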
5. Endpoint Security Interference
- Host-based firewalls (e.g., Windows Defender Firewall, iptables) may block outbound connections.
- EDR/AV solutions might flag the agent as suspicious and quarantine it or block its traffic.
6. Resource Constraints or System Issues
- High CPU or memory pressure may cause the agent to crash or time out.
- A full disk, corrupted binaries, or misconfigured startup scripts can prevent the agent service from running.
- OS patching or reboots might disable or delay service startup.
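A full disk is one of the easier conditions to rule out. The following sketch reports free space for a path using the standard library. The 5% threshold and the function names are arbitrary assumptions for illustration, not values documented for the agent.

```python
import shutil


def disk_free_percent(path="/"):
    """Return the percentage of free space on the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # avoid division by zero on unusual filesystems
    return 100.0 * usage.free / usage.total


def low_disk(path="/", threshold_percent=5.0):
    """True if free space is below the (assumed) threshold."""
    return disk_free_percent(path) < threshold_percent
```

Point `path` at the filesystem that holds the agent's installation and log directories, since those are the locations where a full disk would most likely affect the service.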
7. Version Mismatch or Agent Corruption
- Incompatible agent versions may stop communicating properly if the control plane expects a newer protocol/API.
- Broken or incomplete agent installs can cause connection failures or erratic behavior.
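When checking whether an installed agent is older than a required version, compare the version numbers numerically, not as strings. The sketch below assumes a dotted numeric version format such as `5.9.10`. That format, and the helper names, are assumptions for illustration and are not confirmed for the Xshield agent.

```python
def parse_version(text):
    """Turn a dotted numeric version like '5.9.10' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(agent_version, minimum_version):
    """True if agent_version is numerically lower than minimum_version."""
    agent = parse_version(agent_version)
    minimum = parse_version(minimum_version)
    # Pad the shorter tuple with zeros so '5.10' and '5.10.0' compare equal.
    width = max(len(agent), len(minimum))
    agent += (0,) * (width - len(agent))
    minimum += (0,) * (width - len(minimum))
    return agent < minimum
```

A plain string comparison would rank `5.9.10` above `5.10.0`, because the character `9` sorts after `1`. The tuple comparison avoids that mistake.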
8. SaaS Platform Outage or Endpoint Change
- SaaS backend downtime, region-specific outages, or DNS changes might impact connectivity.
- Changed endpoints or decommissioned URLs might not be reflected in the agent configuration.
9. Tenant Misconfiguration
- Tenant ID, auth tokens, or API keys used by the agent might be incorrect, expired, or revoked.
- Misconfigured onboarding or bootstrap process can leave the agent in a limbo state.
10. Human Error or Operational Gaps
- Manual misconfigurations, such as wrong bootstrap URLs or overridden environment variables.
- Accidental firewall or policy change during maintenance.
- Forgotten agent restart after OS or network changes.