Agent is Offline

Question

The Xshield agent is lightweight, low-compute software that runs on customer servers and endpoints. After the agent registers, it periodically sends a HeartBeat message to the Xshield platform (every two minutes by default). The platform marks an agent as offline and unreachable if four consecutive HeartBeat messages are missed. Keeping the agent online is critical: each HeartBeat request carries visibility information, and each response carries the information the agent needs to apply security policies.
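
The offline rule described above can be sketched in a few lines. This is only an illustration of the arithmetic (four missed HeartBeats at a two-minute interval is roughly eight minutes of silence), not the platform's actual implementation. The function names and the way missed intervals are counted are assumptions.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(minutes=2)  # default interval stated in this article
MISSED_LIMIT = 4                           # consecutive misses before "offline"


def missed_heartbeats(last_seen: datetime, now: datetime) -> int:
    """Count whole HeartBeat intervals elapsed since the last message (assumed counting rule)."""
    elapsed = now - last_seen
    # timedelta // timedelta yields an int; clamp to 0 if clocks disagree.
    return max(0, elapsed // HEARTBEAT_INTERVAL)


def is_offline(last_seen: datetime, now: datetime) -> bool:
    """True once MISSED_LIMIT consecutive HeartBeats have been missed."""
    return missed_heartbeats(last_seen, now) >= MISSED_LIMIT
```

Under these assumptions, an agent last seen seven minutes ago has missed three HeartBeats and still counts as online, while one last seen eight minutes ago counts as offline.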

An operator needs to understand why an agent is offline and resolve the cause as early as possible, because agent connectivity is critical in microsegmentation and Zero Trust environments. The primary reasons the Xshield agent can go offline or lose connectivity with the Xshield security platform are listed below.

Answer

🔌 1. Network Segmentation or Policy Misconfiguration

Overly restrictive microsegmentation policies can block agent traffic to the SaaS control plane (e.g., HTTPS to API endpoints).
ACLs, firewalls, or NSGs might block required ports (e.g., 443, 8443) or destination IPs/FQDNs.
East-west traffic rules may inadvertently isolate agents from required gateways or proxies.
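
A quick way to test whether a firewall, ACL, or segmentation policy is blocking the agent's path is to attempt a plain TCP connection from the affected host. The sketch below uses only the Python standard library. The hostname in the usage note is a placeholder, since this article does not give the platform's actual FQDN, and the ports are the examples mentioned above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks, and DNS failures.
        return False


def check_ports(host: str, ports: list[int]) -> dict[int, bool]:
    """Probe each port in turn and report which ones are reachable."""
    return {port: can_connect(host, port) for port in ports}
```

For example, `check_ports("xshield.example.com", [443, 8443])` (hypothetical hostname) returns a mapping of port to reachability. A successful TCP connection only shows that the path is open at the network layer; it does not prove that TLS or the agent's own authentication will succeed.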

🌐 2. DNS Resolution Issues

SaaS platforms often use FQDNs (e.g., api.vendor.com). If DNS isn’t working or is restricted, agents can’t resolve their endpoint.
Split-horizon DNS or misconfigured resolvers can return incorrect IPs or fail silently.
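
To confirm what the affected host actually resolves, a minimal sketch such as the following can help. It uses the system resolver through the Python standard library. The function names are illustrative, and the set of expected addresses is something the operator would have to supply, since this article does not list the platform's IPs.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated addresses the system resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Resolution failed outright (no such name, resolver unreachable, etc.).
        return []
    return sorted({info[4][0] for info in infos})


def unexpected_addresses(hostname: str, expected: set[str]) -> list[str]:
    """List resolved addresses that are not in the operator-supplied expected set."""
    return [addr for addr in resolve(hostname) if addr not in expected]
```

An empty result from `resolve` points to a resolver failure, while a non-empty result from `unexpected_addresses` may indicate split-horizon DNS or a misconfigured resolver returning the wrong IPs.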

🔒 3. TLS/SSL or Certificate Problems

Certificate pinning, expired certs, or mismatched CA chains can prevent TLS handshakes.
Middleboxes (SSL interceptors or proxies) can interfere with certificate trust validation.
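
The certificate presented to the host can be inspected with Python's standard `ssl` module. The sketch below is a generic TLS check, not an Xshield tool, and the hostname passed to it would be the platform endpoint configured for your tenant (not given in this article). If the chain is untrusted or expired, or if an intercepting middlebox presents its own certificate, the handshake in `fetch_not_after` raises `ssl.SSLCertVerificationError`, which is itself a useful signal.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Complete a verified TLS handshake and return the certificate's notAfter field."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a notAfter string (e.g. 'Jan  5 09:34:43 2018 GMT') to days remaining."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

A negative value from `days_until_expiry` means the certificate has already expired.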

🧱 4. Proxy or Egress Configuration

Misconfigured HTTP/S proxies (e.g., authentication required, incorrect allow list) can prevent outbound traffic.
Egress controls may not allow the agent to reach the public internet/SaaS endpoint.
Proxy exceptions may not be set for agent traffic.

🚫 5. Endpoint Security Interference

Host-based firewalls (e.g., Windows Defender Firewall, iptables) may block outbound connections.
EDR/AV solutions might flag the agent as suspicious and quarantine it or block its traffic.

🖥️ 6. Resource Constraints or System Issues

High CPU/memory pressure may cause the agent to crash or time out.
A full disk, corrupted binaries, or misconfigured startup scripts can prevent the agent service from running.
OS patching or reboots might disable or delay service startup.
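
As one example of a quick system check, free disk space can be measured with the Python standard library. This is a generic sketch; the path to check and any threshold are assumptions, since this article does not state where the agent stores its data or how much space it needs.

```python
import shutil


def disk_free_percent(path: str = "/") -> float:
    """Return the percentage of free space on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report zero size; treat as no free space.
        return 0.0
    return usage.free / usage.total * 100
```

A value close to zero on the volume that holds the agent's files would be worth investigating before looking at network causes.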

🔁 7. Version Mismatch or Agent Corruption

Incompatible agent versions may stop communicating properly if the control plane expects a newer protocol/API.
Broken or incomplete agent installs can cause connection failures or erratic behavior.

☁️ 8. SaaS Platform Outage or Endpoint Change

SaaS backend downtime, region-specific outages, or DNS changes might impact connectivity.
Changed endpoints or decommissioned URLs may not be reflected in the agent configuration.

🧩 9. Tenant Misconfiguration

Tenant ID, auth tokens, or API keys used by the agent might be incorrect, expired, or revoked.
Misconfigured onboarding or bootstrap process can leave the agent in a limbo state.

🧑‍💻 10. Human Error or Operational Gaps

Manual misconfigurations like:
- Wrong bootstrap URLs.
- Overridden environment variables.
- Accidental firewall or policy change during maintenance.
- Forgotten agent restart after OS or network changes.
