Troubleshooting Guide

This guide provides step-by-step instructions for troubleshooting common issues with the Xshield Gatekeeper.

1. Internet Connectivity Issues

Symptoms

  • Gatekeeper cannot reach external services
  • Unable to connect to ng.colortokens.com or other required endpoints

Verification Steps

  1. Check Firewall Rules

    • Review deployment-checklist.md for required outbound rules
    • Verify firewall allows traffic to:
      • ng.colortokens.com
      • logs.ng.colortokens.com
      • artifacts.ng.colortokens.com
      • telemetry.ng.colortokens.com
      • apt-ng.colortokens.com
      • registration.colortokens.com
  2. Verify DNS Configuration

    • Check DNS resolution using nslookup or dig
    • Verify DNS/hosts entries for:
      • ng.colortokens.com
      • logs.ng.colortokens.com
    • For on-prem installations:
      • Check private DNS entries
      • Verify /etc/hosts file entries
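
The DNS checks above can be scripted. The following is a minimal sketch, assuming `getent` is available on the Gatekeeper. The function names `resolve_host` and `check_endpoints` are illustrative and not part of any Xshield tooling. It tests name resolution only (including /etc/hosts entries), not firewall reachability.

```shell
#!/bin/sh
# Endpoints taken from the firewall list in this section.
ENDPOINTS="ng.colortokens.com logs.ng.colortokens.com artifacts.ng.colortokens.com telemetry.ng.colortokens.com apt-ng.colortokens.com registration.colortokens.com"

# Print the first address the system resolver returns for a name
# (consults /etc/hosts as well as DNS); prints nothing on failure.
resolve_host() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Print one OK/FAIL line per endpoint.
check_endpoints() {
    for host in $ENDPOINTS; do
        addr=$(resolve_host "$host") || addr=""
        if [ -n "$addr" ]; then
            echo "OK   $host -> $addr"
        else
            echo "FAIL $host (no answer from resolver)"
        fi
    done
}
```

Calling `check_endpoints` on the Gatekeeper prints one line per endpoint. A FAIL line points at DNS or hosts-file configuration rather than at the firewall.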

2. DHCP Issues

Symptoms

  • Devices not receiving IP addresses
  • DHCP service not responding

Verification Steps

  1. Check DHCP Service Status

    systemctl status ctcoredhcp
  2. Verify Configuration

    • Check DHCP relay vs server mode
    • Verify DHCP range configuration
    • Ensure no other DHCP servers on the network
  3. Additional Checks for DHCP Relay

    systemctl status ctcoredhcp
    • Verify upstream DHCP server is running
    • Check network connectivity to upstream DHCP server
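
One way to check for other DHCP servers on the network is to capture DHCP traffic and list who is answering. The sketch below assumes the usual `tcpdump -n` text format for BOOTP/DHCP packets, which can vary between tcpdump versions. `dhcp_reply_sources` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `tcpdump -n` text on stdin and print each distinct source address
# that sent a BOOTP/DHCP reply from UDP port 67.
dhcp_reply_sources() {
    awk '/BOOTP\/DHCP, Reply/ && $3 ~ /\.67$/ {
        src = $3
        sub(/\.67$/, "", src)   # strip the port suffix
        print src
    }' | sort -u
}
```

For example: `tcpdump -l -n -c 20 -i <lan_interface> port 67 or port 68 | dhcp_reply_sources`. In server mode, an address other than the Gatekeeper's own suggests a second DHCP server on the segment. In relay mode, relayed replies may also appear with source port 67, so interpret the list with the deployment in mind.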

3. Gatekeeper Status Issues

Symptoms

  • Gatekeeper reported as down
  • Unresponsive in Xshield console

Verification Steps

  1. Physical Checks

    • Verify power supply
    • Check network connectivity
    • Ensure machine is powered on
  2. System Status

    systemctl status ctappliance
    • Check Xshield console status
    • Verify internet connectivity
    • Check cloud site availability
    • Verify on-premise site connectivity
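
To see the state of the Gatekeeper services at a glance, the status check can be looped over the service names used in this guide. This is a sketch only: `gk_service_summary` is a hypothetical helper, and on a standalone Gatekeeper the HA service is expected not to be active.

```shell
#!/bin/sh
# Service names as used elsewhere in this guide.
GK_SERVICES="ctappliance ctcoredhcp ctgatekeeper-ha"

# Print "<unit> <state>" for each service.
# Returns non-zero if any service is not active.
gk_service_summary() {
    rc=0
    for unit in $GK_SERVICES; do
        # `systemctl is-active` prints the state and exits non-zero
        # when the unit is not active.
        state=$(systemctl is-active "$unit" 2>/dev/null) || rc=1
        echo "$unit ${state:-unknown}"
    done
    return $rc
}
```

Run `gk_service_summary` from the Gatekeeper shell. For any unit that is not reported as active, `systemctl status <unit>` gives the detail.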

4. Network Traffic Issues

Symptoms

  • Devices cannot reach internet/intranet
  • Traffic not flowing through gatekeeper

Verification Steps

  1. Basic Connectivity

    • Ping gatekeeper from affected devices
    • Use traceroute to identify traffic drop points
  2. Network Verification

    tcpdump -i <lan_interface>
    • Verify traffic on gatekeeper's LAN interface
    • Check for ARP entries
    • Verify correct gateway configuration
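
When using traceroute to look for the drop point, the first hop that stops answering is usually the place to start. The helper below is an illustrative sketch (`first_silent_hop` is not an existing tool) and assumes the default three-probe output of `traceroute -n`.

```shell
#!/bin/sh
# Read `traceroute -n` output on stdin and print the number of the first hop
# where all three probes timed out ("* * *").
# Prints nothing if every hop replied.
first_silent_hop() {
    awk '$2 == "*" && $3 == "*" && $4 == "*" { print $1; exit }'
}
```

For example: `traceroute -n <destination_ip> | first_silent_hop`. A silent hop is a hint, not proof, because some routers do not answer traceroute probes even while forwarding traffic.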

5. Traffic Flow Recording Issues

Symptoms

  • No traffic flows recorded
  • Conntrack not registering traffic

Verification Steps

  1. Check Conntrack

    conntrack -L
    • Verify traffic registration
    • Check for appropriate rules in:
      nft list ruleset
  2. Network Verification

    tcpdump -i <lan_interface>
    • Verify traffic visibility on LAN interface
    • Check device gateway configuration
    • Verify netmask is /32 or appropriate for environment
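
To confirm that a specific device is registering in conntrack, the table can be filtered by the device's address. The sketch below assumes the standard `conntrack -L` text layout, where the first `src=` field on each line is the original-direction source. `flows_from` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `conntrack -L` output on stdin and count entries whose
# original-direction source (the first src= field) equals the given IP.
flows_from() {
    awk -v ip="$1" '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) {
                if ($i == ("src=" ip)) n++
                break               # only the first src= field counts
            }
        }
    } END { print n + 0 }'
}
```

For example: `conntrack -L 2>/dev/null | flows_from <device_ip>`. A count of 0 while the device is known to be sending traffic suggests the traffic is not passing through the Gatekeeper.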

6. Device Discovery Issues

Symptoms

  • Devices not appearing in gatekeeper
  • Devices remain unmanaged

Verification Steps

  1. Check ARP Table

    arp -a
  2. Verify Asset Cache

    • Check /etc/colortokens/asset-cache/asset-cache*.json
    • Verify device status (managed/unmanaged)
  3. Additional Steps

    • Verify device is sending/receiving traffic
    • Check for traffic registration in conntrack
    • Ping device from outside subnet to create traffic

7. Remote Access Issues

Symptoms

  • Cannot SSH into gatekeeper
  • No console access

Best Practices

  1. Management LAN Connection

    • Always connect gatekeeper to management LAN
    • Configure BMC/ILoM/iDRAC
    • Ensure management LAN has IP connectivity
  2. Remote Access Options

    • BMC/ILoM/iDRAC console
    • Management LAN SSH access
    • Note: GK-2000 boxes may not have a console port

8. Additional Troubleshooting Tips

HA Cluster Issues

  • Verify VRRP ID consistency across cluster
  • Check peer IP configuration
  • Verify virtual IP assignment

Network Configuration

  • Verify VLAN trunking on switch ports
  • Check network segmentation
  • Verify correct subnet configuration

Logs and Diagnostics

  • Collect diagnostics from Xshield console
  • Check system logs
  • Verify application logs

Best Practices

  1. Documentation

    • Maintain network documentation
    • Document VLAN assignments
    • Keep firewall rules documented
  2. Monitoring

    • Set up monitoring alerts
    • Regularly check system status
    • Verify network connectivity

9. Troubleshooting Checklist

  1. Basic Connectivity

    • Verify power
    • Check network cables
    • Verify switch connectivity
  2. Network Configuration

    • Check VLAN settings
    • Verify IP configurations
    • Check firewall rules
  3. Services

    • Verify DHCP service
    • Check DNS resolution
    • Verify conntrack
  4. System Status

    • Check system logs
    • Verify service status
    • Check resource usage
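
For the resource usage item, disk space is a common first check. The sketch below flags filesystems at or above a usage threshold. The 90% default is an assumption chosen for illustration, not a documented Gatekeeper limit, and `full_filesystems` is a hypothetical helper.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print "<mount point> <use%>" for every
# filesystem at or above the given usage threshold (percent, default 90).
# Mount points containing spaces are not handled.
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

For example: `df -P | full_filesystems 90`.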

10. Emergency Recovery Procedures

  1. Physical Access

    • Power cycle if necessary
    • Verify hardware connections
    • Check system LEDs
  2. Remote Recovery

    • Use BMC/ILoM/iDRAC
    • Follow vendor-specific recovery procedures
    • Document recovery steps

11. Documentation References

  • Refer to deployment-checklist.md for network requirements
  • Check deployment-guide.md for configuration details
  • Review gatekeeper-operation-maintenance.md for operational procedures

12. Support Information

When contacting support:

  1. Provide system logs
  2. Include diagnostic bundle
  3. Document troubleshooting steps taken
  4. Provide network topology information
  5. Include error messages and symptoms

13. HA Cluster Issues

Symptoms

  • Both gatekeepers showing as Standby
  • No active node in cluster

Verification Steps

  1. Check Network Connectivity

    ping <peer_gk_ip>
    • Verify GK devices can ping each other
    • Check multicast packet flow
    • Verify no firewall blocking VRRP traffic
  2. Verify VRRP Configuration

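    • Check that both nodes in the cluster use the same VRRP ID
    • Verify both nodes have the same WAN VIP and LAN VIP
    • Confirm both nodes use an identical WAN gateway IP
    • Ensure the VRRP ID is not used by any other cluster in the tenant
    • Verify the VRRP password matches on both nodes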
  3. Check VRRP Traffic

    tcpdump -i <interface> vrrp
    • Verify VRRP packets are visible
    • Check packet frequency and integrity
  4. Review Configuration

    cat /root/ctgatekeeper.json
    • Verify IP configurations
    • Check VRRP settings
    • Confirm network interface configurations
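
The VRRP capture can be summarized to show which nodes are advertising and with which VRRP ID. This is a sketch assuming the usual `tcpdump -n` rendering of VRRP advertisements, which may differ between tcpdump versions. `vrrp_advertisers` is an illustrative name.

```shell
#!/bin/sh
# Read `tcpdump -n ... vrrp` output on stdin and print each distinct
# "<source address> <vrid>" pair seen in VRRP advertisements.
vrrp_advertisers() {
    awk '/VRRP/ && /Advertisement/ {
        for (i = 1; i < NF; i++) {
            if ($i == "vrid") {
                id = $(i + 1)
                sub(/,$/, "", id)   # drop the trailing comma
                print $3, id
                break
            }
        }
    }' | sort -u
}
```

For example: `tcpdump -l -n -c 10 -i <interface> vrrp | vrrp_advertisers`. No output means no advertisements were captured on that interface. Two sources advertising the same VRRP ID would suggest that each node cannot see the other's advertisements. An unexpected VRRP ID points at a configuration mismatch.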

Best Practices

  • Document VRRP ID allocation
  • Regularly verify network connectivity
  • Monitor VRRP traffic
  • Keep VRRP passwords secure

14. Agent Alerts

Agent Unreachable Alert

Symptoms

  • Agent unreachable alert in console
  • Both HA nodes down
  • Standalone GK down

Verification Steps

  1. Check Physical Status

    • Verify power supply
    • Check network cables
    • Verify switch connectivity
  2. Check Network Status

    ping <gk_ip>
    • Verify network connectivity
    • Check for network outages
    • Verify firewall rules

Agent Failover Alert

Symptoms

  • Agent failover alert in console
  • Interface down
  • Connectivity loss

Verification Steps

  1. Check System Logs

    cat /var/log/syslog
    • Look for interface down events
    • Check for connectivity issues
    • Verify error messages
  2. Collect Diagnostics

    • Collect diagnostics from the Xshield console (see section 17, Diagnostics Collection)
    • Collect syslog
    • Gather interface status
    • Check network configuration
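
Looking for interface down events in syslog can be done with a simple filter. The patterns below are common Linux driver and network-daemon wordings. They are an assumption, not a Gatekeeper-specific or exhaustive list, and `link_events` is a hypothetical helper.

```shell
#!/bin/sh
# Read syslog text on stdin and print lines that look like link-state changes.
link_events() {
    grep -i -E 'link is (down|up)|link (down|up)|carrier'
}
```

For example: `link_events < /var/log/syslog | tail -n 20` shows the most recent link-state messages, which can then be matched against the time of the failover alert.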

Common Alert Scenarios

  1. Network Connectivity Issues

    • Interface failures
    • Network outages
    • Firewall blocks
  2. Hardware Issues

    • Power failures
    • Hardware failures
    • Network card issues
  3. Configuration Issues

    • Incorrect IP settings
    • Invalid VRRP configurations
    • Network misconfigurations

Alert Resolution Steps

  1. Network Connectivity

    • Check physical connections
    • Verify network configuration
    • Check firewall rules
  2. Hardware Issues

    • Verify power supply
    • Check hardware status
    • Replace faulty components
  3. Configuration Issues

    • Verify IP configurations
    • Check VRRP settings
    • Validate network settings

15. Additional Monitoring and Prevention

Best Practices

  1. Monitoring

    • Set up network monitoring
    • Configure alert thresholds
    • Regularly check system logs
  2. Documentation

    • Document VRRP configurations
    • Keep network diagrams updated
    • Maintain configuration backups
  3. Regular Maintenance

    • Check system health
    • Verify network connectivity
    • Test failover procedures

16. Support Information

When contacting support for HA cluster issues:

  1. Provide system logs
  2. Include VRRP configuration
  3. Document network topology
  4. Share tcpdump captures
  5. Include error messages and symptoms
  6. Provide ctgatekeeper.json contents

17. Diagnostics Collection

Using Xshield Console

  1. Select Gatekeeper in UI
  2. Choose "Collect Diagnostics"
    • This process may take several minutes
  3. Once complete:
    • View diagnostics in UI
    • Download diagnostics zip file
    • Send zip file to support

Best Practices

  • Always use UI for diagnostics collection
  • Wait for complete diagnostics collection
  • Download and verify diagnostics zip before sending

18. Gatekeeper Operations

Service Management

  1. Reboot Options

    • Use UI for planned reboots
    • Console access for emergency reboots
  2. Service Restart

    # Restart Gatekeeper
    systemctl restart ctappliance

    # Restart HA Services
    systemctl restart ctgatekeeper-ha

    # Restart DHCP
    systemctl restart ctcoredhcp

Device Management

  1. Device Scan
    • Initiate from Xshield Console
    • Re-identifies managed devices
    • Updates device information
    • Reassesses vulnerabilities
    • Collects firmware version info

Best Practices

  • Use UI for most operations
  • Reserve console access for:
    • WAN settings changes
    • Proxy configuration
    • Initial registration
    • Emergency situations

19. Gatekeeper Uninstallation

Using Xshield Console

  1. Select Gatekeeper
  2. Choose "Decommission" option
  3. Confirm decommissioning
  4. Wait for completion
  5. Unplug from network

Console-Based Uninstallation

# Only as last resort
ssh <gk_ip>
ctgatekeeper decommission

Best Practices

  • Always use UI for decommissioning
  • Use console only for:
    • WAN settings changes
    • Proxy configuration
    • Initial registration
    • Emergency situations

20. Support Information

Required Information

  1. System Logs
  2. VRRP Configuration
  3. Network Topology
  4. Tcpdump Captures
  5. Error Messages
  6. ctgatekeeper.json Contents
  7. Diagnostics Zip File

Best Practices

  • Include all requested information
  • Document troubleshooting steps taken
  • Provide network diagrams
  • Share configuration backups

21. Support Workflow

Initial Contact

  1. Contact ColorTokens Support

  2. Provide Initial Information

    • Customer details
    • Issue description
    • Impact level
    • System logs
    • Diagnostics zip file

Support Response Levels

Level 1 Support

  • Initial issue triage
  • Basic troubleshooting
  • Configuration verification
  • Common issue resolution

Level 2 Support

  • Advanced troubleshooting
  • System analysis
  • Configuration review
  • Escalation for complex issues

Product Engineering

  • Root cause analysis
  • Bug identification
  • Feature requests
  • Emergency fixes

Escalation Criteria

  1. Critical Issues

    • System downtime
    • Security breaches
    • Data loss
    • Major functionality failure
  2. High Priority Issues

    • Significant performance impact
    • Multiple users affected
    • Business-critical operations
  3. Standard Issues

    • Configuration questions
    • Minor functionality issues
    • Documentation requests

Support Process Flow

  1. Issue Reporting

    • Open support ticket
    • Provide required information
    • Attach diagnostics
  2. Initial Analysis

    • Basic troubleshooting
    • Configuration review
    • Log analysis
  3. Escalation Points

    • If issue unresolved
    • If impact critical
    • If additional expertise needed
  4. Resolution Steps

    • Provide fix/workaround
    • Document solution
    • Close ticket

Best Practices for Support

  1. Before Contacting Support

    • Follow troubleshooting guide
    • Collect diagnostics
    • Document steps taken
    • Gather system information
  2. During Support Interaction

    • Provide clear issue description
    • Share all relevant information
    • Follow support team instructions
    • Document all interactions
  3. After Resolution

    • Verify fix
    • Document solution
    • Update procedures
    • Close ticket

Support Response Times

  1. Critical Issues

    • Response: Within 1 hour
    • Resolution: Within 4 hours
  2. High Priority Issues

    • Response: Within 4 hours
    • Resolution: Within 24 hours
  3. Standard Issues

    • Response: Within 24 hours
    • Resolution: Within 5 business days

Additional Support Resources

  1. Knowledge Base

    • Documentation
    • Troubleshooting guides
    • Best practices
  2. Community Forums

    • User discussions
    • Best practices sharing
    • Common solutions
  3. Training Resources

    • Webinars
    • Training materials
    • Best practices guides

Last Updated: [Date] Version: 1.3