Skip to main content

Agent Health Monitoring

Monitor and troubleshoot CastellanAI agents across your infrastructure.

Understanding Agent Status

Each agent reports its status to the portal every 30 seconds via heartbeat. The portal displays real-time health information including connection status and event collection activity.

Connected (Green)

Agent is connected and actively sending security events. All systems operational.

Indicators:

  • Green status dot (pulsing animation)
  • Last seen within the last few minutes
  • Events being collected and transmitted

Disconnected (Gray)

Agent has lost connection with the portal. May require attention.

Common Causes:

  • Network connectivity issues
  • Agent service stopped
  • System restart or shutdown
  • Firewall blocking outbound connections

Pending (Yellow)

Agent is newly enrolled and waiting for initial connection, or is in the process of connecting.

Indicators:

  • Yellow status dot
  • Agent recently enrolled
  • Awaiting first heartbeat

Error (Red)

Agent is experiencing errors that prevent normal operation. Requires immediate attention.

Common Causes:

  • Authentication failures
  • Configuration issues
  • Service crash or critical error

Updating (Blue)

Agent is currently receiving or applying updates.

Indicators:

  • Blue status dot
  • Agent version change in progress

Monitoring Agent Health

Portal Agents Page

The primary method for monitoring agent health is through the Customer Portal:

  1. Log in to your Customer Portal
  2. Click Agents in the header navigation
  3. View the agent list with status indicators
  4. Review: Hostname, Platform, Status, Last Seen, Events Today
tip

The status summary cards at the top show counts for each status type (Connected, Disconnected, Pending, etc.)

Local Agent Logs

Check agent logs directly on the endpoint for detailed diagnostic information:

Windows:

Event Viewer → Applications and Services Logs → CastellanAgent

# Or check log files:
C:\ProgramData\CastellanAI\logs\agent.log

Linux:

sudo journalctl -u castellan-agent -n 50

# Or check log files:
sudo tail -f /var/log/castellan-agent/agent.log

macOS:

sudo log show --predicate 'process == "CastellanAgent"' --last 1h

# Or check log files:
sudo tail -f /Library/Logs/CastellanAI/agent.log

Check Service Status

Verify the agent service is running on the endpoint:

Windows:

Get-Service "Castellan Agent"
# Should show Status: Running

Linux:

sudo systemctl status castellan-agent
# Should show Active: active (running)

macOS:

sudo launchctl list | grep com.castellanai.agent
# Should show PID if running

Common Issues & Solutions

Agent shows Disconnected but service is running

Check network connectivity to the CastellanAI servers. Verify firewall allows outbound HTTPS (port 443).

Windows:

Test-NetConnection -ComputerName api.castellanai.com -Port 443

Linux/macOS:

curl -v https://api.castellanai.com

Agent stuck in Pending status

The agent may not have completed enrollment successfully.

  1. Check agent logs for enrollment errors
  2. Verify the enrollment token is valid and not expired
  3. Re-enroll if necessary:
# Check current status
castellan-agent status

# Re-enroll with fresh token
castellan-agent enroll --token YOUR_TOKEN --portal-url https://api.castellanai.com --force

Authentication errors in logs

Agent credentials may be invalid or corrupted.

# Check enrollment status
castellan-agent status

# Re-enroll with new token from Portal
castellan-agent enroll --token YOUR_NEW_TOKEN --portal-url https://api.castellanai.com --force

# Restart service to apply
# Windows:
Restart-Service "Castellan Agent"

# Linux:
sudo systemctl restart castellan-agent

Events not appearing in portal

Verify event sources are configured correctly and agent has permissions to access logs.

  1. Check agent logs for "permission denied" errors
  2. Verify agent service is running with appropriate privileges
  3. Test event collection with a known security event

Agent service won't start

Check for configuration file errors or missing dependencies.

  1. Review agent logs for startup errors
  2. Verify configuration file syntax is correct
  3. Ensure all required dependencies are installed
  4. Try running agent manually to see error messages:
# Run in foreground for debugging
castellan-agent run

Agent Health Best Practices

  • Monitor Status Regularly - Check the Agents page periodically to catch issues early
  • Keep Agents Updated - Apply agent updates regularly for performance improvements and bug fixes
  • Review Logs Periodically - Check agent logs to catch warnings before they become critical
  • Ensure Network Access - Verify agents can reach CastellanAI servers through any firewalls or proxies
  • Document Configurations - Maintain records of agent deployments for troubleshooting

Agent CLI Commands

Use these commands directly on the endpoint for troubleshooting:

CommandDescription
castellan-agent statusShow enrollment status and agent information
castellan-agent versionShow agent version, platform, and runtime
castellan-agent runStart agent in foreground (for debugging)
castellan-agent enroll --token TOKEN --portal-url URLEnroll or re-enroll the agent
castellan-agent unenrollRemove enrollment and clear credentials

What's Next?

Need Help?

If you're experiencing persistent agent health issues, contact support for assistance.

Contact Support