Agent Health Monitoring
Monitor and troubleshoot CastellanAI agents across your infrastructure.
Understanding Agent Status
Each agent reports its status to the portal every 30 seconds via heartbeat. The portal displays real-time health information including connection status and event collection activity.
Connected (Green)
Agent is connected and actively sending security events. All systems operational.
Indicators:
- Green status dot (pulsing animation)
- Last seen within the last few minutes
- Events being collected and transmitted
Disconnected (Gray)
Agent has lost connection with the portal. May require attention.
Common Causes:
- Network connectivity issues
- Agent service stopped
- System restart or shutdown
- Firewall blocking outbound connections
Pending (Yellow)
Agent is newly enrolled and waiting for initial connection, or is in the process of connecting.
Indicators:
- Yellow status dot
- Agent recently enrolled
- Awaiting first heartbeat
Error (Red)
Agent is experiencing errors that prevent normal operation. Requires immediate attention.
Common Causes:
- Authentication failures
- Configuration issues
- Service crash or critical error
Updating (Blue)
Agent is currently receiving or applying updates.
Indicators:
- Blue status dot
- Agent version change in progress
Monitoring Agent Health
Portal Agents Page
The primary method for monitoring agent health is through the Customer Portal:
- Log in to your Customer Portal
- Click Agents in the header navigation
- View the agent list with status indicators
- Review: Hostname, Platform, Status, Last Seen, Events Today
The status summary cards at the top show counts for each status type (Connected, Disconnected, Pending, etc.)
Local Agent Logs
Check agent logs directly on the endpoint for detailed diagnostic information:
Windows:
Event Viewer → Applications and Services Logs → CastellanAgent
# Or check log files:
C:\ProgramData\CastellanAI\logs\agent.log
Linux:
sudo journalctl -u castellan-agent -n 50
# Or check log files:
sudo tail -f /var/log/castellan-agent/agent.log
macOS:
sudo log show --predicate 'process == "CastellanAgent"' --last 1h
# Or check log files:
sudo tail -f /Library/Logs/CastellanAI/agent.log
Check Service Status
Verify the agent service is running on the endpoint:
Windows:
Get-Service "Castellan Agent"
# Should show Status: Running
Linux:
sudo systemctl status castellan-agent
# Should show Active: active (running)
macOS:
sudo launchctl list | grep com.castellanai.agent
# Should show PID if running
Common Issues & Solutions
Agent shows Disconnected but service is running
Check network connectivity to the CastellanAI servers. Verify firewall allows outbound HTTPS (port 443).
Windows:
Test-NetConnection -ComputerName api.castellanai.com -Port 443
Linux/macOS:
curl -v https://api.castellanai.com
Agent stuck in Pending status
The agent may not have completed enrollment successfully.
- Check agent logs for enrollment errors
- Verify the enrollment token is valid and not expired
- Re-enroll if necessary:
# Check current status
castellan-agent status
# Re-enroll with fresh token
castellan-agent enroll --token YOUR_TOKEN --portal-url https://api.castellanai.com --force
Authentication errors in logs
Agent credentials may be invalid or corrupted.
# Check enrollment status
castellan-agent status
# Re-enroll with new token from Portal
castellan-agent enroll --token YOUR_NEW_TOKEN --portal-url https://api.castellanai.com --force
# Restart service to apply
# Windows:
Restart-Service "Castellan Agent"
# Linux:
sudo systemctl restart castellan-agent
Events not appearing in portal
Verify event sources are configured correctly and agent has permissions to access logs.
- Check agent logs for "permission denied" errors
- Verify agent service is running with appropriate privileges
- Test event collection with a known security event
Agent service won't start
Check for configuration file errors or missing dependencies.
- Review agent logs for startup errors
- Verify configuration file syntax is correct
- Ensure all required dependencies are installed
- Try running agent manually to see error messages:
# Run in foreground for debugging
castellan-agent run
Agent Health Best Practices
- Monitor Status Regularly - Check the Agents page periodically to catch issues early
- Keep Agents Updated - Apply agent updates regularly for performance improvements and bug fixes
- Review Logs Periodically - Check agent logs to catch warnings before they become critical
- Ensure Network Access - Verify agents can reach CastellanAI servers through any firewalls or proxies
- Document Configurations - Maintain records of agent deployments for troubleshooting
Agent CLI Commands
Use these commands directly on the endpoint for troubleshooting:
| Command | Description |
|---|---|
castellan-agent status | Show enrollment status and agent information |
castellan-agent version | Show agent version, platform, and runtime |
castellan-agent run | Start agent in foreground (for debugging) |
castellan-agent enroll --token TOKEN --portal-url URL | Enroll or re-enroll the agent |
castellan-agent unenroll | Remove enrollment and clear credentials |
What's Next?
- Agent Troubleshooting - Deep dive into advanced agent troubleshooting
- Agent Updates - Learn how to update agents and manage versions
- Agent Configuration - Configure agent settings
Need Help?
If you're experiencing persistent agent health issues, contact support for assistance.