Agent Health Monitoring
Monitor and troubleshoot CastellanAI agents across your infrastructure.
Understanding Agent Status
Each agent reports its status to the portal every 30 seconds via heartbeat. The portal displays real-time health information including connection status and event collection activity.
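To make the heartbeat interval concrete, the sketch below estimates how many heartbeat intervals have passed since an agent's Last Seen time. The 30-second interval comes from the paragraph above. The function name `heartbeat_intervals_elapsed` and the timestamp inputs are illustrative assumptions, not part of the portal or agent API.

```python
from datetime import datetime, timedelta, timezone

# Heartbeat interval stated in this guide.
HEARTBEAT_INTERVAL = timedelta(seconds=30)


def heartbeat_intervals_elapsed(last_seen: datetime, now: datetime) -> int:
    """Return how many full heartbeat intervals have elapsed since last_seen.

    Hypothetical helper for illustration; the portal does not expose it.
    """
    elapsed = now - last_seen
    if elapsed < timedelta(0):
        return 0  # Clock skew: treat a future timestamp as "just seen".
    return elapsed // HEARTBEAT_INTERVAL


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(heartbeat_intervals_elapsed(now - timedelta(seconds=95), now))  # 3
```

A value of 0 or 1 is consistent with a healthy agent. Larger values suggest that heartbeats are not arriving.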
Status Indicators
- 🟢 Connected
- ⚫ Disconnected
- 🟡 Pending
- 🔴 Error
- 🔵 Updating
Connected (Green)
Agent is connected and actively sending security events. All systems operational.
Indicators:
- Green status dot (pulsing animation)
- Last seen within the last few minutes
- Events being collected and transmitted
This is the expected state for all your agents. No action required.
Disconnected (Gray)
Agent has lost connection with the portal. May require attention.
Common Causes:
- Network connectivity issues
- Agent service stopped
- System restart or shutdown
- Firewall blocking outbound connections
Investigate agents that remain disconnected for more than 5 minutes.
Pending (Yellow)
Agent is newly enrolled and waiting for initial connection, or is in the process of connecting.
Indicators:
- Yellow status dot
- Agent recently enrolled
- Awaiting first heartbeat
New agents typically transition to Connected within 1-2 minutes of enrollment.
Error (Red)
Agent is experiencing errors that prevent normal operation. Requires immediate attention.
Common Causes:
- Authentication failures
- Configuration issues
- Service crash or critical error
Error status indicates the agent cannot function. Investigate immediately.
Updating (Blue)
Agent is currently receiving or applying updates.
Indicators:
- Blue status dot
- Agent version change in progress
The agent will return to Connected status after the update completes.
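The guidance for the five statuses can be summarized as a small triage rule. This is a minimal sketch: the thresholds (5 minutes for Disconnected, 2 minutes for Pending) are taken from the descriptions above, while the `triage` function and its action strings are hypothetical and do not correspond to anything the portal returns.

```python
from datetime import timedelta

# Thresholds taken from this guide; status names match the portal indicators.
DISCONNECTED_GRACE = timedelta(minutes=5)
PENDING_GRACE = timedelta(minutes=2)


def triage(status: str, time_in_status: timedelta) -> str:
    """Map a status and how long it has been held to a suggested action.

    Hypothetical helper: the action strings are illustrative only.
    """
    if status == "Error":
        return "investigate immediately"
    if status == "Disconnected" and time_in_status > DISCONNECTED_GRACE:
        return "investigate"
    if status == "Pending" and time_in_status > PENDING_GRACE:
        return "check enrollment"
    # Connected, Updating, or still within the grace period.
    return "no action"


print(triage("Disconnected", timedelta(minutes=7)))  # investigate
print(triage("Pending", timedelta(seconds=45)))      # no action
```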
Monitoring Agent Health
Portal Agents Page
The primary method for monitoring agent health is through the Customer Portal:
- Log in to your Customer Portal
- Click Agents in the header navigation
- View the agent list with status indicators
- Review: Hostname, Platform, Status, Last Seen, Events Today
The status summary cards at the top of the page show a count for each status type (Connected, Disconnected, Pending, and so on).
Local Agent Logs
Check agent logs directly on the endpoint for detailed diagnostic information:
Windows:
# Event Viewer: Applications and Services Logs → CastellanAgent
# Or check log files:
Get-Content "C:\ProgramData\CastellanAI\logs\agent.log" -Tail 50
Linux:
# Using journalctl
sudo journalctl -u castellan-agent -n 50
# Or check log files:
sudo tail -f /var/log/castellan-agent/agent.log
macOS:
# Using unified logging
sudo log show --predicate 'process == "CastellanAgent"' --last 1h
# Or check log files:
sudo tail -f /Library/Logs/CastellanAI/agent.log
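If you collect logs from several platforms with one script, reading the last lines of the log file can be done without platform-specific tools. The sketch below mirrors the "last 50 lines" behavior of the commands above. The helper name `tail_lines` is an assumption for illustration, and the script needs read access to the log file (which may require elevated privileges).

```python
from collections import deque
from pathlib import Path


def tail_lines(path, count=50):
    """Return the last `count` lines of a text file as a list of strings."""
    with Path(path).open("r", encoding="utf-8", errors="replace") as handle:
        # A deque with maxlen keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

For example, `tail_lines("/var/log/castellan-agent/agent.log")` would return up to 50 lines on a Linux endpoint, using the log path listed above.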
Check Service Status
Windows:
Get-Service "Castellan Agent"
# Should show Status: Running
Linux:
sudo systemctl status castellan-agent
# Should show Active: active (running)
macOS:
sudo launchctl list | grep com.castellanai.agent
# Should show PID if running
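For scripted checks across a mixed fleet, the per-platform commands above can be selected automatically. This is a sketch under assumptions: the lookup table and `service_status_command` are hypothetical names, the Windows entry assumes `powershell` is on the PATH, and the macOS entry returns the full `launchctl list` output, which the caller still has to filter for `com.castellanai.agent`.

```python
import platform

# Commands as listed in this guide, keyed by platform.system() values.
SERVICE_STATUS_COMMANDS = {
    "Windows": ["powershell", "-NoProfile", "-Command",
                'Get-Service "Castellan Agent"'],
    "Linux": ["sudo", "systemctl", "status", "castellan-agent"],
    # Output must still be filtered for com.castellanai.agent.
    "Darwin": ["sudo", "launchctl", "list"],
}


def service_status_command(system=None):
    """Return the status command for the given (or current) platform."""
    system = system or platform.system()
    try:
        return SERVICE_STATUS_COMMANDS[system]
    except KeyError:
        raise ValueError(f"unsupported platform: {system}") from None


print(service_status_command("Linux"))
```

The returned list can be passed to `subprocess.run(command, capture_output=True, text=True)` to capture the output for inspection.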
Common Issues & Solutions
🔌 Agent shows Disconnected but service is running
Check network connectivity to the CastellanAI servers. Verify firewall allows outbound HTTPS (port 443).
Windows:
Test-NetConnection -ComputerName api.castellanai.com -Port 443
Linux/macOS:
curl -v https://api.castellanai.com
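The same reachability test can be run from a script on any platform. The sketch below only checks that a TCP connection to port 443 can be opened. It does not validate TLS, proxies, or the agent's credentials, and `can_reach` is an illustrative name rather than an agent feature.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("api.castellanai.com")` from the affected endpoint returns `False` when DNS resolution fails or when a firewall drops or rejects the outbound connection.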
🟡 Agent stuck in Pending status
The agent may not have completed enrollment successfully.
- Check agent logs for enrollment errors
- Verify the enrollment token is valid and not expired
- Re-enroll if necessary:
# Check current status
castellan-agent status
# Re-enroll with fresh token
castellan-agent enroll --token YOUR_TOKEN --portal-url https://api.castellanai.com --force
🔐 Authentication errors in logs
Agent credentials may be invalid or corrupted.
# Check enrollment status
castellan-agent status
# Re-enroll with new token from Portal
castellan-agent enroll --token YOUR_NEW_TOKEN --portal-url https://api.castellanai.com --force
Then restart the service:
Windows:
Restart-Service "Castellan Agent"
Linux:
sudo systemctl restart castellan-agent
📊 Events not appearing in portal
Verify that event sources are configured correctly and that the agent has permission to read the logs it collects.
- Check agent logs for "permission denied" errors
- Verify agent service is running with appropriate privileges
- Test event collection with a known security event (e.g., failed login attempt)
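The first check above can be automated by scanning log lines for the phrase "permission denied". This is a minimal sketch: `find_permission_errors` is a hypothetical helper, and the sample lines are placeholders, since this guide does not show the agent's actual log format.

```python
def find_permission_errors(lines):
    """Return (line_number, text) pairs for lines mentioning 'permission denied'."""
    needle = "permission denied"
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()  # Case-insensitive match.
    ]


# Placeholder lines for illustration, not real agent output.
sample = [
    "placeholder line one",
    "placeholder: Permission denied while opening a log source",
    "placeholder line three",
]
print(find_permission_errors(sample))
# [(2, 'placeholder: Permission denied while opening a log source')]
```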
❌ Agent service won't start
Check for configuration file errors or missing dependencies.
- Review agent logs for startup errors
- Verify configuration file syntax is correct
- Ensure all required dependencies are installed
- Try running the agent manually in the foreground to see its error messages:
# Run in foreground for debugging
castellan-agent run
Agent Health Best Practices
Follow these practices to maintain healthy agent deployments.
| Practice | Description |
|---|---|
| Monitor Status Regularly | Check the Agents page periodically to catch issues early |
| Keep Agents Updated | Apply agent updates regularly for performance improvements and bug fixes |
| Review Logs Periodically | Check agent logs to catch warnings before they become critical |
| Ensure Network Access | Verify agents can reach CastellanAI servers through any firewalls or proxies |
| Document Configurations | Maintain records of agent deployments for troubleshooting |
Agent CLI Commands
Use these commands directly on the endpoint for troubleshooting:
| Command | Description |
|---|---|
| castellan-agent status | Show enrollment status and agent information |
| castellan-agent version | Show agent version, platform, and runtime |
| castellan-agent run | Start agent in foreground (for debugging) |
| castellan-agent enroll --token TOKEN --portal-url URL | Enroll or re-enroll the agent |
| castellan-agent unenroll | Remove enrollment and clear credentials |
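When re-enrollment is scripted, building the command as an argument list avoids shell quoting problems with tokens. The sketch below uses only the `enroll` options that appear in this guide (`--token`, `--portal-url`, and `--force`). The function `build_enroll_command` is a hypothetical wrapper, not part of the agent.

```python
def build_enroll_command(token, portal_url, force=False):
    """Build the argument list for `castellan-agent enroll` as shown in this guide."""
    if not token:
        raise ValueError("token must not be empty")
    command = [
        "castellan-agent", "enroll",
        "--token", token,
        "--portal-url", portal_url,
    ]
    if force:
        command.append("--force")  # Used in this guide when re-enrolling.
    return command


print(build_enroll_command("YOUR_TOKEN", "https://api.castellanai.com", force=True))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on an endpoint where the agent CLI is installed.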
What's Next?
| Guide | Description |
|---|---|
| Agent Troubleshooting | Deep dive into advanced agent troubleshooting |
| Agent Updates | Learn how to update agents and manage versions |
| Agent Configuration | Configure agent settings |
Need Help?
If you're experiencing persistent agent health issues, contact support for assistance.