Skip to main content

Connectivity Issues

Diagnose and resolve agent communication problems, network connectivity issues, and firewall configuration errors.

Common Symptoms

Connectivity issues typically present with these symptoms:

SymptomDescription
Agent Shows as "Offline"Agent cannot establish connection to CastellanAI backend
Intermittent Connection DropsAgent alternates between online and offline status
Events Not AppearingAgent is connected but event data is not being transmitted
High LatencyEvents take longer than 30 seconds to appear in dashboard

Diagnostic Steps

Follow these steps systematically to identify the root cause:

Step 1: Check Agent Service Status

Verify the CastellanAI agent service is running on the affected host.

Windows:

Get-Service CastellanAgent | Select-Object Status, StartType

Linux:

sudo systemctl status castellan-agent

Step 2: Test Network Connectivity

Verify the agent can reach CastellanAI backend endpoints.

Test HTTPS connectivity (port 443):

Test-NetConnection -ComputerName api.castellanai.com -Port 443

Test WebSocket connectivity:

Test-NetConnection -ComputerName ws.castellanai.com -Port 443

Step 3: Review Agent Logs

Check agent logs for connection errors or authentication failures.

PlatformLog Location
WindowsC:\ProgramData\CastellanAgent\logs\agent.log
Linux/var/log/castellan-agent/agent.log

Look for error patterns:

  • Connection timeout errors
  • Certificate validation failures
  • Authentication token expired/invalid
  • DNS resolution failures

Step 4: Verify Firewall Rules

Ensure firewall allows outbound HTTPS and WebSocket connections.

Required outbound rules:

DestinationProtocolPort
api.castellanai.comHTTPS443
ws.castellanai.comWebSocket (WSS)443
updates.castellanai.comHTTPS443

Step 5: Check Proxy Configuration

If your organization uses a proxy, verify agent proxy settings.

Configure proxy in agent.yaml:

proxy:
enabled: true
server: proxy.company.com
port: 8080
authentication:
username: proxy-user
password: proxy-password
bypass:
- localhost
- 127.0.0.1

Common Issues and Solutions

SSL Certificate Validation Failures

Symptom: Agent logs show "SSL certificate validation failed" or "certificate chain cannot be verified"

Solutions:

  1. Ensure system date/time is accurate (certificate validity is time-sensitive)
  2. Install latest root certificates on the system
  3. For corporate environments with SSL inspection, import the corporate CA certificate
  4. Update agent to latest version (may include updated certificate bundle)

Authentication Token Expired

Symptom: Agent shows as offline with "401 Unauthorized" errors in logs

Solutions:

  1. Navigate to Agents page in dashboard
  2. Find the affected agent and click Regenerate Token
  3. Copy the new token and update agent configuration file
  4. Restart agent service: Restart-Service CastellanAgent

Intermittent Connection Drops

Symptom: Agent frequently disconnects and reconnects every few minutes

Solutions:

  1. Check for network stability issues (packet loss, high latency)
  2. Adjust WebSocket keep-alive interval in agent.yaml: keepalive_interval: 30
  3. Increase connection timeout: connection_timeout: 60
  4. Check firewall for aggressive timeout settings on long-lived connections

DNS Resolution Failures

Symptom: Logs show "Unable to resolve hostname" or "DNS lookup failed"

Solutions:

  1. Test DNS resolution: nslookup api.castellanai.com
  2. Verify DNS servers are reachable and functioning
  3. Add DNS entries to hosts file as fallback: C:\Windows\System32\drivers\etc\hosts
  4. Configure custom DNS servers in agent configuration if needed

Automated Connectivity Test

CastellanAI provides a built-in connectivity test tool for comprehensive diagnostics:

Run the connectivity test:

castellan-agent test-connection

Expected output for healthy connection:

Connectivity Test Results:
✓ DNS Resolution: api.castellanai.com → 52.168.10.5
✓ HTTPS Connection: Port 443 (200 OK)
✓ WebSocket Connection: Port 443 (Connected)
✓ Authentication: Token valid
✓ Event Transmission: Test event sent successfully

Status: All checks passed
warning

If any checks fail, the tool provides specific error messages and resolution suggestions.

Prevention Best Practices

  • Monitor Agent Health Proactively - Enable agent health notifications to receive alerts when agents go offline or experience connectivity issues.

  • Document Network Requirements - Provide firewall teams with complete CastellanAI network requirements during deployment planning.

  • Keep Agents Updated - Enable automatic agent updates or regularly update agents manually to receive connectivity improvements.

  • Test After Network Changes - Run connectivity tests after firewall rule changes, proxy updates, or network maintenance.

What's Next?