Monitoring, Telemetry, Grafana and Alerts

Chronix wraps the trading system in an extensive monitoring and telemetry layer.

Grafana is the main surface for historical and operational dashboards:

  • connector health;
  • feed status;
  • latency metrics;
  • order round-trip metrics;
  • rate-limit state;
  • account and exposure metrics;
  • strategy business metrics;
  • algo runtime status;
  • algo administration views: status, parameters, pauses/stops, inventory, orders, risk events and rate-limit state;
  • technical service status;
  • error and event timelines;
  • historical risk and execution analytics.

Telemetry is not only an engineering tool. It is also a business and operations layer, and each role reads it differently:

  • traders see whether a workflow is behaving normally;
  • risk managers see exposure and limit state;
  • operators see service health and alerts;
  • engineers diagnose latency, reconnects and failure modes;
  • quants compare strategy behavior across live and historical runs.

Alerts are handled primarily through Grafana alerting, with lightweight scripts or notification routing added where a deployment needs a custom channel or action. This keeps alerting close to the metrics and event data already used for monitoring. Alerts convert abnormal states into action, and cover the following categories and flows:

  • risk alerts;
  • connector health alerts;
  • latency and rate-limit alerts;
  • strategy error alerts;
  • execution/order-state alerts;
  • formula-based market condition alerts;
  • acknowledgement, silence, resolution and escalation flows.
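
As an illustration of the "lightweight scripts or notification routing" mentioned above, the following is a minimal sketch of a routing function. It assumes the script receives a JSON payload shaped like the one Grafana's webhook contact point sends (an `alerts` list whose entries carry `status` and `labels`). The `category` and `severity` labels, the channel names and the `route_alerts` function are hypothetical and are not taken from Chronix itself.

```python
from dataclasses import dataclass

# Hypothetical routing table: alert "category" label -> notification channel.
ROUTES = {
    "risk": "risk-desk",
    "connector": "ops-oncall",
    "latency": "engineering",
    "strategy": "trading-desk",
}
DEFAULT_CHANNEL = "ops-oncall"
ESCALATION_CHANNEL = "escalation"


@dataclass(frozen=True)
class Notification:
    channel: str
    title: str
    firing: bool


def route_alerts(payload: dict) -> list[Notification]:
    """Map each alert in a webhook-style payload to a channel by its labels."""
    notifications = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        firing = alert.get("status") == "firing"
        # Unknown or missing categories fall back to the default channel.
        channel = ROUTES.get(labels.get("category"), DEFAULT_CHANNEL)
        # Assumption: firing critical alerts escalate regardless of category.
        if firing and labels.get("severity") == "critical":
            channel = ESCALATION_CHANNEL
        title = labels.get("alertname", "unnamed alert")
        notifications.append(Notification(channel, title, firing))
    return notifications
```

A deployment would then hand each `Notification` to whatever sender it uses (chat, pager, email). Keeping the routing decision in a pure function makes it easy to test without a running Grafana instance.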

Chronix should not be allowed to sit silently in an abnormal state. The goal is to surface failures, degradation and unsafe conditions quickly enough that a desk can act.
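
One way to keep a system from sitting silently in an abnormal state is to treat the absence of a signal as an alert condition in its own right. The sketch below is a generic heartbeat watchdog, not Chronix's actual mechanism: the `HeartbeatWatchdog` class, its method names and the component names in the usage example are all hypothetical.

```python
import time


class HeartbeatWatchdog:
    """Tracks the last heartbeat per component and reports silent ones."""

    def __init__(self, max_silence_s: float, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = {}

    def beat(self, component: str) -> None:
        """Record that a component reported in just now."""
        self._last_seen[component] = self._clock()

    def stale(self) -> list[str]:
        """Return, sorted by name, components silent for longer than the limit."""
        now = self._clock()
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now - seen > self.max_silence_s
        )
```

In use, each connector or strategy would call `beat("feed")` or similar on every healthy cycle, and a periodic job would raise an alert for anything returned by `stale()`. A component that has never reported is not tracked here, so a real deployment would likely register the expected components up front.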