Arc provides three observability subsystems: Prometheus metrics, structured NDJSON access logs, and W3C distributed tracing.

Metrics

Configuration

observability:
  metrics_bind: "127.0.0.1:9090"
  metrics_enabled: true

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /metrics | GET | Prometheus text format |
| /healthz | GET | Returns ok |
Access control: if no auth_token is configured, only loopback addresses (127.x, ::1) are allowed. If auth_token is set, the Authorization: Bearer <token> header is required from any IP.
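
The access rule above can be modeled in a few lines. This is a minimal sketch for reasoning about who can scrape the endpoint, not Arc's implementation; the function name `metrics_request_allowed` and its parameters are hypothetical.

```python
import hmac
import ipaddress
from typing import Optional


def metrics_request_allowed(remote_ip: str,
                            authorization: Optional[str],
                            auth_token: Optional[str]) -> bool:
    """Illustrative model of the documented rule (hypothetical helper)."""
    if auth_token:
        # A token is configured: every client, loopback included, must send it.
        expected = f"Bearer {auth_token}"
        return authorization is not None and hmac.compare_digest(
            authorization.encode(), expected.encode())
    # No token configured: only loopback clients (127.x, ::1) are allowed.
    return ipaddress.ip_address(remote_ip).is_loopback
```

Under this model, binding the metrics listener to a non-loopback address without an `auth_token` would still reject remote scrapers.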

Prometheus metric reference

Connection metrics:
| Metric | Type | Description |
| --- | --- | --- |
| arc_accepted_total | counter | Total accepted TCP connections |
| arc_accept_rejected_total | counter | Connections rejected at accept |
| arc_active_current | gauge | Currently active connections |
| arc_closed_total | counter | Total closed connections |
Request/response metrics:
| Metric | Type | Description |
| --- | --- | --- |
| arc_requests_total | counter | Total HTTP requests received |
| arc_responses_total | counter | Total HTTP responses sent |
Bandwidth:
| Metric | Type | Description |
| --- | --- | --- |
| arc_bytes_client_in_total | counter | Bytes received from clients |
| arc_bytes_client_out_total | counter | Bytes sent to clients |
| arc_bytes_upstream_in_total | counter | Bytes received from upstreams |
| arc_bytes_upstream_out_total | counter | Bytes sent to upstreams |
Phase timing (for each of cli_read, up_conn, up_write, up_read, cli_write):
| Metric | Type | Description |
| --- | --- | --- |
| arc_phase_time_sum_ns_<phase> | counter | Cumulative nanoseconds in phase |
| arc_phase_count_<phase> | counter | Phase observations |
| arc_phase_timeouts_<phase> | counter | Phase timeout count |
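
Because the phase metrics are exposed as a cumulative sum and a count, the mean time per phase is the sum divided by the count. The sketch below shows that arithmetic on a scrape of `/metrics`; the helper names and the sample scrape are made up for illustration, and the parser deliberately ignores labeled series.

```python
from typing import Dict

PHASES = ("cli_read", "up_conn", "up_write", "up_read", "cli_write")


def parse_unlabeled_samples(text: str) -> Dict[str, float]:
    """Parse `name value` lines from Prometheus text format.

    Comments and labeled series (name{...}) are skipped to keep this short.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])
    return samples


def mean_phase_ms(samples: Dict[str, float]) -> Dict[str, float]:
    """Mean time per phase in milliseconds: sum_ns / count / 1e6."""
    means: Dict[str, float] = {}
    for phase in PHASES:
        count = samples.get(f"arc_phase_count_{phase}", 0.0)
        total_ns = samples.get(f"arc_phase_time_sum_ns_{phase}", 0.0)
        if count > 0:  # phases with no observations are omitted
            means[phase] = total_ns / count / 1e6
    return means


# Hypothetical scrape output, for illustration only.
scrape = """\
# TYPE arc_phase_time_sum_ns_up_conn counter
arc_phase_time_sum_ns_up_conn 4000000
arc_phase_count_up_conn 4
arc_phase_timeouts_up_conn 0
"""
print(mean_phase_ms(parse_unlabeled_samples(scrape)))  # {'up_conn': 1.0}
```

The result is an average since process start. For a recent window, subtract two scrapes and divide the deltas instead.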
io_uring health:
| Metric | Type | Description |
| --- | --- | --- |
| arc_ring_sq_dropped_total | counter | Submission queue drops |
| arc_ring_cq_overflow_total | counter | Completion queue overflows |
Non-zero values in arc_ring_sq_dropped_total or arc_ring_cq_overflow_total indicate the io_uring ring is too small. Increase io_uring.uring_entries in your configuration.
Upstream pool:
| Metric | Type | Description |
| --- | --- | --- |
| arc_upstream_pool_open_current | gauge | Total open upstream connections |
| arc_upstream_pool_idle_current | gauge | Idle upstream connections |
| arc_upstream_pool_busy_current | gauge | In-use upstream connections |
Traffic mirroring:
| Metric | Type | Description |
| --- | --- | --- |
| arc_mirror_submitted_total | counter | Mirror tasks enqueued |
| arc_mirror_sent_total | counter | Mirror requests sent |
| arc_mirror_queue_full_total | counter | Mirror tasks dropped (queue full) |
| arc_mirror_timeout_total | counter | Mirror tasks dropped (timeout) |
| arc_mirror_status_2xx_total | counter | Shadow 2xx responses |
| arc_mirror_latency_sum_ns | counter | Total shadow latency |
Logging subsystem:
| Metric | Type | Description |
| --- | --- | --- |
| arc_log_written_total{kind=...} | counter | Written log records by kind |
| arc_log_dropped_total{reason=...} | counter | Dropped records by reason |
| arc_log_write_errors_total | counter | Write failures |
| arc_log_buffer_depth | gauge | Ring buffer occupancy |
| arc_log_write_duration_seconds | histogram | io_uring batch write latency |
Rate limiting:
| Metric | Type | Description |
| --- | --- | --- |
| arc_ratelimit_rejected_total | counter | Requests rejected by rate limiter |
| arc_ratelimit_circuit_open | gauge | 1 when Redis backend circuit breaker is open |
Config and routing:
| Metric | Type | Description |
| --- | --- | --- |
| arc_config_reload_total | counter | Successful hot reloads applied |
| arc_config_reload_error_total | counter | Hot reloads rejected |
| arc_route_requests_total{route="..."} | counter | Requests matched per named route |
TLS and plugins:
| Metric | Type | Description |
| --- | --- | --- |
| arc_tls_cert_expiry_seconds | gauge | Seconds until the nearest certificate expiry |
| arc_plugin_timeout_total | counter | Plugin invocations interrupted by budget timeout |
XDP:
| Metric | Type | Description |
| --- | --- | --- |
| arc_xdp_blacklisted_total | counter | IPs added to the XDP blacklist (dynamic + manual) |

Design

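Workers update counters with AtomicU64::fetch_add using Ordering::Relaxed: one instruction, no locking. WorkerMetrics is #[repr(C, align(64))], so each worker's struct starts on its own cache-line boundary and workers do not contend for the same line (no false sharing). The admin server serves metric snapshots that a background thread refreshes every 250ms, so scraped values can lag the live counters by up to that interval.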

Access logs

Arc writes structured NDJSON access logs. There is no text-format option.

Configuration

logging:
  output:
    file: /var/log/arc/access.log
    stdout: false
    rotation:
      max_size: 500mb
      max_files: 30
      compress: true

  access:
    sample: 0.01             # log 1% of requests
    force_on_status: [401, 403, 429, 500, 502, 503, 504]
    force_on_slow: 500       # always log requests > 500ms

  redact:
    headers: [Authorization, Cookie, X-Api-Key]
    query_params: [token, secret, password]

  writer:
    ring_capacity: 8192      # SPSC ring buffer size per worker
    batch_bytes: 262144      # 256 KB — flush when batch reaches this size
    flush_interval: 50       # ms — time-based flush
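
The `access` settings above combine a sampling rate with two "always log" rules. A minimal sketch of that decision follows, assuming the forced rules are checked first and that sampling is a uniform random draw; the order of evaluation and the function name `should_log` are assumptions, not taken from Arc's source.

```python
import random
from typing import Callable, Iterable


def should_log(status: int,
               duration_ms: float,
               sample: float = 0.01,
               force_on_status: Iterable[int] = (401, 403, 429, 500, 502, 503, 504),
               force_on_slow_ms: float = 500,
               rng: Callable[[], float] = random.random) -> bool:
    """Decide whether a finished request produces an access log record."""
    if status in force_on_status:
        return True  # force_on_status: always log these statuses
    if duration_ms > force_on_slow_ms:
        return True  # force_on_slow: always log requests slower than the limit
    # Otherwise keep roughly `sample` of the traffic (assumed uniform sampling).
    return rng() < sample
```

With `sample: 0.01`, a healthy, fast request has about a 1% chance of being logged, while every 5xx and every request over 500ms is kept.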

Log record fields

Each access log line is a single JSON object:
| Field | Type | Description |
| --- | --- | --- |
| ts | string (RFC 3339, nanoseconds) | Request timestamp |
| level | string | info, warn, or error |
| kind | string | "access" |
| trace_id | string | W3C trace ID (32 hex chars) |
| span_id | string | W3C span ID (16 hex chars) |
| request_id | string | Per-request unique ID |
| method | string | HTTP method |
| path | string | Request path |
| query | string | Query string (redaction applied) |
| host | string | Host header |
| status | number | HTTP response status |
| route | string | Matched route name |
| upstream | string | Upstream group name |
| upstream_addr | string | Resolved upstream address |
| client_ip | string | Client remote IP |
| client_port | number | Client remote port |
| bytes_sent | number | Response bytes sent |
| bytes_received | number | Request bytes received |
| duration_ms | number | Total request duration |
| upstream_connect_ms | number? | Time to upstream TCP connect |
| upstream_response_ms | number? | Time to first upstream byte |
| attempt | number | Retry attempt number |
| tls | boolean | Whether connection used TLS |
| http_version | string | HTTP/1.1 or HTTP/2.0 |
Example:
{
  "ts": "2026-03-02T10:45:23.123456789Z",
  "level": "info",
  "kind": "access",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "request_id": "01HRXYZ...",
  "method": "GET",
  "path": "/api/users/42",
  "query": "",
  "host": "api.example.com",
  "status": 200,
  "route": "api",
  "upstream": "app",
  "upstream_addr": "10.0.1.1:3000",
  "client_ip": "203.0.113.5",
  "client_port": 54321,
  "bytes_sent": 1024,
  "bytes_received": 256,
  "duration_ms": 12,
  "upstream_connect_ms": 1,
  "upstream_response_ms": 8,
  "attempt": 1,
  "tls": true,
  "http_version": "HTTP/2.0"
}
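
Since every line is one JSON object, the log can be processed with any JSON parser, one line at a time. The sketch below filters for failed or slow requests using the `status` and `duration_ms` fields; the function name and the thresholds are illustrative only.

```python
import json
from typing import Iterable, Iterator


def slow_or_failed(lines: Iterable[str], slow_ms: float = 500) -> Iterator[dict]:
    """Yield access records with a 5xx status or a duration above slow_ms."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("kind") != "access":
            continue  # skip any non-access records
        if record["status"] >= 500 or record["duration_ms"] > slow_ms:
            yield record
```

A file object iterates by line, so `slow_or_failed(open("/var/log/arc/access.log"))` works directly on the active log.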

Redaction

Arc redacts sensitive values before writing logs:
  • Headers — matched case-insensitively; values replaced with [REDACTED]
  • Query parameters — matched case-insensitively; values replaced with [REDACTED]
  • Body fields — JSONPath-style ($.field.subfield); values replaced with "[REDACTED]"
Redaction rules are rebuilt automatically on hot reload.
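
To make the query-parameter rule concrete, here is a small sketch of case-insensitive redaction on a raw query string. It is an illustration under simplifying assumptions (no percent-decoding of parameter names, `&` as the only separator), and `redact_query` is a hypothetical helper rather than Arc's code.

```python
from typing import Iterable

REDACTED = "[REDACTED]"


def redact_query(query: str, names: Iterable[str]) -> str:
    """Replace the values of the named parameters, matching names case-insensitively."""
    blocked = {name.lower() for name in names}
    out = []
    for pair in query.split("&"):
        if not pair:
            continue
        key, _sep, _value = pair.partition("=")
        if key.lower() in blocked:
            out.append(f"{key}={REDACTED}")  # keep the name, hide the value
        else:
            out.append(pair)
    return "&".join(out)


print(redact_query("user=42&Token=abc", ["token", "secret", "password"]))
# user=42&Token=[REDACTED]
```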

File rotation

Rotation is triggered when the active log file exceeds max_size. The steps are:
  1. Rename the active file to a timestamped archive name (near-instant metadata operation)
  2. Reopen the active path immediately so writes continue uninterrupted
  3. Compress the archive with gzip in a background thread (if compress: true)
  4. Delete the oldest archives to keep only max_files retained
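
The four steps above can be sketched as follows. This is a simplified model, assuming archives are named `<active>.<nanosecond timestamp>` and live next to the active file; Arc's actual naming scheme and threading are not specified here, and `rotate` is a hypothetical function.

```python
import gzip
import os
import shutil
import threading
import time
from pathlib import Path


def rotate(active: Path, max_files: int, compress: bool = True) -> threading.Thread:
    """Rotate `active` and return the background thread doing steps 3 and 4."""
    # 1. Rename: a metadata-only operation when staying on one filesystem.
    archive = active.with_name(f"{active.name}.{time.time_ns()}")
    os.rename(active, archive)
    # 2. Recreate the active path immediately so writes can continue.
    active.touch()

    def finish() -> None:
        # 3. Compress the archive off the write path.
        if compress:
            with open(archive, "rb") as src, gzip.open(f"{archive}.gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(archive)
        # 4. Prune: timestamped names sort oldest first, so drop the head.
        archives = sorted(active.parent.glob(f"{active.name}.*"))
        for old in archives[:max(len(archives) - max_files, 0)]:
            old.unlink()

    worker = threading.Thread(target=finish, daemon=True)
    worker.start()
    return worker
```

Doing the rename and reopen on the caller's thread and leaving gzip to a worker keeps the slow part (compression) away from the write path, which is the point of the ordering.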

Design — no backpressure on workers

Workers push log events into a bounded SPSC ring buffer. If the ring is full, the event is dropped and a counter is incremented. Workers never block. The writer thread drains all rings, encodes NDJSON manually (no serde at write time), and flushes via io_uring Write operations in batches.
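
The drop-instead-of-block policy is easy to model. The sketch below is a plain Python stand-in for one worker's ring: it shows only the policy (bounded capacity, count drops, never wait), not a lock-free SPSC implementation, and the class name `DroppingRing` is made up.

```python
from collections import deque
from typing import Deque, List


class DroppingRing:
    """Bounded queue that drops new events when full instead of blocking."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.events: Deque[str] = deque()
        self.dropped = 0  # would back a counter such as arc_log_dropped_total

    def try_push(self, event: str) -> bool:
        """Producer side: returns immediately, True if the event was queued."""
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self) -> List[str]:
        """Consumer side: take everything currently queued as one batch."""
        batch = []
        while self.events:
            batch.append(self.events.popleft())
        return batch
```

The trade-off is explicit: under sustained overload some log records are lost, but request latency is never coupled to log I/O.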

Distributed tracing

Arc implements W3C Trace Context. Every request gets a trace_id and span_id that appear in access logs and are propagated to upstreams.

How trace context is resolved

| Situation | Behavior |
| --- | --- |
| Incoming request has a valid traceparent | Parse and reuse trace_id; use incoming span_id as the parent |
| traceparent is missing or malformed | Generate a fresh 128-bit trace_id and 64-bit span_id |
| Forwarding to an upstream | Keep the same trace_id; generate a new span_id for the child span |
| Logging | Emit trace_id and span_id as lowercase hex |

traceparent format

Arc parses and emits the standard W3C format:
00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
  • 2-char version (00)
  • 32-char trace ID (must not be all zeros)
  • 16-char span ID (must not be all zeros)
  • 2-char flags (01 = sampled)
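
The format rules above translate directly into a validator. This is a minimal sketch that accepts only the version `00` layout and lowercase hex, and treats anything else as malformed; the names `parse_traceparent`, `fresh_ids`, and `TraceContext` are hypothetical.

```python
import re
import secrets
from typing import NamedTuple, Optional, Tuple

# version 00, 32-hex trace ID, 16-hex span ID, 2-hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    parent_span_id: str
    flags: str


def parse_traceparent(header: Optional[str]) -> Optional[TraceContext]:
    """Return the parsed context, or None if the header is missing or malformed."""
    if header is None:
        return None
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return TraceContext(trace_id, span_id, flags)


def fresh_ids() -> Tuple[str, str]:
    """Generate a random 128-bit trace ID and 64-bit span ID as lowercase hex."""
    return secrets.token_hex(16), secrets.token_hex(8)
```

A caller would use `parse_traceparent(header)` first and fall back to `fresh_ids()` when it returns `None`, matching the missing-or-malformed case.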

Upstream propagation

When Arc forwards a request to an upstream, it injects a traceparent header with the same trace_id but a freshly generated span_id. This creates a parent-child span relationship between the Arc hop and the upstream.
Client → Arc:     traceparent: 00-TTTT-SSSS-01
Arc → Upstream:   traceparent: 00-TTTT-SNEW-01   (new span ID)
Access log:       trace_id=TTTT, span_id=SSSS     (the Arc span)
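
Building the outgoing header is a matter of keeping the trace ID and swapping in a new span ID. A short sketch, assuming the incoming flags are passed through unchanged (as in the example above); `upstream_traceparent` is a hypothetical helper.

```python
import secrets


def upstream_traceparent(trace_id: str, flags: str = "01") -> str:
    """Build the header sent upstream: same trace ID, fresh 64-bit span ID."""
    span_id = secrets.token_hex(8)  # 16 lowercase hex chars
    while span_id == "0" * 16:      # all-zero span IDs are invalid; redraw
        span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{flags}"
```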

Searching logs by trace ID

# Using the arc CLI
arc logs query --trace-id 4bf92f3577b34da6a3ce929d0e0e4736

# Or with grep and jq on the active log file (rotated archives may be gzip-compressed)
grep "4bf92f3577b34da6a3ce929d0e0e4736" /var/log/arc/access.log | jq .

OTLP export

To export traces to an OpenTelemetry collector:
observability:
  tracing:
    endpoint: "http://otel-collector:4317"
    insecure: true    # disable TLS for the OTLP connection

Troubleshooting

Metrics endpoint is unreachable: Check that observability.metrics_bind is set and that Arc is running. The default is 127.0.0.1:9090, so requests from other hosts are refused unless you bind to a non-loopback address such as 0.0.0.0:9090. Without an auth_token, only loopback clients are allowed even then, so remote scrapers also need a configured token. Confirm locally with: curl http://127.0.0.1:9090/metrics.
Access logs are not being written: Verify observability.access_log.enabled: true and that logging.output.file is set to a writable path. Also check that sample is not set to 0.0. Logs are written asynchronously through the SPSC ring buffer, so a short delay is normal if the writer thread is behind.
Log files are not rotating: Rotation is triggered when FileState.offset >= RotationConfig.max_size_bytes. If the file never reaches that size, no rotation occurs. Check arc_log_written_total in /metrics to confirm records are being written. If compression is enabled and arc_log_compress_dropped_total is incrementing, the compression queue is full; reduce the write rate or increase the queue capacity via ARC_LOG_COMPRESS_QUEUE_CAPACITY.
Traces are not reaching the collector: Check that observability.tracing.endpoint is reachable from Arc. Set insecure: true if the collector does not use TLS. The trace_id field in access logs confirms that Arc is generating trace context even if OTLP export is failing.
Phase timing metrics are missing: Phase timing metrics are only emitted for requests that go through the full pipeline. Check that Arc is receiving and proxying traffic. If the metrics appear after the first request, this is expected behavior.