
NNO Observability


Date: 2026-03-30
Status: Partially Implemented (Phase 1). Structured logging is deployed; Logpush, CFAE, and dashboards are Phase 2.
Parent: System Architecture
Scope: All NNO core services + every provisioned platform Worker

Phase 1 implemented: packages/logger is live with the Logger class, requestLogger Hono middleware, and x-trace-id propagation. The gateway also has requestIdMiddleware (generating/propagating x-request-id) and tracingMiddleware (distributed tracing context via @neutrino-io/core/tracing) as deployed observability primitives. Deployed in 6 NNO core services (IAM, Registry, Billing, Provisioning, CLI Service, Gateway).

Phase 2 not yet implemented: Cloudflare Logpush configuration, Analytics Engine (CFAE) bindings, SLOs, alerting, and observability dashboards are designed below and planned for Phase 2.


Overview

Neutrino's observability stack is built on three Cloudflare-native pillars:

| Pillar | Technology | Purpose |
|---|---|---|
| Logs | Structured console.log → Cloudflare Logpush → R2 / external SIEM | Audit trail, debugging, security forensics |
| Metrics | Cloudflare Analytics Engine (CFAE) | Real-time operational signals, usage metering, SLO tracking |
| Traces | x-trace-id propagation + CFAE span events | Cross-service request correlation |

Each pillar serves two audiences:

  • NNO operators — Visibility across all platforms, all services, all tenants. Used for platform health monitoring, incident response, and capacity planning.
  • Platform admins — Scoped visibility into their platform only. Accessible via the NNO Portal observability dashboard. No access to other platforms' data or NNO internal service internals.

1. Structured Logging [Phase 1]

1.1 Log Format

Every NNO service and every provisioned platform Worker emits logs as newline-delimited JSON (NDJSON). Cloudflare Workers capture console.log() output and include it in Logpush streams.

// packages/logger/src/logger.ts

export type LogLevel = "info" | "warn" | "error" | "debug";

export interface LogEntry {
  timestamp: string; // ISO 8601
  level: LogLevel;
  service: string; // e.g. 'registry' | 'provisioning'
  traceId?: string; // x-trace-id propagated across services
  requestId?: string; // x-request-id per request
  message: string;
  [key: string]: unknown; // arbitrary structured data spread into entry
}

The LogEntry is intentionally flat: the Logger spreads any extra data fields directly into the entry (via ...data) rather than nesting them under a data key. This keeps log lines compact and easily queryable.

1.2 Logger Implementation

// packages/logger/src/logger.ts

export class Logger {
  constructor(
    private readonly service: string,
    private readonly traceId?: string,
    private readonly requestId?: string,
  ) {}

  private emit(
    level: LogLevel,
    message: string,
    data?: Record<string, unknown>,
  ): void {
    const entry: LogEntry = {
      timestamp: new Date().toISOString(),
      level,
      service: this.service,
      ...(this.traceId !== undefined && { traceId: this.traceId }),
      ...(this.requestId !== undefined && { requestId: this.requestId }),
      message,
      ...data, // extra fields are spread into the top-level entry
    };

    const output = JSON.stringify(entry);

    if (level === "error") {
      console.error(output);
    } else {
      console.log(output);
    }
  }

  info(message: string, data?: Record<string, unknown>): void {
    this.emit("info", message, data);
  }
  warn(message: string, data?: Record<string, unknown>): void {
    this.emit("warn", message, data);
  }
  error(message: string, data?: Record<string, unknown>): void {
    this.emit("error", message, data);
  }
  debug(message: string, data?: Record<string, unknown>): void {
    this.emit("debug", message, data);
  }
}

/** Convenience factory — creates a Logger without trace/request IDs */
export function createLogger(service: string): Logger {
  return new Logger(service);
}

Key differences from the previous documentation:

  • Constructor takes (service: string, traceId?: string, requestId?: string) — not a context object with version, platformId, etc.
  • Error-level logs use console.error(); all others use console.log().
  • Extra data fields are spread directly into the top-level log entry (not nested under data).
  • traceId and requestId are only included in the entry when defined (conditional spread).
  • No child(), request(), or fatal() methods exist.
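
To illustrate the flat entry shape, the sketch below mirrors the object that emit() builds in a standalone function. The name buildEntry is hypothetical and exists only for this illustration; it is not part of packages/logger. It shows that extra fields land at the top level, that traceId is omitted when undefined, and that a data key colliding with a built-in field would overwrite it because data is spread last.

```typescript
type LogLevel = "info" | "warn" | "error" | "debug";

// Hypothetical helper mirroring the entry object built inside Logger.emit().
function buildEntry(
  service: string,
  level: LogLevel,
  message: string,
  data?: Record<string, unknown>,
  traceId?: string,
  requestId?: string,
): Record<string, unknown> {
  return {
    timestamp: new Date().toISOString(),
    level,
    service,
    ...(traceId !== undefined && { traceId }),
    ...(requestId !== undefined && { requestId }),
    message,
    ...data, // spread last: colliding keys in `data` win
  };
}

// Extra fields appear at the top level; no traceId key is present.
const entry = buildEntry("registry", "info", "Resource created", {
  resourceType: "platform",
  platformId: "k3m9p2xw7q",
});

// A colliding key replaces the built-in field of the same name.
const collided = buildEntry("registry", "info", "collision demo", {
  service: "other",
});
```

Given that behaviour, callers may want to avoid the names timestamp, level, service, message, traceId, and requestId as keys in data.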

1.3 Hono Request Logging Middleware

All NNO service Workers and platform Workers use a shared middleware that logs every request:

// packages/logger/src/middleware.ts

import type { MiddlewareHandler } from "hono";
import { Logger } from "./logger.js";
import { initTrace } from "./trace.js";

export function requestLogger(service: string): MiddlewareHandler {
  return async (c, next) => {
    const start = Date.now();
    const { traceId, requestId } = initTrace(c.req.raw);

    // Set on Hono context — available to all downstream handlers
    c.set("traceId", traceId);
    c.set("requestId", requestId);

    const logger = new Logger(service, traceId, requestId);
    c.set("logger", logger);

    logger.info("→ request", { method: c.req.method, path: c.req.path });

    await next();

    logger.info("← response", {
      method: c.req.method,
      path: c.req.path,
      status: c.res.status,
      duration: Date.now() - start,
    });
  };
}

Key differences from the previous documentation:

  • Takes service: string (not a Logger instance) and creates the Logger internally after extracting trace context.
  • Calls initTrace(c.req.raw) to extract/generate both traceId and requestId.
  • Sets three values on the Hono context: traceId, requestId, and logger — downstream handlers access the logger via c.get("logger").
  • Emits both a request log (→ request) and a response log (← response) with duration.

1.4 Metrics Recording

packages/logger/src/metrics.ts provides a lightweight helper for recording metric data points. In production it writes to the Cloudflare Analytics Engine dataset nno_metrics; in development (or when the binding is unavailable) it falls back to structured console.log output.

// packages/logger/src/metrics.ts

export interface MetricLabels {
  [key: string]: string;
}

export function recordMetric(
  name: string, // e.g. 'gateway.request.count'
  value: number, // count, latency ms, bytes, etc.
  labels: MetricLabels, // key/value dimensions for filtering
  env?: { NNO_METRICS?: AnalyticsDataset }, // CF Workers env binding
): void;

Behaviour:

  • Production (env.NNO_METRICS present): calls env.NNO_METRICS.writeDataPoint() with the metric name and label values as blobs, the numeric value as doubles, and the metric name as indexes. Wrapped in a try/catch so metric failures never throw.
  • Development (no binding): emits { metric, value, labels, timestamp } via console.log (not via Logger, to avoid circular dependency).

Note: recordMetric is not currently re-exported from the barrel index.ts. Import it directly: import { recordMetric } from '@neutrino-io/logger/metrics'.
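
The behaviour described above could be sketched roughly as follows. This is not the actual metrics.ts source: recordMetricSketch and MetricsDataset are hypothetical names, and the blob order (metric name first, then label values), the ISO timestamp, and the JSON-stringified fallback line are assumptions made for illustration.

```typescript
// Minimal structural stand-in for the Analytics Engine binding (assumption).
interface MetricsDataset {
  writeDataPoint(point: {
    blobs?: string[];
    doubles?: number[];
    indexes?: string[];
  }): void;
}

type MetricLabels = Record<string, string>;

function recordMetricSketch(
  name: string,
  value: number,
  labels: MetricLabels,
  env?: { NNO_METRICS?: MetricsDataset },
): void {
  const dataset = env?.NNO_METRICS;
  if (dataset) {
    try {
      dataset.writeDataPoint({
        blobs: [name, ...Object.values(labels)], // metric name, then label values
        doubles: [value],
        indexes: [name],
      });
    } catch {
      // Metric failures are swallowed so they never break the request.
    }
    return;
  }
  // No binding: fall back to a structured console line.
  console.log(
    JSON.stringify({
      metric: name,
      value,
      labels,
      timestamp: new Date().toISOString(),
    }),
  );
}

// Exercise the binding path with an in-memory stub.
const points: unknown[] = [];
const stubEnv = {
  NNO_METRICS: {
    writeDataPoint: (point: unknown): void => {
      points.push(point);
    },
  },
};
recordMetricSketch("gateway.request.count", 1, { route: "/platforms" }, stubEnv);
```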

Phase 2 additions to audit infrastructure: The Registry audit_log table gains dedicated columns actor_email TEXT, ip_address TEXT, user_agent TEXT for queryability. Historical rows will have NULL in these columns; the existing metadata JSON column already captures this data for pre-Phase 2 entries. The platform_lifecycle_events table (Phase 2) provides a dedicated lifecycle audit trail separate from the general audit_log — it records every platform status transition with actor, trigger type, and reason.

1.5 What Each Service Logs

NNO Registry

| Event | Level | Key fields |
|---|---|---|
| Resource created/updated/deleted | info | resourceType, resourceId, platformId |
| Manifest fetched | debug | platformId, entityId, featureCount |
| Audit log write | debug | action, actorId |
| Query timeout (>500ms) | warn | query, durationMs |
| Internal error | error | error.stack |

NNO Provisioning

| Event | Level | Key fields |
|---|---|---|
| Job created | info | jobId, operation, platformId |
| Step started/completed | info | jobId, step, durationMs |
| Step failed | error | jobId, step, error, willRollback |
| Rollback started/completed | warn | jobId, stepsToRollback |
| CF API rate limit hit | warn | endpoint, retryAfterMs |
| CF API call | debug | method, endpoint, status, durationMs |

NNO CLI Service

| Event | Level | Key fields |
|---|---|---|
| Repo created | info | platformId, repoUrl |
| Feature config committed | info | platformId, featureId, commitSha |
| CF Pages build triggered | info | platformId, buildId |
| GitHub API error | error | endpoint, status, error |

Platform Auth Workers

| Event | Level | Key fields |
|---|---|---|
| Login success/failure | info | platformId, userId, method, result |
| Session created/invalidated | info | platformId, userId, sessionId |
| Permission denied | warn | platformId, userId, permission |
| 2FA triggered | info | platformId, userId, method |

Auth events are also written to the auth D1 audit_authentication and audit_authorization tables (90-day retention) by the existing audit middleware — the structured log provides real-time streaming; D1 provides queryable history.

Platform Feature Workers

| Event | Level | Key fields |
|---|---|---|
| Request handled | info | platformId, entityId, featureId, status, durationMs |
| Auth validation failure | warn | platformId, featureId, reason |
| D1 query slow (>200ms) | warn | featureId, query, durationMs |
| Unhandled error | error | featureId, error.stack |

2. Cloudflare Logpush [Phase 2]

Cloudflare Logpush streams real-time Worker logs (including console.log output and HTTP request fields) to a configurable destination.

2.1 Logpush Destinations

| Audience | Destination | Retention |
|---|---|---|
| NNO internal (all services) | R2 bucket nno-logs-internal | 90 days |
| Per-platform logs (for admins) | R2 bucket nno-logs-{platformId} | 30 days |
| Security/SIEM (optional) | Datadog / Splunk / custom HTTPS endpoint | Per SIEM policy |

2.2 NNO Internal Logpush Configuration

One Logpush job per NNO core service, configured via Cloudflare API at provisioning time:

// Called once during NNO core service deployment
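async function createLogpushJob(
  accountId: string,
  cfApiToken: string,
  workerName: string,
  r2BucketName: string,
): Promise<void> {
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/logpush/jobs`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${cfApiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        name: `logpush-${workerName}`,
        destination_conf: `r2://${r2BucketName}/{DATE}/{HOUR}/{filename}`,
        dataset: "workers_trace_events",
        // The Logpush API expects the filter as a JSON-encoded string,
        // and each condition names its comparison under "operator".
        filter: JSON.stringify({
          where: {
            key: "ScriptName",
            operator: "eq",
            value: workerName,
          },
        }),
        logpull_options:
          "fields=Event,EventTimestampMs,Outcome,Logs,ScriptName,Ray",
        enabled: true,
      }),
    },
  );

  // Surface API failures instead of silently ignoring them.
  if (!response.ok) {
    throw new Error(
      `Logpush job creation failed for ${workerName}: ${response.status} ${await response.text()}`,
    );
  }
}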

2.3 Platform Logpush Configuration

NNO Provisioning creates a Logpush job for each platform's Workers during platform provisioning. All Workers belonging to a platform (auth, feature Workers) are collected into one per-platform R2 bucket:

nno-logs-{platformId}/
├── 2026-02-22/
│   ├── 00/
│   │   └── {platformId}-auth-prod-{uuid}.json.gz
│   ├── 01/
│   │   └── {platformId}-analytics-prod-{uuid}.json.gz
│   └── ...
└── 2026-02-23/
    └── ...

Platform admins can download log files directly from R2 via a signed URL generated by the NNO Portal. In Phase 2, a basic log search UI is provided in the Portal.
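
Given the date/hour layout above, a Portal backend could enumerate the prefixes covering a requested time range before listing or signing objects. The sketch below is one way to do that; hourPrefixes is a hypothetical helper, and it assumes the date and hour path segments are in UTC with zero-padded two-digit hours, as the tree suggests.

```typescript
// Returns the "YYYY-MM-DD/HH/" prefixes covering [start, end], in UTC.
function hourPrefixes(start: Date, end: Date): string[] {
  const prefixes: string[] = [];
  // Truncate the start to the top of its hour.
  const cursor = new Date(start.getTime());
  cursor.setUTCMinutes(0, 0, 0);
  while (cursor.getTime() <= end.getTime()) {
    const iso = cursor.toISOString(); // e.g. "2026-02-22T00:00:00.000Z"
    prefixes.push(`${iso.slice(0, 10)}/${iso.slice(11, 13)}/`);
    cursor.setUTCHours(cursor.getUTCHours() + 1); // rolls over days correctly
  }
  return prefixes;
}

const prefixes = hourPrefixes(
  new Date("2026-02-22T23:15:00Z"),
  new Date("2026-02-23T01:05:00Z"),
);
// prefixes: ["2026-02-22/23/", "2026-02-23/00/", "2026-02-23/01/"]
```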

2.4 Logpush Record Format

Each Logpush record from Cloudflare Workers contains:

{
  "ScriptName": "k3m9p2xw7q-r8n4t6y1z5-analytics-prod",
  "Ray": "8b2e4f1a2b3c4d5e",
  "Outcome": "ok",
  "EventTimestampMs": 1740220440123,
  "Logs": [
    {
      "Level": "log",
      "Message": [
        "{\"timestamp\":\"2025-02-22T10:34:00.100Z\",\"level\":\"info\",\"service\":\"analytics\",\"traceId\":\"abc\",\"message\":\"GET /api/data 200\",\"status\":200,\"durationMs\":45}"
      ],
      "TimestampMs": 1740220440100
    }
  ]
}

The Logs[*].Message[0] field contains the JSON-stringified LogEntry emitted by the Worker's logger. NNO log tooling parses this to extract structured fields.
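
A minimal sketch of that parsing step is shown below. The names LogpushRecord, LogpushLine, and extractEntries are hypothetical, not the actual NNO tooling; attaching the record's Ray ID to each entry is an assumption added here to make correlation easier. Lines that are not JSON objects (for example plain console output) are skipped rather than treated as errors.

```typescript
interface LogpushLine {
  Level: string;
  Message: unknown[];
  TimestampMs: number;
}

interface LogpushRecord {
  ScriptName: string;
  Ray?: string;
  Outcome: string;
  EventTimestampMs: number;
  Logs: LogpushLine[];
}

// Extracts structured entries, skipping lines that are not JSON objects.
function extractEntries(record: LogpushRecord): Record<string, unknown>[] {
  const entries: Record<string, unknown>[] = [];
  for (const line of record.Logs) {
    const raw = line.Message[0];
    if (typeof raw !== "string") continue;
    try {
      const parsed: unknown = JSON.parse(raw);
      if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
        entries.push({ ...(parsed as Record<string, unknown>), ray: record.Ray });
      }
    } catch {
      // Plain-text console output (not emitted by the Logger) is ignored.
    }
  }
  return entries;
}

const sample: LogpushRecord = {
  ScriptName: "k3m9p2xw7q-r8n4t6y1z5-analytics-prod",
  Ray: "8b2e4f1a2b3c4d5e",
  Outcome: "ok",
  EventTimestampMs: 1740220440123,
  Logs: [
    {
      Level: "log",
      Message: [JSON.stringify({ traceId: "abc", service: "analytics", level: "info" })],
      TimestampMs: 1740220440100,
    },
    { Level: "log", Message: ["plain text line"], TimestampMs: 1740220440101 },
  ],
};

// Only the JSON line survives; the plain-text line is dropped.
const entries = extractEntries(sample);
```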


3. Cloudflare Analytics Engine (Metrics) [Phase 2]

3.1 CFAE Datasets

One Analytics Engine dataset per logical domain. Dataset names follow the NNO naming convention:

| Dataset | Workers that write to it | Key measurements |
|---|---|---|
| nno-core-ops | Registry, Provisioning, CLI Service, Stack Registry, IAM | Operation counts, durations, error rates per NNO service |
| {platformId}-usage | All Workers for a given platform | Per-feature invocation counts (reused from billing metering) |
| {platformId}-perf | All Workers for a given platform | Response latency percentiles per feature + endpoint |
| nno-auth-events | All Auth Workers across all platforms | Login events, session counts (anonymised), 2FA usage |
| nno-builds | NNO CLI Service | CF Pages build outcomes, durations |

3.2 Data Point Schema

nno-core-ops

analytics.writeDataPoint({
  blobs: [
    service, // blob1: e.g. 'registry'
    operation, // blob2: e.g. 'GET /platforms'
    outcome, // blob3: 'success' | 'error' | 'timeout'
    platformId, // blob4: which platform the operation was for (or 'nno-internal')
  ],
  doubles: [
    1, // double1: request count (always 1 per data point)
    durationMs, // double2: response time in ms
    isError ? 1 : 0, // double3: error flag
  ],
  indexes: [service],
});

{platformId}-perf

analytics.writeDataPoint({
  blobs: [
    featureId, // blob1: e.g. 'analytics'
    endpoint, // blob2: e.g. 'GET /api/data'
    String(status), // blob3: HTTP status code
    entityId, // blob4: tenant
  ],
  doubles: [1, durationMs, status >= 500 ? 1 : 0],
  indexes: [featureId],
});

3.3 Querying CFAE

NNO Portal queries CFAE via the Analytics Engine SQL API:

// Error rate for all features on a platform (last 24h)
const featureErrorRateSql = `
  SELECT
    blob1                          AS feature_id,
    SUM(double1)                   AS total_requests,
    SUM(double3)                   AS error_count,
    AVG(double2)                   AS avg_duration_ms,
    quantileWeighted(0.95)(double2, double1) AS p95_duration_ms
  FROM   ${platformId}_perf
  WHERE  timestamp >= now() - INTERVAL '24' HOUR
  GROUP BY blob1
  ORDER BY total_requests DESC
`;

// Provisioning job error rate (last 7 days); success rate is 100 - error_pct
const provisioningErrorRateSql = `
  SELECT
    blob2        AS operation,
    SUM(double1) AS total,
    SUM(double3) AS errors,
    ROUND(100.0 * SUM(double3) / SUM(double1), 2) AS error_pct
  FROM   nno_core_ops
  WHERE  blob1 = 'provisioning'
    AND  timestamp >= now() - INTERVAL '7' DAY
  GROUP BY blob2
`;
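
As a rough sketch of how such a query might be sent, the helper below posts a SQL string to the Analytics Engine SQL endpoint of the Cloudflare API. The names queryAnalyticsEngine, perfTableName, and FetchLike are hypothetical, the platform ID pattern (lowercase alphanumerics) is an assumption inferred from the example IDs in this document, and the response shape (rows under a data key) should be checked against the current Cloudflare documentation. Validating the platform ID before interpolating it into the table name keeps untrusted input out of the SQL text.

```typescript
// Minimal fetch shape so the sketch can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{
  ok: boolean;
  status: number;
  text(): Promise<string>;
  json(): Promise<unknown>;
}>;

// Assumption: platform IDs are lowercase alphanumerics (e.g. "k3m9p2xw7q").
const PLATFORM_ID_PATTERN = /^[a-z0-9]+$/;

// Builds the per-platform table name, rejecting anything that could alter the SQL.
function perfTableName(platformId: string): string {
  if (!PLATFORM_ID_PATTERN.test(platformId)) {
    throw new Error(`Invalid platformId: ${platformId}`);
  }
  return `${platformId}_perf`;
}

async function queryAnalyticsEngine(
  fetchImpl: FetchLike,
  accountId: string,
  apiToken: string,
  sql: string,
): Promise<Record<string, unknown>[]> {
  const response = await fetchImpl(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/analytics_engine/sql`,
    {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: sql,
    },
  );
  if (!response.ok) {
    throw new Error(
      `Analytics Engine query failed: ${response.status} ${await response.text()}`,
    );
  }
  // Assumed response shape: a JSON object with the result rows under `data`.
  const body = (await response.json()) as { data?: Record<string, unknown>[] };
  return body.data ?? [];
}
```

Inside a Worker, the global fetch can be passed as fetchImpl; taking it as a parameter only makes the helper easy to stub in tests.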

4. Distributed Tracing [Phase 1]

Cloudflare Workers do not support OpenTelemetry natively (no spans, no trace context propagation built-in). NNO implements a lightweight tracing model using HTTP headers and CFAE span events.

4.1 Trace ID Propagation

An x-trace-id header is generated at the NNO Gateway and propagated through every downstream service call:

Client request
    → NNO Gateway         x-trace-id: <uuid>  x-request-id: <uuid>  (generated here via crypto.randomUUID())
    → NNO Registry        x-trace-id: <uuid>  x-request-id: <uuid>  (forwarded via withTraceHeaders)
    → NNO Provisioning    x-trace-id: <uuid>  x-request-id: <uuid>  (forwarded)
    → CF API call         (external — trace stops)

If a request already carries x-trace-id (e.g., from the NNO CLI), it is preserved and used throughout.

// packages/logger/src/trace.ts

export function initTrace(request: Request): {
  traceId: string;
  requestId: string;
} {
  const traceId = request.headers.get("x-trace-id") ?? crypto.randomUUID();
  const requestId =
    request.headers.get("x-request-id") ?? crypto.randomUUID();
  return { traceId, requestId };
}

export function withTraceHeaders(
  headers: Headers | [string, string][] | Record<string, string> | undefined,
  traceId: string,
  requestId: string,
): Headers {
  const result = new Headers(headers);
  result.set("x-trace-id", traceId);
  result.set("x-request-id", requestId);
  return result;
}

Key differences from the previous documentation:

  • No global TRACE_STORE — there is no module-level Map. Trace context is passed explicitly via function arguments and Hono context, not stored globally.
  • No currentTrace() — this function does not exist. Services access traceId/requestId from the Hono context (c.get("traceId"), c.get("requestId")) or pass them explicitly.
  • initTrace uses crypto.randomUUID() (Web Crypto API, available in all CF Workers), not nanoid. IDs have no tr_/req_ prefix.
  • initTrace reads both x-trace-id and x-request-id headers from the incoming request, falling back to crypto.randomUUID() for each.
  • withTraceHeaders requires explicit traceId and requestId arguments — it does not read from a global store. It sets both x-trace-id and x-request-id on the outgoing headers.
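
The sketch below shows how the two functions might be combined when one service calls another. initTrace and withTraceHeaders are repeated in compact form only so the example is self-contained; downstreamInit is a hypothetical helper, and the URL and header values are made up. In a handler behind requestLogger, the IDs would normally come from c.get("traceId") and c.get("requestId") rather than from a second initTrace call.

```typescript
// Compact copies of the functions from trace.ts, repeated for self-containment.
function initTrace(request: Request): { traceId: string; requestId: string } {
  const traceId = request.headers.get("x-trace-id") ?? crypto.randomUUID();
  const requestId = request.headers.get("x-request-id") ?? crypto.randomUUID();
  return { traceId, requestId };
}

function withTraceHeaders(
  headers: Headers | [string, string][] | Record<string, string> | undefined,
  traceId: string,
  requestId: string,
): Headers {
  const result = new Headers(headers);
  result.set("x-trace-id", traceId);
  result.set("x-request-id", requestId);
  return result;
}

// Hypothetical helper: builds the init for a downstream call while
// preserving the caller's own headers.
function downstreamInit(
  traceId: string,
  requestId: string,
  init: { method?: string; headers?: Record<string, string>; body?: string } = {},
): { method?: string; body?: string; headers: Headers } {
  return { ...init, headers: withTraceHeaders(init.headers, traceId, requestId) };
}

// An incoming request that already carries both IDs (for example from the CLI).
const incoming = new Request("https://gateway.example/platforms", {
  headers: {
    "x-trace-id": "11111111-2222-4333-8444-555555555555",
    "x-request-id": "66666666-7777-4888-9999-000000000000",
  },
});

const ids = initTrace(incoming);
const outbound = downstreamInit(ids.traceId, ids.requestId, {
  method: "GET",
  headers: { accept: "application/json" },
});
// `outbound.headers` now carries both trace headers plus the original accept header.
```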

4.2 CFAE Span Events [Phase 2]

Not yet implemented. The spans.ts file does not exist in packages/logger. The emitSpan function and CFAE span dataset are planned for Phase 2 alongside the broader Analytics Engine integration. The design below is the target specification.

For operations spanning multiple async steps (provisioning jobs, stack activation pipeline), NNO will emit span events to CFAE:

// packages/logger/src/spans.ts  (Phase 2 — not yet implemented)

export function emitSpan(
  analytics: AnalyticsEngineDataset,
  span: {
    traceId: string;
    spanId: string;
    parentSpanId?: string;
    service: string;
    operation: string;
    startMs: number;
    endMs: number;
    outcome: "ok" | "error";
    platformId?: string;
  },
): void {
  analytics.writeDataPoint({
    blobs: [
      span.traceId,
      span.spanId,
      span.parentSpanId ?? "",
      span.service,
      span.operation,
      span.outcome,
      span.platformId ?? "nno-internal",
    ],
    doubles: [
      span.endMs - span.startMs, // duration in ms
      span.outcome === "error" ? 1 : 0,
    ],
    indexes: [span.traceId],
  });
}

With indexes: [traceId], all spans for a given trace will be fetchable efficiently:

-- All spans for a given trace (Phase 2)
SELECT blob4 AS service, blob5 AS operation,
       double1 AS duration_ms, blob6 AS outcome,
       blob2 AS span_id, blob3 AS parent_span_id
FROM   nno_core_ops_spans
WHERE  index1 = '0b7e2f6a-4c1d-4e8a-9f3b-2d5a6c7e8f90'
ORDER BY timestamp ASC

4.3 Trace Correlation with Logs

Because every log entry includes traceId, a single trace ID allows correlation of:

  • All log lines across all NNO services that handled the request
  • All CFAE span events for the operation
  • The Logpush records for that specific Cloudflare Ray ID

This gives a complete picture of a single user action across the entire stack without a dedicated tracing backend.


5. Key Metrics & SLOs [Phase 2]

5.1 NNO Core Service SLOs

| Service | Metric | Target | Alert at |
|---|---|---|---|
| NNO Gateway | Request error rate (5xx) | < 0.1% | > 1% |
| NNO Gateway | p99 latency | < 500ms | > 1000ms |
| NNO Registry | Read p99 latency | < 100ms | > 300ms |
| NNO Registry | Write p99 latency | < 200ms | > 500ms |
| NNO Provisioning | Job success rate | > 99% | < 97% |
| NNO Provisioning | PROVISION_PLATFORM duration | < 120s | > 300s |
| NNO Provisioning | ACTIVATE_FEATURE duration | < 60s | > 180s |
| NNO CLI Service | Feature activation commit time | < 10s | > 30s |
| Stack Registry | Template publish p95 | < 5s | > 15s |
| Stack Registry | Version validation duration | < 60s | > 300s |

5.2 Platform Worker SLOs (per platform)

| Metric | Target | Notes |
|---|---|---|
| Auth Worker p95 latency | < 200ms | Login, session validation |
| Feature Worker p95 latency | < 300ms | Per activated feature |
| Feature Worker error rate | < 0.5% | 5xx responses |
| CF Pages build success rate | > 98% | Platform shell rebuild |
| CF Pages build duration | < 120s | Typical pnpm install + build |

5.3 Platform Shell SLOs (client-facing)

| Metric | Target | Notes |
|---|---|---|
| Shell TTFB (CF Pages CDN) | < 50ms | Static asset, fully cached |
| Auth session check latency | < 100ms | Cookie cache hit |
| Feature registry init time | < 50ms | Static imports, no network |
| Remote manifest fetch (Phase 2) | < 30ms | CDN-cached KV read |

6. Alerting [Phase 2]

Alerts are sent via email (NNO email Worker) and optionally to a Slack webhook or PagerDuty.

6.1 Alert Configuration

// Stored in NNO Registry — per-platform alert config
interface AlertConfig {
  platformId: string;
  email: string; // platform admin email
  slackWebhookUrl?: string;
  pagerdutyKey?: string;
  alerts: {
    errorRateThreshold: number; // e.g. 0.05 = 5%
    latencyP99ThresholdMs: number; // e.g. 1000
    buildFailureAlert: boolean;
    provisioningFailureAlert: boolean;
    usageAlerts: boolean; // from billing metering
  };
}

6.2 Alert Categories

| Category | Trigger | Severity | Default recipients |
|---|---|---|---|
| Platform Worker error spike | Feature Worker 5xx rate > 5% in 5-min window | High | Platform admin |
| Auth Worker down | Auth Worker returning 0 requests for 2 min | Critical | Platform admin + NNO ops |
| CF Pages build failure | Build exits non-zero | Medium | Platform admin |
| Provisioning job failed | Job reaches FAILED state | High | NNO ops team |
| Provisioning job timeout | Job running > 10 min | Medium | NNO ops team |
| Registry D1 latency | p99 read > 300ms for 5 min | High | NNO ops team |
| Stack Registry validation failure rate | > 20% of publish attempts in 1 hour | Low | NNO ops team |
| Usage threshold | 50% / 75% / 90% / 100% of tier limit | Info → Critical | Platform admin |
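
Two of these triggers reduce to simple arithmetic over a window of counts, which can be sketched as below. The function and type names are hypothetical, the strict "greater than" comparison follows the "> 5%" wording in the table, and treating a window with no traffic as "not exceeded" is an assumption (the separate "Auth Worker down" category covers the zero-request case).

```typescript
interface WindowStats {
  totalRequests: number;
  errorCount: number;
}

// True when the window's error rate is strictly above the configured threshold.
function errorRateExceeded(stats: WindowStats, errorRateThreshold: number): boolean {
  if (stats.totalRequests === 0) return false; // no traffic, so no rate to compare
  return stats.errorCount / stats.totalRequests > errorRateThreshold;
}

// Usage thresholds from the table, as fractions of the tier limit.
const USAGE_THRESHOLDS = [0.5, 0.75, 0.9, 1.0];

// Returns the highest threshold reached, or null if usage is below 50%.
function highestUsageThreshold(used: number, limit: number): number | null {
  if (limit <= 0) return null;
  const ratio = used / limit;
  let reached: number | null = null;
  for (const threshold of USAGE_THRESHOLDS) {
    if (ratio >= threshold) reached = threshold;
  }
  return reached;
}

// 124 errors in 1000 requests is 12.4%, above a 5% threshold.
const spike = errorRateExceeded({ totalRequests: 1000, errorCount: 124 }, 0.05);
// Exactly 5% does not exceed a 5% threshold.
const atThreshold = errorRateExceeded({ totalRequests: 1000, errorCount: 50 }, 0.05);
// 80% of the limit has passed the 75% threshold but not the 90% one.
const usageLevel = highestUsageThreshold(80, 100);
```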

6.3 Alert Message Format

[NNO ALERT] Platform k3m9p2xw7q — analytics Worker error rate critical

Platform:  AcmeCorp (k3m9p2xw7q)
Feature:   analytics
Alert:     Worker error rate exceeded threshold
Threshold: 5%
Current:   12.4% (over last 5 minutes)
Time:      2026-02-22 10:34:00 UTC

Affected endpoints:
  POST /api/data/export — 45% error rate
  GET  /api/report      — 8% error rate

Action: Review logs at https://portal.nno.app/platforms/k3m9p2xw7q/observability
Trace a recent error: https://portal.nno.app/platforms/k3m9p2xw7q/logs?traceId=0b7e2f6a-4c1d-4e8a-9f3b-2d5a6c7e8f90


NNO Platform Monitoring

7. Platform Admin Dashboard (NNO Portal) [Phase 2]

Platform admins access observability via NNO Portal → Observability (/platforms/{id}/observability).

7.1 Overview Tab

  • Health status — Traffic light per deployed Worker (auth, each feature) based on real-time error rate
  • Request volume — Chart: total requests/hour across all platform Workers (last 24h, Recharts)
  • Error rate — Chart: 5xx error rate per feature (last 24h)
  • Active builds — CF Pages build status card with link to CF dashboard

7.2 Feature Performance Tab

Per-feature breakdown:

  • Request rate (req/min), error rate (%), p50/p95/p99 latency
  • Top 10 slowest endpoints
  • Top 10 most-erroring endpoints

Data source: GET /api/observability/features?platformId={id}&range=24h, backed by a CFAE query on the {platformId}-perf dataset.

7.3 Logs Tab

  • Search by: time range, feature, log level, trace ID, user ID, free-text
  • Log viewer — Paginated list of structured log entries, expandable to show full JSON
  • Download — Export log files as .json.gz from R2 (signed URL, 15-min expiry)

Data source: Logpush files in the nno-logs-{platformId} R2 bucket, read via the NNO Portal backend.

Phase 1: Download only. Phase 2: Real-time search via a lightweight log indexing Worker.

7.4 Audit Log Tab

  • All auth events (logins, logouts, org changes) from audit_authentication table
  • All authorization decisions from audit_authorization table
  • Filterable by event type, user, result (success/failure)
  • 90-day retention, paginated

Data source: GET /api/auth/admin/audit (auth Worker endpoint, platform-admin only)


8. NNO Operator Dashboard (Internal) [Phase 2]

NNO operators access a richer view via the NNO Portal's internal tools section (/internal/observability).

8.1 Cross-Platform Health

  • Fleet overview — One row per active platform: Worker count, aggregate error rate, last build status, subscription tier
  • Incident heatmap — Time × platform grid; cells coloured by error rate

8.2 Provisioning Monitor

  • Active provisioning jobs with live step progress
  • Failed jobs with rollback status and error details
  • Historical job completion times (p50/p95 per operation type)
  • CF API quota usage (Workers scripts deployed today vs. 200/day limit)

8.3 Stack Registry Pipeline

  • Submission queue depth (PENDING + IN_REVIEW counts)
  • Automated validation pass/fail rate (last 7 days)
  • Average review cycle time (submission → approval)
  • Recently approved/rejected packages

8.4 Trace Explorer

  • Search by traceId across all CFAE span datasets
  • Renders a waterfall diagram of span durations across services
  • Links to Logpush records for the same trace

9. Retention Policies

| Data store | Retention | Reasoning |
|---|---|---|
| CFAE metrics (all datasets) | 90 days rolling | CF Analytics Engine limit |
| Logpush to R2 (internal) | 90 days | NNO debugging and audit |
| Logpush to R2 (per platform) | 30 days | Platform admin access |
| Auth D1 audit tables | 90 days | Compliance requirement |
| Registry D1 audit_log | 1 year | Platform provisioning audit |
| Billing usage_snapshots | 2 years | Invoice dispute resolution |
| Billing invoices | 7 years | Legal/accounting requirement |

R2 lifecycle rules are configured at bucket creation to auto-delete objects beyond their retention window.


10. Wrangler Configuration

CFAE Binding (all NNO Workers + platform Workers)

# Added to every NNO service and every provisioned platform Worker template

[[analytics_engine_datasets]]
binding = "ANALYTICS"
dataset = "nno-core-ops"       # NNO internal services

# Platform Workers use:
# dataset = "{platformId}-usage"   (invocation metering — billing)
# dataset = "{platformId}-perf"    (latency / error rate — observability)

R2 Bindings (NNO logging Worker)

# services/nno-logging/wrangler.toml

[[r2_buckets]]
binding     = "INTERNAL_LOGS"
bucket_name = "nno-logs-internal"

# Per-platform log buckets are created dynamically at provisioning time:
# bucket_name = "nno-logs-{platformId}"

Secrets

| Secret | Description |
|---|---|
| CF_API_TOKEN | Token with Logpush:Edit permission (for job creation at provisioning) |
| CF_ACCOUNT_ID | Cloudflare account ID |
| ALERT_EMAIL | NNO ops team email for internal alerts |
| ALERT_SLACK_WEBHOOK | Slack webhook for NNO ops alerts |
| PAGERDUTY_KEY | PagerDuty routing key for critical alerts |

11. Implementation Phases

Phase 1 Current State

The following observability infrastructure is built and deployed:

| Component | Status | Details |
|---|---|---|
| packages/logger Logger class | ✅ Live | Structured JSON log emission via console.log |
| packages/logger requestLogger middleware | ✅ Live | Hono middleware logging every HTTP request |
| packages/logger x-trace-id propagation | ✅ Live | Trace ID generated at gateway, forwarded to downstream services |
| packages/logger recordMetric helper | ✅ Live | Writes to CF Analytics Engine nno_metrics dataset (console fallback in dev) |
| Deployed in IAM | ✅ Live | services/iam uses Logger + requestLogger |
| Deployed in Registry | ✅ Live | services/registry uses Logger + requestLogger |
| Deployed in Billing | ✅ Live | services/billing uses Logger + requestLogger |
| Deployed in Provisioning | ✅ Live | services/provisioning uses Logger + requestLogger |
| Deployed in CLI Service | ✅ Live | services/cli uses Logger + requestLogger |
| Deployed in Gateway | ✅ Live | services/gateway has requestIdMiddleware + tracingMiddleware (via @neutrino-io/core/tracing) |

The following are not yet implemented (Phase 2):

  • Cloudflare Logpush jobs and R2 log destinations
  • Cloudflare Analytics Engine (CFAE) dataset bindings and metric writes
  • SLO definitions and alert configuration
  • Platform admin observability dashboard (NNO Portal)
  • NNO operator internal dashboard

Implementation delta: Observability Phase 1 Plan.


Status: Partially implemented. Phase 1 structured logging is deployed; Phase 2 Logpush, CFAE, and dashboards are planned.
Implementation target: packages/logger/ · services/logging/ · apps/console/
Related: System Architecture · NNO Provisioning · NNO Registry · NNO Billing & Metering · NNO Auth Model
