
NNO Provisioning Service

Documentation for NNO Provisioning Service

Date: 2026-03-30 · Status: Detailed Design · Parent: System Architecture · Service: services/provisioning


Overview

The NNO Provisioning Service is responsible for creating, configuring, and deleting Cloudflare infrastructure resources on behalf of client platforms. It is the only NNO service that calls the Cloudflare API directly.

NNO follows a lazy, on-demand provisioning model: platforms receive a minimal resource footprint on signup (auth Worker + registry record + billing customer), and additional Cloudflare resources (Pages projects, D1 databases, R2 buckets, KV namespaces) are created only when a feature that requires them is activated. This mirrors the model of Supabase or Firebase — a project exists instantly, and capabilities are enabled as needed.

Every provisioning operation is:

  • Queue-driven — POST /provision/* creates a job record (PENDING), enqueues it to the Cloudflare Queue, and returns 202 Accepted. The queue consumer executes the job asynchronously, making real Cloudflare API calls. Callers poll GET /provision/jobs/:jobId to track progress.
  • Tracked — recorded as a provision_job in the Provisioning D1 before any steps run
  • Idempotent — steps check whether the target resource already exists before calling the CF API; already-created resources are detected and skipped
  • Transactional in intent (Phase 2) — failed jobs will trigger rollback of all completed steps
  • Audited — every state change is written to structured logs

Phase 1 vs Phase 2: Phase 1 queue-based async execution is live — real Cloudflare API calls are made for every implemented job type. The detailed designs in §1.2–§1.4 and §6 (Rollback) describe Phase 2 targets (typed CF client class, rollback traversal). Phase 2 will also add DLQ alerting and queue event wiring from IAM/Registry (§7).

┌─── Trigger paths ────────────────────────────────────────────────┐
│                                                                   │
│  1. Registration (automatic)                                      │
│     services/iam sign-up ──► platform.registered (CF Queue)      │
│                                                                   │
│  2. Feature activation (self-service)                             │
│     services/registry PATCH /features ──► feature.activating     │
│                              (CF Queue)                           │
│                                                                   │
│  3. Operator override (manual)                                    │
│     NNO Zero UI ──► services/gateway ──► POST /provision/*       │
│                                                                   │
└──────────────────────────────┬────────────────────────────────────┘


                  NNO Provisioning Service
                  (CF Queue consumer)

               ┌───────────────┼───────────────┐
               ▼               ▼               ▼
         Cloudflare API   NNO Registry    services/billing
         (CF Workers,     (platform,      (Stripe customer
          D1, R2, KV,      resources,      creation)
          Pages)           feature status)

0.5 Phase 1 Implementation Detail [Phase 1]

Implementation flow detail: Provisioning Phase 1 Plan.

Implemented Job Types

All ten job types listed below are wired and make real CF API calls:

  • BOOTSTRAP_PLATFORM (executors/provision-platform.ts): D1 create, Worker deploy, secrets, migrations, Pages project + build
  • ACTIVATE_FEATURE (executors/activate-feature.ts): D1 create (conditional), Worker deploy (conditional), secrets, migrations
  • DEACTIVATE_FEATURE (executors/deactivate-feature.ts): Worker delete only — D1 is never deleted
  • PROVISION_STACK (executors/provision-stack.ts): shared resources + per-feature sub-jobs
  • DEACTIVATE_STACK (executors/deactivate-stack.ts): enqueues DEACTIVATE_FEATURE sub-jobs; deletes shared resources only if deleteData: true
  • DEPROVISION_PLATFORM (executors/deprovision-platform.ts): full teardown
  • ONBOARD_PLATFORM (executors/onboard-platform.ts): no direct CF API calls — orchestrates Registry + Billing and enqueues BOOTSTRAP_PLATFORM
  • UPGRADE_AUTH_WORKER (executors/upgrade-auth-worker.ts): fetches the latest auth bundle from NNO_AUTH_BUNDLE_URL, re-deploys the auth Worker (preserving the D1 binding), refreshes the CORS_ORIGINS secret
  • CREATE_APP (executors/create-app.ts): creates a CF Pages project and/or stub Worker for a new app/service within an existing workspace stack; optionally creates a D1 database; registers DNS hostnames and resources in Registry
  • ADD_CUSTOM_DOMAIN (executors/add-custom-domain.ts): adds a CF4SaaS custom hostname to an existing DNS-registered resource; requires CF_ZONE_ID; registers the custom domain record in Registry

ONBOARD_PLATFORM (Implemented) — outer onboarding job that orchestrates pre-provisioning steps before triggering platform resource creation. Steps: (1) create platform + entity records in Registry, (2) create Stripe customer + subscription in Billing, (3) enqueue BOOTSTRAP_PLATFORM job for CF resource creation, (4) update onboarding_sessions checklist as each step completes. Triggered by the self-serve onboarding endpoint in Registry. Wraps but does not replace BOOTSTRAP_PLATFORM.



0. Provisioning Triggers [Phase 1]

Provisioning is initiated by three paths, all converging on the same job queue:

0.1 Registration Trigger (Automatic)

When a new client registers, services/iam emits a platform.registered event to a Cloudflare Queue. The provisioning consumer picks this up and runs BOOTSTRAP_PLATFORM — creating only the minimum viable resources needed before the client can log in:

Client sign-up


services/iam
  ├── Create user record in IAM D1
  ├── Create platform record in Registry
  └── Enqueue: platform.registered → CF Queue


         services/provisioning (queue consumer)
           ├── Deploy auth Worker        [always]
           ├── Create auth D1 + migrate  [always]
           └── Create Stripe customer    [always]

No CF Pages project, no feature D1 databases, no R2 buckets — those are created on-demand when features are activated.

0.2 Feature Activation Trigger (Self-Service)

When a client activates a feature from their console, services/registry updates the feature record to activating and emits a feature.activating event. The provisioning consumer reads the feature's FeatureManifest.resources declaration and creates only the resources that feature requires:

Client activates "Analytics" feature


services/registry
  ├── Quota check via services/billing   ← plan gate
  ├── PATCH feature_activation status → 'activating'
  └── Enqueue: feature.activating → CF Queue


         services/provisioning (queue consumer)
           ├── Read FeatureManifest.resources
           │   { d1: true, worker: true, minimumPlan: 'growth' }
           ├── Create analytics D1 + run migrations
           ├── Deploy analytics Worker
           └── PATCH feature_activation status → 'active'

See Section 2.2 for the full ACTIVATE_FEATURE step table.
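As a sketch of the conditional creation logic, the consumer can derive an ordered step plan from the manifest's resource flags. This is illustrative only — the interface mirrors FeatureResourceRequirements (§2.2), but resourcePlan is a hypothetical helper, not the executor's real code:

```typescript
// Illustrative: derive an ordered creation plan from a feature's declared
// resource requirements. Field names mirror FeatureResourceRequirements (§2.2);
// the resourcePlan helper itself is hypothetical.
interface FeatureResourceRequirements {
  worker?: boolean;
  pages?: boolean;
  d1?: boolean;
  r2?: boolean;
  kv?: boolean;
  queue?: boolean;
  minimumPlan?: "starter" | "growth" | "scale";
}

function resourcePlan(r: FeatureResourceRequirements): string[] {
  const plan: string[] = [];
  if (r.d1) plan.push("create_d1", "run_migrations");
  if (r.r2) plan.push("create_r2");
  if (r.kv) plan.push("create_kv");
  if (r.worker) plan.push("deploy_worker", "set_secrets");
  if (r.pages) plan.push("create_pages_project");
  return plan;
}
```

For the Analytics example above ({ d1: true, worker: true }), the plan is create_d1, run_migrations, deploy_worker, set_secrets — nothing else is touched.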

0.3 Operator Override (Manual)

The NNO operator can provision, deprovision, or force-activate resources for any platform via the operator portal or directly via the provisioning API. Like all other trigger paths, this creates a provision_job record (PENDING) and enqueues it to the Cloudflare Queue — the same queue consumer (handleQueueBatch) executes it asynchronously. The HTTP endpoint returns 202 Accepted immediately; callers poll GET /provision/jobs/:jobId for progress.

The operator plane is a management/override layer — not the primary provisioning path for normal client onboarding.

0.4 Stack Activation Trigger (Self-Service)

When a platform admin activates a stack template (or creates a platform-local stack) from the portal or via CLI, services/registry validates the StackDefinition and emits a stack.activating event:

Platform admin activates "[email protected]" from portal UI


services/registry
  ├── Fetch StackDefinition from services/stack-registry (for template-based stacks)
  │   OR use inline definition (for platform-local stacks)
  ├── Validate all featureIds exist in feature catalogue
  ├── Quota check via services/billing (minimumPlan gate)
  ├── Create stacks record (status: PENDING)
  └── Enqueue: stack.activating → CF Queue


         services/provisioning (queue consumer)
           ├── Fetch StackDefinition from stack-registry (step 1)
           ├── Provision shared resources (D1, R2, KV as declared)
           └── Activate each feature with shared resource bindings

Returns 202 { stackInstanceId, jobId }. The job is tracked as type PROVISION_STACK.
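The three queue-driven trigger events (§0.1, §0.2, §0.4) converge on the same consumer, which maps each event to a job type. A minimal sketch — the event and job-type names come from this document, but the payload shapes and the dispatch function are illustrative:

```typescript
// Illustrative dispatch: queue event → provision job type. Event names
// (platform.registered, feature.activating, stack.activating) and job types
// are from this document; the payload shapes are assumptions.
type QueueEvent =
  | { type: "platform.registered"; platformId: string }
  | { type: "feature.activating"; platformId: string; featureId: string }
  | { type: "stack.activating"; platformId: string; stackInstanceId: string };

function jobTypeFor(event: QueueEvent): string {
  switch (event.type) {
    case "platform.registered":
      return "BOOTSTRAP_PLATFORM"; // §0.1 — minimal bootstrap
    case "feature.activating":
      return "ACTIVATE_FEATURE"; // §0.2 — manifest-driven resources
    case "stack.activating":
      return "PROVISION_STACK"; // §0.4 — shared resources + sub-jobs
  }
}
```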


1. Cloudflare API Integration [Phase 1]

1.1 API Credentials

In Phase 1 (NNO full control), the Provisioning Service uses a single set of credentials stored as Wrangler secrets:

Secret         Description
CF_API_TOKEN   Cloudflare API token with scoped permissions (see below)
CF_ACCOUNT_ID  Cloudflare account ID
CF_ZONE_ID     Cloudflare zone ID — required by the ADD_CUSTOM_DOMAIN executor for CF4SaaS custom hostname registration

Required token permissions (scoped, not global):

Permission               Scope    Why
Workers Scripts:Edit     Account  Deploy and delete Workers
Workers Routes:Edit      Zone     Manage Worker routes
Cloudflare Pages:Edit    Account  Create and manage Pages projects
D1:Edit                  Account  Create and delete D1 databases
Workers R2 Storage:Edit  Account  Create and delete R2 buckets
Workers KV Storage:Edit  Account  Create and delete KV namespaces
Queues:Edit              Account  Create and delete Queues
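Every executor call ultimately hits the public Cloudflare v4 REST API with these credentials. A hedged sketch of one such call (D1 database creation) — the endpoint and Bearer-token header are the standard Cloudflare API surface, but the helper names and error handling are illustrative, not the Phase 2 CloudflareClient design:

```typescript
// Illustrative sketch of a scoped Cloudflare API call using CF_API_TOKEN /
// CF_ACCOUNT_ID. The endpoint is the public CF v4 API; helpers are hypothetical.
const CF_API = "https://api.cloudflare.com/client/v4";

function cfHeaders(apiToken: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiToken}`,
    "Content-Type": "application/json",
  };
}

async function createD1Database(
  env: { CF_API_TOKEN: string; CF_ACCOUNT_ID: string },
  name: string,
): Promise<string> {
  const res = await fetch(`${CF_API}/accounts/${env.CF_ACCOUNT_ID}/d1/database`, {
    method: "POST",
    headers: cfHeaders(env.CF_API_TOKEN),
    body: JSON.stringify({ name }),
  });
  if (!res.ok) throw new Error(`CF API error: ${res.status}`);
  const body = (await res.json()) as { result: { uuid: string } };
  return body.result.uuid; // D1 database ID
}
```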

1.2–1.4 CF API Client, Resource Operations, Rate Limits — Phase 2 Design

Phase 2 design. See Provisioning Phase 2 Plan.


2. Provisioning Operations [Phase 1]

2.1 BOOTSTRAP_PLATFORM (formerly PROVISION_PLATFORM)

Triggered automatically on client registration via the platform.registered queue event (§0.1), or manually by the operator via the Zero UI. Creates the minimum viable resource set only — additional resources are provisioned lazily when features are activated (§2.2).

Input:

{
  platformId: string;
  planTier: string;              // 'starter' | 'growth' | 'scale'
  billingEmail: string;          // billing contact email
  defaultEntityId?: string;      // optional pre-generated tenant ID for the default tenant
  environment?: 'dev' | 'stg' | 'prod';  // default: 'prod'
}

Steps (in order) — minimal bootstrap only:

[Updated for DNS architecture] Resource names now use stack-id (default) instead of entity-id. DNS hostnames are registered via CF4SaaS at provisioning time.

  1. Create {pid}-default-auth-db D1 (staging: {pid}-default-auth-db-stg). Rollback: delete the D1.
  2. Deploy {pid}-default-auth Worker (staging: {pid}-default-auth-stg). Rollback: delete the Worker.
  3. Set secrets on the auth Worker. Rollback: none (secrets are deleted with the Worker).
  4. Run auth D1 migrations. Rollback: none (the migration is idempotent).
  5. Create the Stripe customer record. Rollback: cancel the Stripe customer.
  6. Register platform + auth resources in Registry. Rollback: mark as deleted.
  7. Register DNS hostname auth.svc.default.{pid}.nno.app via CF4SaaS. Rollback: delete the CF4SaaS custom hostname.

Not included in bootstrap: CF Pages project, feature D1 databases, R2 buckets, KV namespaces — these are created on-demand when the relevant feature is activated (see §2.2). A client's platform is considered "live" (able to log in) as soon as the auth Worker and D1 are up.

2.2 ACTIVATE_FEATURE

Triggered when a client activates a feature from their console (via the feature.activating queue event, §0.2) or when the operator force-activates a feature from the Zero UI. Creates only the Cloudflare resources declared in the feature's FeatureManifest.resources block — nothing more.

Input:

{
  platformId: string;
  entityId: string; // required — ID of the entity (tenant/sub-tenant) activating the feature
  featureId: string; // e.g. 'analytics'
  featureVersion: string; // e.g. '1.2.0'
  environment: "dev" | "stg" | "prod";
  resources: FeatureResourceRequirements; // read from FeatureManifest.resources at activation time
}

FeatureResourceRequirements (declared in each feature's FeatureManifest):

interface FeatureResourceRequirements {
  worker?: boolean; // Deploy a CF Worker for this feature's backend API
  pages?: boolean; // Create a CF Pages project (SPA portal)
  d1?: boolean; // Create a D1 SQLite database
  r2?: boolean; // Create an R2 object storage bucket
  kv?: boolean; // Create a KV namespace
  queue?: boolean; // Create a Cloudflare Queue
  minimumPlan?: "starter" | "growth" | "scale"; // Plan gate — checked before job is queued
}

Steps (conditioned on resources.*):

  1. Always: quota check via billing service.
  2. Always: mark feature_activation as 'activating'. Rollback: mark as 'failed'.
  3. resources.d1: create feature D1 database. Rollback: delete the D1.
  4. resources.d1: register D1 in Registry. Rollback: mark deleted.
  5. resources.d1: run D1 migrations. Rollback: none (idempotent).
  6. resources.r2: create R2 bucket. Rollback: delete the bucket.
  7. resources.r2: register R2 in Registry. Rollback: mark deleted.
  8. resources.kv: create KV namespace. Rollback: delete the namespace.
  9. resources.kv: register KV in Registry. Rollback: mark deleted.
  10. resources.worker: deploy feature Worker. Rollback: delete the Worker.
  11. resources.worker: set secrets on the Worker. Rollback: none (deleted with the Worker).
  12. resources.worker: register Worker in Registry. Rollback: mark deleted.
  13. resources.pages: create CF Pages project. Rollback: delete the Pages project.
  14. resources.pages: connect to the platform GitHub repo. Rollback: disconnect.
  15. resources.pages: trigger initial Pages build. (The build can fail independently.)
  16. Always: update feature_activation status to 'active'. Rollback: mark as 'failed'.
  17. Always: trigger platform shell rebuild via CLI Service. (The rebuild can fail independently.)

Plan enforcement: Before the job is enqueued, services/registry calls GET /api/v1/billing/quota/check?platformId=&resource= to verify the platform's plan permits the requested resources. A 402 response halts the activation with an upgrade prompt — no job is created.

2.3 DEACTIVATE_FEATURE

Gracefully removes a feature's resources. Data is preserved (D1 not deleted) by default.

Input:

{
  platformId: string;
  entityId: string;                         // required — ID of the entity whose feature resources are being removed
  featureId: string;
  environment: string;
  deleteData?: boolean;      // default: false — preserve D1 data
}

Steps:

  1. Mark feature_activation as 'deactivating'
  2. Delete feature Worker (stops serving traffic)
  3. Mark Worker resource as deleted in Registry
  4. If deleteData: true, delete the D1 database
  5. Mark D1 resource as deleted in Registry (or 'archived' if data is preserved)
  6. Mark feature_activation as 'inactive'
  7. Trigger platform shell rebuild via CLI Service

2.3a PROVISION_STACK

Triggered when a platform admin activates a stack template (via the stack.activating queue event, §0.4). Provisions shared CF resources first, then activates each feature in the stack with shared resource bindings injected.

Input:

{
  platformId: string;
  stackInstanceId: string; // pre-created stacks record
  stackDefinition: StackDefinition; // resolved from stack-registry or inline (local)
  environment: "dev" | "stg" | "prod";
}

Steps (in order):

  1. resolve_stack_definition (always): fetch the StackDefinition from stack-registry (template) or use the inline definition (local); validate that all featureIds exist.
  2. create_shared_d1 (resources.sharedD1): create D1 {platformId}-{stackId}-db-{env}; idempotent. Rollback: delete the D1.
  3. register_shared_d1 (resources.sharedD1): register the D1 in Registry as resource_type: 'd1', service_name: '{stackId}'. Rollback: mark deleted.
  4. create_shared_r2 (resources.sharedR2): create R2 bucket {platformId}-{stackId}-storage-{env}; idempotent. Rollback: delete the bucket.
  5. register_shared_r2 (resources.sharedR2): register the R2 in Registry. Rollback: mark deleted.
  6. create_shared_kv (resources.sharedKV): create KV namespace {platformId}-{stackId}-kv-{env}; idempotent. Rollback: delete the namespace.
  7. register_shared_kv (resources.sharedKV): register the KV in Registry. Rollback: mark deleted.
  8. deploy_stack_worker (resources.worker): deploy Worker {platformId}-{stackId}-orchestrator-{env}, binding STACK_DB, STACK_STORAGE, STACK_KV as available. Rollback: delete the Worker.
  9. activate_features (always): for each feature in features[] (in order), run an ACTIVATE_FEATURE sub-job, injecting shared bindings: STACK_DB → sharedD1.cfId, STACK_STORAGE → sharedR2.bucketName, STACK_KV → sharedKv.namespaceId, STACK_ID → stackInstanceId. A required feature's failure aborts the entire PROVISION_STACK job; an optional feature's failure logs a warning and continues. Rollback: per-feature rollback.
  10. register_dns_hostnames (non-fatal): register CF4SaaS custom hostnames for each app/service in the stack: {name}.{type}.{stackId}.{platformId}.nno.app (staging: {name}.{type}.stg.{stackId}.{platformId}.nno.app); record each hostname in the Registry dns_records table. Failure does not abort the job. Rollback: delete the CF4SaaS hostnames.
  11. register_stack_instance (non-fatal): PATCH /platforms/{platformId}/stacks/{stackInstanceId} → status: active, sharedResources: { d1Id, r2Name, kvId, workerName }. Rollback: mark FAILED.
  12. trigger_shell_rebuild (non-fatal): POST to the CLI Service with stackId context. Failure does not abort the job.

Shared resource bindings injected into each feature Worker:

// Bindings set on each feature Worker in the stack (at activation time)
{
  STACK_DB:      sharedD1?.cfId ?? undefined,
  STACK_STORAGE: sharedR2?.bucketName ?? undefined,
  STACK_KV:      sharedKv?.namespaceId ?? undefined,
  STACK_ID:      stackInstanceId,
}

Required vs optional feature failure:

  • required: true → any ACTIVATE_FEATURE sub-job failure transitions PROVISION_STACK to FAILED and triggers rollback of all completed steps
  • required: false → sub-job failure is logged as a warning; the step is marked SKIPPED; the stack instance status becomes DEGRADED instead of ACTIVE

Stack naming convention for shared resources:

D1:     {platformId}-{stackId}-db-{env}              e.g. k3m9p2xw7q-saas-starter-db-prod
R2:     {platformId}-{stackId}-storage-{env}         e.g. k3m9p2xw7q-saas-starter-storage-prod
KV:     {platformId}-{stackId}-kv-{env}              e.g. k3m9p2xw7q-saas-starter-kv-prod
Worker: {platformId}-{stackId}-orchestrator-{env}    e.g. k3m9p2xw7q-saas-starter-orchestrator-prod
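The convention above can be captured in a single helper (illustrative — the real executors may compose names differently):

```typescript
// Illustrative: shared stack resource naming, per the convention above.
type SharedKind = "db" | "storage" | "kv" | "orchestrator";
type Env = "dev" | "stg" | "prod";

function sharedResourceName(
  platformId: string,
  stackId: string,
  kind: SharedKind,
  env: Env,
): string {
  return `${platformId}-${stackId}-${kind}-${env}`;
}
```

For example, sharedResourceName("k3m9p2xw7q", "saas-starter", "db", "prod") yields k3m9p2xw7q-saas-starter-db-prod.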

See stacks.md for the full Stack architecture including shared resource prefix conventions.

2.3b Custom Domain Provisioning

When a client maps a custom domain to a platform resource, the provisioning service handles CF4SaaS registration:

Client submits custom domain (e.g. app.acmecorp.com) for dashboard.app.x7y8z9w0q1.a1b2c3d4e5.nno.app


Registry: create dns_records row (status: pending_validation)


Provisioning: POST CF4SaaS custom hostname API
  → CF returns validation TXT record details


Registry: update dns_records row with validation details
Client: add CNAME + TXT records in their DNS registrar


Provisioning: poll CF4SaaS until SSL status = active (or timeout)
  → CF validates ownership and provisions TLS certificate automatically


Registry: update dns_records row (status: active)

Custom hostname records are stored in the Registry dns_records table. See dns-naming.md for the full CF4SaaS flow and registry.md for the dns_records schema.

2.4 DEPROVISION_PLATFORM

Full teardown of a platform. Requires explicit confirmation — irreversible.

Steps: Reverse of BOOTSTRAP_PLATFORM plus all activated feature resources, in reverse dependency order.


3. State Machine [Phase 1]

Every provision_job follows this state machine:

                     ┌──────────┐
                     │  PENDING │  ← job created in Registry
                     └────┬─────┘
                          │ worker picks up job
                     ┌────▼─────┐
                     │ RUNNING  │  ← steps executing
                     └────┬─────┘
              ┌───────────┼───────────┐
              │           │           │
         ┌────▼────┐  ┌───▼───┐  ┌───▼──────┐
         │COMPLETED│  │FAILED │  │ TIMED_OUT│
         └─────────┘  └───┬───┘  └────┬─────┘
                           │           │
                    ┌──────▼───────────▼───────┐
                    │ retry_count < max_retries│
                    └──────┬───────────┬───────┘
                           │ yes       │ no
                     ┌─────▼─────┐ ┌───▼─────────┐
                     │  PENDING  │ │ ROLLING_BACK│
                     │(re-queued)│ └───┬─────────┘
                     └───────────┘     │
                                   ┌───▼─────────┐
                                   │ ROLLED_BACK │
                                   └─────────────┘
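Expressed as a transition table (an illustrative encoding of the diagram above, not code from the service):

```typescript
// Illustrative: the provision_job state machine as a transition table.
type JobStatus =
  | "PENDING" | "RUNNING" | "COMPLETED" | "FAILED"
  | "TIMED_OUT" | "ROLLING_BACK" | "ROLLED_BACK";

const TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  PENDING: ["RUNNING"],                          // worker picks up the job
  RUNNING: ["COMPLETED", "FAILED", "TIMED_OUT"], // steps executing
  FAILED: ["PENDING", "ROLLING_BACK"],           // retry vs exhausted retries
  TIMED_OUT: ["PENDING", "ROLLING_BACK"],
  ROLLING_BACK: ["ROLLED_BACK"],
  COMPLETED: [],                                 // terminal
  ROLLED_BACK: [],                               // terminal
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```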

Step Tracking

Each completed step is recorded in provision_jobs.steps as a JSON array. This enables precise rollback — only steps that completed successfully need to be reversed:

// provision_jobs.steps (after partial completion)
[
  {
    "step": 1,
    "action": "create_d1",
    "status": "completed",
    "output": { "cf_id": "fa098e4d-...", "resource_id": "res_abc123" },
    "completedAt": 1740000001000
  },
  {
    "step": 2,
    "action": "deploy_worker",
    "status": "completed",
    "output": {
      "script_name": "k3m9p2xw7q-r8n4t6y1z5-auth-prod",
      "resource_id": "res_def456"
    },
    "completedAt": 1740000008000
  },
  {
    "step": 3,
    "action": "set_secrets",
    "status": "failed",
    "error": "CF API 429: rate limited",
    "failedAt": 1740000012000
  }
]
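Because step status is recorded per step, the Phase 2 rollback engine only needs to reverse the completed entries, newest first. A sketch (the record shape mirrors the JSON above; the helper itself is illustrative):

```typescript
// Illustrative: select the steps a rollback would reverse — completed steps
// only, in reverse order. Shape mirrors provision_jobs.steps above.
interface StepRecord {
  step: number;
  action: string;
  status: "completed" | "failed" | "pending";
}

function stepsToRollback(steps: StepRecord[]): StepRecord[] {
  return steps.filter((s) => s.status === "completed").reverse();
}
```

For the partial job above, rollback would reverse deploy_worker first, then create_d1; the failed set_secrets step has nothing to undo.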

4. Retry & Backoff [Phase 1]

Retry Strategy

async function withRetry<T>(
  fn: () => Promise<T>,
  job: ProvisionJob,
  step: number,
): Promise<T> {
  const maxRetries = 3;
  let attempt = 0;

  while (attempt <= maxRetries) {
    try {
      return await fn();
    } catch (err) {
      if (err instanceof RateLimitError) {
        // Honour Cloudflare's Retry-After header
        await sleep(err.retryAfterSeconds * 1000);
        attempt++;
        continue;
      }

      if (err instanceof CloudflareApiError && err.isTransient()) {
        // 500, 502, 503, 504 — exponential backoff
        const delay = Math.min(1000 * Math.pow(2, attempt), 30_000);
        await sleep(delay);
        attempt++;
        continue;
      }

      // Non-transient error (400, 409, etc.) — fail immediately
      throw err;
    }
  }

  throw new MaxRetriesExceededError(
    `Step ${step} failed after ${maxRetries} retries`,
  );
}

Transient vs Non-Transient Errors

CF API status          Type                             Behaviour
429                    Transient                        Retry after the Retry-After header value
500 / 502 / 503 / 504  Transient                        Retry with exponential backoff (1s, 2s, 4s)
400                    Non-transient                    Fail immediately — request is malformed
403                    Non-transient                    Fail immediately — permissions misconfigured
409                    Non-transient (idempotency hit)  Resource already exists — treat as success
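The table reduces to a small classifier plus the capped backoff formula used in withRetry (an illustrative restatement — the executor's real error types differ):

```typescript
// Illustrative: CF API status → retry disposition, per the table above.
type Disposition =
  | "retry_after_header" // 429 — honour Retry-After
  | "retry_backoff"      // 5xx — exponential backoff
  | "treat_as_success"   // 409 — idempotency hit
  | "fail";              // 400, 403, everything else

function classifyCfStatus(status: number): Disposition {
  if (status === 429) return "retry_after_header";
  if ([500, 502, 503, 504].includes(status)) return "retry_backoff";
  if (status === 409) return "treat_as_success";
  return "fail";
}

// Backoff from withRetry: 1s, 2s, 4s, ... capped at 30s.
function backoffMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}
```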

5. Idempotency [Phase 1]

Every create operation checks whether the resource already exists before calling the CF API. This makes jobs safe to re-run after partial failure:

async function ensureD1Exists(
  cf: CloudflareClient,
  name: string,
): Promise<{ cfId: string; wasCreated: boolean }> {
  // Check Registry first (fastest path)
  const existing = await registry.resources.lookupByCfName(name);
  if (existing?.status === "active") {
    return { cfId: existing.cfId, wasCreated: false };
  }

  // Check CF API (in case Registry is stale)
  const cfExisting = await cf.d1.findByName(name);
  if (cfExisting) {
    return { cfId: cfExisting.uuid, wasCreated: false };
  }

  // Create
  const created = await cf.d1.create({ name });
  return { cfId: created.uuid, wasCreated: true };
}

The 409 Conflict response from the CF API is treated as a success (not an error) for create operations.


6. Rollback [Phase 2]

Phase 2 design. See Provisioning Phase 2 Plan.

Phase 2 also introduces the typed CloudflareClient class (services/provisioning/src/cf-client/index.ts) and the rollback engine (reverse-iterate completed steps, call CF delete APIs). See docs/implementation/phase-2/provisioning.md for full design.


7. Job Worker (Queue Consumer) [Phase 1]

Phase 1 (current): The Cloudflare Queue consumer (executors/queue-consumer.ts) and PROVISION_QUEUE binding are already live. Jobs are enqueued on POST /provision/* and processed asynchronously by the queue consumer.

Phase 2 design (queue event wiring from IAM/Registry). See Provisioning Phase 2 Plan.

7.1 Dead Letter Queue (DLQ) [Phase 2]

When a provisioning job exceeds its retry budget (3 attempts by default), the Cloudflare Queue moves the message to the Dead Letter Queue for operator review and alerting.

Property       Value
Queue name     nno-k3m9p2xw7q-provisioning-dlq-{env}
KV namespace   NNO_PROVISIONING_DLQ_KV — stores DLQ message metadata for operator queries
Alert trigger  DLQ depth > 0 triggers an operator alert (email/Slack)

Operator endpoints (Phase 2, mounted in services/provisioning):

GET  /api/v1/provision/dlq                  List DLQ messages (paginated)
GET  /api/v1/provision/dlq/:messageId       Get DLQ message details + failed steps
POST /api/v1/provision/dlq/:messageId/retry Re-enqueue a DLQ message for retry
POST /api/v1/provision/dlq/:messageId/dismiss Mark as resolved without retry

DLQ messages retain the original job payload and the steps array showing which step caused the final failure. Operators can inspect the error, fix the underlying issue (e.g., CF API permission), and retry without recreating the provisioning request.


8. Provisioning API [Phase 1]

The Provisioning Service exposes an internal API consumed by the NNO Gateway (not exposed to clients directly):

POST   /api/v1/provision/platform             Trigger BOOTSTRAP_PLATFORM
POST   /api/v1/provision/feature/activate     Trigger ACTIVATE_FEATURE
POST   /api/v1/provision/feature/deactivate   Trigger DEACTIVATE_FEATURE
POST   /api/v1/provision/stack/activate       Trigger PROVISION_STACK
POST   /api/v1/provision/stack/deactivate     Trigger DEACTIVATE_STACK
POST   /api/v1/provision/platform/deprovision Trigger DEPROVISION_PLATFORM (requires confirmation token)
GET    /api/v1/provision/jobs/:jobId          Get job status + step details
GET    /api/v1/provision/jobs                 List jobs for a platform (query: ?platformId=&limit=25&cursor=)

All routes are mounted at /api/v1/provision in services/provisioning/src/index.ts. The service is internal and accessed via the NNO Gateway only.

Pagination: GET /api/v1/provision/jobs uses cursor-based pagination consistent with the NNO Registry pagination standard. Response envelope: { "data": [...], "pagination": { "hasMore": bool, "nextCursor": string | null } }. Provisioning jobs are append-only and never deleted, so cursor and offset are semantically equivalent for this dataset — cursor is used for consistency across the platform.
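Draining the paginated list looks like this (a sketch — the page fetcher is injected and shown synchronous for brevity; the real call is an authenticated GET through the Gateway):

```typescript
// Illustrative: drain a cursor-paginated jobs listing using the documented
// envelope. fetchPage stands in for GET /api/v1/provision/jobs?cursor=...
interface Page<T> {
  data: T[];
  pagination: { hasMore: boolean; nextCursor: string | null };
}

function listAll<T>(fetchPage: (cursor: string | null) => Page<T>): T[] {
  const out: T[] = [];
  let cursor: string | null = null;
  for (;;) {
    const page = fetchPage(cursor);
    out.push(...page.data);
    if (!page.pagination.hasMore || page.pagination.nextCursor === null) break;
    cursor = page.pagination.nextCursor;
  }
  return out;
}
```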

POST /api/v1/provision/feature/activate request:

{
  "platformId": "k3m9p2xw7q",
  "entityId": "r8n4t6y1z5",
  "featureId": "analytics",
  "featureVersion": "1.2.0",
  "environment": "prod"
}

All POST /api/v1/provision/* responses:

202 Accepted
{ "jobId": "job_n3r8t5w2y6", "status": "PENDING" }

The job is enqueued and processed asynchronously by the Cloudflare Queue consumer. The HTTP response always returns PENDING; use GET /api/v1/provision/jobs/:jobId to poll for the actual job status and step results.

Schema note: The provisioning_jobs table has an environment TEXT column added by migrations/0002_add_environment.sql. This stores the target environment ('dev' | 'stg' | 'prod') for each job. Note: the provisioning service has its own provisioning_jobs table (in its own D1) which is separate from the Registry's provision_jobs table — they track different aspects of the same operation. The provisioning service uses uppercase type values (e.g., BOOTSTRAP_PLATFORM, ACTIVATE_FEATURE). The Registry uses lowercase operation values (e.g., provision_platform, activate_feature) in a different field.

GET /api/v1/provision/jobs/:jobId response:

{
  "id": "job_n3r8t5w2y6",
  "type": "ACTIVATE_FEATURE",
  "status": "RUNNING",
  "platformId": "k3m9p2xw7q",
  "entityId": "r8n4t6y1z5",
  "featureId": "analytics",
  "environment": "prod",
  "steps": [
    {
      "name": "create_d1",
      "status": "COMPLETED",
      "result": { "cfId": "fa098e4d-...", "message": "D1 database created" },
      "startedAt": 1740230460000,
      "completedAt": 1740230462000
    },
    {
      "name": "deploy_worker",
      "status": "RUNNING",
      "result": null,
      "startedAt": 1740230465000,
      "completedAt": null
    }
  ],
  "error": null,
  "createdAt": 1740230460000,
  "startedAt": 1740230461000,
  "completedAt": null
}

9. Observability [Phase 1]

All provisioning operations emit structured logs to Cloudflare Logpush:

// Every step logs at start and end
console.log(
  JSON.stringify({
    level: "info",
    event: "provision_step",
    jobId: job.id,
    step: step.number,
    action: step.action,
    status, // one of: "started" | "completed" | "failed"
    durationMs: elapsed,
    cfResource: cfName,
    error: err?.message ?? null,
  }),
);

Key metrics to alert on:

  • Job failure rate > 5% over 1 hour
  • Job duration > 120 seconds (P95)
  • DLQ depth > 0 (any job hitting DLQ)
  • CF API error rate > 1% (watch for permission or quota issues)
  • Workers daily deploy count approaching 200 (CF account limit)

10. Data Retention Policy [Phase 1]

This section documents how Cloudflare resources and Registry records are treated when a feature or stack is deactivated. All policies here reflect Phase 1 actual behaviour (grounded in the executor source code). Time-based hot/cold/permanent deletion tiers are not implemented in Phase 1.

Feature Deactivation (DEACTIVATE_FEATURE)

The executeDeactivateFeature executor (executors/deactivate-feature.ts) only deletes the Worker. The D1 database is never deleted, regardless of request parameters:

DEACTIVATE_FEATURE job
  ├─ Step 1: Mark feature → status: "deactivating" (Registry PATCH)
  ├─ Step 2: Delete feature Worker via cf.workers.delete()  ← only CF resource removed
  ├─ Step 3: Mark feature → status: "inactive" (Registry PATCH)
  └─ Step 4: Trigger shell rebuild (CLI Service)

deleteData flag (Phase 2): The POST /feature/deactivate route accepts deleteData: boolean in the request body, and the DeactivateFeatureSchema validates it. However, the DEACTIVATE_FEATURE executor does not act on this flag in Phase 1 — D1 data is always preserved. deleteData support for standalone features is a Phase 2 addition.

Post-deactivation state:

Resource                                Status after deactivation
Feature Worker                          Deleted from Cloudflare
Feature D1 database                     Preserved — still exists in the Cloudflare account
D1 data                                 Preserved — accessible via the CF Dashboard or direct D1 API
Registry feature_activation record      Updated to status: "inactive"
Registry resource records (D1, Worker)  Worker resource marked deleted; D1 resource retains its cfId

There is no automatic expiry or cleanup of preserved D1 databases in Phase 1. They persist until manually deleted via the CF Dashboard or a future deleteData: true implementation.

Stack Deactivation (DEACTIVATE_STACK)

The executeDeactivateStack executor (executors/deactivate-stack.ts) respects the deleteData flag for shared resources:

DEACTIVATE_STACK job
  ├─ Step 1: Resolve stack instance (Registry GET)
  ├─ Step 2: Mark stack → status: "deactivating" (Registry PATCH)
  ├─ Step 3: Enqueue DEACTIVATE_FEATURE sub-job for each feature activation
  ├─ Step 4: Delete shared CF resources — ONLY if deleteData: true
  │           ├─ cf.workers.delete(workerName)   if sharedResources.workerName
  │           ├─ cf.d1.delete(d1Id)              if sharedResources.d1Id
  │           ├─ cf.r2.delete(r2Name)            if sharedResources.r2Name
  │           └─ cf.kv.delete(kvId)              if sharedResources.kvId
  ├─ Step 5: Mark stack → status: "deactivated" (Registry PATCH)
  └─ Step 6: Trigger shell rebuild (CLI Service)

Post-deactivation state by deleteData flag:

Resource                    deleteData: false (default)               deleteData: true
Per-feature Workers         Deleted (by DEACTIVATE_FEATURE sub-jobs)  Deleted
Shared stack D1             Preserved in Cloudflare                   Deleted via cf.d1.delete()
Shared stack R2 bucket      Preserved in Cloudflare                   Deleted via cf.r2.delete()
Shared stack KV namespace   Preserved in Cloudflare                   Deleted via cf.kv.delete()
Stack orchestration Worker  Preserved in Cloudflare                   Deleted via cf.workers.delete()
Registry stacks record      Updated to status: "deactivated"          Updated to status: "deactivated"

R2 cost note: Preserved R2 buckets accrue Cloudflare storage costs against the NNO account even after stack deactivation. Operators should monitor for deactivated stacks with large R2 buckets.

Re-activation with Preserved Data

Because D1 databases are preserved after deactivation, re-activating a feature or stack can reconnect to existing data:

  • Feature re-activation (ACTIVATE_FEATURE): The executor checks whether a D1 with the expected name already exists (cf.d1.findByName(dbName)) before creating a new one. If found, it reuses the existing database and its data.
  • Stack re-activation (PROVISION_STACK): Same idempotency check applies to shared D1, R2, and KV resources — existing resources are reused, not recreated.
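The find-or-create check behind both bullets can be sketched like this. `cf.d1.findByName(dbName)` is named above; the interfaces and the `ensureD1` helper are hypothetical.

```typescript
// Hypothetical shapes for the re-activation idempotency check.
interface D1Info { id: string; name: string }

interface D1Client {
  findByName(name: string): Promise<D1Info | null>;
  create(name: string): Promise<D1Info>;
}

// Reuse a preserved database (and its data) if one with the expected name
// already exists; otherwise create a fresh one. Running this twice for the
// same name creates at most one database.
async function ensureD1(d1: D1Client, dbName: string): Promise<{ db: D1Info; reused: boolean }> {
  const existing = await d1.findByName(dbName);
  if (existing) return { db: existing, reused: true };
  return { db: await d1.create(dbName), reused: false };
}
```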

Phase 2 Planned Additions

Planned Phase 2 additions are tracked separately; see the Provisioning Phase 2 Plan.


11. Wrangler Configuration

# services/provisioning/wrangler.toml

name = "nno-k3m9p2xw7q-provisioning"
main = "src/index.ts"
compatibility_date = "2024-09-13"
compatibility_flags = ["nodejs_compat"]

[triggers]
crons = ["*/15 * * * *"]   # ssl-poller: checks pending CF4SaaS SSL issuance

# Production (default — no --env flag)
[[d1_databases]]
binding = "DB"
database_name = "nno-k3m9p2xw7q-provisioning-db"
database_id = "db158316-1fa6-4fbf-a758-bfff40fb0e46"
migrations_dir = "migrations"

[[queues.producers]]
binding = "PROVISION_QUEUE"
queue = "nno-k3m9p2xw7q-provision-queue"

[[queues.consumers]]
queue = "nno-k3m9p2xw7q-provision-queue"
max_batch_size = 1
max_retries = 3
dead_letter_queue = "nno-k3m9p2xw7q-provision-dlq"

[[queues.producers]]
binding = "PROVISION_DLQ"
queue = "nno-k3m9p2xw7q-provision-dlq"

[[analytics_engine_datasets]]
binding = "NNO_METRICS"
dataset = "nno_metrics"

[[kv_namespaces]]
binding = "NNO_PLATFORM_STATUS_KV"
id = "63deeb49457946c6a68a49b23ea4fc5c"

[env.stg]
name = "nno-k3m9p2xw7q-provisioning-stg"
[[env.stg.d1_databases]]
binding = "DB"
database_name = "nno-k3m9p2xw7q-provisioning-db-stg"
database_id = "74b91da9-0e38-4373-bd74-e95ed67dd210"
migrations_dir = "migrations"
[[env.stg.queues.producers]]
binding = "PROVISION_QUEUE"
queue = "nno-k3m9p2xw7q-provision-queue-stg"
[[env.stg.queues.consumers]]
queue = "nno-k3m9p2xw7q-provision-queue-stg"
max_batch_size = 1
max_retries = 3
dead_letter_queue = "nno-k3m9p2xw7q-provision-dlq-stg"
[[env.stg.queues.producers]]
binding = "PROVISION_DLQ"
queue = "nno-k3m9p2xw7q-provision-dlq-stg"
[[env.stg.analytics_engine_datasets]]
binding = "NNO_METRICS"
dataset = "nno_metrics"
[[env.stg.kv_namespaces]]
binding = "NNO_PLATFORM_STATUS_KV"
id = "db49667ca1904af7ba128a02806e7c7e"

Bindings Summary

| Binding | Type | Purpose |
| --- | --- | --- |
| DB | D1 | Provisioning job state and step tracking |
| PROVISION_QUEUE | Queue producer | Enqueues provisioning jobs for async execution |
| PROVISION_DLQ | Queue producer | Dead-letter queue for jobs that exhaust retries |
| NNO_METRICS | Analytics Engine | Structured metrics and observability events |
| NNO_PLATFORM_STATUS_KV | KV namespace | Written by provisioning on state transitions; read by gateway for enforcement. Key format: `platform:{platformId}:status`. Values: `"active"`, `"suspended"`, `"deprovisioned"` |
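The NNO_PLATFORM_STATUS_KV contract can be wrapped in small helpers like these. The key format and status values come from the binding table above; the helper names, the `KvLike` interface, and the absent-key policy are assumptions for illustration.

```typescript
// Status values documented for NNO_PLATFORM_STATUS_KV.
type PlatformStatus = "active" | "suspended" | "deprovisioned";

// Documented key format: platform:{platformId}:status
function statusKey(platformId: string): string {
  return `platform:${platformId}:status`;
}

// Minimal KV-like interface so the sketch is self-contained (the real
// binding is a Workers KV namespace with the same get/put shape).
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Written by provisioning on state transitions.
async function writeStatus(kv: KvLike, platformId: string, status: PlatformStatus): Promise<void> {
  await kv.put(statusKey(platformId), status);
}

// Read by the gateway for enforcement. Treating an absent key as inactive
// is an assumption here — the real gateway policy may differ.
async function isActive(kv: KvLike, platformId: string): Promise<boolean> {
  return (await kv.get(statusKey(platformId))) === "active";
}
```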

Secrets (per environment)

| Secret | Description |
| --- | --- |
| CF_API_TOKEN | Cloudflare API token for resource provisioning (see §1.1 for required permissions) |
| CF_ACCOUNT_ID | Cloudflare account ID |
| CF_ZONE_ID | Cloudflare Zone ID — required by the ADD_CUSTOM_DOMAIN executor for CF4SaaS custom hostname registration |
| NNO_REGISTRY_URL | Internal URL of the NNO Registry service |
| NNO_INTERNAL_API_KEY | Shared secret for service-to-service calls to Registry |
| AUTH_API_KEY | Shared secret for inbound service-to-service auth |
| CORS_ORIGINS | Comma-separated allowed origins |
| NNO_AUTH_BUNDLE_URL | URL to pre-built auth Worker JS bundle (CI artifact or R2 public URL) — used by UPGRADE_AUTH_WORKER |

Status: Detailed design — PROVISION_STACK added 2026-02-28 Implementation target: services/provisioning/ Related: NNO Registry · System Architecture · Feature Package SDK · Stacks · Stack Registry
