NNO Provisioning Service
Documentation for NNO Provisioning Service
Date: 2026-03-30
Status: Detailed Design
Parent: System Architecture
Service: services/provisioning
Overview
The NNO Provisioning Service is responsible for creating, configuring, and deleting Cloudflare infrastructure resources on behalf of client platforms. It is the only NNO service that calls the Cloudflare API directly.
NNO follows a lazy, on-demand provisioning model: platforms receive a minimal resource footprint on signup (auth Worker + registry record + billing customer), and additional Cloudflare resources (Pages projects, D1 databases, R2 buckets, KV namespaces) are created only when a feature that requires them is activated. This mirrors the model of Supabase or Firebase — a project exists instantly, and capabilities are enabled as needed.
Every provisioning operation is:
- Queue-driven — POST /provision/* creates a job record (PENDING), enqueues it to the Cloudflare Queue, and returns 202 Accepted. The queue consumer executes the job asynchronously, making real Cloudflare API calls. Callers poll GET /provision/jobs/:jobId to track progress.
- Tracked — recorded as a provision_job in the Provisioning D1 before any steps run
- Idempotent — steps check whether the target resource already exists before calling the CF API; already-created resources are detected and skipped
- Transactional in intent (Phase 2) — failed jobs will trigger rollback of all completed steps
- Audited — every state change is written to structured logs
Phase 1 vs Phase 2: Phase 1 queue-based async execution is live — real Cloudflare API calls are made for all implemented job types. The detailed design in §1.2–§1.4 and §6 (Rollback) describes Phase 2 targets (typed CF client class, rollback traversal). Phase 2 will also add DLQ alerting and queue event wiring from IAM/Registry (§7).
┌─── Trigger paths ────────────────────────────────────────────────┐
│ │
│ 1. Registration (automatic) │
│ services/iam sign-up ──► platform.registered (CF Queue) │
│ │
│ 2. Feature activation (self-service) │
│ services/registry PATCH /features ──► feature.activating │
│ (CF Queue) │
│ │
│ 3. Operator override (manual) │
│ NNO Zero UI ──► services/gateway ──► POST /provision/* │
│ │
└──────────────────────────────┬────────────────────────────────────┘
│
▼
NNO Provisioning Service
(CF Queue consumer)
│
┌───────────────┼───────────────┐
▼ ▼ ▼
Cloudflare API NNO Registry services/billing
(CF Workers, (platform, (Stripe customer
D1, R2, KV, resources, creation)
   Pages)          feature status)

0.5 Phase 1 Implementation Detail [Phase 1]
Implementation flow detail: Provisioning Phase 1 Plan.
Implemented Job Types
All job types below are wired and make real CF API calls:
| Job Type | Executor file | CF API calls made |
|---|---|---|
| BOOTSTRAP_PLATFORM | executors/provision-platform.ts | D1 create, Worker deploy, secrets, migrations, Pages project + build |
| ACTIVATE_FEATURE | executors/activate-feature.ts | D1 create (conditional), Worker deploy (conditional), secrets, migrations |
| DEACTIVATE_FEATURE | executors/deactivate-feature.ts | Worker delete only — D1 is never deleted |
| PROVISION_STACK | executors/provision-stack.ts | Shared resources + per-feature sub-jobs |
| DEACTIVATE_STACK | executors/deactivate-stack.ts | Enqueues DEACTIVATE_FEATURE sub-jobs; deletes shared resources only if deleteData: true |
| DEPROVISION_PLATFORM | executors/deprovision-platform.ts | Full teardown |
| ONBOARD_PLATFORM (Implemented) | executors/onboard-platform.ts | No direct CF API calls — orchestrates Registry + Billing + enqueues BOOTSTRAP_PLATFORM |
| UPGRADE_AUTH_WORKER | executors/upgrade-auth-worker.ts | Fetches latest auth bundle from NNO_AUTH_BUNDLE_URL, re-deploys the auth Worker (preserving D1 binding), refreshes CORS_ORIGINS secret |
| CREATE_APP | executors/create-app.ts | Creates a CF Pages project and/or stub Worker for a new app/service within an existing workspace stack; optionally creates a D1 database; registers DNS hostnames and resources in Registry |
| ADD_CUSTOM_DOMAIN | executors/add-custom-domain.ts | Adds a CF4SaaS custom hostname to an existing DNS-registered resource; requires CF_ZONE_ID; registers the custom domain record in Registry |
ONBOARD_PLATFORM (Implemented) — outer onboarding job that orchestrates pre-provisioning steps before triggering platform resource creation. Steps: (1) create platform + entity records in Registry, (2) create Stripe customer + subscription in Billing, (3) enqueue BOOTSTRAP_PLATFORM job for CF resource creation, (4) update onboarding_sessions checklist as each step completes. Triggered by the self-serve onboarding endpoint in Registry. Wraps but does not replace BOOTSTRAP_PLATFORM.
0. Provisioning Triggers [Phase 1]
Provisioning is initiated by three paths, all converging on the same job queue:
0.1 Registration Trigger (Automatic)
When a new client registers, services/iam emits a platform.registered event to a Cloudflare Queue. The provisioning consumer picks this up and runs BOOTSTRAP_PLATFORM — creating only the minimum viable resources needed before the client can log in:
Client sign-up
│
▼
services/iam
├── Create user record in IAM D1
├── Create platform record in Registry
└── Enqueue: platform.registered → CF Queue
│
▼
services/provisioning (queue consumer)
├── Deploy auth Worker [always]
├── Create auth D1 + migrate [always]
  └── Create Stripe customer          [always]

No CF Pages project, no feature D1 databases, no R2 buckets — those are created on-demand when features are activated.
0.2 Feature Activation Trigger (Self-Service)
When a client activates a feature from their console, services/registry updates the feature record to activating and emits a feature.activating event. The provisioning consumer reads the feature's FeatureManifest.resources declaration and creates only the resources that feature requires:
Client activates "Analytics" feature
│
▼
services/registry
├── Quota check via services/billing ← plan gate
├── PATCH feature_activation status → 'activating'
└── Enqueue: feature.activating → CF Queue
│
▼
services/provisioning (queue consumer)
├── Read FeatureManifest.resources
│ { d1: true, worker: true, minimumPlan: 'growth' }
├── Create analytics D1 + run migrations
├── Deploy analytics Worker
  └── PATCH feature_activation status → 'active'

See Section 2.2 for the full ACTIVATE_FEATURE step table.
0.3 Operator Override (Manual)
The NNO operator can provision, deprovision, or force-activate resources for any platform via the operator portal or directly via the provisioning API. Like all other trigger paths, this creates a provision_job record (PENDING) and enqueues it to the Cloudflare Queue — the same queue consumer (handleQueueBatch) executes it asynchronously. The HTTP endpoint returns 202 Accepted immediately; callers poll GET /provision/jobs/:jobId for progress.
The operator plane is a management/override layer — not the primary provisioning path for normal client onboarding.
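All three trigger paths converge on the same accept-and-enqueue pattern: record the job (PENDING), enqueue it, return 202. A minimal sketch of that pattern follows; the type names, id format, and queue interface are illustrative assumptions, not the actual services/provisioning source:

```typescript
// Sketch of the accept-and-enqueue pattern. Field and type names are assumptions.
type JobType =
  | "BOOTSTRAP_PLATFORM"
  | "ACTIVATE_FEATURE"
  | "DEACTIVATE_FEATURE"
  | "PROVISION_STACK"
  | "DEACTIVATE_STACK"
  | "DEPROVISION_PLATFORM";

interface ProvisionJobRecord {
  id: string;
  type: JobType;
  status: "PENDING";
  payload: Record<string, unknown>;
  createdAt: number;
}

// Minimal queue abstraction standing in for the PROVISION_QUEUE binding.
interface JobQueue {
  send(message: ProvisionJobRecord): Promise<void>;
}

async function acceptProvisionRequest(
  type: JobType,
  payload: Record<string, unknown>,
  queue: JobQueue,
): Promise<{ status: 202; body: { jobId: string; status: "PENDING" } }> {
  // 1. Create the job record (PENDING) — in the real service this is a D1 insert.
  const job: ProvisionJobRecord = {
    id: `job_${Math.random().toString(36).slice(2, 12)}`,
    type,
    status: "PENDING",
    payload,
    createdAt: Date.now(),
  };
  // 2. Enqueue for the async consumer.
  await queue.send(job);
  // 3. Return 202 immediately; callers poll GET /provision/jobs/:jobId.
  return { status: 202, body: { jobId: job.id, status: "PENDING" } };
}
```

Because the record is written before the enqueue, a job is never executing without being trackable.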
0.4 Stack Activation Trigger (Self-Service)
When a platform admin activates a stack template (or creates a platform-local stack) from the portal or via CLI, services/registry validates the StackDefinition and emits a stack.activating event:
Platform admin activates "[email protected]" from portal UI
│
▼
services/registry
├── Fetch StackDefinition from services/stack-registry (for template-based stacks)
│ OR use inline definition (for platform-local stacks)
├── Validate all featureIds exist in feature catalogue
├── Quota check via services/billing (minimumPlan gate)
├── Create stacks record (status: PENDING)
└── Enqueue: stack.activating → CF Queue
│
▼
services/provisioning (queue consumer)
├── Fetch StackDefinition from stack-registry (step 1)
├── Provision shared resources (D1, R2, KV as declared)
  └── Activate each feature with shared resource bindings

Returns 202 \{ stackInstanceId, jobId \}. The job is tracked as PROVISION_STACK type.
1. Cloudflare API Integration [Phase 1]
1.1 API Credentials
In Phase 1 (NNO full control), the Provisioning Service uses a single set of credentials stored as Wrangler secrets:
| Secret | Description |
|---|---|
| CF_API_TOKEN | Cloudflare API token with scoped permissions (see below) |
| CF_ACCOUNT_ID | Cloudflare account ID |
| CF_ZONE_ID | Cloudflare Zone ID — required by the ADD_CUSTOM_DOMAIN executor for CF4SaaS custom hostname registration |
Required token permissions (scoped, not global):
| Permission | Scope | Why |
|---|---|---|
| Workers Scripts:Edit | Account | Deploy and delete Workers |
| Workers Routes:Edit | Zone | Manage Worker routes |
| Cloudflare Pages:Edit | Account | Create and manage Pages projects |
| D1:Edit | Account | Create, delete D1 databases |
| Workers R2 Storage:Edit | Account | Create, delete R2 buckets |
| Workers KV Storage:Edit | Account | Create, delete KV namespaces |
| Queues:Edit | Account | Create, delete Queues |
1.2–1.4 CF API Client, Resource Operations, Rate Limits — Phase 2 Design
Phase 2 design. See Provisioning Phase 2 Plan.
2. Provisioning Operations [Phase 1]
2.1 BOOTSTRAP_PLATFORM (formerly PROVISION_PLATFORM)
Triggered automatically on client registration via the platform.registered queue event (§0.1), or manually by the operator via the Zero UI. Creates the minimum viable resource set only — additional resources are provisioned lazily when features are activated (§2.2).
Input:
{
platformId: string;
planTier: string; // 'starter' | 'growth' | 'scale'
billingEmail: string; // billing contact email
defaultEntityId?: string; // optional pre-generated tenant ID for the default tenant
environment?: 'dev' | 'stg' | 'prod'; // default: 'prod'
}

Steps (in order) — minimal bootstrap only:
[Updated for DNS architecture] Resource names now use stack-id (default) instead of entity-id. DNS hostnames are registered via CF4SaaS at provisioning time.
| Step | Action | Rollback action |
|---|---|---|
| 1 | Create \{pid\}-default-auth-db D1 (staging: \{pid\}-default-auth-db-stg) | Delete D1 |
| 2 | Deploy \{pid\}-default-auth Worker (staging: \{pid\}-default-auth-stg) | Delete Worker |
| 3 | Set secrets on auth Worker | (no rollback — secrets deleted with Worker) |
| 4 | Run auth D1 migrations | (no rollback — migration is idempotent) |
| 5 | Create Stripe customer record | Cancel Stripe customer |
| 6 | Register platform + auth resources in Registry | Mark as deleted |
| 7 | Register DNS hostname auth.svc.default.\{pid\}.nno.app via CF4SaaS | Delete CF4SaaS custom hostname |
Not included in bootstrap: CF Pages project, feature D1 databases, R2 buckets, KV namespaces — these are created on-demand when the relevant feature is activated (see §2.2). A client's platform is considered "live" (able to log in) as soon as the auth Worker and D1 are up.
2.2 ACTIVATE_FEATURE
Triggered when a client activates a feature from their console (via the feature.activating queue event, §0.2) or when the operator force-activates a feature from the Zero UI. Creates only the Cloudflare resources declared in the feature's FeatureManifest.resources block — nothing more.
Input:
{
platformId: string;
entityId: string; // required — ID of the entity (tenant/sub-tenant) activating the feature
featureId: string; // e.g. 'analytics'
featureVersion: string; // e.g. '1.2.0'
environment: "dev" | "stg" | "prod";
resources: FeatureResourceRequirements; // read from FeatureManifest.resources at activation time
}

FeatureResourceRequirements (declared in each feature's FeatureManifest):
interface FeatureResourceRequirements {
worker?: boolean; // Deploy a CF Worker for this feature's backend API
pages?: boolean; // Create a CF Pages project (SPA portal)
d1?: boolean; // Create a D1 SQLite database
r2?: boolean; // Create an R2 object storage bucket
kv?: boolean; // Create a KV namespace
queue?: boolean; // Create a Cloudflare Queue
minimumPlan?: "starter" | "growth" | "scale"; // Plan gate — checked before job is queued
}

Steps (conditioned on resources.*):
| Step | Condition | Action | Rollback |
|---|---|---|---|
| 1 | Always | Quota check via billing service | — |
| 2 | Always | Mark feature_activation as 'activating' | Mark as 'failed' |
| 3 | resources.d1 | Create feature D1 database | Delete D1 |
| 4 | resources.d1 | Register D1 in Registry | Mark deleted |
| 5 | resources.d1 | Run D1 migrations | (idempotent) |
| 6 | resources.r2 | Create R2 bucket | Delete bucket |
| 7 | resources.r2 | Register R2 in Registry | Mark deleted |
| 8 | resources.kv | Create KV namespace | Delete namespace |
| 9 | resources.kv | Register KV in Registry | Mark deleted |
| 10 | resources.worker | Deploy feature Worker | Delete Worker |
| 11 | resources.worker | Set secrets on Worker | (with Worker) |
| 12 | resources.worker | Register Worker in Registry | Mark deleted |
| 13 | resources.pages | Create CF Pages project | Delete Pages project |
| 14 | resources.pages | Connect to platform GitHub repo | Disconnect |
| 15 | resources.pages | Trigger initial Pages build | (build can fail independently) |
| 16 | Always | Update feature_activation status to 'active' | Mark as 'failed' |
| 17 | Always | Trigger platform shell rebuild via CLI Service | (rebuild can fail independently) |
Plan enforcement: Before the job is enqueued, services/registry calls GET /api/v1/billing/quota/check?platformId=&resource= to verify the platform's plan permits the requested resources. A 402 response halts the activation with an upgrade prompt — no job is created.
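The minimumPlan gate reduces to a tier comparison. A small sketch, using the plan names from this document (the helper itself is illustrative, not the billing service's implementation):

```typescript
// Illustrative minimumPlan gate. Plan tiers come from the doc; the ordering
// helper is an assumption about how the billing quota check ranks plans.
type PlanTier = "starter" | "growth" | "scale";

const PLAN_ORDER: Record<PlanTier, number> = { starter: 0, growth: 1, scale: 2 };

/** Returns true when the platform's plan satisfies a feature's minimumPlan gate. */
function planSatisfies(platformPlan: PlanTier, minimumPlan?: PlanTier): boolean {
  if (!minimumPlan) return true; // no gate declared — feature available on any plan
  return PLAN_ORDER[platformPlan] >= PLAN_ORDER[minimumPlan];
}
```

A failed check corresponds to the 402 path: no job is created, and the caller sees an upgrade prompt.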
2.3 DEACTIVATE_FEATURE
Gracefully removes a feature's resources. Data is preserved (D1 not deleted) by default.
Input:
{
platformId: string;
entityId: string; // required — ID of the entity whose feature resources are being removed
featureId: string;
environment: string;
deleteData?: boolean; // default: false — preserve D1 data
}

Steps:
| Step | Action |
|---|---|
| 1 | Mark feature_activation as 'deactivating' |
| 2 | Delete feature Worker (stops serving traffic) |
| 3 | Mark Worker resource as deleted in Registry |
| 4 | If deleteData: true — delete D1 database |
| 5 | Mark D1 resource as deleted in Registry (or 'archived' if data preserved) |
| 6 | Mark feature_activation as 'inactive' |
| 7 | Trigger platform shell rebuild via CLI Service |
2.3a PROVISION_STACK
Triggered when a platform admin activates a stack template (via the stack.activating queue event, §0.4). Provisions shared CF resources first, then activates each feature in the stack with shared resource bindings injected.
Input:
{
platformId: string;
stackInstanceId: string; // pre-created stacks record
stackDefinition: StackDefinition; // resolved from stack-registry or inline (local)
environment: "dev" | "stg" | "prod";
}

Steps (in order):
| Step | Name | Condition | Action | Rollback |
|---|---|---|---|---|
| 1 | resolve_stack_definition | Always | Fetch StackDefinition from stack-registry (template) or use inline (local). Validate all featureIds exist. | — |
| 2 | create_shared_d1 | resources.sharedD1 | Create D1: \{platformId\}-\{stackId\}-db-\{env\}. Idempotent. | Delete D1 |
| 3 | register_shared_d1 | resources.sharedD1 | Register D1 in Registry as resource_type: 'd1', service_name: '\{stackId\}' | Mark deleted |
| 4 | create_shared_r2 | resources.sharedR2 | Create R2 bucket: \{platformId\}-\{stackId\}-storage-\{env\}. Idempotent. | Delete bucket |
| 5 | register_shared_r2 | resources.sharedR2 | Register R2 in Registry | Mark deleted |
| 6 | create_shared_kv | resources.sharedKV | Create KV namespace: \{platformId\}-\{stackId\}-kv-\{env\}. Idempotent. | Delete namespace |
| 7 | register_shared_kv | resources.sharedKV | Register KV in Registry | Mark deleted |
| 8 | deploy_stack_worker | resources.worker | Deploy Worker: \{platformId\}-\{stackId\}-orchestrator-\{env\}. Binds STACK_DB, STACK_STORAGE, STACK_KV as available. | Delete Worker |
| 9 | activate_features | Always | For each feature in features[] (in order): run ACTIVATE_FEATURE sub-job, injecting shared bindings: STACK_DB → sharedD1.cfId, STACK_STORAGE → sharedR2.bucketName, STACK_KV → sharedKv.namespaceId, STACK_ID → stackInstanceId. Required features: failure aborts entire PROVISION_STACK job. Optional features: failure logs warning, continues. | Per-feature rollback |
| 10 | register_dns_hostnames | Non-fatal | Register CF4SaaS custom hostnames for each app/service in the stack: \{name\}.\{type\}.\{stackId\}.\{platformId\}.nno.app. Staging: \{name\}.\{type\}.stg.\{stackId\}.\{platformId\}.nno.app. Record each hostname in Registry dns_records table. Failure does not abort the job. | Delete CF4SaaS hostnames |
| 11 | register_stack_instance | Non-fatal | PATCH /platforms/\{platformId\}/stacks/\{stackInstanceId\} → status: active, sharedResources: { d1Id, r2Name, kvId, workerName } | Mark FAILED |
| 12 | trigger_shell_rebuild | Non-fatal | POST to CLI Service with stackId context. Failure does not abort the job. | — |
Shared resource bindings injected into each feature Worker:
// Bindings set on each feature Worker in the stack (at activation time)
{
STACK_DB: sharedD1?.cfId ?? undefined,
STACK_STORAGE: sharedR2?.bucketName ?? undefined,
STACK_KV: sharedKv?.namespaceId ?? undefined,
STACK_ID: stackInstanceId,
}

Required vs optional feature failure:
- required: true → any ACTIVATE_FEATURE sub-job failure transitions PROVISION_STACK to FAILED and triggers rollback of all completed steps
- required: false → sub-job failure is logged as a warning; the step is marked SKIPPED; the stack instance status becomes DEGRADED instead of ACTIVE
Stack naming convention for shared resources:
D1: {platformId}-{stackId}-db-{env} e.g. k3m9p2xw7q-saas-starter-db-prod
R2: {platformId}-{stackId}-storage-{env} e.g. k3m9p2xw7q-saas-starter-storage-prod
KV: {platformId}-{stackId}-kv-{env} e.g. k3m9p2xw7q-saas-starter-kv-prod
Worker: {platformId}-{stackId}-orchestrator-{env}  e.g. k3m9p2xw7q-saas-starter-orchestrator-prod

See stacks.md for the full Stack architecture including shared resource prefix conventions.
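The naming convention can be captured in a small helper (illustrative; the function name is not from the service source, but the name templates are the ones listed above):

```typescript
// Sketch of the shared-resource naming convention for stacks.
type Env = "dev" | "stg" | "prod";

function stackResourceNames(platformId: string, stackId: string, env: Env) {
  const base = `${platformId}-${stackId}`;
  return {
    d1: `${base}-db-${env}`,           // shared SQLite database
    r2: `${base}-storage-${env}`,      // shared object storage bucket
    kv: `${base}-kv-${env}`,           // shared KV namespace
    worker: `${base}-orchestrator-${env}`, // stack orchestration Worker
  };
}
```

Deterministic names are what make the idempotency checks in §5 possible: a re-run computes the same name and finds the existing resource.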
2.3b Custom Domain Provisioning
When a client maps a custom domain to a platform resource, the provisioning service handles CF4SaaS registration:
Client submits custom domain (e.g. app.acmecorp.com) for dashboard.app.x7y8z9w0q1.a1b2c3d4e5.nno.app
│
▼
Registry: create dns_records row (status: pending_validation)
│
▼
Provisioning: POST CF4SaaS custom hostname API
→ CF returns validation TXT record details
│
▼
Registry: update dns_records row with validation details
Client: add CNAME + TXT records in their DNS registrar
│
▼
Provisioning: poll CF4SaaS until SSL status = active (or timeout)
→ CF validates ownership and provisions TLS certificate automatically
│
▼
Registry: update dns_records row (status: active)

Custom hostname records are stored in the Registry dns_records table. See dns-naming.md for the full CF4SaaS flow and registry.md for the dns_records schema.
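The "poll until SSL active (or timeout)" step can be sketched as a bounded loop. In this illustration, checkSslStatus stands in for the CF4SaaS custom hostname status call, and the intermediate status value is an assumption beyond the pending_validation/active states named above:

```typescript
// Bounded polling sketch for CF4SaaS SSL issuance. The status union is an
// assumption for illustration; only pending_validation and active appear in the doc.
type SslStatus = "pending_validation" | "pending_issuance" | "active";

async function pollUntilSslActive(
  checkSslStatus: () => Promise<SslStatus>,
  opts: { maxAttempts: number; intervalMs: number },
): Promise<boolean> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const status = await checkSslStatus();
    if (status === "active") return true; // CF validated ownership and issued the cert
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  return false; // timed out — the dns_records row stays in a pending state
}
```

The cron trigger in §11 (ssl-poller, every 15 minutes) plays the same role for hostnames still pending after the in-job polling window closes.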
2.4 DEPROVISION_PLATFORM
Full teardown of a platform. Requires explicit confirmation — irreversible.
Steps: Reverse of BOOTSTRAP_PLATFORM plus all activated feature resources, in reverse dependency order.
3. State Machine [Phase 1]
Every provision_job follows this state machine:
┌──────────┐
│ PENDING │ ← job created in Registry
└────┬─────┘
│ worker picks up job
┌────▼─────┐
│ RUNNING │ ← steps executing
└────┬─────┘
┌───────────┼───────────┐
│ │ │
┌────▼────┐ ┌───▼───┐ ┌───▼──────┐
│COMPLETED│ │FAILED │ │ TIMED_OUT│
└─────────┘ └───┬───┘ └────┬─────┘
│ │
┌──── ▼────────────▼──────┐
│ retry_count < max_retries│
└────┬──────────────────────┘
│ yes no
┌────▼──────┐ ┌────────────┐
│ PENDING │ │ ROLLING_BACK│
│(re-queued)│ └─────┬───────┘
└───────────┘ │
┌─────▼──────┐
│ ROLLED_BACK│
 └────────────┘

Step Tracking
Each completed step is recorded in provision_jobs.steps as a JSON array. This enables precise rollback — only steps that completed successfully need to be reversed:
// provision_jobs.steps (after partial completion)
[
{
"step": 1,
"action": "create_d1",
"status": "completed",
"output": { "cf_id": "fa098e4d-...", "resource_id": "res_abc123" },
"completedAt": 1740000001000
},
{
"step": 2,
"action": "deploy_worker",
"status": "completed",
"output": {
"script_name": "k3m9p2xw7q-r8n4t6y1z5-auth-prod",
"resource_id": "res_def456"
},
"completedAt": 1740000008000
},
{
"step": 3,
"action": "set_secrets",
"status": "failed",
"error": "CF API 429: rate limited",
"failedAt": 1740000012000
}
]

4. Retry & Backoff [Phase 1]
Retry Strategy
async function withRetry<T>(
fn: () => Promise<T>,
job: ProvisionJob,
step: number,
): Promise<T> {
const maxRetries = 3;
let attempt = 0;
while (attempt <= maxRetries) {
try {
return await fn();
} catch (err) {
if (err instanceof RateLimitError) {
// Honour Cloudflare's Retry-After header
await sleep(err.retryAfterSeconds * 1000);
attempt++;
continue;
}
if (err instanceof CloudflareApiError && err.isTransient()) {
// 500, 502, 503, 504 — exponential backoff
const delay = Math.min(1000 * Math.pow(2, attempt), 30_000);
await sleep(delay);
attempt++;
continue;
}
// Non-transient error (400, 409, etc.) — fail immediately
throw err;
}
}
throw new MaxRetriesExceededError(
`Step ${step} failed after ${maxRetries} retries`,
);
}

Transient vs Non-Transient Errors
| CF API Status | Type | Behaviour |
|---|---|---|
| 429 | Transient | Retry after Retry-After header value |
| 500 / 502 / 503 / 504 | Transient | Retry with exponential backoff (1s, 2s, 4s) |
| 400 | Non-transient | Fail immediately — request is malformed |
| 403 | Non-transient | Fail immediately — permissions misconfigured |
| 409 | Non-transient (idempotency hit) | Resource already exists — treat as success |
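The table above maps to a straightforward classification function (a sketch; the type and function names are illustrative):

```typescript
// Illustrative classification of CF API status codes per the table above.
type RetryDecision =
  | { kind: "retry_after_header" }  // 429 — honour Retry-After
  | { kind: "retry_backoff" }       // 5xx — exponential backoff
  | { kind: "treat_as_success" }    // 409 — idempotency hit, resource exists
  | { kind: "fail" };               // 400, 403, etc. — fail immediately

function classifyCfStatus(status: number): RetryDecision {
  if (status === 429) return { kind: "retry_after_header" };
  if ([500, 502, 503, 504].includes(status)) return { kind: "retry_backoff" };
  if (status === 409) return { kind: "treat_as_success" };
  return { kind: "fail" };
}
```

Centralising this decision keeps withRetry and the idempotency helpers consistent about which failures are worth another attempt.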
5. Idempotency [Phase 1]
Every create operation checks whether the resource already exists before calling the CF API. This makes jobs safe to re-run after partial failure:
async function ensureD1Exists(
cf: CloudflareClient,
name: string,
): Promise<{ cfId: string; wasCreated: boolean }> {
// Check Registry first (fastest path)
const existing = await registry.resources.lookupByCfName(name);
if (existing?.status === "active") {
return { cfId: existing.cfId, wasCreated: false };
}
// Check CF API (in case Registry is stale)
const cfExisting = await cf.d1.findByName(name);
if (cfExisting) {
return { cfId: cfExisting.uuid, wasCreated: false };
}
// Create
const created = await cf.d1.create({ name });
return { cfId: created.uuid, wasCreated: true };
}

The 409 Conflict response from the CF API is treated as a success (not an error) for create operations.
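For create calls that race past both existence checks, the 409-as-success rule can be applied in a small wrapper (illustrative; the error's status field is an assumption about the CF client's error shape):

```typescript
// Sketch of the 409-as-success convention: a Conflict on create is resolved by
// looking up the existing resource instead of failing the step.
async function createTreating409AsSuccess<T>(
  create: () => Promise<T>,
  lookupExisting: () => Promise<T>,
): Promise<{ resource: T; wasCreated: boolean }> {
  try {
    return { resource: await create(), wasCreated: true };
  } catch (err) {
    // 409 Conflict means the resource already exists — an idempotency hit.
    if ((err as { status?: number }).status === 409) {
      return { resource: await lookupExisting(), wasCreated: false };
    }
    throw err; // anything else propagates to withRetry / failure handling
  }
}
```

This keeps the create path safe even when the Registry and the pre-create CF lookup are both stale.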
6. Rollback [Phase 2]
Phase 2 design. See Provisioning Phase 2 Plan.
Phase 2 also introduces the typed CloudflareClient class (services/provisioning/src/cf-client/index.ts) and the rollback engine (reverse-iterate completed steps, call CF delete APIs). See docs/implementation/phase-2/provisioning.md for full design.
7. Job Worker (Queue Consumer) [Phase 1]
Phase 1 (current): The Cloudflare Queue consumer (executors/queue-consumer.ts) and PROVISION_QUEUE binding are already live. Jobs are enqueued on POST /provision/* and processed asynchronously by the queue consumer.
Phase 2 design (queue event wiring from IAM/Registry). See Provisioning Phase 2 Plan.
7.1 Dead Letter Queue (DLQ) [Phase 2]
When a provisioning job exceeds its retry budget (3 attempts by default), the Cloudflare Queue moves the message to the Dead Letter Queue for operator review and alerting.
| Property | Value |
|---|---|
| Queue name | nno-k3m9p2xw7q-provisioning-dlq-\{env\} |
| KV namespace | NNO_PROVISIONING_DLQ_KV — stores DLQ message metadata for operator queries |
| Alert trigger | DLQ depth > 0 triggers an operator alert (email/Slack) |
Operator endpoints (Phase 2, mounted in services/provisioning):
GET /api/v1/provision/dlq List DLQ messages (paginated)
GET /api/v1/provision/dlq/:messageId Get DLQ message details + failed steps
POST /api/v1/provision/dlq/:messageId/retry Re-enqueue a DLQ message for retry
POST /api/v1/provision/dlq/:messageId/dismiss   Mark as resolved without retry

DLQ messages retain the original job payload and the steps array showing which step caused the final failure. Operators can inspect the error, fix the underlying issue (e.g., CF API permission), and retry without recreating the provisioning request.
8. Provisioning API [Phase 1]
The Provisioning Service exposes an internal API consumed by the NNO Gateway (not exposed to clients directly):
POST /api/v1/provision/platform Trigger BOOTSTRAP_PLATFORM
POST /api/v1/provision/feature/activate Trigger ACTIVATE_FEATURE
POST /api/v1/provision/feature/deactivate Trigger DEACTIVATE_FEATURE
POST /api/v1/provision/stack/activate Trigger PROVISION_STACK
POST /api/v1/provision/stack/deactivate Trigger DEACTIVATE_STACK
POST /api/v1/provision/platform/deprovision Trigger DEPROVISION_PLATFORM (requires confirmation token)
GET /api/v1/provision/jobs/:jobId Get job status + step details
GET  /api/v1/provision/jobs                     List jobs for a platform (query: ?platformId=&limit=25&cursor=)

All routes are mounted at /api/v1/provision in services/provisioning/src/index.ts. The service is internal and accessed via the NNO Gateway only.

Pagination: GET /api/v1/provision/jobs uses cursor-based pagination consistent with the NNO Registry pagination standard. Response envelope: \{ "data": [...], "pagination": \{ "hasMore": bool, "nextCursor": string | null \} \}. Provisioning jobs are append-only and never deleted, so cursor and offset are semantically equivalent for this dataset — cursor is used for consistency across the platform.
POST /api/v1/provision/feature/activate request:
{
"platformId": "k3m9p2xw7q",
"entityId": "r8n4t6y1z5",
"featureId": "analytics",
"featureVersion": "1.2.0",
"environment": "prod"
}

All POST /api/v1/provision/* responses:
202 Accepted
{ "jobId": "job_n3r8t5w2y6", "status": "PENDING" }

The job is enqueued and processed asynchronously by the Cloudflare Queue consumer. The HTTP response always returns PENDING; use GET /api/v1/provision/jobs/:jobId to poll for the actual job status and step results.
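A caller-side polling loop might look like this (a sketch; fetchJob stands in for the HTTP call to GET /api/v1/provision/jobs/:jobId through the NNO Gateway):

```typescript
// Caller-side polling sketch against the job-status endpoint. Terminal states
// are taken from the §3 state machine; the helper name is illustrative.
type JobStatus =
  | "PENDING" | "RUNNING" | "COMPLETED" | "FAILED"
  | "TIMED_OUT" | "ROLLING_BACK" | "ROLLED_BACK";

const TERMINAL: JobStatus[] = ["COMPLETED", "FAILED", "TIMED_OUT", "ROLLED_BACK"];

async function waitForJob(
  fetchJob: () => Promise<{ status: JobStatus }>,
  opts: { maxPolls: number; intervalMs: number },
): Promise<JobStatus> {
  let last: JobStatus = "PENDING";
  for (let i = 0; i < opts.maxPolls; i++) {
    last = (await fetchJob()).status;
    if (TERMINAL.includes(last)) return last;
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  return last; // still in flight after maxPolls — caller decides how to proceed
}
```

ROLLING_BACK is deliberately non-terminal here: a rolling-back job will still transition to ROLLED_BACK.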
Schema note: The provisioning_jobs table has an environment TEXT column added by migrations/0002_add_environment.sql. This stores the target environment ('dev' | 'stg' | 'prod') for each job. Note: the provisioning service has its own provisioning_jobs table (in its own D1) which is separate from the Registry's provision_jobs table — they track different aspects of the same operation. The provisioning service uses uppercase type values (e.g., BOOTSTRAP_PLATFORM, ACTIVATE_FEATURE). The Registry uses lowercase operation values (e.g., provision_platform, activate_feature) in a different field.
GET /api/v1/provision/jobs/:jobId response:
{
"id": "job_n3r8t5w2y6",
"type": "ACTIVATE_FEATURE",
"status": "RUNNING",
"platformId": "k3m9p2xw7q",
"entityId": "r8n4t6y1z5",
"featureId": "analytics",
"environment": "prod",
"steps": [
{
"name": "create_d1",
"status": "COMPLETED",
"result": { "cfId": "fa098e4d-...", "message": "D1 database created" },
"startedAt": 1740230460000,
"completedAt": 1740230462000
},
{
"name": "deploy_worker",
"status": "RUNNING",
"result": null,
"startedAt": 1740230465000,
"completedAt": null
}
],
"error": null,
"createdAt": 1740230460000,
"startedAt": 1740230461000,
"completedAt": null
}

9. Observability [Phase 1]
All provisioning operations emit structured logs to Cloudflare Logpush:
// Every step logs at start and end
console.log(
JSON.stringify({
level: "info",
event: "provision_step",
jobId: job.id,
step: step.number,
action: step.action,
status: "started" | "completed" | "failed",
durationMs: elapsed,
cfResource: cfName,
error: err?.message ?? null,
}),
);

Key metrics to alert on:
- Job failure rate > 5% over 1 hour
- Job duration > 120 seconds (P95)
- DLQ depth > 0 (any job hitting DLQ)
- CF API error rate > 1% (watch for permission or quota issues)
- Workers daily deploy count approaching 200 (CF account limit)
10. Data Retention Policy [Phase 1]
This section documents how Cloudflare resources and Registry records are treated when a feature or stack is deactivated. All policies here reflect Phase 1 actual behaviour (grounded in the executor source code). Time-based hot/cold/permanent deletion tiers are not implemented in Phase 1.
Feature Deactivation (DEACTIVATE_FEATURE)
The executeDeactivateFeature executor (executors/deactivate-feature.ts) only deletes the Worker. The D1 database is never deleted, regardless of request parameters:
DEACTIVATE_FEATURE job
├─ Step 1: Mark feature → status: "deactivating" (Registry PATCH)
├─ Step 2: Delete feature Worker via cf.workers.delete() ← only CF resource removed
├─ Step 3: Mark feature → status: "inactive" (Registry PATCH)
└─ Step 4: Trigger shell rebuild (CLI Service)
deleteData flag (Phase 2): The POST /feature/deactivate route accepts deleteData: boolean in the request body, and the DeactivateFeatureSchema validates it. However, the DEACTIVATE_FEATURE executor does not act on this flag in Phase 1 — D1 data is always preserved. deleteData support for standalone features is a Phase 2 addition.
Post-deactivation state:
| Resource | Status after deactivation |
|---|---|
| Feature Worker | Deleted from Cloudflare |
| Feature D1 database | Preserved — still exists in Cloudflare account |
| D1 data | Preserved — accessible via CF Dashboard or direct D1 API |
| Registry feature_activation record | Updated to status: "inactive" |
| Registry resource records (D1, Worker) | Worker resource marked deleted; D1 resource retains its cfId |
There is no automatic expiry or cleanup of preserved D1 databases in Phase 1. They persist until manually deleted via the CF Dashboard or a future deleteData: true implementation.
Stack Deactivation (DEACTIVATE_STACK)
The executeDeactivateStack executor (executors/deactivate-stack.ts) respects the deleteData flag for shared resources:
DEACTIVATE_STACK job
├─ Step 1: Resolve stack instance (Registry GET)
├─ Step 2: Mark stack → status: "deactivating" (Registry PATCH)
├─ Step 3: Enqueue DEACTIVATE_FEATURE sub-job for each feature activation
├─ Step 4: Delete shared CF resources — ONLY if deleteData: true
│ ├─ cf.workers.delete(workerName) if sharedResources.workerName
│ ├─ cf.d1.delete(d1Id) if sharedResources.d1Id
│ ├─ cf.r2.delete(r2Name) if sharedResources.r2Name
│ └─ cf.kv.delete(kvId) if sharedResources.kvId
├─ Step 5: Mark stack → status: "deactivated" (Registry PATCH)
└─ Step 6: Trigger shell rebuild (CLI Service)

Post-deactivation state by deleteData flag:
| Resource | deleteData: false (default) | deleteData: true |
|---|---|---|
| Per-feature Workers | Deleted (by DEACTIVATE_FEATURE sub-jobs) | Deleted |
| Shared stack D1 | Preserved in Cloudflare | Deleted via cf.d1.delete() |
| Shared stack R2 bucket | Preserved in Cloudflare | Deleted via cf.r2.delete() |
| Shared stack KV namespace | Preserved in Cloudflare | Deleted via cf.kv.delete() |
| Stack orchestration Worker | Preserved in Cloudflare | Deleted via cf.workers.delete() |
| Registry stacks record | Updated to status: "deactivated" | Updated to status: "deactivated" |
R2 cost note: Preserved R2 buckets accrue Cloudflare storage costs against the NNO account even after stack deactivation. Operators should monitor for deactivated stacks with large R2 buckets.
Re-activation with Preserved Data
Because D1 databases are preserved after deactivation, re-activating a feature or stack can reconnect to existing data:
- Feature re-activation (ACTIVATE_FEATURE): The executor checks whether a D1 with the expected name already exists (cf.d1.findByName(dbName)) before creating a new one. If found, it reuses the existing database and its data.
- Stack re-activation (PROVISION_STACK): Same idempotency check applies to shared D1, R2, and KV resources — existing resources are reused, not recreated.
Phase 2 Planned Additions
Phase 2 additions. See Provisioning Phase 2 Plan.
11. Wrangler Configuration
# services/provisioning/wrangler.toml
name = "nno-k3m9p2xw7q-provisioning"
main = "src/index.ts"
compatibility_date = "2024-09-13"
compatibility_flags = ["nodejs_compat"]
[triggers]
crons = ["*/15 * * * *"] # ssl-poller: checks pending CF4SaaS SSL issuance
# Production (default — no --env flag)
[[d1_databases]]
binding = "DB"
database_name = "nno-k3m9p2xw7q-provisioning-db"
database_id = "db158316-1fa6-4fbf-a758-bfff40fb0e46"
migrations_dir = "migrations"
[[queues.producers]]
binding = "PROVISION_QUEUE"
queue = "nno-k3m9p2xw7q-provision-queue"
[[queues.consumers]]
queue = "nno-k3m9p2xw7q-provision-queue"
max_batch_size = 1
max_retries = 3
dead_letter_queue = "nno-k3m9p2xw7q-provision-dlq"
[[queues.producers]]
binding = "PROVISION_DLQ"
queue = "nno-k3m9p2xw7q-provision-dlq"
[[analytics_engine_datasets]]
binding = "NNO_METRICS"
dataset = "nno_metrics"
[[kv_namespaces]]
binding = "NNO_PLATFORM_STATUS_KV"
id = "63deeb49457946c6a68a49b23ea4fc5c"
[env.stg]
name = "nno-k3m9p2xw7q-provisioning-stg"
[[env.stg.d1_databases]]
binding = "DB"
database_name = "nno-k3m9p2xw7q-provisioning-db-stg"
database_id = "74b91da9-0e38-4373-bd74-e95ed67dd210"
migrations_dir = "migrations"
[[env.stg.queues.producers]]
binding = "PROVISION_QUEUE"
queue = "nno-k3m9p2xw7q-provision-queue-stg"
[[env.stg.queues.consumers]]
queue = "nno-k3m9p2xw7q-provision-queue-stg"
max_batch_size = 1
max_retries = 3
dead_letter_queue = "nno-k3m9p2xw7q-provision-dlq-stg"
[[env.stg.queues.producers]]
binding = "PROVISION_DLQ"
queue = "nno-k3m9p2xw7q-provision-dlq-stg"
[[env.stg.analytics_engine_datasets]]
binding = "NNO_METRICS"
dataset = "nno_metrics"
[[env.stg.kv_namespaces]]
binding = "NNO_PLATFORM_STATUS_KV"
id = "db49667ca1904af7ba128a02806e7c7e"

Bindings Summary
| Binding | Type | Purpose |
|---|---|---|
| DB | D1 | Provisioning job state and step tracking |
| PROVISION_QUEUE | Queue producer | Enqueues provisioning jobs for async execution |
| PROVISION_DLQ | Queue producer | Dead-letter queue for jobs that exhaust retries |
| NNO_METRICS | Analytics Engine | Structured metrics and observability events |
| NNO_PLATFORM_STATUS_KV | KV namespace | Written by provisioning on state transitions; read by gateway for enforcement. Key format: platform:\{platformId\}:status. Values: "active" \| "suspended" \| "deprovisioned" |
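The NNO_PLATFORM_STATUS_KV contract can be illustrated from the gateway side (a sketch based on the key format and values in the table; the missing-key behaviour shown is an assumption, not the documented gateway behaviour):

```typescript
// Gateway-side sketch of the platform status check. Key format and status
// values come from the bindings table; everything else is illustrative.
type PlatformStatus = "active" | "suspended" | "deprovisioned";

// Minimal KV abstraction standing in for the Workers KV binding.
interface KvLike {
  get(key: string): Promise<string | null>;
}

const platformStatusKey = (platformId: string) => `platform:${platformId}:status`;

/** Gateway-side enforcement: only "active" platforms are allowed through. */
async function isPlatformServable(kv: KvLike, platformId: string): Promise<boolean> {
  const status = (await kv.get(platformStatusKey(platformId))) as PlatformStatus | null;
  // A missing key is treated here as not-yet-provisioned (assumption for the sketch).
  return status === "active";
}
```

Writing status on every provisioning state transition means the gateway can enforce suspension without a Registry round-trip.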
Secrets (per environment)
| Secret | Description |
|---|---|
| CF_API_TOKEN | Cloudflare API token for resource provisioning (see §1.1 for required permissions) |
| CF_ACCOUNT_ID | Cloudflare account ID |
| CF_ZONE_ID | Cloudflare Zone ID — required by the ADD_CUSTOM_DOMAIN executor for CF4SaaS custom hostname registration |
| NNO_REGISTRY_URL | Internal URL of the NNO Registry service |
| NNO_INTERNAL_API_KEY | Shared secret for service-to-service calls to Registry |
| AUTH_API_KEY | Shared secret for inbound service-to-service auth |
| CORS_ORIGINS | Comma-separated allowed origins |
| NNO_AUTH_BUNDLE_URL | URL to pre-built auth Worker JS bundle (CI artifact or R2 public URL) — used by UPGRADE_AUTH_WORKER |
Status: Detailed design — PROVISION_STACK added 2026-02-28
Implementation target: services/provisioning/
Related: NNO Registry · System Architecture · Feature Package SDK · Stacks · Stack Registry