Add stabilization megaplan (6061c4)

Marco Allegretti 2026-03-05 12:56:35 +01:00
parent 84bedd43dd
commit 7273532b30

# Likwid Stabilization Megaplan (6061c4)
Stabilize Likwid into a production-usable system by shipping a coherent admin modular-rule management UX (built-in + WASM), making WASM packages production-grade (including background jobs + SSRF-hardening), and converging authorization on roles/permissions.
## Decisions (confirmed)
- Git history: **linear `main` via squash merges**.
- Demo VPS: **tracks `main`**.
- Modular rules approach: **Hybrid** (built-in DB-backed settings + built-in modules + WASM plugin packages).
- Authz direction: **roles/permissions is authoritative**.
- Phase 1 default instance guidance: `instance_type=multi_community`, `platform_mode=admin_only`.
- `plugin_allow_background_jobs`: **full implementation** (end-to-end semantics, not just a stored flag).
- Registry SSRF hardening: **yes** (DNS-aware; do not rely only on host string / IP-literal checks).
- Community admin UX scope: **full** (plugin policy + WASM packages + built-in plugins in one coherent flow).
- Plan handling: **commit plan to `main`** once approved (plan-first discipline).
## Non-negotiable outcomes (Phase 1)
- Operational reliability + install/upgrade story.
- Admin modular rule management UX.
- WASM third-party plugin packages are production-grade.
- End-user UX consistency (avoid confusing partial configuration states).
## Current state (source-backed highlights)
- Backend exposes community endpoints:
  - `GET/PUT /api/communities/{id}/plugin-policy`
  - `GET/POST/PUT /api/communities/{id}/plugin-packages` (+ `install-registry`)
- WASM runtime exists (wasmtime; fuel + timeout + memory limits).
- WASM outbound HTTP is capability-gated and allowlisted.
- Registry allowlist currently:
  - blocks `localhost`
  - blocks IP-literal loopback/private/link-local/unspecified
  - matches exact host or `*.suffix`
  - **does not** do DNS resolution + post-resolution IP classification.
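
The string-level checks listed above can be sketched as follows. This is a minimal illustration, not the actual Likwid code: the function names are hypothetical, and whether `*.suffix` also matches the bare suffix is an assumption (here it does not). The sketch shows why a hostname that resolves to a private address still passes: nothing is resolved.

```rust
use std::net::IpAddr;

/// Returns true if `host` matches an allowlist pattern: an exact host
/// or a `*.suffix` wildcard (assumed matching rules, for illustration).
fn host_matches_pattern(host: &str, pattern: &str) -> bool {
    let host = host.to_ascii_lowercase();
    let pattern = pattern.to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        // `*.example.org` matches `pkg.example.org` but not `example.org`
        // or `evilexample.org`: a non-empty label plus a dot must remain.
        Some(suffix) => host
            .strip_suffix(suffix)
            .map_or(false, |rest| rest.ends_with('.') && rest.len() > 1),
        None => host == pattern,
    }
}

/// String-level checks only: no DNS resolution happens here.
fn is_blocked_literal(host: &str) -> bool {
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    match host.parse::<IpAddr>() {
        Ok(IpAddr::V4(ip)) => {
            ip.is_loopback() || ip.is_private() || ip.is_link_local() || ip.is_unspecified()
        }
        Ok(IpAddr::V6(ip)) => ip.is_loopback() || ip.is_unspecified(),
        // A hostname is not an IP literal, so it passes this check
        // regardless of what it would resolve to.
        Err(_) => false,
    }
}

/// Combined check: not a blocked literal and matched by some pattern.
fn registry_host_allowed(host: &str, allowlist: &[&str]) -> bool {
    !is_blocked_literal(host) && allowlist.iter().any(|p| host_matches_pattern(host, p))
}
```

In this sketch `registry_host_allowed("internal.example.org", &["*.example.org"])` returns true even if that name resolves to `10.0.0.5`, which is the gap Phase 2.2 targets.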
- Frontend currently does not call `plugin-policy` / `plugin-packages` (community UI covers built-in plugins only).
## Scope boundaries
- No architectural rewrite.
- Minimal, targeted changes per milestone.
- No dependency additions unless clearly required for an explicit acceptance criterion.
## Milestone discipline / verification
- Each milestone lands as a single squash-merge PR to `main`.
- Required verification per milestone:
  - Backend: `cargo check` (and `cargo test` if stable)
  - Frontend: `npm run build`
  - Demo VPS: deploy `main` and run `./scripts/smoke-test.sh demo`
---
## Phase 0 — Baseline + operator invariants (gate)
### Deliverables
- Single authoritative operator workflow for:
  - local dev start/stop
  - demo deploy/update
  - rollback
  - smoke test
- Confirm demo systemd + compose wiring is consistent with docs and scripts.
### Acceptance criteria
- Demo VPS updated to latest `main` and `./scripts/smoke-test.sh demo` passes.
### Verification
- Run smoke test on VPS.
---
## Phase 1 — Admin modular rule management UX (hybrid)
### Goal
One coherent admin flow for:
- community built-in plugins (`plugins` + `community_plugins`)
- community WASM plugin packages (`plugin_packages` + `community_plugin_packages`)
- community plugin policy (`communities.settings` keys)
### Deliverables (frontend)
- Community admin UI adds a “Plugins / Rules” surface that includes:
  - Built-in community plugins management (existing functionality retained).
  - Plugin policy editor (read + update):
    - trust policy
    - install sources
    - registry allowlist
    - trusted publishers
    - outbound HTTP toggle + allowlist
    - background jobs toggle
  - WASM package manager:
    - list installed packages
    - upload package
    - install from registry URL
    - activate/deactivate
    - edit package settings (schema-driven when available; raw JSON fallback)
- Clear error messages for:
  - policy forbids uploads / registry installs
  - signature requirements fail
  - registry allowlist blocks URL
### Deliverables (backend contract hardening)
- Ensure API error responses are stable and actionable for UI (status code + message consistency).
- Ensure event emission for key actions is consistent (`public_events`).
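
One way to keep error responses stable for the UI is to give each failure a fixed machine-readable code alongside the status and message. The sketch below assumes hypothetical error kinds, status codes, and code strings; none of them are taken from the current backend.

```rust
/// Hypothetical error kinds for plugin package installs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstallError {
    UploadsForbidden,
    RegistryInstallsForbidden,
    SignatureRequired,
    RegistryNotAllowlisted,
}

impl InstallError {
    /// HTTP status the UI can branch on (assumed values).
    fn status(self) -> u16 {
        match self {
            InstallError::UploadsForbidden
            | InstallError::RegistryInstallsForbidden
            | InstallError::RegistryNotAllowlisted => 403,
            InstallError::SignatureRequired => 422,
        }
    }

    /// Stable code: the message wording may change, the code should not.
    fn code(self) -> &'static str {
        match self {
            InstallError::UploadsForbidden => "plugin_uploads_forbidden",
            InstallError::RegistryInstallsForbidden => "registry_installs_forbidden",
            InstallError::SignatureRequired => "signature_required",
            InstallError::RegistryNotAllowlisted => "registry_not_allowlisted",
        }
    }

    /// Human-readable message shown to the admin.
    fn message(self) -> &'static str {
        match self {
            InstallError::UploadsForbidden => "Community policy forbids package uploads.",
            InstallError::RegistryInstallsForbidden => "Community policy forbids registry installs.",
            InstallError::SignatureRequired => "Package must carry a valid signature.",
            InstallError::RegistryNotAllowlisted => "Registry URL is not on the community allowlist.",
        }
    }

    /// Response body; messages contain no characters that need JSON escaping.
    fn to_json(self) -> String {
        format!(
            r#"{{"code":"{}","message":"{}"}}"#,
            self.code(),
            self.message()
        )
    }
}
```

With a fixed `code`, the frontend can map each failure listed under "Clear error messages" to its own UI text without parsing the message string.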
### Acceptance criteria
- As community admin/moderator:
  - can view/update plugin policy
  - can install a WASM package (upload + registry) when policy allows
  - can activate/deactivate packages
  - can edit package settings and receive server-side schema validation errors when invalid
### Verification
- Manual UI walkthrough covering:
  - signed-only policy
  - registry allowlist allow/deny
  - outbound HTTP allowlist allow/deny
  - background jobs on/off behavior (see Phase 2 definition)
---
## Phase 2 — WASM packages production-grade hardening
### 2.1 Background jobs: **full implementation**
#### Proposed semantics (must be implemented consistently)
- `plugin_allow_background_jobs=false` means:
  - WASM plugins must **not** be invoked for cron hooks for that community.
  - Any future “scheduled” behavior for WASM must be gated by the same setting.
- `plugin_allow_background_jobs=true` means:
  - WASM plugins may receive cron hooks they declare in their manifest (e.g. `cron.minute`, `cron.hourly`, `cron.daily`, ...).
#### Implementation outline (expected code touchpoints)
- Locate where WASM cron hooks are dispatched (a cron loop currently exists in `backend/src/main.rs` and invokes `PluginManager::do_wasm_action_for_community`).
- Add a community-settings check (`communities.settings.plugin_allow_background_jobs`) in the WASM cron dispatch path.
- Ensure policy API default behavior is explicit and safe:
  - confirm default is false (current parse default is false).
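
The gate described in this outline can be sketched as a pure function. `CommunityPolicy` and `should_dispatch_wasm_cron` are hypothetical names for illustration; the real check would sit in the cron dispatch path and read the flag from `communities.settings`.

```rust
/// Subset of the community plugin policy (hypothetical struct).
struct CommunityPolicy {
    /// `None` when the key is absent from `communities.settings`.
    plugin_allow_background_jobs: Option<bool>,
}

impl CommunityPolicy {
    /// A missing key is treated as `false`, the safe default.
    fn background_jobs_allowed(&self) -> bool {
        self.plugin_allow_background_jobs.unwrap_or(false)
    }
}

/// A WASM cron hook runs only if the community allows background jobs
/// AND the plugin manifest declares that hook.
fn should_dispatch_wasm_cron(
    policy: &CommunityPolicy,
    declared_hooks: &[&str],
    hook: &str,
) -> bool {
    policy.background_jobs_allowed() && declared_hooks.contains(&hook)
}
```

Keeping the decision in one function means any future scheduled behavior for WASM can reuse the same gate instead of re-reading the setting.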
#### Acceptance criteria
- When `plugin_allow_background_jobs=false`, WASM cron hooks are not executed for that community.
- When `plugin_allow_background_jobs=true`, WASM cron hooks execute normally.
### 2.2 Registry install SSRF hardening (DNS-aware)
#### Goal
Registry install must not be able to reach internal/private addresses, whether via DNS rebinding or via a hostname that simply resolves to a private address.
#### Deliverables
- Extend registry allowlist enforcement to:
  - resolve DNS for hostname-based registry URLs
  - reject if any resolved IP is loopback/private/link-local/unspecified/unique-local (IPv6)
- Keep existing protections:
  - reject `localhost`
  - reject IP-literal private/loopback
  - enforce allowlist patterns
#### Acceptance criteria
- Registry install is blocked when a hostname resolves to a private/loopback/link-local address.
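
The post-resolution classification could look roughly like this. It is a sketch using only the standard library, with hypothetical function names; the resolution step itself (for example via `std::net::ToSocketAddrs` or an async resolver) is left out, and the function takes the already-resolved addresses. Treating an empty result as a rejection is an assumption.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn is_forbidden_v4(ip: Ipv4Addr) -> bool {
    ip.is_loopback() || ip.is_private() || ip.is_link_local() || ip.is_unspecified()
}

fn is_forbidden_v6(ip: Ipv6Addr) -> bool {
    // An IPv4-mapped address (::ffff:a.b.c.d) is judged by its IPv4 form.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_forbidden_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique-local, fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
}

/// True for loopback/private/link-local/unspecified/unique-local addresses.
fn is_forbidden_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_forbidden_v4(v4),
        IpAddr::V6(v6) => is_forbidden_v6(v6),
    }
}

/// Reject if resolution returned nothing or if ANY address is forbidden.
fn resolved_ips_allowed(ips: &[IpAddr]) -> bool {
    !ips.is_empty() && !ips.iter().any(|ip| is_forbidden_ip(*ip))
}
```

Checking every resolved address (not just the first) matters because a hostname can return a mix of public and private records. To narrow the rebinding window further, the fetch would ideally connect to an address that passed this check rather than resolve the name a second time.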
### 2.3 Registry fetch hardening (timeouts/size caps)
#### Deliverables
- Add explicit timeout and size bounds to registry bundle fetch.
  - Current code path uses `reqwest::get(...)` without explicit timeout/size cap.
#### Acceptance criteria
- Registry fetch cannot hang indefinitely.
- Registry fetch cannot load an unbounded payload.
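
The size cap can be illustrated independently of the HTTP client. The sketch below bounds any `std::io::Read` source; `read_capped` is a hypothetical helper, and how it would be wired into the actual fetch (for example, streaming the response body and pairing it with a client-level timeout such as reqwest's `ClientBuilder::timeout`) is left as an assumption.

```rust
use std::io::{self, Read};

/// Reads at most `max_bytes` from `reader`; errors if the source holds more.
fn read_capped<R: Read>(reader: R, max_bytes: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one extra byte so an oversized payload is detectable
    // without ever buffering more than `max_bytes + 1` bytes.
    reader
        .take(max_bytes.saturating_add(1))
        .read_to_end(&mut buf)?;
    if buf.len() as u64 > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "registry bundle exceeds size cap",
        ));
    }
    Ok(buf)
}
```

Failing on oversize, instead of silently truncating, avoids installing a partial bundle whose hash or signature check would then fail in a less obvious way.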
### 2.4 Operator-visible metadata
#### Deliverables
- UI shows package metadata:
  - publisher
  - sha256
  - signature present
  - source (upload/registry)
  - registry URL
  - manifest-declared hooks + capabilities
  - effective outbound HTTP permission status
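
As a sketch of how this metadata and the "effective" status might be modeled: the struct, field names, and the `outbound_http` capability string below are all hypothetical, and the rule that the effective status is "package requests it and community policy allows it" is an assumption about the intended semantics.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PackageSource {
    Upload,
    Registry,
}

/// Operator-visible package metadata (hypothetical shape).
struct PackageMetadata {
    publisher: Option<String>,
    sha256: String,
    signature_present: bool,
    source: PackageSource,
    registry_url: Option<String>,
    hooks: Vec<String>,
    capabilities: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum OutboundHttpStatus {
    /// The manifest does not declare the capability.
    NotRequested,
    /// Declared by the package, but the community policy toggle is off.
    BlockedByPolicy,
    /// Declared and permitted (still subject to the HTTP allowlist).
    Allowed,
}

/// Combines the manifest capability with the community policy toggle.
fn effective_outbound_http(pkg: &PackageMetadata, policy_allows_http: bool) -> OutboundHttpStatus {
    let requested = pkg.capabilities.iter().any(|c| c == "outbound_http");
    match (requested, policy_allows_http) {
        (false, _) => OutboundHttpStatus::NotRequested,
        (true, false) => OutboundHttpStatus::BlockedByPolicy,
        (true, true) => OutboundHttpStatus::Allowed,
    }
}
```

Showing three states rather than a boolean lets the UI distinguish "this package never asked for network access" from "this package asked and was denied".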
---
## Phase 3 — Authz convergence (roles/permissions authoritative)
### Goal
Stop using `community_members.role` as the primary enforcement mechanism for privileged actions.
### Deliverables
- Inventory endpoints that currently use `ensure_admin_or_moderator` style gates.
- Introduce/confirm permissions for:
  - managing plugin policy
  - managing plugin packages
  - managing community plugin settings
- Migrate gates to permission checks consistently.
### Acceptance criteria
- Plugin policy + package management endpoints authorize via roles/permissions.
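
A permission-based gate, as opposed to a role-name check, can be sketched like this. The `Permission` variants, `RoleTable`, and role names are hypothetical; the actual roles/permissions schema in Likwid may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical permissions for the three areas listed in the deliverables.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Permission {
    ManagePluginPolicy,
    ManagePluginPackages,
    ManagePluginSettings,
}

/// Maps role names to the permissions they grant.
struct RoleTable {
    grants: HashMap<String, HashSet<Permission>>,
}

impl RoleTable {
    /// True if any of the user's roles grants `needed`.
    /// Unknown role names grant nothing.
    fn has_permission(&self, user_roles: &[&str], needed: Permission) -> bool {
        user_roles.iter().any(|role| {
            self.grants
                .get(*role)
                .map_or(false, |perms| perms.contains(&needed))
        })
    }
}
```

An endpoint then asks "does this user hold `ManagePluginPolicy`?" instead of "is this user an admin or moderator?", so changing which roles may manage plugins becomes a data change rather than a code change.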
---
## Phase 4 — Technical debt hotspot inventory + targeted fixes
### Deliverables
- Evidence-backed inventory (file/function-level) of:
  - cross-layer coupling hotspots
  - duplicated policy parsing/enforcement
  - plugin plane confusion (instance defaults vs community plugins vs WASM packages)
  - any unstable areas discovered during Phases 1-3
- Only fix hotspots that block Phases 1-3 acceptance criteria.
---
## Commit plan (after this plan is approved)
- Add this plan into the repo under `.windsurf/plans/` and commit to `main`.
- Implementation starts only after the plan commit lands.