# Likwid Stabilization Megaplan (6061c4)
Stabilize Likwid into a production-usable system by shipping a coherent admin modular-rule management UX (built-in + WASM), making WASM packages production-grade (including background jobs + SSRF-hardening), and converging authorization on roles/permissions.
## Decisions (confirmed)
- Git history: **linear `main` via squash merges**.
- Demo VPS: **tracks `main`**.
- Modular rules approach: **Hybrid** (built-in DB-backed settings + built-in modules + WASM plugin packages).
- Authz direction: **roles/permissions is authoritative**.
- Phase 1 default instance guidance: `instance_type=multi_community`, `platform_mode=admin_only`.
- `plugin_allow_background_jobs`: **full implementation** (end-to-end semantics, not just a stored flag).
- Registry SSRF hardening: **yes** (DNS-aware; do not rely only on host string / IP-literal checks).
- Community admin UX scope: **full** (plugin policy + WASM packages + built-in plugins in one coherent flow).
- Plan handling: **commit plan to `main`** once approved (plan-first discipline).
## Non-negotiable outcomes (Phase 1)
- Operational reliability + install/upgrade story.
- Admin modular rule management UX.
- WASM third-party plugin packages are production-grade.
- End-user UX consistency (avoid confusing partial configuration states).
## Current state (source-backed highlights)
- Backend exposes community endpoints:
  - `GET/PUT /api/communities/{id}/plugin-policy`
  - `GET/POST/PUT /api/communities/{id}/plugin-packages` (+ `install-registry`)
- WASM runtime exists (wasmtime; fuel + timeout + memory limits).
- WASM outbound HTTP is capability-gated and allowlisted.
- Registry allowlist currently:
  - blocks `localhost`
  - blocks IP-literal loopback/private/link-local/unspecified
  - matches exact host or `*.suffix`
  - **does not** do DNS resolution + post-resolution IP classification.
- Frontend currently does not call `plugin-policy` / `plugin-packages` (community UI covers built-in plugins only).
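
The host-string checks listed above can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: the names `is_blocked_ip`, `pattern_matches`, and `host_allowed` are hypothetical, and it assumes that `*.suffix` matches subdomains only (not the bare suffix) and that the host has already been extracted from the URL without brackets. It performs no DNS resolution, which is the gap Phase 2.2 targets.

```rust
use std::net::IpAddr;

/// True when an IP literal is loopback/private/link-local/unspecified.
fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            // fe80::/10 is IPv6 link-local; checked by hand to stay on long-stable std APIs.
            let link_local = (v6.segments()[0] & 0xffc0) == 0xfe80;
            v6.is_loopback() || v6.is_unspecified() || link_local
        }
    }
}

/// Matches one allowlist entry: exact host, or `*.suffix` for subdomains (assumption).
fn pattern_matches(pattern: &str, host: &str) -> bool {
    match pattern.strip_prefix("*.") {
        Some(suffix) => host
            .strip_suffix(suffix)
            .map_or(false, |rest| rest.ends_with('.')),
        None => pattern == host,
    }
}

/// Host-string-only check: no DNS lookup happens here.
fn host_allowed(host: &str, allowlist: &[&str]) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    if host == "localhost" {
        return false;
    }
    if let Ok(ip) = host.parse::<IpAddr>() {
        if is_blocked_ip(ip) {
            return false;
        }
    }
    allowlist
        .iter()
        .any(|p| pattern_matches(&p.to_ascii_lowercase(), &host))
}
```

Under these assumptions, a hostname such as `internal.example.com` that is allowlisted but resolves to `10.0.0.5` would still pass, because only the string form of the host is inspected.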
## Scope boundaries
- No architectural rewrite.
- Minimal, targeted changes per milestone.
- No dependency additions unless clearly required for an explicit acceptance criterion.
## Milestone discipline / verification
- Each milestone lands as a single squash-merge PR to `main`.
- Required verification per milestone:
  - Backend: `cargo check` (and `cargo test` if stable)
  - Frontend: `npm run build`
  - Demo VPS: deploy `main` and run `./scripts/smoke-test.sh demo`
---
## Phase 0 — Baseline + operator invariants (gate)
### Deliverables
- Single authoritative operator workflow for:
  - local dev start/stop
  - demo deploy/update
  - rollback
  - smoke test
- Confirm demo systemd + compose wiring is consistent with docs and scripts.
### Acceptance criteria
- Demo VPS updated to latest `main` and `./scripts/smoke-test.sh demo` passes.
### Verification
- Run smoke test on VPS.
---
## Phase 1 — Admin modular rule management UX (hybrid)
### Goal
One coherent admin flow for:
- community built-in plugins (`plugins` + `community_plugins`)
- community WASM plugin packages (`plugin_packages` + `community_plugin_packages`)
- community plugin policy (`communities.settings` keys)
### Deliverables (frontend)
- Community admin UI adds a “Plugins / Rules” surface that includes:
  - Built-in community plugins management (existing functionality retained).
  - Plugin policy editor (read + update):
    - trust policy
    - install sources
    - registry allowlist
    - trusted publishers
    - outbound HTTP toggle + allowlist
    - background jobs toggle
  - WASM package manager:
    - list installed packages
    - upload package
    - install from registry URL
    - activate/deactivate
    - edit package settings (schema-driven when available; raw JSON fallback)
- Clear error messages for:
  - policy forbids uploads / registry installs
  - signature requirements fail
  - registry allowlist blocks URL
### Deliverables (backend contract hardening)
- Ensure API error responses are stable and actionable for UI (status code + message consistency).
- Ensure event emission for key actions is consistent (`public_events`).
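
One way to keep error responses stable for the UI is to route every policy failure through a single type that fixes the status code, a machine-readable code, and the message. The sketch below is illustrative only: the type `PackageInstallError`, its variants, the string codes, and the choice of 403/422 are all assumptions, not the backend's existing contract.

```rust
/// Hypothetical error type covering the failure cases the UI must explain.
#[derive(Debug, Clone, PartialEq)]
enum PackageInstallError {
    UploadsForbidden,
    RegistryInstallsForbidden,
    SignatureInvalid,
    RegistryHostBlocked { host: String },
}

impl PackageInstallError {
    /// HTTP status for the response (the specific codes are assumptions).
    fn status(&self) -> u16 {
        match self {
            Self::UploadsForbidden
            | Self::RegistryInstallsForbidden
            | Self::RegistryHostBlocked { .. } => 403,
            Self::SignatureInvalid => 422,
        }
    }

    /// Stable machine-readable code the frontend can switch on.
    fn code(&self) -> &'static str {
        match self {
            Self::UploadsForbidden => "uploads_forbidden",
            Self::RegistryInstallsForbidden => "registry_installs_forbidden",
            Self::SignatureInvalid => "signature_invalid",
            Self::RegistryHostBlocked { .. } => "registry_host_blocked",
        }
    }

    /// Human-readable message shown to the community admin.
    fn message(&self) -> String {
        match self {
            Self::UploadsForbidden => {
                "Community policy does not allow package uploads.".to_string()
            }
            Self::RegistryInstallsForbidden => {
                "Community policy does not allow registry installs.".to_string()
            }
            Self::SignatureInvalid => {
                "Package signature is missing or does not verify.".to_string()
            }
            Self::RegistryHostBlocked { host } => {
                format!("Registry host '{}' is not permitted by the allowlist.", host)
            }
        }
    }
}
```

Keeping the mapping in one place means a handler cannot return a different status or wording for the same failure, which is the consistency the UI depends on.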
### Acceptance criteria
- As community admin/moderator:
  - can view/update plugin policy
  - can install a WASM package (upload + registry) when policy allows
  - can activate/deactivate packages
  - can edit package settings and receive server-side schema validation errors when invalid
### Verification
- Manual UI walkthrough covering:
  - signed-only policy
  - registry allowlist allow/deny
  - outbound HTTP allowlist allow/deny
  - background jobs on/off behavior (see Phase 2 definition)
---
## Phase 2 — WASM packages production-grade hardening
### 2.1 Background jobs: **full implementation**
#### Proposed semantics (must be implemented consistently)
- `plugin_allow_background_jobs=false` means:
  - WASM plugins must **not** be invoked for cron hooks for that community.
  - Any future “scheduled” behavior for WASM must be gated by the same setting.
- `plugin_allow_background_jobs=true` means:
  - WASM plugins may receive cron hooks they declare in their manifest (e.g. `cron.minute`, `cron.hourly`, `cron.daily`, ...).
#### Implementation outline (expected code touchpoints)
- Identify where WASM cron hooks are dispatched (a cron loop currently exists in `backend/src/main.rs` and invokes `PluginManager::do_wasm_action_for_community`).
- Add a community-settings check (`communities.settings.plugin_allow_background_jobs`) in the WASM cron dispatch path.
- Ensure policy API default behavior is explicit and safe:
  - confirm default is false (current parse default is false).
#### Acceptance criteria
- When `plugin_allow_background_jobs=false`, WASM cron hooks are not executed for that community.
- When `plugin_allow_background_jobs=true`, WASM cron hooks execute normally.
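
The gating semantics above can be sketched as a small pure function placed in front of the cron dispatch. This is a hedged illustration: `CommunitySettings`, `background_jobs_allowed`, and `should_dispatch` are hypothetical names, the settings are modelled as a plain map rather than the real `communities.settings` JSON, and treating non-cron hooks as unaffected by the flag is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the parsed `communities.settings` object.
type CommunitySettings = HashMap<String, bool>;

/// Missing key means "not allowed": the safe default the plan asks to confirm.
fn background_jobs_allowed(settings: &CommunitySettings) -> bool {
    settings
        .get("plugin_allow_background_jobs")
        .copied()
        .unwrap_or(false)
}

/// Cron hooks are recognised by their `cron.` prefix (e.g. `cron.hourly`).
fn is_cron_hook(hook: &str) -> bool {
    hook.starts_with("cron.")
}

/// Decide whether a WASM plugin should receive `hook` for this community.
fn should_dispatch(settings: &CommunitySettings, declared_hooks: &[&str], hook: &str) -> bool {
    // A plugin only ever receives hooks it declared in its manifest.
    if !declared_hooks.iter().any(|h| *h == hook) {
        return false;
    }
    // Cron hooks additionally require the community-level flag.
    if is_cron_hook(hook) {
        return background_jobs_allowed(settings);
    }
    // Assumption: non-cron hooks are not affected by this flag.
    true
}
```

Checking the flag inside the dispatch path, rather than at install time, means toggling the policy takes effect on the next cron tick without touching installed packages.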
### 2.2 Registry install SSRF hardening (DNS-aware)
#### Goal
Registry install should not be able to reach internal/private addresses via DNS rebinding or private resolution.
#### Deliverables
- Extend registry allowlist enforcement to:
  - resolve DNS for hostname-based registry URLs
  - reject if any resolved IP is loopback/private/link-local/unspecified/unique-local (IPv6)
- Keep existing protections:
  - reject `localhost`
  - reject IP-literal private/loopback
  - enforce allowlist patterns
#### Acceptance criteria
- Registry install is blocked when a hostname resolves to a private/loopback/link-local address.
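
A minimal sketch of the post-resolution check, using only the standard library. The function names `is_internal_ip`, `vet_resolved`, and `resolve_and_vet` are hypothetical, and the blocking resolver in `std::net` is used purely for illustration; an async backend would likely resolve differently.

```rust
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

/// Classify an address as internal (loopback/private/link-local/unspecified/unique-local).
fn is_internal_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            // Treat IPv4-mapped addresses (::ffff:a.b.c.d) like the embedded IPv4 address.
            if let Some(mapped) = v6.to_ipv4_mapped() {
                return is_internal_ip(IpAddr::V4(mapped));
            }
            let first = v6.segments()[0];
            let link_local = (first & 0xffc0) == 0xfe80; // fe80::/10
            let unique_local = (first & 0xfe00) == 0xfc00; // fc00::/7
            v6.is_loopback() || v6.is_unspecified() || link_local || unique_local
        }
    }
}

/// Reject the whole result set if any resolved address is internal.
fn vet_resolved(addrs: &[SocketAddr]) -> Result<Vec<SocketAddr>, String> {
    if addrs.is_empty() {
        return Err("host did not resolve to any address".to_string());
    }
    for addr in addrs {
        if is_internal_ip(addr.ip()) {
            return Err(format!("host resolves to internal address {}", addr.ip()));
        }
    }
    Ok(addrs.to_vec())
}

/// Resolve `host:port` and vet every returned address.
fn resolve_and_vet(host: &str, port: u16) -> Result<Vec<SocketAddr>, String> {
    let addrs: Vec<SocketAddr> = (host, port)
        .to_socket_addrs()
        .map_err(|e| e.to_string())?
        .collect();
    vet_resolved(&addrs)
}
```

To guard against DNS rebinding, the subsequent fetch should connect to one of the vetted addresses rather than resolving the hostname a second time; otherwise the name could resolve differently between the check and the request. With `reqwest`, pinning via `ClientBuilder::resolve` is one possible approach, though whether it fits the existing fetch path is untested here.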
### 2.3 Registry fetch hardening (timeouts/size caps)
#### Deliverables
- Add explicit timeout and size bounds to registry bundle fetch.
  - Current code path uses `reqwest::get(...)` without explicit timeout/size cap.
#### Acceptance criteria
- Registry fetch cannot hang indefinitely.
- Registry fetch cannot load an unbounded payload.
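
The size bound can be illustrated independently of the HTTP client: read through a capped reader and fail as soon as the cap is exceeded. The helper name `read_bounded` and the error text are hypothetical, and the sketch works on any `std::io::Read`; wiring it to the actual response body is left as an assumption.

```rust
use std::io::{self, Read};

/// Read at most `max_bytes` from `reader`; error out if the source is larger.
fn read_bounded<R: Read>(reader: R, max_bytes: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one extra byte so an oversized payload is detectable
    // without ever buffering more than max_bytes + 1 bytes.
    reader
        .take(max_bytes.saturating_add(1))
        .read_to_end(&mut buf)?;
    if buf.len() as u64 > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "registry bundle exceeds size cap",
        ));
    }
    Ok(buf)
}
```

For the timeout half, building a client with `reqwest::Client::builder().timeout(...)` instead of calling `reqwest::get(...)` is the usual route; a `Content-Length` check alone would not be sufficient for the size cap, since that header can be absent or wrong.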
### 2.4 Operator-visible metadata
#### Deliverables
- UI shows package metadata:
  - publisher
  - sha256
  - signature present
  - source (upload/registry)
  - registry URL
  - manifest-declared hooks + capabilities
  - effective outbound HTTP permission status
---
## Phase 3 — Authz convergence (roles/permissions authoritative)
### Goal
Stop using `community_members.role` as the primary enforcement mechanism for privileged actions.
### Deliverables
- Inventory endpoints that currently use `ensure_admin_or_moderator` style gates.
- Introduce/confirm permissions for:
  - managing plugin policy
  - managing plugin packages
  - managing community plugin settings
- Migrate gates to permission checks consistently.
### Acceptance criteria
- Plugin policy + package management endpoints authorize via roles/permissions.
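
As a rough picture of what a permission-based gate looks like compared with a role-name check, consider the sketch below. Everything in it is hypothetical: the `Permission` variants, the `Role` struct, and the `require` helper are illustrative names, not the existing schema or the `ensure_admin_or_moderator` implementation.

```rust
use std::collections::HashSet;

/// Hypothetical permissions matching the three areas listed in the deliverables.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Permission {
    ManagePluginPolicy,
    ManagePluginPackages,
    ManagePluginSettings,
}

/// Hypothetical role: a named bundle of permissions.
struct Role {
    name: &'static str,
    permissions: HashSet<Permission>,
}

/// True if any of the user's roles grants the needed permission.
fn has_permission(roles: &[Role], needed: Permission) -> bool {
    roles.iter().any(|r| r.permissions.contains(&needed))
}

/// Gate for a handler: the decision depends on permissions, never on role names.
fn require(roles: &[Role], needed: Permission) -> Result<(), String> {
    if has_permission(roles, needed) {
        Ok(())
    } else {
        Err(format!("missing permission: {:?}", needed))
    }
}
```

With this shape, a community could grant a moderator role `ManagePluginSettings` without `ManagePluginPolicy`, a distinction a single admin-or-moderator gate cannot express.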
---
## Phase 4 — Technical debt hotspot inventory + targeted fixes
### Deliverables
- Evidence-backed inventory (file/function-level) of:
  - cross-layer coupling hotspots
  - duplicated policy parsing/enforcement
  - plugin plane confusion (instance defaults vs community plugins vs wasm packages)
  - any unstable areas discovered during Phases 1–3
- Only fix hotspots that block Phases 1–3 acceptance criteria.
---
## Commit plan (after this plan is approved)
- Add this plan into the repo under `.windsurf/plans/` and commit to `main`.
- Implementation starts only after the plan commit lands.