karapace/docs/store-spec.md
Marco Allegretti 5306963cce docs: comprehensive public documentation
2026-02-22 18:38:41 +01:00

# Karapace Store Format Specification (v2)
## Overview
The Karapace store is a content-addressable filesystem structure that holds all environment data: objects, layers, metadata, environment directories, and crash recovery state.
## Directory Layout
```
<store_root>/
  store/
    version              # JSON: { "format_version": 2 }
    .lock                # flock(2) file for exclusive access
    objects/<hash>       # Content-addressable blobs (blake3)
    layers/<hash>        # Layer manifests (JSON)
    metadata/<env_id>    # Environment metadata (JSON)
    staging/             # Temporary workspace for atomic operations
    wal/                 # Write-ahead log entries (JSON)
  env/
    <env_id>/
      upper/             # Writable overlay layer (fuse-overlayfs upperdir)
      lower -> ...       # Symlink to base image rootfs
      work/              # Overlay workdir (ephemeral)
      merged/            # Overlay mount point
  images/
    <cache_key>/
      rootfs/            # Extracted base image rootfs
```
## Format Version
- Current version: **2**
- Stored in `store/version` as JSON.
- Checked on every store access; mismatches are rejected.
- Version 1 stores are not auto-migrated; a clean rebuild is required.
## Objects
- Keyed by blake3 hex digest of their content.
- Written atomically: write to tempfile, then rename.
- Integrity verified on every read: content re-hashed and compared to filename.
- Idempotent: writing the same content twice is a no-op.
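The four properties above can be sketched with the standard library alone. This is an illustration, not Karapace's implementation: `ObjectStore`, `put`, `get`, and the temporary-file naming are hypothetical, and `fnv1a_hex` is a non-cryptographic stand-in for blake3, which is not part of `std`.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::PathBuf;

/// Stand-in digest (FNV-1a, 64-bit). NOT blake3; used only so the sketch
/// compiles without external crates.
pub fn fnv1a_hex(data: &[u8]) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    format!("{:016x}", h)
}

/// Hypothetical content-addressable store over a single directory.
pub struct ObjectStore {
    dir: PathBuf,
    hash_hex: fn(&[u8]) -> String,
}

impl ObjectStore {
    pub fn new(dir: PathBuf, hash_hex: fn(&[u8]) -> String) -> Self {
        Self { dir, hash_hex }
    }

    /// Store `content` under its digest and return the digest.
    pub fn put(&self, content: &[u8]) -> io::Result<String> {
        let key = (self.hash_hex)(content);
        let dest = self.dir.join(&key);
        if dest.exists() {
            return Ok(key); // idempotent: same content, same key, nothing to write
        }
        // Write to a temporary name in the same directory, then rename.
        let tmp = self.dir.join(format!(".tmp-{}-{}", std::process::id(), key));
        let mut f = fs::File::create(&tmp)?;
        f.write_all(content)?;
        f.flush()?;
        drop(f);
        fs::rename(&tmp, &dest)?;
        Ok(key)
    }

    /// Read an object and verify that its content still matches its file name.
    pub fn get(&self, key: &str) -> io::Result<Vec<u8>> {
        let content = fs::read(self.dir.join(key))?;
        if (self.hash_hex)(&content) != key {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "object hash mismatch",
            ));
        }
        Ok(content)
    }
}
```

Taking the digest function as a parameter keeps the sketch dependency-free; a real store would pass a blake3 hex digest instead.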
## Layers
Each layer is a JSON manifest:
```json
{
  "hash": "<layer_hash>",
  "kind": "Base" | "Dependency" | "Policy" | "Snapshot",
  "parent": "<parent_hash>" | null,
  "object_refs": ["<hash>", ...],
  "read_only": true,
  "tar_hash": "<blake3_hash>"
}
```
- `tar_hash` (v2): blake3 hash of the deterministic tar archive stored in the object store.
- Base layers have no parent. Their `hash` equals their `tar_hash`.
- Dependency layers reference a base parent.
- Snapshot layers are created by `commit`. Their `hash` is a composite identity: `blake3("snapshot:{env_id}:{base_layer}:{tar_hash}")` to prevent collision with base layers.
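The two identity rules above (base layers reuse `tar_hash`, snapshot layers hash a prefixed composite string) could look roughly like this. The function names are hypothetical, and the digest is passed in as `hash_hex` because blake3 is not in the standard library.

```rust
/// Build the snapshot identity preimage exactly as the spec writes it:
/// "snapshot:{env_id}:{base_layer}:{tar_hash}".
pub fn snapshot_preimage(env_id: &str, base_layer: &str, tar_hash: &str) -> String {
    format!("snapshot:{}:{}:{}", env_id, base_layer, tar_hash)
}

/// A base layer's identity is its tar hash, unchanged.
pub fn base_layer_hash(tar_hash: &str) -> String {
    tar_hash.to_string()
}

/// A snapshot layer's identity is the digest of the composite preimage.
/// `hash_hex` stands in for a blake3 hex digest (assumption of this sketch).
pub fn snapshot_layer_hash(
    env_id: &str,
    base_layer: &str,
    tar_hash: &str,
    hash_hex: fn(&[u8]) -> String,
) -> String {
    hash_hex(snapshot_preimage(env_id, base_layer, tar_hash).as_bytes())
}
```

Because the preimage carries the `snapshot:` prefix and the environment ID, a snapshot whose tar archive is byte-identical to a base layer's archive still gets a different layer hash.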
## Metadata
Each environment has a JSON metadata file:
```json
{
  "env_id": "...",
  "short_id": "...",
  "name": "my-env",
  "state": "Defined" | "Built" | "Running" | "Frozen" | "Archived",
  "manifest_hash": "<object_hash>",
  "base_layer": "<layer_hash>",
  "dependency_layers": ["<hash>", ...],
  "policy_layer": null | "<hash>",
  "created_at": "RFC3339",
  "updated_at": "RFC3339",
  "ref_count": 1
}
```
- `name` is optional (`#[serde(default)]`). Old metadata without this field deserializes correctly.
## Atomic Write Contract
All writes follow the pattern:
1. Create `NamedTempFile` in the target directory.
2. Write full content.
3. `flush()`.
4. `persist()` (atomic rename).
This ensures no partial files are visible.
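A minimal sketch of the same four steps using only the standard library. The spec names `NamedTempFile` and `persist()`; here a manually named temporary file and `fs::rename` play those roles. The helper name `atomic_write` and the temporary-file naming scheme are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `content` to `dest` so that readers see either the previous file
/// or the complete new one, never a partial write.
pub fn atomic_write(dest: &Path, content: &[u8]) -> io::Result<()> {
    let dir = dest.parent().unwrap_or_else(|| Path::new("."));
    let name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;
    // 1. Temporary file in the target directory, so the rename stays on one filesystem.
    let tmp = dir.join(format!(
        ".{}.tmp-{}",
        name.to_string_lossy(),
        std::process::id()
    ));
    let result = (|| -> io::Result<()> {
        let mut f = File::create(&tmp)?;
        f.write_all(content)?; // 2. write full content
        f.flush()?; // 3. flush
        fs::rename(&tmp, dest) // 4. atomic rename into place
    })();
    if result.is_err() {
        // Best-effort cleanup so a failed write leaves no stray temp file.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

Creating the temporary file next to the destination matters: a rename is only atomic within a single filesystem.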
## Garbage Collection
- Environments with `ref_count == 0` and state not in {`Running`, `Archived`} are eligible for collection.
- Layers not referenced by any live environment are orphaned.
- Objects not referenced by any live layer or live metadata (`manifest_hash`) are orphaned.
- GC never deletes running or archived environments.
- GC supports graceful cancellation via signal handler (`SIGINT`/`SIGTERM`).
- `--dry-run` reports what would be removed without acting.
- The caller must hold the store lock before running GC.
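The eligibility and orphan rules above can be expressed as two small functions. This sketch makes assumptions the spec does not state: the type and function names are hypothetical, a "live" environment is taken to mean one that is not eligible for collection, and layer parent chains are ignored for brevity.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EnvState {
    Defined,
    Built,
    Running,
    Frozen,
    Archived,
}

/// Subset of the metadata fields that GC needs (hypothetical struct).
pub struct EnvMeta {
    pub env_id: String,
    pub state: EnvState,
    pub base_layer: String,
    pub dependency_layers: Vec<String>,
    pub policy_layer: Option<String>,
    pub ref_count: u32,
}

/// Eligible for collection: unreferenced and neither running nor archived.
pub fn is_collectable(env: &EnvMeta) -> bool {
    env.ref_count == 0 && !matches!(env.state, EnvState::Running | EnvState::Archived)
}

/// Layers that no live environment references.
/// Assumption: "live" means "not collectable".
pub fn orphaned_layers(all_layers: &[String], envs: &[EnvMeta]) -> Vec<String> {
    let live: HashSet<&str> = envs
        .iter()
        .filter(|e| !is_collectable(e))
        .flat_map(|e| {
            std::iter::once(e.base_layer.as_str())
                .chain(e.dependency_layers.iter().map(String::as_str))
                .chain(e.policy_layer.as_deref())
        })
        .collect();
    all_layers
        .iter()
        .filter(|l| !live.contains(l.as_str()))
        .cloned()
        .collect()
}
```

A `--dry-run` mode would stop after computing these sets and report them instead of deleting anything.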
## Write-Ahead Log (WAL)
The `store/wal/` directory contains JSON entries for in-flight mutating operations. Each entry tracks:
```json
{
  "op_id": "20260215120000123-a1b2c3d4",
  "kind": "Build" | "Rebuild" | "Commit" | "Restore" | "Destroy",
  "env_id": "...",
  "timestamp": "RFC3339",
  "rollback_steps": [
    { "RemoveDir": "/path/to/orphaned/dir" },
    { "RemoveFile": "/path/to/orphaned/file" }
  ]
}
```
### Recovery Protocol
1. On `Engine::new()`, the WAL directory is scanned for incomplete entries.
2. Each entry's rollback steps are executed in **reverse order**.
3. The WAL entry is then removed.
4. Corrupt or unreadable WAL entries are silently deleted.
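Step 2 of the protocol, replaying rollback steps in reverse, might look like the sketch below. The `RollbackStep` variants mirror the JSON above; `WalEntry` is trimmed to the fields needed here, `roll_back` is a hypothetical name, and treating an already-missing path as success is an assumption made so that recovery can be re-run safely.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RollbackStep {
    RemoveDir(PathBuf),
    RemoveFile(PathBuf),
}

/// Trimmed-down WAL entry (hypothetical struct).
pub struct WalEntry {
    pub op_id: String,
    pub rollback_steps: Vec<RollbackStep>,
}

/// Treat "already gone" as success so a repeated recovery is harmless.
fn ignore_missing(r: io::Result<()>) -> io::Result<()> {
    match r {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}

/// Execute the entry's rollback steps last-registered-first and return
/// them in the order they were applied.
pub fn roll_back(entry: &WalEntry) -> io::Result<Vec<RollbackStep>> {
    let mut applied = Vec::new();
    for step in entry.rollback_steps.iter().rev() {
        match step {
            RollbackStep::RemoveDir(p) => ignore_missing(fs::remove_dir_all(p))?,
            RollbackStep::RemoveFile(p) => ignore_missing(fs::remove_file(p))?,
        }
        applied.push(step.clone());
    }
    Ok(applied)
}
```

After `roll_back` succeeds, the caller would delete the WAL entry file itself, which corresponds to step 3.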
### Invariants
- **INV-W1**: Kill during rebuild → next startup rolls back orphaned env_dir.
- **INV-W2**: Kill during build → orphaned env_dir cleaned.
- **INV-W3**: Successful operations leave zero WAL entries.
## Staging Directory
The `store/staging/` directory is a temporary workspace used for atomic operations:
- **Restore**: snapshot tar is unpacked into `staging/restore-{env_id}`, then renamed to replace the overlay upper directory.
- **Layer packing**: temporary files during tar creation.
The staging directory is cleaned up after each operation. Leftover staging data is safe to delete.
## Backward Compatibility
- Layout changes require a format version bump.
- Karapace 1.0 requires format version 2.
- Version 1 stores are not auto-migrated; environments must be rebuilt.
- The `name` and `tar_hash` fields use `#[serde(default)]`, so records written without them still deserialize (backward-compatible reads).