# Karapace Store Format Specification (v2)

## Overview

The Karapace store is a content-addressable filesystem structure that holds all environment data: objects, layers, metadata, environment directories, and crash recovery state.

## Directory Layout

```
<store_root>/
  store/
    version               # JSON: { "format_version": 2 }
    .lock                 # flock(2) file for exclusive access
    objects/<hash>        # Content-addressable blobs (blake3)
    layers/<hash>         # Layer manifests (JSON)
    metadata/<env_id>     # Environment metadata (JSON)
    staging/              # Temporary workspace for atomic operations
    wal/                  # Write-ahead log entries (JSON)
  env/
    <env_id>/
      upper/              # Writable overlay layer (fuse-overlayfs upperdir)
      lower -> ...        # Symlink to base image rootfs
      work/               # Overlay workdir (ephemeral)
      merged/             # Overlay mount point
  images/
    <cache_key>/
      rootfs/             # Extracted base image rootfs
```
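
The `.lock` entry in the layout is described only as a `flock(2)` file. As an illustration of how a caller might take that lock, here is a minimal Python sketch using the standard `fcntl` module (Unix only). The helper name `store_lock`, the `blocking` parameter, and the choice to create the lock file when it is missing are assumptions made for this example, not part of the specification.

```python
import fcntl
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def store_lock(store_root: Path, blocking: bool = True):
    """Hold an exclusive flock(2) on <store_root>/store/.lock (hypothetical helper)."""
    lock_path = store_root / "store" / ".lock"
    # Assumption: the lock file is created on demand if it does not exist.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # With LOCK_NB, this raises BlockingIOError if another holder exists.
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A mutating operation would then run inside `with store_lock(root): ...`, so that only one process changes the store at a time.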

## Format Version

- Current version: **2**
- Stored in `store/version` as JSON.
- Checked on every store access; mismatches are rejected.
- Version 1 stores are not auto-migrated; a clean rebuild is required.
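
The version rules above can be sketched as a small check that runs before any other store access. This is an illustrative Python sketch, not Karapace's Rust implementation; `check_store_version` and `StoreVersionError` are hypothetical names, and treating a missing or unparsable version file as an error is an assumption.

```python
import json
from pathlib import Path

SUPPORTED_FORMAT_VERSION = 2  # current version according to this spec


class StoreVersionError(Exception):
    """Raised when the version file is unreadable or does not match."""


def check_store_version(store_root: Path) -> int:
    """Return the stored format version, or raise if it is not supported."""
    version_file = store_root / "store" / "version"
    try:
        data = json.loads(version_file.read_text(encoding="utf-8"))
        found = data["format_version"]
    except (OSError, ValueError, KeyError, TypeError) as exc:
        # Assumption: an unreadable version file is rejected like a mismatch.
        raise StoreVersionError(f"cannot read {version_file}: {exc}") from exc
    if found != SUPPORTED_FORMAT_VERSION:
        # No auto-migration: the caller has to rebuild the store.
        raise StoreVersionError(
            f"store format version {found} is not supported "
            f"(expected {SUPPORTED_FORMAT_VERSION}); a clean rebuild is required"
        )
    return found
```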

## Objects

- Keyed by blake3 hex digest of their content.
- Written atomically: write to tempfile, then rename.
- Integrity verified on every read: content re-hashed and compared to filename.
- Idempotent: writing the same content twice is a no-op.
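
The four object rules above fit in a few lines. The following Python sketch is illustrative only: blake3 is not in the Python standard library, so `hashlib.blake2b` is swapped in purely so the example runs without extra packages, and the names `digest`, `put_object`, and `get_object` are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def digest(content: bytes) -> str:
    # Stand-in hash: Karapace keys objects by blake3; blake2b is used here
    # only because it ships with Python. The digests are NOT interchangeable.
    return hashlib.blake2b(content, digest_size=32).hexdigest()


def put_object(objects_dir: Path, content: bytes) -> str:
    """Store content under its digest and return the key."""
    key = digest(content)
    target = objects_dir / key
    if target.exists():
        return key  # idempotent: same content, same key, nothing to write
    fd, tmp_path = tempfile.mkstemp(dir=objects_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
        os.replace(tmp_path, target)  # atomic rename within one directory
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return key


def get_object(objects_dir: Path, key: str) -> bytes:
    """Read an object and verify that its content still matches its name."""
    content = (objects_dir / key).read_bytes()
    if digest(content) != key:
        raise ValueError(f"object {key} failed integrity check")
    return content
```

Because the key is derived from the content, a second `put_object` call with the same bytes finds the file already present and returns without writing.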

## Layers

Each layer is a JSON manifest:

```json
{
  "hash": "<layer_hash>",
  "kind": "Base" | "Dependency" | "Policy" | "Snapshot",
  "parent": "<parent_hash>" | null,
  "object_refs": ["<hash>", ...],
  "read_only": true,
  "tar_hash": "<blake3_hash>"
}
```

- `tar_hash` (v2): blake3 hash of the deterministic tar archive stored in the object store.
- Base layers have no parent. Their `hash` equals their `tar_hash`.
- Dependency layers reference a base parent.
- Snapshot layers are created by `commit`. Their `hash` is a composite identity: `blake3("snapshot:{env_id}:{base_layer}:{tar_hash}")` to prevent collision with base layers.
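
To make the difference between base and snapshot identities concrete, here is a short Python sketch of the two rules above. It is an illustration under assumptions: `hashlib.blake2b` stands in for blake3 (which is not in the standard library), the preimage is assumed to be UTF-8 encoded, and the function names are hypothetical.

```python
import hashlib


def stand_in_hash(data: bytes) -> str:
    # Stand-in for blake3, used only so the sketch runs with stdlib Python.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def base_layer_hash(tar_hash: str) -> str:
    """A base layer's identity is simply the hash of its tar archive."""
    return tar_hash


def snapshot_layer_hash(env_id: str, base_layer: str, tar_hash: str) -> str:
    """A snapshot's identity mixes in the environment and its base layer."""
    preimage = f"snapshot:{env_id}:{base_layer}:{tar_hash}"
    # Assumption: the formatted string is hashed as UTF-8 bytes.
    return stand_in_hash(preimage.encode("utf-8"))
```

With this scheme, a snapshot whose tar archive happens to be byte-identical to a base image still gets a distinct `hash`, because the preimage is the prefixed string rather than the archive itself.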

## Metadata

Each environment has a JSON metadata file:

```json
{
  "env_id": "...",
  "short_id": "...",
  "name": "my-env",
  "state": "Defined" | "Built" | "Running" | "Frozen" | "Archived",
  "manifest_hash": "<object_hash>",
  "base_layer": "<layer_hash>",
  "dependency_layers": ["<hash>", ...],
  "policy_layer": null | "<hash>",
  "created_at": "RFC3339",
  "updated_at": "RFC3339",
  "ref_count": 1
}
```

- `name` is optional (`#[serde(default)]`). Old metadata without this field deserializes correctly.
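
The effect of the optional `name` field can be shown with a small reader. This Python sketch mirrors the schema above but is not the project's Rust type; `EnvMetadata`, `parse_metadata`, and the decision to reject unknown states are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

STATES = {"Defined", "Built", "Running", "Frozen", "Archived"}


@dataclass
class EnvMetadata:
    env_id: str
    short_id: str
    state: str
    manifest_hash: str
    base_layer: str
    dependency_layers: List[str]
    policy_layer: Optional[str]
    created_at: str
    updated_at: str
    ref_count: int
    name: Optional[str] = None  # optional, like #[serde(default)] on an Option


def parse_metadata(text: str) -> EnvMetadata:
    """Parse one metadata file; a missing `name` falls back to None."""
    raw = json.loads(text)
    if raw["state"] not in STATES:
        # Assumption: states outside the documented set are rejected.
        raise ValueError(f"unknown state: {raw['state']!r}")
    return EnvMetadata(
        env_id=raw["env_id"],
        short_id=raw["short_id"],
        state=raw["state"],
        manifest_hash=raw["manifest_hash"],
        base_layer=raw["base_layer"],
        dependency_layers=list(raw["dependency_layers"]),
        policy_layer=raw["policy_layer"],
        created_at=raw["created_at"],
        updated_at=raw["updated_at"],
        ref_count=raw["ref_count"],
        name=raw.get("name"),  # absent in older files
    )
```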

## Atomic Write Contract

All writes follow the pattern:
1. Create `NamedTempFile` in the target directory.
2. Write full content.
3. `flush()`.
4. `persist()` (atomic rename).

This ensures no partial files are visible.
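
The four steps map directly onto standard-library calls in most languages. As a rough Python analogue (not the Rust code the spec refers to), `tempfile.NamedTemporaryFile` plays the role of `NamedTempFile` and `os.replace` the role of `persist()`; the function name `atomic_write` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(target: Path, content: bytes) -> None:
    """Write content so readers see either the old file or the new one."""
    # 1. Create the temp file in the target directory, so the final rename
    #    never crosses a filesystem boundary.
    tmp = tempfile.NamedTemporaryFile(dir=target.parent, delete=False)
    try:
        with tmp:
            tmp.write(content)  # 2. write full content
            tmp.flush()         # 3. flush buffered data to the OS
        os.replace(tmp.name, target)  # 4. atomic rename over the target
    except BaseException:
        # On failure, remove the temp file so no debris is left behind.
        if os.path.exists(tmp.name):
            os.unlink(tmp.name)
        raise
```

Creating the temporary file next to the target matters: a rename is only atomic within a single filesystem. The contract lists `flush()` only; whether an additional `fsync` is issued for durability across power loss is not stated in this section.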

## Garbage Collection

- Environments with `ref_count == 0` and state not in {`Running`, `Archived`} are eligible for collection.
- Layers not referenced by any live environment are orphaned.
- Objects not referenced by any live layer or live metadata (`manifest_hash`) are orphaned.
- GC never deletes running or archived environments.
- GC supports graceful cancellation via signal handler (`SIGINT`/`SIGTERM`).
- `--dry-run` reports what would be removed without acting.
- The caller must hold the store lock before running GC.
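
The eligibility rules above amount to a reachability computation, which is also roughly what a dry run would report. The Python sketch below works on in-memory dictionaries shaped like the metadata and layer manifests in this document. It rests on several assumptions: only the references visible in the metadata fields shown here are followed (`base_layer`, `dependency_layers`, `policy_layer`), a layer's `tar_hash` is treated as an object reference, and how snapshot layers are kept alive is not covered because this section does not say. The name `plan_gc` is hypothetical.

```python
from typing import Dict, Set, Tuple

PROTECTED_STATES = {"Running", "Archived"}


def plan_gc(
    envs: Dict[str, dict], layers: Dict[str, dict], objects: Set[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (collectable envs, orphaned layers, orphaned objects)."""
    dead_envs = {
        env_id
        for env_id, meta in envs.items()
        if meta["ref_count"] == 0 and meta["state"] not in PROTECTED_STATES
    }
    live_envs = [meta for env_id, meta in envs.items() if env_id not in dead_envs]

    # Layers referenced directly by the metadata of a live environment.
    live_layers: Set[str] = set()
    for meta in live_envs:
        refs = [meta["base_layer"], *meta["dependency_layers"], meta["policy_layer"]]
        live_layers.update(h for h in refs if h is not None)

    # Objects referenced by live metadata or by live layer manifests.
    live_objects = {meta["manifest_hash"] for meta in live_envs}
    for layer_hash in live_layers:
        manifest = layers.get(layer_hash)
        if manifest is None:
            continue
        live_objects.update(manifest["object_refs"])
        # Assumption: the tar archive named by tar_hash is itself an object.
        if manifest.get("tar_hash"):
            live_objects.add(manifest["tar_hash"])

    return dead_envs, set(layers) - live_layers, objects - live_objects
```

Since the function only computes sets and deletes nothing, the same plan can back both a dry run and the real collection pass.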

## Write-Ahead Log (WAL)

The `store/wal/` directory contains JSON entries for in-flight mutating operations. Each entry tracks:

```json
{
  "op_id": "20260215120000123-a1b2c3d4",
  "kind": "Build" | "Rebuild" | "Commit" | "Restore" | "Destroy",
  "env_id": "...",
  "timestamp": "RFC3339",
  "rollback_steps": [
    { "RemoveDir": "/path/to/orphaned/dir" },
    { "RemoveFile": "/path/to/orphaned/file" }
  ]
}
```

### Recovery Protocol

1. On `Engine::new()`, the WAL directory is scanned for incomplete entries.
2. Each entry's rollback steps are executed in **reverse order**.
3. The WAL entry is then removed.
4. Corrupt or unreadable WAL entries are silently deleted.
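
The recovery protocol above can be sketched as a single pass over the WAL directory. This Python version is an illustration, not the logic behind `Engine::new()`: the function name `recover_wal` is hypothetical, and ignoring failures of individual rollback steps as well as skipping unknown step kinds are assumptions.

```python
import json
import shutil
from pathlib import Path


def recover_wal(wal_dir: Path) -> int:
    """Roll back every leftover WAL entry; return how many were rolled back."""
    recovered = 0
    for entry_path in sorted(wal_dir.iterdir()):
        if not entry_path.is_file():
            continue
        try:
            entry = json.loads(entry_path.read_text(encoding="utf-8"))
            steps = entry["rollback_steps"]
            if not isinstance(steps, list) or not all(
                isinstance(step, dict) for step in steps
            ):
                raise ValueError("malformed rollback_steps")
        except (OSError, ValueError, KeyError, TypeError):
            # Corrupt or unreadable entries are deleted without rollback.
            entry_path.unlink(missing_ok=True)
            continue
        # Undo in reverse order of registration.
        for step in reversed(steps):
            if "RemoveDir" in step:
                shutil.rmtree(step["RemoveDir"], ignore_errors=True)
            elif "RemoveFile" in step:
                Path(step["RemoveFile"]).unlink(missing_ok=True)
            # Assumption: unknown step kinds are skipped.
        entry_path.unlink()
        recovered += 1
    return recovered
```

Any entry still present at startup belongs to an operation that did not finish, so the mere existence of a file is the signal to roll back.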

### Invariants

- **INV-W1**: Kill during rebuild → next startup rolls back orphaned env_dir.
- **INV-W2**: Kill during build → orphaned env_dir cleaned.
- **INV-W3**: Successful operations leave zero WAL entries.

## Staging Directory

The `store/staging/` directory is a temporary workspace used for atomic operations:

- **Restore**: snapshot tar is unpacked into `staging/restore-{env_id}`, then renamed to replace the overlay upper directory.
- **Layer packing**: temporary files during tar creation.

The staging directory is cleaned up after each operation. Leftover staging data is safe to delete.

## Backward Compatibility

- Layout changes require a format version bump.
- Karapace 1.0 requires format version 2.
- Version 1 stores are not auto-migrated; environments must be rebuilt.
- The `name` and `tar_hash` fields use `#[serde(default)]`, so files written before these fields existed still deserialize.