# Storage Format

Store format version: **2**. Defined in `karapace-store/src/layout.rs::STORE_FORMAT_VERSION`.

## Directory layout

Default root: `~/.local/share/karapace`.

```
<root>/
  store/
    version                 # { "format_version": 2 }
    .lock                   # flock(2) exclusive lock
    objects/<blake3_hex>    # content-addressable blobs
    layers/<blake3_hex>     # layer manifests (JSON)
    metadata/<env_id>       # environment metadata (JSON)
    staging/                # temp workspace for atomic operations
    wal/<op_id>.json        # write-ahead log entries
  env/
    <env_id>/
      upper/                # overlay writable layer
      overlay/              # overlay mount point
  images/
    <cache_key>/
      rootfs/               # extracted base image filesystem
```

Paths defined in `karapace-store/src/layout.rs::StoreLayout`.

## Version file

```json
{ "format_version": 2 }
```

Checked on every store access. Mismatched versions are rejected with `StoreError::VersionMismatch`.

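The version check can be pictured with a minimal Python sketch. It is not the Rust implementation: `check_store_version` and the `VersionMismatch` exception are hypothetical names standing in for the real code path and for `StoreError::VersionMismatch`, and the sketch assumes the file holds exactly the JSON shown above.

```python
import json
from pathlib import Path

SUPPORTED_FORMAT_VERSION = 2  # the store format version documented above


class VersionMismatch(Exception):
    """Hypothetical stand-in for StoreError::VersionMismatch."""


def check_store_version(store_dir: Path) -> None:
    """Raise VersionMismatch unless <store_dir>/version declares the supported format."""
    data = json.loads((store_dir / "version").read_text())
    found = data.get("format_version")
    if found != SUPPORTED_FORMAT_VERSION:
        raise VersionMismatch(
            f"expected format_version {SUPPORTED_FORMAT_VERSION}, found {found!r}"
        )
```
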
## Objects

Content-addressable blobs keyed by blake3 hex digest of their content.

- Write: `NamedTempFile` in objects dir → write content → `sync_all()` → `persist()` (atomic rename)
- Read: read file → recompute blake3 → compare to filename → reject on mismatch
- Idempotent: writing identical content is a no-op

Defined in `karapace-store/src/objects.rs::ObjectStore`.

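The write and read paths above can be sketched in Python using only the standard library. Two assumptions are worth stating: blake3 is not available in the standard library, so BLAKE2b with a 256-bit digest is swapped in purely for illustration, and the function names `put_object` and `get_object` are hypothetical. The no-op on identical content is modelled here as an existence check, which may differ from how `ObjectStore` does it.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def digest(data: bytes) -> str:
    # Stand-in hash: BLAKE2b-256. Karapace itself uses blake3.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def put_object(objects_dir: Path, data: bytes) -> str:
    """Store data under its digest and return the key."""
    key = digest(data)
    dest = objects_dir / key
    if dest.exists():
        return key  # identical content is already stored: nothing to do
    fd, tmp = tempfile.mkstemp(dir=objects_dir)  # temp file in the objects dir
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # rough analogue of sync_all()
        os.replace(tmp, dest)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return key


def get_object(objects_dir: Path, key: str) -> bytes:
    """Read an object and reject it if its content no longer matches its name."""
    data = (objects_dir / key).read_bytes()
    if digest(data) != key:
        raise ValueError(f"integrity check failed for object {key}")
    return data
```

Because the key is derived from the content, a reader never needs a separate index to detect corruption: the filename is the expected digest.
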
## Layers

JSON files in `store/layers/`. Each describes a tar archive stored in the object store.

```json
{
  "hash": "<layer_hash>",
  "kind": "Base | Dependency | Policy | Snapshot",
  "parent": "<parent_hash> | null",
  "object_refs": ["<hash>", ...],
  "read_only": true,
  "tar_hash": "<blake3_of_tar>"
}
```

Defined in `karapace-store/src/layers.rs::LayerManifest`.

**Layer kinds:**

| Kind | Hash computation | Parent |
|------|-----------------|--------|
| `Base` | `tar_hash` | None |
| `Dependency` | `tar_hash` | Base layer |
| `Policy` | `tar_hash` | - |
| `Snapshot` | `blake3("snapshot:{env_id}:{base_layer}:{tar_hash}")` | Base layer |

Layer integrity is verified on read: the file content is re-hashed and compared to the filename.

### Deterministic tar packing

`karapace-store/src/layers.rs::pack_layer(source_dir)`:

- Entries sorted by path
- Timestamps set to 0
- Owner set to `0:0`
- Permissions preserved
- Symlink targets preserved

**Dropped during packing:** extended attributes, device nodes, hardlinks (stored as regular files), SELinux labels, ACLs, sparse file holes.

`unpack_layer(tar_data, target_dir)` reverses the process.

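To show why these rules make the archive reproducible, here is a minimal Python sketch built on the standard `tarfile` module. It is an illustration of the listed rules, not a port of `pack_layer`: the function name `pack_dir` is hypothetical, and the byte output will not match what the Rust code produces. Each header is built by hand, so only the path, mode, type, link target and content reach the archive.

```python
import io
import os
import stat
import tarfile
from pathlib import Path


def pack_dir(source_dir: Path) -> bytes:
    """Pack source_dir into tar bytes that depend only on paths, modes and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        rel_paths = sorted(
            p.relative_to(source_dir).as_posix() for p in source_dir.rglob("*")
        )  # entries sorted by path
        for rel in rel_paths:
            path = source_dir / rel
            st = path.lstat()
            info = tarfile.TarInfo(name=rel)
            info.mode = stat.S_IMODE(st.st_mode)  # permissions preserved
            info.mtime = 0                        # timestamps set to 0
            info.uid = info.gid = 0               # owner set to 0:0
            info.uname = info.gname = ""
            if path.is_symlink():
                info.type = tarfile.SYMTYPE
                info.linkname = os.readlink(path)  # symlink target preserved
                tar.addfile(info)
            elif path.is_dir():
                info.type = tarfile.DIRTYPE
                tar.addfile(info)
            elif path.is_file():
                # Hard links are read like any other file, so they are
                # stored as independent regular files.
                info.size = st.st_size
                with path.open("rb") as f:
                    tar.addfile(info, f)
            # Device nodes, FIFOs and sockets fall through and are skipped.
    return buf.getvalue()
```

Two directories with the same files, created in a different order and with different modification times, should pack to identical bytes under this scheme, which is what allows the tar hash to serve as a stable layer identity.
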
## Metadata

JSON files in `store/metadata/`, one per environment. Filename is the `env_id`.

```json
{
  "env_id": "...",
  "short_id": "...",
  "name": null,
  "state": "Built",
  "manifest_hash": "<object_hash>",
  "base_layer": "<layer_hash>",
  "dependency_layers": [],
  "policy_layer": null,
  "created_at": "RFC3339",
  "updated_at": "RFC3339",
  "ref_count": 1,
  "checksum": "<blake3_of_json>"
}
```

Defined in `karapace-store/src/metadata.rs::EnvMetadata`.

**States:** `Defined`, `Built`, `Running`, `Frozen`, `Archived`.

**Checksum:** blake3 of the JSON content (excluding the checksum field itself). Computed on every `put()`, verified on every `get()`. Absent in legacy metadata (`#[serde(default)]`).

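A self-checksummed record of this kind can be sketched as follows. Several details here are assumptions rather than facts about `EnvMetadata`: the exact byte serialization that gets hashed is not documented, so the sketch uses sorted keys with compact separators; BLAKE2b-256 stands in for blake3; legacy records without a checksum are treated as valid; and the helper names are hypothetical.

```python
import hashlib
import json


def compute_checksum(meta: dict) -> str:
    """Hash the record with the checksum field left out."""
    body = {k: v for k, v in meta.items() if k != "checksum"}
    # Assumed canonical encoding: sorted keys, no insignificant whitespace.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    # Stand-in hash: BLAKE2b-256. Karapace itself uses blake3.
    return hashlib.blake2b(encoded, digest_size=32).hexdigest()


def seal(meta: dict) -> dict:
    """Return a copy of the record with a fresh checksum, as a put() would."""
    return {**meta, "checksum": compute_checksum(meta)}


def verify(meta: dict) -> bool:
    """Check the stored checksum, as a get() would."""
    stored = meta.get("checksum")
    if stored is None:
        return True  # assumption: legacy records without a checksum are accepted
    return stored == compute_checksum(meta)
```
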
**Names:** optional, validated by `validate_env_name`: characters from `[a-zA-Z0-9_-]`, 1–64 characters long. Names are unique across all environments.

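The character and length rule translates directly into a regular expression. The Python sketch below is only an equivalent of the documented pattern, with a hypothetical function name; it does not cover the uniqueness check, which needs access to the store.

```python
import re

# Characters from [a-zA-Z0-9_-], between 1 and 64 of them.
_ENV_NAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def is_valid_env_name(name: str) -> bool:
    """Return True if name satisfies the documented character and length rule."""
    return _ENV_NAME_RE.fullmatch(name) is not None
```
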
## Manifest format

File: `karapace.toml`. Parsed by `karapace-schema/src/manifest.rs`.

```toml
manifest_version = 1

[base]
image = "rolling"

[system]
packages = ["git", "curl"]

[gui]
apps = []

[hardware]
gpu = false
audio = false

[mounts]
workspace = "./:/workspace"

[runtime]
backend = "namespace"
network_isolation = false

[runtime.resource_limits]
cpu_shares = 1024
memory_limit_mb = 4096
```

**Required:** `manifest_version` (must be `1`), `base.image` (non-empty).

**Optional:** all other sections. Unknown fields cause a parse error (`deny_unknown_fields`).

**Normalization** (`ManifestV1::normalize`): trim strings, sort and deduplicate packages/apps, sort mounts by label, lowercase backend name. Produces `NormalizedManifest` with a `canonical_json()` method.

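The effect of normalization is that two manifests with the same intent but different spelling map to the same canonical form. The Python sketch below illustrates the listed steps on an already-parsed manifest dictionary. The output field names and the JSON encoding are assumptions made for the example; they are not the layout of `NormalizedManifest`, and hardware and resource-limit fields are left out for brevity.

```python
import json


def _clean_list(items):
    """Trim each entry, drop duplicates, and sort."""
    return sorted({item.strip() for item in items})


def normalize(manifest: dict) -> dict:
    """Apply the documented normalization steps to a parsed manifest."""
    backend = manifest.get("runtime", {}).get("backend")
    return {
        "manifest_version": manifest["manifest_version"],
        "base_image": manifest["base"]["image"].strip(),
        "packages": _clean_list(manifest.get("system", {}).get("packages", [])),
        "apps": _clean_list(manifest.get("gui", {}).get("apps", [])),
        # Mounts become [label, spec] pairs sorted by label.
        "mounts": sorted(
            [label.strip(), spec.strip()]
            for label, spec in manifest.get("mounts", {}).items()
        ),
        "backend": backend.strip().lower() if backend is not None else None,
    }


def canonical_json(normalized: dict) -> str:
    # Assumed canonical encoding: sorted keys, compact separators.
    return json.dumps(normalized, sort_keys=True, separators=(",", ":"))
```

A stable canonical form matters because identifiers derived by hashing it do not change when a user reorders packages or adds stray whitespace.
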
## Lock file

File: `karapace.lock`. Written next to the manifest. TOML format.

```toml
lock_version = 2
env_id = "46e1d96f..."
short_id = "46e1d96fdd6f"
base_image = "rolling"
base_image_digest = "a1b2c3d4..."
runtime_backend = "namespace"
hardware_gpu = false
hardware_audio = false
network_isolation = false

[[resolved_packages]]
name = "git"
version = "2.44.0-1"
```

Defined in `karapace-schema/src/lock.rs::LockFile`.

**Verification:**
- `verify_integrity()`: recomputes `env_id` from locked fields, compares to stored value
- `verify_manifest_intent()`: checks manifest hasn't drifted from what was locked

## Hashing

All hashing uses **blake3**, 256-bit output, hex-encoded (64 characters).

Used for: object keys, layer hashes, env_id computation, metadata checksums, image content digests.

## Write-ahead log

`store/wal/<op_id>.json`. Defined in `karapace-store/src/wal.rs`.

```json
{
  "op_id": "20260215120000123-a1b2c3d4",
  "kind": "Build",
  "env_id": "...",
  "timestamp": "RFC3339",
  "rollback_steps": [
    { "RemoveDir": "/path" },
    { "RemoveFile": "/path" }
  ]
}
```

**Operations:** `Build`, `Rebuild`, `Commit`, `Restore`, `Destroy`, `Gc`.

**Recovery:** on `Engine::new()`, all WAL entries are scanned. Each entry's rollback steps execute in reverse order. The entry is then deleted. Corrupt entries are silently removed.

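The recovery pass described above can be sketched in Python. This is an illustration under stated assumptions, not the logic in `wal.rs`: the function name `recover` is hypothetical, entries are visited in filename order for determinism, only the two step kinds shown in the example are handled, and an entry is fully parsed before any of its steps run so that a corrupt entry triggers no partial rollback.

```python
import json
import shutil
from pathlib import Path


def _parse_steps(entry_path: Path) -> list:
    """Return the entry's rollback steps as (kind, path) pairs, or raise if malformed."""
    entry = json.loads(entry_path.read_text())
    steps = []
    for step in entry["rollback_steps"]:
        (kind, target), = step.items()  # each step is a single-key object
        if kind not in ("RemoveDir", "RemoveFile"):
            raise ValueError(f"unknown rollback step {kind!r}")
        steps.append((kind, target))
    return steps


def recover(wal_dir: Path) -> list:
    """Roll back every pending entry and return the steps in execution order."""
    executed = []
    for entry_path in sorted(wal_dir.glob("*.json")):
        try:
            steps = _parse_steps(entry_path)
        except (ValueError, KeyError, TypeError, AttributeError):
            steps = []  # corrupt entry: nothing to roll back, just remove it
        for kind, target in reversed(steps):  # rollback runs in reverse order
            if kind == "RemoveDir":
                shutil.rmtree(target, ignore_errors=True)
            else:
                Path(target).unlink(missing_ok=True)
            executed.append((kind, target))
        entry_path.unlink()  # the entry is deleted once handled
    return executed
```

Running the steps in reverse mirrors how an interrupted operation would be unwound: the artifact created last is removed first.
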
## Atomic write contract

All store writes follow: `NamedTempFile::new_in(dir)` → write → `flush()` → `persist()` (atomic rename). No partial files are visible. Defined throughout `karapace-store`.