karapace/docs/storage-format.md
Marco Allegretti e6e0f3dd6d docs: rewrite all documentation from source code
2026-02-23 01:25:07 +01:00


Storage Format

Store format version: 2. Defined in karapace-store/src/layout.rs::STORE_FORMAT_VERSION.

Directory layout

Default root: ~/.local/share/karapace.

<root>/
  store/
    version                # { "format_version": 2 }
    .lock                  # flock(2) exclusive lock
    objects/<blake3_hex>   # content-addressable blobs
    layers/<blake3_hex>    # layer manifests (JSON)
    metadata/<env_id>      # environment metadata (JSON)
    staging/               # temp workspace for atomic operations
    wal/<op_id>.json       # write-ahead log entries
  env/
    <env_id>/
      upper/               # overlay writable layer
      overlay/             # overlay mount point
  images/
    <cache_key>/
      rootfs/              # extracted base image filesystem

Paths defined in karapace-store/src/layout.rs::StoreLayout.

Version file

{ "format_version": 2 }

Checked on every store access. Mismatched versions are rejected with StoreError::VersionMismatch.

Objects

Content-addressable blobs keyed by blake3 hex digest of their content.

  • Write: NamedTempFile in objects dir → write content → sync_all() → persist() (atomic rename)
  • Read: read file → recompute blake3 → compare to filename → reject on mismatch
  • Idempotent: writing identical content is a no-op

Defined in karapace-store/src/objects.rs::ObjectStore.
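
The read and idempotency rules above can be sketched as follows. This is a minimal illustration, not the code in objects.rs: SketchObjectStore and digest_hex are hypothetical names, the standard library's DefaultHasher stands in for blake3 so the sketch needs no external crates, and the write path is simplified to a plain fs::write instead of a temp file plus rename.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in digest (assumption): the real store uses blake3.
/// DefaultHasher is used only to keep the sketch dependency-free.
fn digest_hex(data: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical minimal object store rooted at `dir`.
struct SketchObjectStore {
    dir: PathBuf,
}

impl SketchObjectStore {
    fn new(dir: &Path) -> io::Result<Self> {
        fs::create_dir_all(dir)?;
        Ok(Self { dir: dir.to_path_buf() })
    }

    /// Store `data` under its digest. Writing identical content is a no-op.
    fn put(&self, data: &[u8]) -> io::Result<String> {
        let key = digest_hex(data);
        let path = self.dir.join(&key);
        if !path.exists() {
            // Simplified: no temp file or atomic rename in this sketch.
            fs::write(&path, data)?;
        }
        Ok(key)
    }

    /// Read an object, re-hash it, and reject it if the digest
    /// does not match the key it was stored under.
    fn get(&self, key: &str) -> io::Result<Vec<u8>> {
        let data = fs::read(self.dir.join(key))?;
        if digest_hex(&data) != key {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "object hash mismatch",
            ));
        }
        Ok(data)
    }
}
```

Because the key is derived from the content, a second put() of the same bytes returns the same key without touching the file, and any on-disk modification is detected by get().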

Layers

JSON files in store/layers/. Each describes a tar archive stored in the object store.

{
  "hash": "<layer_hash>",
  "kind": "Base | Dependency | Policy | Snapshot",
  "parent": "<parent_hash> | null",
  "object_refs": ["<hash>", ...],
  "read_only": true,
  "tar_hash": "<blake3_of_tar>"
}

Defined in karapace-store/src/layers.rs::LayerManifest.

Layer kinds:

| Kind | Hash computation | Parent |
|------|------------------|--------|
| Base | tar_hash | None |
| Dependency | tar_hash | Base layer |
| Policy | tar_hash | |
| Snapshot | blake3("snapshot:{env_id}:{base_layer}:{tar_hash}") | Base layer |

Layer integrity is verified on read: the file content is re-hashed and compared to the filename.
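
The difference between the hash rules can be sketched as below. LayerKind mirrors the "kind" values listed above, while layer_hash and its hash_hex parameter are hypothetical: the hash function is passed in so the sketch stays independent of any blake3 binding.

```rust
/// Layer kinds, named after the values of the "kind" field.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LayerKind {
    Base,
    Dependency,
    Policy,
    Snapshot,
}

/// Compute a layer hash following the rules listed above.
/// `hash_hex` is a stand-in for blake3 with hex output (assumption).
fn layer_hash(
    kind: LayerKind,
    env_id: &str,
    base_layer: &str,
    tar_hash: &str,
    hash_hex: impl Fn(&[u8]) -> String,
) -> String {
    match kind {
        // Snapshot hashes are bound to the environment and its base layer.
        LayerKind::Snapshot => {
            let input = format!("snapshot:{env_id}:{base_layer}:{tar_hash}");
            hash_hex(input.as_bytes())
        }
        // Base, Dependency, and Policy layers reuse the tar hash directly.
        _ => tar_hash.to_string(),
    }
}
```

One consequence of this scheme is that the same tar content snapshotted in two environments yields two distinct layer hashes, whereas identical Base layers share one.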

Deterministic tar packing

karapace-store/src/layers.rs::pack_layer(source_dir):

  • Entries sorted by path
  • Timestamps set to 0
  • Owner set to 0:0
  • Permissions preserved
  • Symlink targets preserved

Dropped during packing: extended attributes, device nodes, hardlinks (stored as regular files), SELinux labels, ACLs, sparse file holes.

unpack_layer(tar_data, target_dir) reverses the process.
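
The normalization rules can be illustrated without a tar library. In this sketch, EntryHeader and normalize_entries are hypothetical names that model only the header fields mentioned above; the real pack_layer operates on actual tar entries.

```rust
/// Hypothetical model of the tar header fields relevant to determinism.
#[derive(Debug, Clone, PartialEq)]
struct EntryHeader {
    path: String,
    mode: u32,
    mtime: u64,
    uid: u32,
    gid: u32,
    link_target: Option<String>,
}

/// Apply the packing rules: sort by path, zero the timestamp,
/// set the owner to 0:0, and leave mode and symlink target untouched.
fn normalize_entries(mut entries: Vec<EntryHeader>) -> Vec<EntryHeader> {
    entries.sort_by(|a, b| a.path.cmp(&b.path));
    for entry in &mut entries {
        entry.mtime = 0;
        entry.uid = 0;
        entry.gid = 0;
    }
    entries
}
```

With these rules, two directory walks that return the same files in a different order, with different timestamps or owners, produce the same header sequence and therefore the same tar_hash.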

Metadata

JSON files in store/metadata/, one per environment. Filename is the env_id.

{
  "env_id": "...",
  "short_id": "...",
  "name": null,
  "state": "Built",
  "manifest_hash": "<object_hash>",
  "base_layer": "<layer_hash>",
  "dependency_layers": [],
  "policy_layer": null,
  "created_at": "RFC3339",
  "updated_at": "RFC3339",
  "ref_count": 1,
  "checksum": "<blake3_of_json>"
}

Defined in karapace-store/src/metadata.rs::EnvMetadata.

States: Defined, Built, Running, Frozen, Archived.

Checksum: blake3 of the JSON content (excluding the checksum field itself). Computed on every put(), verified on every get(). Legacy metadata may lack the field; #[serde(default)] lets such files still deserialize.

Names: optional, validated by validate_env_name: characters from [a-zA-Z0-9_-] only, 1-64 characters long. Unique across all environments.
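
A minimal sketch of the character and length rules follows, assuming the length bound is 1 to 64 characters. validate_env_name_sketch is a hypothetical name and its error type is illustrative; the uniqueness check needs access to the metadata store and is left out.

```rust
/// Check the character set and length of an environment name.
/// Assumes a length bound of 1 to 64; uniqueness is not checked here.
fn validate_env_name_sketch(name: &str) -> Result<(), String> {
    if name.is_empty() || name.len() > 64 {
        return Err(format!("name must be 1-64 characters, got {}", name.len()));
    }
    let allowed = |b: u8| b.is_ascii_alphanumeric() || b == b'_' || b == b'-';
    if !name.bytes().all(allowed) {
        return Err("name may only contain [a-zA-Z0-9_-]".to_string());
    }
    Ok(())
}
```

Under these assumptions, "dev-env_01" passes, while "my env", "a/b", and the empty string are rejected.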

Manifest format

File: karapace.toml. Parsed by karapace-schema/src/manifest.rs.

manifest_version = 1

[base]
image = "rolling"

[system]
packages = ["git", "curl"]

[gui]
apps = []

[hardware]
gpu = false
audio = false

[mounts]
workspace = "./:/workspace"

[runtime]
backend = "namespace"
network_isolation = false

[runtime.resource_limits]
cpu_shares = 1024
memory_limit_mb = 4096

Required: manifest_version (must be 1), base.image (non-empty).

Optional: all other sections. Unknown fields cause a parse error (deny_unknown_fields).

Normalization (ManifestV1::normalize): trim strings, sort and deduplicate packages/apps, sort mounts by label, lowercase backend name. Produces NormalizedManifest with a canonical_json() method.
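
The normalization steps can be sketched on a reduced manifest. ManifestSketch, normalize_manifest, and clean_list are hypothetical and cover only the fields named above; the real ManifestV1 and NormalizedManifest types carry more.

```rust
/// Hypothetical reduced manifest with only the fields that are normalized.
struct ManifestSketch {
    image: String,
    packages: Vec<String>,
    apps: Vec<String>,
    /// (label, mount spec) pairs, e.g. ("workspace", "./:/workspace").
    mounts: Vec<(String, String)>,
    backend: String,
}

/// Trim every item, then sort and deduplicate the list.
fn clean_list(items: Vec<String>) -> Vec<String> {
    let mut out: Vec<String> = items.iter().map(|s| s.trim().to_string()).collect();
    out.sort();
    out.dedup();
    out
}

/// Bring a manifest into canonical form so that equivalent inputs
/// serialize identically.
fn normalize_manifest(mut m: ManifestSketch) -> ManifestSketch {
    m.image = m.image.trim().to_string();
    m.packages = clean_list(m.packages);
    m.apps = clean_list(m.apps);
    m.mounts.sort_by(|a, b| a.0.cmp(&b.0));
    m.backend = m.backend.trim().to_lowercase();
    m
}
```

Trimming before sorting matters here: " git" and "git" would otherwise sort apart and survive deduplication as two entries.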

Lock file

File: karapace.lock. Written next to the manifest. TOML format.

lock_version = 2
env_id = "46e1d96f..."
short_id = "46e1d96fdd6f"
base_image = "rolling"
base_image_digest = "a1b2c3d4..."
runtime_backend = "namespace"
hardware_gpu = false
hardware_audio = false
network_isolation = false

[[resolved_packages]]
name = "git"
version = "2.44.0-1"

Defined in karapace-schema/src/lock.rs::LockFile.

Verification:

  • verify_integrity(): recomputes env_id from locked fields, compares to stored value
  • verify_manifest_intent(): checks manifest hasn't drifted from what was locked

Hashing

All hashing uses blake3, 256-bit output, hex-encoded (64 characters).

Used for: object keys, layer hashes, env_id computation, metadata checksums, image content digests.

Write-ahead log

store/wal/<op_id>.json. Defined in karapace-store/src/wal.rs.

{
  "op_id": "20260215120000123-a1b2c3d4",
  "kind": "Build",
  "env_id": "...",
  "timestamp": "RFC3339",
  "rollback_steps": [
    { "RemoveDir": "/path" },
    { "RemoveFile": "/path" }
  ]
}

Operations: Build, Rebuild, Commit, Restore, Destroy, Gc.

Recovery: on Engine::new(), all WAL entries are scanned. Each entry's rollback steps execute in reverse order. The entry is then deleted. Corrupt entries are silently removed.
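
The reverse-order replay can be sketched as below. RollbackStep mirrors the two step kinds shown in the JSON example; run_rollback is a hypothetical name, and treating an already-missing path as success is an assumption of this sketch rather than documented behavior of wal.rs.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Rollback step kinds, as they appear in a WAL entry.
enum RollbackStep {
    RemoveDir(PathBuf),
    RemoveFile(PathBuf),
}

/// Execute rollback steps in reverse order of registration.
/// Assumption: a path that is already gone counts as rolled back.
fn run_rollback(steps: &[RollbackStep]) -> io::Result<()> {
    for step in steps.iter().rev() {
        let result = match step {
            RollbackStep::RemoveDir(path) => fs::remove_dir_all(path),
            RollbackStep::RemoveFile(path) => fs::remove_file(path),
        };
        match result {
            Err(e) if e.kind() != io::ErrorKind::NotFound => return Err(e),
            _ => {}
        }
    }
    Ok(())
}
```

Reverse order undoes the most recent side effect first, so a file created inside a directory is removed before the directory that was created earlier. Ignoring missing paths makes a second recovery pass after a crash during recovery harmless.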

Atomic write contract

All store writes follow: NamedTempFile::new_in(dir) → write → flush() → persist() (atomic rename). No partial files are visible. Defined throughout karapace-store.
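
The same pattern can be approximated with the standard library alone. This sketch is not the karapace-store code: atomic_write and write_and_sync are hypothetical names, and a process-id-suffixed temp name plus fs::rename stand in for NamedTempFile::new_in and persist().

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to `path` and force it to disk before returning.
fn write_and_sync(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(data)?;
    file.sync_all()
}

/// Write `data` to `dir/name` so that readers see either the old
/// content or the complete new content, never a partial file.
fn atomic_write(dir: &Path, name: &str, data: &[u8]) -> io::Result<()> {
    // The temp file lives in the same directory as the target, so the
    // rename stays on one filesystem.
    let tmp = dir.join(format!(".{name}.tmp-{}", std::process::id()));
    if let Err(e) = write_and_sync(&tmp, data) {
        // Best-effort cleanup of the incomplete temp file.
        let _ = fs::remove_file(&tmp);
        return Err(e);
    }
    fs::rename(&tmp, dir.join(name))
}
```

Creating the temp file inside the destination directory is what makes the final rename a single-filesystem operation, which is atomic on POSIX systems; a temp file in /tmp would not give that guarantee.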