Pack Format
Codecrate outputs a single Markdown file intended for LLM consumption. When
--split-max-chars is used, it can also emit .index.md and .partN.md files.
The pack contains enough information to:
browse code quickly (directory tree + symbol index)
reconstruct original files, either directly (full layout) or via stubs + canonical sources (stub layout)
High-level structure
A typical pack includes:
How to Use This Pack: reading guidance for LLMs
Directory Tree: a simple text tree of files
Symbol Index: per-file symbol list with line ranges
Function Library (stub layout only): canonical function bodies keyed by ID
Files: full file content (full layout) or stubbed files (stub layout)
The Manifest is required for machine operations (unpack/patch/validate-pack). For token
efficiency, split .partN.md files omit it, and you can disable it entirely with
--no-manifest (LLM-only packs).
Manifest metadata also records explicit ID/marker schemes for forward compatibility:
id_format_version (currently sha1-8-upper:v1)
marker_format_version (currently v1)
per-definition has_marker hints in stub layouts (for validation accuracy)
Codecrate can also emit JSON sidecars:
codecrate.manifest-json.v1: manifest-focused tooling export
codecrate.index-json.v1: retrieval-oriented file/symbol/part index for agents and tools
See Index JSON Sidecar for the detailed sidecar contract.
Profiles can change output defaults without changing the underlying pack format:
human keeps current markdown-first behavior
agent implies compact navigation and normalized v3 index JSON output
lean-agent implies the leanest normalized v3 agent sidecar defaults
hybrid keeps current markdown behavior and also emits index JSON output
portable implies manifest-enabled full layout for standalone unpack
portable-agent keeps reconstructable full layout plus normalized retrieval defaults
The index sidecar includes deterministic per-repository metadata for:
emitted markdown part files
file-to-part lookup
symbol-to-file and symbol-to-canonical-body lookup
direct href-style links for file and symbol navigation
unsplit markdown line ranges for file sections, symbol index entries, and canonical bodies
explicit reverse lookup indexes for files and symbols
part character and token estimates
part oversize status and effective split policy
safety findings
per-file language detection and symbol extraction backend/status reporting
Split part membership is captured directly during split generation rather than recovered later by reparsing emitted markdown.
For non-Python files, the index sidecar reports:
language_detected
symbol_backend_requested
symbol_backend_used
symbol_extraction_status
This makes it explicit whether symbol extraction was unavailable, disabled, unsupported for the file type, or completed successfully.
The index sidecar also separates human-facing and machine-facing identifiers:
display_id / display_local_id keep the current short pack IDs used by markdown anchors
canonical_id / local_id use stronger SHA-256 based machine IDs for tooling
display_id_format_version and canonical_id_format_version record both schemes explicitly
Per-file entries also include lightweight review metadata such as byte, character, and token estimates for both original and effective packed content.
Machine Header includes:
format
repo_label / repo_slug
manifest_sha256
Protocol constants
pack format: codecrate.v4
patch metadata format: codecrate.patch.v1
manifest-json format: codecrate.manifest-json.v1
index-json format: codecrate.index-json.v1
machine header fence: codecrate-machine-header
manifest fence: codecrate-manifest
patch metadata fence: codecrate-patch-meta
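The fence names above can be used to locate machine-readable blocks in a pack. The following is a minimal sketch, not Codecrate's own parser: it assumes each block is a triple-backtick fence whose info string is exactly the fence name and whose body is JSON. The helper name read_json_fence is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example nests safely inside markdown


def read_json_fence(markdown, info):
    """Return the parsed JSON body of the first fence tagged `info`, or None.

    Assumption: the fence body is JSON; the pack format may differ.
    """
    pattern = re.compile(
        r"^" + FENCE + re.escape(info) + r"[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(markdown)
    return None if match is None else json.loads(match.group(1))


# Tiny hand-written sample, not real Codecrate output.
sample = "\n".join([
    "# Pack",
    FENCE + "codecrate-machine-header",
    '{"format": "codecrate.v4"}',
    FENCE,
    "Some prose.",
])
header = read_json_fence(sample, "codecrate-machine-header")
```

Matching the full info string, rather than a prefix, keeps codecrate-manifest from being confused with any longer fence name that starts the same way.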
Layouts
full: The pack includes full file contents under Files. The manifest is minimal and does not include function metadata.
stubs: The pack includes stubbed file contents under Files and a Function Library with canonical function bodies.
auto: Chooses stubs only when deduplication actually collapses something; otherwise chooses full for best token efficiency.
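The auto rule can be sketched as a small decision function. This is an illustration under an assumption: "collapses something" is read here as "at least two definition occurrences share one canonical body ID". The function name choose_layout is hypothetical and is not part of Codecrate's API.

```python
from collections import Counter


def choose_layout(canonical_ids):
    """Pick a layout from the canonical body ID of every definition occurrence.

    Assumption: deduplication "collapses something" when any canonical ID
    appears more than once.
    """
    counts = Counter(canonical_ids)
    return "stubs" if any(n > 1 for n in counts.values()) else "full"


shared = choose_layout(["A1B2C3D4", "0F0F0F0F", "A1B2C3D4"])
unique = choose_layout(["A1B2C3D4", "0F0F0F0F"])
```

With no shared bodies, stubbing would add markers and a Function Library without removing any duplicate text, which is why the sketch falls back to full.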
Portable unpack contract
The initial standalone unpack flow targets a conservative subset of the pack format:
unsplit markdown is the authoritative machine-readable reconstruction source
full portable unpack requires the Manifest plus file bodies under ## Files
stubs portable unpack additionally requires the Function Library plus manifest defs metadata to resolve markers back into canonical bodies
split .index.md / .partN.md outputs are not the standalone machine source
codecrate pack --profile portable --emit-standalone-unpacker writes a
standard-library-only <output>.unpack.py beside the main markdown output.
Generated portable-agent markdown also includes a non-authoritative
codecrate-agent-workflow JSON fence. It gives coding agents a deterministic
first-action hint, including the recommended python3 -S reconstruction
command, sidecar filenames, fallback interpreters, and a reminder to avoid
manual markdown scraping unless unpacking fails with a Codecrate error.
IDs and deduplication
In stub layout, Codecrate distinguishes:
local_id: Unique per definition occurrence (stable by file path + qualname + def line).
id: Canonical body ID. When dedupe is enabled and identical bodies are detected, multiple local_id values may share the same canonical id.
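To make the distinction concrete, here is a minimal sketch of how the two IDs could relate. It assumes, based only on the scheme name sha1-8-upper:v1, that a short ID is the first 8 hex characters of a SHA-1 digest in uppercase. The exact input that Codecrate hashes is not specified here, so the strings fed to the hash below are illustrative, and short_id and assign_ids are hypothetical names.

```python
import hashlib


def short_id(text):
    # Assumption: 8 uppercase hex characters of a SHA-1 digest,
    # inferred from the scheme name "sha1-8-upper:v1".
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8].upper()


def assign_ids(defs):
    """Map (path, qualname, def_line, body) tuples to local_id / id pairs."""
    entries = []
    for path, qualname, def_line, body in defs:
        entries.append({
            # One local_id per occurrence: depends on where the definition lives.
            "local_id": short_id(f"{path}:{qualname}:{def_line}"),
            # One id per distinct body: identical bodies hash to the same value.
            "id": short_id(body),
        })
    return entries


body = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
entries = assign_ids([
    ("a/util.py", "clamp", 10, body),
    ("b/helpers.py", "clamp", 42, body),
])
```

Both entries end up with the same id while keeping distinct local_id values, which is the situation the text describes for deduplicated bodies.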
Stub markers
Stubbed file bodies contain markers like:
... # ↪ FUNC:v1:XXXXXXXX
The marker references the function definition occurrence. During unpack, Codecrate
locates the marker, finds the def line above it (including decorators), and
replaces that region with the canonical function body from the Function Library.
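The unpack step described above can be sketched as follows. This is a simplified illustration, not Codecrate's implementation. It assumes the marker carries an 8-character uppercase hex ID, that the canonical body stored in the library already includes decorators and the def line at the right indentation, and that the library is a plain dict keyed by that ID. The name expand_stubs is hypothetical.

```python
import re

MARKER_RE = re.compile(r"FUNC:v1:([0-9A-F]{8})")
DEF_RE = re.compile(r"^\s*(async\s+def|def)\s")


def expand_stubs(stub_text, library):
    """Replace each stubbed definition with its canonical body.

    library: dict mapping marker ID -> canonical body text (assumed shape).
    """
    out = []
    for line in stub_text.splitlines():
        out.append(line)
        match = MARKER_RE.search(line)
        if match is None:
            continue
        # Walk upward from the marker line to the nearest def line.
        start = len(out) - 1
        while start >= 0 and not DEF_RE.match(out[start]):
            start -= 1
        if start < 0:
            raise ValueError(f"no def line above marker {match.group(1)}")
        # Extend the region upward over any decorators.
        while start > 0 and out[start - 1].lstrip().startswith("@"):
            start -= 1
        # Drop the stubbed region and splice in the canonical body.
        del out[start:]
        out.extend(library[match.group(1)].splitlines())
    return "\n".join(out)


stub = "import os\n\n@cache\ndef f(x):\n    ...  # ↪ FUNC:v1:ABCDEF12\n\ny = 1"
library = {"ABCDEF12": "@cache\ndef f(x):\n    return x + 1"}
restored = expand_stubs(stub, library)
```

The sketch ignores edge cases such as a marker that appears inside a string literal, and it does not re-indent bodies for nested definitions.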
Patch metadata
Generated patch markdown includes a codecrate-patch-meta fence with:
patch format id (codecrate.patch.v1)
baseline manifest checksum
baseline per-file original checksums
The apply command uses this metadata to verify that baseline files still match
before applying hunks.
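A baseline check of this kind can be sketched as below. The sketch assumes SHA-256 checksums and a metadata key named baseline_files_sha256 that maps relative paths to hex digests. Both are assumptions made for illustration, and the real field names in codecrate-patch-meta may differ. The function find_stale_files is hypothetical.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def find_stale_files(meta, current_files):
    """Return sorted paths whose current bytes no longer match the baseline.

    meta["baseline_files_sha256"]: path -> hex digest (hypothetical key name).
    current_files: path -> bytes currently on disk (in-memory stand-in here).
    """
    stale = []
    for path, expected in meta["baseline_files_sha256"].items():
        data = current_files.get(path)
        if data is None or sha256_hex(data) != expected:
            stale.append(path)
    return sorted(stale)


baseline = {"src/a.py": b"print('a')\n", "src/b.py": b"print('b')\n"}
meta = {
    "format": "codecrate.patch.v1",
    "baseline_files_sha256": {p: sha256_hex(d) for p, d in baseline.items()},
}
edited = dict(baseline, **{"src/b.py": b"print('changed')\n"})
stale = find_stale_files(meta, edited)
```

A non-empty result would mean the hunks were computed against content that has since changed, so applying them would be unsafe.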
Determinism
Pack ordering is deterministic by normalized relative path and stable id order. Split outputs preserve deterministic section/file/function ordering and avoid splitting inside fenced code blocks.
When a single logical block exceeds --split-max-chars, Codecrate keeps it
intact in an oversize part by default. Use --split-strict to fail instead,
or --split-allow-cut-files to explicitly cut oversized file blocks across
multiple parts.
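The default and strict behaviors can be sketched with a greedy packer over logical blocks. This is an illustration only: it assumes block size is measured with len() in characters, it does not model --split-allow-cut-files, and split_blocks is a hypothetical name.

```python
def split_blocks(blocks, max_chars, strict=False):
    """Group blocks into parts without exceeding max_chars where possible.

    An oversize block gets a part of its own, or raises when strict is True.
    """
    parts, current, size = [], [], 0
    for block in blocks:
        if len(block) > max_chars:
            if strict:
                raise ValueError(f"block of {len(block)} chars exceeds {max_chars}")
            if current:
                parts.append(current)
                current, size = [], 0
            parts.append([block])  # oversize part, kept intact
            continue
        if current and size + len(block) > max_chars:
            parts.append(current)
            current, size = [], 0
        current.append(block)
        size += len(block)
    if current:
        parts.append(current)
    return parts


blocks = ["aaaa", "bbbb", "c" * 12, "dd"]
parts = split_blocks(blocks, max_chars=10)
```

Input order is preserved, so the same blocks and limit always produce the same parts.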
When binary files are detected during packing, they are skipped and reported as
Skipped as binary: N file(s) in the pack header and Safety Report (when enabled).
Line ranges
The Symbol Index can include markdown line ranges (Lx-y) that refer to line numbers
inside the packed Markdown file itself.
When a pack is split into .partN.md files, these markdown line ranges are omitted in
the split parts because they are not stable across files. Use the per-part links
instead (for example context.part3.md#src-... / #func-...).
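For an unsplit pack, a tool can resolve such a range by slicing the Markdown text. The sketch below assumes the range is written as Lx-y with 1-based, inclusive line numbers. That reading is an assumption, and slice_by_range is a hypothetical helper.

```python
import re

RANGE_RE = re.compile(r"^L(\d+)-(\d+)$")


def slice_by_range(markdown, line_range):
    """Return the lines covered by a range such as "L2-3".

    Assumption: line numbers are 1-based and the end is inclusive.
    """
    match = RANGE_RE.match(line_range)
    if match is None:
        raise ValueError(f"not a line range: {line_range!r}")
    start, end = int(match.group(1)), int(match.group(2))
    lines = markdown.splitlines()
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"range {line_range} is outside the document")
    return lines[start - 1:end]


pack = "alpha\nbeta\ngamma\ndelta"
selected = slice_by_range(pack, "L2-3")
```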