Command Line Interface

Codecrate provides a small CLI with subcommands.

Configuration file

Codecrate reads configuration from the repository root. It will look for:

.codecrate.toml (preferred, if present)
codecrate.toml (fallback)
pyproject.toml under [tool.codecrate] (fallback if no codecrate TOML file exists)

Precedence (highest first):

CLI flags
.codecrate.toml / codecrate.toml
pyproject.toml [tool.codecrate]
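
The lookup and precedence rules above can be pictured as a small resolution function. This is a minimal sketch, not Codecrate's actual implementation: `resolve_config`, its parameters, and the `DEFAULTS` table are hypothetical names, and it assumes the config files have already been parsed into dictionaries (`None` meaning the file or table is absent).

```python
from typing import Any, Optional

# Hypothetical defaults for illustration; the two values mirror the default
# output path and built-in include preset mentioned elsewhere on this page.
DEFAULTS: dict[str, Any] = {"output": "context.md", "include_preset": "python+docs"}

Config = Optional[dict[str, Any]]


def resolve_config(
    cli_flags: dict[str, Any],
    dot_codecrate_toml: Config = None,
    codecrate_toml: Config = None,
    pyproject_tool_table: Config = None,
) -> dict[str, Any]:
    """Pick one config file layer, then let explicitly set CLI flags override it."""
    # Only the first source that exists is used; pyproject.toml is consulted
    # only when neither codecrate TOML file is present.
    file_layer: dict[str, Any] = {}
    for candidate in (dot_codecrate_toml, codecrate_toml, pyproject_tool_table):
        if candidate is not None:
            file_layer = candidate
            break

    # Assumption: a CLI flag left unset (None) does not mask the file value.
    explicit_flags = {k: v for k, v in cli_flags.items() if v is not None}
    return {**DEFAULTS, **file_layer, **explicit_flags}


effective = resolve_config(
    cli_flags={"output": "out.md", "profile": None},
    codecrate_toml={"output": "from-file.md", "profile": "agent"},
    pyproject_tool_table={"profile": "human"},
)
print(effective)
```

In this sketch the `pyproject.toml` table is ignored because a codecrate TOML file exists, and the `-o`-style flag wins over the file's `output` value.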

See Configuration Reference for the exhaustive generated config reference, which covers options such as:

[codecrate]
output = "context.md"
include_preset = "python+docs"
profile = "human"
index_json_mode = "normalized"
index_json_enabled = true
index_json_output = ""
standalone_unpacker_output = ""

Overview

codecrate --version
codecrate pack [ROOT ...] [--repo REPO ...] [options]
codecrate unpack PACK.md -o OUT_DIR [--check-machine-header] [--strict] [--fail-on-warning]
codecrate patch OLD_PACK.md ROOT [-o patch.md]
codecrate apply PATCH.md ROOT [--dry-run] [--check-baseline|--ignore-baseline]
codecrate validate-pack PACK.md [--root ROOT] [--strict] [policy flags]
codecrate doctor [ROOT]
codecrate config show [ROOT] [--effective] [--json]
codecrate config schema [--json]

pack

Create a packed Markdown context file from one or more repositories.

codecrate pack . -o context.md
codecrate pack /path/to/repo1 /path/to/repo2 -o multi.md
codecrate pack --repo /path/to/repo1 --repo /path/to/repo2 -o multi.md

Use either positional ROOT arguments or repeated --repo arguments for multi-repo packs. Mixing the two styles is an error.
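
The either/or rule for ROOT and --repo can be sketched with `argparse`. This is an illustrative sketch, not Codecrate's own argument parser: `parse_pack_roots` and the `pack-sketch` program name are hypothetical, and falling back to the current directory when no root is given is an assumption.

```python
import argparse


def parse_pack_roots(argv: list[str]) -> list[str]:
    """Return the repository roots for a pack invocation, rejecting mixed styles."""
    parser = argparse.ArgumentParser(prog="pack-sketch")
    parser.add_argument("roots", nargs="*", metavar="ROOT")
    parser.add_argument("--repo", action="append", default=[], metavar="REPO")
    args = parser.parse_args(argv)

    if args.roots and args.repo:
        # parser.error prints a usage message and raises SystemExit(2).
        parser.error("use either positional ROOT arguments or --repo, not both")
    # Assumption: with neither style given, fall back to the current directory.
    return args.roots or args.repo or ["."]


print(parse_pack_roots(["/path/to/repo1", "/path/to/repo2"]))
print(parse_pack_roots(["--repo", "/path/to/repo1", "--repo", "/path/to/repo2"]))
```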

Useful flags:

--dedupe / --no-dedupe: enable or disable deduplication
--profile human|agent|lean-agent|hybrid|portable|portable-agent: select a profile of output defaults (see Profile defaults)
--layout auto|stubs|full: choose layout (auto selects the most token-efficient layout)
--nav-mode auto|compact|full: navigation density; auto uses compact for unsplit output and full when split outputs are requested
--symbol-backend auto|python|tree-sitter|none: optional non-Python symbol extraction backend (Python files always use AST)
--keep-docstrings / --no-keep-docstrings: keep docstrings in stubbed views
--manifest / --no-manifest: include or omit the Manifest section
--respect-gitignore / --no-respect-gitignore: honor .gitignore rules, or include files that .gitignore would exclude
--security-check / --no-security-check: enable or disable sensitive-file safety filtering
--security-content-sniff / --no-security-content-sniff: optionally scan file content for key/token patterns
--security-redaction / --no-security-redaction: redact flagged files instead of skipping them
--safety-report / --no-safety-report: include Safety Report section in output
--security-path-pattern GLOB (repeatable): override sensitive path rule set
--security-path-pattern-add GLOB (repeatable): append sensitive path rules
--security-path-pattern-remove GLOB (repeatable): remove sensitive path rules
--security-content-pattern RULE (repeatable): override sensitive content rule set (name=regex or regex)
.codecrateignore: gitignore-style ignore file in repo root (always respected)
--include GLOB (repeatable): include patterns
--include-preset python-only|python+docs|everything: include preset
--exclude GLOB (repeatable): exclude patterns
--stdin: read file paths from stdin (one per line) instead of scanning
--stdin0: read file paths from stdin as NUL-separated entries
--print-files: debug-print selected files after filtering
--print-skipped: debug-print skipped files and reasons
--print-rules: debug-print effective include/exclude/ignore/safety rules
--split-max-chars N: additionally emit .index.md and .partN.md files for LLMs
--split-strict / --no-split-strict: fail instead of writing oversize logical blocks
--split-allow-cut-files / --no-split-allow-cut-files: explicitly cut oversized file blocks across multiple part files
--token-count-tree [threshold]: show file tree with token counts; optional threshold shows only files with >=N tokens (for example, --token-count-tree 100)
--top-files-len N: show N largest files by token count in stderr report
--token-count-encoding NAME: tokenizer encoding (for tiktoken backend)
--file-summary / --no-file-summary: enable or disable pack summary output
--max-file-bytes N: skip files larger than N bytes
--max-total-bytes N: fail if included files exceed N bytes
--max-file-tokens N: skip files above N tokens
--max-total-tokens N: fail if included files exceed N tokens
--max-workers N: cap thread pool size for IO/parsing/token counting
--manifest-json [PATH]: write manifest JSON for tooling (default: <output>.manifest.json)
--index-json [PATH]: write index JSON for agent/tooling lookup (default: <output>.index.json; explicit --index-json preserves profile/config mode defaults unless --index-json-mode overrides them)
--index-json-mode full|compact|minimal|normalized: choose sidecar mode and enable index-json output (agent and portable-agent default to normalized; hybrid defaults to full)
--index-json-lookup / --no-index-json-lookup: include or trim lookup maps in compact/minimal v2 sidecars
--index-json-symbol-index-lines / --no-index-json-symbol-index-lines: include or trim compact v2 symbol index line ranges
--index-json-symbol-locators / --no-index-json-symbol-locators: include or trim symbol locator payloads
--index-json-symbol-references / --no-index-json-symbol-references: include or trim conservative symbol reference and call-like metadata
--index-json-graph / --no-index-json-graph, --index-json-test-links / --no-index-json-test-links, --index-json-guide / --no-index-json-guide, --index-json-file-imports / --no-index-json-file-imports, --index-json-classes / --no-index-json-classes, --index-json-exports / --no-index-json-exports, --index-json-module-docstrings / --no-index-json-module-docstrings: independently trim analysis sections
--no-index-json: disable index JSON output, including profile-implied defaults
--emit-standalone-unpacker: write <output>.unpack.py for zero-install reconstruction of manifest-enabled packs
--locator-space auto|markdown|reconstructed|dual: choose whether sidecar locators target the markdown pack, the reconstructed file tree, or both; auto resolves to reconstructed when --emit-standalone-unpacker is enabled and otherwise to markdown
--encoding-errors replace|strict: UTF-8 decode policy when reading files
-o/--output PATH: output markdown path (defaults to config output or context.md)

Profile defaults:

human: current markdown-first behavior
agent: compact navigation plus normalized v3 index-json output
lean-agent: smaller normalized v3 sidecars with lean analysis defaults
hybrid: current markdown behavior plus full index-json output
portable: manifest-enabled full layout intended for standalone unpack
portable-agent: full layout, standalone unpacker, normalized sidecar, and dual locators by default

Portable reconstruction example:

codecrate pack . -o context.md --profile portable --emit-standalone-unpacker
python3 -S context.unpack.py context.md -o reconstructed/ --check-machine-header --strict --fail-on-warning

The emitted script uses only the Python standard library. It supports both full and stubs layouts; portable remains the recommended profile when you want a reconstruction-first full pack.

On Windows, use py -3 -S context.unpack.py context.md -o reconstructed --check-machine-header --strict --fail-on-warning.

If you also emit index-json, the default locator_space = "auto" switches the sidecar to reconstructed locators so tools can target the unpacked tree directly.
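
The `auto` rule for locator space described above reduces to a single conditional. The function below is a hypothetical sketch of that rule, not Codecrate's code; `resolve_locator_space` is an illustrative name.

```python
def resolve_locator_space(locator_space: str, emit_standalone_unpacker: bool) -> str:
    """Resolve the 'auto' locator space to a concrete value."""
    allowed = {"auto", "markdown", "reconstructed", "dual"}
    if locator_space not in allowed:
        raise ValueError(f"unknown locator space: {locator_space!r}")
    if locator_space != "auto":
        return locator_space  # explicit choices are kept as given
    # 'auto' follows the standalone-unpacker setting.
    return "reconstructed" if emit_standalone_unpacker else "markdown"


print(resolve_locator_space("auto", emit_standalone_unpacker=True))
print(resolve_locator_space("auto", emit_standalone_unpacker=False))
```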

When --emit-standalone-unpacker is used together with --split-max-chars, Codecrate still writes the unsplit markdown to the main output path because that unsplit pack remains the authoritative machine-readable reconstruction source.

--stdin / --stdin0 notes:

--stdin accepts one path per line from stdin.
--stdin0 accepts NUL-separated paths from stdin.
--stdin ignores blank lines and lines starting with #.
Both modes require a single ROOT and cannot be combined with --repo.
Include globs are not applied to explicit stdin files.
Exclude rules and ignore files still apply.
Outside-root and missing explicit paths are skipped.
With --print-skipped, explicit file filtering reports reasons like not-a-file, outside-root, duplicate, ignored, and excluded.
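
The two stdin formats can be illustrated with a small parser. This is a sketch under stated assumptions rather than Codecrate's implementation: `parse_stdin_paths` is a hypothetical name, whitespace-only lines are treated as blank, and NUL-separated input is assumed to skip only empty entries (the notes above describe comment handling for --stdin only).

```python
def parse_stdin_paths(data: str, nul_separated: bool = False) -> list[str]:
    """Split stdin text into explicit file paths."""
    if nul_separated:
        # --stdin0 style: entries end at NUL, so newlines and '#' are
        # ordinary characters inside a path (assumption).
        return [entry for entry in data.split("\0") if entry]

    paths = []
    for line in data.splitlines():
        # --stdin style: skip blank lines and lines starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        paths.append(line)
    return paths


print(parse_stdin_paths("src/a.py\n\n# generated list\nsrc/b.py\n"))
print(parse_stdin_paths("docs/read me.md\0src/a.py\0", nul_separated=True))
```

The NUL-separated form is the safer choice when paths may contain spaces or newlines, for example when piping from a tool that emits NUL-terminated file names.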

Include precedence:

explicit --include
explicit --include-preset
config include
config include_preset
built-in default preset (python+docs)
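
The include precedence list above amounts to "first source that is set wins". A minimal sketch, with hypothetical names (`resolve_include`, `BUILTIN_PRESET`) and the assumption that an empty list or `None` means "not set":

```python
from typing import Optional, Sequence

BUILTIN_PRESET = "python+docs"  # built-in default preset named above


def resolve_include(
    cli_include: Sequence[str] = (),
    cli_preset: Optional[str] = None,
    config_include: Sequence[str] = (),
    config_preset: Optional[str] = None,
) -> tuple[str, object]:
    """Return (kind, value) for the highest-precedence include source that is set."""
    if cli_include:
        return ("globs", list(cli_include))
    if cli_preset:
        return ("preset", cli_preset)
    if config_include:
        return ("globs", list(config_include))
    if config_preset:
        return ("preset", config_preset)
    return ("preset", BUILTIN_PRESET)


# A CLI preset outranks include globs that come from the config file.
print(resolve_include(cli_preset="everything", config_include=["src/**/*.py"]))
```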

Token diagnostics, safety, and output notes:

Token diagnostics are CLI-only and do not modify generated markdown.
If tiktoken is not installed, counting falls back to an approximate method.
If tokenizer initialization fails, codecrate still reports top-N largest files using heuristic counts.
Safety scanning uses conservative defaults; you can override both path and content rule sets.
With redaction enabled, flagged files remain in output with masked content.
A compact Pack Summary (files/tokens/chars/output path) is printed by default and can be disabled with --no-file-summary or file_summary = false in config.
File code fences are automatically widened when file content contains backticks, so generated markdown remains parsable.
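
Fence widening, as mentioned in the last note, is commonly done by making the fence one backtick longer than the longest backtick run in the content. The sketch below shows that general technique; it is not taken from Codecrate's source, and `fence_for` and `fenced_block` are hypothetical names.

```python
import re


def fence_for(content: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run inside content."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    return "`" * max(minimum, longest + 1)


def fenced_block(content: str, info: str = "") -> str:
    """Wrap content in a fence that the content itself cannot close early."""
    fence = fence_for(content)
    return f"{fence}{info}\n{content}\n{fence}"


inner = "`" * 3 + "python\nprint('hi')\n" + "`" * 3  # content with its own fence
print(fenced_block(inner, "markdown"))
```

Because a closing fence must be at least as long as the opening one, a three-backtick line inside the content no longer terminates a four-backtick block.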

unpack

Reconstruct files into an output directory:

codecrate unpack context.md -o /tmp/out --check-machine-header --strict --fail-on-warning

Useful flags:

--check-machine-header: verify the machine-header manifest checksum before writing files
--strict: fail on missing/broken part mappings
--fail-on-warning: make warning conditions exit non-zero

If the input pack omits the Manifest section (for example from codecrate pack --no-manifest), unpack fails with a clear hint to re-pack with manifest enabled.

patch

Generate a diff-only Markdown patch between an old pack and the current repo:

codecrate patch old_context.md . -o patch.md

The output is Markdown containing one or more ```diff fences. Patch requires a pack with Manifest; --no-manifest packs are rejected with a clear hint. Patch output includes a codecrate-patch-meta fence with baseline hashes.
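
For tooling that wants to inspect a patch file, the diff fences can be pulled out with a regular expression. This is a rough sketch that assumes plain three-backtick fences whose info string is exactly `diff`; the real patch format may differ in details, and `extract_diffs` is a hypothetical helper, not part of Codecrate.

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown

# Match a fence opened with the 'diff' info string and capture its body.
DIFF_BLOCK = re.compile(
    rf"^{FENCE}diff[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_diffs(patch_markdown: str) -> list[str]:
    """Return the body of every diff fence, in document order."""
    return DIFF_BLOCK.findall(patch_markdown)


# Hypothetical patch content, for illustration only.
sample = "\n".join([
    "# Patch",
    FENCE + "diff",
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -1 +1 @@",
    "-print('old')",
    "+print('new')",
    FENCE,
    "",
])
diffs = extract_diffs(sample)
print(len(diffs))
```

Other fences, such as the codecrate-patch-meta block, are not matched by this pattern because their info string differs.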

apply

Apply a Markdown patch to a repo root:

codecrate apply patch.md .
codecrate apply patch.md . --dry-run
codecrate apply patch.md . --check-baseline
codecrate apply patch.md . --ignore-baseline

Use --dry-run to parse and validate hunks without writing files. Baseline policy:

default: verify baseline hashes when metadata is present
--check-baseline: require metadata and verify
--ignore-baseline: skip baseline verification
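
The three baseline policies can be summarized as a decision function. The sketch below is illustrative only: `should_verify_baseline` and the policy strings are hypothetical names, and raising `ValueError` stands in for whatever error the CLI actually reports when metadata is required but missing.

```python
def should_verify_baseline(policy: str, has_metadata: bool) -> bool:
    """Decide whether baseline hashes are verified before applying a patch."""
    if policy == "ignore":  # --ignore-baseline: never verify
        return False
    if policy == "check":  # --check-baseline: metadata is mandatory
        if not has_metadata:
            raise ValueError("patch has no baseline metadata to check")
        return True
    if policy == "default":  # no flag: verify only when metadata is present
        return has_metadata
    raise ValueError(f"unknown baseline policy: {policy!r}")


for policy, has_metadata in [("default", True), ("default", False), ("ignore", True)]:
    print(policy, has_metadata, should_verify_baseline(policy, has_metadata))
```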

validate-pack

Validate pack internals (sha/markers/canonical consistency). Optionally compare with files on disk:

codecrate validate-pack context.md
codecrate validate-pack context.md --root .

Useful flags:

--strict: treat unresolved marker mappings as validation errors
--fail-on-warning: turn any warning into a non-zero exit
--fail-on-root-drift: with --root, fail when disk content differs from the pack
--fail-on-redaction / --fail-on-safety-skip: enforce stricter safety policy
--json: emit a machine-readable report

Validation output groups issues by repository section and includes short hints. Packs created with --no-manifest are rejected with a consistent error message. For an end-to-end agent-oriented usage guide, see Agent Workflows.

doctor

Inspect configuration and runtime capabilities:

codecrate doctor .

Doctor reports:

config discovery and precedence
selected config source (if any)
ignore file detection (.gitignore, .codecrateignore)
token backend availability
optional parsing backend availability (tree-sitter)

config show

Inspect the resolved configuration for a repository root:

codecrate config show . --effective
codecrate config show . --effective --json

The command reports:

selected config source (or defaults-only)
effective values after precedence resolution
full resolved security_path_patterns list (after add/remove)
configured security_content_patterns list
per-field provenance, including config aliases such as include_manifest

config schema

Inspect the authoritative config metadata generated from code:

codecrate config schema
codecrate config schema --json