Command Line Interface

Codecrate provides a small CLI with subcommands.

Configuration file

Codecrate reads configuration from the repository root. It will look for:

  • .codecrate.toml (preferred, if present)

  • codecrate.toml (fallback)

  • pyproject.toml under [tool.codecrate] (fallback if no codecrate TOML file exists)

Precedence (highest first):

  • CLI flags

  • .codecrate.toml / codecrate.toml

  • pyproject.toml [tool.codecrate]
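One way to picture this precedence is as a stack of dictionaries where the first layer that defines a key wins. The sketch below is a hedged illustration, not Codecrate's code: resolve_effective is a hypothetical name, and an absent layer is simply passed as an empty dict (in practice the pyproject layer may never coexist with a codecrate TOML file).

```python
from collections import ChainMap


def resolve_effective(cli_flags: dict, file_config: dict, pyproject_config: dict) -> dict:
    """Merge config layers, highest precedence first (hypothetical helper)."""
    # Flags the user did not pass are modelled as None and must not
    # shadow values that come from a config file.
    explicit_cli = {key: value for key, value in cli_flags.items() if value is not None}
    # ChainMap looks keys up left to right, so earlier layers win.
    return dict(ChainMap(explicit_cli, file_config, pyproject_config))


effective = resolve_effective(
    {"output": None, "profile": "agent"},           # CLI flags
    {"output": "context.md", "profile": "human"},   # .codecrate.toml / codecrate.toml
    {"include_preset": "python+docs"},              # pyproject.toml [tool.codecrate]
)
```

Here effective takes profile from the CLI, output from the TOML file, and include_preset from pyproject.toml.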

See Configuration Reference for the exhaustive generated config reference, including keys such as:

[codecrate]
output = "context.md"
include_preset = "python+docs"
profile = "human"
index_json_mode = "normalized"
index_json_enabled = true
index_json_output = ""
standalone_unpacker_output = ""

Overview

codecrate --version
codecrate pack [ROOT ...] [--repo REPO ...] [options]
codecrate unpack PACK.md -o OUT_DIR [--check-machine-header] [--strict] [--fail-on-warning]
codecrate patch OLD_PACK.md ROOT [-o patch.md]
codecrate apply PATCH.md ROOT [--dry-run] [--check-baseline|--ignore-baseline]
codecrate validate-pack PACK.md [--root ROOT] [--strict] [policy flags]
codecrate doctor [ROOT]
codecrate config show [ROOT] [--effective] [--json]
codecrate config schema [--json]

pack

Create a packed Markdown context file from one or more repositories.

codecrate pack . -o context
codecrate pack /path/to/repo1 /path/to/repo2 -o multi.md
codecrate pack --repo /path/to/repo1 --repo /path/to/repo2 -o multi.md

Use either positional ROOT arguments or repeated --repo arguments for multi-repo packs. Mixing the two styles is an error.
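The either/or rule can be illustrated with a small argparse sketch. It is a stand-alone example under stated assumptions, not Codecrate's parser: parse_pack_roots and the pack-sketch program name are hypothetical, and the error text is invented for the illustration.

```python
import argparse


def parse_pack_roots(argv: list[str]) -> list[str]:
    """Return the repository roots, rejecting mixed ROOT / --repo usage."""
    parser = argparse.ArgumentParser(prog="pack-sketch")
    parser.add_argument("roots", nargs="*", metavar="ROOT")
    parser.add_argument("--repo", action="append", default=[], metavar="REPO")
    args = parser.parse_args(argv)
    if args.roots and args.repo:
        # parser.error prints a usage message and exits with status 2.
        parser.error("use either positional ROOT arguments or --repo, not both")
    return args.roots or args.repo
```

Both parse_pack_roots(["repo1", "repo2"]) and parse_pack_roots(["--repo", "repo1", "--repo", "repo2"]) yield the same list of roots, while parse_pack_roots(["repo1", "--repo", "repo2"]) exits with an error.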

Useful flags:

  • --dedupe / --no-dedupe: enable or disable deduplication

  • --profile human|agent|lean-agent|hybrid|portable|portable-agent: choose output defaults profile

  • --layout auto|stubs|full: choose layout (auto selects the most token-efficient layout)

  • --nav-mode auto|compact|full: navigation density; auto uses compact for unsplit output and full when split outputs are requested

  • --symbol-backend auto|python|tree-sitter|none: optional non-Python symbol extraction backend (Python files always use AST)

  • --keep-docstrings / --no-keep-docstrings: keep docstrings in stubbed views

  • --manifest / --no-manifest: include or omit the Manifest section

  • --respect-gitignore / --no-respect-gitignore: honor .gitignore rules (skip ignored files) or include ignored files

  • --security-check / --no-security-check: enable or disable sensitive-file safety filtering

  • --security-content-sniff / --no-security-content-sniff: optionally scan file content for key/token patterns

  • --security-redaction / --no-security-redaction: redact flagged files instead of skipping them

  • --safety-report / --no-safety-report: include Safety Report section in output

  • --security-path-pattern GLOB (repeatable): override sensitive path rule set

  • --security-path-pattern-add GLOB (repeatable): append sensitive path rules

  • --security-path-pattern-remove GLOB (repeatable): remove sensitive path rules

  • --security-content-pattern RULE (repeatable): override sensitive content rule set (name=regex or regex)

  • .codecrateignore: gitignore-style ignore file in repo root (always respected)

  • --include GLOB (repeatable): include patterns

  • --include-preset python-only|python+docs|everything: include preset

  • --exclude GLOB (repeatable): exclude patterns

  • --stdin: read file paths from stdin (one per line) instead of scanning

  • --stdin0: read file paths from stdin as NUL-separated entries

  • --print-files: debug-print selected files after filtering

  • --print-skipped: debug-print skipped files and reasons

  • --print-rules: debug-print effective include/exclude/ignore/safety rules

  • --split-max-chars N: additionally emit .index.md and .partN.md files for LLMs

  • --split-strict / --no-split-strict: fail instead of writing oversize logical blocks

  • --split-allow-cut-files / --no-split-allow-cut-files: explicitly cut oversized file blocks across multiple part files

  • --token-count-tree [threshold]: show file tree with token counts; optional threshold shows only files with >=N tokens (for example, --token-count-tree 100)

  • --top-files-len N: show N largest files by token count in stderr report

  • --token-count-encoding NAME: tokenizer encoding (for tiktoken backend)

  • --file-summary / --no-file-summary: enable or disable pack summary output

  • --max-file-bytes N: skip files larger than N bytes

  • --max-total-bytes N: fail if included files exceed N bytes

  • --max-file-tokens N: skip files above N tokens

  • --max-total-tokens N: fail if included files exceed N tokens

  • --max-workers N: cap thread pool size for IO/parsing/token counting

  • --manifest-json [PATH]: write manifest JSON for tooling (default: <output>.manifest.json)

  • --index-json [PATH]: write index JSON for agent/tooling lookup (default: <output>.index.json; explicit --index-json preserves profile/config mode defaults unless --index-json-mode overrides them)

  • --index-json-mode full|compact|minimal|normalized: choose sidecar mode and enable index-json output (agent and portable-agent default to normalized; hybrid defaults to full)

  • --index-json-lookup / --no-index-json-lookup: include or trim lookup maps in compact/minimal v2 sidecars

  • --index-json-symbol-index-lines / --no-index-json-symbol-index-lines: include or trim compact v2 symbol index line ranges

  • --index-json-symbol-locators / --no-index-json-symbol-locators: include or trim symbol locator payloads

  • --index-json-symbol-references / --no-index-json-symbol-references: include or trim conservative symbol reference and call-like metadata

  • --index-json-graph / --no-index-json-graph, --index-json-test-links / --no-index-json-test-links, --index-json-guide / --no-index-json-guide, --index-json-file-imports / --no-index-json-file-imports, --index-json-classes / --no-index-json-classes, --index-json-exports / --no-index-json-exports, --index-json-module-docstrings / --no-index-json-module-docstrings: independently trim analysis sections

  • --no-index-json: disable index JSON output, including profile-implied defaults

  • --emit-standalone-unpacker: write <output>.unpack.py for zero-install reconstruction of manifest-enabled packs

  • --locator-space auto|markdown|reconstructed|dual: choose whether sidecar locators target the markdown pack, the reconstructed file tree, or both; auto resolves to reconstructed when --emit-standalone-unpacker is enabled and otherwise to markdown

  • --encoding-errors replace|strict: UTF-8 decode policy when reading files

  • -o/--output PATH: output markdown path (defaults to config output or context.md)
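Two of the flags above accept auto and resolve it from other options. The resolution rules as described can be written down directly; the function names below are hypothetical and the sketch only restates the documented behavior, it is not taken from Codecrate's source.

```python
def resolve_nav_mode(nav_mode: str, split_requested: bool) -> str:
    """Resolve --nav-mode: auto means compact for unsplit output, full for split output."""
    if nav_mode != "auto":
        return nav_mode  # an explicit choice is kept as-is
    return "full" if split_requested else "compact"


def resolve_locator_space(locator_space: str, emit_standalone_unpacker: bool) -> str:
    """Resolve --locator-space: auto follows --emit-standalone-unpacker."""
    if locator_space != "auto":
        return locator_space  # markdown, reconstructed, or dual chosen explicitly
    return "reconstructed" if emit_standalone_unpacker else "markdown"
```

For example, resolve_nav_mode("auto", split_requested=True) gives "full", and resolve_locator_space("auto", emit_standalone_unpacker=False) gives "markdown".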

Profile defaults:

  • human: current markdown-first behavior

  • agent: compact navigation plus normalized v3 index-json output

  • lean-agent: smaller normalized v3 sidecars with lean analysis defaults

  • hybrid: current markdown behavior plus full index-json output

  • portable: manifest-enabled full layout intended for standalone unpack

  • portable-agent: full layout, standalone unpacker, normalized sidecar, and dual locators by default

Portable reconstruction example:

codecrate pack . -o context.md --profile portable --emit-standalone-unpacker
python3 -S context.unpack.py context.md -o reconstructed/ --check-machine-header --strict --fail-on-warning

The emitted script uses only the Python standard library. It supports both full and stubs layouts; portable remains the recommended profile when you want a reconstruction-first full pack.

On Windows, use py -3 -S context.unpack.py context.md -o reconstructed --check-machine-header --strict --fail-on-warning.

If you also emit index-json, the default locator_space = "auto" switches the sidecar to reconstructed locators so tools can target the unpacked tree directly.

When --emit-standalone-unpacker is used together with --split-max-chars, Codecrate still writes the unsplit markdown to the main output path because that unsplit pack remains the authoritative machine-readable reconstruction source.

--stdin / --stdin0 notes:

  • --stdin accepts one path per line from stdin.

  • --stdin0 accepts NUL-separated paths from stdin.

  • --stdin ignores blank lines and lines starting with #.

  • Both flags require a single ROOT (they cannot be combined with --repo).

  • Include globs are not applied to explicit stdin files.

  • Exclude rules and ignore files still apply.

  • Explicit paths that are missing or lie outside the root are skipped.

  • With --print-skipped, explicit file filtering reports reasons like not-a-file, outside-root, duplicate, ignored, and excluded.
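The parsing and filtering rules above can be sketched as follows. This is an illustrative reading of the notes, not Codecrate's implementation: parse_stdin_paths and filter_explicit are hypothetical names, stripping surrounding whitespace in --stdin mode and dropping empty NUL entries are assumptions, and the ignore/exclude checks are left out for brevity.

```python
from pathlib import Path


def parse_stdin_paths(data: bytes, nul_separated: bool = False) -> list[str]:
    """Split stdin payload into path strings (hypothetical helper)."""
    text = data.decode("utf-8")
    if nul_separated:
        # --stdin0 style: NUL-separated entries, safe for names with newlines.
        return [entry for entry in text.split("\0") if entry]
    paths = []
    for line in text.splitlines():
        stripped = line.strip()  # assumption: surrounding whitespace is ignored
        if not stripped or stripped.startswith("#"):
            continue  # --stdin style: skip blank lines and comments
        paths.append(stripped)
    return paths


def filter_explicit(paths: list[str], root: Path):
    """Return (kept paths, [(raw path, skip reason), ...]) relative to one root."""
    root = root.resolve()
    kept, skipped, seen = [], [], set()
    for raw in paths:
        candidate = (root / raw).resolve()
        if not candidate.is_relative_to(root):
            skipped.append((raw, "outside-root"))
        elif not candidate.is_file():
            skipped.append((raw, "not-a-file"))
        elif candidate in seen:
            skipped.append((raw, "duplicate"))
        else:
            seen.add(candidate)
            kept.append(candidate)
    return kept, skipped
```

The reason strings mirror those listed for --print-skipped; ignored and excluded would come from the ignore-file and exclude checks that this sketch omits.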

Include precedence:

  • explicit --include

  • explicit --include-preset

  • config include

  • config include_preset

  • built-in default preset (python+docs)
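Read as a first-match rule, the include precedence looks like the sketch below. The function name resolve_include and the ("patterns" / "preset") return shape are hypothetical; the preset contents themselves are not shown because they are not specified here.

```python
def resolve_include(cli_include=None, cli_preset=None, cfg_include=None, cfg_preset=None):
    """Return ('patterns', [...]) or ('preset', name) from the first source that is set."""
    if cli_include:
        return ("patterns", list(cli_include))   # explicit --include
    if cli_preset:
        return ("preset", cli_preset)            # explicit --include-preset
    if cfg_include:
        return ("patterns", list(cfg_include))   # config include
    if cfg_preset:
        return ("preset", cfg_preset)            # config include_preset
    return ("preset", "python+docs")             # built-in default preset
```

Under this reading, a preset passed on the command line overrides include patterns from the config file, and with nothing set the result is ("preset", "python+docs").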

Token diagnostics notes:

  • Token diagnostics are CLI-only and do not modify generated markdown.

  • If tiktoken is not installed, counting falls back to an approximate method.

  • If tokenizer initialization fails, codecrate still reports top-N largest files using heuristic counts.

  • Safety scanning uses conservative defaults; you can override both path and content rule sets.

  • With redaction enabled, flagged files remain in output with masked content.

  • A compact Pack Summary (files/tokens/chars/output path) is printed by default and can be disabled with --no-file-summary or file_summary = false in config.

  • File code fences are automatically widened when file content contains backticks, so generated markdown remains parsable.
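Fence widening relies on a general Markdown rule: a fenced block closes only at a run of backticks at least as long as the opening fence. A common way to pick a safe fence is to use one more backtick than the longest run inside the content. The sketch below shows that technique; fence_for and fenced_block are hypothetical names, and Codecrate's exact rule may differ.

```python
import re


def fence_for(content: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run in content."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    return "`" * max(minimum, longest + 1)


def fenced_block(content: str, lang: str = "") -> str:
    """Wrap content in a fence that the content cannot close early."""
    fence = fence_for(content)
    return f"{fence}{lang}\n{content}\n{fence}"
```

Content without backticks keeps the usual three-backtick fence, while a file that itself contains a three-backtick fence is wrapped in four.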

unpack

Reconstruct files into an output directory:

codecrate unpack context.md -o /tmp/out --check-machine-header --strict --fail-on-warning

Use --check-machine-header to verify the machine-header manifest checksum before writing files, --strict to fail on missing/broken part mappings, and --fail-on-warning to make warning conditions exit non-zero. If the input pack omits the Manifest section (for example from codecrate pack --no-manifest), unpack fails with a clear hint to re-pack with manifest enabled.

patch

Generate a diff-only Markdown patch between an old pack and the current repo:

codecrate patch old_context.md . -o patch.md

The output is Markdown containing one or more ```diff fences. Patch requires a pack with a Manifest section; packs created with --no-manifest are rejected with a clear hint. Patch output also includes a codecrate-patch-meta fence with baseline hashes.
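To give a feel for what a diff fence contains, the sketch below builds one with Python's difflib. It only illustrates the general shape of a unified diff wrapped in a Markdown fence: diff_fence is a hypothetical helper, the a/ and b/ path prefixes are an assumption, and the codecrate-patch-meta fence is not reproduced because its format is not described here.

```python
import difflib


def diff_fence(path: str, old: str, new: str) -> str:
    """Return a Markdown diff fence for one file, or '' when nothing changed."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"a/{path}",  # assumption: git-style path prefixes
        tofile=f"b/{path}",
        lineterm="",           # lines are joined with newlines below
    )
    body = "\n".join(lines)
    if not body:
        return ""
    fence = "`" * 3
    return f"{fence}diff\n{body}\n{fence}"


patch_block = diff_fence("m.py", "x = 1\n", "x = 2\n")
```

The resulting block starts with the --- a/m.py and +++ b/m.py header lines, followed by a hunk that removes x = 1 and adds x = 2.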

apply

Apply a patch Markdown to a repo root:

codecrate apply patch.md .
codecrate apply patch.md . --dry-run
codecrate apply patch.md . --check-baseline
codecrate apply patch.md . --ignore-baseline

Use --dry-run to parse and validate hunks without writing files. Baseline policy:

  • default: verify baseline hashes when metadata is present

  • --check-baseline: require metadata and verify

  • --ignore-baseline: skip baseline verification
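The three baseline policies reduce to a small decision function. This sketch restates the list above under the assumption, suggested by the synopsis, that the two flags are mutually exclusive; should_verify_baseline and its error messages are hypothetical.

```python
def should_verify_baseline(
    has_metadata: bool,
    check_baseline: bool = False,
    ignore_baseline: bool = False,
) -> bool:
    """Decide whether baseline hashes are verified before applying a patch."""
    if check_baseline and ignore_baseline:
        raise ValueError("--check-baseline and --ignore-baseline are mutually exclusive")
    if ignore_baseline:
        return False  # skip verification entirely
    if check_baseline:
        if not has_metadata:
            raise ValueError("--check-baseline requires baseline metadata in the patch")
        return True
    # Default: verify only when the patch carries baseline metadata.
    return has_metadata
```

With no flags, a patch without metadata is applied unverified, whereas --check-baseline turns the missing metadata into an error.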

validate-pack

Validate pack internals (SHA hashes, markers, and canonical consistency). Optionally compare with files on disk:

codecrate validate-pack context.md
codecrate validate-pack context.md --root .

Validation options:

  • --strict: treat unresolved marker mapping as validation errors

  • --fail-on-warning: turn any warning into a non-zero exit

  • --fail-on-root-drift (with --root): fail when disk content differs from the pack

  • --fail-on-redaction / --fail-on-safety-skip: stricter safety policy enforcement

  • --json: machine-readable report output

Validation output groups issues by repository section and includes short hints. Packs created with --no-manifest are rejected with a consistent error message.

For an end-to-end agent-oriented usage guide, see Agent Workflows.

doctor

Inspect configuration and runtime capabilities:

codecrate doctor .

Doctor reports:

  • config discovery and precedence

  • selected config source (if any)

  • ignore file detection (.gitignore, .codecrateignore)

  • token backend availability

  • optional parsing backend availability (tree-sitter)

config show

Inspect the resolved configuration for a repository root:

codecrate config show . --effective
codecrate config show . --effective --json

The command reports:

  • selected config source (or defaults-only)

  • effective values after precedence resolution

  • full resolved security_path_patterns list (after add/remove)

  • configured security_content_patterns list

  • per-field provenance, including config aliases such as include_manifest

config schema

Inspect the authoritative config metadata generated from code:

codecrate config schema
codecrate config schema --json