Command Line Interface
======================

Codecrate provides a small CLI with subcommands.

Configuration file
------------------

Codecrate reads configuration from the repository root. It will look for:

* ``.codecrate.toml`` (preferred, if present)
* ``codecrate.toml`` (fallback)
* ``pyproject.toml`` under ``[tool.codecrate]`` (fallback if no codecrate TOML file exists)

Precedence (highest first):

* CLI flags
* ``.codecrate.toml`` / ``codecrate.toml``
* ``pyproject.toml`` ``[tool.codecrate]``

See :doc:`config` for the exhaustive generated config reference, including:

.. code-block:: toml

   [codecrate]
   output = "context.md"
   include_preset = "python+docs"
   profile = "human"
   index_json_mode = "normalized"
   index_json_enabled = true
   index_json_output = ""
   standalone_unpacker_output = ""

Overview
--------

.. code-block:: console

   codecrate --version
   codecrate pack [ROOT ...] [--repo REPO ...] [options]
   codecrate unpack PACK.md -o OUT_DIR [--check-machine-header] [--strict]
   codecrate patch OLD_PACK.md ROOT [-o patch.md]
   codecrate apply PATCH.md ROOT [--check-baseline|--ignore-baseline]
   codecrate validate-pack PACK.md [--root ROOT] [--strict] [policy flags]
   codecrate doctor [ROOT]
   codecrate config show [ROOT] [--effective] [--json]
   codecrate config schema [--json]


pack
----

Create a packed Markdown context file from one or more repositories.

.. code-block:: console

   codecrate pack . -o context
   codecrate pack /path/to/repo1 /path/to/repo2 -o multi.md
   codecrate pack --repo /path/to/repo1 --repo /path/to/repo2 -o multi.md

Use either positional ``ROOT`` arguments or repeated ``--repo`` arguments for
multi-repo packs. Mixing the two styles is an error.

Useful flags:

* ``--dedupe / --no-dedupe``: enable or disable deduplication
* ``--profile human|agent|lean-agent|hybrid|portable|portable-agent``: choose
  output defaults profile
* ``--layout auto|stubs|full``: choose layout (auto selects best token efficiency)
* ``--nav-mode auto|compact|full``: navigation density; auto uses compact for
  unsplit output and full when split outputs are requested
* ``--symbol-backend auto|python|tree-sitter|none``: optional non-Python symbol
  extraction backend (Python files always use AST)
* ``--keep-docstrings / --no-keep-docstrings``: keep docstrings in stubbed views
* ``--manifest / --no-manifest``: include or omit the Manifest section
* ``--respect-gitignore / --no-respect-gitignore``: include ignored files or not
* ``--security-check / --no-security-check``: enable or disable sensitive-file
  safety filtering
* ``--security-content-sniff / --no-security-content-sniff``: optionally scan
  file content for key/token patterns
* ``--security-redaction / --no-security-redaction``: redact flagged files instead
  of skipping them
* ``--safety-report / --no-safety-report``: include Safety Report section in output
* ``--security-path-pattern GLOB`` (repeatable): override sensitive path rule set
* ``--security-path-pattern-add GLOB`` (repeatable): append sensitive path rules
* ``--security-path-pattern-remove GLOB`` (repeatable): remove sensitive path rules
* ``--security-content-pattern RULE`` (repeatable): override sensitive content
  rule set (``name=regex`` or ``regex``)
* ``.codecrateignore``: gitignore-style ignore file in repo root (always respected)
* ``--include GLOB`` (repeatable): include patterns
* ``--include-preset python-only|python+docs|everything``: include preset
* ``--exclude GLOB`` (repeatable): exclude patterns
* ``--stdin``: read file paths from stdin (one per line) instead of scanning
* ``--stdin0``: read file paths from stdin as NUL-separated entries
* ``--print-files``: debug-print selected files after filtering
* ``--print-skipped``: debug-print skipped files and reasons
* ``--print-rules``: debug-print effective include/exclude/ignore/safety rules
* ``--split-max-chars N``: additionally emit ``.index.md`` and ``.partN.md`` files for LLMs
* ``--split-strict / --no-split-strict``: fail instead of writing oversize logical blocks
* ``--split-allow-cut-files / --no-split-allow-cut-files``: explicitly cut oversized
  file blocks across multiple part files
* ``--token-count-tree [threshold]``: show file tree with token counts; optional
  threshold shows only files with >=N tokens (for example,
  ``--token-count-tree 100``)
* ``--top-files-len N``: show N largest files by token count in stderr report
* ``--token-count-encoding NAME``: tokenizer encoding (for tiktoken backend)
* ``--file-summary / --no-file-summary``: enable or disable pack summary output
* ``--max-file-bytes N``: skip files larger than N bytes
* ``--max-total-bytes N``: fail if included files exceed N bytes
* ``--max-file-tokens N``: skip files above N tokens
* ``--max-total-tokens N``: fail if included files exceed N tokens
* ``--max-workers N``: cap thread pool size for IO/parsing/token counting
* ``--manifest-json [PATH]``: write manifest JSON for tooling (default:
  ``<output>.manifest.json``)
* ``--index-json [PATH]``: write index JSON for agent/tooling lookup (default:
  ``<output>.index.json``; explicit ``--index-json`` preserves profile/config
  mode defaults unless ``--index-json-mode`` overrides them)
* ``--index-json-mode full|compact|minimal|normalized``: choose sidecar mode and enable
    index-json output (``agent`` and ``portable-agent`` default to ``normalized``;
    ``hybrid`` defaults to ``full``)
* ``--index-json-lookup / --no-index-json-lookup``: include or trim lookup maps
    in compact/minimal v2 sidecars
* ``--index-json-symbol-index-lines / --no-index-json-symbol-index-lines``:
    include or trim compact v2 symbol index line ranges
* ``--index-json-symbol-locators / --no-index-json-symbol-locators``: include or
    trim symbol locator payloads
* ``--index-json-symbol-references / --no-index-json-symbol-references``:
    include or trim conservative symbol reference and call-like metadata
* ``--index-json-graph / --no-index-json-graph``, ``--index-json-test-links /
  --no-index-json-test-links``, ``--index-json-guide / --no-index-json-guide``,
  ``--index-json-file-imports / --no-index-json-file-imports``,
  ``--index-json-classes / --no-index-json-classes``,
  ``--index-json-exports / --no-index-json-exports``,
  ``--index-json-module-docstrings / --no-index-json-module-docstrings``:
  independently trim analysis sections
* ``--no-index-json``: disable index JSON output, including profile-implied defaults
* ``--emit-standalone-unpacker``: write ``<output>.unpack.py`` for zero-install
   reconstruction of manifest-enabled packs
* ``--locator-space auto|markdown|reconstructed|dual``: choose whether
   sidecar locators target the markdown pack, the reconstructed file tree, or
   both; ``auto`` resolves to ``reconstructed`` when
   ``--emit-standalone-unpacker`` is enabled and otherwise to ``markdown``
* ``--encoding-errors replace|strict``: UTF-8 decode policy when reading files
* ``-o/--output PATH``: output markdown path (defaults to config ``output`` or ``context.md``)

Profile defaults:

* ``human``: current markdown-first behavior
* ``agent``: compact navigation plus normalized v3 ``index-json`` output
* ``lean-agent``: smaller normalized v3 sidecars with lean analysis defaults
* ``hybrid``: current markdown behavior plus full ``index-json`` output
* ``portable``: manifest-enabled ``full`` layout intended for standalone unpack
* ``portable-agent``: ``full`` layout, standalone unpacker, normalized sidecar,
  and dual locators by default

Portable reconstruction example:

.. code-block:: console

   codecrate pack . -o context.md --profile portable --emit-standalone-unpacker
   python3 -S context.unpack.py context.md -o reconstructed/ --check-machine-header --strict --fail-on-warning

The emitted script uses only the Python standard library. It supports both
``full`` and ``stubs`` layouts; ``portable`` remains the recommended profile
when you want a reconstruction-first ``full`` pack.

On Windows, use ``py -3 -S context.unpack.py context.md -o reconstructed
--check-machine-header --strict --fail-on-warning``.

If you also emit ``index-json``, the default ``locator_space = "auto"``
switches the sidecar to reconstructed locators so tools can target the unpacked
tree directly.

When ``--emit-standalone-unpacker`` is used together with ``--split-max-chars``,
Codecrate still writes the unsplit markdown to the main output path because that
unsplit pack remains the authoritative machine-readable reconstruction source.

``--stdin`` / ``--stdin0`` notes:

* ``--stdin`` accepts one path per line from stdin.
* ``--stdin0`` accepts NUL-separated paths from stdin.
* ``--stdin`` ignores blank lines and lines starting with ``#``.
* Requires a single ``ROOT`` (cannot be combined with ``--repo``).
* Include globs are not applied to explicit stdin files.
* Exclude rules and ignore files still apply.
* Outside-root and missing explicit paths are skipped.
* With ``--print-skipped``, explicit file filtering reports reasons like
  ``not-a-file``, ``outside-root``, ``duplicate``, ``ignored``, and ``excluded``.

Include precedence:

* explicit ``--include``
* explicit ``--include-preset``
* config ``include``
* config ``include_preset``
* built-in default preset (``python+docs``)

Token diagnostics notes:

* Token diagnostics are CLI-only and do not modify generated markdown.
* If ``tiktoken`` is not installed, counting falls back to an approximate method.
* If tokenizer initialization fails, codecrate still reports top-N largest files
  using heuristic counts.
* Safety scanning uses conservative defaults; you can override both path and
  content rule sets.
* With redaction enabled, flagged files remain in output with masked content.
* A compact ``Pack Summary`` (files/tokens/chars/output path) is printed by
  default and can be disabled with ``--no-file-summary`` or
  ``file_summary = false`` in config.
* File code fences are automatically widened when file content contains backticks,
  so generated markdown remains parsable.


unpack
------

Reconstruct files into an output directory:

.. code-block:: console

   codecrate unpack context.md -o /tmp/out --check-machine-header --strict --fail-on-warning

Use ``--check-machine-header`` to verify the machine-header manifest checksum
before writing files, ``--strict`` to fail on missing/broken part mappings, and
``--fail-on-warning`` to make warning conditions exit non-zero.
If the input pack omits the Manifest section (for example from
``codecrate pack --no-manifest``), unpack fails with a clear hint to re-pack with
manifest enabled.


patch
-----

Generate a diff-only Markdown patch between an old pack and the current repo:

.. code-block:: console

   codecrate patch old_context.md . -o patch.md

The output is Markdown containing one or more `````diff`` fences.
Patch requires a pack with Manifest; ``--no-manifest`` packs are rejected with a
clear hint.
Patch output includes a ``codecrate-patch-meta`` fence with baseline hashes.


apply
-----

Apply a patch Markdown to a repo root:

.. code-block:: console

   codecrate apply patch.md .
   codecrate apply patch.md . --dry-run
   codecrate apply patch.md . --check-baseline
   codecrate apply patch.md . --ignore-baseline

Use ``--dry-run`` to parse and validate hunks without writing files.
Baseline policy:

* default: verify baseline hashes when metadata is present
* ``--check-baseline``: require metadata and verify
* ``--ignore-baseline``: skip baseline verification


validate-pack
-------------

Validate pack internals (sha/markers/canonical consistency). Optionally compare with
files on disk:

.. code-block:: console

   codecrate validate-pack context.md
   codecrate validate-pack context.md --root .

Use ``--strict`` to treat unresolved marker mapping as validation errors.
Use ``--fail-on-warning`` to turn any warning into a non-zero exit.
Use ``--fail-on-root-drift`` with ``--root`` to fail when disk content differs from the pack.
Use ``--fail-on-redaction`` or ``--fail-on-safety-skip`` for stricter safety policy enforcement.
Validation output groups issues by repository section and includes short hints.
Packs created with ``--no-manifest`` are rejected with a consistent error message.
Use ``--json`` for machine-readable report output.
For an end-to-end agent-oriented usage guide, see :doc:`agent_workflows`.


doctor
------

Inspect configuration and runtime capabilities:

.. code-block:: console

   codecrate doctor .

Doctor reports:

* config discovery and precedence
* selected config source (if any)
* ignore file detection (``.gitignore``, ``.codecrateignore``)
* token backend availability
* optional parsing backend availability (tree-sitter)


config show
-----------

Inspect the resolved configuration for a repository root:

.. code-block:: console

   codecrate config show . --effective
   codecrate config show . --effective --json

The command reports:

* selected config source (or defaults-only)
* effective values after precedence resolution
* full resolved ``security_path_patterns`` list (after add/remove)
* configured ``security_content_patterns`` list
* per-field provenance, including config aliases such as ``include_manifest``


config schema
-------------

Inspect the authoritative config metadata generated from code:

.. code-block:: console

   codecrate config schema
   codecrate config schema --json