Index JSON Sidecar
index-json is the machine-facing companion to the Markdown pack. The
Markdown output stays optimized for human reading, patch/apply, and round-trip
reconstruction; the sidecar exists so tools can answer common retrieval
questions without scraping markdown.
Generation
Generate the sidecar explicitly:
codecrate pack . -o context.md --index-json
Or let the profile enable it:
codecrate pack . -o context.md --profile agent
codecrate pack . -o context.md --profile lean-agent
codecrate pack . -o context.md --profile hybrid
codecrate pack . -o context.md --profile portable-agent
--profile agent resolves to the normalized v3 sidecar by default.
--profile lean-agent keeps normalized v3 but, by default, drops analysis-heavy
payloads and pretty-print whitespace. --profile hybrid keeps the full
v1-compatible sidecar. --profile portable-agent pairs a reconstructable full
pack with a normalized sidecar, a generated standalone unpacker, and dual
locator families.
Or choose a specific sidecar mode:
codecrate pack . -o context.md --index-json-mode compact
codecrate pack . -o context.md --index-json-mode minimal
codecrate pack . -o context.md --index-json-mode normalized
codecrate pack . -o context.md --index-json-mode minimal --locator-space dual
--index-json alone defaults to the full compatibility surface. Use
--index-json-mode compact, --index-json-mode minimal, or
--index-json-mode normalized when you want a machine-first sidecar
explicitly.
Use normalized for the smallest recommended sidecar in agent workflows.
Use minimal for the smallest v2-compatible sidecar.
By default, the sidecar is written next to the markdown output as
<output>.index.json.
Contract and compatibility
The sidecar is versioned independently:
codecrate.index-json.v1: full sidecar surface
codecrate.index-json.v2: compact or minimal sidecar surface
codecrate.index-json.v3: normalized sidecar surface
The top-level mode field distinguishes full, compact, minimal, and
normalized output.
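A consumer can check the format and mode pairing listed above before reading the rest of the payload. The following is a minimal sketch, not part of codecrate: check_format_and_mode is a hypothetical helper name, and it assumes the top-level format and mode fields hold exactly the strings shown in this section.

```python
# Format strings and the modes each one may carry, as listed in this section.
MODES_BY_FORMAT = {
    "codecrate.index-json.v1": {"full"},
    "codecrate.index-json.v2": {"compact", "minimal"},
    "codecrate.index-json.v3": {"normalized"},
}


def check_format_and_mode(payload: dict) -> tuple:
    """Return (format, mode), or raise ValueError if the pair is inconsistent."""
    fmt = payload.get("format")
    mode = payload.get("mode")
    allowed = MODES_BY_FORMAT.get(fmt)
    if allowed is None:
        raise ValueError(f"unknown sidecar format: {fmt!r}")
    if mode not in allowed:
        raise ValueError(f"mode {mode!r} is not valid for {fmt}")
    return fmt, mode
```

Failing early on an unexpected pair keeps later code from guessing which field layout it is reading.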
Compatibility rules:
v1 remains the full-fidelity compatibility surface
v2 is a machine-first retrieval surface that removes redundant display and reverse-index duplication by default
v3 is the most compact analysis-oriented surface and interns repeated paths, qualnames, and strings into shared tables
machine-facing lookups should prefer explicit IDs and lookup maps over markdown scraping
The pack and sidecar are generated from the same export model, so repository, file, symbol, and split-part metadata describe the markdown that was actually written.
Locator targets are configurable independently from sidecar mode:
--locator-space markdown keeps locators pointed at the markdown pack
--locator-space reconstructed points locators at the reconstructed file tree
--locator-space dual emits both families
--locator-space auto resolves to reconstructed when --emit-standalone-unpacker is enabled and otherwise to markdown
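The resolution rule for the locator space can be restated as a small function. This is only a sketch of the documented behavior, not codecrate's actual implementation; resolve_locator_space and its parameter names are hypothetical.

```python
def resolve_locator_space(requested: str, emit_standalone_unpacker: bool) -> str:
    """Mirror the documented resolution of --locator-space values."""
    if requested == "auto":
        # auto follows the standalone-unpacker setting.
        return "reconstructed" if emit_standalone_unpacker else "markdown"
    if requested in {"markdown", "reconstructed", "dual"}:
        return requested
    raise ValueError(f"unsupported locator space: {requested!r}")
```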
Top-level shape
The payload has this high-level structure:
{
  "format": "codecrate.index-json.v3",
  "mode": "normalized",
  "pack": { ... },
  "repositories": [ ... ]
}
pack
Global metadata about the emitted artifact set.
repositories
Per-repository entries for both single-repo and multi-repo output.
Pack metadata
Useful pack fields include:
index_json_mode
The resolved sidecar mode used for the emitted payload.
format
The markdown pack protocol version.
is_split
Whether markdown was emitted as a single pack or as .index.md plus .partN.md files.
output_files
Relative paths to all emitted markdown files.
display_id_format_version / canonical_id_format_version
Explicit ID schemes for display IDs and machine IDs.
capabilities
Boolean feature flags such as manifest availability and whether unsplit line ranges are available.
authority
Declares which artifact is authoritative for full layout, stub layout, and patch flows.
Repository metadata
Each entry in repositories[] describes one packed repository.
Useful fields include:
label / slug
Human-facing and path-safe repository identifiers.
layout / effective_layout
Requested and resolved layout behavior.
nav_mode
The actual navigation density reflected in the rendered markdown.
locator_mode
How direct locators should be interpreted: anchors+line-ranges for unsplit markdown, anchors for split output.
locator_space / secondary_locator_space
The primary machine-facing locator family, plus an optional secondary family when both markdown and reconstructed locators are emitted.
reconstructed_root
Present for combined multi-repo packs when reconstructed locators are enabled. Reconstructed paths are relative to the unpack output root, so combined packs use <slug>/... paths.
markdown_path
Present for unsplit output; null for split output.
has_manifest / has_machine_header
Trust and round-trip signals for machine consumers.
parts
Metadata for the emitted markdown files belonging to the repository.
index_json_features
Declares optional v2 retrieval families such as lookup-map emission and compact symbol index lines.
files
File-level retrieval, integrity, language, and location metadata.
symbols
Symbol-level occurrence and canonical-body metadata.
lookup
Reverse indexes for direct access by path or ID.
graph / test_links / guide
Optional analysis metadata: import edges, heuristic test coupling, and a repository guide.
package_summaries / entrypoint_paths / centrality_rank / likely_edit_targets
Optional package and hot-path summaries for quicker subsystem orientation.
reference_graph
Optional conservative symbol-call metadata for impact analysis and review.
Mode summary
full (v1)
Preserves the current compatibility surface, including display IDs, richer file and symbol metadata, and the larger reverse lookup set.
compact (v2)
Keeps machine-first retrieval with direct file and symbol navigation while dropping display-oriented duplication and heavyweight membership metadata. When index_json_features.lookup is true, the lookup maps are file_by_path, part_by_file, file_by_symbol, and symbol_by_local_id.
minimal (v2)
Starts from compact mode and trims additional convenience duplication. It is the smallest v2-compatible sidecar surface rather than the smallest overall sidecar. When index_json_features.lookup is true, the lookup maps are file_by_path and symbol_by_local_id.
normalized (v3)
Interns repeated strings into repositories[].tables and replaces path/module/qualname references with integer indexes. It keeps the same machine-facing essentials as the richer modes while omitting markdown href duplication and v2 lookup maps.
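The lookup maps named for the two v2 modes can be turned into a quick sanity check. The sketch below is illustrative only: missing_lookup_maps is a hypothetical helper, and it assumes each repository entry exposes slug, index_json_features, and lookup as plain JSON objects.

```python
# Lookup maps documented for each v2 mode when lookup emission is enabled.
EXPECTED_LOOKUP_MAPS = {
    "compact": {"file_by_path", "part_by_file", "file_by_symbol", "symbol_by_local_id"},
    "minimal": {"file_by_path", "symbol_by_local_id"},
}


def missing_lookup_maps(payload: dict) -> dict:
    """Map repository slug -> sorted lookup map names that are expected but absent."""
    expected = EXPECTED_LOOKUP_MAPS.get(payload.get("mode"), set())
    problems = {}
    for repo in payload.get("repositories", []):
        features = repo.get("index_json_features") or {}
        if not features.get("lookup"):
            continue  # lookup emission is off, so no maps are expected
        present = set(repo.get("lookup") or {})
        missing = sorted(expected - present)
        if missing:
            problems[repo.get("slug", "?")] = missing
    return problems
```

Full and normalized payloads are not covered by this table, so the function reports nothing for them.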
Analysis metadata
When analysis metadata is enabled, the sidecar also exposes:
repositories[].classes[] with first-class class entries
repositories[].files[].imports
repositories[].files[].exports
repositories[].files[].module_docstring_lines
repositories[].files[].role_hint
repositories[].files[].inclusion_reason for focused packs
repositories[].files[].references_out / references_in
repositories[].symbols[].owner_class
repositories[].symbols[].decorators
repositories[].symbols[].references_out / references_in
repositories[].graph.import_edges
repositories[].test_links
repositories[].guide
repositories[].package_summaries / entrypoint_paths
repositories[].reference_graph.call_like_edges
Use --no-analysis-metadata when you want a smaller sidecar and do not need
those architecture-oriented hints.
--profile lean-agent applies that smaller-sidecar posture by default and
also minifies the JSON payload unless you opt back into --index-json-pretty.
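Because analysis metadata is optional, a consumer may want to probe which repository-level sections a given sidecar actually carries. The sketch below assumes that omitted sections are either absent or null in the repository entry, which this page does not state explicitly; analysis_sections is a hypothetical helper.

```python
# Repository-level analysis keys named in this section.
ANALYSIS_KEYS = (
    "classes",
    "graph",
    "test_links",
    "guide",
    "package_summaries",
    "entrypoint_paths",
    "reference_graph",
)


def analysis_sections(repo: dict) -> list:
    """List the optional analysis sections present in one repository entry."""
    # Assumption: a section that was not emitted is missing or set to null.
    return [key for key in ANALYSIS_KEYS if repo.get(key) is not None]
```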
Part metadata
repositories[].parts[] records the markdown files that contain repository
content.
Useful fields include:
part_id
Stable repository-scoped identifier such as repo:pack or repo:part3.
path / kind
Relative output path and whether the part is the unsplit pack, split index, or a split content part.
char_count / line_count / token_estimate
Lightweight sizing information for retrieval and UI decisions.
sha256_content
Integrity hash of the emitted markdown file content.
contains
Precomputed membership lists for file paths, canonical IDs, display canonical IDs, and section types contained in the part.
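The sizing fields make it possible to plan retrieval without opening any markdown file. As one possible use, the sketch below picks parts that fit a token budget. It assumes token_estimate is a number and path is a string on every part entry; parts_within_budget is a hypothetical helper, and the greedy strategy is just one reasonable choice.

```python
def parts_within_budget(repo: dict, token_budget: int) -> list:
    """Pick part paths in emitted order, skipping parts that no longer fit."""
    chosen = []
    remaining = token_budget
    for part in repo.get("parts", []):
        cost = part.get("token_estimate", 0)
        if cost > remaining:
            continue  # too large for what is left; a later part may still fit
        chosen.append(part["path"])
        remaining -= cost
    return chosen
```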
File metadata
repositories[].files[] is the main entrypoint for locating source files in
the emitted markdown.
Useful fields include:
path / module
Repository-relative file path and Python module name when applicable.
part_path / markdown_path
Output file holding the file body and the unsplit pack path when present.
hrefs / anchors
Direct markdown targets for the file index entry and source body.
locators
Locator metadata for the file entry. In v1 payloads this still includes the legacy availability booleans plus markdown, split_part, and/or reconstructed locator objects. In v2 payloads it carries the locator objects directly.
inclusion_reason
Present for focused packs. Records why the file was selected and which paths pulled it into the pack.
references_out / references_in / unresolved_references_count
Optional conservative file-reference summaries derived from Python symbol analysis.
markdown_lines
Unsplit line range for the file section when line ranges are available.
language / fence_language / language_family
Rendering and retrieval-oriented language metadata.
sha256_original / sha256_stubbed / sha256_effective
Integrity hashes for original file content, stubbed content, and the actual packed body.
sizes
Character, byte, and token estimates for original and effective file bodies.
summary.summary_text
Deterministic short prose describing the file’s role and primary symbols.
symbol_ids / display_symbol_ids / symbol_canonical_ids
Direct symbol membership for the file.
In normalized v3 payloads, file entries instead use indexed fields such as
p (path), part (part path), lang (language), mod (module), and
optional analysis fields like imp (imports), exp (exports), doc
([start_line, end_line]), role, and sum.st (summary text).
Symbol metadata
repositories[].symbols[] provides both occurrence-level and canonical-body
metadata.
Useful fields include:
display_id / display_local_id
Short markdown-facing IDs.
canonical_id / local_id
Machine-facing SHA-256 based IDs.
ids
Nested alias object containing both display and machine IDs.
path / qualname / kind / def_line
Source identity and location.
file_part / file_href / file_anchor
Direct location of the file body containing the symbol occurrence.
canonical_part / canonical_href / canonical_anchor
Canonical function-library location for stub layout.
index_markdown_lines / file_markdown_lines / canonical_markdown_lines
Unsplit markdown line ranges when available.
occurrence_count_for_canonical_id
Number of source occurrences sharing the same canonical body.
locators
Symbol locator metadata. markdown can include file, symbol-index, and canonical markdown ranges; split_part points at the split artifact being read; reconstructed points at the reconstructed file span and body span.
purpose_text
Deterministic short prose summarizing the symbol’s role, ownership, and signature hints.
references_out / references_in / unresolved_references_count
Optional conservative symbol-reference summaries.
In normalized v3 payloads, symbol entries use compact indexed fields such as
i (local machine ID), c (canonical machine ID when needed), p (path index),
q (qualname index), k (kind index), l1/l2 (line range), plus optional o
(owner class ID), d (decorator indexes), and pt (purpose-text index).
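Reading a normalized symbol entry means resolving its integer indexes against the intern tables. The sketch below shows the idea under two assumptions that this page does not confirm: each intern table is a JSON array indexed by position, and the caller already knows which arrays in repositories[].tables hold paths, qualnames, and kinds. The table key names are deliberately not hardcoded, and decode_symbol is a hypothetical helper.

```python
def decode_symbol(entry: dict, paths: list, qualnames: list, kinds: list) -> dict:
    """Expand the required indexed fields of one normalized symbol entry.

    The three lists are the intern tables for paths, qualnames, and kinds.
    Assumption: each index in the entry is a zero-based position in its table.
    """
    return {
        "local_id": entry["i"],
        "path": paths[entry["p"]],
        "qualname": qualnames[entry["q"]],
        "kind": kinds[entry["k"]],
        "lines": (entry["l1"], entry["l2"]),
    }
```

Optional fields such as o, d, and pt can be resolved the same way once their tables are identified.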
Lookup maps
Use repositories[].lookup when you need constant-shape access instead of
scanning arrays.
Useful maps include:
file_by_path
Path to a compact file summary with part and href metadata.
part_by_file
Path to the emitted markdown file containing that file body.
symbols_by_file / display_symbols_by_file
File-to-symbol membership by machine or display IDs.
file_by_symbol / file_by_display_symbol
Symbol-to-file reverse indexes.
symbol_by_local_id / symbol_by_display_local_id
Direct symbol entry lookup by occurrence ID.
symbols_by_canonical_id / symbols_by_display_id
Grouped symbol entries for canonical-body lookups.
In v2 payloads, check repositories[].index_json_features first. If
lookup is false, consumers should scan files[] and symbols[]
directly instead of assuming lookup is present. If
symbol_index_lines is false, compact payloads intentionally omit
index_markdown_lines even for unsplit packs.
Normalized v3 payloads intentionally do not include the v2 lookup maps. Use the
intern tables plus files[], classes[], symbols[], and the optional
analysis sections directly.
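The check-then-fall-back pattern for v1 and v2 payloads might look like the sketch below. It assumes lookup.file_by_path is a JSON object keyed by repository-relative path and that files[] entries carry a path field; find_file is a hypothetical helper, and it does not handle normalized v3 entries, which use indexed fields instead.

```python
def find_file(repo: dict, path: str):
    """Return an entry for `path` from a v1/v2 repository object, or None.

    lookup.file_by_path yields a compact summary, while the files[] fallback
    yields the full file entry.
    """
    features = repo.get("index_json_features") or {}
    lookup = repo.get("lookup") or {}
    # Assumption: a missing feature flag (for example in v1) means "emitted".
    if features.get("lookup", True) and "file_by_path" in lookup:
        return lookup["file_by_path"].get(path)
    # Lookup maps are not available, so scan files[] directly.
    for entry in repo.get("files", []):
        if entry.get("path") == path:
            return entry
    return None
```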
Locator semantics
Locator fields are intended to be truthful with respect to the emitted markdown.
In unsplit output:
anchor hrefs are available
line ranges are also available
compact navigation still preserves machine-targetable anchors
locator_space = markdown points locators.markdown into context.md
In split output:
hrefs still point to the actual .index.md or .partN.md file
unsplit line ranges are omitted
locators.split_part provides stable line ranges inside the split artifact
consumers should follow part_path and hrefs instead of assuming a single markdown file
When --emit-standalone-unpacker is enabled:
locator_space = auto resolves to reconstructed locators
file and symbol locators.reconstructed point at the unpacked file tree
combined multi-repo packs prefix reconstructed paths with the repository slug
If a locator field is present, it should resolve against the written output.
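Since any of the three locator families may be present on a file or symbol entry, consumers usually need a preference order. The sketch below picks the first available family. The default order is an illustrative choice rather than a codecrate rule, pick_locator is a hypothetical helper, and the only structural assumption is that locator objects are JSON objects under locators.

```python
def pick_locator(entry: dict, order=("reconstructed", "split_part", "markdown")):
    """Return (family, locator) for the first locator family that is present."""
    locators = entry.get("locators") or {}
    for family in order:
        value = locators.get(family)
        # Only accept locator objects; v1 payloads also carry boolean flags.
        if isinstance(value, dict):
            return family, value
    return None, None
```

Passing a different order, for example ("markdown",), restricts the result to locators that point into the markdown pack.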
Validation helper
codecrate.validate_index_json.validate_index_payload() validates internal
sidecar consistency.
It checks:
output file existence when a base directory is provided
href targets and anchor existence for v1/v2 payloads
part/file/symbol cross references
line-range validity
lookup map consistency for v1/v2 payloads
normalized-table index validity for v3 payloads
Example:
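import json
from pathlib import Path

from codecrate.validate_index_json import validate_index_payload

# Load the sidecar and validate it against the files in the current directory.
payload = json.loads(Path("context.index.json").read_text(encoding="utf-8"))
errors = validate_index_payload(payload, base_dir=Path("."))
if errors:
    # Print every consistency error and exit with a non-zero status.
    raise SystemExit("\n".join(errors))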
Query recipes
The schema reference is only half of the consumer story. Common recipes:
find the file for a symbol:
python examples/find_symbol_file.py context.index.json codecrate.cli:main
list entrypoints plus their reachable file counts:
python examples/list_entrypoints.py context.index.json
locate related tests for a changed file:
python examples/find_related_tests.py context.index.json codecrate/pack_pipeline.py
prefer reconstructed locators when they exist:
python examples/prefer_reconstructed_locators.py context.index.json codecrate/cli.py
read normalized tables correctly:
python examples/read_normalized_tables.py context.index.json
Consumer guidance
For most tooling:
start with repositories[].lookup when you already know a path or ID
use repositories[].files[] to locate the rendered file body
use repositories[].symbols[] when symbol identity or canonical bodies matter
use repositories[].parts[] to drive split-output retrieval UIs
Prefer machine IDs for stable automation and display IDs only when you need to match existing markdown anchors or present short identifiers to users.
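When lookup maps are not emitted, a tool can build its own machine-ID indexes from symbols[]. The sketch below assumes v1 or v2 symbol entries that carry local_id and canonical_id as described on this page. index_symbols is a hypothetical helper, and entries without a given ID are skipped rather than treated as errors.

```python
from collections import defaultdict


def index_symbols(repo: dict):
    """Build local_id and canonical_id indexes from v1/v2 symbols[]."""
    by_local_id = {}
    by_canonical_id = defaultdict(list)
    for symbol in repo.get("symbols", []):
        local_id = symbol.get("local_id")
        canonical_id = symbol.get("canonical_id")
        if local_id is not None:
            by_local_id[local_id] = symbol
        if canonical_id is not None:
            # Several occurrences can share one canonical body.
            by_canonical_id[canonical_id].append(symbol)
    return by_local_id, dict(by_canonical_id)
```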