Contributing a ballpark item

A ballpark item is a single paper’s entry point into the Econ-ARK ecosystem: enough structure, context, and formalization that an ambitious graduate student can take the paper from “interesting” through “formal recursive model” toward a REMARK or DemARK candidate in one semester.

This file specifies what a ballpark item should contain and how to submit one. For background on the project, see ‘In the ballpark’ of the Econ-ARK ....


Before you start

  1. Is the paper in scope? Ballpark items are papers that are either (a) serious structural models producing interesting quantitative results, or (b) strong empirical evidence that begs for a model. See ‘In the ballpark’ of the Econ-ARK ... for the two tracks (models/ vs empirical/).

  2. Has the paper been cited enough to matter? The paper must have at least 3 citations in Google Scholar to be eligible as a ballpark candidate. This is a hard gate: it filters for papers whose ideas have begun to circulate in the literature, without excluding recent papers that have not yet accumulated many citations. Paste the Google Scholar citation count (as of submission date) into the submission PR description.

  3. Is it already here? Check models/We-Would-Like-In-Econ-ARK/ for an existing subdirectory under the paper’s citekey. If one exists, open a PR improving it rather than creating a parallel entry.

  4. Is it listed but not yet claimed? If there is a subdirectory but it is thin (legacy “slideware” — one notebook of markdown + figures), your contribution can be to refactor it to the canonical structure below.

  5. None of the above? Open a Wanted Ballpark Paper issue using the provided form (citation, DOI, Google Scholar citation count, track, topic, 3-sentence pitch). We will confirm before you invest effort.


Three layers of a ballpark item

A canonical ballpark item has three layers. The exposition layer is required; the formalization layer turns a summary into a modular-DP scaffold (this is the output of a course-project workflow); the asset layer holds source material for human and AI re-reading.

1. Exposition layer — required

Four notebooks assembled by one <citekey>.md, plus a README.md orientation file. Name the notebooks (and the assembler markdown file) with the paper’s citekey prefix, e.g. benhabib2019_intro.ipynb and benhabib2019.md.

| File | Content |
| --- | --- |
| README.md | A short orientation file. Written in GitHub-flavored markdown (not MyST) so GitHub auto-renders it inline below the directory listing for anyone landing on the GitHub view of the entry. Use Unicode for math (e.g. Π_r, g(·;τ,r), Π_τ ⊗ Π_r, c ≤ a) rather than $...$. Provides: (1) a prominent callout link to the rendered MyST page (e.g. > 📖 Rendered version: [econ-ark.github.io/ballpark/<citekey>/](…)) at the very top, so a directory-view reader can immediately jump to the proper rendered entry; (2) a brief reading guide for any technical artifacts in the directory: for entries with the Formalization layer, name the conventions used (dolo-plus YAML format, the Matsya iteration loop that produced the YAMLs, inline # unresolved: flags), the excerpt-plus-YAML pairing pattern, and where the construction audit trail lives; (3) a one-line index of the main files in the directory. Keep it short: orientation only, not duplication. |
| <citekey>.md | MyST page with {include} directives for the four notebooks below (in order), plus YAML frontmatter giving the rendered title and metadata. Name the file with the paper’s citekey, not index.md: mystmd derives the rendered URL slug from the filename, and a citekey-named file gives a clean URL like /<citekey>/ (e.g. /benhabib-et-al-2019/), whereas multiple entries all named index.md would auto-collide and yield ugly slugs (/index-1/, /index-2/, …). |
| <citekey>_intro.ipynb | Full citation with DOI link. Original ballpark author (name + date). Updated by (latest + date). 3-sentence pitch: why the paper is in-ballpark for Econ-ARK. |
| <citekey>_prior-literature.ipynb | Where the paper sits in the foundational literature (Bewley / Huggett / Aiyagari / de Nardi / ...). Use {cite:t} citations rendered from references.bib. |
| <citekey>_summary.ipynb | Non-technical motivation + findings overview (required at Draft), extended at Primer promotion with a “The Model” section stating the recursive formulation explicitly: no u(c) placeholders, explicit CRRA or EZ kernel, explicit bequest function, explicit transitions, explicit shock distributions, explicit constraint set. This “The Model” section is what the formalization layer will build on. |
| <citekey>_subsequent-literature.ipynb | Research directions that followed the paper. Cite from subsequent-literature.bib. |

Audience and voice for _intro.ipynb and _summary.ipynb. Write for a reader (PhD student or faculty researcher) who is deciding whether to invest 90 minutes in the paper itself. Two self-checks before submitting: (1) Does the first screen of _summary.ipynb give the paper’s headline quantitative result, in numbers rather than adjectives? (2) Does it explain what makes the result credible, meaning the identification strategy or theoretical mechanism and not just the methodology label? If the reader must scroll past several paragraphs of prose to find either answer, condense.

The Benhabib_et_al_2019 item is the reference instance of this layer.
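
The naming rules above can be checked mechanically before opening a PR. The following is a minimal sketch, not part of the repository's tooling: the function names are hypothetical, and it assumes the six exposition files sit directly in the item directory.

```python
from pathlib import Path

# Suffixes of the four notebooks named in the exposition-layer table.
NOTEBOOK_SUFFIXES = ("intro", "prior-literature", "summary", "subsequent-literature")


def expected_exposition_files(citekey: str) -> list[str]:
    """Return the file names the exposition layer calls for, given a citekey."""
    notebooks = [f"{citekey}_{suffix}.ipynb" for suffix in NOTEBOOK_SUFFIXES]
    return ["README.md", f"{citekey}.md", *notebooks]


def missing_exposition_files(item_dir: Path, citekey: str) -> list[str]:
    """List expected exposition files that are absent from item_dir."""
    return [
        name
        for name in expected_exposition_files(citekey)
        if not (item_dir / name).is_file()
    ]
```

Calling `missing_exposition_files(Path("."), "benhabib2019")` from inside an item directory returns the names still to be created; an empty list means the naming convention is satisfied, though it says nothing about the content of the files.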

2. Formalization layer

Four files that take the explicit recursive formulation from _summary.ipynb and lift it toward a dolo-plus stage.

| File | Content |
| --- | --- |
| bellman-excerpt.md | A standalone modular-DDSL Bellman statement produced by iterating with Matsya (see “The formalization iteration” below). Contains: a comprehensive symbol table; a timing convention (numbered steps within one period); a decomposition of the problem into periods, stages, and perches (arrival $\prec$ / decision / continuation $\succ$); a table listing state, control, shock, constraint, payoff at their native perches; the stage operator $\mathbb{T} = \mathbb{I} \circ \mathbb{B}$; explicit utility/bequest forms; and, once the iteration has matured, a “Stage composition” subsection and an EGM channel discussion where applicable (aligned to SolvingMicroDSOPs §§12–13). |
| dolo-plus-draft.yaml | A minimal one-stage YAML (interior period only is acceptable). Any features that do not map cleanly onto canonical dolo-plus syntax must be flagged inline with a # workaround: or # unresolved: comment rather than silently fudged. |
| verification.md | One paragraph stating what was accepted, edited, or rejected from Matsya’s output, and why, verified against the published paper and not only the ballpark summary. |
| matsya-session.txt | A single line: the --session string used on every Matsya call for this item (e.g. topics2026-<citekey>). Staff can inspect the server-side conversation by session name; you do not paste the transcript. |

The three workaround categories you are most likely to hit are mechanical (non-optimized) deductions, $\hat{\Gamma}^{1-\gamma}$-style value-function scaling inside expectations, and state-contingent shock distributions. Flag them; do not hide them.
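
Because the flags are plain inline comments, they are easy to collect for a verification note or an AGENTS.md section. The sketch below is illustrative only: `find_flags` is a hypothetical helper, and the sample text uses made-up keys that are not dolo-plus syntax.

```python
import re

# Matches the two inline flag comments named in this guide.
FLAG_PATTERN = re.compile(r"#\s*(workaround|unresolved):\s*(.*)")


def find_flags(yaml_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, note) for each flagged line in a YAML draft."""
    flags = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = FLAG_PATTERN.search(line)
        if match:
            flags.append((number, match.group(1), match.group(2).strip()))
    return flags


# Illustrative text only; the keys below are invented for this example.
sample = """\
stage: interior
shocks: income  # unresolved: distribution depends on the employment state
deduction: 0.1  # workaround: mechanical, non-optimized deduction
"""

for number, kind, note in find_flags(sample):
    print(f"line {number}: {kind}: {note}")
```

The scan works on raw text rather than a parsed YAML tree because YAML parsers typically discard comments.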

The formalization iteration (the meat of the work)

The single hardest — and most valuable — part of the formalization layer is arriving at a decomposition of the paper’s problem into periods, stages, and perches. This is not a one-shot translation; it is a loop between a general-purpose AI (Claude, Cursor) and Matsya, the DDSL-aware evaluator. bellman-excerpt.md is the single evolving artifact that this loop produces — not two separate files for “before Matsya” and “after Matsya.”

The iteration runs as follows:

  1. Extract. Ask an AI (Claude Opus recommended) to read the paper (<citekey>.mmd preferred over .pdf) and the recursive formulation in _summary.ipynb, and to draft bellman-excerpt.md as a modular-DDSL Bellman statement: symbol table, timing, a candidate period/stage/perch decomposition, transitions, and movers.

  2. Evaluate. Feed bellman-excerpt.md to Matsya (same --session name each time — record it in matsya-session.txt) and ask it to identify missing or under-specified elements of the decomposition: symbols that appear without a row in the symbol table, perches that are referenced but not defined, transitions that are unlabeled, movers that collapse silently rather than being explicitly labeled as identities, constraint sets that are not carried through, parameters whose domain is unstated, etc.

  3. Improve. Take Matsya’s critique back to the AI and ask it to revise bellman-excerpt.md to address each flagged gap — either by filling it in from the paper, or by marking it as genuinely absent from the paper (a signal that the paper does not pin this down and the formalizer will have to make an explicit modeling choice).

  4. Repeat steps 2–3 until one of two terminating conditions is reached:

    • Success: Matsya reports no further missing elements, the symbol table is closed under reference, every perch and transition is either defined or explicitly labeled degenerate/identity, and the decomposition is coherent. The item is ready for the dolo-plus-draft YAML.

    • Failure (informative): It becomes clear that the paper itself does not specify its problem clearly enough to admit a periods/stages/perches decomposition — e.g. the budget at terminal is undefined, the timing of shock realization is ambiguous, the constraint set changes silently across sections. At this point, stop iterating and record the blocking ambiguities in verification.md under a “Paper under-specifies” heading. The item remains at Primer tier; it cannot reach Formalized until the ambiguities are resolved (by the formalizer making explicit modeling choices, by a companion note reconciling the paper’s inconsistencies, or by correspondence with the authors).

The iteration is where the economics lives. bellman-excerpt.md on its own is a file; the value is in the judgments made during the loop — which of Matsya’s gaps are real, which are artifacts of the evaluator’s priors, which expose genuine ambiguities in the paper. Those judgments are what verification.md records.
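
The control flow of the loop can be sketched in a few lines. This is a schematic under stated assumptions, not the Matsya interface: `evaluate` and `revise` are hypothetical callables standing in for the Matsya call and the general-purpose AI revision, and the stall test and `max_rounds` cap are crude stand-ins for the human judgment that the paper under-specifies its problem.

```python
from typing import Callable


def formalization_loop(
    excerpt: str,
    evaluate: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 10,
) -> tuple[str, list[str]]:
    """Alternate evaluate and revise until no gaps remain or progress stalls.

    Returns the final excerpt and the gaps still open. An empty gap list
    corresponds to the success condition; a non-empty one lists candidates
    for a "Paper under-specifies" heading in verification.md.
    """
    gaps = evaluate(excerpt)
    for _ in range(max_rounds):
        if not gaps:
            break
        revised = revise(excerpt, gaps)
        new_gaps = evaluate(revised)
        if set(new_gaps) == set(gaps):
            # No gap was closed this round; treat the remainder as blocking.
            excerpt = revised
            break
        excerpt, gaps = revised, new_gaps
    return excerpt, gaps
```

In practice each round is a manual exchange, and deciding which reported gaps are real is exactly the judgment that verification.md records; the sketch only makes the two terminating conditions explicit.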

3. Asset layer — required in part

| File | Required? | Content |
| --- | --- | --- |
| <citekey>.pdf | required | The paper. If license forbids redistribution, replace with a DOI-only pointer in _intro.ipynb. |
| <citekey>.mmd | local-only (gitignored, do not commit) | Markdown conversion of the paper PDF for AI ingestion (Cursor, Claude, Matsya). Produce locally via Mathpix (better for math-heavy papers) or pandoc <citekey>.pdf -o <citekey>.mmd. *.mmd is gitignored at the repo root because the markdown is a derivative work of the publisher’s PDF and inherits its copyright. Each contributor maintains their own local copy. |
| references.bib | required | Bib entries cited from _prior-literature.ipynb and _summary.ipynb. A superset is acceptable: uncited entries (e.g., a broader reading list the contributor maintains) do not need to be pruned. MyST renders only cited entries in the published bibliography. |
| self.bib | recommended | The paper’s own bib entry. Keeps the paper citation separable from its context. |
| subsequent-literature.bib | required if the notebook is non-empty | Bib entries cited from _subsequent-literature.ipynb. |
| Figures / tables (e.g. fig1.png, Table2.png) | as needed | Use paper’s own labels where possible. |
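
Since a bib file may be a superset of what is cited but must not be a subset, the direction worth checking is cited keys that lack an entry. A minimal sketch, assuming the notebook's markdown text has already been extracted into a string; the helper names are hypothetical, and the regular expressions are a rough approximation that would also pick up non-entry blocks such as `@string{...}`.

```python
import re

# MyST citation roles such as {cite:t}`key` or {cite:p}`key1,key2`.
CITE_ROLE = re.compile(r"\{cite(?::[a-z]+)?\}`([^`]+)`")
# Entry keys in a BibTeX file, e.g. "@article{somekey,".
BIB_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(markdown_text: str) -> set[str]:
    """Collect every citation key used in MyST cite roles."""
    keys = set()
    for group in CITE_ROLE.findall(markdown_text):
        keys.update(key.strip() for key in group.split(",") if key.strip())
    return keys


def uncovered_citations(markdown_text: str, bib_text: str) -> set[str]:
    """Return cited keys that have no matching entry in the bib text."""
    return cited_keys(markdown_text) - set(BIB_KEY.findall(bib_text))
```

An empty result means every cited key resolves; uncited extra entries in the bib file are deliberately ignored, matching the superset rule in the table.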

4. REMARK-ready extension — optional

If the formalization layer has stabilized and you have working code, add a replication/ subdirectory with reproduce.sh, CITATION.cff, binder/environment.yml, and a validated (not draft) dolo-plus stage. At that point you are eligible to move the item to REMARK or DemARK per the criteria in those repos.


Machine-readable metadata (for AI indexing)

Ballpark entries are designed to be discovered and cited by both humans and AI agents. The <citekey>.md frontmatter and an optional AGENTS.md provide the structured signals that make this work.

Required frontmatter fields on <citekey>.md

---
title: "<Paper title> — Ballpark Entry"
schema_type: ScholarlyArticle              # schema.org type; Dataset also acceptable
about:
  doi: 10.XXXX/YYYY                        # paper DOI
  authors: [LastName, LastName, LastName]
  year: 2019
  journal: American Economic Review
keywords: [kebab-case, tags]               # free-form topical tags
econ_ark_topic:                            # controlled vocabulary — pick from:
  - HA-macro                               #   HA-macro, lifecycle, wealth-distribution,
  - wealth-distribution                    #   monetary, fiscal-policy, optimal-taxation,
  - lifecycle                              #   housing, labor, business-cycles,
                                           #   computational-methods, open-economy,
                                           #   liquidity-trap, demographics,
                                           #   financial-crisis, inequality
jel: [D31, E21, J62]                       # JEL codes (array)
difficulty: stretch                        # good-first-ballpark | stretch | research-grade
tier: formalized                           # draft | primer | formalized — see "Ballpark tiers" below
has_formalization_layer: true              # true iff the formalization-layer files exist
ballpark_contributor:
  name: "<name>"
  orcid: "0000-0000-0000-0000"             # optional but strongly encouraged
updated_by:                                # one entry per material revision; most recent last
  - name: "<name>"
    orcid: "..."
    date: 2026-01-27
---

MyST renders this frontmatter as JSON-LD on the published page, which Google Scholar, LLM training pipelines, and retrieval agents recognize. The same frontmatter powers the browsable catalog’s filter UI (one source of truth).
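
The controlled values in the frontmatter lend themselves to a quick self-check before submission. The sketch below is one possible check, not the project's CI: it assumes the frontmatter has already been parsed into a dictionary (for example with a YAML library), it treats every top-level key in the example as required, and `frontmatter_problems` is a hypothetical name.

```python
# Allowed values copied from the comments in the frontmatter example.
TIERS = {"draft", "primer", "formalized"}
DIFFICULTIES = {"good-first-ballpark", "stretch", "research-grade"}
TOPICS = {
    "HA-macro", "lifecycle", "wealth-distribution", "monetary", "fiscal-policy",
    "optimal-taxation", "housing", "labor", "business-cycles",
    "computational-methods", "open-economy", "liquidity-trap", "demographics",
    "financial-crisis", "inequality",
}
# Assumption: every top-level key shown in the example is required.
REQUIRED_KEYS = (
    "title", "schema_type", "about", "keywords", "econ_ark_topic", "jel",
    "difficulty", "tier", "has_formalization_layer", "ballpark_contributor",
    "updated_by",
)


def frontmatter_problems(meta: dict) -> list[str]:
    """Return human-readable problems found in a parsed frontmatter dict."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in meta]
    if "tier" in meta and meta["tier"] not in TIERS:
        problems.append(f"unknown tier: {meta['tier']}")
    if "difficulty" in meta and meta["difficulty"] not in DIFFICULTIES:
        problems.append(f"unknown difficulty: {meta['difficulty']}")
    for topic in meta.get("econ_ark_topic", []):
        if topic not in TOPICS:
            problems.append(f"topic outside controlled vocabulary: {topic}")
    return problems
```

An empty list means the controlled fields are well-formed; it does not confirm that has_formalization_layer matches the files actually present.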

Optional frontmatter fields:

doi: 10.5281/zenodo.XXXXXXX                # Zenodo DOI for this ballpark entry itself
superseded_by: https://github.com/econ-ark/REMARK/...   # once promoted
requires: [CRRA, EGM, bequest-utility]     # model features — free-form tags

AGENTS.md is an optional, short structured brief aimed at coding agents (Claude Code, Cursor, etc.), which an agent in a user’s local session reads when the directory is opened. It is distinct from the human-readable <citekey>.md. See the Benhabib_et_al_2019 worked example.

Purpose:

How to produce your AGENTS.md

Copy the template below into AGENTS.md in your item directory and fill in the six sections. Every section has a grounded source in files you have already produced — you should not be inventing content.

| Section | Where its content comes from |
| --- | --- |
| Paper | <citekey>_intro.ipynb: citation, DOI, one-sentence pitch of why the paper is in-ballpark. Copy verbatim; this is the one place duplication with <citekey>.md is intentional, because the agent may open AGENTS.md first. |
| If a user asks to work on this item | <citekey>_summary.ipynb (section “The Model”) is the authoritative recursive statement. <citekey>.mmd is the AI-friendly paper source. It is locally produced (gitignored), so an agent may need to produce one from the PDF if not already present. If your formalization layer is present, point at bellman-excerpt.md as “read first” instead of the summary notebook. |
| Formalization status | Tick which layer files you committed: bellman-excerpt.md, dolo-plus-draft.yaml, verification.md, matsya-session.txt. Be honest about what is not yet present. |
| Known model features requiring attention | Pull from verification.md (the items you rejected or edited) and from the inline # workaround: / # unresolved: comments in dolo-plus-draft.yaml. This is the single most useful section for an agent: it is the list of things it should not re-discover. If the formalization layer is absent, list the model features you already know will be awkward (state-contingent shocks, mechanical deductions, non-standard normalizations, etc.). |
| Common next tasks | List what you intentionally left undone. Examples from real items: “add terminal-period stage to YAML”, “formalize the dynasty wrapper”, “add age-varying wage overrides”. Cite the specific file or line a next-task should touch. Do not list tasks you would have liked to do but have no grounding for. |
| Workflow reminders | Mostly boilerplate. Keep the Matsya session-naming convention (topics2026-<slug> for coursework), the paper-verification reminder, and the workaround-comment convention. Delete anything that does not apply to your item. |

Template (copy and fill in):

# Ballpark entry: <Authors> (<Year>)

> Structured brief for coding agents (Claude Code, Cursor, etc.). Human-facing content lives in [`<citekey>.md`](<citekey>.md).

## Paper

- **Citation:** <Authors (Year), "Title," Journal vol(issue), pages>.
- **DOI:** [<doi>](https://doi.org/<doi>)
- **Core model:** <one-sentence description: lifecycle / HA / OLG / ..., key state and control, what's stochastic, what closes the problem>.
- **Why in-ballpark:** <one sentence: what makes this paper interesting for Econ-ARK>.

## If a user asks to work on this item

1. **Read first:** <file> — <why this is authoritative>.
2. **Paper source for AI ingestion:** `<citekey>.mmd` (locally-produced via Mathpix or pandoc; gitignored, not committed). Prefer this over `<citekey>.pdf` for AI ingestion; produce a local `.mmd` from the PDF if one isn't already present.

## Formalization status

- Explicit recursive formulation: <present in `_summary.ipynb` | not yet stated>.
- `bellman-excerpt.md`: <committed | not committed> (product of the Matsya iteration loop).
- `dolo-plus-draft.yaml`: <committed | not committed>.
- `verification.md`: <committed | not committed>.
- `matsya-session.txt`: <committed | not committed>.

## Known model features requiring attention in a formalization pass

- <feature 1>: <what's awkward and why; how you worked around it or plan to>.
- <feature 2>: ...
- <feature 3>: ...

## Common next tasks (grounded)

1. <task 1, tied to a specific file or section>.
2. <task 2>.
3. <task 3>.

## Workflow reminders

- **Matsya session:** use `topics2026-<slug>` for new work on this item.
- **Paper verification:** Matsya output must be checked against the paper PDF (or `.mmd`), not only against the ballpark `_summary.ipynb`.
- **When flagging workarounds in YAML:** use inline `# workaround:` or `# unresolved:` comments rather than silently fudging non-canonical syntax.

AI-assisted drafting (recommended). Once your formalization layer is present, ask a coding agent (Claude Code, Cursor) to draft AGENTS.md from your item’s files:

Read <citekey>.md, <citekey>_intro.ipynb, <citekey>_summary.ipynb, bellman-excerpt.md, dolo-plus-draft.yaml, and verification.md in this directory. Draft an AGENTS.md following the template in the repo-root CONTRIBUTING.md. Do not invent content — if a section lacks a grounded source in these files, write TBD for that section and explain what you would need.

Then review carefully. Agents occasionally invent plausible-sounding “next tasks” or “workarounds” that are not grounded in your verification notes. Rewrite anything you cannot trace to a specific file. The point of AGENTS.md is that a later agent can trust it; that trust is wasted if you pass through hallucinations.
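
Part of that review can be automated: a structural pass confirms that all six template sections are present and lists any lines still marked TBD, leaving the grounding check to the human. A minimal sketch with hypothetical helper names, assuming the section headings keep the level-2 wording of the template:

```python
# Heading prefixes taken from the AGENTS.md template.
REQUIRED_SECTIONS = (
    "Paper",
    "If a user asks to work on this item",
    "Formalization status",
    "Known model features requiring attention",
    "Common next tasks",
    "Workflow reminders",
)


def missing_sections(agents_md: str) -> list[str]:
    """Return template sections with no matching level-2 heading."""
    headings = [
        line[3:].strip()
        for line in agents_md.splitlines()
        if line.startswith("## ")
    ]
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(heading.startswith(section) for heading in headings)
    ]


def tbd_lines(agents_md: str) -> list[int]:
    """Return 1-based numbers of lines that still contain a TBD marker."""
    return [
        number
        for number, line in enumerate(agents_md.splitlines(), start=1)
        if "TBD" in line
    ]
```

Prefix matching is used so that headings with a trailing qualifier, such as "Common next tasks (grounded)", still count. Neither function can tell whether a listed task or workaround is traceable to a file; that remains a manual check.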

Repo-level artifacts (maintained centrally, not per item)

Content-form conventions for LLM legibility

Model structure as first-class data (stretch)

For items with a committed dolo-plus-draft.yaml, a generated model.json extracts the stage(s) into a programmatic form. This lets retrieval agents answer structural queries like “find all ballpark items with an EGM-compatible interior stage” or “which items have Markov-chain employment states.” The extractor is maintained centrally; contributors do not hand-write model.json.

AI provenance (optional)

If AI tools materially shaped the formalization layer, add ai-provenance.md documenting which tools played which role and linking the session artifacts. This gives both credit and traceability.


What does not belong in a ballpark item


Authorship and provenance

The intro notebook carries provenance as visible section content, not buried frontmatter:

**Original ballpark author:** <name>, <YYYY-MM-DD>
**Updated by:** <name>, <YYYY-MM-DD>
**Superseded by:** <link to REMARK / DemARK if promoted>

When you revise an existing item, add (do not overwrite) an Updated by line. When an item is promoted to REMARK or DemARK, add a Superseded by pointer rather than deleting the ballpark entry — the ballpark retains historical interest.
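
The add-do-not-overwrite rule amounts to inserting a new line after the last existing provenance line. The following sketch illustrates it on the text of the provenance cell; `add_updated_by` is a hypothetical helper, and it assumes the lines use exactly the bold labels shown above.

```python
def add_updated_by(cell_text: str, name: str, date: str) -> str:
    """Insert a new 'Updated by' line after the last author or update line.

    Existing lines are kept unchanged, so the revision history accumulates.
    """
    lines = cell_text.splitlines()
    anchor = None
    for index, line in enumerate(lines):
        if line.startswith(("**Original ballpark author:**", "**Updated by:**")):
            anchor = index
    if anchor is None:
        raise ValueError("no provenance lines found in cell text")
    lines.insert(anchor + 1, f"**Updated by:** {name}, {date}")
    return "\n".join(lines)
```

Any existing Superseded by line stays below the inserted entry, so the most recent update is always the last Updated by line.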


Ballpark tiers

Ballpark items progress through three tiers of increasing formalization completeness — analogous in spirit to REMARK’s standard/published distinction but scoped entirely to pre-implementation work. The ballpark’s job is to land a well-specified model ready for a coder; the implementation step (working reproduce.sh, CITATION.cff, binder/environment.yml) happens in REMARK / DemARK, not here.

Each tier is a plateau with a concrete, reviewable qualifying checklist. Contributors can stop at any tier indefinitely.

| Tier | One-line characterization | Typical effort from the previous tier (AI-assisted, PhD-course-assignment units) |
| --- | --- | --- |
| Draft | Paper identified, claimed, and minimally cataloged. | ≈ 1 weekly assignment (from zero / from a wanted-ballpark issue). |
| Primer | A reader can understand the paper and its context without reading the paper. | ≤ 2 weekly assignments (from Draft). |
| Formalized | The model is stated in modular-DDSL form, with a dolo-plus YAML draft. | ≤ 2 weekly assignments (from Primer). |

Each name presupposes the tier below it: a primer is a completed introductory treatment of what a draft only sketches; a formalized specification is the rigorous re-expression of what the primer states informally. Rank order is unambiguous from the names alone.

(A pre-tier state, Wanted, is an open issue labeled wanted-ballpark with bibliographic info. It has no directory.)

Draft

“I am claiming this paper and committing to minimal cataloging.”

Qualifying checklist:

Draft is the minimum mergeable contribution. It converts a wanted-ballpark issue into a claimed directory.

Primer

“A graduate student can orient themselves around this paper without reading it.”

Qualifying checklist — everything in Draft, plus:

Primer is the current aspirational target for the typical legacy-slideware refactor. Benhabib_et_al_2019 is the reference instance of this tier.

Formalized

“The model has been translated into a modular-DP specification ready for a coder.”

Qualifying checklist — everything in Primer, plus:

Formalized is the ballpark’s top tier. A Formalized item is ready to be picked up by a coder (human or agent) and promoted to REMARK or DemARK — the implementation work happens there, not here.

Beyond Formalized: promotion out of the ballpark

Once a Formalized item has working code reproducing paper results, it is eligible for promotion to REMARK (for substantial replications) or DemARK (for demonstrations). REMARK itself has a tiering (standard vs. published-with-DOI); those criteria are documented at the REMARK repo and are not this repository’s concern.

When an item is promoted, add a Superseded by pointer in _intro.ipynb rather than deleting the ballpark entry — the entry retains historical and pedagogical interest.

Promotion mechanics within the ballpark

Review policy

Review requirements depend on the target tier.

Automated checks (CI)

A .github/workflows/ballpark-check.yml action (forthcoming in a follow-up PR) will run per-tier checks and post a status on the PR. A contributor’s checklist tick is not sufficient at any tier — CI must also pass.

Per-tier mechanical gates the CI will enforce:

What CI does not check at Formalized (and therefore what the human reviewer is responsible for): the Bellman equation being correct, the perch decomposition being correct, the YAML workarounds being defensible, and verification.md genuinely comparing against the published paper.

Badges

Each item’s rendered page carries a tier badge (Draft / Primer / Formalized) at the top. Catalog cards show the badge so visitors can filter by tier (e.g. “show me all Primer items that need promotion to Formalized” — a natural call-to-contribute).

The badge derives from the tier: frontmatter field; the MyST build pipeline renders it automatically. Contributors do not hand-insert badge markdown.

Effort calibration (for contributors and instructors)

Effort is expressed in PhD-course-assignment units, assuming an AI-assisted workflow (Cursor + Claude + Matsya). These estimates are generous upper bounds:

| Step | Upper bound |
| --- | --- |
| → Draft | 1 weekly assignment |
| Draft → Primer | ≤ 2 weekly assignments |
| Primer → Formalized | ≤ 2 weekly assignments |
| Total from zero to Formalized | ≤ 5 weekly assignments |

These estimates guide course-project scoping: a full semester leaves ample room for a student to take a paper all the way to Formalized and start on the replication step (which then belongs in REMARK, not here).


Pre-merge checklist

The target tier determines the checklist. Copy the target tier’s qualifying checklist from the section above into your PR body and tick each box with a file-line citation. In addition, every PR (regardless of tier) must confirm:


Submitting

  1. Fork the repo and branch from master with a descriptive name (e.g. add-<citekey> or refactor-<citekey>).

  2. Commit the item in its own directory under models/We-Would-Like-In-Econ-ARK/<citekey>/ (or empirical/<citekey>/).

  3. Open a PR titled Add <citekey> or Refactor <citekey>.

  4. In the PR description, state which layers you produced and which you intentionally skipped.