Testing Guide#

Purpose#

The test setup separates quick correctness checks from expensive benchmark runs and tutorial/notebook validation.

The goals are:

  • fast PR feedback for changed code

  • reproducible baselines for selected regression tests

  • scheduled heavy runs (large and benchmark)

  • smoke checks to keep tutorials and notebooks up to date

Folder Structure#

The tests directory is organized by intent first, then by package/backend:

  • tests/unit/

    • small, focused unit tests

    • no heavy runtime expectations

    • split into zoomy_core and zoomy_jax

  • tests/regression/

    • behavior/regression checks on small canonical cases

    • includes tutorial import smoke checks

    • split into zoomy_core and zoomy_jax

  • tests/benchmarks/

    • expensive performance-oriented tests

    • disabled by default in local/PR runs

  • tests/scripts/

    • script-style test scenarios (currently the SWE v2 benchmark/check scripts)

    • imported by pytest smoke/regression tests

  • tests/results/baselines/

    • tiny reference artifacts used for regression comparisons

    • created automatically if missing or when ZOOMY_CREATE_BASELINES=1

  • tests/common/

    • shared test helpers (baseline storage, utilities)

  • tests/notebooks/

    • notebook smoke list and notebook testing support files

  • tests/old/

    • archived legacy tests kept for historical reference

    • excluded from active pytest discovery

  • tests/reporting/

    • test-report generation scripts (HTML/JUnit)

    • notebook validation and jupytext compile checks
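
Because the tree is organized by intent first, a directory plus a marker expression selects a precise slice of the suite; for example, to run just the small regression tests for zoomy_core:

pytest tests/regression/zoomy_core -m small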

Markers#

Important pytest markers:

  • small: quick tests for PR/local iteration

  • tutorial: tutorial smoke checks (an intent marker, orthogonal to the runtime markers below, so it is always combined with one)

  • jax, numpy: backend-specific grouping

  • core, amrex, petsc, firedrake: runtime/container-specific grouping

  • large, benchmark: expensive tests (scheduled/manual)
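
In test code, these markers compose per test. A minimal sketch of a small JAX test (the test name and body are illustrative):

import pytest

# Intent marker (small) plus backend marker (jax); such a test is
# selected by expressions like -m "small and jax".
@pytest.mark.small
@pytest.mark.jax
def test_advection_matches_reference():
    ...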

Default local fast run:

pytest tests -m "small or tutorial"

Recommended stack-selective runs:

pytest tests -m "small and core"
pytest tests -m "small and jax"
pytest tests -m "small and amrex"
pytest tests -m "small and petsc"
pytest tests -m "small and firedrake"

Tutorial checks should always combine tutorial with a runtime marker:

pytest tests -m "small and tutorial and core"
# or:
pytest tests -m "small and tutorial and jax"

Baseline Workflow#

Some regression tests compare against compact baseline files in tests/results/baselines.

  • If a baseline exists, the test compares the current output against it.

  • If the baseline is missing, it is created from the current output.

  • To refresh baselines intentionally:

ZOOMY_CREATE_BASELINES=1 pytest tests -m small
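
For orientation, this is roughly the compare-or-create pattern such a test follows. The helper below is a hand-written sketch, not the actual tests/common API; the names, file format, and tolerance are illustrative:

import os
from pathlib import Path

import numpy as np

BASELINE_DIR = Path("tests/results/baselines")

def check_against_baseline(name, result, rtol=1e-10):
    """Compare result against a tiny stored baseline, creating it when absent."""
    path = BASELINE_DIR / f"{name}.npy"
    refresh = os.environ.get("ZOOMY_CREATE_BASELINES") == "1"
    if refresh or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        np.save(path, np.asarray(result))  # (re)create the reference artifact
        return
    np.testing.assert_allclose(result, np.load(path), rtol=rtol)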

CI Workflows#

Containers workflow#

.github/workflows/build-containers.yml (workflow name Containers)

  • Builds and pushes stack images to GHCR (zoomy_core, zoomy_jax, zoomy_firedrake, dev bases, placeholders, …) when Dockerfiles, install/*.yml, or selected library pyproject.toml files change.

Smart test workflow#

.github/workflows/tests-report.yml (workflow name Smart Tests)

  • Runtime: each stack job logs into GHCR, pulls the matching image, bind-mounts the repo at /workspace, installs the relevant library/* packages with pip install -e inside the container, and then runs tests/reporting/generate_test_report.py, so CI exercises the checked-out tree against the image’s solver stack (a local reproduction of this pattern is sketched after this list).

  • path-aware test selection on PRs per runtime group:

    • tutorial tests run inside their runtime group job (no dedicated tutorial runtime lane)

    • core / jax

    • amrex

    • dmplex / fenicsx (split paths; dmplex and firedrake share one container)

    • firedrake

    • PRs that only touch containers, install/, or the Containers workflow hit the infra path and run all stack jobs.

  • After image builds: when Containers finishes successfully, Smart Tests is triggered again via workflow_run, checking out the same commit (workflow_run.head_sha) so pulls of :latest match the images just pushed. Pushes that match the Smart Tests path filters can still start a run in parallel; the workflow_run pass is the one guaranteed to see fresh GHCR tags for container-only changes.

  • scheduled and manual large / benchmark runs: one job per stack (same backends as small), merged into test-reports-large-bundle

  • manually dispatched runs include an optional toggle for the large tests

  • each stack job uploads HTML + JUnit reports; follow-up jobs merge the stack artifacts into test-reports-small-bundle and test-reports-large-bundle so downstream docs builds can download two artifacts (small vs large, each with per-stack folders).

  • Render Webpage downloads the latest completed bundles of those names before building the book, and can also run after Smart Tests via workflow_run (so the usual chain is Containers → Smart Tests → Render Webpage when all succeed).

  • Optional extra dependency pins for local or legacy setups live under tests/requirements/*.txt; the Smart Tests workflow does not use those files on the GitHub runner (dependencies come from the container images plus editable installs of the repo).
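
The runtime pattern from the first bullet can be reproduced locally along these lines; the GHCR owner, tag, and package path below are placeholders, not the workflow’s exact invocation:

docker pull ghcr.io/<owner>/zoomy_core:latest
docker run --rm -v "$PWD:/workspace" -w /workspace ghcr.io/<owner>/zoomy_core:latest \
  bash -c "pip install -e library/zoomy_core && python tests/reporting/generate_test_report.py"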

Notebook workflow#

.github/workflows/notebooks.yml

  • PR: validates changed notebooks and runs the transient jupytext conversion + compile check

  • schedule/manual: validate all notebooks

  • optional smoke execution using tests/notebooks/smoke_notebooks.txt
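
A rough sketch of what that smoke execution amounts to, driven by the committed list (the nbconvert flags are one common pattern, not necessarily the workflow’s exact invocation):

while read -r nb; do
  jupyter nbconvert --to notebook --execute "$nb" --output-dir /tmp
done < tests/notebooks/smoke_notebooks.txt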

Notebook Policy#

  • Source of truth remains .ipynb (for docs publishing).

  • No paired .py notebook files are committed.

  • jupytext is used only transiently in checks:

    • convert notebook content to temporary Python text

    • run compile/syntax check

    • discard temporary files
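
Concretely, the transient check is roughly equivalent to the following; the notebook path is illustrative, and py:percent is just one common jupytext target format:

jupytext --to py:percent docs/tutorial.ipynb -o /tmp/tutorial_check.py
python -m py_compile /tmp/tutorial_check.py
rm /tmp/tutorial_check.py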

This keeps notebook docs authoritative while still improving maintainability and CI diagnostics.