Test automation playbook
Build resilient pipelines: device abstraction, state machines, structured logging, and automated report generation.
1) Architecture
1.1 Device layer — unified drivers + simulators
Goal: One interface for real hardware and CI simulators.
Pattern: IDevice interface → concrete UsbPump, TcpSensor + SimPump, SimSensor.
- Runtime selection: env flag DEVICE_BACKEND=sim|real.
- Dependency injection: pass device handles into controllers (no globals).
# devices/base.py
class Pump:
    async def prime(self, volume_ml: float) -> None: ...
    async def dispense(self, volume_ml: float) -> None: ...
    async def status(self) -> dict: ...
# devices/usb_pump.py / devices/sim_pump.py implement Pump
CI: default to Sim* backends; simulate latency, jitter, and faults.
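A sketch of what a Sim* backend plus env-flag selection might look like; class internals and knob names beyond Pump and DEVICE_BACKEND are illustrative, not a fixed API:

```python
# devices/sim_pump.py -- illustrative sketch
import asyncio
import os
import random

class SimPump:
    """Simulated pump with tunable latency, jitter, and fault rate."""
    def __init__(self, latency_s=0.05, jitter_s=0.02, fault_rate=0.0):
        self.latency_s = latency_s
        self.jitter_s = jitter_s
        self.fault_rate = fault_rate
        self.dispensed_ml = 0.0

    async def _act(self):
        # Simulate device latency plus jitter, then maybe inject a fault.
        await asyncio.sleep(self.latency_s + random.uniform(0, self.jitter_s))
        if random.random() < self.fault_rate:
            raise IOError("simulated pump fault")

    async def prime(self, volume_ml: float) -> None:
        await self._act()

    async def dispense(self, volume_ml: float) -> None:
        await self._act()
        self.dispensed_ml += volume_ml

    async def status(self) -> dict:
        return {"dispensed_ml": self.dispensed_ml}

def make_pump():
    """Select a backend from the DEVICE_BACKEND env flag (sim|real)."""
    if os.environ.get("DEVICE_BACKEND", "sim") == "sim":
        return SimPump()
    from devices.usb_pump import UsbPump  # real driver; module path assumed
    return UsbPump()
```

Controllers receive whatever make_pump returns via dependency injection, so the FSM code below never branches on backend type.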
1.2 Controller layer — state machines with timeouts & retries
Each procedure = explicit FSM (states, transitions, guards). Built‑ins: per‑step timeout, bounded retries, exponential backoff, and abort on hazard.
# controllers/procedure.py
from enum import Enum, auto

class S(Enum):
    IDLE = auto(); PRIME = auto(); DISPENSE = auto(); VERIFY = auto()
    DONE = auto(); FAIL = auto()

async def run(ctx):
    s = S.IDLE
    while True:
        if s is S.IDLE:
            s = S.PRIME
        elif s is S.PRIME:
            await with_retry(ctx.pump.prime, volume_ml=2.0, timeout=5, retries=3)
            s = S.DISPENSE
        elif s is S.DISPENSE:
            await with_retry(ctx.pump.dispense, volume_ml=10.0, timeout=10, retries=2)
            s = S.VERIFY
        elif s is S.VERIFY:
            ok = await verify_volume(ctx)
            s = S.DONE if ok else S.FAIL
        elif s in (S.DONE, S.FAIL):
            return s
1.3 Data layer — schemas & versioning
Single source of truth for samples, configs, results. Version every schema; keep migrations in repo.
# schemas/config.schema.json (v3)
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RunConfig",
  "version": 3,
  "type": "object",
  "properties": {
    "run_id": {"type": "string"},
    "sample_id": {"type": "string"},
    "target_volume_ml": {"type": "number"},
    "device_profile": {"type": "string", "enum": ["sim", "real"]}
  },
  "required": ["run_id", "sample_id", "target_volume_ml"]
}
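Enforcing the schema at run start can be sketched as below; this is a minimal stdlib version (a real pipeline would use a full JSON Schema validator such as the jsonschema package), with field names taken from the schema above:

```python
# Minimal stdlib sketch of the RunConfig validation gate.
REQUIRED = {"run_id": str, "sample_id": str, "target_volume_ml": (int, float)}

def validate_config(cfg: dict) -> dict:
    """Reject a run early if its config misses required fields or types."""
    for key, typ in REQUIRED.items():
        if key not in cfg:
            raise ValueError(f"missing required field: {key}")
        if not isinstance(cfg[key], typ):
            raise ValueError(f"bad type for field: {key}")
    # device_profile is optional but constrained by the schema's enum.
    if cfg.get("device_profile") not in (None, "sim", "real"):
        raise ValueError("device_profile must be sim|real")
    return cfg
```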
runs/
  2025-11-03T10-12-22Z_run-8421/
    config.v3.json
    results.v2.json
    artifacts/
      logs.ndjson
      traces/
      attachments/
2) Reliability
2.1 Idempotent steps & checkpoints
- Idempotency: key each step (e.g., run_id:step_name:index) so a retried or resumed run never performs the same action twice.
- Checkpointing: write state.json after each successful step (atomic rename).
import json
from pathlib import Path

def checkpoint(run_dir: Path, step_name: str, payload: dict) -> None:
    tmp = run_dir / ".state.json.tmp"
    with tmp.open("w") as f:            # close before renaming so data is flushed
        json.dump({"step": step_name, **payload}, f)
    tmp.replace(run_dir / "state.json")  # atomic rename on POSIX filesystems
Resume logic: on start, read last checkpoint and jump to the next state.
2.2 Structured logs + metrics + alerts
Logs: newline‑delimited JSON. Always include run_id, step, device, ts, level.
{"ts":"2025-11-03T10:12:28Z","level":"INFO","run_id":"8421","step":"DISPENSE","ml":10.0,"lat_ms":842}
Metrics: counters, gauges, histograms. Alerts on SLO breaches and error rate spikes.
- alert: HighFailureRate
  expr: sum(rate(procedure_fail_total[5m])) / sum(rate(procedure_start_total[5m])) > 0.05
  for: 10m
  labels: {severity: page}
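The counters feeding this rule can be sketched as below; this is a minimal in-process stand-in, not a Prometheus client (a real pipeline would export procedure_start_total / procedure_fail_total via a metrics client library):

```python
# Minimal in-process metrics sketch: counters plus raw histogram samples.
from collections import defaultdict

class Metrics:
    def __init__(self):
        self.counters = defaultdict(float)
        self.histograms = defaultdict(list)

    def inc(self, name: str, amount: float = 1.0) -> None:
        self.counters[name] += amount

    def observe(self, name: str, value: float) -> None:
        self.histograms[name].append(value)

    def failure_rate(self) -> float:
        """Fraction of started procedures that failed (the alert's ratio)."""
        starts = self.counters["procedure_start_total"]
        fails = self.counters["procedure_fail_total"]
        return fails / starts if starts else 0.0
```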
2.3 Golden tests & fuzzing
# Golden
from pathlib import Path
expected = Path("goldens/result_v2.json").read_text()
assert normalize(actual_json) == normalize(expected)

# Fuzz (hypothesis)
from hypothesis import given, strategies as st

@given(st.text(min_size=0, max_size=1024))
def test_parser_never_crashes(s):
    parse_csv_maybe(s)  # should not raise
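normalize is left undefined above; one minimal sketch, assuming that volatile fields such as ts and lat_ms should be ignored in golden comparisons (the field list is illustrative):

```python
# Make golden comparisons stable: parse, drop volatile fields, re-serialize
# deterministically with sorted keys.
import json

VOLATILE = {"ts", "lat_ms"}  # fields expected to differ run-to-run (assumed)

def normalize(text: str) -> str:
    obj = json.loads(text)
    cleaned = {k: v for k, v in obj.items() if k not in VOLATILE}
    return json.dumps(cleaned, sort_keys=True, separators=(",", ":"))
```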
3) Reporting & Distribution
3.1 Generate signed PDFs/CSV
Render PDF from HTML with plots; CSV as machine‑readable results. Attach a detached signature & manifest.
SHA256 reports/run-8421.pdf 9a1e...
SHA256 reports/run-8421.csv 4b7c...
3.2 Automate distribution & archival
Bundle trace.tgz with config, logs, results, reports, manifest, and signature. Upload to object storage; notify Slack/Email. Apply lifecycle rules (e.g., 180 days).
4) CI/CD wiring
4.1 GitHub Actions (example)
name: pipeline
on: [push, workflow_dispatch]
jobs:
  test:
    runs-on: ubuntu-latest
    env: { DEVICE_BACKEND: sim }
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install -r requirements.txt
      - name: Unit + golden + fuzz smoke
        run: |
          pytest -q tests/unit
          pytest -q tests/golden
          pytest -q -k "fuzz and smoke"
      - name: Build report artifact
        run: python tools/build_report.py --run-id ${{ github.run_id }}
      - uses: actions/upload-artifact@v4
        with:
          name: trace-bundle
          path: runs/**/artifacts/*
4.2 Gates
- ✅ All device simulators pass.
- ✅ Golden diffs clean (or explicitly updated in PR).
- ✅ Coverage ≥ threshold on controllers & parsers.
- ✅ Lint + typecheck green.
5) Minimal templates (drop‑in)
Structured logging helper
import json, sys, time

def log(event, **kw):
    kw.setdefault("ts", time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))
    sys.stdout.write(json.dumps({"event": event, **kw}) + "\n")
Retry with timeout
import asyncio

async def with_retry(fn, *, timeout, retries, backoff=0.5, **kw):
    for i in range(retries + 1):
        try:
            return await asyncio.wait_for(fn(**kw), timeout)
        except Exception:
            if i == retries:
                raise
            await asyncio.sleep(backoff * (2 ** i))
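A usage sketch for the retry helper; the helper is repeated here so the example is self-contained, and flaky_prime is a hypothetical flaky device call:

```python
import asyncio

async def with_retry(fn, *, timeout, retries, backoff=0.5, **kw):
    for i in range(retries + 1):
        try:
            return await asyncio.wait_for(fn(**kw), timeout)
        except Exception:
            if i == retries:
                raise
            await asyncio.sleep(backoff * (2 ** i))

attempts = {"n": 0}

async def flaky_prime(volume_ml):
    """Hypothetical device call that fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("transient device error")
    return f"primed {volume_ml} ml"

result = asyncio.run(
    with_retry(flaky_prime, timeout=1, retries=3, backoff=0.01, volume_ml=2.0)
)
```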
Report builder (skeleton)
# tools/build_report.py
import json

def build(run_dir):
    data = json.loads((run_dir / "results.v2.json").read_text())
    html = render_html(data)        # your template
    pdf_path = pdf_from_html(html)  # your engine
    csv_path = write_csv(data)
    write_manifest_and_sign([pdf_path, csv_path])
6) Checklists
Device layer
- Real & simulated drivers implement same interface
- Fault injection knobs (latency, drop, corrupt)
- Hardware feature flags in config schema
Controllers
- Explicit FSM per procedure
- Timeouts + retries + backoff per step
- Idempotency keys + checkpoints
Data & logs
- Versioned schemas + migrations
- NDJSON logs with run_id, step
- Metrics: latency histograms, error counters
Testing
- Golden tests for critical paths
- Fuzz tests for parsers
- CI default to simulators
Reporting
- PDF + CSV rendered and signed
- Trace bundle packaged, uploaded, retained
- Notifications sent with links
7) Example “happy path” flow
- Receive RunConfig v3 → validate schema.
- Spin up Sim* or real devices via DI.
- Execute controller FSM with checkpoints.
- Emit NDJSON logs + metrics.
- Persist results.v2.json.
- Generate PDF/CSV + signatures.
- Bundle traces → upload → notify → archive with retention policy.
Want a tailored playbook for your lab? Request a workshop.