Test automation playbook
Build resilient pipelines: device abstraction, state machines, structured logging, and automated report generation.
1) Architecture
1.1 Device layer — unified drivers + simulators
Goal: One interface for real hardware and CI simulators.
Pattern: IDevice interface → concrete UsbPump, TcpSensor + SimPump, SimSensor.
- Runtime selection: env flag DEVICE_BACKEND=sim|real.
- Dependency injection: pass device handles into controllers (no globals).
# devices/base.py
class Pump:
    async def prime(self, volume_ml: float) -> None: ...
    async def dispense(self, volume_ml: float) -> None: ...
    async def status(self) -> dict: ...
# devices/usb_pump.py / devices/sim_pump.py implement Pump
CI: default to Sim* backends; simulate latency, jitter, and faults.
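A sketch of what a Sim* backend plus env-flag selection might look like; class internals and knob names beyond Pump and DEVICE_BACKEND are illustrative, not a fixed API:

```python
# devices/sim_pump.py -- illustrative sketch
import asyncio
import os
import random

class SimPump:
    """Simulated pump with tunable latency, jitter, and fault rate."""
    def __init__(self, latency_s=0.05, jitter_s=0.02, fault_rate=0.0):
        self.latency_s = latency_s
        self.jitter_s = jitter_s
        self.fault_rate = fault_rate
        self.dispensed_ml = 0.0

    async def _act(self):
        # Simulate device latency plus jitter, then maybe inject a fault.
        await asyncio.sleep(self.latency_s + random.uniform(0, self.jitter_s))
        if random.random() < self.fault_rate:
            raise IOError("simulated pump fault")

    async def prime(self, volume_ml: float) -> None:
        await self._act()

    async def dispense(self, volume_ml: float) -> None:
        await self._act()
        self.dispensed_ml += volume_ml

    async def status(self) -> dict:
        return {"dispensed_ml": self.dispensed_ml}

def make_pump():
    """Select a backend from the DEVICE_BACKEND env flag (sim|real)."""
    if os.environ.get("DEVICE_BACKEND", "sim") == "sim":
        return SimPump()
    from devices.usb_pump import UsbPump  # real driver; module path assumed
    return UsbPump()
```

Controllers receive whatever make_pump returns via dependency injection, so the FSM code below never branches on backend type.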
1.2 Controller layer — state machines with timeouts & retries
Each procedure = explicit FSM (states, transitions, guards). Built‑ins: per‑step timeout, bounded retries, exponential backoff, and abort on hazard.
# controllers/procedure.py
from enum import Enum, auto

class S(Enum):
    IDLE = auto(); PRIME = auto(); DISPENSE = auto(); VERIFY = auto()
    DONE = auto(); FAIL = auto()

async def run(ctx):
    s = S.IDLE
    while True:
        if s is S.IDLE:
            s = S.PRIME
        elif s is S.PRIME:
            await with_retry(ctx.pump.prime, volume_ml=2.0, timeout=5, retries=3)
            s = S.DISPENSE
        elif s is S.DISPENSE:
            await with_retry(ctx.pump.dispense, volume_ml=10.0, timeout=10, retries=2)
            s = S.VERIFY
        elif s is S.VERIFY:
            ok = await verify_volume(ctx)
            s = S.DONE if ok else S.FAIL
        elif s in (S.DONE, S.FAIL):
            return s
1.3 Data layer — schemas & versioning
Single source of truth for samples, configs, results. Version every schema; keep migrations in repo.
# schemas/config.schema.json (v3)
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RunConfig",
  "version": 3,
  "type": "object",
  "properties": {
    "run_id": {"type": "string"},
    "sample_id": {"type": "string"},
    "target_volume_ml": {"type": "number"},
    "device_profile": {"type": "string", "enum": ["sim", "real"]}
  },
  "required": ["run_id", "sample_id", "target_volume_ml"]
}
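Enforcing the schema at run start can be sketched as below; this is a minimal stdlib version (a real pipeline would use a full JSON Schema validator such as the jsonschema package), with field names taken from the schema above:

```python
# Minimal stdlib sketch of the RunConfig validation gate.
REQUIRED = {"run_id": str, "sample_id": str, "target_volume_ml": (int, float)}

def validate_config(cfg: dict) -> dict:
    """Reject a run early if its config misses required fields or types."""
    for key, typ in REQUIRED.items():
        if key not in cfg:
            raise ValueError(f"missing required field: {key}")
        if not isinstance(cfg[key], typ):
            raise ValueError(f"bad type for field: {key}")
    # device_profile is optional but constrained by the schema's enum.
    if cfg.get("device_profile") not in (None, "sim", "real"):
        raise ValueError("device_profile must be sim|real")
    return cfg
```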
runs/
  2025-11-03T10-12-22Z_run-8421/
    config.v3.json
    results.v2.json
    artifacts/
      logs.ndjson
      traces/
      attachments/
2) Reliability
2.1 Idempotent steps & checkpoints
- Idempotency: key each step (e.g., run_id:step_name:index) so a retried or resumed run never performs the same action twice.
- Checkpointing: write state.json after each successful step (atomic rename).
import json
from pathlib import Path

def checkpoint(run_dir: Path, step_name: str, payload: dict) -> None:
    tmp = run_dir / ".state.json.tmp"
    with tmp.open("w") as f:            # close before renaming so data is flushed
        json.dump({"step": step_name, **payload}, f)
    tmp.replace(run_dir / "state.json")  # atomic rename on POSIX filesystems
Resume logic: on start, read last checkpoint and jump to the next state.
2.2 Structured logs + metrics + alerts
Logs: newline‑delimited JSON. Always include run_id, step, device, ts, level.
{"ts":"2025-11-03T10:12:28Z","level":"INFO","run_id":"8421","step":"DISPENSE","ml":10.0,"lat_ms":842}
Metrics: counters, gauges, histograms. Alerts on SLO breaches and error rate spikes.
- alert: HighFailureRate
  expr: sum(rate(procedure_fail_total[5m])) / sum(rate(procedure_start_total[5m])) > 0.05
  for: 10m
  labels: {severity: page}
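The counters feeding this rule can be sketched as below; this is a minimal in-process stand-in, not a Prometheus client (a real pipeline would export procedure_start_total / procedure_fail_total via a metrics client library):

```python
# Minimal in-process metrics sketch: counters plus raw histogram samples.
from collections import defaultdict

class Metrics:
    def __init__(self):
        self.counters = defaultdict(float)
        self.histograms = defaultdict(list)

    def inc(self, name: str, amount: float = 1.0) -> None:
        self.counters[name] += amount

    def observe(self, name: str, value: float) -> None:
        self.histograms[name].append(value)

    def failure_rate(self) -> float:
        """Fraction of started procedures that failed (the alert's ratio)."""
        starts = self.counters["procedure_start_total"]
        fails = self.counters["procedure_fail_total"]
        return fails / starts if starts else 0.0
```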
2.3 Golden tests & fuzzing
# Golden
from pathlib import Path
expected = Path("goldens/result_v2.json").read_text()
assert normalize(actual_json) == normalize(expected)

# Fuzz (hypothesis)
from hypothesis import given, strategies as st

@given(st.text(min_size=0, max_size=1024))
def test_parser_never_crashes(s):
    parse_csv_maybe(s)  # should not raise
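normalize is left undefined above; one minimal sketch, assuming that volatile fields such as ts and lat_ms should be ignored in golden comparisons (the field list is illustrative):

```python
# Make golden comparisons stable: parse, drop volatile fields, re-serialize
# deterministically with sorted keys.
import json

VOLATILE = {"ts", "lat_ms"}  # fields expected to differ run-to-run (assumed)

def normalize(text: str) -> str:
    obj = json.loads(text)
    cleaned = {k: v for k, v in obj.items() if k not in VOLATILE}
    return json.dumps(cleaned, sort_keys=True, separators=(",", ":"))
```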
3) Reporting & Distribution
3.1 Generate signed PDFs/CSV
Render PDF from HTML with plots; CSV as machine‑readable results. Attach a detached signature & manifest.
SHA256 reports/run-8421.pdf 9a1e...
SHA256 reports/run-8421.csv 4b7c...
3.2 Automate distribution & archival
Bundle trace.tgz with config, logs, results, reports, manifest, and signature. Upload to object storage; notify Slack/Email. Apply lifecycle rules (e.g., 180 days).
4) CI/CD wiring
4.1 GitHub Actions (example)
name: pipeline
on: [push, workflow_dispatch]
jobs:
  test:
    runs-on: ubuntu-latest
    env: { DEVICE_BACKEND: sim }
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install -r requirements.txt
      - name: Unit + golden + fuzz smoke
        run: |
          pytest -q tests/unit
          pytest -q tests/golden
          pytest -q -k "fuzz and smoke"
      - name: Build report artifact
        run: python tools/build_report.py --run-id ${{ github.run_id }}
      - uses: actions/upload-artifact@v4
        with:
          name: trace-bundle
          path: runs/**/artifacts/*
4.2 Gates
- ✅ All device simulators pass.
- ✅ Golden diffs clean (or explicitly updated in PR).
- ✅ Coverage ≥ threshold on controllers & parsers.
- ✅ Lint + typecheck green.
5) Minimal templates (drop‑in)
Structured logging helper
import json, sys, time

def log(event, **kw):
    kw.setdefault("ts", time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))
    sys.stdout.write(json.dumps({"event": event, **kw}) + "\n")
Retry with timeout
import asyncio

async def with_retry(fn, *, timeout, retries, backoff=0.5, **kw):
    for i in range(retries + 1):
        try:
            return await asyncio.wait_for(fn(**kw), timeout)
        except Exception:
            if i == retries:
                raise
            await asyncio.sleep(backoff * (2 ** i))
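A usage sketch for the retry helper; the helper is repeated here so the example is self-contained, and flaky_prime is a hypothetical flaky device call:

```python
import asyncio

async def with_retry(fn, *, timeout, retries, backoff=0.5, **kw):
    for i in range(retries + 1):
        try:
            return await asyncio.wait_for(fn(**kw), timeout)
        except Exception:
            if i == retries:
                raise
            await asyncio.sleep(backoff * (2 ** i))

attempts = {"n": 0}

async def flaky_prime(volume_ml):
    """Hypothetical device call that fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("transient device error")
    return f"primed {volume_ml} ml"

result = asyncio.run(
    with_retry(flaky_prime, timeout=1, retries=3, backoff=0.01, volume_ml=2.0)
)
```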
Report builder (skeleton)
# tools/build_report.py
import json

def build(run_dir):
    data = json.loads((run_dir / "results.v2.json").read_text())
    html = render_html(data)        # your template
    pdf_path = pdf_from_html(html)  # your engine
    csv_path = write_csv(data)
    write_manifest_and_sign([pdf_path, csv_path])
6) Checklists
Device layer
- Real & simulated drivers implement same interface
- Fault injection knobs (latency, drop, corrupt)
- Hardware feature flags in config schema
Controllers
- Explicit FSM per procedure
- Timeouts + retries + backoff per step
- Idempotency keys + checkpoints
Data & logs
- Versioned schemas + migrations
- NDJSON logs with run_id, step
- Metrics: latency histograms, error counters
Testing
- Golden tests for critical paths
- Fuzz tests for parsers
- CI default to simulators
Reporting
- PDF + CSV rendered and signed
- Trace bundle packaged, uploaded, retained
- Notifications sent with links
7) Example “happy path” flow
- Receive RunConfig v3 → validate schema.
- Spin up Sim* or real devices via DI.
- Execute controller FSM with checkpoints.
- Emit NDJSON logs + metrics.
- Persist results.v2.json.
- Generate PDF/CSV + signatures.
- Bundle traces → upload → notify → archive with retention policy.
Want a tailored playbook for your lab? Request a workshop.