
Bruno


Bruno is a semi-autonomous coordination agent for scientific research teams, co-authored with Andreas Haupt as part of Prototyping Intelligent Organizations. Rather than generating hypotheses or running experiments, Bruno covers the coordination layer: it ingests state from the tools a lab already uses — Slack, GitHub, Overleaf, Weights & Biases, calendars, transcripts — maintains a project-scoped model of tasks, decisions, and artifacts, and surfaces that state through Slack and dashboards. Its action space is restricted by design to messages, dashboards, and human-confirmed task mutations; it has no write access to code, manuscripts, datasets, or instruments. We argue this constraint is the load-bearing design choice for deploying agents in high-stakes scientific workflows.

Figure: Bruno's architecture. Sources (Slack, GitHub, Overleaf, W&B, Cal / Email, Uploads) → ingest layer (read-only adapters; event + poll triggers) → state store (task graph, decision log, artifact index, failure log) → model router (policy: routine loop vs. plan revision; small model for the routine loop, frontier model for plan revision) → outputs (Slack message / canvas, dashboard, state mutation), which a human reads and replies to.
Data flows from the lab's tools through a read-only ingest layer into a project-scoped state store. A model router splits routine state-keeping (small model) from plan revision (frontier model); the only write paths are Slack messages, dashboards, and human-confirmed state mutations.
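The router's split between routine state-keeping and plan revision can be pictured as a small policy function. This is a minimal sketch, not the paper's routing policy: the event kinds, the `affects_plan` flag, and the tier labels `"small"` / `"frontier"` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SMALL = "small"        # hypothetical label for the routine-loop model
    FRONTIER = "frontier"  # hypothetical label for the plan-revision model


# Hypothetical event kinds that only require bookkeeping in the state store.
ROUTINE_KINDS = {"pr_merged", "sweep_finished", "overleaf_edit", "message_posted"}


@dataclass(frozen=True)
class Event:
    kind: str                   # e.g. "sweep_finished" (illustrative)
    source: str                 # e.g. "wandb", "github" (illustrative)
    affects_plan: bool = False  # assumed to be set by upstream heuristics


def route(event: Event) -> Tier:
    """Send routine state-keeping to the small model; send anything that may
    change the plan, or that is unrecognized, to the frontier model."""
    if event.affects_plan:
        return Tier.FRONTIER
    if event.kind in ROUTINE_KINDS:
        return Tier.SMALL
    # Unknown kinds escalate instead of being handled cheaply by default.
    return Tier.FRONTIER
```

For example, `route(Event("sweep_finished", "wandb"))` returns `Tier.SMALL`, while an unrecognized `Event("deadline_moved", "calendar")` escalates to `Tier.FRONTIER`. Escalating unknown events is a conservative choice assumed for this sketch.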

Accepted at ICML 2026

Bruno: A Constrained Coordination Agent for Scientific Research Teams, co-authored with Andreas Haupt, was accepted at the ICML 2026 AI for Science Workshop.

Constrained by Design

No write access to code, manuscripts, datasets, or instruments — actions are limited to messages, dashboards, and human-confirmed task mutations.
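One way to enforce this constraint is to route every action the model proposes through a gate with a closed allowlist, so that writing code or editing a manuscript is not a restricted action but simply an unrepresentable one. A minimal sketch, assuming the model emits actions as a string kind plus a payload; `ActionGate`, `ALLOWED_ACTIONS`, and the action names are hypothetical.

```python
from dataclasses import dataclass, field

# The complete action space in this sketch; anything else is rejected.
ALLOWED_ACTIONS = frozenset({"post_message", "update_dashboard", "propose_task_mutation"})


@dataclass
class ActionGate:
    """Hypothetical gate between the agent's proposed actions and the world."""
    sent: list = field(default_factory=list)     # messages and dashboard updates
    pending: list = field(default_factory=list)  # task mutations awaiting a human
    applied: list = field(default_factory=list)  # task mutations a human approved

    def submit(self, kind: str, payload: dict) -> str:
        if kind not in ALLOWED_ACTIONS:
            # e.g. "git_push" or "edit_manuscript": no write path exists.
            raise PermissionError(f"{kind!r} is outside the agent's action space")
        if kind == "propose_task_mutation":
            self.pending.append(payload)  # nothing changes until confirm()
            return "pending"
        self.sent.append((kind, payload))
        return "sent"

    def confirm(self, index: int, approved: bool) -> None:
        """Called from a human-facing surface (e.g. a Slack button, assumed)."""
        mutation = self.pending.pop(index)
        if approved:
            self.applied.append(mutation)
```

A proposed mutation such as `{"task": "T1", "status": "done"}` sits in `pending` until a person calls `confirm`; a request like `"git_push"` raises `PermissionError` before anything is recorded.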

Cross-Tool Coordination

Links a finished W&B sweep to its to-do item, logs closing PRs, and flags Overleaf edits, keeping one project-scoped model across the tools a lab already uses.
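Keeping one model across tools amounts to resolving each tool's external references (a sweep id, a PR number) to the same task. A minimal sketch, assuming links are registered explicitly when a sweep or PR is first associated with a task; `TaskIndex`, the `(source, ref)` keys, and the example ids are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskIndex:
    """Hypothetical index tying external references to project tasks."""
    # (source, external_ref) -> task_id, e.g. ("wandb", "sweep-abc") -> "T12"
    links: dict = field(default_factory=dict)
    # task_id -> artifact references collected from any tool
    artifacts: defaultdict = field(default_factory=lambda: defaultdict(list))

    def register(self, task_id: str, source: str, ref: str) -> None:
        """Record that an external reference belongs to a task."""
        self.links[(source, ref)] = task_id

    def on_event(self, source: str, ref: str) -> Optional[str]:
        """Attach an incoming event to its task; return the task id or None."""
        task_id = self.links.get((source, ref))
        if task_id is not None:
            self.artifacts[task_id].append(f"{source}:{ref}")
        return task_id
```

An unmatched event returns `None`; in keeping with the constrained design, an agent would plausibly ask about it in Slack rather than guess a link (an assumption, not a documented behavior).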

Four-Layer Architecture

Read-only ingest adapters feed a typed, project-scoped state store (task graph, decision log, artifact index, failure log); a model router runs routine state-keeping on small models and reserves a frontier model for plan revision; output goes to Slack and dashboards.
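The four record types in the state store can be written as typed containers, with the ingest side exposed only through a read interface. A minimal sketch, assuming Python dataclasses; the field names, the `Status` values, the `ready_tasks` query, and the `ReadOnlyAdapter` protocol (which shows only the poll trigger) are illustrative, not the paper's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Iterable, Protocol


class Status(Enum):
    OPEN = "open"
    IN_PROGRESS = "in_progress"
    DONE = "done"


@dataclass
class TaskNode:
    task_id: str
    title: str
    status: Status = Status.OPEN
    depends_on: set = field(default_factory=set)  # edges of the task graph


@dataclass(frozen=True)
class Decision:
    summary: str
    rationale: str
    decided_at: datetime


@dataclass(frozen=True)
class Failure:
    what_failed: str
    lesson: str


@dataclass
class ProjectState:
    """Typed, project-scoped store; one instance per project (assumption)."""
    project: str
    tasks: dict = field(default_factory=dict)      # task graph: task_id -> TaskNode
    decisions: list = field(default_factory=list)  # decision log (append-only)
    artifacts: dict = field(default_factory=dict)  # artifact index: ref -> task_id
    failures: list = field(default_factory=list)   # failure log (append-only)

    def ready_tasks(self) -> list:
        """Open tasks whose dependencies are all done; a dependency missing
        from the store is treated as not done."""
        return [
            t.task_id
            for t in self.tasks.values()
            if t.status is Status.OPEN
            and all(d in self.tasks and self.tasks[d].status is Status.DONE
                    for d in t.depends_on)
        ]


class ReadOnlyAdapter(Protocol):
    """Ingest adapters expose only reads; nothing here writes back to a tool."""
    def poll(self, since: datetime) -> Iterable[dict]: ...
```

Typing the store this way lets routine queries such as "what is unblocked?" run without a model call, leaving model capacity for the updates and plan revisions the router assigns.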

Evaluation Roadmap

Rather than one-shot benchmarks, the paper proposes a longitudinal, within-team evaluation: a planned quarter-long pilot in a graduate computational biology course that would measure transactive memory, shared mental models, and perceived coordination effectiveness.