When to Speak (and When Not To)

Jun 12, 2026 · 9 min read

Think about the worst colleague you've ever had in a group chat. Either they replied to everything, narrating their own work, asking questions they could have answered themselves, pinging the channel for things that mattered to no one. Or they were the opposite: silent, present but unreachable, the person you had to address three times before anything happened.

Agents deployed into team channels become one of these two colleagues by default. After two years of running research agents inside the Slack workspaces of working labs, I've come to think the turn-taking problem — when to speak, when to stay quiet, when to act without announcing it — is harder than most of what we call capability, and almost completely unmeasured.

The old problem under new machinery

There is a whole literature for this, but the useful version fits in a few ideas.

Conversation analysts studying ordinary talk noticed that turn-taking is not just politeness. It is the machinery by which a group decides who has the floor, who is being addressed, whether the current turn is complete, and who is entitled to go next. Grice gave the compressed rule for good conversational contribution: say what is needed, say what is true, say what is relevant, and say it clearly. Clark and Brennan added the part every working team recognizes: communication is not done when words are emitted; it is done when enough common ground exists for coordinated action to continue.

Catherine Cramton's work on dispersed collaboration makes the Slack version especially sharp. Distributed teams fail when mutual knowledge is missing: people do not know what others know, cannot tell what silence means, and attribute gaps to carelessness rather than context. Human-AI interaction research says the same thing in design language. Horvitz's mixed-initiative work treats timing, uncertainty, and the cost of interruption as first-class variables. Amershi and colleagues' guidelines emphasize making AI behavior visible, invocable, dismissible, correctable, and calibrated to context.

That is the frame I wish I had when we started. The question is not "should the agent reply?" The question is what kind of conversational move the situation calls for.

A longer taxonomy of speaking

1. Claiming the turn

One morning at 5:05 AM, a researcher typed a plain instruction into a project channel: "download this file to drive as well." The agent had been doing exactly this kind of work in that channel for days. But the message didn't mention the agent by name, so the routing layer classified it as ambient human chatter, and nothing happened. No reply, no error, no acknowledgment. The researcher discovered the gap hours later.

This is a failure to claim the turn. The message was not syntactically addressed to the agent, but it was pragmatically addressed to it. In a working group, address is not just a mention. It is role, recent history, task ownership, and the shape of the verb. If you have been downloading files all week and someone says "download this one too," that turn is yours.

The mistake is designing invocation as a wake word problem. In teams, invocation is often elliptical. A good collaborator recognizes the half-addressed instruction because they know the work they have been doing.
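
To make the idea concrete, here is a minimal sketch of scoring address from context rather than from a wake word. It is an illustration, not our routing layer: the names (ChannelContext, address_score), the weights, and the threshold are all hypothetical, and "the shape of the verb" is reduced to checking whether the opening word is a verb the agent has recently been doing.

```python
from dataclasses import dataclass, field

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"mention": 1.0, "owns_task": 0.5, "recent_activity": 0.3, "imperative": 0.2}
CLAIM_THRESHOLD = 0.6


@dataclass
class ChannelContext:
    agent_name: str
    # Verbs for tasks the agent has recently performed in this channel.
    recent_task_verbs: set[str] = field(default_factory=set)
    # Whether the agent has been active in this channel recently.
    recently_active: bool = False


def address_score(message: str, ctx: ChannelContext) -> float:
    """Estimate how strongly a message is addressed to the agent."""
    words = [w.strip(".,!?:;") for w in message.lower().split()]
    if not words:
        return 0.0
    score = 0.0
    if f"@{ctx.agent_name.lower()}" in words:
        score += WEIGHTS["mention"]
    if ctx.recently_active:
        score += WEIGHTS["recent_activity"]
    if words[0] in ctx.recent_task_verbs:
        # The opening verb matches work the agent already owns, so it counts
        # as both an imperative and a task-ownership signal.
        score += WEIGHTS["owns_task"] + WEIGHTS["imperative"]
    return score


def should_claim_turn(message: str, ctx: ChannelContext) -> bool:
    return address_score(message, ctx) >= CLAIM_THRESHOLD
```

With a context that records a week of downloads, the 5:05 AM message clears the threshold without a mention. With an empty context, the same message scores zero, which is the failure described above.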

2. Giving receipt

There is a weaker move than answering: receipt. Humans do this constantly. "Got it." A nod. A check mark. A glance that says the message landed. Receipt does not mean the work is done and does not even mean the speaker agrees. It means the channel no longer has to wonder whether silence means absence.

Agents need this move badly because silent non-action is almost impossible to debug. If an agent chooses not to act, there should be some cheap trace of that choice. Not every ambient message deserves a reply in the channel, but low-confidence routing decisions should be inspectable somewhere. The point is not to make agents chattier. It is to distinguish quiet judgment from downtime.
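
One way to picture this is a routing function that always returns a trace, even when the channel sees nothing. The sketch below is an assumption-heavy illustration: the three outcomes, the confidence score, and both thresholds are hypothetical, and a real system would persist the trace somewhere inspectable.

```python
from dataclasses import dataclass
from enum import Enum


class Receipt(Enum):
    ACT = "act"          # claim the turn and do the work
    ACKNOWLEDGE = "ack"  # lightweight signal, for example an emoji reaction
    SILENT = "silent"    # no channel output, trace only


@dataclass
class RoutingTrace:
    message_id: str
    confidence_mine: float
    decision: Receipt
    reason: str


def decide_receipt(
    message_id: str,
    confidence_mine: float,
    act_at: float = 0.7,        # hypothetical threshold
    ignore_below: float = 0.2,  # hypothetical threshold
) -> RoutingTrace:
    """Pick a receipt level and always return an auditable trace."""
    if confidence_mine >= act_at:
        decision, reason = Receipt.ACT, "confident the message is for the agent"
    elif confidence_mine < ignore_below:
        decision, reason = Receipt.SILENT, "confident the message is ambient chatter"
    else:
        decision, reason = Receipt.ACKNOWLEDGE, "unsure whether the message is for the agent"
    return RoutingTrace(message_id, confidence_mine, decision, reason)
```

The design choice that matters is the return type: silence still produces a RoutingTrace, so a quiet judgment can be told apart from downtime after the fact.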

3. Grounding before action

Grounding is the work of making sure enough shared understanding exists to keep going. The key word is enough. A system that demands explicit confirmation after every interpretation feels safe to its designers and tedious to its users. A system that never checks its interpretation is fast until it is wrong.

The rule we converged on is risk-weighted grounding. If the next step is reversible, cheap, and consistent with the stated goal, continue. If it changes canonical state, spends money, deletes data, messages an outsider, or narrows an irreversible search path, ground first. "I'm going to write these 13 affiliation-matched h-indexes and hold back three ambiguous matches" is useful grounding. "Would you like me to continue?" after the goal has been restated four times is not.
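
The rule is simple enough to write down as a predicate. This is a minimal sketch under the assumption that each planned action can be tagged with the risk properties listed above; the PlannedAction type and its field names are hypothetical, and producing those tags reliably is the hard part that the sketch skips.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedAction:
    description: str
    # Properties that allow the agent to continue without checking in.
    reversible: bool = True
    cheap: bool = True
    matches_stated_goal: bool = True
    # Properties that require grounding first.
    changes_canonical_state: bool = False
    spends_money: bool = False
    deletes_data: bool = False
    messages_outsider: bool = False
    narrows_irreversible_path: bool = False


def needs_grounding(action: PlannedAction) -> bool:
    """Return True when the agent should state its plan before acting."""
    risky = any([
        action.changes_canonical_state,
        action.spends_money,
        action.deletes_data,
        action.messages_outsider,
        action.narrows_irreversible_path,
    ])
    safe = action.reversible and action.cheap and action.matches_stated_goal
    return risky or not safe
```

Under these assumptions, looking up an affiliation proceeds silently, while writing h-indexes into the shared table triggers a grounding message first.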

4. Continuing under stable intent

I recently watched one of our agents work through enriching a database, a long multi-session task it was genuinely good at. But it ended turn after turn with a question: "Would you like me to continue with the extraction, or focus on the remaining lookups?" My replies, in order: the goal restated, then "do it", then "yes". The agent already knew the answer to every question it asked.

That is not humility. It is a bad handoff of initiative. Once a user has set a goal and the next step is obvious, continuing is the respectful move. Asking permission feels safe to the agent because it transfers responsibility back to the human. In a group channel, it also spends the attention of everyone watching.

The design distinction is between permission and steering. Ask for permission when the action changes scope or risk. Ask for steering when there are multiple plausible paths with different tradeoffs. Do not ask when the only honest reason is that the agent has reached the end of a turn and wants a conversationally comfortable exit.

5. Reporting progress

Progress reports are not status theater. They are useful when they preserve state someone is about to lose: what was attempted, what succeeded, what failed, what remains, and what evidence backs the claim. They are noise when they narrate work that nobody needs to coordinate around.

A good progress update has a compression ratio. It should leave the team with a better model of the task than before the message arrived. "Still working" rarely does that. "37 biographies filled, 20 h-index blanks remain, common-name cases held back until affiliation match" does. The second message creates shared state; the first merely asks to be seen.

6. Interrupting

Interruptions should be reserved for divergence and loss. The plan says one thing, the observed work says another. A deadline is about to pass. Two people are duplicating work. A background job has failed silently. A claimed result conflicts with the artifact on screen. The agent has information that the team does not have and the cost of waiting is higher than the cost of breaking attention.

This is the core bet of Bruno, the coordination agent we wrote up for ICML: not that an agent should talk more, but that it should speak when stated plans drift from execution. Interruption is justified when the agent holds state the group is about to lose. Most messages do not meet that bar.

7. Escalating uncertainty

Uncertainty is not a reason to stop by default. It is a reason to choose a smaller action. The h-index task made this concrete. Some values had strong affiliation-confirmed matches and could be written. Some values were plausible but not verified. One result looked like an h-index of 388, which was almost certainly a citation-count parse error. The right behavior was not to ask the user about every field and not to write everything blindly. It was to write the verified values, hold back the ambiguous ones, and explain the boundary.

Escalation should happen at the edge between safe partial progress and risky mutation. If the agent can keep moving without corrupting shared state, it should. If the only available next step would create a false fact, it should stop and say exactly why.
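
The h-index case can be sketched as a triage step that splits candidates into values to write and values to hold back with a reason. Everything here is illustrative: the Candidate type, the field names, and especially the plausibility ceiling are assumptions, not the checks the agent actually ran.

```python
from dataclasses import dataclass

# Hypothetical plausibility ceiling, an assumption for illustration only.
MAX_PLAUSIBLE_H_INDEX = 300


@dataclass
class Candidate:
    name: str
    h_index: int
    affiliation_confirmed: bool


def triage(candidates: list[Candidate]) -> tuple[list[Candidate], list[tuple[Candidate, str]]]:
    """Split candidates into values safe to write and values held back."""
    write: list[Candidate] = []
    hold: list[tuple[Candidate, str]] = []
    for c in candidates:
        if c.h_index > MAX_PLAUSIBLE_H_INDEX:
            hold.append((c, "implausible value, possible citation-count parse error"))
        elif not c.affiliation_confirmed:
            hold.append((c, "affiliation not confirmed"))
        else:
            write.append(c)
    return write, hold
```

The held-back list carries its reasons, so the agent can explain the boundary in one message instead of asking about every field.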

8. Repairing the record

In ordinary conversation, repair is the move where someone fixes a misunderstanding, restates a garbled phrase, or corrects themselves. In agent work, repair matters because the transcript becomes infrastructure. People make later decisions from what the agent said earlier.

So an agent should speak when its earlier state becomes false. If it said four h-indexes remained and later discovers there were forty-two, that is not a minor bookkeeping issue. It should repair the record directly: "I gave a wrong remaining count; the table shows 42 blanks. I am continuing from the table, not from my prior summary." The repair is not self-flagellation. It is version control for shared reality.

9. Handing off

Long-running work needs handoff messages. Not "I am done for now," but enough state for another worker, another turn, or a human reviewer to resume without reconstructing the whole session. The shape is simple: current objective, completed artifacts, open decisions, blockers, and the next concrete action.

Most agents are bad at this because they treat a handoff like a final answer. It is closer to a lab notebook entry. The reader is not asking whether the prose sounds complete. They are asking whether they can trust the state and pick up the work.
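
As a rough illustration, the handoff shape above maps onto a small record with a check that refuses to call a handoff resumable unless it names an objective and a next action. The class and method names are hypothetical, and the rendering format is just one plain-text layout.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    objective: str
    completed_artifacts: list[str] = field(default_factory=list)
    open_decisions: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)
    next_action: str = ""

    def is_resumable(self) -> bool:
        """A reader needs at least the objective and the next concrete step."""
        return bool(self.objective.strip()) and bool(self.next_action.strip())

    def render(self) -> str:
        """Format the handoff as a plain-text notebook-style entry."""
        def section(title: str, items: list[str]) -> str:
            body = "\n".join(f"  - {item}" for item in items) if items else "  (none)"
            return f"{title}:\n{body}"

        return "\n".join([
            f"Objective: {self.objective}",
            section("Completed artifacts", self.completed_artifacts),
            section("Open decisions", self.open_decisions),
            section("Blockers", self.blockers),
            f"Next action: {self.next_action}",
        ])
```

Empty sections render as "(none)" rather than disappearing, so the reader can tell "no blockers" from "blockers not reported."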

10. Staying quiet

Silence is a move too. The agent should stay quiet when it has no new state, no decision is needed, and the work can continue without consuming the channel. It should stay quiet when the team is negotiating values or ownership rather than asking for computation. It should stay quiet when a human is thinking out loud and the cost of a premature answer is derailing the conversation.

But silence should be designed, not accidental. Quiet work belongs in traces, dashboards, checklists, and artifacts. Channel speech is for coordination. The mistake is treating the visible chat message as the only way an agent can exist.

11. Keeping one voice

The worst version of over-speaking is not verbosity. It is incoherence. We watched two execution threads race after a timeout and post conflicting answers to the same question minutes apart. To the infrastructure those were continuations. To the team they were one colleague contradicting themselves in public.

Whatever the system is doing underneath — retries, background jobs, model switches, tool failures — the group should experience one coherent participant. One voice per thread. One current plan. One place where state is reconciled before speech. Two answers to one question is worse than one wrong answer because it converts a mistake into distrust.
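
A minimal sketch of one mechanism for this, assuming every retry registers as a new attempt before it starts working: only the newest attempt for a thread may post, so a worker that was presumed dead cannot answer over its replacement. The ThreadVoice class is hypothetical, and it covers only the stale-worker race. It does not reconcile a retry with an answer that was already posted.

```python
import threading


class ThreadVoice:
    """Lets only the newest attempt for a conversation thread post."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current: dict[str, int] = {}  # thread_id -> newest attempt number
        self.posted: list[tuple[str, str]] = []  # stand-in for the channel

    def start_attempt(self, thread_id: str) -> int:
        """Register a new attempt, superseding any earlier one."""
        with self._lock:
            attempt = self._current.get(thread_id, 0) + 1
            self._current[thread_id] = attempt
            return attempt

    def post(self, thread_id: str, attempt: int, text: str) -> bool:
        """Post only if this attempt is still the newest for the thread."""
        with self._lock:
            if self._current.get(thread_id) != attempt:
                return False  # superseded: stay quiet
            self.posted.append((thread_id, text))
            return True
```

In the timeout case, the original worker holds attempt 1 and the retry holds attempt 2. When the original finally finishes, its post is rejected and the team hears one answer.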

Speaking is not binary

The first thing deployment taught us is that "respond or don't" is the wrong model. Our routing layer eventually grew five distinct lanes: a quick reply that costs nothing, a short answer without spinning up a full working session, a normal interactive turn, a durable background job, and a follow-up folded quietly into work already in progress. Most of the judgment isn't whether to speak but which register to speak in. A question about something the agent already knows should not launch a research session. A steering comment on running work should not spawn a second worker — it should merge into the first.

The second thing it taught us is that turn-taking is a context problem, not a politeness problem. The fix for the ignored 5:05 AM message wasn't a more eager trigger. It was giving the routing decision more memory: who had been doing what work in that channel, which agents were active, what the last few days of activity looked like. With that context, "download this file to drive as well" is obviously addressed to the agent that's been downloading files all week. Without it, the message is noise. Humans do this effortlessly — you know a half-sentence across the lab bench is for you because you know what you've been doing together. Agents have to be given that knowledge explicitly.
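
The five lanes can be sketched as an enum plus a chooser. The lane names follow the list above, but the Request features and the order of the checks are assumptions made for illustration, not a description of how our router decides.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Lane(Enum):
    QUICK_REPLY = auto()        # a reply that costs nothing
    SHORT_ANSWER = auto()       # answer without a full working session
    INTERACTIVE_TURN = auto()   # normal interactive turn
    BACKGROUND_JOB = auto()     # durable background job
    FOLD_INTO_RUNNING = auto()  # follow-up merged into work in progress


@dataclass
class Request:
    # Hypothetical features a router might compute for each message.
    relates_to_running_work: bool = False
    answerable_from_known_state: bool = False
    long_running: bool = False
    needs_tools: bool = False


def choose_lane(req: Request) -> Lane:
    """Pick the cheapest lane that can handle the request."""
    if req.relates_to_running_work:
        # Steering comments merge into the existing worker.
        return Lane.FOLD_INTO_RUNNING
    if req.answerable_from_known_state:
        # Known answers should not launch a research session.
        return Lane.SHORT_ANSWER
    if req.long_running:
        return Lane.BACKGROUND_JOB
    if req.needs_tools:
        return Lane.INTERACTIVE_TURN
    return Lane.QUICK_REPLY
```

Checking for running work first encodes the rule that a steering comment should never spawn a second worker.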

Rules we now design by

A few principles survived contact with real teams.

Speak when you hold state someone is about to lose. The best reason for an agent to interject is divergence: the plan says one thing, the observed work says another, and nobody else is positioned to notice. This is the core bet of Bruno, the coordination agent we wrote up for ICML — it drafts decision logs from meetings and flags when stated plans drift from execution. That message is worth an interruption. Most messages aren't.

Never ask a question you can answer from context. If the user has stated the goal and the goal hasn't changed, continuing is the respectful move. Asking permission feels safe to the agent and costs the human a turn. Across a long project those turns are the difference between a colleague and a chore.

Distinguish silence from absence. If an agent decides a message isn't for it, that decision should be cheap to audit. When confidence is low, the right output is the lightest possible acknowledgment, not nothing. An emoji-weight signal that says "I saw this and judged it not mine" turns an invisible failure into a correctable one.

Match the register to the cost of being wrong. Routine state can flow to a dashboard nobody is forced to read. Plans on a cadence. Mutations only with confirmation. Interruptions reserved for divergence and loss. The channel is the scarcest resource the team has; treat writing to it as spending.

One voice per thread. Whatever the infrastructure is doing — retries, timeouts, parallel workers — the team should experience a single coherent colleague. Two answers to one question is worse than one wrong answer, because it converts a mistake into distrust.

The human mirror

None of these rules are only about AI. They're a description of the colleague everyone wants: the one who speaks up the moment the experiment diverges from the plan, stays quiet while you think, answers the half-addressed question because they know it's theirs, and never makes you manage their need to be seen working. Linguists have formalized parts of this — say as much as the situation requires and no more — but mostly it's craft, learned by every good lab member through years of being gently shushed.

We spend enormous effort evaluating what agents say and almost none evaluating when. Yet in every deployment we've run, the when is what determined whether a team kept the agent. Capability got the agent invited into the channel. Turn-taking decided whether it got to stay.