Automated Failure Diagnosis for Multi-Agent Systems: A Step-by-Step Guide
Introduction
Multi-agent systems powered by large language models (LLMs) are increasingly used to tackle complex collaborative tasks, but failures remain common and notoriously difficult to debug. When your multi-agent system malfunctions, you're left sifting through endless interaction logs to determine which agent caused the failure and at what point—a process researchers have dubbed “automated failure attribution.” A team from Penn State, Duke, Google DeepMind, and other institutions developed a benchmark dataset (Who&When) and several automated attribution methods, accepted as a spotlight at ICML 2025. This guide walks you through applying those methods to your own multi-agent system, helping you pinpoint failures quickly and move from manual log archaeology to efficient, data-driven diagnosis.

What You Need
- Python 3.8+ environment
- Access to multi-agent system logs – interaction records in JSON format (example format provided with the dataset)
- Git and Hugging Face account (to download the Who&When dataset)
- PyTorch and Transformers libraries
- Basic understanding of LLM multi-agent architectures (e.g., agent roles, message passing)
Step-by-Step Guide
Step 1: Download the Who&When Dataset and Code
Begin by obtaining the benchmark dataset and open-source code from the official repositories. This ensures you have the right reference data and attribution tools.
- Clone the GitHub repository:
git clone https://github.com/mingyin1/Agents_Failure_Attribution
- Install required dependencies:
pip install -r requirements.txt
- Download the dataset from Hugging Face:
huggingface-cli download Kevin355/Who_and_When --local-dir ./data
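Once the download finishes, a quick sanity check confirms the logs are readable before you go further. A minimal sketch, assuming the dataset unpacks as JSON files somewhere under ./data (the exact file layout may differ):

```python
import json
from pathlib import Path

def load_logs(folder):
    """Parse every .json file under `folder` into a list of log dicts."""
    logs = []
    for path in sorted(Path(folder).rglob("*.json")):
        with open(path, encoding="utf-8") as f:
            logs.append(json.load(f))
    return logs

# Example (assumed layout): logs = load_logs("./data")
```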
Step 2: Understand the Benchmark Structure
The Who&When dataset contains multi-agent interaction logs annotated with ground-truth failure points: which agent was responsible and at which turn the failure originated. Study the dataset structure to map your own logs appropriately.
- Each log entry includes: task description, agent identities, sequence of agent messages, and a binary failure label with the responsible agent and turn index.
- Familiarize yourself with the three baseline attribution methods provided: DirectPrompt (ask an LLM to explain the failure), TraceEval (evaluate each agent’s contribution), and the proposed AttributionLM (a model fine-tuned for this task).
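As an illustration, a single annotated entry might look like the dict below. The key names here are assumptions based on the fields listed above, not the dataset's exact schema; check a real file from ./data before relying on them.

```python
# Hypothetical annotated log entry (key names are illustrative assumptions).
entry = {
    "task": "Plan a three-day trip itinerary",
    "agents": ["planner", "researcher", "critic"],
    "turns": [
        {"agent": "planner", "message": "Day 1: museum visit."},
        {"agent": "researcher", "message": "The museum is closed on Mondays."},
        {"agent": "planner", "message": "Ignoring that; keeping Day 1 as is."},
    ],
    "failure": True,
    "responsible_agent": "planner",
    "failure_turn": 2,  # 0-based index into `turns`
}

def ground_truth(entry):
    """Return the annotated (agent, turn) failure point, or None if the run succeeded."""
    if not entry["failure"]:
        return None
    return entry["responsible_agent"], entry["failure_turn"]
```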
Step 3: Prepare Your Own Multi-Agent Logs
To diagnose failures in your system, you need to format its logs in the same schema as the benchmark. This ensures the attribution methods can process them.
- Export your system’s interaction data: each turn should capture the agent that spoke, the message content, and a timestamp or turn number.
- For each task execution, create a JSON object with keys: task, agents (list of agent names), turns (list of turn objects with agent and message), and failure (boolean; set to false for unlabeled data).
- Save your logs in a folder (e.g., ./my_logs/).
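Converting your own records into that schema usually takes only a few lines. In this sketch, raw_turns is a stand-in for whatever your system exports; the (speaker, text) tuple layout is an assumption about your internal format:

```python
def to_benchmark_schema(task, raw_turns):
    """Map raw (speaker, text) records onto the benchmark's JSON layout."""
    return {
        "task": task,
        "agents": sorted({speaker for speaker, _ in raw_turns}),
        "turns": [{"agent": s, "message": m} for s, m in raw_turns],
        "failure": False,  # unlabeled data: no ground-truth annotation yet
    }

record = to_benchmark_schema(
    "Summarize a report",
    [("reader", "Here is the text."), ("writer", "Draft summary.")],
)
```

Serialize each such record with json.dump into its own file under ./my_logs/.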
Step 4: Run Automated Failure Attribution
Use the provided scripts to apply the attribution method of your choice to your logs. The code includes a command-line interface for each method.

- Choose a method: DirectPrompt is lightweight but less accurate; AttributionLM offers the best performance if you can run inference on a GPU.
- Run the attribution script:
python run_attribution.py --method AttributionLM --logs ./my_logs --output ./results
- The script returns a JSON file per log with responsible_agent and failure_turn predictions.
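With predictions in hand, it helps to aggregate across runs to see which agent fails most often. A sketch that tallies blame counts, assuming each result file yields a dict with responsible_agent and failure_turn as described above:

```python
from collections import Counter

def summarize(predictions):
    """Rank agents by how often they are blamed across a batch of result dicts."""
    blame = Counter(p["responsible_agent"] for p in predictions)
    return blame.most_common()

preds = [
    {"responsible_agent": "planner", "failure_turn": 2},
    {"responsible_agent": "planner", "failure_turn": 5},
    {"responsible_agent": "critic", "failure_turn": 1},
]
# summarize(preds) ranks agents by blame count, most-blamed first
```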
Step 5: Interpret the Results and Debug
Now you have a clear hypothesis: which agent likely caused the failure and at what step. Use this to focus your debugging efforts.
- Check the predicted failure turn in your original logs: examine the messages exchanged around that point.
- Verify whether the identified agent made an incorrect reasoning step or misunderstood another agent’s output.
- If the prediction seems off, cross-reference with the attribution confidence score (output by the model). Consider re-running with a different method (e.g., TraceEval) to triangulate.
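To inspect the messages exchanged around the predicted failure point, a small helper that slices the surrounding turns keeps the review focused (this assumes 0-based turn indices):

```python
def context_window(turns, failure_turn, radius=2):
    """Return the turns within `radius` of the predicted failure turn."""
    lo = max(0, failure_turn - radius)
    hi = min(len(turns), failure_turn + radius + 1)
    return turns[lo:hi]

turns = [{"agent": f"a{i}", "message": f"m{i}"} for i in range(10)]
window = context_window(turns, failure_turn=4, radius=1)  # turns 3..5
```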
Step 6: Improve Your System Based on Findings
The ultimate goal is to fix the failure and prevent recurrence. The attribution gives you a starting point for system iteration.
- Adjust the problematic agent’s prompt, role description, or information-processing pipeline.
- Add guardrails: require agents to confirm intermediate results before proceeding.
- Re-run your multi-agent system on the same task and verify the failure no longer occurs.
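One way to implement the confirmation guardrail is a wrapper that re-asks an agent when a validator rejects its output. This is a minimal sketch, not part of the released code; agent_fn and validator are placeholders for your own agent call and acceptance check:

```python
def with_confirmation(agent_fn, validator, max_retries=1):
    """Wrap an agent call so rejected outputs trigger a retry with feedback."""
    def wrapped(prompt):
        result = agent_fn(prompt)
        for _ in range(max_retries):
            if validator(result):
                break
            # Feed the rejection back so the agent can revise its answer.
            result = agent_fn(prompt + "\nYour previous answer was rejected; please revise.")
        return result
    return wrapped
```

In practice the validator might check that a cited fact appears in the shared context, or that a plan step references a real tool, before the pipeline proceeds.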
Tips for Success
- Start with the benchmark logs to test your pipeline before analyzing your own data.
- Use the AttributionLM model if you have access to a GPU; it significantly outperforms prompt-based methods.
- Log everything: the more context you capture per turn (e.g., internal reasoning), the better the attribution will work.
- Combine with manual review for critical systems; the model provides a hypothesis, not a guarantee.
- Contribute back: if you discover new failure patterns, consider augmenting the Who&When dataset to help the community.