Multi-Robot Task Planning for Multi-Object Retrieval Tasks with Distributed On-Site Knowledge via Large Language Models

Kento Murata1 · Shoichi Hasegawa1 · Tomochika Ishikawa1 · Yoshinobu Hagiwara3,4 · Akira Taniguchi2 · Lotfi El Hafi4 · Tadahiro Taniguchi4,5

Abstract

We study cooperative object search by multiple robots that receive natural-language instructions with multi-object or context-dependent goals (e.g., “find an apple and a banana”). Our framework integrates a large language model (LLM) with a spatial concept model that provides room names and room-wise object presence probabilities learned in each robot’s assigned area. With a tailored prompting strategy, the LLM infers the required items from ambiguous commands, decomposes them into subtasks, and allocates each subtask to the robot most likely to succeed given its local knowledge. In experiments, the method achieved 47/50 successful allocations, outperforming random allocation (28/50) and commonsense-only allocation (26/50), and was validated qualitatively on real mobile manipulators.

Overview

Overview: natural-language instructions → task decomposition → knowledge-aware allocation → execution, using each robot’s on-site spatial knowledge.

Each robot maintains on-site knowledge that links places (e.g., kitchen, bedroom) with object–room presence probabilities learned in its assigned area. Given a user instruction, an LLM performs task decomposition and assigns each subtask to the robot whose local knowledge gives it the highest expected chance of success. Subtasks are executed as a skill sequence (navigation → object_detection → pick → place) with feedback and replanning.
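A minimal sketch of the allocation step, assuming each robot’s on-site knowledge is summarized as room-wise object presence probabilities; the function name, data layout, and probability values are illustrative, not the released implementation:

# Minimal sketch (illustrative, not the released code): knowledge-aware
# subtask allocation. Each robot's on-site knowledge is approximated as a
# room -> {object: presence probability} table.

def allocate_subtasks(required_objects, robot_knowledge):
    """Assign each requested object to the robot most likely to find it."""
    allocation = {}
    for obj in required_objects:
        best_robot, best_prob = None, 0.0
        for robot, rooms in robot_knowledge.items():
            # Best case for this robot: its most promising room.
            prob = max(p.get(obj, 0.0) for p in rooms.values())
            if prob > best_prob:
                best_robot, best_prob = robot, prob
        allocation[obj] = best_robot
    return allocation

# Example with two robots covering different areas (made-up values).
knowledge = {
    "robot_1": {"kitchen": {"apple": 0.8, "banana": 0.7}, "living_room": {"banana": 0.2}},
    "robot_2": {"bedroom": {"book": 0.6}, "office": {"apple": 0.1}},
}
print(allocate_subtasks(["apple", "banana"], knowledge))
# -> {'apple': 'robot_1', 'banana': 'robot_1'}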

Method

Each robot learns on-site knowledge via a spatial concept model that links places (e.g., kitchen, bedroom) to object occurrence probabilities. The pipeline has four stages: (1) task decomposition from language, (2) knowledge-aware subtask allocation, (3) sequential action planning (navigation → object_detection → pick → place), and (4) execution with feedback loops implemented in FlexBE.
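The execution stages (3)–(4) can be pictured as below: a minimal Python sketch of the fixed skill sequence with a retry-and-report feedback loop. The skill function, retry policy, and subtask fields are placeholders, not the FlexBE behaviors used on the robots.

# Minimal sketch of the skill sequence with a feedback loop: retry a failed
# skill, and if retries are exhausted report back so the LLM can replan.

SKILLS = ["navigation", "object_detection", "pick", "place"]

def run_skill(skill, subtask):
    # Placeholder: on the real robots each skill is a FlexBE state that
    # reports success or failure.
    print(f"[{subtask['robot']}] {skill}: {subtask['object']} -> {subtask['room']}")
    return True

def execute_subtask(subtask, max_retries=2):
    for skill in SKILLS:
        for _attempt in range(max_retries + 1):
            if run_skill(skill, subtask):
                break
        else:
            # All retries failed: hand the failure back for replanning.
            return {"status": "failed", "skill": skill}
    return {"status": "done"}

print(execute_subtask({"robot": "robot_1", "object": "apple", "room": "kitchen"}))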

Four-stage pipeline: knowledge acquisition → decomposition & allocation → action planning → execution with feedback.

Experiments

Evaluation environment – first floor (5 rooms, 12 objects).
Evaluation environment – second floor (5 rooms, 12 objects).

We evaluate allocation accuracy across instruction types (random, hard-to-predict, commonsense, mixed). The proposed method reaches 47/50 correct allocations, versus 28/50 for random allocation and 26/50 for commonsense-only allocation.

Allocation success counts across instruction types (random, hard-to-predict, commonsense, mixed).

Supplement (Beyond the Paper)

Resources

Code availability: not publicly released at this time.

Demo Video

Demonstration of our multi-robot task planning framework.

BibTeX

@article{Murata2025MultiRobotTaskPlanning,
  title   = {Multi-Robot Task Planning for Multi-Object Retrieval Tasks with Distributed On-Site Knowledge via Large Language Models},
  author  = {Murata, Kento and Hasegawa, Shoichi and Ishikawa, Tomochika and Hagiwara, Yoshinobu and Taniguchi, Akira and El Hafi, Lotfi and Taniguchi, Tadahiro},
  journal = {arXiv preprint arXiv:2509.12838},
  year    = {2025},
  note    = {Project page: https://kentomurata0610.github.io/multi-robot-task-planning/}
}

Please cite the arXiv version until the conference review is complete.

Acknowledgments

Partially supported by JST Moonshot (JPMJMS2011), JSPS KAKENHI (JP25K15292, JP23K16975), and JST Challenging Research Program for Next-Generation Researchers (JPMJSP2101).