DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models

Yongdong Wang1,*, Runze Xiao1, Jun Younes Louhi Kasahara1, Ryosuke Yajima1, Keiji Nagatani1, Atsushi Yamashita2, Hajime Asama3
1Graduate School of Engineering, The University of Tokyo
2Graduate School of Frontier Sciences, The University of Tokyo
3Tokyo College, The University of Tokyo
*Corresponding author: wangyongdong@robot.t.u-tokyo.ac.jp
This project is part of the Moonshot Café Project

Abstract

Large Language Models (LLMs) have demonstrated significant reasoning capabilities in robotic systems. However, their deployment in multi-robot systems remains fragmented, and existing approaches struggle to handle complex task dependencies and parallel execution. This study introduces the DART-LLM (Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models) system, designed to address these challenges. DART-LLM uses LLMs to parse natural language instructions and decompose them into multiple subtasks with explicit dependencies, establishing complex task sequences and thereby enabling efficient coordination and parallel execution in multi-robot systems. The system comprises a QA LLM module, Breakdown Function modules, an Actuation module, and a Vision-Language Model (VLM)-based object detection module, together covering the full pipeline from natural language instructions to robotic actions. Experimental results demonstrate that DART-LLM excels at long-horizon tasks and collaborative tasks with complex dependencies. Even with smaller models such as Llama 3.1 8B, the system maintains strong performance, highlighting DART-LLM's robustness with respect to model size.
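The dependency-aware scheduling idea in the abstract can be sketched as a small topological-level scheduler: subtasks whose prerequisites are all complete form one "level" and may run in parallel across robots. This is a minimal illustration, not the DART-LLM implementation; the task names below are hypothetical.

```python
def execution_levels(subtasks):
    """Group subtasks into levels that can execute in parallel.

    subtasks: dict mapping a task id -> set of prerequisite task ids.
    Returns a list of levels; all tasks within one level have every
    dependency satisfied by earlier levels, so they can run concurrently.
    """
    remaining = {t: set(deps) for t, deps in subtasks.items()}
    done = set()
    levels = []
    while remaining:
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic dependency among subtasks")
        levels.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return levels


# Hypothetical decomposition of "Clear the obstacle, then dig soil":
tasks = {
    "navigate_to_obstacle": set(),
    "clear_obstacle": {"navigate_to_obstacle"},
    "navigate_to_soil": {"clear_obstacle"},
    "excavate_soil": {"navigate_to_soil"},
}
print(execution_levels(tasks))
# → [['navigate_to_obstacle'], ['clear_obstacle'],
#    ['navigate_to_soil'], ['excavate_soil']]
```

Independent subtasks (e.g., two robots inspecting different regions) would land in the same level and be dispatched simultaneously.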

Results

"L1-T1: Inspect a puddle"

"L1-T2: Clear an obstacle"

"L2-T1: Excavate soil"

"L2-T2: Transport soil to the dump truck's initial position"

"L3-T1: Clear the obstacle, then dig soil"

"L3-T2: Clear the obstacle, then inspect the puddle"

Real Robot Experiments

"L1-T1: Inspect a puddle (Real Robot) - Top view"

"L1-T1: Inspect a puddle (Real Robot) - Camera view"

"L2-T1: Excavate soil (Real Robot) - Top view"

"L2-T1: Excavate soil (Real Robot) - Camera view"