Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities in robotics; however, their deployment in multi-robot systems remains limited, particularly in the structured management of task dependencies. This study introduces DART-LLM, which employs Directed Acyclic Graphs (DAGs) to model task dependencies, enabling the effective decomposition of natural language instructions into interdependent subtasks for coordinated multi-robot execution. The DART-LLM framework integrates a Question-Answering (QA) LLM module for instruction parsing and dependency-aware task decomposition, a Breakdown Function module for task parsing and robot assignment, an Actuation module for task execution, and a Vision-Language Model (VLM)-based object detector module for environmental perception, achieving end-to-end task execution. Experimental results across three task complexity levels demonstrate that DART-LLM achieves state-of-the-art performance, significantly outperforming the baseline on all evaluation metrics. Among the tested models, DeepSeek-r1 achieves the highest success rate, while Llama3.1 demonstrates superior reliability in response time. Ablation studies further confirm that explicit dependency modeling improves performance across all models, particularly enhancing the reasoning capabilities of smaller models.
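The core idea of the abstract can be illustrated with a minimal sketch: once an instruction is decomposed into subtasks whose dependencies form a DAG, a topological sort yields an execution order in which every subtask runs only after its prerequisites. The data structures and task names below are hypothetical illustrations, not DART-LLM's actual schema or API.

```python
# Sketch of dependency-aware dispatch over a subtask DAG (hypothetical structures;
# DART-LLM's real decomposition format is not reproduced here).
from collections import deque

def topological_order(tasks):
    """Return an execution order respecting dependencies.

    `tasks` maps a subtask id to the list of subtask ids it depends on.
    Raises ValueError if the graph contains a cycle (i.e., is not a DAG).
    """
    indegree = {t: len(deps) for t, deps in tasks.items()}
    dependents = {t: [] for t in tasks}
    for task, deps in tasks.items():
        for dep in deps:
            dependents[dep].append(task)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in dependents[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle: subtask graph is not a DAG")
    return order

# Example decomposition of an instruction like "Clear the obstacle, then dig soil":
plan = {
    "clear_obstacle": [],
    "excavate_soil": ["clear_obstacle"],
    "transport_soil": ["excavate_soil"],
}
print(topological_order(plan))
# → ['clear_obstacle', 'excavate_soil', 'transport_soil']
```

Subtasks with zero in-degree at any point (here, only one at a time) could be dispatched to different robots in parallel, which is what makes the explicit DAG representation useful for multi-robot coordination.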
Results
"L1-T1-001: Inspect a puddle"
"L1-T2-001: Clear an obstacle"
"L2-T1-001: Excavate soil"
"L2-T2-001: Transport soil to the dump truck's initial position"
"L3-T1-001: Clear the obstacle, then dig soil"
"L3-T2-001: Clear the obstacle, then inspect the puddle"
Real Robot Experiments
"L1-T1-001: Inspect a puddle (Real Robot) - Top view"
"L1-T1-001: Inspect a puddle (Real Robot) - Camera view"
"L2-T1-001: Excavate soil (Real Robot) - Top view"
"L2-T1-001: Excavate soil (Real Robot) - Camera view"