LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma*, Can Cui*, Xu Cao*, Wenqian Ye, Peiran Liu, Juanwu Lu, Amr Abdelraouf, Rohit Gupta,
Kyungtae Han, Aniket Bera, James M. Rehg and Ziran Wang
A novel framework for planning in the field of autonomous driving, rethinking the task as a
code-generation process that leverages established behavioral primitives.
Main Contributions:
1. We introduce the first benchmark for evaluating LLM-based agents in autonomous driving
contexts.
2. We propose a framework including an open simulator for executing policies, equipped with
scoring mechanisms.
3. Our LaMPilot provides LLM agents with a set of APIs that cover essential driving functions,
where safety criteria are
loaded into the APIs.
4. Our LaMPilot enables the translation of natural language instructions into actionable
strategies through code generation.
LaMPilot Framework Architecture
LaMPilot framework architecture. This diagram shows the workflows of the LaMPilot framework, illustrating the process from initialization to evaluation. Once initialized, a structured language generator interprets the simulator environment to establish the language context. The central LLM handles input prompts, utilizing predefined APIs and generating LMP. The LMP is then executed in the simulator. The executed policy is assessed by the evaluator, which provides a final score for the policy.Autonomous driving function modules and message flow
The dataset is divided into three sets: training (3,900 samples), validation (500 samples), and testing (500 samples). The dataset contains a variety of instructions that reflect the diversity and unpredictability of real-world requirements. These instructions are categorized by maneuver types, such as turning, lane changing, and overtaking, as well as scenario types, including both highway and intersection settings. The right figure shows a distribution of the first four words in the instructions, highlighting the diversity of instructions in our dataset.