Learning Autonomous Driving Tasks via Human Feedback with Large Language Models

Findings of the Association for Computational Linguistics: EMNLP 2024

Abstract

Traditional autonomous driving systems have mainly focused on making driving decisions without human interaction, overlooking the human-like decision-making and human preferences required in complex traffic scenarios. To bridge this gap, we introduce a novel framework that leverages Large Language Models (LLMs) to learn human-centered driving decisions from diverse simulated scenarios and environments that incorporate human feedback. Our contributions include a GPT-4-based programming planner that integrates seamlessly with the existing CARLA simulator to understand traffic scenes and react to human instructions. Specifically, we build a human-guided learning pipeline that incorporates human driver feedback directly into the learning process and stores optimal driving policy programs using Retrieval-Augmented Generation (RAG). Impressively, our programming planner, with only 50 saved code snippets, matches the performance of extensively trained reinforcement learning (RL) baselines. Our paper highlights the potential of an LLM-powered shared-autonomy system, pushing the frontier of autonomous driving development toward systems that are more interactive and intuitive.

Method

An overview of the proposed programming planner. The Large Language Model (LLM) first receives a prompt composed of a human instruction, the driving context, and a system message. It then performs chain-of-thought reasoning to generate language model programs that serve as driving policies. This policy code is executed in the CARLA simulator to complete the driving task specified in the instruction. Verified code is then added to the knowledge database for future retrieval via Retrieval-Augmented Generation (RAG).
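
For illustration, below is a minimal Python sketch of the loop the figure describes. All names here (query_llm, run_in_carla, PolicyStore) are hypothetical stand-ins, not the authors' released code; the GPT-4 call and CARLA execution are stubbed out.

# Hypothetical sketch of the programming-planner loop; names are illustrative,
# and the GPT-4 call and CARLA execution are replaced with stubs.
from dataclasses import dataclass, field

SYSTEM_MESSAGE = "You write executable driving-policy code for the CARLA simulator."

@dataclass
class PolicyStore:
    """Toy stand-in for the RAG knowledge database of verified policy snippets."""
    snippets: list = field(default_factory=list)

    def retrieve(self, instruction, k=3):
        # A real system would embed the query and run a vector search;
        # naive word overlap keeps this sketch self-contained.
        words = set(instruction.lower().split())
        ranked = sorted(self.snippets,
                        key=lambda s: -len(words & set(s.lower().split())))
        return ranked[:k]

def query_llm(prompt):
    # Placeholder for a GPT-4 call that reasons step by step (chain of thought)
    # and emits a policy program as code.
    return "ego.stop_at_next_traffic_light()"

def run_in_carla(policy_code):
    # Placeholder: execute the generated program in CARLA and report whether
    # the instructed task was completed without infractions.
    return True

def plan_step(instruction, driving_context, store):
    examples = store.retrieve(instruction)            # RAG: fetch similar verified policies
    prompt = "\n\n".join([SYSTEM_MESSAGE, driving_context, *examples, instruction])
    policy_code = query_llm(prompt)                   # chain-of-thought -> policy program
    if run_in_carla(policy_code):                     # verified in simulation...
        store.snippets.append(policy_code)            # ...then saved for future retrieval
    return policy_code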

Results

Driving performance of expert drivers on CARLA leaderboard testing routes. Our planner, learned from human feedback, performs on par with the RL baselines. The driving score (DS) is computed by multiplying the route completion percentage (RC) by the infraction penalty (IP). The "Navigation" column specifies the format of the navigation input: "Waypoint" denotes a predetermined sequence of GPS-style coordinates with enumerated route instructions, while "Human Instruction" denotes natural-language navigation commands such as "turn right at the next light", a more challenging setting.
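
To make the metric concrete, the sketch below computes DS from RC and IP exactly as defined above; the function name is illustrative, not from the CARLA codebase.

def driving_score(route_completion, infraction_penalty):
    """CARLA-style driving score: DS = RC * IP.

    route_completion: fraction of the route completed, in [0, 1].
    infraction_penalty: multiplicative penalty in [0, 1] (1.0 = no infractions).
    """
    return route_completion * infraction_penalty

# Example: 80% route completion with a 0.9 infraction penalty yields DS = 0.72.
assert abs(driving_score(0.80, 0.90) - 0.72) < 1e-9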

Video Demo

BibTeX


@inproceedings{ma_learning_2024,
  title = {Learning {Autonomous} {Driving} {Tasks} via {Human} {Feedback} with {Large} {Language} {Models}},
  booktitle = {{EMNLP} {Findings}},
  author = {Ma, Yunsheng and Cao, Xu and Ye, Wenqian and Cui, Can and Mei, Kai and Wang, Ziran},
  year = {2024},
}