A Reinforcement Learning-Based Approach for Vehicle Platoon Route Optimization in Last-Mile Delivery

Nagore Bravo Julián, Alain Andres, Imanol Echeverria, Roberto Santana

Abstract: Classic combinatorial optimization problems, such as the Traveling Salesman Problem (TSP) and the Vehicle Routing Problem (VRP), have practical applications in planning, logistics, and transportation. These problems have traditionally been studied extensively using exact, heuristic, and meta-heuristic methods. However, generating high-quality solutions in real time remains difficult, because these methods must start from scratch each time a new problem instance is solved. Reinforcement Learning (RL) therefore emerges as a promising alternative for solving combinatorial optimization problems in real time. Time is a crucial factor in real-world routing problems, where conditions can change rapidly and solutions must adapt efficiently to these variations.

Last-mile logistics is undergoing a profound transformation driven by the advent of autonomous vehicles with platooning capabilities, which can replace conventional delivery methods. This shift introduces new challenges and multiple objectives into vehicle routing problems.

Unlike traditional meta-heuristic optimization algorithms, Deep Reinforcement Learning (DRL) approaches provide flexibility in the face of environmental changes, shifting the challenge from how the problem is formulated alone to how the policy is generated as well. While RL solutions have been applied successfully to vehicle routing problems, this work contributes to addressing this challenge by generating a policy that allows platoons to be adjusted. Specifically, we address the problem of last-mile delivery in real-world scenarios.