Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Portfolio

Publications

Collaborative exploration and reinforcement learning between heterogeneously skilled agents in environments with sparse rewards

Published in International Joint Conference on Neural Networks, IJCNN, 2021

Abstract: A critical goal in Reinforcement Learning is the minimization of the time needed for an agent to learn to solve a given environment. In this context, collaborative reinforcement learning refers to the improvement of this learning process through the interaction between agents, which usually yields better results than training each agent in isolation. Most studies in this area have focused on the case of homogeneous agents, namely, agents equally skilled for undertaking their task. By contrast, heterogeneity among agents can arise from differences in how they sense the environment and/or in the actions they can perform. Those differences eventually hinder the learning process and the sharing of information between agents. This issue becomes even more complicated to address in hard exploration scenarios, where the extrinsic rewards collected from the environment are sparse. This work sheds light on the impact of leveraging collaborative learning strategies between heterogeneously skilled agents in hard exploration scenarios. Our study centers on how to share and exploit knowledge between the agents so as to mutually improve their learning procedures, further considering mechanisms to cope with sparse rewards. We assess the performance of these strategies via extensive simulations over modifications of the ViZDoom environment, which allow examining their benefits and drawbacks when dealing with agents endowed with different behavioral policies. Our results uncover the inherent problems of not considering the skill heterogeneity of the agents in the knowledge sharing strategy, and open up a manifold of research directions aimed at circumventing these issues.

Download here

An Evaluation Study of Intrinsic Motivation Techniques Applied to Reinforcement Learning over Hard Exploration Environments

Published in International Cross Domain Conference for Machine Learning and Knowledge Extraction, CD-MAKE, 2022

Abstract: In the last few years, the research activity around reinforcement learning tasks formulated over environments with sparse rewards has been especially notable. Among the numerous approaches proposed to deal with these hard exploration problems, intrinsic motivation mechanisms are arguably among the most studied alternatives to date. Advances reported in this area over time have tackled the exploration issue by proposing new algorithmic ideas for measuring novelty. However, most efforts in this direction have overlooked the influence of the different design choices and parameter settings that have also been introduced to improve the effect of the generated intrinsic bonus, neglecting the application of those choices to other intrinsic motivation techniques that may also benefit from them. Furthermore, some of those intrinsic methods are applied with different base reinforcement learning algorithms (e.g. PPO, IMPALA) and neural network architectures, making it hard to fairly compare the reported results and the actual progress contributed by each solution. The goal of this work is to stress this crucial matter in reinforcement learning over hard exploration environments, exposing the variability and susceptibility of avant-garde intrinsic motivation techniques to diverse design factors. Ultimately, our experiments underscore the importance of carefully selecting these design aspects, coupled with the exploration requirements of the environment and the task in question, under the same setup, so that fair comparisons can be guaranteed.

Download here
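As a rough illustration of the kind of mechanism this paper evaluates, the sketch below shows how an intrinsic bonus is commonly added to the sparse extrinsic reward, here using a toy count-based novelty term as a stand-in for the techniques actually compared; the class name, the beta coefficient and the state key are illustrative assumptions rather than details from the paper.

```python
from collections import defaultdict

class CountBasedBonus:
    """Toy count-based novelty bonus: rarely visited states yield larger
    intrinsic rewards. A stand-in for the curiosity/prediction-error bonuses
    compared in the paper, not the paper's own method."""

    def __init__(self, beta=0.1):
        self.beta = beta                # scaling of the intrinsic term (a design choice)
        self.counts = defaultdict(int)  # visitation counts per discretised state

    def __call__(self, state_key):
        self.counts[state_key] += 1
        return self.beta / self.counts[state_key] ** 0.5

# The weighting between both signals (and whether it decays over training)
# is exactly the kind of design choice whose influence the paper studies.
bonus = CountBasedBonus(beta=0.1)
extrinsic_reward = 0.0                                 # sparse: usually zero
total_reward = extrinsic_reward + bonus(state_key=(3, 7))
```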

Collaborative training of heterogeneous reinforcement learning agents in environments with sparse rewards: what and when to share?

Published in Neural Computing and Applications (S.I.: Human-aligned Reinforcement Learning for Autonomous Agents and Robots), 2022

Abstract: In the early stages of human life, babies develop their skills by exploring different scenarios motivated by their inherent satisfaction rather than by extrinsic rewards from the environment. This behavior, referred to as intrinsic motivation, has emerged as one solution to address the exploration challenge posed by reinforcement learning environments with sparse rewards. Diverse exploration approaches have been proposed to accelerate the learning process over single- and multi-agent problems with homogeneous agents. However, few studies have elaborated on collaborative learning frameworks between heterogeneous agents deployed into the same environment, but interacting with different instances of the latter without any prior knowledge. Beyond this heterogeneity, each agent’s characteristics grant access only to a subset of the full state space, which may hide different exploration strategies and optimal solutions. In this work we combine ideas from intrinsic motivation and transfer learning. Specifically, we focus on sharing parameters in actor-critic model architectures and on combining information obtained through intrinsic motivation, with the aim of achieving more efficient exploration and faster learning. We test our strategies through experiments over a modified version of ViZDoom’s My Way Home scenario, which is more challenging than the original and allows evaluating the heterogeneity between agents. Our results reveal different ways in which a collaborative framework with little additional computational cost can outperform an independent learning process without knowledge sharing. Additionally, we show the need to correctly modulate the balance between extrinsic and intrinsic rewards to avoid undesired agent behaviors.

Download here
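To make the parameter-sharing idea mentioned in the abstract more concrete, here is a minimal PyTorch sketch assuming two agents with a shared feature extractor and separate policy/value heads; all layer sizes, class and argument names are hypothetical and not taken from the paper.

```python
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """Minimal actor-critic with a torso shared across agents and per-agent
    policy/value heads, one possible way to share parameters between
    heterogeneous agents. Sizes and structure are illustrative only."""

    def __init__(self, obs_dim, n_actions, n_agents=2, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_heads = nn.ModuleList(
            nn.Linear(hidden, n_actions) for _ in range(n_agents))
        self.value_heads = nn.ModuleList(
            nn.Linear(hidden, 1) for _ in range(n_agents))

    def forward(self, obs, agent_id):
        h = self.shared(obs)
        return self.policy_heads[agent_id](h), self.value_heads[agent_id](h)

# The modulation issue raised in the abstract boils down to a coefficient
# balancing both reward terms, e.g. total_reward = r_ext + beta * r_int.
```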

Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation

Published in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, IEEE ADPRL, 2022

Abstract: Reinforcement Learning has emerged as a strong alternative to solve optimization tasks efficiently. The success of these algorithms highly depends on the feedback signals provided by the environment, which inform the agent about how good (or bad) its decisions are. Unfortunately, in a broad range of problems the design of a good reward function is not trivial, so in such cases sparse reward signals are adopted instead. The lack of a dense reward function poses new challenges, mostly related to exploration. Imitation Learning has addressed those problems by leveraging demonstrations from experts. In the absence of an expert (and its subsequent demonstrations), an option is to prioritize well-suited exploration experiences collected by the agent in order to bootstrap its learning process with good exploration behaviors. However, this solution highly depends on the ability of the agent to discover such trajectories in the early stages of its learning process. To tackle this issue, we propose to combine imitation learning with intrinsic motivation, two of the most widely adopted techniques to address problems with sparse rewards. In this work, intrinsic motivation is used to encourage the agent to explore the environment based on its curiosity, whereas imitation learning allows repeating the most promising experiences to accelerate the learning process. This combination is shown to yield improved performance and better generalization in procedurally-generated environments, outperforming previously reported self-imitation learning methods and achieving equal or better sample efficiency with respect to intrinsic motivation in isolation.

Download here
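For context, the generic self-imitation learning objective that this line of work builds on (Oh et al., 2018) can be written in a few lines; the sketch below is that generic objective, not necessarily the exact variant combined with intrinsic motivation in the paper.

```python
import torch

def sil_losses(log_probs, values, returns):
    """Generic self-imitation learning losses: only transitions whose stored
    return exceeds the current value estimate contribute, so the agent
    imitates its own better-than-expected past behaviour."""
    advantage = (returns - values).clamp(min=0.0)            # (R - V)+
    policy_loss = -(log_probs * advantage.detach()).mean()   # imitate good actions
    value_loss = 0.5 * (advantage ** 2).mean()               # pull V up towards R
    return policy_loss, value_loss
```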

Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments

Published in Adaptive and Learning Agents Workshop, AAMAS Conference, 2023

Abstract: One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data, in the form of trajectories, to improve sample efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training, and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered levels) of the available offline trajectories on the effectiveness of both approaches. Across four well-known sparse-reward tasks in the MiniGrid environment, we find that using IL both for pre-training and concurrently during online RL training consistently improves sample efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated.

Download here
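The pre-training setting described above essentially amounts to behavioral cloning on the offline trajectories before any online RL; a minimal sketch is given below, where the policy, optimizer and batch format are hypothetical placeholders rather than the paper's actual implementation.

```python
import torch.nn.functional as F

def pretrain_with_bc(policy, offline_batches, optimizer, epochs=10):
    """Behavioral-cloning pre-training: fit the policy to (observation, action)
    pairs from the offline trajectories before online RL starts. The same
    cross-entropy term can also be added to the RL loss for the concurrent
    setting studied in the paper."""
    for _ in range(epochs):
        for obs, actions in offline_batches:
            logits = policy(obs)         # assumed: policy maps observations to action logits
            loss = F.cross_entropy(logits, actions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```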

Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards

Published in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, IEEE ADPRL, 2023

Abstract: Exploration poses a fundamental challenge in Reinforcement Learning (RL) with sparse rewards, limiting an agent’s ability to learn optimal decision-making due to a lack of informative feedback signals. Self-Imitation Learning (self-IL) has emerged as a promising approach for exploration, leveraging a replay buffer to store and reproduce successful behaviors. However, traditional self-IL methods, which rely on high-return transitions and assume singleton environments, face challenges in generalization, especially in procedurally-generated (PCG) environments. Therefore, new self-IL methods have been proposed to rank which experiences to persist, but they replay transitions uniformly regardless of their significance, and do not address the diversity of the stored demonstrations. In this work, we propose tailored self-IL sampling strategies by prioritizing transitions in different ways and extending prioritization techniques to PCG environments. We also address diversity loss through modifications to counteract the impact of generalization requirements and bias introduced by prioritization techniques. Our experimental analysis, conducted over three PCG sparse reward environments, including MiniGrid and ProcGen, highlights the benefits of our proposed modifications, achieving a new state-of-the-art performance in the MiniGrid-MultiRoom-N12-S10 environment.

Download here
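As a minimal sketch of the prioritization step discussed in the abstract, sampling from the self-imitation buffer could look as follows; the priority signal, the exponent and the buffer layout are assumptions, and the paper's actual contributions (which priorities to use and how to preserve diversity) are not captured here.

```python
import numpy as np

def sample_prioritized(priorities, batch_size, alpha=0.6):
    """Draw transitions from a self-imitation buffer with probability
    proportional to priority**alpha instead of uniformly. The priorities
    could be, e.g., episode returns or TD errors."""
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    p /= p.sum()
    return np.random.choice(len(priorities), size=batch_size, p=p)

# Example: higher-priority transitions are replayed more often than with
# the uniform replay used by traditional self-IL methods.
indices = sample_prioritized([0.1, 0.9, 0.5, 0.9], batch_size=2)
```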

Supervision

Trajectory Planning for a Robotic Arm Using Reinforcement Learning

Published:

  • Original title: “Planificación de trayectorias de un brazo robótico mediante aprendizaje por refuerzo”
  • Student: Sergio Garcia Ferreira
  • Directors: Alain Andres (TECNALIA), Jon Azpiazu (TECNALIA), Eduardo Zamudio (VIU)
Abstract: Reinforcement Learning has brought about a transformation in robotics, thanks to its ability to develop efficient control techniques through autonomous learning. In particular, Reinforcement Learning has proven to be successful in tasks such as reaching objects with robotic arms. In this work, a solution is developed for training this task in simulated environments, and an experimental setup is established to compare the performance of various model-free algorithms. It is demonstrated that PPO achieves the best results, while SAC exhibits instability in environments with dense rewards. Furthermore, it is concluded that a sparse reward is sufficient to solve the task in environments with a precision of 5 cm.
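For illustration, a sparse reward of the kind mentioned in the abstract can be written as a simple indicator on the distance between the end effector and the target, using the 5 cm tolerance quoted above; the function name and signature are hypothetical.

```python
import numpy as np

def sparse_reach_reward(end_effector_pos, target_pos, tol=0.05):
    """Sparse reaching reward: 1 only when the end effector is within 5 cm
    (0.05 m) of the target, 0 otherwise. A dense alternative would instead
    return, e.g., the negative distance at every step."""
    distance = np.linalg.norm(np.asarray(end_effector_pos) - np.asarray(target_pos))
    return 1.0 if distance < tol else 0.0
```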

Anomaly Detection in Wind Turbines: Analysis of Data Imbalance and Temporal Effects

Published:

  • Original title: “Detección de anomalías en turbinas eólicas: Análisis del desequilibrio de los datos y los efectos de la temporalidad”
  • Student: Ane San Jose
  • Directors: Uxue Mori (UPV/EHU), Amaia Abanda (TECNALIA), Alain Andres (TECNALIA)
Abstract: In engineering and systems monitoring, an anomaly is defined as a rare event. Detecting these anomalies, which deviate significantly from expected or normal system behavior, is crucial for identifying issues in various contexts, including computer systems and industrial processes. This project focuses on detecting anomalies in the operation of wind turbines used for renewable energy generation. Environmental and operational factors can affect turbine performance, leading to occasional anomalies. Early detection is vital for ensuring efficiency and safety, enabling predictive maintenance, optimizing performance, and extending turbine lifespan.

TBD

Published:

  • Student: Cloe Atxurra Alves
  • Directors: Jon Azpiazu (TECNALIA), Alain Andres (TECNALIA), Aitziber Mancisidor Barinagarrementeria (UPV/EHU)
Abstract: TBD

A Reinforcement Learning-Based Approach for Vehicle Platoon Route Optimization in Last-Mile Delivery

Published:

  • Original title: “Enfoque basado en Aprendizaje por Refuerzo para la Optimización de Rutas de Vehículos en Pelotón en la Última Milla”
  • Student: Nagore Bravo Julián
  • Directors: Imanol Echeverria (TECNALIA), Alain Andres (TECNALIA), Roberto Santana (UPV/EHU)
Abstract: Classic challenges in combinatorial optimization, such as the Traveling Salesman Problem (TSP) and the Vehicle Routing Problem (VRP), have practical applications in planning, logistics, and transportation. Traditionally, these problems have been extensively studied using exact, heuristic, and meta-heuristic methods. However, the issue of generating high-quality solutions in real time persists, as these methods require starting from scratch each time a new problem needs to be solved. This is where the use of Reinforcement Learning (RL) emerges as a promising alternative for solving combinatorial optimization problems in real time. Time is a crucial factor in real-world routing problems, where conditions can change rapidly and solutions must adapt efficiently to these variations.

TBD

Published:

  • Student: Unai Ruiz González
  • Directors: Alain Andres (TECNALIA), Javier Del Ser (UPV/EHU)
Abstract: TBD

Talks

Horizontes Expandidos en IA: Más Allá del Aprendizaje Automático Tradicional

Published:

As part of the “Aprendizaje Automático Avanzado” (Advanced Machine Learning) course, I gave a talk aimed at showing students the challenges and opportunities we face at a technology centre, especially in the field of Artificial Intelligence (AI). During the talk, I addressed key topics such as the development of innovative algorithms, the implementation of advanced AI techniques, and the impact of these technologies on industry and society.

Teaching

Web Engineering

Undergraduate course, University of Deusto, 2024

TBD