Posts by Collection

publications

Collaborative exploration and reinforcement learning between heterogeneously skilled agents in environments with sparse rewards

Authors: Alain Andres*, Esther Villar-Rodriguez, Javier Del Ser

Published in International Joint Conference on Neural Networks, IJCNN, 2021

Abstract: A critical goal in Reinforcement Learning is the minimization of the time needed for an agent to learn to solve a given environment. In this context, collaborative reinforcement learning refers to the improvement of this learning process through the interaction between agents, which usually yields better results than training each agent in isolation. Most studies in this area have focused on the case with homogeneous agents, namely, agents equally skilled for undertaking their task. By contrast, heterogeneity among agents could arise due to the particular capabilities on how they sense the environment and/or the actions they could perform. Those differences eventually hinder the learning process and information sharing between agents. This issue becomes even more complicated to address over hard exploration scenarios where the extrinsic rewards collected from the environment are sparse. Read more

An Evaluation Study of Intrinsic Motivation Techniques Applied to Reinforcement Learning over Hard Exploration Environments

Authors: Alain Andres*, Esther Villar-Rodriguez, Javier Del Ser

Published in International Cross Domain Conference for Machine Learning and Knowledge Extraction, CD-MAKE, 2022

Abstract: In the last few years, the research activity around reinforcement learning tasks formulated over environments with sparse rewards has been especially notable. Among the numerous approaches proposed to deal with these hard exploration problems, intrinsic motivation mechanisms are arguably among the most studied alternatives to date. Read more

Collaborative training of heterogeneous reinforcement learning agents in environments with sparse rewards: what and when to share?

Authors: Alain Andres*, Esther Villar-Rodriguez, Javier Del Ser

Published in Neural Computing and Applications (S.I.: Human-aligned Reinforcement Learning for Autonomous Agents and Robots), 2022

Abstract: In the early stages of human life, babies develop their skills by exploring different scenarios motivated by their inherent satisfaction rather than by extrinsic rewards from the environment. This behavior, referred to as intrinsic motivation, has emerged as one solution to address the exploration challenge derived from reinforcement learning environments with sparse rewards. Read more

Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation

Authors: Alain Andres*, Esther Villar-Rodriguez, Javier Del Ser

Published in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, IEEE ADPRL, 2022

Abstract: Reinforcement Learning has emerged as a strong alternative to solve optimization tasks efficiently. The use of these algorithms highly depends on the feedback signals provided by the environment in charge of informing about how good (or bad) the decisions made by the learned agent are. Unfortunately, in a broad range of problems the design of a good reward function is not trivial, so in such cases sparse reward signals are instead adopted. The lack of a dense reward function poses new challenges, mostly related to exploration. Imitation Learning has addressed those problems by leveraging demonstrations from experts. In the absence of an expert (and its subsequent demonstrations), an option is to prioritize well-suited exploration experiences collected by the agent in order to bootstrap its learning process with good exploration behaviors. However, this solution highly depends on the ability of the agent to discover such trajectories in the early stages of its learning process. Read more

Evolutionary Multi-Objective Quantization of Randomization-Based Neural Networks

Authors: Javier Del Ser*, Alain Andres, Miren Nekane Bilbao, Ibai Laña, Jesus L Lobo

Published in IEEE Symposium Series on Computational Intelligence (SSCI), 2023

Abstract: The deployment of Machine Learning models on hardware devices has motivated a notable research activity around different strategies to alleviate their complexity and size. This is the case of neural architecture search or pruning in Deep Learning. This work places its focus on simplifying randomization-based neural networks by discovering fixed-point quantization policies that optimally balance the trade-off between performance and complexity reduction featured by these models. Read more

Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards

Authors: Alain Andres*, Daochen Zha, Javier Del Ser

Published in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, IEEE ADPRL, 2023

Abstract: Exploration poses a fundamental challenge in Reinforcement Learning (RL) with sparse rewards, limiting an agent’s ability to learn optimal decision-making due to a lack of informative feedback signals. Self-Imitation Learning (self-IL) has emerged as a promising approach for exploration, leveraging a replay buffer to store and reproduce successful behaviors. However, traditional self-IL methods, which rely on high-return transitions and assume singleton environments, face challenges in generalization, especially in procedurally-generated (PCG) environments. Therefore, new self-IL methods have been proposed to rank which experiences to persist, but they replay transitions uniformly regardless of their significance, and do not address the diversity of the stored demonstrations. Read more

Exploring Data Augmentation and Active Learning Benefits in Imbalanced Datasets

Authors: Luis Moles*, Alain Andres, Goretti Echegaray, Fernando Boto

Published in Mathematics, 2024

Abstract: Despite the increasing availability of vast amounts of data, the challenge of acquiring labeled data persists. This issue is particularly serious in supervised learning scenarios, where labeled data are essential for model training. In addition, the rapid growth in data required by cutting-edge technologies such as deep learning makes the task of labeling large datasets impractical. Active learning methods offer a powerful solution by iteratively selecting the most informative unlabeled instances, thereby reducing the amount of labeled data required. However, active learning faces some limitations with imbalanced datasets, where majority class over-representation can bias sample selection. To address this, combining active learning with data augmentation techniques emerges as a promising strategy. Nonetheless, the best way to combine these techniques is not yet clear. Read more

Advancing towards Safe Reinforcement Learning over Sparse Environments with Out-of-Distribution Observations: Detection and Adaptation Strategies

Authors: Aitor Martinez-Seras*, Alain Andres, Javier Del Ser

Published in International Joint Conference on Neural Networks, IJCNN, 2024

Abstract: Safety in AI-based systems is among the highest research priorities, particularly when such systems are deployed in real-world scenarios subject to uncertainties and unpredictable inputs. Among them, the presence of long-tailed stimuli (Out-of-Distribution data, OoD) has captured much interest in recent times, giving rise to many proposals over the years to detect unfamiliar inputs to the model and adapt its knowledge accordingly. Read more

Single Agent Formulation for Reinforcement Learning Based Routing of Urban Last Mile Logistics with Platooning Vehicles

Authors: Nagore Bravo, Imanol Echeverria, Alain Andres, Ibai Laña*

Published in IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), 2024

Abstract: Last mile logistics are in the midst of a deep transformation thanks to the advent of autonomous vehicles with platooning capabilities that can take the place of typical delivery methods. Platooning brings new constraints and multiple objectives to vehicle routing problems, which are addressed in this paper with a Reinforcement Learning approach. In contrast to traditional metaheuristic optimization algorithms, Reinforcement Learning provides flexibility in the face of changing environments, shifting the challenge to the way in which the problem is formulated. While there have been successful attempts to implement RL solutions to vehicle routing problems, including some sort of optional platooning, our main contribution lies in the application to platooning vehicle routing problems for last mile delivery, considering all their particularities and proposing a formulation framework for this kind of problem. Read more

Fostering Intrinsic Motivation in Reinforcement Learning with Pretrained Foundation Models

Authors: Alain Andres*, Javier Del Ser

Published in Intrinsically Motivated Open-ended Learning workshop at NeurIPS, 2024

Abstract: Exploration remains a significant challenge in reinforcement learning, especially in environments where extrinsic rewards are sparse or non-existent. The recent rise of foundation models, such as CLIP, offers an opportunity to leverage pretrained, semantically rich embeddings that encapsulate broad and reusable knowledge. In this work we explore the potential of these foundation models not just to drive exploration, but also to analyze the critical role of the episodic novelty term in enhancing exploration effectiveness of the agent. We also investigate whether providing the intrinsic module with complete state information – rather than just partial observations – can improve exploration, despite the difficulties in handling small variations within large state spaces. Read more

Words as Beacons: Guiding RL Agents with High-Level Language Prompts

Authors: Unai Ruiz-Gonzalez*, Alain Andres, Pedro G Bascoy, Javier Del Ser

Published in Open-World Agents workshop at NeurIPS, 2024

Abstract: Sparse reward environments in reinforcement learning (RL) pose significant challenges for exploration, often leading to inefficient or incomplete learning processes. To tackle this issue, this work proposes a teacher-student RL framework that leverages Large Language Models (LLMs) as “teachers” to guide the agent’s learning process by decomposing complex tasks into subgoals. Read more

On the Black-box Explainability of Object Detection Models for Safe and Trustworthy Industrial Applications

Authors: Alain Andres*, Aitor Martinez-Seras, Ibai Laña, Javier Del Ser

Published in Results in Engineering, 2024

Abstract: In the realm of human-machine interaction, artificial intelligence has become a powerful tool for accelerating data modeling tasks. Object detection methods have achieved outstanding results and are widely used in critical domains like autonomous driving and video surveillance. However, their adoption in high-risk applications, where errors may cause severe consequences, remains limited. Explainable Artificial Intelligence methods aim to address this issue, but many existing techniques are model-specific and designed for classification tasks, making them less effective for object detection and difficult for non-specialists to interpret. Read more

Evaluating Reinforcement Learning-Based Neural Controllers for Quadcopter Navigation in Windy Conditions

Authors: Alain Andres*, Aritz D Martinez, Sümer Tunçay, Ignacio Carlucho

Published in Engineering Applications of Artificial Intelligence, 2025

Abstract: Accurate quadcopter navigation under windy conditions remains challenging for traditional control methods, especially in the presence of unpredictable wind gusts and strict navigational constraints. This paper evaluates Deep Reinforcement Learning (DRL) based controllers under such conditions, analysing the impact of wind domain randomisation, multi-goal training, enhanced state representations with explicit wind information, and the use of temporal data to capture affecting dynamics over time. Read more

Comparative Evaluation of RL and MPC for 6DoF AUV Control

Authors: Sümer Tunçay*, Alain Andres, Ignacio Carlucho

Published in Towards Autonomous Robotic Systems, TAROS, 2025

Abstract: Autonomous Underwater Vehicles (AUVs) require precise and robust control strategies for 3D pose regulation in dynamic underwater environments. In this study, we present a comparative evaluation of model-free and model-based control methods for AUV position control. Specifically, we analyze the performance of neural network controllers trained by three Reinforcement Learning (RL) algorithms—Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC)—alongside a Model Predictive Control (MPC) baseline. We train our RL methods in a simplified AUV simulator implemented in PyTorch, while our evaluation is done in a realistic marine robotics simulator called Stonefish. Read more

Towards Surgical Task Automation: Actor-Critic Models Meet Self-Supervised Imitation Learning

Authors: Jingshuai Liu*, Alain Andres, Yonghang Jiang, Yuning Du, Xichun Luo, Wenmiao Shu, Sotirios Tsaftaris

Published in IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, 2025

Abstract: Surgical robot task automation has recently attracted great attention due to its potential to benefit both surgeons and patients. Reinforcement learning (RL) based approaches have demonstrated promising ability to perform automated surgical manipulations on various tasks. To address the exploration challenge, expert demonstrations can be utilized to enhance the learning efficiency via imitation learning (IL) approaches. However, the successes of such methods normally rely on both states and action labels. Unfortunately, action labels can be hard to capture or their manual annotation is prohibitively expensive owing to the requirement for expert knowledge. Emulating expert behaviour using noisy or inaccurate labels poses significant risks, including unintended surgical errors that may result in patient discomfort or, in more severe cases, tissue damage. It therefore remains an appealing and open problem to leverage expert data composed of pure states into RL. Read more

D-CRISP: Explaining Object Detectors by combining Randomized and Segment-based Perturbations

Authors: Alain Andres*, Javier Del Ser

Published in European Conference on Artificial Intelligence, ECAI, 2025

Abstract: Explaining the decisions issued by Machine Learning models for object detection tasks is essential in high-stakes decision making scenarios, such as medical image processing and vehicular perception for autonomous driving. Despite the proliferation of post-hoc perturbation-based methods for generating visual explanations, most eXplainable AI (XAI) approaches rely exclusively on either random image masking or selective segmentation-based occlusion, missing the opportunity to synergistically leverage both strategies in a complementary fashion. In this paper we address this gap by proposing D-CRISP (Detector-Combining Randomized Input and Segment Perturbations), a novel post-hoc explanation method for object detection models. D-CRISP unifies both random and region-based occlusions derived from image segmentation, producing multiscale saliency maps that capture both granular (pixel-level) and semantic (region-level) cues about the objects detected by the model. Read more

supervision

Trajectory Planning for a Robotic Arm Using Reinforcement Learning

Supervisors: Alain Andres, Jon Azpiazu, Eduardo Zamudio

Student: Sergio Garcia Ferreira (M.S.), 2022-2023

Abstract: Reinforcement Learning has brought about a transformation in robotics, thanks to its ability to develop efficient control techniques through autonomous learning. In particular, Reinforcement Learning has proven to be successful in tasks such as reaching objects with robotic arms. In this work, a solution is developed for training this task in simulated environments, and an experimental setup is established to compare the performance of various model-free algorithms. It is demonstrated that PPO achieves the best results, while SAC exhibits instability in environments with Dense rewards. Furthermore, it is concluded that a Sparse reward is sufficient to solve the task in environments with a precision of 5 cm. Read more

Anomaly Detection in Wind Turbines: Analysis of Data Imbalance and Temporal Effects

Supervisors: Amaia Abanda, Alain Andres, Uxue Mori

Student: Ane San Jose (B.S.), 2023-2024

Abstract: In engineering and systems monitoring, an anomaly is defined as a rare event. Detecting these anomalies, which deviate significantly from expected or normal system behavior, is crucial for identifying issues in various contexts, including computer systems and industrial processes. Read more

A Reinforcement Learning-Based Approach for Vehicle Platoon Route Optimization in Last-Mile Delivery

Supervisors: Alain Andres, Imanol Echeverria, Roberto Santana

Student: Nagore Bravo Julián (B.S.), 2023-2024

Abstract: Classic challenges in combinatorial optimization, such as the Traveling Salesman Problem (TSP) and the Vehicle Routing Problem (VRP), have practical applications in planning, logistics, and transportation. Traditionally, these problems have been extensively studied using exact, heuristic, and meta-heuristic methods. However, the issue of generating high-quality solutions in real time persists, as these methods require starting from scratch each time a new problem needs to be solved. This is where the use of Reinforcement Learning (RL) emerges as a promising alternative for solving combinatorial optimization problems in real time. Time is a crucial factor in real-world routing problems, where conditions can change rapidly and solutions must adapt efficiently to these variations. Read more

Language as a Beacon: Guiding Reinforcement Learning with High-Level Language Prompts for Better Exploration and Generalization

Supervisors: Alain Andres, Javier Del Ser

Student: Unai Ruiz González (M.S.), 2023-2024

Abstract: In the last decade, artificial intelligence has undergone a remarkable revolution, highlighted by the advances of Large Language Models (LLMs). Concurrently, the field of Reinforcement Learning (RL) has grown exponentially, expanding from robotics to personalized recommendation systems. In this context, the convergence of both fields seems promising. This convergence has the potential to further enhance agents’ ability to learn efficiently, redefining the possibilities of artificial intelligence application and, consequently, its impact on everyday life. Read more

talks

Expanded Horizons in AI: Beyond Traditional Machine Learning

As part of the “Advanced Machine Learning” course, I gave a talk aimed at teaching students about the challenges and opportunities we face in a technological center, especially in the field of artificial intelligence (AI). During the talk, I covered crucial topics such as the development of innovative algorithms, the implementation of advanced AI techniques, and the impact of these technologies on industry and society. Read more

Impact of Artificial Intelligence on Industry

The podcast, available in Spanish, can be watched/listened to here. In it, we discuss the following topics within the context of process automation:

  • Is Artificial Intelligence an option or a necessity in today’s world?
  • Differences between Generative Artificial Intelligence and traditional Artificial Intelligence.
  • Benefits and risks of Artificial Intelligence: Security, technological dependency, and regulation.
Read more

teaching

Web Engineering (2024-2025)

Undergraduate course, University of Deusto, Faculty of Engineering, 2024

Course focused on the design and development of modern web applications, covering client-server architectures, languages and frameworks (HTML5, CSS3, JavaScript, Python/Django), as well as accessibility, security, and performance aspects. It also strengthens written communication as a transversal skill. Read more

Object Oriented Programming (2024-2025)

Undergraduate course, University of Deusto, Faculty of Engineering, 2025

Course focused on the design, development, debugging, and testing of applications using Java as a general-purpose object-oriented programming language. Key topics include inheritance, polymorphism, generic collections, exception handling, and unit testing. Read more

Reinforcement Learning (2025-2026)

Undergraduate course, University of Deusto, Faculty of Engineering, 2025

Elective course introducing the principles and techniques of Reinforcement Learning, covering both tabular methods (Q-Learning, SARSA) and Deep RL approaches (DQN, PPO, SAC), with hands-on practice in Python. Current trends such as Imitation Learning, Multi-Agent RL, and Offline RL are also explored. Read more

Web Engineering (2025-2026)

Undergraduate course, University of Deusto, Faculty of Engineering, 2025

Course focused on the design and development of modern web applications, covering client-server architectures, languages and frameworks (HTML5, CSS3, JavaScript, Python/Django), as well as accessibility, security, and performance aspects. It also strengthens written communication as a transversal skill. Read more