Pankayaraj

I am currently a PhD student in the Department of Computer Science, University of Maryland, working with Prof. Furong Huang. Before that, I worked as a research engineer at CARE AI Lab, Singapore Management University, under Prof. Pradeep Varakantham on constrained reinforcement learning.

I completed my Bachelor of Science in Computer Engineering at the University of Peradeniya. I have worked at Sri Lanka Technological Campus both as a research assistant and as a research intern under the supervision of Prof. D.H.S. Maithripala. During my undergraduate studies, I won the best final year project research thesis award.

Email / CV / Google Scholar / GitHub

I. Research Summary

In general, my research interests are in Reinforcement Learning (RL). My past experience in RL spans continual RL, multi-agent RL, Bayesian RL, and bandits. Currently, I am interested in imitation learning, constrained RL, and explainability in RL. Below you can find my past publications and projects.

II. Publications (Selected)

Reinforcement Learning

2022 - 2022: Constrained Reinforcement Learning

Constrained Reinforcement Learning in Hard Exploration Problems with Hierarchies
Pankayaraj, Pradeep Varakantham
37th AAAI Conference on Artificial Intelligence, Washington, D.C., USA, 2022. Acceptance Rate: 19.6%
GitHub / Paper

In this work, we propose a method to incorporate and satisfy constraints at every time step in a hierarchical reinforcement learning (HRL) framework. In particular, we propose a way to incorporate backward value functions into an options-based HRL framework. This incorporation depends on the existence of a stationary distribution in the HRL framework. To this end, under some assumptions, we prove the existence of such a stationary distribution for the Markov decision process at every level of the hierarchy. Furthermore, we show empirically the importance of our proposal for efficient exploration, since exploration normally gets curtailed as constraint satisfaction becomes a focal point for the agent.

2021 - 2021: Continual Reinforcement Learning

Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
Pankayaraj, Natalia Díaz-Rodríguez, Javier Del Ser
Cognitive Computation journal. Accepted in 2023. Impact Factor: 5.4
GitHub / Paper

In this work, we investigate the means of using curiosity on replay buffers to improve offline multi-task continual reinforcement learning when tasks, which are defined by the non-stationarity in the environment, are unlabeled and not evenly exposed to the learner in time.

2020 - 2020: Multi Agent Reinforcement Learning

Temporally Aware Multi-Agent Reinforcement Learning in Sparsely Connected Cooperative Environments
Pankayaraj, Yuvini Sumanasekera, Chandima Samarasinghe, Dhammika Elkaduwe, Upul Jayasinghe, D.H.S Maithripala
ESCaPe (Symposium), Sri Lanka, 2020. Best Paper Award
Researchgate / GitHub

In this work, we propose a model which exploits the inherent graph-like structure of multi-agent networks to facilitate the learning of more robust behaviour strategies by capturing the spatial dependencies and temporal dynamics of the underlying graph.


Multi Arm Bandits

2019 - 2020: Multi Agent Multi Arm Bandit Research

A Decentralized Policy with Logarithmic Regret for a Class of Multi-Agent Multi-Armed Bandit Problems with Option Unavailability Constraints and Stochastic Communication Protocols
Pankayaraj, J. M. Berg, D.H.S Maithripala
59th IEEE Conference on Decision and Control (IEEE CDC), Jeju Island, Republic of Korea, 2020. Acceptance Rate: 52.7%
arXiv / IEEE

This paper considers a multi-armed bandit (MAB) problem in which multiple mobile agents receive rewards by sampling from a collection of spatially dispersed bandits. The goal is to formulate a decentralized policy for each agent, in order to maximize the total cumulative reward over all agents, subject to option availability and inter-agent communication constraints.

A Decentralized Communication Policy for Multi Agent Multi Armed Bandit Problems
Pankayaraj, D.H.S Maithripala
European Control Conference (ECC), Saint Petersburg, Russia, 2020. Acceptance Rate: 58%
arXiv / IEEE / GitHub

This paper proposes a novel policy for a group of agents to solve a multi-armed bandit (MAB) problem, both individually and collectively. The policy relies solely on the information that an agent has obtained by sampling the options on its own and by communicating with its neighbors.


Computer Vision Based Applications

2018 - 2019: Sleep Apnea Detection

Non-contact Infant Sleep Apnea Detection
Gihan Jayatilaka, Harshana Weligampola, Suren Sritharan, Pankayaraj, Roshan Ragel, Isuru Nawinne
ICIIS, Sri Lanka, 2019
arXiv / IEEE / GitHub

We propose a non-invasive solution for infant sleep apnea detection based on video processing. The infant is observed by a video camera connected to a single-board computer (Raspberry Pi), which analyzes the video feed to diagnose breathing anomalies. The camera is turned to a proper orientation for the observation using a robotic arm.

III. Academic Volunteering
Peer Reviewer: IEEE Transactions on Communications (journal) [Impact Factor: 5.69 (2018)]
IV. Projects (Undergraduate)
RL based Quadcopter Control
Report

In recent years, extensive research has been carried out in the field of autonomous aerial vehicle control, motivated by the rapid advancements in Machine Learning (ML). In particular, Reinforcement Learning (RL) has gained immense interest for developing control algorithms, given its ability to learn useful behavior by dynamically interacting with the environment without the need for an explicit teacher. In this work, we examine the use of RL methods for vision-based quadcopter control in both single-agent and multi-agent simulated environments. Specifically, the DQN algorithm was investigated in the single-agent setting and the MADDPG algorithm in the multi-agent setting. The control task in each of these settings was to navigate through the environment, avoiding obstacles, to reach the specified goals. Each of these algorithms was evaluated on its ability to perform this control task.

Multi Agent Reinforcement Learning with Sparse Communication
Report / GitHub

In recent years, the consensus among adaptive agents within multi-agent systems (MAS) has been an emerging area of research in the field of autonomous control. Reinforcement Learning (RL) has gained immense interest in this line of work as it aims to learn optimal cooperative policies through trial and error by dynamically interacting with the environment. However, in practice, connectivity within the multi-agent network may be sparse and the agents are often subjected to partial observability. This can result in the learning of sub-optimal policies. In this work, we consider the problem of learning optimal policies in cooperative multi-agent environments in the face of partial observability and sparse connectivity. The proposed model exploits the inherent graph-like structure of multi-agent systems. Graph Neural Networks (GNNs) are utilized to extract spatial dependencies and temporal dynamics of the underlying graph. Such spatio-temporal information is exploited to generate better state representations so as to facilitate the learning of more robust policies. This model builds on the previously explored spatial modelling in MARL.

Bayesian RL based Recommendation System
Report / GitHub

When it comes to user customization, it is essential to capture users' preferences well so that each user can be served based on their past preferences. The concept behind this work is to formulate a methodology for an online advertising shop to customize its advertisement presentation using existing algorithms in the literature. The task of the algorithm is to find the next shop to suggest to the user on their timeline based on their past preferences. A user's preferences are captured by the ratings they give a shop when it is shown on their timeline and by whether they mark a shop as visited. Most of the final suggested algorithm follows the 2003 paper "Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning".

Python-based Bayesian Optimization Library
Report / GitHub / PyPI

In probability theory, the multi-armed bandit problem (or N-armed bandit problem) is one in which a gambler facing a row of slot machines has to choose which machines to play and how many times to play each, given a limited number of turns. When chosen, a machine gives a reward that is either deterministic or probabilistic. To accumulate as much reward as possible, the gambler must therefore find a good strategy without knowing the reward structure behind the machines. When the discrete arms are extended to a continuous variable of dimension K, the problem becomes the continuous bandit problem. Since the number of arms then becomes infinite, the problem was formulated with deterministic rewards to reduce complexity, with the rewards of the arms treated as values of a correlated function. When these problems are approached from a Bayesian viewpoint, they are known as Bayesian optimization. They can be considered as problems in which we are supposed to optimize a function within certain bounds using as few samples as possible. In this work, we provide a Python-based library for this Bayesian optimization problem.

SitNShop
Report / GitHub

The concept behind this project is to design and implement a web page that connects local customers with local shop owners by building a platform for advertisements. The project is built around providing an interactive interface for both users and shop owners, allowing them to convey as much information about themselves as possible, while also focusing on the development of a capable algorithm to capture customer preferences dynamically.

Non Contact Sleep Apnea Detection
Report / GitHub

Sleep apnea is a serious disorder caused by the interruption of breathing during sleep. If not treated properly, it can cause people to stop breathing several times, even hundreds of times, during sleep. It can affect people of any age. However, when babies are affected by the condition, they tend not to wake up and keep on sleeping, which may put their lives at risk. We propose a non-invasive solution for this problem based on video processing. The infant is observed by a video camera connected to a single-board computer (Raspberry Pi), which analyzes the video feed to diagnose breathing anomalies. The camera is turned to a proper orientation for the observation using a robotic arm.

V. References

1. Prof. Furong Huang
Assistant Professor
University of Maryland, Department of Computer Science
furongh@umd.edu

2. Prof. Pradeep Varakantham
Lee Kuan Yew Fellow
Professor of Computer Science
School of Computing and Information Systems, Singapore Management University
pradeepv@smu.edu.sg

3. Dr. D.H.S Maithripala
Senior Lecturer
Department of Mechanical Engineering, University of Peradeniya, Sri Lanka.
smaithri@pdn.ac.lk


Credits: Jon Barron
Last Updated: 28-Dec-2021