# Evolving Rewards to Automate Reinforcement Learning

    @article{Faust2019EvolvingRT,
      title   = {Evolving Rewards to Automate Reinforcement Learning},
      author  = {Aleksandra Faust and Anthony G. Francis and Dar Mehta},
      journal = {ArXiv},
      year    = {2019},
      volume  = {abs/1905.07628}
    }

Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious hand-tuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task…
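The evolutionary outer loop described in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: `train_agent` and `task_objective` are assumed stand-ins for a full RL training run and the true task metric.

```python
import random

def mutate(params, scale=0.1):
    """Gaussian perturbation of reward-shaping weights."""
    return [w + random.gauss(0.0, scale) for w in params]

def evolve_reward(train_agent, task_objective, dim=4,
                  population=8, generations=5):
    """Evolve a parameterized reward: train one agent per candidate
    reward, score agents on the true task objective (not the shaped
    reward), keep the best half, and refill with mutated elites."""
    pop = [[random.random() for _ in range(dim)] for _ in range(population)]
    for _ in range(generations):
        scored = sorted(pop,
                        key=lambda p: task_objective(train_agent(p)),
                        reverse=True)
        elite = scored[: population // 2]
        pop = elite + [mutate(random.choice(elite)) for _ in elite]
    return max(pop, key=lambda p: task_objective(train_agent(p)))
```

The key point in the abstract is that candidates are scored on the task objective itself, so the shaped reward is free to evolve into whatever best drives learning.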


#### 18 Citations

Learning Robotic Manipulation Skills Using an Adaptive Force-Impedance Action Space

- Computer Science
- ArXiv
- 2021

This work proposes to factor the learning problem into a hierarchical learning and adaptation architecture to get the best of both worlds in real-world robotics, and combines these components through a bio-inspired action space called AFORCE.

RL-DARTS: Differentiable Architecture Search for Reinforcement Learning

- Computer Science
- ArXiv
- 2021

Throughout this training process, it is shown that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies and that also verify previous design choices for RL policies.

Learning to Win, Lose and Cooperate through Reward Signal Evolution

- Computer Science
- ArXiv
- 2021

A general framework for optimizing N goals given n reward signals is introduced, and it is demonstrated that such an approach allows agents to learn high-level goals such as winning, losing, and cooperating from scratch, without prespecified reward signals, in the game of Pong.

Neural Architecture Evolution in Deep Reinforcement Learning for Continuous Control

- Computer Science, Mathematics
- ArXiv
- 2019

Experiments show that the proposed Actor-Critic Neuroevolution algorithm often outperforms the strong Actor-Critic baseline and is capable of automatically finding, in a sample-efficient manner, topologies that would otherwise have to be found by expensive architecture search.

LIEF: Learning to Influence through Evaluative Feedback

- 2021

We present a multi-agent reinforcement learning framework where rewards are not only generated by the environment but also by other peers in it through inter-agent evaluative feedback. We show that…

Meta-learning curiosity algorithms

- Computer Science, Mathematics
- ICLR
- 2020

This work proposes a strategy for encoding curiosity algorithms as programs in a domain-specific language and searching, during a meta-learning phase, for algorithms that enable RL agents to perform well in new domains.

AutoRL-TSP: Sistema de Aprendizado por Reforço Automatizado para o Problema do Caixeiro Viajante (Automated Reinforcement Learning System for the Traveling Salesman Problem)

- Physics
- 2020

AutoML (Automated Machine Learning) aims to develop techniques that automate the entire machine learning process, yielding a system that fits the problem conditions. In this sense, one of the…

Learning to Seek: Autonomous Source Seeking on a Nano Drone Microcontroller with Deep Reinforcement Learning

- Engineering
- 2019

Nano drones are uniquely equipped for fully autonomous applications due to their agility, low cost, and small size. However, their constrained form factor limits flight time, sensor payload, and…

Learning to Seek: Deep Reinforcement Learning for Phototaxis of a Nano Drone in an Obstacle Field

- Computer Science
- 2019

This work deploys a deep reinforcement learning model capable of following direct paths even with noisy sensor readings, and demonstrates efficient light seeking by reaching the goal in simulation in 65% fewer steps and with 60% shorter paths, compared to a baseline random-walker algorithm.

Effective, interpretable algorithms for curiosity automatically discovered by evolutionary search

- 2020

We take the hypothesis that curiosity is a mechanism found by evolution that encourages meaningful exploration early in an agent’s life in order to expose it to experiences that enable it to obtain…

#### References

Showing 1–10 of 32 references

Continuous control with deep reinforcement learning

- Computer Science, Mathematics
- ICLR
- 2016

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning

- Computer Science
- 2019 International Conference on Robotics and Automation (ICRA)
- 2019

The Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance.

Evolution-Guided Policy Gradient in Reinforcement Learning

- Computer Science, Mathematics
- NeurIPS
- 2018

Evolutionary Reinforcement Learning (ERL) is a hybrid algorithm that leverages the population of an evolutionary algorithm (EA) to provide diversified data for training an RL agent, and periodically reinserts the RL agent into the EA population to inject gradient information into the EA.

Hindsight Experience Replay

- Computer Science, Mathematics
- NIPS
- 2017

A novel technique is presented which allows sample-efficient learning from rewards that are sparse and binary, and therefore avoids the need for complicated reward engineering; it may also be seen as a form of implicit curriculum.
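The relabeling idea behind Hindsight Experience Replay can be sketched minimally. This is an illustrative version of the "final" goal-selection strategy; the transition-tuple layout is assumed for the sketch, not taken from the paper's code.

```python
def her_relabel(trajectory, reward_fn):
    """Replay a (possibly failed) trajectory as if its final achieved
    state had been the goal all along.

    trajectory: list of (state, action, achieved_state, goal) tuples.
    reward_fn(achieved, goal): sparse binary reward, e.g. 0.0 on
        success and -1.0 otherwise.
    Returns relabeled (state, action, reward, hindsight_goal) tuples.
    """
    hindsight_goal = trajectory[-1][2]  # final achieved state
    return [(s, a, reward_fn(ach, hindsight_goal), hindsight_goal)
            for (s, a, ach, _) in trajectory]
```

Even when the original goal was never reached, the relabeled copy always contains at least one successful transition, which is what makes sparse binary rewards learnable without reward engineering.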

Residual Policy Learning

- Computer Science, Engineering
- ArXiv
- 2018

It is argued that RPL is a promising approach for combining the complementary strengths of deep reinforcement learning and robotic control, pushing the boundaries of what either can achieve independently.

Reverse Curriculum Generation for Reinforcement Learning

- Computer Science
- CoRL
- 2017

This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than a single state in which the task is achieved, and generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks.

Soft Actor-Critic Algorithms and Applications

- Computer Science, Mathematics
- ArXiv
- 2018

Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum-entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.

Learning to Walk via Deep Reinforcement Learning

- Computer Science, Mathematics
- Robotics: Science and Systems
- 2019

A sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies is proposed and achieves state-of-the-art performance on simulated benchmarks with a single set of hyperparameters.

Proximal Policy Optimization Algorithms

- Computer Science
- ArXiv
- 2017

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
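The "surrogate" objective mentioned above is PPO's clipped objective, L = min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the new-to-old policy probability ratio and A the advantage estimate. A minimal per-sample sketch (function and parameter names are illustrative):

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate.

    ratio: pi_new(a|s) / pi_old(a|s) for the sampled action.
    advantage: advantage estimate for the sampled (state, action).
    Clipping removes the incentive to move `ratio` outside
    [1 - epsilon, 1 + epsilon] when that would increase the objective.
    """
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped * advantage)
```

Taking the min of the clipped and unclipped terms makes the surrogate a pessimistic (lower) bound on the unclipped objective, which is what allows multiple epochs of minibatch updates on the same batch of sampled data.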

Learning to Navigate the Web

- Computer Science, Mathematics
- ICLR
- 2019

A deep reinforcement learning agent (DQN), with a Q-value function approximated by a novel QWeb neural network architecture, is trained and shown to generalize to new instructions on the World of Bits benchmark, on forms with up to 100 elements, supporting 14 million possible instructions.