rlR: Define a Custom Task to Solve

Environment class

To use this package for your own task, you need to implement an R6 class representing the environment, which must inherit from rlR::Environment. You can define any additional public and private members you like, as long as their names do not collide with those in rlR::Environment. Type the following to view the documentation for rlR::Environment:

help(topic = "Environment", package = "rlR")

A toy example

env = rlR:::EnvToy$new()

rlR:::EnvToy is an R6 class that inherits from rlR::Environment.

class(env)
## [1] "EnvToy"      "Environment" "R6"

There are three methods you must override when defining your own Environment class; a complete sketch follows the listing below.

env$initialize  # the fields 'act_cnt' and 'state_dim' must be defined here
## function (...) 
## {
##     self$act_cnt = c(2)
##     self$state_dim = c(4)
## }
## <environment: 0x5575e6d5bd78>
env$reset  # The return must be a list with fields state (an array), reward = NULL, done = FALSE, and info = list()
## function () 
## {
##     return(list(state = array(rnorm(self$state_dim), dim = self$state_dim), 
##         reward = NULL, done = FALSE, info = list()))
## }
## <environment: 0x5575e6d5bd78>
env$step  # The return must be a list with fields state (an array), reward (numeric), done (Boolean), and info (a list of anything, or an empty list)
## function (action) 
## {
##     return(list(state = array(rnorm(self$state_dim), dim = self$state_dim), 
##         reward = 1, done = TRUE, info = list()))
## }
## <environment: 0x5575e6d5bd78>
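
Putting the three methods together, a minimal custom environment could look like the following sketch. The class name EnvRandom and the random reward are illustrative assumptions, not part of rlR; the structure otherwise mirrors rlR:::EnvToy.

library(R6)
library(rlR)

EnvRandom = R6Class("EnvRandom",
  inherit = Environment,
  public = list(
    initialize = function(...) {
      self$act_cnt = 2     # number of discrete actions
      self$state_dim = 4   # dimension of the state array
    },
    reset = function() {
      # start an episode: reward must be NULL, done must be FALSE
      return(list(state = array(rnorm(self$state_dim), dim = self$state_dim),
        reward = NULL, done = FALSE, info = list()))
    },
    step = function(action) {
      # illustrative dynamics: every episode ends after one step,
      # with a random reward instead of EnvToy's constant 1
      return(list(state = array(rnorm(self$state_dim), dim = self$state_dim),
        reward = rnorm(1), done = TRUE, info = list()))
    }
  )
)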

Testing

Afterwards, you can choose one of the available agents to check whether the newly defined environment works.

agent = initAgent("AgentDQN", env)
agent$learn(3)
## Episode: 1 finished with steps:1, rewards:1.000000 global step 1 
## Last 100 episodes average reward 1.000000 
## Epsilon0.999000 
## rand steps:1 
## replaymem size GB:1.66147947311401e-06 
## learning rate: 0.000250000011874363  
## Episode: 2 finished with steps:1, rewards:1.000000 global step 2 
## Last 100 episodes average reward 1.000000 
## Epsilon0.998002 
## rand steps:1 
## replaymem size GB:3.28570604324341e-06 
## learning rate: 0.000999000505544245  
## Episode: 3 finished with steps:1, rewards:1.000000 global step 3 
## Last 100 episodes average reward 1.000000 
## Epsilon0.997004 
## rand steps:1 
## replaymem size GB:4.91738319396973e-06 
## learning rate: 0.000998002011328936
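
The same check applies to the EnvRandom sketch above (assuming its class definition has been run):

env2 = EnvRandom$new()
agent2 = initAgent("AgentDQN", env2)
agent2$learn(3)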