Define Custom Environments
If you want to use this package for your own task, you need to implement an R6 class representing the environment, which must inherit from rlR::Environment. You can define whatever other public and private members you like, as long as their names do not collide with those in rlR::Environment. Type the following to view the documentation of rlR::Environment:
help(topic="Environment", package = "rlR")
rlR:::EnvToy is an R6 class which inherits from rlR::Environment.

env = rlR:::EnvToy$new()
class(env)
## [1] "EnvToy" "Environment" "R6"
There are three methods you must override when defining your own Environment class (a complete sketch combining them follows the excerpts below).
env$initialize # the fields 'act_cnt' and 'state_dim' must be defined here
## function (...)
## {
## self$act_cnt = c(2)
## self$state_dim = c(4)
## }
## <environment: 0x5575e6d5bd78>
env$reset # The return must be a list with fields state (must be an array), reward = NULL, done = FALSE, and info = list()
## function ()
## {
## return(list(state = array(rnorm(self$state_dim), dim = self$state_dim),
## reward = NULL, done = FALSE, info = list()))
## }
## <environment: 0x5575e6d5bd78>
env$step # The return must be a list with fields state (must be an array), reward (numeric), done (Boolean), and info (a list, possibly empty)
## function (action)
## {
## return(list(state = array(rnorm(self$state_dim), dim = self$state_dim),
## reward = 1, done = TRUE, info = list()))
## }
## <environment: 0x5575e6d5bd78>
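To tie these pieces together, here is a minimal sketch of a complete custom environment. The class name MyEnv and its random, one-step dynamics are purely illustrative (they mirror the EnvToy excerpts above); only the inheritance from rlR::Environment and the three overridden methods are required.

library(R6)

# Minimal sketch of a custom environment; the class name 'MyEnv' and the
# random one-step dynamics are illustrative only.
MyEnv = R6Class("MyEnv",
  inherit = rlR::Environment,
  public = list(
    initialize = function(...) {
      # the fields 'act_cnt' and 'state_dim' must be defined here
      self$act_cnt = 2L
      self$state_dim = c(4L)
    },
    reset = function() {
      # state must be an array; reward = NULL, done = FALSE, info = list()
      list(state = array(rnorm(prod(self$state_dim)), dim = self$state_dim),
        reward = NULL, done = FALSE, info = list())
    },
    step = function(action) {
      # state (array), reward (numeric), done (Boolean), info (list)
      list(state = array(rnorm(prod(self$state_dim)), dim = self$state_dim),
        reward = 1, done = TRUE, info = list())
    }
  )
)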
Afterwards you can choose one of the available agents to check that the newly defined environment works.
agent = initAgent("AgentDQN", env)
agent$learn(3)
## Episode: 1 finished with steps:1, rewards:1.000000 global step 1
## Last 100 episodes average reward 1.000000
## Epsilon0.999000
## rand steps:1
## replaymem size GB:1.66147947311401e-06
## learning rate: 0.000250000011874363
## Episode: 2 finished with steps:1, rewards:1.000000 global step 2
## Last 100 episodes average reward 1.000000
## Epsilon0.998002
## rand steps:1
## replaymem size GB:3.28570604324341e-06
## learning rate: 0.000999000505544245
## Episode: 3 finished with steps:1, rewards:1.000000 global step 3
## Last 100 episodes average reward 1.000000
## Epsilon0.997004
## rand steps:1
## replaymem size GB:4.91738319396973e-06
## learning rate: 0.000998002011328936
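Assuming the hypothetical MyEnv class sketched above, the same check applies to an environment you define yourself:

env2 = MyEnv$new()
agent2 = initAgent("AgentDQN", env2)
agent2$learn(3)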