IEEE Robotics & Automation Magazine - March 2016 - 98

PGPE Policy Updates
In our work, we use PGPE-based policy updates [6], [33]. Policy parameter $w$ is stochastically sampled from prior distribution $p(w \mid t)$ with hyperparameter $t$. In other words, the policy is deterministic, but its parameter is stochastic.
In PGPE, hyperparameter $t$ is optimized to maximize expected return $J(t)$ (Table 1, 1.1). Optimal hyperparameter $t^{*}$ is given by $t^{*} := \arg\max_{t} J(t)$. In practice, a gradient method is used to find $t^{*}$: $t \leftarrow t + f \Delta t$, where $\Delta t = \nabla_{t} J(t)$ is the derivative of $J$ with respect to $t$ (Table 1, 1.2) and $f$ is the learning rate. We approximated derivative $\nabla_{t} J(t)$ by the empirical average (Table 1, 1.3).
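The update rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the estimator of Table 1 (which is not reproduced in this section): the prior is taken to be a one-dimensional Gaussian whose mean and standard deviation together play the role of hyperparameter t, the gradient is the standard likelihood-ratio estimate with the mean return subtracted as a baseline, and `toy_return`, `MIN_STD`, and all function names are hypothetical.

```python
import random

MIN_STD = 0.2  # lower bound on the prior width (assumption, keeps exploration alive)

def toy_return(w):
    """Hypothetical stand-in for the return of a rollout; maximal at w = 2."""
    return -(w - 2.0) ** 2

def pgpe_gradient(samples, returns, mean, std):
    """Empirical average of (R - baseline) * grad_t log p(w | t), t = (mean, std)."""
    n = len(samples)
    baseline = sum(returns) / n  # assumption: mean-return baseline
    grad_mean = sum((r - baseline) * (w - mean) / std ** 2
                    for w, r in zip(samples, returns)) / n
    grad_std = sum((r - baseline) * ((w - mean) ** 2 - std ** 2) / std ** 3
                   for w, r in zip(samples, returns)) / n
    return grad_mean, grad_std

def run_pgpe(iterations=300, n_samples=20, learning_rate=0.05, seed=0):
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # initial hyperparameter t
    for _ in range(iterations):
        # Sample policy parameters w from the prior p(w | t).
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        returns = [toy_return(w) for w in samples]
        # Gradient ascent step: t <- t + f * (estimated gradient).
        g_mean, g_std = pgpe_gradient(samples, returns, mean, std)
        mean += learning_rate * g_mean
        std = max(MIN_STD, std + learning_rate * g_std)
    return mean, std

if __name__ == "__main__":
    print(run_pgpe())
```

On this toy problem the prior mean drifts toward the maximizer of the return while the prior width shrinks toward its lower bound, which is the behavior the gradient update is meant to produce.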

[Figure 1(a). Phase 1: Interact with Robot and Collect Data. Diagram labels: policy generator, policy parameter w_n, policy, action, state, reward.]
Efficient Reuse of Previous Experiences

Importance Weight
The original PGPE can be considered an on-policy algorithm [27], in which the data collected with the current policy are used to estimate the policy gradients. To reuse previous experiences, however, we need to evaluate the current policy with data collected by previous policies. This requires an off-policy algorithm, in which the data-collecting policy and the policy being updated differ. We therefore use an off-policy version of the PGPE algorithm, in which importance weighting [8] is used to evaluate the previously collected data (experience) from the current policy's point of view. This method is called IW-PGPE [16], [24], [26], [34].
Figure 1. The flowchart of the proposed method: (a) Phase I, (b) Phase II, and (c) Phase III.
[Panel (b), Phase 2: Add the Experienced Data to Database. Panel (c), Phase 3: Update the Hyperparameter of a Policy, using database entries D_i, D_{i-1}, ..., D_{i-L}. Loop labels: repeat until t = T and repeat N times in Phase 1; repeat S times in Phase 3. Legend: hyperparameter, generated policy parameter, acquired reward.]

The basic idea of importance weighting is to weight samples drawn from a sampling distribution so that they match the target distribution. For PGPE, importance weight $v$ was defined for current hyperparameter $t$ and hyperparameter $t'$ that was used in previous experiences:

$$v(w') = \frac{p(w' \mid t)}{p(w' \mid t')}. \qquad (4)$$

This weight indicates how much the previous experience contributes to the current policy update. The approximated derivative of the expected return is then weighted by this importance weight to reuse the previous experiences. Table 1 shows the weighted derivatives using the previous experiences, where $w'_n$ represents a policy parameter generated from previous hyperparameter $t'$ and $h'_n$ represents the trajectory of the previous experiences.
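As a small worked illustration of the importance weight in (4), the sketch below evaluates the ratio of two priors and uses it to reweight a likelihood-ratio gradient computed from previously sampled parameters. It assumes a one-dimensional Gaussian prior with hyperparameter t = (mean, std); the weighted estimator shown is the generic importance-weighted form, not necessarily the exact expression in Table 1, and all function names are hypothetical.

```python
import math

def gaussian_pdf(w, mean, std):
    """Density of the assumed Gaussian prior p(w | t) with t = (mean, std)."""
    return math.exp(-0.5 * ((w - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def importance_weight(w_prev, current, previous):
    """v(w') = p(w' | t) / p(w' | t') for a parameter w' sampled under t'."""
    return gaussian_pdf(w_prev, *current) / gaussian_pdf(w_prev, *previous)

def weighted_gradient_mean(prev_samples, prev_returns, current, previous):
    """Importance-weighted estimate of dJ/d(mean) from previously collected data."""
    mean, std = current
    total = 0.0
    for w, r in zip(prev_samples, prev_returns):
        v = importance_weight(w, current, previous)
        total += v * r * (w - mean) / std ** 2
    return total / len(prev_samples)

if __name__ == "__main__":
    # A sample that is more likely under the current prior gets a weight above 1.
    print(importance_weight(1.0, current=(1.0, 1.0), previous=(0.0, 1.0)))
```

When the current and previous hyperparameters coincide, every weight equals 1 and the estimate reduces to the on-policy average; samples that are unlikely under the current prior are down-weighted.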

Learning Procedure
The learning procedure of our proposed method is shown in Figure 1. It repeats Phases 1 to 3 until the learning performance converges:
● Phase 1: Collect data in a real environment.
● Phase 2: Add the collected data to a
database.
● Phase 3: Update the hyperparameters
of the current policy using the stored
data in the database.
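The three phases listed above can be sketched as a single loop. This is a minimal illustration under several assumptions, not the authors' implementation: the real environment is replaced by a hypothetical `toy_return`, the prior is a Gaussian with fixed width so that only its mean is learned, the database keeps the most recent `history + 1` batches, and the update is a generic importance-weighted gradient step repeated `inner_steps` times.

```python
import math
import random
from collections import deque

STD = 1.0  # fixed prior width (assumption): only the mean is learned here

def log_prior(w, mean):
    """log p(w | t) up to a constant, for a Gaussian prior with mean t and width STD."""
    return -0.5 * ((w - mean) / STD) ** 2

def toy_return(w):
    """Hypothetical stand-in for the reward acquired on the real robot."""
    return -(w - 2.0) ** 2

def collect(mean, n, rng):
    """Phase 1: sample n policy parameters and record the acquired returns."""
    params = [rng.gauss(mean, STD) for _ in range(n)]
    return {"hyper": mean, "params": params,
            "returns": [toy_return(w) for w in params]}

def update(mean, database, learning_rate):
    """Phase 3: one importance-weighted gradient step over all stored batches."""
    total, count = 0.0, 0
    for batch in database:
        for w, r in zip(batch["params"], batch["returns"]):
            # Ratio p(w | current) / p(w | previous); constants cancel.
            v = math.exp(log_prior(w, mean) - log_prior(w, batch["hyper"]))
            total += v * r * (w - mean) / STD ** 2
            count += 1
    return mean + learning_rate * total / count

def learn(iterations=150, n=20, history=2, inner_steps=2,
          learning_rate=0.02, seed=0):
    rng = random.Random(seed)
    mean = 0.0
    database = deque(maxlen=history + 1)  # oldest batch is dropped automatically
    for _ in range(iterations):
        batch = collect(mean, n, rng)     # Phase 1: collect data
        database.append(batch)            # Phase 2: add it to the database
        for _ in range(inner_steps):      # Phase 3: update using stored data
            mean = update(mean, database, learning_rate)
    return mean, database

if __name__ == "__main__":
    final_mean, _ = learn()
    print(final_mean)
```

Bounding the database with `deque(maxlen=...)` is one simple way to limit how stale the reused data can become; older batches tend to receive small importance weights anyway as the hyperparameter moves away from the one that generated them.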
In Phase 1, (1) policy parameter $w_n$ is sampled from prior distribution $p(w \mid t)$. (2) Then, a trajectory is acquired from the real environment using the policy with the sampled policy parameter. In (3), (1) and


