AD by integrating facial and acoustic features, leveraging short-term and long-term audiovisual features.
Realistic interaction scenarios need to be captured wherein the humanoid robot's nonverbal behavior induces the human's nonverbal behavior through facial and audio cues. For instance, humans should look at a picture when the robot points to it, the robot should look at the human partner when discussing a painting, and humans should be allowed to talk to one another as well as to the robot. However, existing datasets do not support this kind of scenario: they were recorded in a meeting room with a fixed set of participants in either human-to-human or human-to-robot settings [17], [18]. As a result, the area has not been widely explored and remains stagnant. This study presents an audiovisual, spatiotemporally annotated dataset called E-MuMMER, built by extending the MuMMER dataset and recorded in mixed human-to-human and human-to-robot open settings with a variable number of participants. Figure 1 shows a sample of labeled face frames.
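To make the annotation scheme concrete, the following minimal Python sketch shows what a single spatiotemporal annotation record could look like. The field names, types, and values are hypothetical illustrations rather than the actual E-MuMMER file format; only the label convention follows Figure 1.

from dataclasses import dataclass

@dataclass
class AddresseeAnnotation:
    video_id: str     # source recording extended from MuMMER (hypothetical field)
    frame_index: int  # temporal position of the labeled frame
    track_id: int     # identity of the tracked face across frames
    bbox: tuple       # (x, y, width, height) of the face bounding box
    label: int        # 0 = addressing the robot, 1 = addressing another subject

# A hypothetical record: subject 2 addressing the robot in frame 1342.
example = AddresseeAnnotation(video_id="session_01", frame_index=1342,
                              track_id=2, bbox=(412, 96, 88, 88), label=0)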
Studies in human-robot interactions indicate that humans tend to speak to a computer more loudly and slowly than when speaking to humans.
Today, numerous tasks, such as image classification [19], object detection [20], natural language processing [21], active speaker detection [16], and speech recognition [22], use deep learning for better feature representation. Previously, however, AD was widely explored using statistical and rule-based approaches. These approaches are suitable only for specific tasks and do not generalize to other situations, e.g., different movements and communication expressions or a different number of participants. Minth et al. [2] tried to address this problem by proposing a deep learning framework that takes different human cues, specifically eye gazes and transcripts of an utterance corpus, into account to predict the conversational addressee from a specific speaker's view in various real-life conversational scenarios. However, the detection is performed from a third-party angle and uses an artificially generated utterance from a static image.
We propose a novel end-to-end, two-stream-based deep learning framework called ADNet that uses the proposed dataset to leverage facial and audio features. ADNet makes predictions by considering both short-term and long-term audiovisual features. We hypothesize that robust information can be extracted by exploiting the intermodal synchronization of face and speech throughout an utterance, exploring the audiovisual CA mechanism to capture intermodality cues, and leveraging the SA module on these features to find the important features and long-term speaking evidence that could improve the model's prediction performance.
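To illustrate how such a two-stream back-end could be wired together, the following is a minimal PyTorch-style sketch that combines cross-attention (CA) between the audio and visual streams, bilinear fusion, and self-attention (SA) over the fused sequence. The class name, layer sizes, pooling, and classifier head are illustrative assumptions, not the authors' implementation of ADNet.

import torch
import torch.nn as nn

class AudioVisualBackEnd(nn.Module):
    """Hedged sketch of a CA + bilinear fusion + SA back-end (not the ADNet code)."""

    def __init__(self, dim=128, heads=8):
        super().__init__()
        # CA: each modality queries the other to capture intermodality cues.
        self.audio_to_visual = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.visual_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Bilinear fusion of the two attended streams.
        self.fusion = nn.Bilinear(dim, dim, dim)
        # SA: self-attention over the fused sequence for long-term speaking evidence.
        self.self_attention = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, 2)  # robot-addressed vs. other-addressed

    def forward(self, audio_feats, visual_feats):
        # audio_feats, visual_feats: (batch, time, dim) short-term frame features.
        a_att, _ = self.audio_to_visual(audio_feats, visual_feats, visual_feats)
        v_att, _ = self.visual_to_audio(visual_feats, audio_feats, audio_feats)
        fused = self.fusion(a_att, v_att)                      # per-frame fusion
        context, _ = self.self_attention(fused, fused, fused)  # long-term context
        return self.classifier(context.mean(dim=1))            # utterance-level logits

In this sketch, each modality attends to the other before fusion, and the self-attention layer pools evidence across the whole utterance before a single addressee decision is made per face track.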
In summary, the contributions of this article are as follows:
◆ We built the E-MuMMER dataset, consisting of spatiotemporal annotations of spoken activity, by extending the existing HRI dataset, MuMMER, recorded in human-to-human and human-to-robot settings.
◆ We propose a novel two-stream-based deep learning framework for AD that combines facial and audio features while considering long-term and short-term temporal features.
◆ We propose a back-end network that consists of audiovisual CA, BLF, and SA to learn the audiovisual intermodality interaction.
◆ The ablation experiments reveal that BLF outperforms the deterministic fusion approach (concatenation) by a 1% accuracy gain; the two fusion operators are contrasted in the sketch after this list. Simultaneously using the CA and SA modules to learn the intermodality interaction significantly improves prediction performance, whereas using each attention module separately does not show a significant gain.
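For concreteness, the short sketch below contrasts the two fusion operators compared in the ablation: deterministic concatenation followed by a linear projection versus a learned bilinear fusion (BLF). The tensor shapes and module choices are assumptions made for illustration only.

import torch
import torch.nn as nn

dim = 128
audio = torch.randn(4, 100, dim)   # (batch, time, feature) audio stream
visual = torch.randn(4, 100, dim)  # (batch, time, feature) visual stream

# Deterministic fusion: concatenate the streams, then project back to dim.
concat_fusion = nn.Linear(2 * dim, dim)
fused_concat = concat_fusion(torch.cat([audio, visual], dim=-1))

# Bilinear fusion (BLF): a learned bilinear map that models multiplicative
# audio-visual interactions instead of treating the streams additively.
bilinear_fusion = nn.Bilinear(dim, dim, dim)
fused_blf = bilinear_fusion(audio, visual)

print(fused_concat.shape, fused_blf.shape)  # both torch.Size([4, 100, 128])

The reported 1% accuracy gain is consistent with the bilinear map capturing multiplicative audio-visual interactions that plain concatenation followed by a linear layer cannot express.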
Following the introduction, this article is organized as follows: The "Related Work" section discusses previously proposed datasets and related works. The "Dataset" section presents the MuMMER dataset and the newly built dataset. The "ADNet" section discusses the newly developed framework. The "Experiments" section presents the conducted experiments and a discussion. The final section concludes this article.
Related Work
This section briefly discusses the existing AD datasets, why a new dataset is needed, how previous AD approaches worked, and the audiovisual fusion approach.
Figure 1. An example of labeled faces in E-MuMMER. A green box indicates "addressing the robot" (label "0"), and a yellow box indicates "addressing another subject" (label "1").