IEEE Systems, Man and Cybernetics Magazine - April 2023 - 32

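$$\mathrm{Recall}\ (\mathrm{TPR}) = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \tag{2}$$

$$\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}. \tag{3}$$
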
F-Measure and Balance Accuracy
The F-measure (Fm) combines precision and recall in a single
score: it is the harmonic mean of the two. Balance accuracy
(Bal.Accu) normalizes the TP and TN counts by the numbers of
positive and negative samples, respectively, and averages the
two resulting rates. Though most previous works adopt accuracy
as their evaluation metric, we use Bal.Accu [37] to evaluate
our model performance because it remains informative when the
classes in the dataset are imbalanced. The formal definitions
of Fm and Bal.Accu [36] are as follows:
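$$\mathrm{Fm} = 2 \cdot \left( \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \right) \tag{4}$$

$$\mathrm{Bal.Accu} = \left( \frac{\mathrm{TPR} + \mathrm{TNR}}{2} \right). \tag{5}$$
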
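The definitions in (2) through (5) can be checked numerically with a minimal Python sketch. The `Confusion` class, the helper names, the example counts, and the convention of returning 0.0 for an empty denominator are illustrative assumptions, not part of the original work. Precision is assumed to follow the standard definition TP / (TP + FP).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier (hypothetical helper)."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(num: float, den: float) -> float:
    # Assumption: report 0.0 when the denominator is empty.
    return num / den if den else 0.0


def recall(c: Confusion) -> float:
    # TPR, Eq. (2): TP / (TP + FN)
    return _ratio(c.tp, c.tp + c.fn)


def tnr(c: Confusion) -> float:
    # TNR, Eq. (3): TN / (TN + FP)
    return _ratio(c.tn, c.tn + c.fp)


def precision(c: Confusion) -> float:
    # Standard definition, assumed here: TP / (TP + FP)
    return _ratio(c.tp, c.tp + c.fp)


def f_measure(c: Confusion) -> float:
    # Eq. (4): harmonic mean of precision and recall
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r)


def balanced_accuracy(c: Confusion) -> float:
    # Eq. (5): mean of TPR and TNR
    return (recall(c) + tnr(c)) / 2


def accuracy(c: Confusion) -> float:
    # Plain accuracy, shown only for comparison
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


# Made-up imbalanced data: 10 positives, 90 negatives, and a classifier
# that always predicts the negative class.
always_negative = Confusion(tp=0, fp=0, tn=90, fn=10)

print(f"accuracy  = {accuracy(always_negative):.2f}")
print(f"Bal.Accu  = {balanced_accuracy(always_negative):.2f}")
print(f"F-measure = {f_measure(always_negative):.2f}")
```

On these made-up counts, plain accuracy is 0.90 even though no positive sample is ever detected, while Bal.Accu is 0.50 and the F-measure is 0.00. This is the behavior that motivates preferring Bal.Accu on imbalanced classes.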
Figure 5. The end-to-end audiovisual AD. The front-end network
consists of the visual stream encoder (VSE) and audio stream
encoder (ASE). The back-end network consists of the CA module on
each stream, followed by BLF, SA, and AD. Block labels in the
figure: VSE, ASE, CA (one per stream), BLF, SA, and Prediction;
embeddings Ev, Ea, and Eav.
ADNet
Multimodal speech perception has recently attracted
research attention, with significant breakthroughs
in audiovisual approaches based on learning
methods [14], [15], [16]. Inspired by TALKNET [38],
which was recently proposed for active speaker
detection, we propose ADNet. ADNet is a two-stream,
end-to-end framework that takes variable-length
temporal segments of cropped face regions and the
corresponding audio segments as input and predicts,
for each video frame, whether the person is
addressing a robot or another person. As shown in
Figure 5, ADNet consists of a front-end network,
which learns the feature representation, and a
back-end network, which serves as the AD
classifier. The front-end network consists
of an audio stream encoder (ASE) and a
visual stream encoder (VSE). The ASE and VSE take
frame-based audio and visual signals as input and encode them
into audio and visual embeddings that represent the temporal
context. The back-end network consists of three modules:
1) the CA module, which dynamically associates the visual and
audio content; 2) the BLF, which fuses the two modalities; and
3) the SA module, which monitors the addressee activities from
the context at the utterance level. The network accepts input of
varying length, with cropped-face sequences ranging from a
single frame to any size, and predicts the addressee for each
frame.

The Front-End Network

Figure 6. The VSE framework: the visual front end,
which contains the Conv3D and ResNet-18, and the
visual temporal network. Conv1D: 1D convolutional
layer; Conv3D: 3D convolutional layer; DS-Conv1D:
depthwise separable convolutional layer; ReLU:
rectified linear unit; V-TCN: visual temporal
convolutional module; x5: times five. Block labels in
the figure: Visual Frontend (Conv3D, ResNet-18); Visual
Temporal Network (V-TCN, Conv1D); DS-Conv1D, ReLU, and BN;
Pointwise Addition.

The VSE
The VSE takes as input a stack of N consecutive 112 × 112
grayscale cropped face regions. It aims to learn a long-term
representation of the movement of the facial regions. The stream
contains two submodels, the visual front end and the visual
temporal network, shown in Figure 6. The objective is to
encode the visual stream into a sequence of visual embeddings
Ev that have the same time resolution. For the visual
front-end network, we adopted 3D-ResNet, a visual encoding
network widely used in various problems, such as lipreading,
audiovisual speech enhancement, and active
