
small block, and fusing the separate results together. In fact, this strategy has been used for many years in the fields of high-performance computing and very large databases (a minimal sketch follows this list).
◆ Parallelization. Parallelization means that a large problem is divided into smaller ones that can then be solved individually at the same time. There are several forms of parallel computing, such as bit-level, instruction-level, and task parallelism. It is noteworthy that parallelization cannot decrease the workload, but it can reduce the elapsed time. Not every problem or algorithm can be parallelized well; this depends strongly on the nature and structure of the problem (see the task-parallel sketch after this list).
◆ Incremental learning. Incremental learning gradually improves the parameters of a learning algorithm by using only new cases rather than all available cases (existing ones plus new ones). It is a step-by-step learning process: training is conducted only on newly arriving data blocks, and each block is used for training only once. It focuses on batch or streaming data. The major drawback of incremental learning is that the algorithm must have a good memory: for the data blocks already trained on, their knowledge is assumed to be remembered and preserved within the model, which is an obvious limitation of this strategy [12] (a toy sketch follows this list).
◆ Sampling. Sampling is a long-established technique in probability and statistics, with many classical results, both theoretical and technical. Commonly used sampling methods include simple random sampling, systematic sampling, stratified sampling, cluster sampling, quota sampling, minimum-maximum sampling, etc. [13]. Essentially, sampling studies the relation between a sample and the population. Traditional sampling theory does not focus on large-scale data sets, so with the coming of the big data era, many new difficulties emerge (a sampling sketch appears after this list).
◆ Granular computing. A recent study [14] describes granular computing (GrC) [15] as a general computation theory for effectively using granules, such as classes, clusters, subsets, groups, and intervals, to build an efficient computational model for complex applications with huge amounts of data. Intuitively, GrC reduces the data size into different levels of granularity. Under certain circumstances, some big data problems can be readily solved in such a way (see the granulation sketch after this list).
◆ Feature selection. Feature selection [16] is a kind of dimensionality-reduction method that aims to obtain a representative subset with fewer features than the original feature space. High-dimensional data belongs to the big data area. When the number of features is extremely large (for example, over 100 trillion features), unexpected difficulties may emerge during feature selection. The latest study [17] shows how to scale to ultrahigh-dimensional feature selection tasks on big data (a simple filter-style sketch follows this list).
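To make divide and conquer concrete, here is a minimal Python sketch (our illustration, not from the article): a global mean is computed by solving each small block independently and then fusing the partial results.

```python
# Divide and conquer: solve each small block, then fuse the results.
def blockwise_mean(data, block_size=1000):
    total, count = 0.0, 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]  # one small subproblem
        total += sum(block)                     # partial result
        count += len(block)
    return total / count                        # fusing step

print(blockwise_mean(list(range(10000))))       # 4999.5
```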
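The same blocks are independent, so they can also be processed at the same time. This hedged sketch uses Python's standard multiprocessing pool for task parallelism; note that the total workload is unchanged, only the elapsed time shrinks.

```python
# Task parallelism: independent blocks are summed by worker processes.
from multiprocessing import Pool

def block_sum(block):
    return sum(block)                 # work performed on one block

if __name__ == "__main__":
    data = list(range(10000))
    blocks = [data[i:i + 1000] for i in range(0, len(data), 1000)]
    with Pool(processes=4) as pool:   # workers run at the same time
        partials = pool.map(block_sum, blocks)
    print(sum(partials))              # same result, shorter wall time
```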
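For incremental learning, the toy below (ours, not the method of [12]) maintains a single parameter, a running mean, that is updated once per new data block; earlier blocks are never revisited, so all of their knowledge must live inside the model itself.

```python
# Incremental learning toy: each new block updates the parameter once
# and is then discarded; past knowledge is kept only inside the model.
class IncrementalMean:
    def __init__(self):
        self.mean, self.n = 0.0, 0

    def partial_fit(self, block):
        for x in block:                             # only new cases used
            self.n += 1
            self.mean += (x - self.mean) / self.n   # online update

model = IncrementalMean()
for block in ([1, 2, 3], [4, 5], [6]):              # streaming blocks
    model.partial_fit(block)                        # each trained once
print(model.mean)                                   # 3.5
```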
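The sampling sketch below (our illustration) contrasts two of the methods named above: simple random sampling over the whole population versus stratified sampling, which draws separately within each class to preserve the class proportions.

```python
# Simple random vs. stratified sampling from a small labeled population.
import random

random.seed(0)
population = [(x, "pos" if x % 3 == 0 else "neg") for x in range(30)]

srs = random.sample(population, k=6)           # simple random sampling

strata = {}                                    # stratified sampling:
for rec in population:                         # group records by class,
    strata.setdefault(rec[1], []).append(rec)  # then sample each stratum
stratified = [rec for group in strata.values()
              for rec in random.sample(group, k=3)]

print(srs)
print(stratified)
```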
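Granular computing admits many formalisms; as one hedged intuition only, the sketch below granulates raw numeric readings into equal-width intervals, so that coarser granularity levels yield smaller data representations.

```python
# Interval granulation: map raw values to equal-width interval granules.
def granulate(values, width):
    # each value is represented by the left endpoint of its interval
    return sorted({(v // width) * width for v in values})

readings = [0.3, 1.7, 2.2, 2.9, 7.5, 8.1]
print(granulate(readings, width=1.0))  # finer:   [0.0, 1.0, 2.0, 7.0, 8.0]
print(granulate(readings, width=5.0))  # coarser: [0.0, 5.0]
```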
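Finally, a filter-style feature-selection sketch: this is our simple stand-in for the more refined criteria of [16] and [17], keeping the k features with the highest variance and dropping uninformative ones.

```python
# Filter-style feature selection: keep the k highest-variance features.
def select_by_variance(rows, k):
    n, n_feat = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_feat)]
    var = [sum((r[j] - means[j]) ** 2 for r in rows) / n
           for j in range(n_feat)]
    return sorted(sorted(range(n_feat), key=lambda j: var[j],
                         reverse=True)[:k])

X = [[1.0, 5.0, 0.1],
     [1.1, 9.0, 0.1],
     [0.9, 2.0, 0.1]]
print(select_by_variance(X, k=2))  # keeps 0 and 1; feature 2 is constant
```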

Figure 2. Seven fundamental strategies for big data analytics: 1) divide and conquer, 2) parallelization, 3) incremental learning, 4) sampling, 5) granular computing, 6) feature selection, and 7) hierarchical classes.

Uncertainty-Based Big Data Learning
In recent years, there has been rapid growth in hybrid studies that integrate uncertainty and learning from data (e.g., [18]-[24]). The representation, measurement, modeling, and handling of the uncertainty embedded in the entire process of data analytics have a significant impact on the performance of learning from big data. Without properly dealing with these uncertainties, the performance of learning strategies may be greatly degraded.
Uncertainty Definition
Presently, there is no general definition of uncertainty that fits every situation; uncertainty is usually considered under a specific background. Five types of uncertainty are discussed here: Shannon entropy (SE) [21], classification entropy (CE) [23], fuzziness [18], [19], nonspecificity [22], and rough degree [24].
◆ Shannon entropy. Given a random variable $X = \{x_1, x_2, \ldots, x_n\}$ with probability distribution $P = \{p_1, p_2, \ldots, p_n\}$, the random uncertainty is measured by Shannon entropy:

$SE(P) = -\sum_{i=1}^{n} p_i \log_2 p_i .$

When $p_1 = p_2 = \cdots = p_n = 1/n$, $SE(P)$ attains its maximum value of $\log_2 n$ (checked numerically in the sketch after these definitions).
◆ Classification entropy. For a two-class problem, there is a data set $S$ in which each sample is labeled as either positive or negative. Classification entropy measures the impurity of the class distribution in $S$ and is defined as

$CE_2(P) = -\left[ \frac{|S^+|}{|S|} \log_2 \frac{|S^+|}{|S|} + \frac{|S^-|}{|S|} \log_2 \frac{|S^-|}{|S|} \right],$

where $S^+$ and $S^-$ denote the subsets of positive and negative samples in $S$.