Systems, Man & Cybernetics - April 2016 - 28

Figure 1. The six main challenges in big data analytics:
1) complex data representation, 2) super-high dimensionality,
3) massive classes, 4) weak relation, 5) unscalable
computation ability, and 6) ubiquitous uncertainty.

processing; it is indispensable. But because big data
is multimodal, it is very difficult to represent the
various types of data uniformly. Consequently, handling
big data with existing methodologies is almost
impossible. This is the first challenge of big data
analytics.
◆ Super-high dimensionality. Big data in specific
domains, especially bioinformatics and life-science
computing, is often extremely high dimensional. The
problem is that existing algorithms do not scale well
to high-dimensional data. Usually, as the data
dimension increases, the time or memory required grows
exponentially. This is the so-called curse of
dimensionality. Zhai et al. [4] gave a detailed
description of how rapidly data-set dimensions have
changed in scientific research over the past 25 years.
Many machine learning and data mining algorithms are
built on a distance measure in a metric space, for
instance, the popular k-nearest neighbor. Studies [5]
and [6] show that distance measures behave strangely in
a high-dimensional space: some fixed points become the
nearest neighbors of a disproportionate share of the
cases in the space. This phenomenon is called hubness,
and it indicates that the distance measure has lost
much of its effectiveness.
◆ Massive classes. In the big data era, we have to deal
with classification tasks involving thousands of classes,
such as large-scale recognition problems. Existing
classifiers can in principle be applied to these tasks,
but their performance degrades seriously.
Study [7] clearly describes the scale of the problem.
◆ Weak relation. A relation is more general than a mapping [8], [9], and finding a relation is more difficult than
finding a mapping when conducting big data analytics.
For example, the labels may be missing or cases may be
labeled erroneously in classification tasks. The high
cost of labeling cases leads to the weakly supervised
problem. Traditionally, we need to find a mapping
from one set of cases to another. In most big data
settings, we only need to find a relation between
two subsets of cases, because an exact mapping is
sometimes unnecessary and often impossible to find.
◆ Unscalable computation ability. Current computational
ability does not scale to big data problems, and
existing learning algorithms do not adapt well to the
new big data settings. Both problem complexity and
computational ability increase remarkably in the big
data era, but the growth of computational ability does
not keep pace with the growth of problem complexity.
When a data set grows from a regular size to a large
size with many types of attributes, some frequently
used data mining and machine learning algorithms, such
as a support vector machine, a neural network, a
decision tree, C-means, and C-modes, will not work
well. In many domains, a learning/mining algorithm is
recognized as effective for big data only if its
complexity is linear or quasi-linear.
◆ Ubiquitous uncertainty. Uncertainty exists in every
phase of big data learning [10]. For example, big data
is often noisy, and most attribute values of a case may
be missing (e.g., 80% to 90% of links are missing in
social networks, and over 90% of attribute values are
missing for a doctor's diagnosis in the clinical and
health fields). Some traditional learning algorithms
are clearly not valid for data with 90% missing values,
and designing new learning algorithms that can tackle
large-scale missing data is therefore difficult.
Moreover, many models can be selected for big data
processing. Because of the growing uncertainty in the
selection process, choosing an appropriate model based
on the formulated uncertainty is another big challenge.
The third difficulty is how to represent data
uncertainty well and how to incorporate it into the
mining process in the data analytics phase. From
normal-sized data to big data, does the uncertainty
increase or decrease? It depends. For example, for the
mean of a random variable, uncertainty will decrease
because of the law of large numbers, but for the
model selection problem, it will increase.
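The hubness effect mentioned under super-high dimensionality can be
observed on synthetic data. The sketch below is only an illustration
under stated assumptions, not the procedure used in [5] or [6]: it
draws i.i.d. Gaussian points, uses Euclidean distance, and measures
hubness as the skewness of the k-occurrence counts (how often each
point appears in the k-nearest-neighbor lists of the others). The
function names, the sample size, and the value of k are hypothetical
choices made for this example.

```python
import numpy as np


def k_occurrence(points: np.ndarray, k: int) -> np.ndarray:
    """Count how often each point appears among the k nearest neighbors of the others."""
    # Squared Euclidean distances computed from the Gram matrix.
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (points @ points.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    # Indices of the k smallest distances in each row (order within the k is irrelevant).
    nn = np.argpartition(d2, k, axis=1)[:, :k]
    return np.bincount(nn.ravel(), minlength=len(points))


def skewness(x: np.ndarray) -> float:
    """Sample skewness: third central moment divided by variance to the power 1.5."""
    c = np.asarray(x, dtype=float)
    c = c - c.mean()
    return float(np.mean(c ** 3) / np.mean(c ** 2) ** 1.5)


def hubness_skew(dim: int, n: int = 1000, k: int = 10, seed: int = 0) -> float:
    """Skewness of the k-occurrence counts for n Gaussian points in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    return skewness(k_occurrence(rng.standard_normal((n, dim)), k))


# Assumed illustrative settings: compare a low and a high dimension.
for dim in (3, 100):
    print(f"dim={dim:4d}  skewness of k-occurrence = {hubness_skew(dim):.2f}")
```

A larger positive skewness means that a few points (hubs) collect far
more than their share of nearest-neighbor slots, which is the symptom
described in the text.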
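The remark that the uncertainty of a mean decreases with more data can
be checked numerically. For n independent observations with standard
deviation sigma, the standard deviation of the sample mean is
sigma / sqrt(n). The sketch below is a minimal simulation under
assumptions chosen for this example only: an exponential distribution
with sigma = 1, 2000 repeated trials, and a hypothetical function name.

```python
import numpy as np


def spread_of_sample_mean(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Empirical standard deviation of the mean of n exponential(1) draws."""
    rng = np.random.default_rng(seed)
    samples = rng.exponential(scale=1.0, size=(trials, n))
    # One sample mean per trial; their spread estimates the uncertainty of the mean.
    return float(samples.mean(axis=1).std())


for n in (10, 100, 1000):
    empirical = spread_of_sample_mean(n)
    theoretical = 1.0 / np.sqrt(n)  # sigma / sqrt(n) with sigma = 1
    print(f"n={n:5d}  empirical={empirical:.4f}  sigma/sqrt(n)={theoretical:.4f}")
```

The simulation says nothing about model selection, where the text
argues that uncertainty grows instead.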
Current Strategies of Big Data Analytics
Fundamental strategies (shown in Figure 2) for big data
analytics may include divide-and-conquer, parallelization,
incremental learning, sampling, granular computing, feature selection, and hierarchical classes.
◆ Divide-and-conquer. As M. Jordan highlighted in
[11], divide-and-conquer is one of the fundamental
strategies for processing big data. It has three basic procedures: going from big to small, processing in every


