IEEE Power & Energy Magazine - May/June 2018 - 21

```
[Figure 3: demand (kWh) by time of day, Monday through Sunday, showing the 10th, 25th, 50th, 75th, and 90th percentiles. Panel label: 1881.]

Figure 3. The demand distribution of the least typical household out of the 500 smart meters included in the analysis.

Typical and Anomalous Households
To study the whole group of household demand distributions, we first compute the differences in electricity consumption patterns between pairs of households. Statistically
speaking, we call these differences distances. Note that the
distance used here refers to the distance between two probability distributions rather than the physical distance between
two houses. One way to measure the distance between two
distributions is the Jensen-Shannon divergence. We have 336
probability distributions per household, one for each half-hour
period of the week, so we have 336 Jensen-Shannon distance
measures for each pair of households. We can measure the
overall distance between the distributions from two households by summing these 336 Jensen-Shannon distance measures. In this way, we can find the distance between each pair
of households in the data set.
From these pairwise distances, we can compute a measure
of the typicality of a specific household by seeing how many
similar houses are nearby according to the Jensen-Shannon divergence. If there are many households with similar
probability distributions, the typicality measure will be
high. But if there are few similar households, the typicality
measure will be low. This gives us a way to find anomalies in the data set, which are the smart meters corresponding to the least typical households. The most anomalous
(i.e., least typical) household is shown in Figure 3. This is
clearly a very strange demand distribution, with extremely
low demand almost all of the time, reflected by almost
overlapping percentiles.

Visualization via Embedding
The pairwise distances between households can also be used to
create a plot of all households together. If we compute 99 percentiles for 48 half hours per day and seven days a week, each
of the household distributions can be thought of as a vector
in K-dimensional space where K = 99 × 48 × 7 = 33,264.
To easily visualize these, we need to project them onto a 2-D
space. There are several ways of doing this, such as principal component analysis (PCA) and multidimensional scaling.
We have used a Laplacian eigenmap method to keep the most
similar points in K-dimensional space as close as possible in
the 2-D space.
Figure 4 shows a 2-D embedding of the 500 households in
this data set. The colors are taken from the measure of typicality, with the most typical 1% of points shown in red and the

[Figure 4 panel title: Laplacian Embedding of Smart-Meter Distributions. Vertical axis: Component 2.]

distribution) because the data set contains a large number of
zeros, making the distribution a mixture of discrete and continuous components. The high skewness of the data and the
nonnegative nature of demand make it problematic to use
kernel density estimates.
There are several advantages to working with percentiles
rather than the data directly. Problems with missing observations and the specific timing of household events (e.g.,
parties) are avoided, and attention is focused on the typical
behavior of a household throughout the week. Although only
five percentiles are shown in Figure 2, we actually compute
percentiles for probabilities 1, 2, ..., 99%.

[Figure 4 plot area. Horizontal axis: Component 1. Legend (HDRs): 1, 50, 99, >99.]

Figure 4. A 2-D representation of the data from all 500
households. The most typical points are shown in red,
and the most unusual are shown in black. HDR: high
density region.

```
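
The pairwise distance described under "Typical and Anomalous Households" can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each of the 336 half-hourly distributions has already been turned into a discrete histogram over a common set of bins, and the function names `js_divergence` and `household_distance` are hypothetical. The divergence is computed in base 2, so each term lies between 0 and 1; taking its square root would give the Jensen-Shannon distance in the strict metric sense.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions defined on the same bins (assumed representation)."""

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; positive wherever p or q is positive.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def household_distance(hists_a, hists_b):
    """Sum the per-period divergences over all half-hour periods of the week."""
    if len(hists_a) != len(hists_b):
        raise ValueError("both households need the same number of periods")
    return sum(js_divergence(p, q) for p, q in zip(hists_a, hists_b))


# Toy example: two households with 336 two-bin histograms each.
always_low = [[1.0, 0.0]] * 336
always_high = [[0.0, 1.0]] * 336
print(household_distance(always_low, always_low))   # identical households
print(household_distance(always_low, always_high))  # completely different households
```

Identical households are at distance 0, and two households whose distributions never overlap reach the maximum of 336, one unit per half-hour period.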

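
The typicality measure counts, loosely, how many similar households lie near a given one. The text does not give the exact formula, so the sketch below makes an assumption: each household is scored by the average of a Gaussian kernel applied to its distances from all other households. The kernel choice, the `bandwidth` parameter, and the function names are illustrative only.

```python
import math


def typicality_scores(dist, bandwidth):
    """Score each household by how many others are close to it.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    The Gaussian kernel and bandwidth are assumptions of this sketch.
    """
    n = len(dist)
    scores = []
    for i in range(n):
        total = sum(
            math.exp(-((dist[i][j] / bandwidth) ** 2))
            for j in range(n)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


def least_typical(dist, bandwidth):
    """Index of the household with the lowest typicality score."""
    scores = typicality_scores(dist, bandwidth)
    return min(range(len(scores)), key=scores.__getitem__)


# Toy example: households 0-2 are close together, household 3 is far from all.
D = [
    [0, 1, 1, 10],
    [1, 0, 1, 10],
    [1, 1, 0, 10],
    [10, 10, 10, 0],
]
print(least_typical(D, bandwidth=2.0))
```

In the toy matrix, the isolated household receives a score near zero and is flagged as the anomaly, which mirrors how the least typical smart meter is identified in the text.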

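
Computing the 99 percentiles for every half-hour period of the week can be sketched with the Python standard library. The input format here is an assumption: an iterable of `(period_of_week, kwh)` pairs, with the period running from 0 to 335 and `None` marking a missing reading. The function name `weekly_percentiles` is hypothetical, and the interpolation method is one of several reasonable choices.

```python
import statistics
from collections import defaultdict

PERIODS_PER_WEEK = 48 * 7          # 336 half-hour periods
N_PERCENTILES = 99                 # probabilities 1%, 2%, ..., 99%
K = N_PERCENTILES * PERIODS_PER_WEEK  # 33,264 values per household


def weekly_percentiles(readings):
    """Return {period: [p1, p2, ..., p99]} for one household.

    readings: iterable of (period_of_week, kwh); kwh is None when missing.
    Missing readings are simply skipped, so gaps do not distort the result.
    """
    by_period = defaultdict(list)
    for period, kwh in readings:
        if kwh is not None:
            by_period[period].append(kwh)
    return {
        period: statistics.quantiles(values, n=100, method="inclusive")
        for period, values in by_period.items()
        if len(values) >= 2  # quantiles needs at least two observations
    }


# Toy example: one period with readings 0.0, 1.0, ..., 100.0 and one gap.
readings = [(0, float(v)) for v in range(101)] + [(0, None)]
percentiles = weekly_percentiles(readings)
print(len(percentiles[0]), percentiles[0][49])  # number of percentiles, median
```

Stacking the 99 values for all 336 periods gives the K = 33,264-dimensional vector per household that the embedding step works with.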