```
H0: µND − µSD < −∆ or µND − µSD > +∆ (not equivalent)
vs.
HA: −∆ < µND − µSD < +∆ (equivalent)
The threshold delta (∆) represents a range of difference
that is not large enough to have any clinical or functional
implications. The idea is that small differences in product
performance are not always functionally important.
In many cases in early product development, we know
the target of our product performance, but we rarely know
its specification until after our manufacturing capability in
producing the product is developed.
If the specifications are known in advance, delta must be
smaller than the specification limits so the entire distribution of the new product's performance still falls within
specification. Delta must also reflect, at the chosen
significance level, the risk that product performance does
not conform to its specification and so causes
dissatisfaction or harm to users.
We can now restate H0 as follows:
H0-1: µND − µSD < −∆ and H0-2: µND − µSD > +∆
To prove the equivalence of two designs, we reject H0-1 to
conclude that the difference between the two product performances
is larger than the minimum allowable difference (−∆). We
reject H0-2 to conclude that the difference between the two product
performances is smaller than the maximum allowable
difference (+∆). If we reject both of these hypotheses, then
the difference falls between the minimum and maximum
allowable differences.
In practice, the difference will almost never be exactly zero.
If both hypotheses are rejected, however, we can say that the
difference is small and has no practical significance, that is,
no meaningfully different impact on the customers who use the
product.
This leads to the most basic form of equivalence
testing, the two one-sided tests (TOST) procedure shown in Figure 2,
which means we essentially must compute a test statistic for each
of the two one-sided tests.
Because we perform a series of tests, we must reduce the
error in each individual test so the cumulative error of the
overall result is still within the alpha significance level.
We declare the two group means equivalent at the delta
level if and only if both H0-1 and H0-2 are rejected. If the
confidence interval for the difference, at a given confidence
level, is completely contained
in the interval with endpoints −∆ and +∆, then we declare
equivalence. Unlike classical testing, we want to be able
to say the difference is very likely close to zero (beyond random
chance). As delta decreases, the test tends to reject
equivalence; as delta increases, it tends to pass equivalence
(HA: −∆ < µND − µSD < +∆, or
the area between the broken lines).
If the test doesn't reject the null hypothesis, we can't
claim equivalence. The probability that the test rejects the
null hypothesis when it is false is the power of the
TOST. This probability increases with sample size. We want
the test to reject the null when the designs truly are
equivalent, but it might not if the sample
size is too small, so sample size is important.
Sample size calculation also depends on the effect size,
here the ratio between delta and the standard deviation.
Choose a sample size of 30 to 50 for each
group to have good test power (>80%); going beyond that is
probably an unnecessary use of resources.3 Larger samples
are needed if the effect size is small.
The appropriate statistical method to verify a new
design's product performance is therefore to use equivalence testing in the following six steps:
FIGURE 2
Equivalence test designed to reject a null regarding a difference
That is, µND − µSD < −∆, then to reject a difference of the opposite kind, that is, µND − µSD > +∆. Having rejected both, the small difference −∆ < µND − µSD < +∆ is the negligible effect size.
[Figure: probability curves with panels labeled H0-1: µND − µSD < −∆ / HA-1: µND − µSD > −∆; H0-2: µND − µSD > +∆ / HA-2: µND − µSD < +∆; and HA: −∆ < µND − µSD < +∆. x-axis: Product performance (−6 to 6, with −∆ and +∆ marked); y-axis: Probability of occurrence (0 to 0.4).]

1. Determine the equivalence value, delta (∆), so that if
```
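
The TOST decision rule described above (reject both H0-1 and H0-2, or check that the confidence interval for the difference lies inside −∆ to +∆) can be sketched in a few lines of Python. This is a minimal illustration, not the article's own procedure: it swaps in a normal (z) approximation for the t distribution, which is only reasonable for the larger group sizes discussed above; it runs each one-sided test at the same `alpha`, which is a parameter you can tighten; and the function name `tost_equivalence` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_equivalence(new, std, delta, alpha=0.05):
    """Two one-sided tests (TOST) on mean(new) - mean(std).

    Assumption: normal approximation to the sampling distribution of the
    difference in means (a t-based version would be used for small samples).
    """
    diff = mean(new) - mean(std)
    # Standard error of the difference; the two variances may differ.
    se = sqrt(variance(new) / len(new) + variance(std) / len(std))
    z = NormalDist()
    # H0-1: diff < -delta. Small p-value means diff is credibly above -delta.
    p_lower = 1.0 - z.cdf((diff + delta) / se)
    # H0-2: diff > +delta. Small p-value means diff is credibly below +delta.
    p_upper = z.cdf((diff - delta) / se)
    # Matching (1 - 2*alpha) confidence interval for the difference.
    crit = z.inv_cdf(1.0 - alpha)
    ci = (diff - crit * se, diff + crit * se)
    return {
        "diff": diff,
        "p_lower": p_lower,
        "p_upper": p_upper,
        "ci": ci,
        # Equivalence is declared only if BOTH one-sided nulls are rejected.
        "equivalent": max(p_lower, p_upper) < alpha,
    }


# Hypothetical measurements: 40 units per design, means 10.00 and 10.05.
new_design = [10.0 + 0.1 * ((i % 5) - 2) for i in range(40)]
std_design = [10.05 + 0.1 * ((i % 5) - 2) for i in range(40)]

result = tost_equivalence(new_design, std_design, delta=0.2)
print(result["equivalent"], result["ci"])
```

Under this approximation, declaring equivalence from the two p-values and checking that the returned interval sits strictly inside (−∆, +∆) are the same decision, so either can be reported.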

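
To make the link between sample size, effect size (delta divided by the standard deviation) and TOST power concrete, the sketch below computes approximate power for two equal-sized groups and searches for the smallest per-group sample size that reaches a target power. It rests on assumptions that are not from the article: a known common standard deviation `sigma`, a normal approximation, and a true difference `true_diff` that defaults to zero. The names `tost_power` and `smallest_n` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tost_power(n, delta, sigma, true_diff=0.0, alpha=0.05):
    """Approximate probability that TOST declares equivalence.

    Assumptions: n units per group, known common sigma, normal approximation.
    """
    z = NormalDist()
    se = sigma * sqrt(2.0 / n)       # standard error of the mean difference
    crit = z.inv_cdf(1.0 - alpha)    # one-sided critical value
    # Probability that the observed difference lands in the region where
    # both one-sided nulls are rejected.
    power = (z.cdf((delta - true_diff) / se - crit)
             + z.cdf((delta + true_diff) / se - crit) - 1.0)
    return max(0.0, power)           # region is empty when n is too small


def smallest_n(delta, sigma, target=0.80, true_diff=0.0, alpha=0.05,
               n_max=10_000):
    """Smallest per-group n whose approximate power reaches the target."""
    for n in range(2, n_max + 1):
        if tost_power(n, delta, sigma, true_diff, alpha) >= target:
            return n
    return None                      # target not reachable within n_max


# Hypothetical effect sizes (delta / sigma): smaller ratios need larger n.
for ratio in (1.0, 0.75, 0.5):
    print(ratio, smallest_n(delta=ratio, sigma=1.0))
```

The search is a plain linear scan because the approximate power is cheap to evaluate; a t-based or simulation-based calculation would be the natural replacement when the standard deviation has to be estimated from small samples.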