Fig. 5. Steps needed for the classification of a harvester based on sound recording. (a) Using a conventional neural network. (b) Using a convolutional neural network.

can have a great effect on preparing the data to be suitable for training [47]. Normalization was done by dividing each of the above features by the total number of frequency samples N used:

$$\hat{f}_i = \frac{f_i}{N}, \quad i = 1, 2, \ldots, 6.$$

As a classifier, a neural network with one hidden layer of five nodes, trained with backpropagation, yielded an accuracy of 78.26% when using 5000 samples of audio segments. The total time required for feature extraction and final classification was 2.5 ms on an Asus laptop with a 64-bit Intel i7 CPU @ 2.6 GHz and 16 GB of RAM. We used MATLAB R2017b with parallel processing via an NVIDIA GeForce GTX 960M GPU card with 10 GB of memory. Fig. 5a shows the steps taken.

Classification Based on a CNN

The absolute value of the Short-Time Fourier Transform (STFT), known as the spectrogram (SG), was used to obtain images. A seven-layer CNN was used. The layers are:

1. Image input layer with 'zerocenter' normalization
2. Convolution layer, eight 8x8x3 convolutions with stride [1 1] and no padding
3. ReLU layer
4. Average pooling layer, 2x2 average pooling with stride [2 2] and no padding
5. Fully connected layer
6. Softmax layer
7. Classification output layer, 'crossentropyex'

Using the same number of samples as before, the CNN achieved an accuracy of 97.97% in an execution time below the

IEEE Instrumentation & Measurement Magazine April 2021
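To make the layer list above more concrete, the following minimal sketch traces how the activation shape and the number of learnable parameters change through the convolution, pooling, and fully connected layers. It is written in plain Python for illustration only and is not the authors' MATLAB implementation. The 64x64 input size and the two output classes are hypothetical values chosen for the example; the text does not state either. The function names out_size and cnn_shapes are likewise introduced here.

```python
def out_size(n, kernel, stride, padding=0):
    """Output length along one axis for a sliding window (convolution or pooling)."""
    padded = n + 2 * padding
    if padded < kernel:
        raise ValueError("window is larger than the (padded) input")
    return (padded - kernel) // stride + 1


def cnn_shapes(height, width, channels=3, num_filters=8,
               conv_kernel=8, pool=2, num_classes=2):
    """Trace activation shapes and parameter counts through the layer stack.

    num_classes and the input size are assumptions, not values from the article.
    """
    # Layer 2: eight 8x8x3 convolutions, stride [1 1], no padding.
    conv = (out_size(height, conv_kernel, 1),
            out_size(width, conv_kernel, 1),
            num_filters)
    # Each filter has 8*8*3 weights plus one bias.
    conv_params = num_filters * (conv_kernel * conv_kernel * channels + 1)

    # Layer 3 (ReLU) leaves the shape unchanged and has no parameters.

    # Layer 4: 2x2 average pooling, stride [2 2], no padding.
    pooled = (out_size(conv[0], pool, pool),
              out_size(conv[1], pool, pool),
              num_filters)

    # Layer 5: fully connected layer from the flattened features to class scores.
    fc_inputs = pooled[0] * pooled[1] * pooled[2]
    fc_params = fc_inputs * num_classes + num_classes  # weights + biases

    return {
        "conv": conv,
        "conv_params": conv_params,
        "pool": pooled,
        "fc_inputs": fc_inputs,
        "fc_params": fc_params,
    }


# Hypothetical 64x64 RGB spectrogram image and two classes.
shapes = cnn_shapes(64, 64)
print(shapes)  # conv: (57, 57, 8), pool: (28, 28, 8)
```

With these assumed values, almost all learnable parameters sit in the fully connected layer (12546 versus 1544 in the convolution layer), which is typical for a network with a single convolution stage and no padding.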