The CPA Qualification Method Based on the Gaussian Curve Fitting

The Correlation Power Analysis (CPA) attack is an attack on cryptographic devices, especially smart cards. The results of the attack are correlation traces. Based on the correlation traces, an evaluation is done to observe whether significant peaks appear in the traces. The evaluation is done manually by experts. If significant peaks appear, the smart card is not considered secure, since it is assumed that the secret key is revealed. We develop a method that objectively detects peaks and decides which peak is significant. We conclude that, using the Gaussian curve fitting method, the subjective qualification of peak significance can be objectified. Thus, better decisions can be taken by security experts. We also conclude that the Gaussian curve fitting method is able to show the influence of peak sizes, especially the width and height, on the significance of a particular peak.


INTRODUCTION
Cryptographic devices [1] are electronic devices that implement a cryptographic algorithm and store keys. An example of a cryptographic device is a smart card. A smart card is a device that has the same size as a credit card. It is able to store and process data by using an integrated chip. To process data, the chip performs a cryptographic algorithm that employs a secret key. Any attempt to extract the keys stored in the cryptographic device in an unauthorized way is called an attack. One class of attacks that poses a serious threat to the security of cryptographic devices is the class of side-channel attacks. A side-channel attack exploits information gained from the physical implementation of a cryptographic device, for example timing information, power consumption, and electromagnetic leaks.
One type of side-channel attack is the Correlation Power Analysis (CPA) attack. It is a refinement of another type of side-channel attack, the Differential Power Analysis (DPA) attack, which was first introduced in 1999 [2]. The CPA attack, introduced in 2004 [3], is a multi-bit DPA that takes into account the linear relationship between the power consumption curve and the Hamming model. In general, this attack exploits the fact that the power consumption of a cryptographic device depends on the data it processes and the operation it performs [1]. By conducting this attack, an attacker may obtain the secret keys used in the cryptographic algorithm employed by the device.
In this paper, we focus on the CPA attack on smart cards. CPA is relatively easy to carry out and has a high success rate. The attacker does not need detailed knowledge of the smart card. It is sufficient to know the steps of the cryptographic algorithm that the smart card executes. That is why a lot of research is done to improve the security of smart cards against this attack.
The result of the CPA attack is represented by correlation traces [3]. Based on the correlation traces, an evaluation is done to observe whether significant peaks appear in the traces. If significant peaks appear, the smart card is not considered secure, since it is assumed that the secret key is revealed. If there are no significant peaks, the smart card is considered secure. The higher and steeper the peaks, the stronger the attack and the less secure the smart card is.
The difficulty is to objectively decide whether a peak is significant enough to be called a peak. To support the decision making process, we develop a method to detect peaks and to decide which peak is significant.

THE CPA ATTACK
The CPA attack is based on two important concepts, i.e., leakage function and bit/byte trace.
A leakage function [4] is an abstraction used to represent the physical output of a side channel, monitored by some measurement setup. The input of a leakage function is a plaintext that will be processed by a cryptographic device. In the CPA attack, the output of this leakage function is the power consumption of the cryptographic device, sampled with a fixed sampling frequency while the device processes the input plaintext. In this paper, the output of the leakage function is called a power trace.
In practice, a power trace from a smart card is obtained by measuring the power consumption of the smart card while it processes a binary input. The power trace is not the end result of the process, but an intermediate result. For example, if a smart card employs a cryptographic algorithm with several rounds, where each round uses one specific secret key, the power trace is taken after one round is finished. A byte trace [1] is an approach to monitor a predictable byte during the course of the process. In the context of power analysis, the byte trace approach is applied to check the leakage of a cryptographic device. The result of the byte trace approach is a correlation coefficient between the input and the power trace at one time point. The resulting correlation coefficients over all time points form a correlation trace.
The investigation to check whether the smart card is leaking is based on the correlation trace plot (see Figure 1). A high peak in the plot means that the investigated byte has a high correlation with the power consumption at the time point at which the peak appears. This already shows that some information is leaking from the smart card. A more detailed explanation of the steps of the CPA attack can be found in [1].
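The byte trace computation described above can be illustrated with a small sketch. It is not the measurement setup used in this paper: the Hamming weight of the input byte is assumed as the predicted leakage, the data are toy values, and the names `hamming_weight`, `pearson`, and `correlation_trace` are hypothetical helpers introduced only for illustration.

```python
import math

def hamming_weight(value: int) -> int:
    """Number of set bits in a byte-sized integer (assumed leakage model)."""
    return bin(value).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no measurable correlation
    return cov / math.sqrt(var_x * var_y)

def correlation_trace(predictions, power_traces):
    """Return one correlation coefficient per time sample.

    predictions[j]     : predicted leakage for measurement j
    power_traces[j][t] : measured power of measurement j at sample t
    """
    n_samples = len(power_traces[0])
    return [
        pearson(predictions, [trace[t] for trace in power_traces])
        for t in range(n_samples)
    ]

# Toy data: only sample 1 depends linearly on the Hamming weight of the input.
inputs = [0x00, 0x0F, 0xFF, 0x3C, 0x01]
predictions = [hamming_weight(b) for b in inputs]
power_traces = [[1.0, 2.0 * hw + 0.5, 3.0 - (j % 2)]
                for j, hw in enumerate(predictions)]
trace = correlation_trace(predictions, power_traces)
```

In this toy example the coefficient at sample 1 is close to 1, which is the kind of peak an evaluator would look for in a real correlation trace.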

RELATED WORKS
Methods to evaluate and compare side-channel attacks are discussed in [5]. Some simple numerical examples of leakage functions, and illustrations of how the functions could be evaluated and understood, are given in [6]. The methods are based on two metrics: an information theoretic metric and a security metric. However, these two metrics cannot be used to solve our problem, since they need a lot of power traces, each obtained using a different set of input plaintexts. The more power traces are provided, the more accurate the results are. In practice, power trace measurements are very expensive. Thus, carrying out a lot of measurements to get results for one smart card is not practical for companies.
A method to detect peaks by using the short-time FFT is discussed in [7]. The method also includes noise removal techniques. It is developed for MALDI data, which behave differently from our data.
Peak detection methods using the wavelet transformation are introduced in [8] and [9]. The methods consider some characteristic shapes to identify peaks. However, the characteristic shapes introduced in these papers cannot be adapted to our problem.
A method to quantify peaks is discussed in [10]. The method is developed for mass spectra related to protein mixtures. A mass spectrum contains peaks corresponding to proteins in a sample. A statistical mixture model is developed to quantify peaks. However, the quantification mostly depends on peak height.

THE SIGNIFICANT PEAK DETECTION APPROACH
Our approach to determine whether a peak is significant consists of two main methods. First, we develop a method to assign a score to each peak found in a correlation trace. This method is based on the Gaussian curve fitting method. Second, based on the resulting peak scores, we determine whether a peak is significant using the average score distance computation and cluster analysis.

The Gaussian Curve Fitting Method
We develop a method based on the Gaussian curve fitting method to give a score to each peak found in a correlation trace.
Since correlation traces typically have too many sample points, we downsample them first. The resulting downsampled correlation trace is stored in a vector called local_maxima. The main idea of this approach is to fit a curve to the local_maxima vector of each correlation trace and to qualify each peak found in the fitted curve. We choose a sum of several Gaussian functions to fit the local maxima of the correlation trace. The Gaussian function is formulated as

$$f(x) = a \exp\left(-\frac{(x-b)^2}{2c^2}\right) \tag{1}$$

where $a$ is the height of the curve, $b$ is the center of the curve, and $c$ is the width of the curve.
Since the values of a and b can be obtained from the correlation trace, we only have to estimate c before we can apply the curve fitting function.
Suppose that we have a set of points $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ that we want to fit to a curve. In Equation 1, suppose that $a = y_t$ and $b = x_m$. The value of $c$ is estimated in several steps, the first of which is to compute $c_i$ for each $(x_i, y_i)$, with the given $a$ and $b$, using a formula derived from Equation 1.
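One possible reading of this width estimation is sketched below. It assumes the common form $f(x) = a \exp(-(x-b)^2 / (2c^2))$ for the Gaussian function, so that each point gives $c_i = |x_i - b| / \sqrt{2 \ln(a / y_i)}$. How the individual $c_i$ are combined into a single $c$ is an assumption of this sketch (the median is used); the helper names `gaussian` and `estimate_width` are hypothetical.

```python
import math
from statistics import median

def gaussian(x, a, b, c):
    """Gaussian with height a, center b, and width c (assumed form)."""
    return a * math.exp(-((x - b) ** 2) / (2.0 * c ** 2))

def estimate_width(points, a, b):
    """Estimate c from (x, y) points, given the height a and the center b.

    Each usable point yields c_i by inverting the Gaussian.
    Assumption: the c_i values are combined with the median.
    """
    widths = []
    for x, y in points:
        # The inversion is undefined at the center and outside 0 < y < a.
        if x == b or not (0.0 < y < a):
            continue
        widths.append(abs(x - b) / math.sqrt(2.0 * math.log(a / y)))
    if not widths:
        raise ValueError("no usable points to estimate the width")
    return median(widths)

# Synthetic check: sample a known Gaussian and recover its width.
true_a, true_b, true_c = 0.8, 5.0, 1.5
points = [(x, gaussian(x, true_a, true_b, true_c)) for x in range(0, 11)]
c_hat = estimate_width(points, true_a, true_b)
```

On noise-free samples every $c_i$ equals the true width, so the choice of aggregate does not matter; on real correlation traces a robust aggregate such as the median would be less sensitive to points far from the peak.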
After we get c, we are ready to apply the Gaussian Curve Fitting to our correlation traces. The general algorithm we use to apply the method is given below.
The inputs of the algorithm are X, a correlation trace in which X(i) is the correlation coefficient for sample i, and window_size, the size of the sliding window. The outputs are the peak location, the peak height, the peak area, the peak inverse width, the scores of all peaks found, and the normalized scores of all peaks found. The steps of the algorithm are as follows:
1. Apply a sliding window of size window_size to the absolute value of the correlation trace X and find the global maximum of all values within the window. Slide the window without overlap and repeat the same operation until the window reaches the last sample. Form a vector local_maxima that contains the resulting maxima. In this way, all values within a window are replaced by their global maximum. The global maximum is chosen to represent the values within one window because we are interested in significant peaks.
2. Split the vector local_maxima into shorter vectors such that each shorter vector contains at least one maximum and one minimum value. Each shorter vector belongs to one Gaussian function and should contain at least three members. The shorter vectors are formed using the following steps:
(a) Suppose that local_maxima = $\{l_1, l_2, \ldots, l_q\}$, with $l_i$ the $i$th member of the local_maxima vector and $q$ the length of local_maxima. Suppose that we start from $l_r$ [...]
3-4. [...] by using the steps explained previously.
5. Determine the peak location for each shorter vector. Since the position of the peak need not be a sample point, we increase the resolution by a factor of 10.
6. Obtain the peak properties, i.e., the peak height and the inverse width, from the values of $a$ and $1/c$, respectively.
7. Compute the area below each Gaussian function to get the peak area. Suppose that $\{l_v, l_{v+1}, \ldots, l_{v+m}\}$ is a shorter vector that belongs to one Gaussian function. The area $A$ is then computed over this vector.
8. Since the peak properties have different scales, rescale each peak property so that its values lie between 0 and 1. Suppose that $Y_p = \{y_{1,p}, y_{2,p}, \ldots, y_{m,p}\}$ is the set of values of peak property $p$, with $m$ the number of peaks found and $i$ referring to the $i$th peak. The rescaled value is computed as

$$\text{rescaled\_y}_{i,p} = \frac{y_{i,p}}{\max(Y_p)}$$

9. Suppose that $p_1$, $p_2$, and $p_3$ are the three peak properties defined previously, namely the peak height, the peak area, and the peak inverse width. Also suppose that $S = \{s_1, s_2, \ldots, s_m\}$ is the set of peak scores, with $m$ the number of peaks found and $i$ referring to the $i$th peak. Compute the peak score $s_i$ from $p_1$, $p_2$, and $p_3$. The scores obtained are between 0 and 1.
10. Compute the normalized peak score norm_s$_i$ for the $i$th peak found.

To maintain the stability of the resulting scores, which can be reduced by the downsampling, we employ four different window sizes to compute the scores. We start the score calculation from the largest window size and proceed to the smallest. The result of this method is therefore a matrix in which each row contains the four scores (one per window size) of one peak found in the correlation trace.
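The downsampling in step 1 and the rescaling in step 8 can be sketched as follows. This is only an illustration on toy numbers: the names `window_maxima` and `rescale` are hypothetical, the last window is assumed to be allowed to be shorter than window_size, and the peak properties are assumed to be non-negative.

```python
def window_maxima(trace, window_size):
    """Maximum of |trace| in each non-overlapping window.

    Assumption: a final, shorter window is kept rather than dropped.
    """
    abs_trace = [abs(v) for v in trace]
    return [
        max(abs_trace[start:start + window_size])
        for start in range(0, len(abs_trace), window_size)
    ]

def rescale(values):
    """Divide each value by the largest one so that results lie in [0, 1].

    Assumption: all values are non-negative.
    """
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    return [v / largest for v in values]

# Toy correlation trace with eight samples and a window of three samples.
trace = [0.1, -0.4, 0.2, 0.05, -0.1, 0.3, -0.9, 0.2]
local_maxima = window_maxima(trace, window_size=3)

# Toy peak heights for three peaks, rescaled to the range [0, 1].
heights = rescale([0.2, 0.5, 0.25])
```

Repeating `window_maxima` for each of the four window sizes, from the largest to the smallest, would give the inputs for the per-window scores collected in the score matrix.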

The Peak Score Evaluation
After we obtain scores for all peaks found in a correlation trace, we investigate whether the peak with the highest score is a significant peak. We develop two methods to check the properties of the highest score peak when it is compared with the other peaks. The decision whether a peak is significant is based on the results given by both methods. Thus, the methods do not work independently. The methods are explained below.

Average score distance
One characteristic of a significant peak is that its score should be much greater than those of the other peaks. Therefore, we compute the average score distance between the highest peak score and the other peak scores in one correlation trace.
Suppose that $S_i = \{s_{i,1}, s_{i,2}, s_{i,3}, s_{i,4}\}$ is the multivariate score of the $i$th peak after window$_1$, window$_2$, window$_3$, and window$_4$ are applied, respectively. The average score distance is computed from these multivariate scores.
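A rough sketch of such a distance is given below. The exact definition is an assumption of this sketch: the distance between two peaks is taken as the mean absolute difference of their scores over the window sizes, the highest score peak is taken to be the one with the largest mean score, and the result is averaged over all other peaks. The function name `average_score_distance` and the score values are hypothetical.

```python
def average_score_distance(scores):
    """Return (index of the top peak, its average distance to the others).

    scores[i] is the list of scores of peak i, one per window size.
    """
    if len(scores) < 2:
        raise ValueError("need at least two peaks")
    # Assumption: the highest score peak has the largest mean score.
    top = max(range(len(scores)),
              key=lambda i: sum(scores[i]) / len(scores[i]))
    distances = []
    for i, row in enumerate(scores):
        if i == top:
            continue
        # Assumption: mean absolute difference across the window sizes.
        diffs = [abs(s_top - s) for s_top, s in zip(scores[top], row)]
        distances.append(sum(diffs) / len(diffs))
    return top, sum(distances) / len(distances)

# Toy score matrix: three peaks, four window sizes each.
scores = [
    [0.10, 0.12, 0.08, 0.10],
    [0.90, 0.95, 0.85, 0.90],
    [0.20, 0.15, 0.25, 0.20],
]
top, distance = average_score_distance(scores)
```

With this definition the distance stays between 0 and 1 whenever the scores do, and a value near 1 indicates that one peak dominates all others.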

Cluster analysis
We consider cluster analysis useful to show whether a peak is significant. If a peak is significant, we assume that its score is clearly different from the rest of the scores. By applying cluster analysis, we would like to show that if a peak is significant, its score becomes the unique member of a cluster, while the other scores are grouped in a different cluster.
In practice, we apply cluster analysis to the multivariate peak scores to form two clusters of peak scores within one correlation trace. The peak score clustering is done using the Statgraphics Centurion software. At the moment, we use Ward's method (see [11]) to cluster the peak scores, with the Euclidean distance as the distance between two peak scores. Other clustering methods may also be used without significant differences in the results. If the highest peak score is the unique member of a cluster, it becomes more likely that the highest peak score is significant.
We consider that a significant peak should have a score of at least 0.50. The value 0.50 is motivated by probability theory: a probability of 0.50 means that an event is equally likely to happen or not. We analyze the clustering results further using the following steps:
1. Consider the cluster containing the highest score peak.
2. If the cluster has one member and the score of the member is higher than 0.50, then the member is a significant peak.
3. If the cluster has more than one member, check the scores of all members. If all members' scores are higher than 0.50, then the highest score peak is a significant peak. If not, then the highest score peak is not significant.
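The three steps above can be expressed as a short decision function. The sketch assumes that the cluster labels have already been produced by a two-cluster analysis (for example with Ward's method in any statistics package) and that each peak has been reduced to a single scalar score; how the four window scores are reduced to one number is not specified here. The name `is_significant` is hypothetical.

```python
def is_significant(scores, labels, threshold=0.50):
    """Decide whether the highest score peak is significant.

    scores[i] : scalar score of peak i (assumed already reduced to one number)
    labels[i] : cluster label of peak i from a two-cluster analysis
    """
    # Step 1: find the cluster that contains the highest score peak.
    top = max(range(len(scores)), key=lambda i: scores[i])
    members = [s for s, lab in zip(scores, labels) if lab == labels[top]]
    # Steps 2 and 3: every member of that cluster, whether it is the only
    # one or not, must score higher than the threshold.
    return all(s > threshold for s in members)

# The top peak is alone in its cluster and scores above 0.50.
alone_high = is_significant([0.10, 0.70, 0.20], [0, 1, 0])
# The top peak shares its cluster with a peak that scores below 0.50.
shared_low = is_significant([0.40, 0.70, 0.20], [1, 1, 0])
```

Steps 2 and 3 collapse into a single check because a one-member cluster is just the special case in which the only member is the highest score peak itself.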

EXPERIMENTAL RESULTS
We were provided with three data sets by Brightsight B.V., a security evaluation laboratory located in Delft, the Netherlands. The data sets were sampled from a smart card, with a sampling frequency of 500 MHz, while it processed input plaintexts. The operation used in the process consists of 16 rounds of an XOR operation defined as c = p ⊕ k, with c a ciphertext, p an input plaintext, and k a secret key. Each data set consists of power traces and 16 correlation traces taken from 250000 time points, with one correlation trace obtained from each processing round.
The first data set, called Data_NoCountermeasures, was obtained from a smart card without any countermeasures. The second and third data sets, called Data_FewDummyCycles and Data_MoreDummyCycles, respectively, were obtained with some dummy cycles. Dummy cycles are processes that are more or less identical to each other. In practice, the dummy cycles do nothing and are irrelevant to the process carried out by the smart card. The dummy cycles are inserted randomly, using a hardware random function, to make the smart card more secure.
We apply the Gaussian curve fitting method to the 16 correlation traces obtained from the byte trace approach for each data set. For each correlation trace, the result of this step is a list of all peaks found in the trace, along with the peak properties and the score of each peak. Table 1 shows the result for one window size, i.e., 1250 samples, on the first correlation trace of Data_NoCountermeasures.
Table 1 shows that 15 peaks are found in the correlation trace. All peak properties are normalized so that their values lie between 0 and 1. The fifth peak has the highest score, 0.7073. Using the Gaussian curve fitting method, we can thus replace the original correlation trace with scores.
We then apply the evaluation methods to determine the average score distances and to decide whether the highest score peak is significant. In this section, we provide the results of the evaluation methods on peak scores computed for four window sizes, i.e., 2200, 1500, 800, and 100 samples. The results are given in Tables 2, 3, and 4. Each table contains the average score distance, the significant peak decision, the location of the highest score peak, and the height of the highest score peak for the 16 correlation traces of Data_NoCountermeasures, Data_FewDummyCycles, and Data_MoreDummyCycles, respectively. The significant peak decision takes the value 0 or 1: the value 0 indicates that the highest score peak is not significant, and the value 1 indicates that it is.
Table 2 shows that, among all 16 correlation traces of Data_NoCountermeasures, only one does not have a significant peak. The other correlation traces have a significant peak, with an average score distance generally higher than 0.80. We also observe that the peak locations generally lie around the same point, between time points 90000 and 97000. Based on these results, we conclude that the smart card is not secure.
From Table 3, we observe that most of the significant peaks disappear once dummy cycles are added. The average score distances of the highest score peaks found in Data_FewDummyCycles are mostly greater than 0.50, yet most of these peaks are not significant. The peak locations are also no longer concentrated in a certain time range. This shows that adding some dummy cycles improves the security of the smart card.
Data_MoreDummyCycles was obtained from the smart card with dummy cycles inserted every 4 to 20 cycles. This means that the data contain more dummy cycles than Data_FewDummyCycles. Consistent with this fact, the results in Table 4 show that only one correlation trace now has a significant peak, with a rather low average score distance. This shows that, even though this countermeasure setting does not make the smart card completely secure, it is more secure than the other settings.

CONCLUDING REMARKS
We conclude that the Gaussian curve fitting method is able to give a score to each peak found in a correlation trace. The scores represent the original correlation trace. The average score distance is able to represent the peak significance by a number, while the cluster analysis method is able to represent the peak significance by showing to which cluster the highest peak score belongs. Using the Gaussian curve fitting method, the subjective qualification of peak significance can be objectified. Thus, better decisions can be taken by security experts. We also conclude that the Gaussian curve fitting method is able to show the influence of peak sizes, especially the width and height, on the significance of a particular peak.