Error Analysis of digitally converted Chart Data using Chart2Data

1. Error Sources of Chart Conversion

1.1. Discretization Error

Features in a plot that are smaller than one pixel cannot be resolved in the converted data. Assuming the data area is 1000 by 1000 pixels and both the x- and y-axis range from 0 to 1, the discretization error is 1/1000. This value assumes that the data area has been cut out accurately. Discretization by hand with a ruler produces a discretization error of 1/100, assuming the plot is 10 cm by 10 cm and the smallest unit on the ruler is 1 mm. Thus, in this example, discretization from an image file is 10 times more accurate.
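The comparison above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `discretization_error` and its parameters are illustrative and not part of Chart2Data.

```python
def discretization_error(axis_range, n_steps):
    """Smallest resolvable step on an axis, in axis units.

    axis_range: span of the axis (max - min) in data units
    n_steps: number of resolvable steps (pixels or ruler ticks)
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    return axis_range / n_steps


# Image: 1000 pixels across an axis running from 0 to 1.
image_error = discretization_error(1.0, 1000)

# Ruler: a 10 cm plot read with 1 mm resolution gives 100 steps.
ruler_error = discretization_error(1.0, 100)

ratio = ruler_error / image_error
print(image_error, ruler_error, ratio)
```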

1.2. Error due to inaccurate Cut out

This error depends on the alignment of the plot: if the y-axis is exactly vertical and the x-axis exactly horizontal, the cut-out can be done very accurately. If the frame of the data area is only one pixel thick, the cut-out error is one pixel. Experience has shown that the cut-out can be done with an accuracy of 2 pixels.

1.3. Error due to rotated image

Scanned plots are likely to be slightly rotated. To demonstrate this error, consider a plot of 1000 by 1000 pixels containing a line that originates at x=0, y=0 with a slope m=1. The plot is assumed to be rotated by 0.5°, which is visible to the naked eye. The influence on the accuracy of the x- and y-values is shown in Fig. 1.

The error due to rotation of the image is smallest in the center of the image, so the error is evaluated at the edges of the plot. The error on the y-axis also increases significantly with increasing slope of the curve. Fortunately this error source can be compensated by counter-rotating the image with image processing software such as GIMP2, so it is neglected in the following.

error_fig_1

Fig. 1: Systematic error due to rotated image.
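The size of the rotation error can be estimated with a short sketch. It assumes that the image is rotated about its center, which is consistent with the error being smallest there; the helper `rotation_error` is a hypothetical name, not a Chart2Data function.

```python
import math


def rotation_error(x, y, angle_deg, center=(500.0, 500.0)):
    """Pixel displacement (dx, dy) of a point when the image is
    rotated by angle_deg about `center` (assumed rotation center)."""
    theta = math.radians(angle_deg)
    cx, cy = center
    rx, ry = x - cx, y - cy
    x_rot = cx + rx * math.cos(theta) - ry * math.sin(theta)
    y_rot = cy + rx * math.sin(theta) + ry * math.cos(theta)
    return x_rot - x, y_rot - y


# Points on the line y = x (slope m = 1) in a 1000 x 1000 pixel plot.
for x in (0.0, 500.0, 1000.0):
    dx, dy = rotation_error(x, x, 0.5)
    print(f"x = y = {x:6.1f}: dx = {dx:+.2f} px, dy = {dy:+.2f} px")
```

Under these assumptions the displacement vanishes at the center and grows to roughly 4 pixels per axis at the corners for a rotation of 0.5°.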

1.4. Error due to distorted plots

When old or frequently copied sheets are used to digitize a plot, the plot may be distorted so that the data area is no longer perfectly rectangular. This leads to errors on one or both axes. If the distortion is too large, the technique is not useful because the result would be too inaccurate. Based on experience, a distortion error of 2 pixels is assumed.

2. Error propagation

The discretization error, the cut-out error, and the distortion error are independent and random for both axes. Errors of this kind can add up or cancel out. In this case the absolute uncertainties propagate according to the following equations:

Error in x-axis:

error_eq_1

Error in y-axis:

error_eq_2

Here index 1 stands for the discretization error, index 2 for the cut-out error, and index 3 for the distortion error.

This leads to a combined absolute uncertainty for x and y of:

error_eq_3_4
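As a numerical check, the combination can be sketched in Python. The sketch assumes that the three independent errors are combined as a root sum of squares, which is consistent with the value of 3 pixels used below for errors of 1, 2 and 2 pixels; `combine_errors` is an illustrative name.

```python
import math


def combine_errors(*errors):
    """Root sum of squares of independent absolute errors (in pixels)."""
    return math.sqrt(sum(e * e for e in errors))


# Discretization (1 px), cut-out (2 px) and distortion (2 px).
delta_x = combine_errors(1, 2, 2)
delta_y = combine_errors(1, 2, 2)
print(delta_x, delta_y)  # 3.0 3.0
```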

A total error δ123 that combines δ123(x) and δ123(y) cannot be expressed in general, because the influence of δ123(x) on δ123(y) depends strongly on the shape of the digitized curve (its geometric slope). The influence depends on the geometric slope rather than on the slope of the function because the two axes of the plot can have different scales. This is demonstrated in Fig. 1. For a linear curve with a geometric slope m=1, the error δ123(x) induces an error of the same magnitude on the y-axis (δ123(y) = δ123(x)), and the total error is δ123 = ((3 pixels)^2 + (3 pixels)^2)^0.5 = 4.2 pixels. For a curve with a geometric slope m=2, the error on the y-axis is δ123(y) = 2δ123(x) (δ123 = 9.4 pixels). Thus for slowly varying curves the slope at the point of interest is a measure of how δ123(x) translates into δ123(y).
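The difference between the slope of the function and the geometric slope can be made concrete with a small sketch. All names and the example axis ranges are assumptions chosen for illustration; the geometric slope is taken here as the function slope scaled by the ratio of the pixel densities of the two axes.

```python
import math


def geometric_slope(function_slope, x_range, y_range, width_px, height_px):
    """Slope of the curve measured in pixels (rise in px per run in px)."""
    px_per_x_unit = width_px / x_range
    px_per_y_unit = height_px / y_range
    return function_slope * px_per_y_unit / px_per_x_unit


def induced_y_error(delta_x_px, slope_px):
    """y-error in pixels caused by an x-error on a locally linear curve."""
    return abs(slope_px) * delta_x_px


# Hypothetical plot: x from 0 to 10, y from 0 to 5, 1000 x 1000 pixels.
m = geometric_slope(1.0, x_range=10.0, y_range=5.0,
                    width_px=1000, height_px=1000)
print(m)  # 2.0: a function slope of 1 appears twice as steep in pixels

delta_x = 3.0  # combined x-error in pixels, as in the text
print(induced_y_error(delta_x, 1.0))  # 3.0
print(induced_y_error(delta_x, m))    # 6.0

# Total error for m = 1, as in the text: sqrt(3^2 + 3^2)
print(round(math.hypot(delta_x, induced_y_error(delta_x, 1.0)), 1))  # 4.2
```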

3. Sensitivity Study

An actual conversion from chart to data is investigated to check the error propagation presented in the previous section. A function that can be reproduced exactly is converted for this purpose.

3.1. Conversion of a reproducible Sine-function

A plot of the function y=sin(x) is printed on a sheet. This sheet is scanned with an arbitrary rotation to demonstrate the compensation of the error due to rotated plots (see Fig. 2 and Fig. 3). After preparation of the scanned image as described at the beginning of this section the image is converted to data (see Fig. 4).

error_fig_2

Fig. 2: Original scan (rotated arbitrarily).

error_fig_3

Fig. 3: Rotation compensated by a 1.0° counterclockwise rotation.

error_fig_4

Fig. 4: Converted values of the scan compared to the original function values.

The error sources for this conversion are: (1) the discretization error (1 pixel) and (2) the error due to inaccurate cut-out (2 pixels). The distortion error can be neglected because the plot is scanned from a freshly printed sheet.

With these error sources the error propagation is as follows:

error_eq_5_6

Geometric slope m=1:

error_eq_7

Geometric slope m=2.7:

error_eq_8

error_fig_5

Fig. 5: Absolute error of converted data.

Fig. 5 shows the absolute errors of the conversion from image to data for the scanned sine function. This plot was created by subtracting the scanned y-values from the y-values calculated by the function y(x) = sin(x).

The actual errors lie within the range predicted by the error propagation for this specific conversion (up to 6.3 pixels).

The plot also shows that the errors are large where the geometric slope is large (see x = π, 2π, 3π).
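The subtraction used to create Fig. 5 can be sketched as follows. The sample values are invented placeholders, not the data behind Fig. 5, and the scale factor `px_per_y_unit` is an assumed parameter used to express the error in pixels.

```python
import math


def absolute_errors(x_values, y_scanned, px_per_y_unit=1.0):
    """Reference value sin(x) minus scanned y, scaled to pixels."""
    return [(math.sin(x) - y) * px_per_y_unit
            for x, y in zip(x_values, y_scanned)]


# Invented sample points for illustration only.
x_values = [0.0, math.pi / 2, math.pi]
y_scanned = [0.004, 0.998, -0.010]

errors = absolute_errors(x_values, y_scanned, px_per_y_unit=250.0)
for x, e in zip(x_values, errors):
    print(f"x = {x:.3f}: error = {e:+.2f} px")
```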