<!-- This HTML file has been created by texi2html 1.54+ (gsl)
from ../gsl-ref.texi -->
<TITLE>GNU Scientific Library -- Reference Manual - Statistics</TITLE>
<!-- <LINK rel="stylesheet" title="Default Style Sheet" href="/css/texinfo.css" type="text/css"> -->
<link href="gsl-ref_21.html" rel=Next>
<link href="gsl-ref_19.html" rel=Previous>
<link href="gsl-ref_toc.html" rel=ToC>
<p>Go to the <A HREF="gsl-ref_1.html">first</A>, <A HREF="gsl-ref_19.html">previous</A>, <A HREF="gsl-ref_21.html">next</A>, <A HREF="gsl-ref_50.html">last</A> section, <A HREF="gsl-ref_toc.html">table of contents</A>.
<H1><A NAME="SEC327" HREF="gsl-ref_toc.html#TOC327">Statistics</A></H1>
<A NAME="IDX1673"></A>
<A NAME="IDX1674"></A>
<A NAME="IDX1675"></A>
<A NAME="IDX1676"></A>
<A NAME="IDX1677"></A>
<A NAME="IDX1678"></A>
<A NAME="IDX1679"></A>
<A NAME="IDX1680"></A>
<A NAME="IDX1681"></A>
<A NAME="IDX1682"></A>
This chapter describes the statistical functions in the library. The
basic statistical functions include routines to compute the mean,
variance and standard deviation. More advanced functions allow you to
calculate absolute deviations, skewness, and kurtosis as well as the
median and arbitrary percentiles. The algorithms use recurrence
relations to compute average quantities in a stable way, without large
intermediate values that might overflow.
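To illustrate the idea of a recurrence relation for an average, here is a minimal plain-C sketch of a running mean over a strided array. It is not GSL's own source; the helper name <CODE>running_mean</CODE> is hypothetical, and the update rule shown is one standard form of running-mean recurrence. Each step moves the current mean by a fraction of the difference to the next element, so no large running sum is ever formed.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GSL): mean of the n elements
   data[0], data[stride], ..., data[(n-1)*stride], computed with the
   recurrence m_{k+1} = m_k + (x_k - m_k) / (k + 1), so that no large
   running sum is ever formed.  Returns 0 for n == 0. */
double running_mean (const double data[], size_t stride, size_t n)
{
  double mean = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* element i of the dataset is stored at data[i * stride] */
      mean += (data[i * stride] - mean) / (double) (i + 1);
    }

  return mean;
}
```

With a stride of 2, for example, <CODE>running_mean (a, 2, m)</CODE> averages only the entries <CODE>a[0]</CODE>, <CODE>a[2]</CODE>, ..., which is how the <VAR>stride</VAR> argument lets one column of interleaved data be processed in place.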
The functions are available in versions for datasets in the standard
floating-point and integer types. The versions for double precision
floating-point data have the prefix <CODE>gsl_stats</CODE> and are declared in
the header file <TT>'gsl_statistics_double.h'</TT>. The versions for integer
data have the prefix <CODE>gsl_stats_int</CODE> and are declared in the header
file <TT>'gsl_statistics_int.h'</TT>.
<H2><A NAME="SEC328" HREF="gsl-ref_toc.html#TOC328">Mean, Standard Deviation and Variance</A></H2>
<DT><U>Statistics:</U> double <B>gsl_stats_mean</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1683"></A>
This function returns the arithmetic mean of <VAR>data</VAR>, a dataset of
length <VAR>n</VAR> with stride <VAR>stride</VAR>. The arithmetic mean, or
<I>sample mean</I>, is denoted by \Hat\mu and defined as,
<PRE class="example">
\Hat\mu = (1/N) \sum x_i
</PRE>
where x_i are the elements of the dataset <VAR>data</VAR>. For
samples drawn from a Gaussian distribution the variance of
\Hat\mu is \sigma^2 / N.
<DT><U>Statistics:</U> double <B>gsl_stats_variance</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1684"></A>
This function returns the estimated, or <I>sample</I>, variance of
<VAR>data</VAR>, a dataset of length <VAR>n</VAR> with stride <VAR>stride</VAR>. The
estimated variance is denoted by \Hat\sigma^2 and is defined by,
<PRE class="example">
\Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2
</PRE>
where x_i are the elements of the dataset <VAR>data</VAR>. Note that
the normalization factor of 1/(N-1) results from the derivation
of \Hat\sigma^2 as an unbiased estimator of the population
variance \sigma^2. For samples drawn from a Gaussian distribution
the variance of \Hat\sigma^2 itself is 2 \sigma^4 / (N-1).
This function computes the mean via a call to <CODE>gsl_stats_mean</CODE>. If
you have already computed the mean then you can pass it directly to
<CODE>gsl_stats_variance_m</CODE>.
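The relationship between <CODE>gsl_stats_variance</CODE> and <CODE>gsl_stats_variance_m</CODE> described above can be sketched in plain C as follows. This is a straightforward two-pass illustration of the 1/(N-1) formula, assuming n >= 2; the helper names are hypothetical and this is not the code GSL uses internally.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the definition of gsl_stats_variance_m:
   the sum of squared deviations from a supplied mean, divided by (n - 1).
   Assumes n >= 2. */
double sample_variance_m (const double data[], size_t stride, size_t n,
                          double mean)
{
  double sum_sq = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      const double delta = data[i * stride] - mean;
      sum_sq += delta * delta;
    }

  return sum_sq / (double) (n - 1);
}

/* Hypothetical helper mirroring gsl_stats_variance: compute the mean
   first (first pass), then reuse it in sample_variance_m (second pass). */
double sample_variance (const double data[], size_t stride, size_t n)
{
  double mean = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    mean += data[i * stride];
  mean /= (double) n;

  return sample_variance_m (data, stride, n, mean);
}
```

For the five values 17.2, 18.1, 16.5, 18.3 and 12.6 the squared deviations from the mean 16.54 sum to 21.492, so both helpers give 21.492 / 4 = 5.373.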
<DT><U>Statistics:</U> double <B>gsl_stats_variance_m</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>mean</VAR>)</I>
<DD><A NAME="IDX1685"></A>
This function returns the sample variance of <VAR>data</VAR> relative to the
given value of <VAR>mean</VAR>. The function is computed with \Hat\mu
replaced by the value of <VAR>mean</VAR> that you supply,
<PRE class="example">
\Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2
</PRE>
<DT><U>Statistics:</U> double <B>gsl_stats_sd</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1686"></A>
<DT><U>Statistics:</U> double <B>gsl_stats_sd_m</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>mean</VAR>)</I>
<DD><A NAME="IDX1687"></A>
The standard deviation is defined as the square root of the variance.
These functions return the square root of the corresponding variance
functions above.
<DT><U>Statistics:</U> double <B>gsl_stats_variance_with_fixed_mean</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>mean</VAR>)</I>
<DD><A NAME="IDX1688"></A>
This function computes an unbiased estimate of the variance of
<VAR>data</VAR> when the population mean <VAR>mean</VAR> of the underlying
distribution is known <EM>a priori</EM>. In this case the estimator for
the variance uses the factor 1/N and the sample mean
\Hat\mu is replaced by the known population mean \mu,
<PRE class="example">
\Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2
</PRE>
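A minimal plain-C sketch of the fixed-mean formula follows, to make the difference from the 1/(N-1) estimator concrete. The helper name <CODE>variance_fixed_mean</CODE> is hypothetical and this is not GSL's implementation; it assumes n >= 1.

```c
#include <stddef.h>

/* Hypothetical helper following the fixed-mean formula
   (1/N) * sum (x_i - mu)^2, where mu is the known population mean.
   No degree of freedom is used up, because mu is not estimated
   from the data.  Assumes n >= 1. */
double variance_fixed_mean (const double data[], size_t stride, size_t n,
                            double mu)
{
  double sum_sq = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      const double delta = data[i * stride] - mu;
      sum_sq += delta * delta;
    }

  return sum_sq / (double) n;
}
```

For the two values 1 and 3 with a known mean of 2 this gives (1 + 1)/2 = 1, whereas the sample variance about the estimated mean 2 would be (1 + 1)/1 = 2 because of the 1/(N-1) factor.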
<DT><U>Statistics:</U> double <B>gsl_stats_sd_with_fixed_mean</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>mean</VAR>)</I>
<DD><A NAME="IDX1689"></A>
This function calculates the standard deviation of <VAR>data</VAR> for a
fixed population mean <VAR>mean</VAR>. The result is the square root of the
corresponding variance function.
<H2><A NAME="SEC329" HREF="gsl-ref_toc.html#TOC329">Absolute deviation</A></H2>
<DT><U>Statistics:</U> double <B>gsl_stats_absdev</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1690"></A>
This function computes the absolute deviation from the mean of
<VAR>data</VAR>, a dataset of length <VAR>n</VAR> with stride <VAR>stride</VAR>. The
absolute deviation from the mean is defined as,
<PRE class="example">
absdev = (1/N) \sum |x_i - \Hat\mu|
</PRE>
where x_i are the elements of the dataset <VAR>data</VAR>. The
absolute deviation from the mean provides a more robust measure of the
width of a distribution than the variance. This function computes the
mean of <VAR>data</VAR> via a call to <CODE>gsl_stats_mean</CODE>.
<DT><U>Statistics:</U> double <B>gsl_stats_absdev_m</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>mean</VAR>)</I>
<DD><A NAME="IDX1691"></A>
This function computes the absolute deviation of the dataset <VAR>data</VAR>
relative to the given value of <VAR>mean</VAR>,
<PRE class="example">
absdev = (1/N) \sum |x_i - mean|
</PRE>
This function is useful if you have already computed the mean of
<VAR>data</VAR> (and want to avoid recomputing it), or wish to calculate the
absolute deviation relative to another value (such as zero, or the
median value).
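The absolute deviation about an arbitrary centre can be sketched in plain C as below. The helper name <CODE>abs_deviation_about</CODE> is hypothetical and this is not GSL's implementation; the absolute value is taken with a comparison, so no maths library is needed.

```c
#include <stddef.h>

/* Hypothetical helper following the absdev formula with a
   caller-supplied centre: (1/N) * sum |x_i - centre|.
   The centre may be the mean, the median, zero, etc.
   Assumes n >= 1. */
double abs_deviation_about (const double data[], size_t stride, size_t n,
                            double centre)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      const double delta = data[i * stride] - centre;
      sum += (delta < 0.0) ? -delta : delta;   /* |delta| */
    }

  return sum / (double) n;
}
```

For the values 1, 2, 3, 4 the deviation about the mean 2.5 is (1.5 + 0.5 + 0.5 + 1.5)/4 = 1, and about zero it is 10/4 = 2.5.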
<H2><A NAME="SEC330" HREF="gsl-ref_toc.html#TOC330">Higher moments (skewness and kurtosis)</A></H2>
<DT><U>Statistics:</U> double <B>gsl_stats_skew</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1692"></A>
This function computes the skewness of <VAR>data</VAR>, a dataset of length
<VAR>n</VAR> with stride <VAR>stride</VAR>. The skewness is defined as,
<PRE class="example">
skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3
</PRE>
where x_i are the elements of the dataset <VAR>data</VAR>. The skewness
measures the asymmetry of the tails of a distribution.
The function computes the mean and estimated standard deviation of
<VAR>data</VAR> via calls to <CODE>gsl_stats_mean</CODE> and <CODE>gsl_stats_sd</CODE>.
<DT><U>Statistics:</U> double <B>gsl_stats_skew_m_sd</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>mean</VAR>, double <VAR>sd</VAR>)</I>
<DD><A NAME="IDX1693"></A>
This function computes the skewness of the dataset <VAR>data</VAR> using the
given values of the mean <VAR>mean</VAR> and standard deviation <VAR>sd</VAR>,
<PRE class="example">
skew = (1/N) \sum ((x_i - mean)/sd)^3
</PRE>
This function is useful if you have already computed the mean and
standard deviation of <VAR>data</VAR> and want to avoid recomputing them.
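The skewness formula with a supplied mean and standard deviation can be sketched in plain C as follows. The helper name <CODE>skew_given</CODE> is hypothetical and this is not GSL's implementation; like the function above, it uses <VAR>mean</VAR> and <VAR>sd</VAR> exactly as passed and does not check that they were estimated from the data.

```c
#include <stddef.h>

/* Hypothetical helper following the skewness formula with a supplied
   mean and standard deviation: (1/N) * sum ((x_i - mean)/sd)^3.
   Assumes n >= 1 and sd != 0. */
double skew_given (const double data[], size_t stride, size_t n,
                   double mean, double sd)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      const double z = (data[i * stride] - mean) / sd;  /* standardized */
      sum += z * z * z;
    }

  return sum / (double) n;
}
```

A symmetric set such as 1, 2, 3 about mean 2 gives zero skewness, while 0, 0, 3 with mean 1 and sd 1 gives (-1 - 1 + 8)/3 = 2, reflecting the long right tail.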
<DT><U>Statistics:</U> double <B>gsl_stats_kurtosis</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1694"></A>
This function computes the kurtosis of <VAR>data</VAR>, a dataset of length
<VAR>n</VAR> with stride <VAR>stride</VAR>. The kurtosis is defined as,
<PRE class="example">
kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4) - 3
</PRE>
The kurtosis measures how sharply peaked a distribution is, relative to
its width. The kurtosis is normalized to zero for a Gaussian
distribution.
<DT><U>Statistics:</U> double <B>gsl_stats_kurtosis_m_sd</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>mean</VAR>, double <VAR>sd</VAR>)</I>
<DD><A NAME="IDX1695"></A>
This function computes the kurtosis of the dataset <VAR>data</VAR> using the
given values of the mean <VAR>mean</VAR> and standard deviation <VAR>sd</VAR>,
<PRE class="example">
kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3
</PRE>
This function is useful if you have already computed the mean and
standard deviation of <VAR>data</VAR> and want to avoid recomputing them.
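A plain-C sketch of the kurtosis formula with a supplied mean and standard deviation is shown below; note the subtraction of 3, which makes a Gaussian distribution score zero. The helper name <CODE>kurtosis_given</CODE> is hypothetical and this is not GSL's implementation.

```c
#include <stddef.h>

/* Hypothetical helper following the kurtosis formula with a supplied
   mean and standard deviation:
   ((1/N) * sum ((x_i - mean)/sd)^4) - 3.
   Assumes n >= 1 and sd != 0. */
double kurtosis_given (const double data[], size_t stride, size_t n,
                       double mean, double sd)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      const double z = (data[i * stride] - mean) / sd;
      const double z2 = z * z;
      sum += z2 * z2;          /* fourth power of the standardized value */
    }

  return sum / (double) n - 3.0;
}
```

For 0, 0, 0, 4 with mean 1 and sd 1 the fourth powers are 1, 1, 1, 81, giving 84/4 - 3 = 18; the single outlier dominates the sum.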
<H2><A NAME="SEC331" HREF="gsl-ref_toc.html#TOC331">Autocorrelation</A></H2>
<DT><U>Function:</U> double <B>gsl_stats_lag1_autocorrelation</B> <I>(const double <VAR>data</VAR>[], const size_t <VAR>stride</VAR>, const size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1696"></A>
This function computes the lag-1 autocorrelation of the dataset <VAR>data</VAR>,
<PRE class="example">
a_1 = (\sum_{i = 2}^{n} (x_i - \Hat\mu) (x_{i-1} - \Hat\mu))
      / (\sum_{i = 1}^{n} (x_i - \Hat\mu)^2)
</PRE>
<DT><U>Function:</U> double <B>gsl_stats_lag1_autocorrelation_m</B> <I>(const double <VAR>data</VAR>[], const size_t <VAR>stride</VAR>, const size_t <VAR>n</VAR>, const double <VAR>mean</VAR>)</I>
<DD><A NAME="IDX1697"></A>
This function computes the lag-1 autocorrelation of the dataset
<VAR>data</VAR> using the given value of the mean <VAR>mean</VAR>.
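The lag-1 autocorrelation with a supplied mean can be sketched in plain C as follows: the numerator pairs each element with its predecessor (n - 1 products), and the denominator sums all n squared deviations. The helper name <CODE>lag1_autocorr_given</CODE> is hypothetical and this is a direct two-sum illustration, not GSL's implementation.

```c
#include <stddef.h>

/* Hypothetical helper following the lag-1 autocorrelation formula
   with a supplied mean.  Assumes n >= 1 and that not all elements
   equal the mean (otherwise the denominator is zero). */
double lag1_autocorr_given (const double data[], size_t stride, size_t n,
                            double mean)
{
  double num = 0.0, den = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      const double d_i = data[i * stride] - mean;
      den += d_i * d_i;                                  /* all n terms */
      if (i > 0)
        num += d_i * (data[(i - 1) * stride] - mean);    /* n - 1 pairs */
    }

  return num / den;
}
```

For 1, 2, 3, 4 about the mean 2.5 the numerator is 0.75 - 0.25 + 0.75 = 1.25 and the denominator is 5, giving a_1 = 0.25.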
<H2><A NAME="SEC332" HREF="gsl-ref_toc.html#TOC332">Covariance</A></H2>
<A NAME="IDX1698"></A>
<DT><U>Function:</U> double <B>gsl_stats_covariance</B> <I>(const double <VAR>data1</VAR>[], const size_t <VAR>stride1</VAR>, const double <VAR>data2</VAR>[], const size_t <VAR>stride2</VAR>, const size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1699"></A>
This function computes the covariance of the datasets <VAR>data1</VAR> and
<VAR>data2</VAR>, which must both be of the same length <VAR>n</VAR>,
<PRE class="example">
covar = (1/(n - 1)) \sum_{i = 1}^{n} (x_i - \Hat x) (y_i - \Hat y)
</PRE>
where x_i and y_i are the elements of <VAR>data1</VAR> and <VAR>data2</VAR>,
and \Hat x and \Hat y are their means.
<DT><U>Function:</U> double <B>gsl_stats_covariance_m</B> <I>(const double <VAR>data1</VAR>[], const size_t <VAR>stride1</VAR>, const double <VAR>data2</VAR>[], const size_t <VAR>stride2</VAR>, const size_t <VAR>n</VAR>, const double <VAR>mean1</VAR>, const double <VAR>mean2</VAR>)</I>
<DD><A NAME="IDX1700"></A>
This function computes the covariance of the datasets <VAR>data1</VAR> and
<VAR>data2</VAR> using the given values of the means, <VAR>mean1</VAR> and
<VAR>mean2</VAR>. This is useful if you have already computed the means of
<VAR>data1</VAR> and <VAR>data2</VAR> and want to avoid recomputing them.
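Because each dataset has its own stride, two columns stored in one interleaved array can be passed directly. The following plain-C sketch of the covariance formula with supplied means shows this; the helper name <CODE>covariance_given</CODE> is hypothetical and this is not GSL's implementation.

```c
#include <stddef.h>

/* Hypothetical helper following the covariance formula with supplied
   means: (1/(n-1)) * sum (x_i - mean1)(y_i - mean2).
   The two datasets may have different strides.  Assumes n >= 2. */
double covariance_given (const double data1[], size_t stride1,
                         const double data2[], size_t stride2,
                         size_t n, double mean1, double mean2)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    sum += (data1[i * stride1] - mean1) * (data2[i * stride2] - mean2);

  return sum / (double) (n - 1);
}
```

For example, with the pairs (1, 2), (2, 4), (3, 6) stored as <CODE>double xy[6] = {1, 2, 2, 4, 3, 6};</CODE>, the call <CODE>covariance_given (xy, 2, xy + 1, 2, 3, 2.0, 4.0)</CODE> reads x from the even entries and y from the odd entries and returns 4/2 = 2.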
<H2><A NAME="SEC333" HREF="gsl-ref_toc.html#TOC333">Weighted Samples</A></H2>
The functions described in this section allow the computation of
statistics for weighted samples. The functions accept an array of
samples, x_i, with associated weights, w_i. Each sample
x_i is considered as having been drawn from a Gaussian
distribution with variance \sigma_i^2. The sample weight
w_i is defined as the reciprocal of this variance, w_i =
1/\sigma_i^2. Setting a weight to zero corresponds to removing a
sample from a dataset.
<DT><U>Statistics:</U> double <B>gsl_stats_wmean</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1701"></A>
This function returns the weighted mean of the dataset <VAR>data</VAR> with
stride <VAR>stride</VAR> and length <VAR>n</VAR>, using the set of weights <VAR>w</VAR>
with stride <VAR>wstride</VAR> and length <VAR>n</VAR>. The weighted mean is defined as,
<PRE class="example">
\Hat\mu = (\sum w_i x_i) / (\sum w_i)
</PRE>
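A plain-C sketch of the weighted mean follows; it also shows that a zero weight removes a sample, as stated above. The helper name <CODE>weighted_mean</CODE> is hypothetical and this is not GSL's implementation; it assumes the weights do not sum to zero.

```c
#include <stddef.h>

/* Hypothetical helper following the weighted-mean formula
   (sum w_i x_i) / (sum w_i).  Weights and data have independent
   strides.  Assumes the weights do not sum to zero. */
double weighted_mean (const double w[], size_t wstride,
                      const double data[], size_t stride, size_t n)
{
  double sum_wx = 0.0, sum_w = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      const double wi = w[i * wstride];
      sum_wx += wi * data[i * stride];
      sum_w += wi;
    }

  return sum_wx / sum_w;
}
```

With weights 1, 0, 1 and data 2, 100, 4 the middle sample contributes nothing and the result is (2 + 4)/2 = 3.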
<DT><U>Statistics:</U> double <B>gsl_stats_wvariance</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1702"></A>
This function returns the estimated variance of the dataset <VAR>data</VAR>
with stride <VAR>stride</VAR> and length <VAR>n</VAR>, using the set of weights
<VAR>w</VAR> with stride <VAR>wstride</VAR> and length <VAR>n</VAR>. The estimated
variance of a weighted dataset is defined as,
<PRE class="example">
\Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2)))
                \sum w_i (x_i - \Hat\mu)^2
</PRE>
Note that this expression reduces to an unweighted variance with the
familiar 1/(N-1) factor when there are N equal non-zero
weights.
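The weighted-variance formula, and its reduction to the 1/(N-1) case, can be checked with a small plain-C sketch. The helper name <CODE>weighted_variance_m</CODE> is hypothetical (it takes the weighted mean as an argument, in the spirit of <CODE>gsl_stats_wvariance_m</CODE>) and this is not GSL's implementation.

```c
#include <stddef.h>

/* Hypothetical helper following the weighted-variance formula with a
   supplied weighted mean:
   (sum w_i) / ((sum w_i)^2 - sum w_i^2) * sum w_i (x_i - wmean)^2
   Assumes at least two non-zero weights. */
double weighted_variance_m (const double w[], size_t wstride,
                            const double data[], size_t stride,
                            size_t n, double wmean)
{
  double sum_w = 0.0, sum_w2 = 0.0, sum_wd2 = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      const double wi = w[i * wstride];
      const double delta = data[i * stride] - wmean;
      sum_w += wi;
      sum_w2 += wi * wi;
      sum_wd2 += wi * delta * delta;
    }

  return (sum_w / (sum_w * sum_w - sum_w2)) * sum_wd2;
}
```

With N equal weights w the prefactor is N w / (N^2 w^2 - N w^2) = 1 / (w (N - 1)), which cancels the w in each term. For instance, weights all equal to 2 on the data 17.2, 18.1, 16.5, 18.3, 12.6 give 0.125 * 42.984 = 5.373, the same as the unweighted sample variance.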
<DT><U>Statistics:</U> double <B>gsl_stats_wvariance_m</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>wmean</VAR>)</I>
<DD><A NAME="IDX1703"></A>
This function returns the estimated variance of the weighted dataset
<VAR>data</VAR> using the given weighted mean <VAR>wmean</VAR>.
<DT><U>Statistics:</U> double <B>gsl_stats_wsd</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1704"></A>
The standard deviation is defined as the square root of the variance.
This function returns the square root of the corresponding variance
function <CODE>gsl_stats_wvariance</CODE> above.
<DT><U>Statistics:</U> double <B>gsl_stats_wsd_m</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>wmean</VAR>)</I>
<DD><A NAME="IDX1705"></A>
This function returns the square root of the corresponding variance
function <CODE>gsl_stats_wvariance_m</CODE> above.
<DT><U>Statistics:</U> double <B>gsl_stats_wvariance_with_fixed_mean</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, const double <VAR>mean</VAR>)</I>
<DD><A NAME="IDX1706"></A>
This function computes an unbiased estimate of the variance of the weighted
dataset <VAR>data</VAR> when the population mean <VAR>mean</VAR> of the underlying
distribution is known <EM>a priori</EM>. In this case the estimator for
the variance replaces the sample mean \Hat\mu by the known
population mean \mu,
<PRE class="example">
\Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i)
</PRE>
<DT><U>Statistics:</U> double <B>gsl_stats_wsd_with_fixed_mean</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, const double <VAR>mean</VAR>)</I>
<DD><A NAME="IDX1707"></A>
The standard deviation is defined as the square root of the variance.
This function returns the square root of the corresponding variance
function above.
<DT><U>Statistics:</U> double <B>gsl_stats_wabsdev</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1708"></A>
This function computes the weighted absolute deviation from the weighted
mean of <VAR>data</VAR>. The weighted absolute deviation from the weighted
mean is defined as,
<PRE class="example">
absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i)
</PRE>
<DT><U>Statistics:</U> double <B>gsl_stats_wabsdev_m</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>wmean</VAR>)</I>
<DD><A NAME="IDX1709"></A>
This function computes the absolute deviation of the weighted dataset
<VAR>data</VAR> about the given weighted mean <VAR>wmean</VAR>.
<DT><U>Statistics:</U> double <B>gsl_stats_wskew</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1710"></A>
This function computes the weighted skewness of the dataset <VAR>data</VAR>,
<PRE class="example">
skew = (\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^3) / (\sum w_i)
</PRE>
where \Hat\mu and \Hat\sigma are the weighted mean and weighted
standard deviation.
<DT><U>Statistics:</U> double <B>gsl_stats_wskew_m_sd</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>wmean</VAR>, double <VAR>wsd</VAR>)</I>
<DD><A NAME="IDX1711"></A>
This function computes the weighted skewness of the dataset <VAR>data</VAR>
using the given values of the weighted mean and weighted standard
deviation, <VAR>wmean</VAR> and <VAR>wsd</VAR>.
<DT><U>Statistics:</U> double <B>gsl_stats_wkurtosis</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1712"></A>
This function computes the weighted kurtosis of the dataset <VAR>data</VAR>,
<PRE class="example">
kurtosis = ((\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^4) / (\sum w_i)) - 3
</PRE>
where \Hat\mu and \Hat\sigma are the weighted mean and weighted
standard deviation.
<DT><U>Statistics:</U> double <B>gsl_stats_wkurtosis_m_sd</B> <I>(const double <VAR>w</VAR>[], size_t <VAR>wstride</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>wmean</VAR>, double <VAR>wsd</VAR>)</I>
<DD><A NAME="IDX1713"></A>
This function computes the weighted kurtosis of the dataset <VAR>data</VAR>
using the given values of the weighted mean and weighted standard
deviation, <VAR>wmean</VAR> and <VAR>wsd</VAR>.
<H2><A NAME="SEC334" HREF="gsl-ref_toc.html#TOC334">Maximum and Minimum values</A></H2>
<DT><U>Statistics:</U> double <B>gsl_stats_max</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1714"></A>
This function returns the maximum value in <VAR>data</VAR>, a dataset of
length <VAR>n</VAR> with stride <VAR>stride</VAR>. The maximum value is defined
as the value of the element x_i which satisfies
x_i >= x_j for all j.
If you want instead to find the element with the largest absolute
magnitude you will need to apply <CODE>fabs</CODE> or <CODE>abs</CODE> to your data
before calling this function.
<DT><U>Statistics:</U> double <B>gsl_stats_min</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1715"></A>
This function returns the minimum value in <VAR>data</VAR>, a dataset of
length <VAR>n</VAR> with stride <VAR>stride</VAR>. The minimum value is defined
as the value of the element x_i which satisfies
x_i <= x_j for all j.
If you want instead to find the element with the smallest absolute
magnitude you will need to apply <CODE>fabs</CODE> or <CODE>abs</CODE> to your data
before calling this function.
<DT><U>Statistics:</U> void <B>gsl_stats_minmax</B> <I>(double * <VAR>min</VAR>, double * <VAR>max</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1716"></A>
This function finds both the minimum and maximum values <VAR>min</VAR>,
<VAR>max</VAR> in <VAR>data</VAR> in a single pass.
<DT><U>Statistics:</U> size_t <B>gsl_stats_max_index</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1717"></A>
This function returns the index of the maximum value in <VAR>data</VAR>, a
dataset of length <VAR>n</VAR> with stride <VAR>stride</VAR>. The maximum value is
defined as the value of the element x_i which satisfies
x_i >= x_j for all j. When there are several equal maximum
elements then the first one is chosen.
<DT><U>Statistics:</U> size_t <B>gsl_stats_min_index</B> <I>(const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1718"></A>
This function returns the index of the minimum value in <VAR>data</VAR>, a
dataset of length <VAR>n</VAR> with stride <VAR>stride</VAR>. The minimum value
is defined as the value of the element x_i which satisfies
x_i <= x_j for all j. When there are several equal
minimum elements then the first one is chosen.
<DT><U>Statistics:</U> void <B>gsl_stats_minmax_index</B> <I>(size_t * <VAR>min_index</VAR>, size_t * <VAR>max_index</VAR>, const double <VAR>data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1719"></A>
This function finds the indexes of the minimum and maximum values in
<VAR>data</VAR> in a single pass and stores them in <VAR>min_index</VAR> and
<VAR>max_index</VAR>.
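A single-pass search for both indexes, with the first-occurrence rule for ties, can be sketched in plain C as follows. The helper name <CODE>minmax_index_sketch</CODE> is hypothetical and this is not GSL's implementation; in particular it makes no special provision for NaN values.

```c
#include <stddef.h>

/* Hypothetical helper: indexes of the minimum and maximum in one pass.
   Strict comparisons (< and >) keep the first occurrence when values
   tie, matching the tie rule described for gsl_stats_min_index and
   gsl_stats_max_index.  No special NaN handling.  Assumes n >= 1. */
void minmax_index_sketch (size_t *min_index, size_t *max_index,
                          const double data[], size_t stride, size_t n)
{
  size_t i, imin = 0, imax = 0;
  double min = data[0], max = data[0];

  for (i = 1; i < n; i++)
    {
      const double x = data[i * stride];
      if (x < min) { min = x; imin = i; }
      if (x > max) { max = x; imax = i; }
    }

  *min_index = imin;
  *max_index = imax;
}
```

For the data 3, 1, 4, 1, 5, 5 this reports index 1 for the minimum and index 4 for the maximum, the first of each tied pair.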
<H2><A NAME="SEC335" HREF="gsl-ref_toc.html#TOC335">Median and Percentiles</A></H2>
The median and percentile functions described in this section operate on
sorted data. For convenience we use <I>quantiles</I>, measured on a scale
of 0 to 1, instead of percentiles (which use a scale of 0 to 100).
<DT><U>Statistics:</U> double <B>gsl_stats_median_from_sorted_data</B> <I>(const double <VAR>sorted_data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>)</I>
<DD><A NAME="IDX1720"></A>
This function returns the median value of <VAR>sorted_data</VAR>, a dataset
of length <VAR>n</VAR> with stride <VAR>stride</VAR>. The elements of the array
must be in ascending numerical order. There are no checks to see
whether the data are sorted, so the function <CODE>gsl_sort</CODE> should
always be used first.
When the dataset has an odd number of elements the median is the value
of element (n-1)/2. When the dataset has an even number of
elements the median is the mean of the two nearest middle values,
elements (n-1)/2 and n/2. Since the algorithm for
computing the median involves interpolation this function always returns
a floating-point number, even for integer data types.
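The index rule above can be checked with a small plain-C sketch. With integer division, (n-1)/2 and n/2 coincide for odd n and are adjacent for even n, so one expression covers both cases. The helper name <CODE>median_sorted</CODE> is hypothetical and this is not GSL's implementation.

```c
#include <stddef.h>

/* Hypothetical helper following the median rule for sorted data:
   element (n-1)/2 for odd n, the mean of elements (n-1)/2 and n/2
   for even n (integer division gives the right indexes).
   Assumes n >= 1 and ascending order, which is not checked. */
double median_sorted (const double sorted[], size_t stride, size_t n)
{
  const size_t lhs = (n - 1) / 2;
  const size_t rhs = n / 2;

  if (lhs == rhs)                      /* odd n: single middle element */
    return sorted[lhs * stride];

  return (sorted[lhs * stride] + sorted[rhs * stride]) / 2.0;
}
```

For the sorted values 12.6, 16.5, 17.2, 18.1, 18.3 this returns element 2, which is 17.2; for 1, 2, 3, 4 it averages elements 1 and 2 to give 2.5.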
<DT><U>Statistics:</U> double <B>gsl_stats_quantile_from_sorted_data</B> <I>(const double <VAR>sorted_data</VAR>[], size_t <VAR>stride</VAR>, size_t <VAR>n</VAR>, double <VAR>f</VAR>)</I>
<DD><A NAME="IDX1721"></A>
This function returns a quantile value of <VAR>sorted_data</VAR>, a
double-precision array of length <VAR>n</VAR> with stride <VAR>stride</VAR>. The
elements of the array must be in ascending numerical order. The
quantile is determined by <VAR>f</VAR>, a fraction between 0 and 1. For
example, to compute the value of the 75th percentile <VAR>f</VAR> should have
the value 0.75.
There are no checks to see whether the data are sorted, so the function
<CODE>gsl_sort</CODE> should always be used first.
The quantile is found by interpolation, using the formula
<PRE class="example">
quantile = (1 - \delta) x_i + \delta x_{i+1}
</PRE>
where i is <CODE>floor</CODE>((n - 1)f) and \delta is
(n - 1)f - i.
Thus the minimum value of the array (<CODE>data[0*stride]</CODE>) is given by
<VAR>f</VAR> equal to zero, the maximum value (<CODE>data[(n-1)*stride]</CODE>) is
given by <VAR>f</VAR> equal to one and the median value is given by <VAR>f</VAR>
equal to 0.5. Since the algorithm for computing quantiles involves
interpolation this function always returns a floating-point number, even
for integer data types.
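The interpolation formula can be sketched in plain C as below. The helper name <CODE>quantile_sorted</CODE> is hypothetical and this is not GSL's implementation; it uses a cast to <CODE>size_t</CODE> as the floor (valid because the index is non-negative) and returns the last element directly when f = 1, since x_{i+1} does not exist there.

```c
#include <stddef.h>

/* Hypothetical helper following the interpolation formula
   quantile = (1 - delta) x_i + delta x_{i+1},
   with i = floor((n-1) f) and delta = (n-1) f - i.
   Assumes n >= 1, 0 <= f <= 1 and ascending order (not checked). */
double quantile_sorted (const double sorted[], size_t stride, size_t n,
                        double f)
{
  const double index = f * (double) (n - 1);
  const size_t i = (size_t) index;          /* floor, since index >= 0 */
  const double delta = index - (double) i;

  if (i + 1 >= n)                           /* f == 1: no right neighbour */
    return sorted[i * stride];

  return (1.0 - delta) * sorted[i * stride]
         + delta * sorted[(i + 1) * stride];
}
```

For the sorted values 12.6, 16.5, 17.2, 18.1, 18.3, f = 0.75 gives i = 3 and delta = 0, so the result is 18.1; for 1, 2, 3, 4, 5, f = 0.1 gives i = 0 and delta = 0.4, so the result is 0.6 * 1 + 0.4 * 2 = 1.4.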
<H2><A NAME="SEC336" HREF="gsl-ref_toc.html#TOC336">Examples</A></H2>
Here is a basic example of how to use the statistical functions:
<PRE class="example">
#include &lt;stdio.h&gt;
#include &lt;gsl/gsl_statistics.h&gt;

int
main (void)
{
  double data[5] = {17.2, 18.1, 16.5, 18.3, 12.6};
  double mean, variance, largest, smallest;

  /* each call uses stride 1 and length 5 */
  mean     = gsl_stats_mean(data, 1, 5);
  variance = gsl_stats_variance(data, 1, 5);
  largest  = gsl_stats_max(data, 1, 5);
  smallest = gsl_stats_min(data, 1, 5);

  printf ("The dataset is %g, %g, %g, %g, %g\n",
         data[0], data[1], data[2], data[3], data[4]);

  printf ("The sample mean is %g\n", mean);
  printf ("The estimated variance is %g\n", variance);
  printf ("The largest value is %g\n", largest);
  printf ("The smallest value is %g\n", smallest);
  return 0;
}
</PRE>
The program should produce the following output,
<PRE class="example">
The dataset is 17.2, 18.1, 16.5, 18.3, 12.6
The sample mean is 16.54
The estimated variance is 5.373
The largest value is 18.3
The smallest value is 12.6
</PRE>
Here is an example using sorted data,
<PRE class="example">
#include &lt;stdio.h&gt;
#include &lt;gsl/gsl_sort.h&gt;
#include &lt;gsl/gsl_statistics.h&gt;

int
main (void)
{
  double data[5] = {17.2, 18.1, 16.5, 18.3, 12.6};
  double median, upperq, lowerq;

  printf ("Original dataset: %g, %g, %g, %g, %g\n",
         data[0], data[1], data[2], data[3], data[4]);

  /* the median and quantile functions require sorted input */
  gsl_sort (data, 1, 5);

  printf ("Sorted dataset: %g, %g, %g, %g, %g\n",
         data[0], data[1], data[2], data[3], data[4]);

  median
    = gsl_stats_median_from_sorted_data (data, 1, 5);

  upperq
    = gsl_stats_quantile_from_sorted_data (data, 1, 5, 0.75);

  lowerq
    = gsl_stats_quantile_from_sorted_data (data, 1, 5, 0.25);

  printf ("The median is %g\n", median);
  printf ("The upper quartile is %g\n", upperq);
  printf ("The lower quartile is %g\n", lowerq);
  return 0;
}
</PRE>
This program should produce the following output,
<PRE class="example">
Original dataset: 17.2, 18.1, 16.5, 18.3, 12.6
Sorted dataset: 12.6, 16.5, 17.2, 18.1, 18.3
The median is 17.2
The upper quartile is 18.1
The lower quartile is 16.5
</PRE>
<H2><A NAME="SEC337" HREF="gsl-ref_toc.html#TOC337">References and Further Reading</A></H2>
The standard reference for almost any topic in statistics is the
multi-volume <CITE>Advanced Theory of Statistics</CITE> by Kendall and Stuart.
Maurice Kendall, Alan Stuart, and J. Keith Ord.
<CITE>The Advanced Theory of Statistics</CITE> (multiple volumes),
reprinted as <CITE>Kendall's Advanced Theory of Statistics</CITE>.
Wiley, ISBN 047023380X.
Many statistical concepts can be more easily understood by a Bayesian
approach. The following book by Gelman, Carlin, Stern and Rubin gives
comprehensive coverage of the subject.
Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin.
<CITE>Bayesian Data Analysis</CITE>.
Chapman &amp; Hall, ISBN 0412039915.
For physicists the Particle Data Group provides useful reviews of
Probability and Statistics in the "Mathematical Tools" section of its
Annual Review of Particle Physics.
<CITE>Review of Particle Properties</CITE>,
R.M. Barnett et al., Physical Review D54, 1 (1996).
The Review of Particle Physics is available online at
<A HREF="http://pdg.lbl.gov/">http://pdg.lbl.gov/</A>.