# Copyright (c) 1999-2007 Gary Strangman; All Rights Reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#
# Comments and/or additions are welcome (send e-mail to:
# strang@nmr.mgh.harvard.edu).
(Requires pstat.py module.)

#################################################
#######  Written by:  Gary Strangman  ###########
#######  Last modified:  Dec 18, 2007 ###########
#################################################

A collection of basic statistical functions for python.

IMPORTANT:  There are really *3* sets of functions.  The first set has an 'l'
prefix, which can be used with list or tuple arguments.  The second set has
an 'a' prefix, which can accept NumPy array arguments.  These latter
functions are defined only when NumPy is available on the system.  The third
type has NO prefix (i.e., has the name that appears below).  Functions of
this set are members of a "Dispatch" class, c/o David Ascher.  This class
allows different functions to be called depending on the type of the passed
arguments.  Thus, stats.mean is a member of the Dispatch class and
stats.mean(range(20)) will call stats.lmean(range(20)) while
stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
This is a handy way to keep consistent function names when different
argument types require different functions to be called.  Having
implemented the Dispatch class, however, means that to get info on
a given function, you must use the REAL function name ... that is,
"print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine,
while "print stats.mean.__doc__" will print the doc for the Dispatch
class.  NUMPY FUNCTIONS ('a' prefix) generally have more argument options
but should otherwise be consistent with the corresponding list functions.

Disclaimers:  The function list is obviously incomplete and, worse, the
functions are not optimized.  All functions have been tested (some more
so than others), but they are far from bulletproof.  Thus, as with any
free software, no warranty or guarantee is expressed or implied. :-)  A
few extra functions that don't appear in the list below can be found by
interested treasure-hunters.  These functions don't necessarily have
both list and array versions but were deemed useful.
CENTRAL TENDENCY:  geometricmean
skewtest   (for Numpy arrays only)
kurtosistest   (for Numpy arrays only)
normaltest   (for Numpy arrays only)

ALTERED VERSIONS:  tmean  (for Numpy arrays only)
tvar   (for Numpy arrays only)
tmin   (for Numpy arrays only)
tmax   (for Numpy arrays only)
tstdev (for Numpy arrays only)
tsem   (for Numpy arrays only)

FREQUENCY STATS:  itemfreq

VARIABILITY:  obrientransform
signaltonoise (for Numpy arrays only)
zmap (for Numpy arrays only)

TRIMMING FCNS:  threshold (for Numpy arrays only)
round (round all vals to 'n' decimals; Numpy only)

CORRELATION FCNS:  covariance (for Numpy arrays only)
correlation (for Numpy arrays only)

INFERENTIAL STATS:  ttest_1samp

PROBABILITY CALCS:  chisqprob

ANOVA FUNCTIONS:  F_oneway

SUPPORT FUNCTIONS:  writecc
sign  (for Numpy arrays only)
## 07-11-26 ... conversion for numpy started
## 07-05-16 ... added Lin's Concordance Correlation Coefficient (alincc) and acov
## 05-08-21 ... added "Dice's coefficient"
## 04-10-26 ... added ap2t(), an ugly fcn for converting p-vals to T-vals
## 04-04-03 ... added amasslinregress() function to do regression on N-D arrays
## 03-01-03 ... CHANGED VERSION TO 0.6
##             fixed atsem() to properly handle limits=None case
##             improved histogram and median functions (estbinwidth) and
##             fixed atvar() function (wrong answers for neg numbers?!?)
## 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows
## 02-05-10 ... fixed lchisqprob indentation (failed when df=even)
## 00-12-28 ... removed aanova() to separate module, fixed licensing to
##             match Python License, fixed doc string & imports
## 00-04-13 ... pulled all "global" statements, except from aanova()
##             added/fixed lots of documentation, removed io.py dependency
##             changed to version 0.5
## 99-11-13 ... added asign() function
## 99-11-01 ... changed version to 0.4 ... enough incremental changes now
## 99-10-25 ... added acovariance and acorrelation functions
## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors
##             added aglm function (crude, but will be improved)
## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to
##             all handle lists of 'dimension's and keepdims
##             REMOVED ar0, ar2, ar3, ar4 and replaced them with around
##             reinserted fixes for abetai to avoid math overflows
## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to
##             handle multi-dimensional arrays (whew!)
## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990)
##             added anormaltest per same reference
##             re-wrote azprob to calc arrays of probs all at once
## 99-08-22 ... edited attest_ind printing section so arrays could be rounded
## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on
##             short/byte arrays (mean of #s btw 100-300 = -150??)
## 99-08-09 ... fixed asum so that the None case works for Byte arrays
## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays
## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap)
## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0])
## 04/11/99 ... added asignaltonoise, athreshold functions, changed all
##             max/min in array section to N.maximum/N.minimum,
##             fixed square_of_sums to prevent integer overflow
## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums
## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions
## 02/28/99 ... Fixed aobrientransform to return an array rather than a list
## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!)
## 01/13/99 ... CHANGED TO VERSION 0.3
##             fixed bug in a/lmannwhitneyu p-value calculation
## 12/31/98 ... fixed variable-name bug in ldescribe
## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix)
## 12/16/98 ... changed amedianscore to return float (not array) for 1 score
## 12/14/98 ... added atmin and atmax functions
##             removed umath from import line (not needed)
##             l/ageometricmean modified to reduce chance of overflows (take
##             nth root first, then multiply)
## 12/07/98 ... added __version__ variable (now 0.2)
##             removed all 'stats.' from anova() fcn
## 12/06/98 ... changed those functions (except shellsort) that altered
##             arguments in-place ... cumsum, ranksort, ...
##             updated (and fixed some) doc-strings
## 12/01/98 ... added anova() function (requires NumPy)
##             incorporated Dispatch class
## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean
##             added 'asum' function (added functionality to N.add.reduce)
##             fixed both moment and amoment (two errors)
##             changed name of skewness and askewness to skew and askew
##             fixed (a)histogram (which sometimes counted points <lowerlimit)
import pstat               # required 3rd party module
import math, string, copy  # required python modules
#############  DISPATCH CODE  ##############

class Dispatch:
    """
The Dispatch class, care of David Ascher, allows different functions to
be called depending on the argument types.  This way, there can be one
function name regardless of the argument type.  To access function doc
in stats.py module, prefix the function with an 'l' or 'a' for list or
array arguments, respectively.  That is, print stats.lmean.__doc__ or
print stats.amean.__doc__ or whatever.
"""

    def __init__(self, *tuples):
        self._dispatch = {}
        for func, types in tuples:
            for t in types:
                if t in self._dispatch.keys():
                    raise ValueError, "can't have two dispatches on "+str(t)
                self._dispatch[t] = func
        self._types = self._dispatch.keys()

    def __call__(self, arg1, *args, **kw):
        if type(arg1) not in self._types:
            raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
        return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)
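# The same dispatch-on-argument-type idea can be sketched in modern Python
# with functools.singledispatch.  This is an illustrative aside, not part of
# this module; the `mean` name below is hypothetical.

```python
from functools import singledispatch

@singledispatch
def mean(data):
    # Fallback for unregistered types, mirroring the TypeError above.
    raise TypeError("don't know how to dispatch %s arguments" % type(data))

@mean.register(list)
@mean.register(tuple)
def _(data):
    # List/tuple implementation, like the 'l'-prefixed functions here.
    return sum(data) / float(len(data))

print(mean([1, 2, 3, 4]))  # dispatches on list -> 2.5
```

# An array implementation could be registered the same way, giving one public
# name per statistic without the hand-rolled Dispatch table.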
##########################################################################
########################  LIST-BASED FUNCTIONS  ##########################
##########################################################################

### Define these regardless

####################################
#######  CENTRAL TENDENCY  #########
####################################
def lgeometricmean(inlist):
    """
Calculates the geometric mean of the values in the passed list.
That is:  n-th root of (x1 * x2 * ... * xn).  Assumes a '1D' list.

Usage:   lgeometricmean(inlist)
"""
    mult = 1.0
    one_over_n = 1.0/len(inlist)
    for item in inlist:
        mult = mult * pow(item,one_over_n)
    return mult
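# The changelog notes that taking the n-th root of each value *first* and then
# multiplying reduces the chance of overflowing the running product.  A small
# Python 3 sketch of that same trick:

```python
import math

def geometric_mean(xs):
    # n-th root first, then multiply, as lgeometricmean does above;
    # the naive product x1*x2*...*xn can overflow for long lists.
    one_over_n = 1.0 / len(xs)
    mult = 1.0
    for x in xs:
        mult *= math.pow(x, one_over_n)
    return mult

print(geometric_mean([2, 8]))  # sqrt(2*8) = 4.0
```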
def lharmonicmean(inlist):
    """
Calculates the harmonic mean of the values in the passed list.
That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Assumes a '1D' list.

Usage:   lharmonicmean(inlist)
"""
    sum = 0
    for item in inlist:
        sum = sum + 1.0/item
    return len(inlist) / sum
def lmean(inlist):
    """
Returns the arithmetic mean of the values in the passed list.
Assumes a '1D' list, but will function on the 1st dim of an array(!).

Usage:   lmean(inlist)
"""
    sum = 0
    for item in inlist:
        sum = sum + item
    return sum/float(len(inlist))
def lmedian(inlist,numbins=1000):
    """
Returns the computed median value of a list of numbers, given the
number of bins to use for the histogram (more bins brings the computed value
closer to the median score, default number of bins = 1000).  See G.W.
Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.

Usage:   lmedian (inlist, numbins=1000)
"""
    (hist, smallest, binsize, extras) = histogram(inlist,numbins,[min(inlist),max(inlist)])  # make histog
    cumhist = cumsum(hist)            # make cumulative histogram
    for i in range(len(cumhist)):     # get 1st(!) index holding 50%ile score
        if cumhist[i]>=len(inlist)/2.0:
            cfbin = i
            break
    LRL = smallest + binsize*cfbin    # get lower read limit of that bin
    cfbelow = cumhist[cfbin-1]
    freq = float(hist[cfbin])         # frequency IN the 50%ile bin
    median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize  # median formula
    return median
def lmedianscore(inlist):
    """
Returns the 'middle' score of the passed list.  If there is an even
number of scores, the mean of the 2 middle scores is returned.

Usage:   lmedianscore(inlist)
"""
    newlist = copy.deepcopy(inlist)
    newlist.sort()
    if len(newlist) % 2 == 0:    # if even number of scores, average middle 2
        index = len(newlist)/2   # integer division correct
        median = float(newlist[index] + newlist[index-1]) /2
    else:                        # int division gives mid value when count from 0
        index = len(newlist)/2
        median = newlist[index]
    return median
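# The even/odd split above is easy to get wrong off by one.  A Python 3 sketch
# of the same logic (note // for integer division, since / changed meaning
# after Python 2):

```python
def median_score(xs):
    # Same logic as lmedianscore: sort a copy, then either average the two
    # middle scores (even n) or take the single middle score (odd n).
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2 == 0:
        return (s[mid] + s[mid - 1]) / 2.0
    return s[mid]

print(median_score([3, 1, 2]))     # odd n  -> middle score 2
print(median_score([4, 1, 3, 2]))  # even n -> (2+3)/2 = 2.5
```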
def lmode(inlist):
    """
Returns a list of the modal (most common) score(s) in the passed
list.  If there is more than one such score, all are returned.  The
bin-count for the mode(s) is also returned.

Usage:   lmode(inlist)
Returns: bin-count for mode(s), a list of modal value(s)
"""
    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    maxfreq = max(freq)
    mode = []
    stillmore = 1
    while stillmore:
        indx = freq.index(maxfreq)
        mode.append(scores[indx])
        del freq[indx]
        del scores[indx]
        if (len(freq) == 0) or (max(freq) <> maxfreq):
            stillmore = 0
    return maxfreq, mode
####################################
############  MOMENTS  #############
####################################

def lmoment(inlist,moment=1):
    """
Calculates the nth moment about the mean for a sample (defaults to
the 1st moment).  Used to calculate coefficients of skewness and kurtosis.

Usage:   lmoment(inlist,moment=1)
Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
"""
    if moment == 1:
        return 0.0
    else:
        mn = mean(inlist)
        n = len(inlist)
        s = 0
        for x in inlist:
            s = s + (x-mn)**moment
        return s/float(n)


def lvariation(inlist):
    """
Returns the coefficient of variation, as defined in CRC Standard
Probability and Statistics, p.6.

Usage:   lvariation(inlist)
"""
    return 100.0*samplestdev(inlist)/float(mean(inlist))
def lskew(inlist):
    """
Returns the skewness of a distribution, as defined in Numerical
Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

Usage:   lskew(inlist)
"""
    x = moment(inlist,3)/pow(moment(inlist,2),1.5)
    return x


def lkurtosis(inlist):
    """
Returns the kurtosis of a distribution, as defined in Numerical
Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

Usage:   lkurtosis(inlist)
"""
    x = moment(inlist,4)/pow(moment(inlist,2),2.0)
    return x


def ldescribe(inlist):
    """
Returns some descriptive statistics of the passed list (assumed to be 1D).

Usage:   ldescribe(inlist)
Returns: n, (min,max), mean, standard deviation, skew, kurtosis
"""
    n = len(inlist)
    mm = (min(inlist),max(inlist))
    m = mean(inlist)
    sd = stdev(inlist)
    sk = skew(inlist)
    kurt = kurtosis(inlist)
    # return n, mm, m, sd, sk, kurt
    return n, mm[0], mm[1], m, sd, sk, kurt
####################################
#######  FREQUENCY STATS  ##########
####################################

def litemfreq(inlist):
    """
Returns a list of pairs.  Each pair consists of one of the scores in inlist
and its frequency count.  Assumes a 1D list is passed.

Usage:   litemfreq(inlist)
Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
"""
    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    return pstat.abut(scores, freq)
def lscoreatpercentile(inlist, percent):
    """
Returns the score at a given percentile relative to the distribution
given by inlist.

Usage:   lscoreatpercentile(inlist,percent)
"""
    if percent > 1:
        print "\nDividing percent>1 by 100 in lscoreatpercentile().\n"
        percent = percent / 100.0
    targetcf = percent*len(inlist)
    h, lrl, binsize, extras = histogram(inlist)
    cumhist = cumsum(copy.deepcopy(h))
    for i in range(len(cumhist)):
        if cumhist[i] >= targetcf:
            break
    score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
    return score
def lpercentileofscore(inlist, score,histbins=10,defaultlimits=None):
    """
Returns the percentile value of a score relative to the distribution
given by inlist.  Formula depends on the values used to histogram the data(!).

Usage:   lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
"""
    h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits)
    cumhist = cumsum(copy.deepcopy(h))
    i = int((score - lrl)/float(binsize))
    pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
    return pct
def lhistogram(inlist,numbins=10,defaultreallimits=None,printextras=0):
    """
Returns (i) a list of histogram bin counts, (ii) the smallest value
of the histogram binning, and (iii) the bin width (the last 2 are not
necessarily integers).  Default number of bins is 10.  If no sequence object
is given for defaultreallimits, the routine picks (usually non-pretty) bins
spanning all the numbers in the inlist.

Usage:   lhistogram (inlist, numbins=10, defaultreallimits=None, printextras=0)
Returns: list of bin values, lowerreallimit, binsize, extrapoints
"""
    if (defaultreallimits <> None):
        if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1:  # only one limit given, assumed to be lower one & upper is calc'd
            lowerreallimit = defaultreallimits
            upperreallimit = 1.000001 * max(inlist)
        else:  # assume both limits given
            lowerreallimit = defaultreallimits[0]
            upperreallimit = defaultreallimits[1]
        binsize = (upperreallimit-lowerreallimit)/float(numbins)
    else:  # no limits given for histogram, both must be calc'd
        estbinwidth = (max(inlist)-min(inlist))/float(numbins) + 1e-6  # 1e-6 => cover all
        binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins)
        lowerreallimit = min(inlist) - binsize/2  # lower real limit, 1st bin
    bins = [0]*numbins
    extrapoints = 0
    for num in inlist:
        try:
            if (num-lowerreallimit) < 0:
                extrapoints = extrapoints + 1
            else:
                bintoincrement = int((num-lowerreallimit)/float(binsize))
                bins[bintoincrement] = bins[bintoincrement] + 1
        except:
            extrapoints = extrapoints + 1
    if (extrapoints > 0 and printextras == 1):
        print '\nPoints outside given histogram range =',extrapoints
    return (bins, lowerreallimit, binsize, extrapoints)
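# The no-limits branch above pads the range slightly (the 1e-6 term) so that
# the maximum value falls *inside* the top bin rather than one past it.  A
# compact Python 3 sketch of just that branch (function name is illustrative):

```python
def simple_histogram(xs, numbins=10):
    # Mirrors lhistogram's no-limits branch: widen the range by a hair,
    # shift the lower limit down half a bin, then count by bin index.
    estbinwidth = (max(xs) - min(xs)) / float(numbins) + 1e-6
    binsize = (max(xs) - min(xs) + estbinwidth) / float(numbins)
    lowerreallimit = min(xs) - binsize / 2.0
    bins = [0] * numbins
    for x in xs:
        bins[int((x - lowerreallimit) / binsize)] += 1
    return bins, lowerreallimit, binsize

bins, lrl, bs = simple_histogram([1, 2, 2, 3, 4], numbins=4)
print(bins)  # [1, 2, 1, 1]
```

# Without the padding, x == max(xs) would index bins[numbins] and raise
# IndexError, which is exactly what the try/except in lhistogram guards.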
def lcumfreq(inlist,numbins=10,defaultreallimits=None):
    """
Returns a cumulative frequency histogram, using the histogram function.

Usage:   lcumfreq(inlist,numbins=10,defaultreallimits=None)
Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
"""
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    cumhist = cumsum(copy.deepcopy(h))
    return cumhist,l,b,e


def lrelfreq(inlist,numbins=10,defaultreallimits=None):
    """
Returns a relative frequency histogram, using the histogram function.

Usage:   lrelfreq(inlist,numbins=10,defaultreallimits=None)
Returns: list of relfreq bin values, lowerreallimit, binsize, extrapoints
"""
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    for i in range(len(h)):
        h[i] = h[i]/float(len(inlist))
    return h,l,b,e
####################################
#####  VARIABILITY FUNCTIONS  ######
####################################

def lobrientransform(*args):
    """
Computes a transform on input data (any number of columns).  Used to
test for homogeneity of variance prior to running one-way stats.  From
Maxwell and Delaney, p.112.

Usage:   lobrientransform(*args)
Returns: transformed data for use in an ANOVA
"""
    TINY = 1e-10
    k = len(args)
    n = [0.0]*k
    v = [0.0]*k
    m = [0.0]*k
    nargs = []
    for i in range(k):
        nargs.append(copy.deepcopy(args[i]))
        n[i] = float(len(nargs[i]))
        v[i] = var(nargs[i])
        m[i] = mean(nargs[i])
    for j in range(k):
        for i in range(n[j]):
            t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
            t2 = 0.5*v[j]*(n[j]-1.0)
            t3 = (n[j]-1.0)*(n[j]-2.0)
            nargs[j][i] = (t1-t2) / float(t3)
    check = 1
    for j in range(k):
        if v[j] - mean(nargs[j]) > TINY:
            check = 0
    if check <> 1:
        raise ValueError, 'Problem in obrientransform.'
    else:
        return nargs
def lsamplevar(inlist):
    """
Returns the variance of the values in the passed list using
N for the denominator (i.e., DESCRIBES the sample variance only).

Usage:   lsamplevar(inlist)
"""
    n = len(inlist)
    mn = mean(inlist)
    deviations = []
    for item in inlist:
        deviations.append(item-mn)
    return ss(deviations)/float(n)


def lsamplestdev(inlist):
    """
Returns the standard deviation of the values in the passed list using
N for the denominator (i.e., DESCRIBES the sample stdev only).

Usage:   lsamplestdev(inlist)
"""
    return math.sqrt(samplevar(inlist))
def lcov(x,y, keepdims=0):
    """
Returns the estimated covariance of the values in the passed
array (i.e., N-1).  Dimension can equal None (ravel array first), an
integer (the dimension over which to operate), or a sequence (operate
over multiple dimensions).  Set keepdims=1 to return an array with the
same number of dimensions as inarray.

Usage:   lcov(x,y,keepdims=0)
"""
    n = len(x)
    xmn = mean(x)
    ymn = mean(y)
    xdeviations = [0]*len(x)
    ydeviations = [0]*len(y)
    for i in range(len(x)):
        xdeviations[i] = x[i] - xmn
        ydeviations[i] = y[i] - ymn
    ss = 0.0
    for i in range(len(xdeviations)):
        ss = ss + xdeviations[i]*ydeviations[i]
    return ss/float(n-1)
def lvar(inlist):
    """
Returns the variance of the values in the passed list using N-1
for the denominator (i.e., for estimating population variance).

Usage:   lvar(inlist)
"""
    n = len(inlist)
    mn = mean(inlist)
    deviations = [0]*len(inlist)
    for i in range(len(inlist)):
        deviations[i] = inlist[i] - mn
    x = ss(deviations)/float(n-1)
    return x


def lstdev(inlist):
    """
Returns the standard deviation of the values in the passed list
using N-1 in the denominator (i.e., to estimate population stdev).

Usage:   lstdev(inlist)
"""
    return math.sqrt(var(inlist))


def lsterr(inlist):
    """
Returns the standard error of the values in the passed list using N-1
in the denominator (i.e., to estimate population standard error).

Usage:   lsterr(inlist)
"""
    return stdev(inlist) / float(math.sqrt(len(inlist)))


def lsem(inlist):
    """
Returns the estimated standard error of the mean (sx-bar) of the
values in the passed list.  sem = stdev / sqrt(n)

Usage:   lsem(inlist)
"""
    sd = stdev(inlist)
    n = len(inlist)
    return sd/math.sqrt(n)
def lz(inlist, score):
    """
Returns the z-score for a given input score, given that score and the
list from which that score came.  Not appropriate for population calculations.

Usage:   lz(inlist, score)
"""
    z = (score-mean(inlist))/samplestdev(inlist)
    return z


def lzs(inlist):
    """
Returns a list of z-scores, one for each score in the passed list.

Usage:   lzs(inlist)
"""
    zscores = []
    for item in inlist:
        zscores.append(z(inlist,item))
    return zscores
####################################
#######  TRIMMING FUNCTIONS  #######
####################################

def ltrimboth(l,proportiontocut):
    """
Slices off the passed proportion of items from BOTH ends of the passed
list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
10% of scores).  Assumes list is sorted by magnitude.  Slices off LESS if
proportion results in a non-integer slice index (i.e., conservatively
slices off proportiontocut).

Usage:   ltrimboth (l,proportiontocut)
Returns: trimmed version of list l
"""
    lowercut = int(proportiontocut*len(l))
    uppercut = len(l) - lowercut
    return l[lowercut:uppercut]
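# The "slices off LESS" caveat comes from int() truncation: the cut count is
# rounded *down*, so at most the requested proportion is removed from each
# end.  A Python 3 sketch of the same slicing (function name is illustrative):

```python
def trim_both(sorted_scores, proportiontocut):
    # Same arithmetic as ltrimboth: int() truncates, so a proportion that
    # doesn't divide evenly trims conservatively (fewer items, never more).
    lowercut = int(proportiontocut * len(sorted_scores))
    uppercut = len(sorted_scores) - lowercut
    return sorted_scores[lowercut:uppercut]

print(trim_both(list(range(10)), 0.1))  # drops 0 and 9 -> [1, ..., 8]
```

# With 3 items and proportiontocut=0.1, int(0.3) == 0, so nothing is trimmed.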
def ltrim1(l,proportiontocut,tail='right'):
    """
Slices off the passed proportion of items from ONE end of the passed
list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
10% of scores).  Slices off LESS if proportion results in a non-integer
slice index (i.e., conservatively slices off proportiontocut).

Usage:   ltrim1 (l,proportiontocut,tail='right')  or set tail='left'
Returns: trimmed version of list l
"""
    if tail == 'right':
        lowercut = 0
        uppercut = len(l) - int(proportiontocut*len(l))
    elif tail == 'left':
        lowercut = int(proportiontocut*len(l))
        uppercut = len(l)
    return l[lowercut:uppercut]
####################################
#####  CORRELATION FUNCTIONS  ######
####################################

def lpaired(x,y):
    """
Interactively determines the type of data and then runs the
appropriate statistic for paired group data.

Usage:   lpaired(x,y)
Returns: appropriate statistic name, value, and probability
"""
    samples = ''
    while samples not in ['i','r','I','R','c','C']:
        print '\nIndependent or related samples, or correlation (i,r,c): ',
        samples = raw_input()

    if samples in ['i','I','r','R']:
        print '\nComparing variances ...',
        # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
        r = obrientransform(x,y)
        f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
        if p < 0.05:
            vartype = 'unequal, p='+str(round(p,4))
        else:
            vartype = 'equal'
        if samples in ['i','I']:
            t,p = ttest_ind(x,y,0)
            print '\nIndependent samples t-test: ', round(t,4),round(p,4)
            if len(x)>20 or len(y)>20:
                z,p = ranksums(x,y)
                print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
            else:
                u,p = mannwhitneyu(x,y)
                print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)
        else:  # RELATED SAMPLES
            t,p = ttest_rel(x,y,0)
            print '\nRelated samples t-test: ', round(t,4),round(p,4)
            t,p = wilcoxont(x,y)
            print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
    else:  # CORRELATION ANALYSIS
        corrtype = ''
        while corrtype not in ['c','C','r','R','d','D']:
            print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
            corrtype = raw_input()
        if corrtype in ['c','C']:
            m,b,r,p,see = linregress(x,y)
            print '\nLinear regression for continuous variables ...'
            lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
        elif corrtype in ['r','R']:
            r,p = spearmanr(x,y)
            print '\nCorrelation for ranked variables ...'
            print "Spearman's r: ",round(r,4),round(p,4)
        else:  # DICHOTOMOUS
            r,p = pointbiserialr(x,y)
            print '\nAssuming x contains a dichotomous variable ...'
            print 'Point Biserial r: ',round(r,4),round(p,4)
def lpearsonr(x,y):
    """
Calculates a Pearson correlation coefficient and the associated
probability value.  Taken from Heiman's Basic Statistics for the Behav.

Usage:   lpearsonr(x,y)   where x and y are equal-length lists
Returns: Pearson's r value, two-tailed p-value
"""
    TINY = 1.0e-30
    if len(x) <> len(y):
        raise ValueError, 'Input values not paired in pearsonr.  Aborting.'
    n = len(x)
    r_num = n*(summult(x,y)) - sum(x)*sum(y)
    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
    r = (r_num / r_den)  # denominator already a float
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = betai(0.5*df,0.5,df/float(df+t*t))
    return r, prob
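# The r_num/r_den formula above is the computational form of Pearson's r.  A
# self-contained Python 3 sketch of just the r calculation (the p-value step
# via betai is omitted here; function name is illustrative):

```python
import math

def pearson_r(x, y):
    # Computational formula: (n*sum(xy) - sum(x)*sum(y)) over the product of
    # the two sum-of-squares terms, matching lpearsonr's r_num / r_den.
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    num = n * sum_xy - sum(x) * sum(y)
    den = math.sqrt((n * sum(a * a for a in x) - sum(x) ** 2) *
                    (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

print(pearson_r([1, 2, 3], [2, 4, 6]))  # exactly linear -> 1.0
```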
def llincc(x,y):
    """
Calculates Lin's concordance correlation coefficient.

Usage:   llincc(x,y)    where x, y are equal-length arrays
"""
    covar = lcov(x,y)*(len(x)-1)/float(len(x))  # correct denom to n
    xvar = lvar(x)*(len(x)-1)/float(len(x))     # correct denom to n
    yvar = lvar(y)*(len(y)-1)/float(len(y))     # correct denom to n
    lincc = (2 * covar) / ((xvar+yvar) +((amean(x)-amean(y))**2))
    return lincc
889
Calculates a Spearman rank-order correlation coefficient. Taken
890
from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
892
Usage: lspearmanr(x,y) where x and y are equal-length lists
893
Returns: Spearman's r, two-tailed p-value
897
raise ValueError, 'Input values not paired in spearmanr. Aborting.'
901
dsq = sumdiffsquared(rankx,ranky)
902
rs = 1 - 6*dsq / float(n*(n**2-1))
903
t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
905
probrs = betai(0.5*df,0.5,df/(df+t*t))
906
''' t already a float
907
probability values for rs are from part 2 of the spearman function in
908
Numerical Recipies, p.510. They are close to tables, but not exact. (?)'''
def lpointbiserialr(x,y):
    """
Calculates a point-biserial correlation coefficient and the associated
probability value.  Taken from Heiman's Basic Statistics for the Behav.

Usage:   lpointbiserialr(x,y)   where x,y are equal-length lists
Returns: Point-biserial r, two-tailed p-value
"""
    TINY = 1e-30
    if len(x) <> len(y):
        raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr.  ABORTING.'
    data = pstat.abut(x,y)
    categories = pstat.unique(x)
    if len(categories) <> 2:
        raise ValueError, "Exactly 2 categories required for pointbiserialr()."
    else:  # there are 2 categories, continue
        codemap = pstat.abut(categories,range(2))
        recoded = pstat.recode(data,codemap,0)
        x = pstat.linexand(data,0,categories[0])
        y = pstat.linexand(data,0,categories[1])
        xmean = mean(pstat.colex(x,1))
        ymean = mean(pstat.colex(y,1))
        n = len(data)
        adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
        rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust
        df = n-2
        t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
        prob = betai(0.5*df,0.5,df/(df+t*t))  # t already a float
        return rpb, prob
def lkendalltau(x,y):
    """
Calculates Kendall's tau ... correlation of ordinal data.  Adapted
from function kendl1 in Numerical Recipes.  Needs good test-routine.@@@

Usage:   lkendalltau(x,y)
Returns: Kendall's tau, two-tailed p-value
"""
    n1 = 0
    n2 = 0
    iss = 0
    for j in range(len(x)-1):
        for k in range(j,len(y)):
            a1 = x[j] - x[k]
            a2 = y[j] - y[k]
            aa = a1 * a2
            if (aa):  # neither list has a tie
                n1 = n1 + 1
                n2 = n2 + 1
                if aa > 0:
                    iss = iss + 1
                else:
                    iss = iss - 1
            else:
                if (a1):
                    n1 = n1 + 1
                if (a2):
                    n2 = n2 + 1
    tau = iss / math.sqrt(n1*n2)
    svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
    z = tau / math.sqrt(svar)
    prob = erfcc(abs(z)/1.4142136)
    return tau, prob
def llinregress(x,y):
    """
Calculates a regression line on x,y pairs.

Usage:   llinregress(x,y)   x,y are equal-length lists of x-y coordinates
Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
"""
    TINY = 1.0e-20
    if len(x) <> len(y):
        raise ValueError, 'Input values not paired in linregress.  Aborting.'
    n = len(x)
    xmean = mean(x)
    ymean = mean(y)
    r_num = float(n*(summult(x,y)) - sum(x)*sum(y))
    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
    r = r_num / r_den
    z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = betai(0.5*df,0.5,df/(df+t*t))
    slope = r_num / float(n*ss(x) - square_of_sums(x))
    intercept = ymean - slope*xmean
    sterrest = math.sqrt(1-r*r)*samplestdev(y)
    return slope, intercept, r, prob, sterrest
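# The slope/intercept part of the regression above reduces to ordinary least
# squares on the paired lists.  A Python 3 sketch of just that part, in the
# centered form (names are illustrative):

```python
def lin_regress(x, y):
    # Least-squares slope and intercept: slope = Sxy / Sxx, where Sxy and
    # Sxx are the centered cross- and self-products, matching llinregress's
    # r_num over (n*ss(x) - square_of_sums(x)) up to a factor of n.
    n = float(len(x))
    xmean, ymean = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - n * xmean * ymean
    sxx = sum(a * a for a in x) - n * xmean * xmean
    slope = sxy / sxx
    intercept = ymean - slope * xmean
    return slope, intercept

m, b = lin_regress([0, 1, 2], [1, 3, 5])
print(m, b)  # exact fit y = 2x + 1 -> 2.0 1.0
```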
####################################
#####  INFERENTIAL STATISTICS  #####
####################################

def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
    """
Calculates the t-obtained for the independent samples T-test on ONE group
of scores a, given a population mean.  If printit=1, results are printed
to the screen.  If printit='filename', the results are output to 'filename'
using the given writemode (default=append).  Returns t-value, and prob.

Usage:   lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
Returns: t-value, two-tailed prob
"""
    x = mean(a)
    v = var(a)
    n = len(a)
    df = n-1
    svar = ((n-1)*v)/float(df)
    t = (x-popmean)/math.sqrt(svar*(1.0/n))
    prob = betai(0.5*df,0.5,float(df)/(df+t*t))

    if printit <> 0:
        statname = 'Single-sample T-test.'
        outputpairedstats(printit,writemode,
                          'Population','--',popmean,0,0,0,
                          name,n,x,v,min(a),max(a),
                          statname,t,prob)
    return t, prob
def lttest_ind(a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
    """
Calculates the t-obtained T-test on TWO INDEPENDENT samples of
scores a, and b.  From Numerical Recipes, p.483.  If printit=1, results
are printed to the screen.  If printit='filename', the results are output
to 'filename' using the given writemode (default=append).  Returns t-value,
and prob.

Usage:   lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
Returns: t-value, two-tailed prob
"""
    x1 = mean(a)
    x2 = mean(b)
    v1 = var(a)
    v2 = var(b)
    n1 = len(a)
    n2 = len(b)
    df = n1+n2-2
    svar = ((n1-1)*v1+(n2-1)*v2)/float(df)
    t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2))
    prob = betai(0.5*df,0.5,df/(df+t*t))

    if printit <> 0:
        statname = 'Independent samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n1,x1,v1,min(a),max(a),
                          name2,n2,x2,v2,min(b),max(b),
                          statname,t,prob)
    return t, prob
def lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
    """
Calculates the t-obtained T-test on TWO RELATED samples of scores,
a and b.  From Numerical Recipes, p.483.  If printit=1, results are
printed to the screen.  If printit='filename', the results are output to
'filename' using the given writemode (default=append).  Returns t-value,
and prob.

Usage:   lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
Returns: t-value, two-tailed prob
"""
    if len(a) <> len(b):
        raise ValueError, 'Unequal length lists in ttest_rel.'
    x1 = mean(a)
    x2 = mean(b)
    v1 = var(a)
    v2 = var(b)
    n = len(a)
    cov = 0
    df = n-1
    for i in range(len(a)):
        cov = cov + (a[i]-x1) * (b[i]-x2)
    cov = cov / float(df)
    sd = math.sqrt((v1+v2 - 2.0*cov)/float(n))
    t = (x1-x2)/sd
    prob = betai(0.5*df,0.5,df/(df+t*t))

    if printit <> 0:
        statname = 'Related samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n,x1,v1,min(a),max(a),
                          name2,n,x2,v2,min(b),max(b),
                          statname,t,prob)
    return t, prob
def lchisquare(f_obs,f_exp=None):
    """
Calculates a one-way chi square for list of observed frequencies and returns
the result.  If no expected frequencies are given, the total N is assumed to
be equally distributed across all groups.

Usage:   lchisquare(f_obs, f_exp=None)   f_obs = list of observed cell freq.
Returns: chisquare-statistic, associated p-value
"""
    k = len(f_obs)  # number of groups
    if f_exp == None:
        f_exp = [sum(f_obs)/float(k)] * len(f_obs)  # create k bins with = freq.
    chisq = 0
    for i in range(len(f_obs)):
        chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i])
    return chisq, chisqprob(chisq, k-1)
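# The statistic above is just sum((observed - expected)^2 / expected) over the
# cells.  A Python 3 sketch of that sum without the p-value lookup (the
# chisqprob step is omitted; function name is illustrative):

```python
def chi_square(f_obs, f_exp=None):
    # One-way chi-square statistic as in lchisquare: default expected
    # frequencies spread the total N equally across the k groups.
    k = len(f_obs)
    if f_exp is None:
        f_exp = [sum(f_obs) / float(k)] * k
    return sum((o - e) ** 2 / float(e) for o, e in zip(f_obs, f_exp))

print(chi_square([10, 10, 10]))  # observed == expected -> 0.0
print(chi_square([16, 4]))       # expected [10, 10] -> 7.2
```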
def lks_2samp(data1,data2):
    """
Computes the Kolmogorov-Smirnov statistic on 2 samples.  From
Numerical Recipes in C, page 493.

Usage:   lks_2samp(data1,data2)   data1&2 are lists of values for 2 conditions
Returns: KS D-value, associated p-value
"""
    j1 = 0
    j2 = 0
    fn1 = 0.0
    fn2 = 0.0
    n1 = len(data1)
    n2 = len(data2)
    en1 = n1
    en2 = n2
    d = 0.0
    data3 = data1[:]
    data3.sort()
    data4 = data2[:]
    data4.sort()
    while j1 < n1 and j2 < n2:
        d1 = data3[j1]
        d2 = data4[j2]
        if d1 <= d2:
            fn1 = (j1)/float(en1)
            j1 = j1 + 1
        if d2 <= d1:
            fn2 = (j2)/float(en2)
            j2 = j2 + 1
        dt = (fn2-fn1)
        if math.fabs(dt) > math.fabs(d):
            d = dt
    en = math.sqrt(en1*en2/float(en1+en2))
    prob = ksprob((en+0.12+0.11/en)*abs(d))
    return d, prob
1162
def lmannwhitneyu(x,y):
Calculates a Mann-Whitney U statistic on the provided scores and
returns the result. Use only when the n in each condition is < 20 and
you have 2 independent samples of ranks. NOTE: Mann-Whitney U is
significant if the u-obtained is LESS THAN or equal to the critical
value of U found in the tables. Equivalent to Kruskal-Wallis H with

Usage: lmannwhitneyu(x,y)
Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))

    ranked = rankdata(x+y)
    rankx = ranked[0:n1]       # get the x-ranks
    ranky = ranked[n1:]        # the rest are y-ranks
    u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)  # calc U for x
    u2 = n1*n2 - u1                            # remainder is U for y
    T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
        raise ValueError, 'All numbers are identical in lmannwhitneyu'
    sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
    z = abs((bigu-n1*n2/2.0) / sd)  # normal approximation for prob calc
    return smallu, 1.0 - zprob(z)
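The U computation itself can be sketched in modern Python (mannwhitney_u is an illustrative name; for simplicity this version assumes no tied scores, so ranks are just sorted positions):

```python
def mannwhitney_u(x, y):
    """Smaller Mann-Whitney U for two independent samples (no ties assumed)."""
    combined = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(combined)}   # ranks 1..n1+n2
    n1, n2 = len(x), len(y)
    r1 = sum(rank[v] for v in x)                        # rank sum for x
    u1 = n1 * n2 + n1 * (n1 + 1) / 2.0 - r1
    u2 = n1 * n2 - u1                                   # remainder is U for y
    return min(u1, u2)

# x entirely below y gives the extreme value U = 0:
print(mannwhitney_u([1, 2, 3], [4, 5, 6]))
```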
1191
def ltiecorrect(rankvals):
Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See
Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
New York: McGraw-Hill. Code adapted from |Stat rankind.c code.

Usage: ltiecorrect(rankvals)
Returns: T correction factor for U or H

    sorted,posn = shellsort(rankvals)
        if sorted[i] == sorted[i+1]:
            while (i<n-1) and (sorted[i] == sorted[i+1]):
            T = T + nties**3 - nties
    T = T / float(n**3-n)
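Siegel's correction can be written compactly with a tie-group counter; tie_correct is an illustrative name, and the 1.0 - T return step is assumed from the standard formulation (the fragment above elides the return):

```python
from collections import Counter

def tie_correct(ranks):
    """T = 1 - sum(t**3 - t) / (n**3 - n) over tie groups; 1.0 means no ties."""
    n = len(ranks)
    ties = sum(c ** 3 - c for c in Counter(ranks).values())
    return 1.0 - ties / float(n ** 3 - n)

# One pair of tied ranks among four scores: 1 - 6/60 = 0.9
print(tie_correct([1, 2.5, 2.5, 4]))
```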
1218
Calculates the rank sums statistic on the provided scores and
returns the result. Use only when the n in each condition is > 20 and you
have 2 independent samples of ranks.

Usage: lranksums(x,y)
Returns: a z-statistic, two-tailed p-value

    ranked = rankdata(alldata)
    expected = n1*(n1+n2+1) / 2.0
    z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
    prob = 2*(1.0 - zprob(abs(z)))
1238
def lwilcoxont(x,y):
Calculates the Wilcoxon T-test for related samples and returns the
result. A non-parametric T-test.

Usage: lwilcoxont(x,y)
Returns: a t-statistic, two-tail probability estimate

    if len(x) <> len(y):
        raise ValueError, 'Unequal N in wilcoxont. Aborting.'
    for i in range(len(x)):
    absranked = rankdata(absd)
    for i in range(len(absd)):
        r_minus = r_minus + absranked[i]
        r_plus = r_plus + absranked[i]
    wt = min(r_plus, r_minus)
    mn = count * (count+1) * 0.25
    se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
    z = math.fabs(wt-mn) / se
    prob = 2*(1.0 - zprob(abs(z)))
1271
def lkruskalwallish(*args):
The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
groups, requiring at least 5 subjects in each group. This function
calculates the Kruskal-Wallis H-test for 3 or more independent samples
and returns the result.

Usage: lkruskalwallish(*args)
Returns: H-statistic (corrected for ties), associated p-value

    for i in range(len(args)):
    ranked = rankdata(all)
    T = tiecorrect(ranked)
    for i in range(len(args)):
        args[i] = ranked[0:n[i]]
    for i in range(len(args)):
        rsums.append(sum(args[i])**2)
        rsums[i] = rsums[i] / float(n[i])
    h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
        raise ValueError, 'All numbers are identical in lkruskalwallish'
    return h, chisqprob(h,df)
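The core H formula, H = 12/(N(N+1)) * sum(R_i^2/n_i) - 3(N+1), in a minimal sketch (kruskal_h is an illustrative name; no tie correction, so tied scores are not handled):

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (assumes no tied scores; no tie correction)."""
    data = [v for g in groups for v in g]
    n = len(data)
    rank = {v: i + 1 for i, v in enumerate(sorted(data))}   # ranks 1..N
    ssbn = 0.0
    for g in groups:
        r = sum(rank[v] for v in g)       # rank sum for this group
        ssbn += r ** 2 / float(len(g))
    return 12.0 / (n * (n + 1)) * ssbn - 3 * (n + 1)

print(kruskal_h([1, 2], [3, 4], [5, 6]))
```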
1306
def lfriedmanchisquare(*args):
Friedman Chi-Square is a non-parametric, one-way within-subjects
ANOVA. This function calculates the Friedman Chi-square test for repeated
measures and returns the result, along with the associated probability
value. It assumes 3 or more repeated measures. Only 3 levels requires a
minimum of 10 subjects in the study. Four levels requires 5 subjects per

Usage: lfriedmanchisquare(*args)
Returns: chi-square statistic, associated p-value

    raise ValueError, 'Less than 3 levels. Friedman test not appropriate.'
    data = apply(pstat.abut,tuple(args))
    for i in range(len(data)):
        data[i] = rankdata(data[i])
        ssbn = ssbn + sum(args[i])**2
    chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
    return chisq, chisqprob(chisq,k-1)
1332
####################################
####  PROBABILITY CALCULATIONS  ####
####################################
1336
def lchisqprob(chisq,df):
Returns the (1-tailed) probability value associated with the provided
chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat.

Usage: lchisqprob(chisq,df)

    if chisq <=0 or df < 1:
    s = 2.0 * zprob(-math.sqrt(chisq))
    chisq = 0.5 * (df - 1.0)
    e = math.log(math.sqrt(math.pi))
    e = 1.0 / math.sqrt(math.pi) / math.sqrt(a)
    e = e * (a/float(z))
1398
Returns the complementary error function erfc(x) with fractional
error everywhere less than 1.2e-7. Adapted from Numerical Recipes.

    t = 1.0 / (1.0+0.5*z)
    ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
1414
Returns the area under the normal curve 'to the left of' the given z value.
    for z<0, zprob(z) = 1-tail probability
    for z>0, 1.0-zprob(z) = 1-tail probability
    for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
Adapted from z.c in Gary Perlman's |Stat.

    Z_MAX = 6.0    # maximum meaningful z-value
    y = 0.5 * math.fabs(z)
    if y >= (Z_MAX*0.5):
    x = ((((((((0.000124818987 * w
        -0.001075204047) * w +0.005198775019) * w
        -0.019198292004) * w +0.059054035642) * w
        -0.151968751364) * w +0.319152932694) * w
        -0.531923007300) * w +0.797884560593) * y * 2.0
    x = (((((((((((((-0.000045255659 * y
        +0.000152529290) * y -0.000019538132) * y
        -0.000676904986) * y +0.001390604284) * y
        -0.000794620820) * y -0.002034254874) * y
        +0.006549791214) * y -0.010557625006) * y
        +0.011630447319) * y -0.009279453341) * y
        +0.005353579108) * y -0.002141268741) * y
        +0.000535310849) * y +0.999936657524
    prob = ((x+1.0)*0.5)
    prob = ((1.0-x)*0.5)
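The polynomial above approximates the standard normal CDF. In modern Python the same quantity is available through math.erfc, since Phi(z) = erfc(-z/sqrt(2))/2; a one-line sketch (phi is an illustrative name):

```python
import math

def phi(z):
    """Standard normal CDF: area under the curve to the left of z (cf. zprob)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

print(phi(0.0))                 # half the area lies left of the mean
print(2.0 * (1.0 - phi(1.96)))  # familiar two-tail probability near 0.05
```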
1456
Computes a Kolmogorov-Smirnov t-test significance level. Adapted from

Usage: lksprob(alam)

    for j in range(1,201):
        term = fac*math.exp(a2*j*j)
        if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum):
        termbf = math.fabs(term)
    return 1.0    # Get here only if fails to converge; was 0.0!!
1475
def lfprob(dfnum, dfden, F):
Returns the (1-tailed) significance level (p-value) of an F
statistic given the degrees of freedom for the numerator (dfR-dfF) and
the degrees of freedom for the denominator (dfF).

Usage: lfprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn

    p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
1489
This function evaluates the continued fraction form of the incomplete
Beta function, betai. (Adapted from: Numerical Recipes in C.)

Usage: lbetacf(a,b,x)

    for i in range(ITMAX+1):
        d = em*(b-em)*x/((qam+tem)*(a+tem))
        d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
        if (abs(az-aold)<(EPS*abs(az))):
    print 'a or b too big, or ITMAX too small in Betacf.'
1523
Returns the gamma function of xx.
    Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
(Adapted from: Numerical Recipes in C.)

    coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
             0.120858003e-2, -0.536382e-5]
    tmp = tmp - (x+0.5)*math.log(tmp)
    for j in range(len(coeff)):
        ser = ser + coeff[j]/x
    return -tmp + math.log(2.50662827465*ser)
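The Lanczos series above can be checked against math.lgamma. A direct modern-Python transcription, with the elided setup lines filled in from the standard Numerical Recipes formulation (an assumption, since they are missing from the fragment):

```python
import math

def gammln(xx):
    """Natural log of the gamma function via the Lanczos series (NR in C)."""
    coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
             0.120858003e-2, -0.536382e-5]
    x = xx - 1.0
    tmp = x + 5.5
    tmp = tmp - (x + 0.5) * math.log(tmp)
    ser = 1.0
    for c in coeff:
        x = x + 1.0
        ser = ser + c / x
    return -tmp + math.log(2.50662827465 * ser)

# Gamma(5) = 4! = 24, so gammln(5.0) should be ln(24)
print(gammln(5.0), math.lgamma(5.0))
```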
1543
Returns the incomplete beta function:

    I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)

where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
function of a. The continued fraction formulation is implemented here,
using the betacf function. (Adapted from: Numerical Recipes in C.)

Usage: lbetai(a,b,x)

    if (x<0.0 or x>1.0):
        raise ValueError, 'Bad x in lbetai'
    if (x==0.0 or x==1.0):
    bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b*
                  math.log(1.0-x))
    if (x<(a+1.0)/(a+b+2.0)):
        return bt*betacf(a,b,x)/float(a)
    return 1.0-bt*betacf(b,a,1.0-x)/float(b)
1566
####################################
#######  ANOVA CALCULATIONS  #######
####################################

def lF_oneway(*lists):
Performs a 1-way ANOVA, returning an F-value and probability given
any number of groups. From Heiman, pp.394-7.

Usage: F_oneway(*lists)   where *lists is any number of lists, one per

Returns: F value, one-tailed p-value

    a = len(lists)             # ANOVA on 'a' groups, each in its own list
    tmp = map(N.array,lists)
    means = map(amean,tmp)
    vars = map(avar,tmp)
    for i in range(len(lists)):
        alldata = alldata + lists[i]
    alldata = N.array(alldata)
    sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
        ssbn = ssbn + asquare_of_sums(N.array(list))/float(len(list))
    ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
    msb = ssbn/float(dfbn)
    msw = sswn/float(dfwn)
    prob = fprob(dfbn,dfwn,f)
1607
def lF_value(ER,EF,dfnum,dfden):
Returns an F-statistic given the following:
    ER = error associated with the null hypothesis (the Restricted model)
    EF = error associated with the alternate hypothesis (the Full model)
    dfR-dfF = degrees of freedom of the numerator
    dfF = degrees of freedom associated with the denominator/Full model

Usage: lF_value(ER,EF,dfnum,dfden)

    return ((ER-EF)/float(dfnum) / (EF/float(dfden)))
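A worked instance of the restricted-vs-full-model F ratio (f_value is an illustrative name):

```python
def f_value(er, ef, dfnum, dfden):
    """F = ((ER - EF)/dfnum) / (EF/dfden): gain per numerator df over
    residual error per denominator df."""
    return (er - ef) / float(dfnum) / (ef / float(dfden))

# ER=100, EF=80, dfnum=2, dfden=10: (20/2) / (80/10) = 10/8 = 1.25
print(f_value(100.0, 80.0, 2, 10))
```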
1620
####################################
########  SUPPORT FUNCTIONS  #######
####################################

def writecc(listoflists,file,writetype='w',extra=2):
Writes a list of lists to a file in columns, customized by the max
size of items within the columns (max size of items in col, +2 characters)
to specified file. File-overwrite is the default.

Usage: writecc (listoflists,file,writetype='w',extra=2)

    if type(listoflists[0]) not in [ListType,TupleType]:
        listoflists = [listoflists]
    outfile = open(file,writetype)
    list2print = copy.deepcopy(listoflists)
    for i in range(len(listoflists)):
        if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes':
            rowstokill = rowstokill + [i]
    rowstokill.reverse()
    for row in rowstokill:
    maxsize = [0]*len(list2print[0])
    for col in range(len(list2print[0])):
        items = pstat.colex(list2print,col)
        items = map(pstat.makestr,items)
        maxsize[col] = max(map(len,items)) + extra
    for row in listoflists:
        if row == ['\n'] or row == '\n':
        elif row == ['dashes'] or row == 'dashes':
            dashes = [0]*len(maxsize)
            for j in range(len(maxsize)):
                dashes[j] = '-'*(maxsize[j]-2)
            outfile.write(pstat.lineincustcols(dashes,maxsize))
        outfile.write(pstat.lineincustcols(row,maxsize))
1664
def lincr(l,cap):    # to increment a list up to a max-list of 'cap'
Simulate a counting system from an n-dimensional list.

Usage: lincr(l,cap)   l=list to increment, cap=max values for each list pos'n
Returns: next set of values for list l, OR -1 (if overflow)

    l[0] = l[0] + 1     # e.g., [0,0,0] --> [2,4,3] (=cap)
    for i in range(len(l)):
        if l[i] > cap[i] and i < len(l)-1:      # if carryover AND not done
        elif l[i] > cap[i] and i == len(l)-1:   # overflow past last column, must be finished
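The carry/reset bodies are elided above; a hedged reconstruction of the odometer-style counting logic in modern Python (incr is the same name the module dispatches to, but this sketch fills in the missing reset-and-carry steps as an assumption):

```python
def incr(l, cap):
    """Increment l odometer-style, wrapping each position at cap[i];
    return the next value list, or -1 once the counter overflows."""
    l = list(l)
    l[0] += 1
    for i in range(len(l)):
        if l[i] > cap[i]:
            if i == len(l) - 1:   # overflow past the last position: finished
                return -1
            l[i] = 0              # reset this position...
            l[i + 1] += 1         # ...and carry into the next one
    return l

print(incr([2, 0], [2, 4]))   # carries into the second position
```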
1683
Returns the sum of the items in the passed list.

def lcumsum(inlist):
Returns a list consisting of the cumulative sum of the items in the

Usage: lcumsum(inlist)

    newlist = copy.deepcopy(inlist)
    for i in range(1,len(newlist)):
        newlist[i] = newlist[i] + newlist[i-1]
1708
Squares each value in the passed list, adds up these squares and

def lsummult(list1,list2):
Multiplies elements in list1 and list2, element by element, and
returns the sum of all resulting multiplications. Must provide equal

Usage: lsummult(list1,list2)

    if len(list1) <> len(list2):
        raise ValueError, "Lists not equal length in summult."
    for item1,item2 in pstat.abut(list1,list2):
1735
def lsumdiffsquared(x,y):
Takes pairwise differences of the values in lists x and y, squares
these differences, and returns the sum of these squares.

Usage: lsumdiffsquared(x,y)
Returns: sum[(x[i]-y[i])**2]

    for i in range(len(x)):
        sds = sds + (x[i]-y[i])**2

def lsquare_of_sums(inlist):
Adds the values in the passed list, squares the sum, and returns

Usage: lsquare_of_sums(inlist)
Returns: sum(inlist[i])**2
1761
def lshellsort(inlist):
Shellsort algorithm. Sorts a 1D-list.

Usage: lshellsort(inlist)
Returns: sorted-inlist, sorting-index-vector (for original list)

    svec = copy.deepcopy(inlist)
    gap = n/2   # integer division needed
    for i in range(gap,n):
        for j in range(i-gap,-1,-gap):
            while j>=0 and svec[j]>svec[j+gap]:
                svec[j] = svec[j+gap]
                ivec[j] = ivec[j+gap]
    gap = gap / 2   # integer division needed
    # svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]]
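Returning the sorted list together with the index vector (the svec[i] == inlist[ivec[i]] invariant noted above) is a one-liner in modern Python; sort_with_index is an illustrative name:

```python
def sort_with_index(inlist):
    """Return (svec, ivec) such that svec[i] == inlist[ivec[i]]."""
    ivec = sorted(range(len(inlist)), key=inlist.__getitem__)
    svec = [inlist[i] for i in ivec]
    return svec, ivec

svec, ivec = sort_with_index([3, 1, 2])
print(svec, ivec)
```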
1787
def lrankdata(inlist):
Ranks the data in inlist, dealing with ties appropriately. Assumes
a 1D inlist. Adapted from Gary Perlman's |Stat ranksort.

Usage: lrankdata(inlist)
Returns: a list of length equal to inlist, containing rank scores

    svec, ivec = shellsort(inlist)
        sumranks = sumranks + i
        dupcount = dupcount + 1
        if i==n-1 or svec[i] <> svec[i+1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i-dupcount+1,i+1):
                newlist[ivec[j]] = averank
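The tie-handling idea, tied values share the mean of the ranks they span, in a self-contained modern-Python sketch (rankdata mirrors the dispatched name; the loop structure here is a reconstruction, not the module's exact code):

```python
def rankdata(inlist):
    """1-based average ranks; tied values share the mean of their ranks."""
    ivec = sorted(range(len(inlist)), key=inlist.__getitem__)
    svec = [inlist[i] for i in ivec]
    n = len(inlist)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n - 1 and svec[j] == svec[j + 1]:
            j += 1                         # extend over the tie group
        avg = (i + j) / 2.0 + 1            # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ivec[k]] = avg
        i = j + 1
    return ranks

print(rankdata([1, 2, 2, 3]))   # the tied 2s split ranks 2 and 3
```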
1812
def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob):
Prints or writes to a file stats for two groups, using the name, n,
mean, sterr, min and max for each group, as well as the statistic name,
its value, and the associated p-value.

Usage: outputpairedstats(fname,writemode,
                         name1,n1,mean1,stderr1,min1,max1,
                         name2,n2,mean2,stderr2,min2,max2,

    suffix = ''                   # for *s after the p-value
    if prob < 0.001:  suffix = ' ***'
    elif prob < 0.01:  suffix = ' **'
    elif prob < 0.05:  suffix = ' *'
    title = [['Name','N','Mean','SD','Min','Max']]
    lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1],
                  [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]]
    if type(fname)<>StringType or len(fname)==0:
        if stat.shape == ():
        if prob.shape == ():
        print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix
    file = open(fname,writemode)
    file.write('\n'+statname+'\n\n')
    writecc(lofl,fname,'a')
    file = open(fname,'a')
    if stat.shape == ():
    if prob.shape == ():
    file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n']))
1869
def lfindwithin(data):
Returns an integer representing a binary vector, where 1=within-
subject factor, 0=between. Input equals the entire data 2D list (i.e.,
column 0=random factor, column -1=measured values; those two are skipped).
Note: input data is in |Stat format ... a list of lists ("2D list") with
one row per measured value, first column=subject identifier, last column=
score, one in-between column per factor (these columns contain level
designations on each factor). See also stats.anova.__doc__.

Usage: lfindwithin(data)   data in |Stat format

    numfact = len(data[0])-1
    for col in range(1,numfact):
        examplelevel = pstat.unique(pstat.colex(data,col))[0]
        rows = pstat.linexand(data,col,examplelevel)  # get 1 level of this factor
        factsubjs = pstat.unique(pstat.colex(rows,0))
        allsubjs = pstat.unique(pstat.colex(data,0))
        if len(factsubjs) == len(allsubjs):  # fewer Ss than scores on this factor?
            withinvec = withinvec + (1 << col)
1893
#########################################################
#########################################################
####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS #########
#########################################################
#########################################################
1899
## CENTRAL TENDENCY:
geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), )
harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), )
mean = Dispatch ( (lmean, (ListType, TupleType)), )
median = Dispatch ( (lmedian, (ListType, TupleType)), )
medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), )
mode = Dispatch ( (lmode, (ListType, TupleType)), )

moment = Dispatch ( (lmoment, (ListType, TupleType)), )
variation = Dispatch ( (lvariation, (ListType, TupleType)), )
skew = Dispatch ( (lskew, (ListType, TupleType)), )
kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), )
describe = Dispatch ( (ldescribe, (ListType, TupleType)), )

## FREQUENCY STATISTICS:
itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), )
scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), )
percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), )
histogram = Dispatch ( (lhistogram, (ListType, TupleType)), )
cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), )
relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), )

obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), )
samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), )
samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), )
var = Dispatch ( (lvar, (ListType, TupleType)), )
stdev = Dispatch ( (lstdev, (ListType, TupleType)), )
sterr = Dispatch ( (lsterr, (ListType, TupleType)), )
sem = Dispatch ( (lsem, (ListType, TupleType)), )
z = Dispatch ( (lz, (ListType, TupleType)), )
zs = Dispatch ( (lzs, (ListType, TupleType)), )

trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), )
trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), )

## CORRELATION FCNS:
paired = Dispatch ( (lpaired, (ListType, TupleType)), )
pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), )
spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), )
pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), )
kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), )
linregress = Dispatch ( (llinregress, (ListType, TupleType)), )

## INFERENTIAL STATS:
ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), )
ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), )
ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), )
chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), )
ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), )
mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), )
ranksums = Dispatch ( (lranksums, (ListType, TupleType)), )
tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), )
wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), )
kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), )
friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), )

## PROBABILITY CALCS:
chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), )
zprob = Dispatch ( (lzprob, (IntType, FloatType)), )
ksprob = Dispatch ( (lksprob, (IntType, FloatType)), )
fprob = Dispatch ( (lfprob, (IntType, FloatType)), )
betacf = Dispatch ( (lbetacf, (IntType, FloatType)), )
betai = Dispatch ( (lbetai, (IntType, FloatType)), )
erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), )
gammln = Dispatch ( (lgammln, (IntType, FloatType)), )

F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), )
F_value = Dispatch ( (lF_value, (ListType, TupleType)), )

## SUPPORT FUNCTIONS:
incr = Dispatch ( (lincr, (ListType, TupleType)), )
sum = Dispatch ( (lsum, (ListType, TupleType)), )
cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), )
ss = Dispatch ( (lss, (ListType, TupleType)), )
summult = Dispatch ( (lsummult, (ListType, TupleType)), )
square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), )
sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), )
shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), )
rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), )
findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), )
1985
#============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
#============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
#============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
2005
try:                         # DEFINE THESE *ONLY* IF NUMPY IS AVAILABLE
    import numpy as N
    import numpy.linalg as LA

#####################################
########  ACENTRAL TENDENCY  ########
#####################################

def ageometricmean(inarray,dimension=None,keepdims=0):
Calculates the geometric mean of the values in the passed array.
That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in
the passed array. Use dimension=None to flatten array first. REMEMBER: if
dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
if dimension is a sequence, it collapses over all specified dimensions. If
keepdims is set to 1, the resulting array will have as many dimensions as
inarray, with only 1 'level' per dim that was collapsed over.

Usage: ageometricmean(inarray,dimension=None,keepdims=0)
Returns: geometric mean computed over dim(s) listed in dimension

    inarray = N.array(inarray,N.float_)
    if dimension == None:
        inarray = N.ravel(inarray)
        mult = N.power(inarray,1.0/size)
        mult = N.multiply.reduce(mult)
    elif type(dimension) in [IntType,FloatType]:
        size = inarray.shape[dimension]
        mult = N.power(inarray,1.0/size)
        mult = N.multiply.reduce(mult,dimension)
        shp = list(inarray.shape)
        sum = N.reshape(sum,shp)
    else:   # must be a SEQUENCE of dims to average over
        dims = list(dimension)
        size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.float_)
        mult = N.power(inarray,1.0/size)
            mult = N.multiply.reduce(mult,dim)
        shp = list(inarray.shape)
        mult = N.reshape(mult,shp)
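For the flattened case, the geometric mean is more numerically stable computed in log space, since a long chain of products can overflow or underflow; a minimal sketch (geometric_mean is an illustrative name, not part of this module):

```python
import math

def geometric_mean(values):
    """n-th root of the product, computed via logs to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# geometric mean of 2 and 8 is sqrt(2*8) = 4
print(geometric_mean([2.0, 8.0]))
```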
2057
def aharmonicmean(inarray,dimension=None,keepdims=0):
Calculates the harmonic mean of the values in the passed array.
That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in
the passed array. Use dimension=None to flatten array first. REMEMBER: if
dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
if dimension is a sequence, it collapses over all specified dimensions. If
keepdims is set to 1, the resulting array will have as many dimensions as
inarray, with only 1 'level' per dim that was collapsed over.

Usage: aharmonicmean(inarray,dimension=None,keepdims=0)
Returns: harmonic mean computed over dim(s) in dimension

    inarray = inarray.astype(N.float_)
    if dimension == None:
        inarray = N.ravel(inarray)
        s = N.add.reduce(1.0 / inarray)
    elif type(dimension) in [IntType,FloatType]:
        size = float(inarray.shape[dimension])
        s = N.add.reduce(1.0/inarray, dimension)
        shp = list(inarray.shape)
        s = N.reshape(s,shp)
    else:   # must be a SEQUENCE of dims to average over
        dims = list(dimension)
        for i in range(len(inarray.shape)):
        tinarray = N.transpose(inarray,nondims+dims)  # put keep-dims first
        idx = [0] *len(nondims)
        size = len(N.ravel(inarray))
        s = asum(1.0 / inarray)
        s = N.reshape([s],N.ones(len(inarray.shape)))
        loopcap = N.array(tinarray.shape[0:len(nondims)]) -1
        s = N.zeros(loopcap+1,N.float_)
        while incr(idx,loopcap) <> -1:
            s[idx] = asum(1.0/tinarray[idx])
        size = N.multiply.reduce(N.take(inarray.shape,dims))
        shp = list(inarray.shape)
        s = N.reshape(s,shp)
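The core formula, n divided by the sum of reciprocals, in a self-contained sketch (harmonic_mean is an illustrative name; all values must be nonzero):

```python
def harmonic_mean(values):
    """n / (1/x1 + ... + 1/xn); every value must be nonzero."""
    return len(values) / sum(1.0 / v for v in values)

# harmonic mean of 2 and 3 is 2 / (1/2 + 1/3) = 2.4
print(harmonic_mean([2.0, 3.0]))
```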
2111
def amean(inarray,dimension=None,keepdims=0):
Calculates the arithmetic mean of the values in the passed array.
That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the
passed array. Use dimension=None to flatten array first. REMEMBER: if
dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
if dimension is a sequence, it collapses over all specified dimensions. If
keepdims is set to 1, the resulting array will have as many dimensions as
inarray, with only 1 'level' per dim that was collapsed over.

Usage: amean(inarray,dimension=None,keepdims=0)
Returns: arithmetic mean calculated over dim(s) in dimension

    if inarray.dtype in [N.int_, N.short,N.ubyte]:
        inarray = inarray.astype(N.float_)
    if dimension == None:
        inarray = N.ravel(inarray)
        sum = N.add.reduce(inarray)
        denom = float(len(inarray))
    elif type(dimension) in [IntType,FloatType]:
        sum = asum(inarray,dimension)
        denom = float(inarray.shape[dimension])
        shp = list(inarray.shape)
        sum = N.reshape(sum,shp)
    else:   # must be a TUPLE of dims to average over
        dims = list(dimension)
            sum = N.add.reduce(sum,dim)
        denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.float_)
        shp = list(inarray.shape)
        sum = N.reshape(sum,shp)
2153
def amedian(inarray,numbins=1000):
Calculates the COMPUTED median value of an array of numbers, given the
number of bins to use for the histogram (more bins approaches finding the
precise median value of the array; default number of bins = 1000). From
G.W. Heiman's Basic Stats, or CRC Probability & Statistics.
NOTE: THIS ROUTINE ALWAYS uses the entire passed array (flattens it first).

Usage: amedian(inarray,numbins=1000)
Returns: median calculated over ALL values in inarray

    inarray = N.ravel(inarray)
    (hist, smallest, binsize, extras) = ahistogram(inarray,numbins,[min(inarray),max(inarray)])
    cumhist = N.cumsum(hist)               # make cumulative histogram
    otherbins = N.greater_equal(cumhist,len(inarray)/2.0)
    otherbins = list(otherbins)            # list of 0/1s, 1s start at median bin
    cfbin = otherbins.index(1)             # get 1st(!) index holding 50%ile score
    LRL = smallest + binsize*cfbin         # get lower real limit of that bin
    cfbelow = N.add.reduce(hist[0:cfbin])  # cum. freq. below bin
    freq = hist[cfbin]                     # frequency IN the 50%ile bin
    median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize  # MEDIAN
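The interpolation step can be isolated in plain Python, given a histogram: find the first bin whose cumulative frequency reaches N/2, then interpolate within it from its lower real limit (computed_median is an illustrative name):

```python
def computed_median(hist, smallest, binsize, n):
    """Interpolated median from histogram counts (cf. amedian above)."""
    cum = 0
    for i, freq in enumerate(hist):
        if cum + freq >= n / 2.0:                 # first bin reaching 50%
            lrl = smallest + binsize * i          # lower real limit of bin
            return lrl + ((n / 2.0 - cum) / float(freq)) * binsize
        cum += freq

# Four scores (1, 2, 2, 3) binned as [1, 2, 1] with bins starting at 0.5:
print(computed_median([1, 2, 1], 0.5, 1.0, 4))
```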
2177
def amedianscore(inarray,dimension=None):
Returns the 'middle' score of the passed array. If there is an even
number of scores, the mean of the 2 middle scores is returned. Can function
with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can
be None, to pre-flatten the array, or else dimension must equal 0).

Usage: amedianscore(inarray,dimension=None)
Returns: 'middle' score of the array, or the mean of the 2 middle scores

    if dimension == None:
        inarray = N.ravel(inarray)
    inarray = N.sort(inarray,dimension)
    if inarray.shape[dimension] % 2 == 0:    # if even number of elements
        indx = inarray.shape[dimension]/2    # integer division correct
        median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0
        indx = inarray.shape[dimension] / 2  # integer division correct
        median = N.take(inarray,[indx],dimension)
    if median.shape == (1,):
2202
def amode(a, dimension=None):
Returns an array of the modal (most common) score in the passed array.
If there is more than one such score, ONLY THE FIRST is returned.
The bin-count for the modal values is also returned. Operates on whole
array (dimension=None), or on a given dimension.

Usage: amode(a, dimension=None)
Returns: array of bin-counts for mode(s), array of corresponding modal values

    if dimension == None:
    scores = pstat.aunique(N.ravel(a))   # get ALL unique values
    testshape = list(a.shape)
    testshape[dimension] = 1
    oldmostfreq = N.zeros(testshape)
    oldcounts = N.zeros(testshape)
    for score in scores:
        template = N.equal(a,score)
        counts = asum(template,dimension,1)
        mostfrequent = N.where(counts>oldcounts,score,oldmostfreq)
        oldcounts = N.where(counts>oldcounts,counts,oldcounts)
        oldmostfreq = mostfrequent
    return oldcounts, mostfrequent
2229
def atmean(a,limits=None,inclusive=(1,1)):
Returns the arithmetic mean of all values in an array, ignoring values
strictly outside the sequence passed to 'limits'. Note: either limit
in the sequence, or the value of limits itself, can be set to None. The
inclusive list/tuple determines whether the lower and upper limiting bounds
(respectively) are open/exclusive (0) or closed/inclusive (1).

Usage: atmean(a,limits=None,inclusive=(1,1))

    if a.dtype in [N.int_, N.short,N.ubyte]:
        a = a.astype(N.float_)
    assert type(limits) in [ListType,TupleType,N.ndarray], "Wrong type for limits in atmean"
    if inclusive[0]:    lowerfcn = N.greater_equal
    else:               lowerfcn = N.greater
    if inclusive[1]:    upperfcn = N.less_equal
    else:               upperfcn = N.less
    if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
        raise ValueError, "No array values within given limits (atmean)."
    elif limits[0]==None and limits[1]<>None:
        mask = upperfcn(a,limits[1])
    elif limits[0]<>None and limits[1]==None:
        mask = lowerfcn(a,limits[0])
    elif limits[0]<>None and limits[1]<>None:
        mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
    s = float(N.add.reduce(N.ravel(a*mask)))
    n = float(N.add.reduce(N.ravel(mask)))
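The same truncated-mean logic in plain modern Python, without the array mask machinery (tmean is an illustrative name, and the None-handling mirrors the docstring's contract):

```python
def tmean(values, limits=None, inclusive=(True, True)):
    """Mean over values inside limits; None on either side means unbounded."""
    if limits is None:
        limits = (None, None)
    lo, hi = limits
    ok_lo = (lambda v: v >= lo) if inclusive[0] else (lambda v: v > lo)
    ok_hi = (lambda v: v <= hi) if inclusive[1] else (lambda v: v < hi)
    kept = [v for v in values
            if (lo is None or ok_lo(v)) and (hi is None or ok_hi(v))]
    if not kept:
        raise ValueError("No values within given limits")
    return sum(kept) / float(len(kept))

# The outlier 100 is excluded by the closed limits (1, 4):
print(tmean([1, 2, 3, 4, 100], (1, 4)))
```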
2261
def atvar(a,limits=None,inclusive=(1,1)):
Returns the sample variance of values in an array (i.e., using N-1),
ignoring values strictly outside the sequence passed to 'limits'.
Note: either limit in the sequence, or the value of limits itself,
can be set to None. The inclusive list/tuple determines whether the lower
and upper limiting bounds (respectively) are open/exclusive (0) or
closed/inclusive (1). ASSUMES A FLAT ARRAY (OR ELSE PREFLATTENS).

Usage: atvar(a,limits=None,inclusive=(1,1))

    a = a.astype(N.float_)
    if limits == None or limits == [None,None]:
    assert type(limits) in [ListType,TupleType,N.ndarray], "Wrong type for limits in atvar"
    if inclusive[0]:    lowerfcn = N.greater_equal
    else:               lowerfcn = N.greater
    if inclusive[1]:    upperfcn = N.less_equal
    else:               upperfcn = N.less
    if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
        raise ValueError, "No array values within given limits (atvar)."
    elif limits[0]==None and limits[1]<>None:
        mask = upperfcn(a,limits[1])
    elif limits[0]<>None and limits[1]==None:
        mask = lowerfcn(a,limits[0])
    elif limits[0]<>None and limits[1]<>None:
        mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
    a = N.compress(mask,a)   # squish out excluded values
2293
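For readers without the old Numeric/NumPy stack, the truncation logic shared by atmean/atvar above can be sketched in plain Python. This is an illustrative re-implementation, not part of this module; the name `tmean_sketch` is invented here:

```python
def tmean_sketch(values, limits=None, inclusive=(True, True)):
    # Keep only values inside [limits[0], limits[1]]; either bound may be
    # None (unbounded), and each bound is open or closed per `inclusive`.
    if limits is None:
        limits = (None, None)
    lo, hi = limits

    def inside(v):
        if lo is not None and (v < lo or (not inclusive[0] and v == lo)):
            return False
        if hi is not None and (v > hi or (not inclusive[1] and v == hi)):
            return False
        return True

    kept = [v for v in values if inside(v)]
    if not kept:
        raise ValueError("No values within given limits")
    return sum(kept) / float(len(kept))
```

With `limits=(2,4)` on `[1,2,3,4,5]`, only 2, 3, 4 survive and the truncated mean is 3.0; exclusive bounds keep only 3.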
def atmin(a,lowerlimit=None,dimension=None,inclusive=1):
    """
Returns the minimum value of a, along dimension, including only values greater
than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None,
all values in the array are used.

Usage:   atmin(a,lowerlimit=None,dimension=None,inclusive=1)
"""
    if inclusive:       lowerfcn = N.greater
    else:               lowerfcn = N.greater_equal
    if dimension == None:
        a = N.ravel(a)
        dimension = 0
    if lowerlimit == None:
        lowerlimit = N.minimum.reduce(N.ravel(a))-11
    biggest = N.maximum.reduce(N.ravel(a))
    ta = N.where(lowerfcn(a,lowerlimit),a,biggest)
    return N.minimum.reduce(ta,dimension)

def atmax(a,upperlimit,dimension=None,inclusive=1):
    """
Returns the maximum value of a, along dimension, including only values smaller
than (or equal to, if inclusive=1) upperlimit. If the limit is set to None,
a limit larger than the max value in the array is used.

Usage:   atmax(a,upperlimit,dimension=None,inclusive=1)
"""
    if inclusive:       upperfcn = N.less
    else:               upperfcn = N.less_equal
    if dimension == None:
        a = N.ravel(a)
        dimension = 0
    if upperlimit == None:
        upperlimit = N.maximum.reduce(N.ravel(a))+1
    smallest = N.minimum.reduce(N.ravel(a))
    ta = N.where(upperfcn(a,upperlimit),a,smallest)
    return N.maximum.reduce(ta,dimension)

def atstdev(a,limits=None,inclusive=(1,1)):
    """
Returns the standard deviation of all values in an array, ignoring values
strictly outside the sequence passed to 'limits'. Note: either limit
in the sequence, or the value of limits itself, can be set to None. The
inclusive list/tuple determines whether the lower and upper limiting bounds
(respectively) are open/exclusive (0) or closed/inclusive (1).

Usage:   atstdev(a,limits=None,inclusive=(1,1))
"""
    return N.sqrt(tvar(a,limits,inclusive))

def atsem(a,limits=None,inclusive=(1,1)):
    """
Returns the standard error of the mean for the values in an array,
(i.e., using N for the denominator), ignoring values strictly outside
the sequence passed to 'limits'. Note: either limit in the sequence,
or the value of limits itself, can be set to None. The inclusive list/tuple
determines whether the lower and upper limiting bounds (respectively) are
open/exclusive (0) or closed/inclusive (1).

Usage:   atsem(a,limits=None,inclusive=(1,1))
"""
    sd = tstdev(a,limits,inclusive)
    if limits == None or limits == [None,None]:
        n = float(len(N.ravel(a)))
        limits = [min(a)-1, max(a)+1]
    assert type(limits) in [ListType,TupleType,N.ndarray], "Wrong type for limits in atsem"
    if inclusive[0]:    lowerfcn = N.greater_equal
    else:               lowerfcn = N.greater
    if inclusive[1]:    upperfcn = N.less_equal
    else:               upperfcn = N.less
    if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
        raise ValueError, "No array values within given limits (atsem)."
    elif limits[0] == None and limits[1] <> None:
        mask = upperfcn(a,limits[1])
    elif limits[0] <> None and limits[1] == None:
        mask = lowerfcn(a,limits[0])
    elif limits[0] <> None and limits[1] <> None:
        mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
    term1 = N.add.reduce(N.ravel(a*a*mask))
    n = float(N.add.reduce(N.ravel(mask)))
    return sd/math.sqrt(n)

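The sd/sqrt(n) computation underlying atsem can be mirrored in plain Python. A minimal sketch, not part of this module (`tsem_sketch` is an invented name), using closed limits and an N-1 standard deviation:

```python
import math

def tsem_sketch(values, limits=None):
    # Standard error of the (possibly truncated) mean: sd / sqrt(n),
    # computed on the values surviving the closed limits.
    if limits is not None:
        lo, hi = limits
        values = [v for v in values if (lo is None or v >= lo)
                                   and (hi is None or v <= hi)]
    n = len(values)
    mn = sum(values) / float(n)
    sd = math.sqrt(sum((v - mn) ** 2 for v in values) / float(n - 1))
    return sd / math.sqrt(n)
```

For `[1, 2, 3]` the sample sd is 1, so the SEM is 1/sqrt(3).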
#####################################
############  AMOMENTS  #############
#####################################

def amoment(a,moment=1,dimension=None):
    """
Calculates the nth moment about the mean for a sample (defaults to the
1st moment). Generally used to calculate coefficients of skewness and
kurtosis. Dimension can equal None (ravel array first), an integer
(the dimension over which to operate), or a sequence (operate over
multiple dimensions).

Usage:   amoment(a,moment=1,dimension=None)
Returns: appropriate moment along given dimension
"""
    if dimension == None:
        a = N.ravel(a)
        dimension = 0
    if moment == 1:
        return 0.0
    else:
        mn = amean(a,dimension,1)  # 1=keepdims
        s = N.power((a-mn),moment)
        return amean(s,dimension)

def avariation(a,dimension=None):
    """
Returns the coefficient of variation, as defined in CRC Standard
Probability and Statistics, p.6. Dimension can equal None (ravel array
first), an integer (the dimension over which to operate), or a
sequence (operate over multiple dimensions).

Usage:   avariation(a,dimension=None)
"""
    return 100.0*asamplestdev(a,dimension)/amean(a,dimension)

def askew(a,dimension=None):
    """
Returns the skewness of a distribution (normal ==> 0.0; >0 means extra
weight in the right tail). Use askewtest() to see if it's close enough.
Dimension can equal None (ravel array first), an integer (the
dimension over which to operate), or a sequence (operate over multiple
dimensions).

Usage:   askew(a, dimension=None)
Returns: skew of vals in a along dimension, returning ZERO where all vals equal
"""
    denom = N.power(amoment(a,2,dimension),1.5)
    zero = N.equal(denom,0)
    if type(denom) == N.ndarray and asum(zero) <> 0:
        print "Number of zeros in askew: ",asum(zero)
    denom = denom + zero  # prevent divide-by-zero
    return N.where(zero, 0, amoment(a,3,dimension)/denom)

def akurtosis(a,dimension=None):
    """
Returns the kurtosis of a distribution (normal ==> 3.0; >3 means
heavier in the tails, and usually more peaked). Use akurtosistest()
to see if it's close enough. Dimension can equal None (ravel array
first), an integer (the dimension over which to operate), or a
sequence (operate over multiple dimensions).

Usage:   akurtosis(a,dimension=None)
Returns: kurtosis of values in a along dimension, and ZERO where all vals equal
"""
    denom = N.power(amoment(a,2,dimension),2)
    zero = N.equal(denom,0)
    if type(denom) == N.ndarray and asum(zero) <> 0:
        print "Number of zeros in akurtosis: ",asum(zero)
    denom = denom + zero  # prevent divide-by-zero
    return N.where(zero,0,amoment(a,4,dimension)/denom)

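The moment/skew/kurtosis trio above reduces to a few lines of plain Python for 1D data. An illustrative sketch only (the `*_sketch` names are invented, and the all-equal case returns ZERO as documented above):

```python
def moment_sketch(values, k):
    # k-th central moment: mean of (x - mean)**k
    m = sum(values) / float(len(values))
    return sum((v - m) ** k for v in values) / float(len(values))

def skew_sketch(values):
    # 3rd central moment over the 2nd moment to the 1.5 power
    denom = moment_sketch(values, 2) ** 1.5
    return 0.0 if denom == 0 else moment_sketch(values, 3) / denom

def kurtosis_sketch(values):
    # 4th central moment over the squared 2nd moment (normal ==> 3.0)
    denom = moment_sketch(values, 2) ** 2
    return 0.0 if denom == 0 else moment_sketch(values, 4) / denom
```

Symmetric data such as `[1, 2, 3]` gives skew 0.0, and constant data falls into the ZERO branch.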
def adescribe(inarray,dimension=None):
    """
Returns several descriptive statistics of the passed array. Dimension
can equal None (ravel array first), an integer (the dimension over
which to operate), or a sequence (operate over multiple dimensions).

Usage:   adescribe(inarray,dimension=None)
Returns: n, min, max, mean, standard deviation, skew, kurtosis
"""
    if dimension == None:
        inarray = N.ravel(inarray)
        dimension = 0
    n = inarray.shape[dimension]
    mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray))
    m = amean(inarray,dimension)
    sd = astdev(inarray,dimension)
    skew = askew(inarray,dimension)
    kurt = akurtosis(inarray,dimension)
    # return n, mm, m, sd, skew, kurt
    return n, mm[0], mm[1], m, sd, skew, kurt

#####################################
########  NORMALITY TESTS  ##########
#####################################

def askewtest(a,dimension=None):
    """
Tests whether the skew is significantly different from a normal
distribution. Dimension can equal None (ravel array first), an
integer (the dimension over which to operate), or a sequence (operate
over multiple dimensions).

Usage:   askewtest(a,dimension=None)
Returns: z-score and 2-tail z-probability
"""
    if dimension == None:
        a = N.ravel(a)
        dimension = 0
    b2 = askew(a,dimension)
    n = float(a.shape[dimension])
    y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)))
    beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) )
    W2 = -1 + N.sqrt(2*(beta2-1))
    delta = 1/N.sqrt(N.log(N.sqrt(W2)))
    alpha = N.sqrt(2/(W2-1))
    y = N.where(y==0,1,y)
    Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1))
    return Z, (1.0-zprob(Z))*2

def akurtosistest(a,dimension=None):
    """
Tests whether a dataset has normal kurtosis (i.e.,
kurtosis=3(n-1)/(n+1)). Valid only for n>20. Dimension can equal None
(ravel array first), an integer (the dimension over which to operate),
or a sequence (operate over multiple dimensions).

Usage:   akurtosistest(a,dimension=None)
Returns: z-score and 2-tail z-probability, returns 0 for bad pixels
"""
    if dimension == None:
        a = N.ravel(a)
        dimension = 0
    n = float(a.shape[dimension])
    if n < 20:
        print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n
    b2 = akurtosis(a,dimension)
    E = 3.0*(n-1) / (n+1)
    varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5))
    x = (b2-E)/N.sqrt(varb2)
    sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/
                                                       (n*(n-2)*(n-3)))
    A = 6.0 + 8.0/sqrtbeta1 * (2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2)))
    term1 = 1 - 2/(9.0*A)
    denom = 1 + x*N.sqrt(2/(A-4.0))
    denom = N.where(N.less(denom,0), 99, denom)
    term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0))
    Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A))
    Z = N.where(N.equal(denom,99), 0, Z)
    return Z, (1.0-zprob(Z))*2

def anormaltest(a,dimension=None):
    """
Tests whether skew and/OR kurtosis of dataset differs from normal
curve. Can operate over multiple dimensions. Dimension can equal
None (ravel array first), an integer (the dimension over which to
operate), or a sequence (operate over multiple dimensions).

Usage:   anormaltest(a,dimension=None)
Returns: z-score and 2-tail probability
"""
    if dimension == None:
        a = N.ravel(a)
        dimension = 0
    s,p = askewtest(a,dimension)
    k,p = akurtosistest(a,dimension)
    k2 = N.power(s,2) + N.power(k,2)
    return k2, achisqprob(k2,2)

#####################################
######  AFREQUENCY FUNCTIONS  #######
#####################################

def aitemfreq(a):
    """
Returns a 2D array of item frequencies. Column 1 contains item values,
column 2 contains their respective counts. Assumes a 1D array is passed.

Usage:   aitemfreq(a)
Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
"""
    scores = pstat.aunique(a)
    scores = N.sort(scores)
    freq = N.zeros(len(scores))
    for i in range(len(scores)):
        freq[i] = N.add.reduce(N.equal(a,scores[i]))
    return N.array(pstat.aabut(scores, freq))

def ascoreatpercentile(inarray, percent):
    """
Usage:   ascoreatpercentile(inarray,percent)   0<percent<100
Returns: score at given percentile, relative to inarray distribution
"""
    percent = percent / 100.0
    targetcf = percent*len(inarray)
    h, lrl, binsize, extras = histogram(inarray)
    cumhist = cumsum(h*1)
    for i in range(len(cumhist)):
        if cumhist[i] >= targetcf:
            break
    score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
    return score

def apercentileofscore(inarray,score,histbins=10,defaultlimits=None):
    """
Note: result of this function depends on the values used to histogram
the data.

Usage:   apercentileofscore(inarray,score,histbins=10,defaultlimits=None)
Returns: percentile-position of score (0-100) relative to inarray
"""
    h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits)
    cumhist = cumsum(h*1)
    i = int((score - lrl)/float(binsize))
    pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100
    return pct

def ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1):
    """
Returns (i) an array of histogram bin counts, (ii) the smallest value
of the histogram binning, and (iii) the bin width (the last 2 are not
necessarily integers). Default number of bins is 10. Defaultlimits
can be None (the routine picks bins spanning all the numbers in the
inarray) or a 2-sequence (lowerlimit, upperlimit). Returns all of the
following: array of bin values, lowerreallimit, binsize, extrapoints.

Usage:   ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1)
Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range)
"""
    inarray = N.ravel(inarray)               # flatten any >1D arrays
    if (defaultlimits <> None):
        lowerreallimit = defaultlimits[0]
        upperreallimit = defaultlimits[1]
        binsize = (upperreallimit-lowerreallimit) / float(numbins)
    else:
        Min = N.minimum.reduce(inarray)
        Max = N.maximum.reduce(inarray)
        estbinwidth = float(Max - Min)/float(numbins) + 1e-6
        binsize = (Max-Min+estbinwidth)/float(numbins)
        lowerreallimit = Min - binsize/2.0   # lower real limit, 1st bin
    bins = N.zeros(numbins)
    extrapoints = 0
    for num in inarray:
        try:
            if (num-lowerreallimit) < 0:
                extrapoints = extrapoints + 1
            else:
                bintoincrement = int((num-lowerreallimit) / float(binsize))
                bins[bintoincrement] = bins[bintoincrement] + 1
        except:                              # point outside lower/upper limits
            extrapoints = extrapoints + 1
    if (extrapoints > 0 and printextras == 1):
        print '\nPoints outside given histogram range =',extrapoints
    return (bins, lowerreallimit, binsize, extrapoints)

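The binning scheme above (half-bin padding when no limits are given, out-of-range points counted as "extra") can be sketched without NumPy. Illustrative only; `histogram_sketch` is an invented name:

```python
def histogram_sketch(values, numbins=10, limits=None):
    # Mirrors ahistogram: returns (bin counts, lower real limit, bin width,
    # number of points falling outside the binned range).
    if limits is not None:
        lower, upper = limits
        binsize = (upper - lower) / float(numbins)
    else:
        lo, hi = min(values), max(values)
        estwidth = (hi - lo) / float(numbins) + 1e-6
        binsize = (hi - lo + estwidth) / float(numbins)
        lower = lo - binsize / 2.0          # pad so extremes fall inside
    bins = [0] * numbins
    extra = 0
    for v in values:
        idx = int((v - lower) / binsize)
        if 0 <= idx < numbins:
            bins[idx] += 1
        else:
            extra += 1                      # outside lower/upper limits
    return bins, lower, binsize, extra
```

With explicit limits `(0, 4)` and 2 bins, the value 4 lands exactly on the upper edge and is counted as an extra point, matching the truncating-int behavior above.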
def acumfreq(a,numbins=10,defaultreallimits=None):
    """
Returns a cumulative frequency histogram, using the histogram function.
Defaultreallimits can be None (use all data), or a 2-sequence containing
lower and upper limits on values to include.

Usage:   acumfreq(a,numbins=10,defaultreallimits=None)
Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
"""
    h,l,b,e = histogram(a,numbins,defaultreallimits)
    cumhist = cumsum(h*1)
    return cumhist,l,b,e

def arelfreq(a,numbins=10,defaultreallimits=None):
    """
Returns a relative frequency histogram, using the histogram function.
Defaultreallimits can be None (use all data), or a 2-sequence containing
lower and upper limits on values to include.

Usage:   arelfreq(a,numbins=10,defaultreallimits=None)
Returns: array of relfreq bin values, lowerreallimit, binsize, extrapoints
"""
    h,l,b,e = histogram(a,numbins,defaultreallimits)
    h = N.array(h/float(a.shape[0]))
    return h,l,b,e

#####################################
######  AVARIABILITY FUNCTIONS  #####
#####################################

def aobrientransform(*args):
    """
Computes a transform on input data (any number of columns). Used to
test for homogeneity of variance prior to running one-way stats. Each
array in *args is one level of a factor. If an F_oneway() run on the
transformed data is found significant, variances are unequal. From
Maxwell and Delaney, p.112.

Usage:   aobrientransform(*args)    *args = 1D arrays, one per level of factor
Returns: transformed data for use in an ANOVA
"""
    TINY = 1e-10
    k = len(args)
    n = N.zeros(k,N.float_)
    v = N.zeros(k,N.float_)
    m = N.zeros(k,N.float_)
    nargs = []
    for i in range(k):
        nargs.append(args[i].astype(N.float_))
        n[i] = float(len(nargs[i]))
        v[i] = var(nargs[i])
        m[i] = mean(nargs[i])
    for j in range(k):
        for i in range(n[j]):
            t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
            t2 = 0.5*v[j]*(n[j]-1.0)
            t3 = (n[j]-1.0)*(n[j]-2.0)
            nargs[j][i] = (t1-t2) / float(t3)
    check = 1
    for j in range(k):
        if v[j] - mean(nargs[j]) > TINY:
            check = 0
    if check <> 1:
        raise ValueError, 'Lack of convergence in obrientransform.'
    else:
        return N.array(nargs)

def asamplevar(inarray,dimension=None,keepdims=0):
    """
Returns the sample variance of the values in the passed
array (i.e., using N). Dimension can equal None (ravel array first),
an integer (the dimension over which to operate), or a sequence
(operate over multiple dimensions). Set keepdims=1 to return an array
with the same number of dimensions as inarray.

Usage:   asamplevar(inarray,dimension=None,keepdims=0)
"""
    if dimension == None:
        inarray = N.ravel(inarray)
        dimension = 0
    if dimension == 1:
        mn = amean(inarray,dimension)[:,N.NewAxis]
    else:
        mn = amean(inarray,dimension,keepdims=1)
    deviations = inarray - mn
    if type(dimension) == ListType:
        n = 1
        for d in dimension:
            n = n*inarray.shape[d]
    else:
        n = inarray.shape[dimension]
    svar = ass(deviations,dimension,keepdims) / float(n)
    return svar

def asamplestdev(inarray, dimension=None, keepdims=0):
    """
Returns the sample standard deviation of the values in the passed
array (i.e., using N). Dimension can equal None (ravel array first),
an integer (the dimension over which to operate), or a sequence
(operate over multiple dimensions). Set keepdims=1 to return an array
with the same number of dimensions as inarray.

Usage:   asamplestdev(inarray,dimension=None,keepdims=0)
"""
    return N.sqrt(asamplevar(inarray,dimension,keepdims))

def asignaltonoise(instack,dimension=0):
    """
Calculates signal-to-noise. Dimension can equal None (ravel array
first), an integer (the dimension over which to operate), or a
sequence (operate over multiple dimensions).

Usage:   asignaltonoise(instack,dimension=0)
Returns: array containing the value of (mean/stdev) along dimension,
         or 0 when stdev=0
"""
    m = mean(instack,dimension)
    sd = stdev(instack,dimension)
    return N.where(sd==0,0,m/sd)

def acov(x,y, dimension=None,keepdims=0):
    """
Returns the estimated covariance of the values in the passed
array (i.e., N-1). Dimension can equal None (ravel array first), an
integer (the dimension over which to operate), or a sequence (operate
over multiple dimensions). Set keepdims=1 to return an array with the
same number of dimensions as inarray.

Usage:   acov(x,y,dimension=None,keepdims=0)
"""
    if dimension == None:
        x = N.ravel(x)
        y = N.ravel(y)
        dimension = 0
    xmn = amean(x,dimension,1)  # keepdims
    xdeviations = x - xmn
    ymn = amean(y,dimension,1)  # keepdims
    ydeviations = y - ymn
    if type(dimension) == ListType:
        n = 1
        for d in dimension:
            n = n*x.shape[d]
    else:
        n = x.shape[dimension]
    covar = N.sum(xdeviations*ydeviations)/float(n-1)
    return covar

def avar(inarray, dimension=None,keepdims=0):
    """
Returns the estimated population variance of the values in the passed
array (i.e., N-1). Dimension can equal None (ravel array first), an
integer (the dimension over which to operate), or a sequence (operate
over multiple dimensions). Set keepdims=1 to return an array with the
same number of dimensions as inarray.

Usage:   avar(inarray,dimension=None,keepdims=0)
"""
    if dimension == None:
        inarray = N.ravel(inarray)
        dimension = 0
    mn = amean(inarray,dimension,1)
    deviations = inarray - mn
    if type(dimension) == ListType:
        n = 1
        for d in dimension:
            n = n*inarray.shape[d]
    else:
        n = inarray.shape[dimension]
    var = ass(deviations,dimension,keepdims)/float(n-1)
    return var

def astdev(inarray, dimension=None, keepdims=0):
    """
Returns the estimated population standard deviation of the values in
the passed array (i.e., N-1). Dimension can equal None (ravel array
first), an integer (the dimension over which to operate), or a
sequence (operate over multiple dimensions). Set keepdims=1 to return
an array with the same number of dimensions as inarray.

Usage:   astdev(inarray,dimension=None,keepdims=0)
"""
    return N.sqrt(avar(inarray,dimension,keepdims))

def asterr(inarray, dimension=None, keepdims=0):
    """
Returns the estimated population standard error of the values in the
passed array (i.e., N-1). Dimension can equal None (ravel array
first), an integer (the dimension over which to operate), or a
sequence (operate over multiple dimensions). Set keepdims=1 to return
an array with the same number of dimensions as inarray.

Usage:   asterr(inarray,dimension=None,keepdims=0)
"""
    if dimension == None:
        inarray = N.ravel(inarray)
        dimension = 0
    return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension]))

def asem(inarray, dimension=None, keepdims=0):
    """
Returns the standard error of the mean (i.e., using N) of the values
in the passed array. Dimension can equal None (ravel array first), an
integer (the dimension over which to operate), or a sequence (operate
over multiple dimensions). Set keepdims=1 to return an array with the
same number of dimensions as inarray.

Usage:   asem(inarray,dimension=None,keepdims=0)
"""
    if dimension == None:
        inarray = N.ravel(inarray)
        dimension = 0
    if type(dimension) == ListType:
        n = 1
        for d in dimension:
            n = n*inarray.shape[d]
    else:
        n = inarray.shape[dimension]
    s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1)
    return s


def az(a, score):
    """
Returns the z-score of a given input score, given the array from which
that score came. Not appropriate for population calculations, nor for
arrays > 1D.

Usage:   az(a, score)
"""
    z = (score-amean(a)) / asamplestdev(a)
    return z


def azs(a):
    """
Returns a 1D array of z-scores, one for each score in the passed array,
computed relative to the passed array.

Usage:   azs(a)
"""
    zscores = []
    for item in a:
        zscores.append(z(a,item))
    return N.array(zscores)

def azmap(scores, compare, dimension=0):
    """
Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to
array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0
of the compare array.

Usage:   azmap(scores, compare, dimension=0)
"""
    mns = amean(compare,dimension)
    sstd = asamplestdev(compare,0)
    return (scores - mns) / sstd

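The z-scoring convention used by az/azs/azmap (mean and N-denominator standard deviation of the comparison data) can be mirrored for plain lists. A sketch only; `zmap_sketch` is an invented name:

```python
def zmap_sketch(scores, compare):
    # Standardize `scores` against the mean and population (N-denominator)
    # standard deviation of `compare`.
    n = float(len(compare))
    mn = sum(compare) / n
    sd = (sum((v - mn) ** 2 for v in compare) / n) ** 0.5
    return [(s - mn) / sd for s in scores]
```

Against `[3, 3, 5, 5]` (mean 4, sd 1), a score of 5 maps to z = 1.0.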
#####################################
#######  ATRIMMING FUNCTIONS  #######
#####################################

## deleted around() as it's in numpy now

def athreshold(a,threshmin=None,threshmax=None,newval=0):
    """
Like Numeric.clip() except that values <threshmin or >threshmax are replaced
by newval instead of by threshmin/threshmax (respectively).

Usage:   athreshold(a,threshmin=None,threshmax=None,newval=0)
Returns: a, with values <threshmin or >threshmax replaced with newval
"""
    mask = N.zeros(a.shape)
    if threshmin <> None:
        mask = mask + N.where(a<threshmin,1,0)
    if threshmax <> None:
        mask = mask + N.where(a>threshmax,1,0)
    mask = N.clip(mask,0,1)
    return N.where(mask,newval,a)

def atrimboth(a,proportiontocut):
    """
Slices off the passed proportion of items from BOTH ends of the passed
array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND
'rightmost' 10% of scores). You must pre-sort the array if you want
"proper" trimming. Slices off LESS if proportion results in a
non-integer slice index (i.e., conservatively slices off proportiontocut).

Usage:   atrimboth (a,proportiontocut)
Returns: trimmed version of array a
"""
    lowercut = int(proportiontocut*len(a))
    uppercut = len(a) - lowercut
    return a[lowercut:uppercut]

def atrim1(a,proportiontocut,tail='right'):
    """
Slices off the passed proportion of items from ONE end of the passed
array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
10% of scores). Slices off LESS if proportion results in a non-integer
slice index (i.e., conservatively slices off proportiontocut).

Usage:   atrim1(a,proportiontocut,tail='right')  or set tail='left'
Returns: trimmed version of array a
"""
    if string.lower(tail) == 'right':
        lowercut = 0
        uppercut = len(a) - int(proportiontocut*len(a))
    elif string.lower(tail) == 'left':
        lowercut = int(proportiontocut*len(a))
        uppercut = len(a)
    return a[lowercut:uppercut]

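The conservative-slice behavior described above is easy to see on plain lists. An illustrative sketch (the name `trimboth_sketch` is invented here):

```python
def trimboth_sketch(scores, proportiontocut):
    # Conservative symmetric trim: int() truncation means we never cut
    # more than the requested proportion from each end.
    lowercut = int(proportiontocut * len(scores))
    uppercut = len(scores) - lowercut
    return scores[lowercut:uppercut]
```

On 10 sorted scores, `proportiontocut=0.25` cuts int(2.5) = 2 scores from each end, not 2.5, which is the "slices off LESS" guarantee.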
#####################################
#####  ACORRELATION FUNCTIONS  ######
#####################################

def acovariance(X):
    """
Computes the covariance matrix of a matrix X. Requires a 2D matrix input.

Usage:   acovariance(X)
Returns: covariance matrix of X
"""
    if len(X.shape) <> 2:
        raise TypeError, "acovariance requires 2D matrices"
    n = X.shape[0]
    mX = amean(X,0)
    return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX)

def acorrelation(X):
    """
Computes the correlation matrix of a matrix X. Requires a 2D matrix input.

Usage:   acorrelation(X)
Returns: correlation matrix of X
"""
    C = acovariance(X)
    V = N.diagonal(C)
    return C / N.sqrt(N.multiply.outer(V,V))

def apaired(x,y):
    """
Interactively determines the type of data in x and y, and then runs the
appropriate statistic for paired group data.

Usage:   apaired(x,y)     x,y = the two arrays of values to be compared
Returns: appropriate statistic name, value, and probability
"""
    samples = ''
    while samples not in ['i','r','I','R','c','C']:
        print '\nIndependent or related samples, or correlation (i,r,c): ',
        samples = raw_input()

    if samples in ['i','I','r','R']:
        print '\nComparing variances ...',
        # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & Delaney, p.112
        r = obrientransform(x,y)
        f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
        if p < 0.05:
            vartype = 'unequal, p='+str(round(p,4))
        else:
            vartype = 'equal'
        print vartype
        if samples in ['i','I']:
            if vartype[0] == 'e':
                t,p = ttest_ind(x,y,None,0)
                print '\nIndependent samples t-test: ', round(t,4),round(p,4)
            else:
                if len(x)>20 or len(y)>20:
                    z,p = ranksums(x,y)
                    print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
                else:
                    u,p = mannwhitneyu(x,y)
                    print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)
        else:  # RELATED SAMPLES
            if vartype[0] == 'e':
                t,p = ttest_rel(x,y,0)
                print '\nRelated samples t-test: ', round(t,4),round(p,4)
            else:
                t,p = wilcoxont(x,y)
                print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
    else:  # CORRELATION ANALYSIS
        corrtype = ''
        while corrtype not in ['c','C','r','R','d','D']:
            print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
            corrtype = raw_input()
        if corrtype in ['c','C']:
            m,b,r,p,see = linregress(x,y)
            print '\nLinear regression for continuous variables ...'
            lol = [['Slope','Intercept','r','Prob','SEestimate'],
                   [round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
            pstat.printcc(lol)
        elif corrtype in ['r','R']:
            r,p = spearmanr(x,y)
            print '\nCorrelation for ranked variables ...'
            print "Spearman's r: ",round(r,4),round(p,4)
        else:  # DICHOTOMOUS
            r,p = pointbiserialr(x,y)
            print '\nAssuming x contains a dichotomous variable ...'
            print 'Point Biserial r: ',round(r,4),round(p,4)
    return None

Calculates Dice's coefficient ... (2*number of common terms)/(number of terms in x +
number of terms in y). Returns a value between 0 (orthogonal) and 1.
"""
    common = len(x.intersection(y))
    total = float(len(x) + len(y))
    return 2*common/total

def icc(x,y=None,verbose=0):
    """
Calculates intraclass correlation coefficients using simple, Type I sums of squares.
If only one variable is passed, it is assumed to be an Nx2 matrix.

Usage:   icc(x,y=None,verbose=0)
Returns: icc rho, prob ####PROB IS A GUESS BASED ON PEARSON
"""
    TINY = 1.0e-20
    all = N.concatenate([x,y],0)
    totalss = ass(all-mean(all))
    pairmeans = (x+y)/2.
    withinss = ass(x-pairmeans) + ass(y-pairmeans)
    withindf = float(len(x))
    betwdf = float(len(x)-1)
    withinms = withinss / withindf
    betweenms = (totalss-withinss) / betwdf
    rho = (betweenms-withinms)/(withinms+betweenms)
    t = rho*math.sqrt(betwdf/((1.0-rho+TINY)*(1.0+rho+TINY)))
    prob = abetai(0.5*betwdf,0.5,betwdf/(betwdf+t*t),verbose)
    return rho, prob

def alincc(x,y):
    """
Calculates Lin's concordance correlation coefficient.

Usage:   alincc(x,y)    where x, y are equal-length arrays
Returns: Lin's CC
"""
    covar = acov(x,y)*(len(x)-1)/float(len(x))  # correct denom to n
    xvar = avar(x)*(len(x)-1)/float(len(x))     # correct denom to n
    yvar = avar(y)*(len(y)-1)/float(len(y))     # correct denom to n
    lincc = (2 * covar) / ((xvar+yvar) + ((amean(x)-amean(y))**2))
    return lincc

def apearsonr(x,y,verbose=1):
    """
Calculates a Pearson correlation coefficient and returns p. Taken
from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195.

Usage:   apearsonr(x,y,verbose=1)   where x,y are equal length arrays
Returns: Pearson's r, two-tailed p-value
"""
    TINY = 1.0e-20
    n = len(x)
    r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
    r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
    r = (r_num / r_den)
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = abetai(0.5*df,0.5,df/(df+t*t),verbose)
    return r, prob

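The sums-of-products form of Pearson's r used above works on plain lists as well. An illustrative sketch without the p-value (`pearsonr_sketch` is an invented name):

```python
import math

def pearsonr_sketch(x, y):
    # r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den
```

Perfectly linear data gives r = 1.0 (or -1.0 for a perfect inverse relation).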
def aspearmanr(x,y):
    """
Calculates a Spearman rank-order correlation coefficient. Taken
from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.

Usage:   aspearmanr(x,y)   where x,y are equal-length arrays
Returns: Spearman's r, two-tailed p-value
"""
    TINY = 1e-30
    n = len(x)
    rankx = rankdata(x)
    ranky = rankdata(y)
    dsq = N.add.reduce((rankx-ranky)**2)
    rs = 1 - 6*dsq / float(n*(n**2-1))
    t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
    df = n-2
    probrs = abetai(0.5*df,0.5,df/(df+t*t))
    # probability values for rs are from part 2 of the spearman function in
    # Numerical Recipes, p.510. They are close to tables, but not exact. (?)
    return rs, probrs

def apointbiserialr(x,y):
    """
Calculates a point-biserial correlation coefficient and the associated
probability value. Taken from Heiman's Basic Statistics for the Behav.
Sci (1st), p.194.

Usage:   apointbiserialr(x,y)    where x,y are equal length arrays
Returns: Point-biserial r, two-tailed p-value
"""
    TINY = 1e-30
    categories = pstat.aunique(x)
    data = pstat.aabut(x,y)
    if len(categories) <> 2:
        raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()."
    else:   # there are 2 categories, continue
        codemap = pstat.aabut(categories,N.arange(2))
        recoded = pstat.arecode(data,codemap,0)
        x = pstat.alinexand(data,0,categories[0])
        y = pstat.alinexand(data,0,categories[1])
        xmean = amean(pstat.acolex(x,1))
        ymean = amean(pstat.acolex(y,1))
        n = len(data)
        adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
        rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust
        df = n-2
        t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
        prob = abetai(0.5*df,0.5,df/(df+t*t))
        return rpb, prob

def akendalltau(x,y):
    """
Calculates Kendall's tau ... correlation of ordinal data. Adapted
from function kendl1 in Numerical Recipes. Needs good test-cases.@@@

Usage:   akendalltau(x,y)
Returns: Kendall's tau, two-tailed p-value
"""
    n1 = 0
    n2 = 0
    iss = 0
    for j in range(len(x)-1):
        for k in range(j,len(y)):
            a1 = x[j] - x[k]
            a2 = y[j] - y[k]
            aa = a1 * a2
            if (aa):            # neither array has a tie
                n1 = n1 + 1
                n2 = n2 + 1
                if aa > 0:
                    iss = iss + 1
                else:
                    iss = iss - 1
            else:
                if (a1):
                    n1 = n1 + 1
                else:
                    n2 = n2 + 1
    tau = iss / math.sqrt(n1*n2)
    svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
    z = tau / math.sqrt(svar)
    prob = erfcc(abs(z)/1.4142136)
    return tau, prob

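For tie-free data, Kendall's tau reduces to concordant-minus-discordant pairs over all pairs. A plain-Python sketch of that special case (not the kendl1 tie handling above; `kendalltau_sketch` is an invented name):

```python
def kendalltau_sketch(x, y):
    # Tau for tie-free data: (concordant - discordant) / total pairs.
    conc = disc = 0
    for j in range(len(x) - 1):
        for k in range(j + 1, len(x)):
            s = (x[j] - x[k]) * (y[j] - y[k])
            if s > 0:
                conc += 1           # pair ordered the same way in x and y
            elif s < 0:
                disc += 1           # pair ordered oppositely
    total = len(x) * (len(x) - 1) / 2.0
    return (conc - disc) / total
```

Identical orderings give tau = 1.0, reversed orderings give -1.0, and one swapped pair out of three gives 1/3.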
def alinregress(*args):
    """
Calculates a regression line on two arrays, x and y, corresponding to x,y
pairs. If a single 2D array is passed, alinregress finds dim with 2 levels
and splits data into x,y pairs along that dim.

Usage:   alinregress(*args)    args=2 equal-length arrays, or one 2D array
Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
"""
    TINY = 1.0e-20
    if len(args) == 1:  # more than 1D array?
        args = args[0]
        if len(args) == 2:
            x = args[0]
            y = args[1]
        else:
            x = args[:,0]
            y = args[:,1]
    else:
        x = args[0]
        y = args[1]
    n = len(x)
    x = x.astype(N.float_)
    y = y.astype(N.float_)
    xmean = amean(x)
    ymean = amean(y)
    r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
    r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
    r = r_num / r_den
    z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = abetai(0.5*df,0.5,df/(df+t*t))
    slope = r_num / (float(n)*ass(x) - asquare_of_sums(x))
    intercept = ymean - slope*xmean
    sterrest = math.sqrt(1-r*r)*asamplestdev(y)
    return slope, intercept, r, prob, sterrest, n

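The slope/intercept part of the regression above uses the same raw sums as Pearson's r. A minimal plain-Python sketch, omitting r and the p-value (`linregress_sketch` is an invented name):

```python
def linregress_sketch(x, y):
    # Least-squares fit: slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2),
    # intercept = ymean - slope * xmean.
    n = float(len(x))
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = sy / n - slope * (sx / n)
    return slope, intercept
```

Fitting `y = 2x + 1` data recovers slope 2.0 and intercept 1.0 exactly.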
def amasslinregress(*args):
    """
Calculates a regression line on one 1D array (x) and one N-D array (y).

Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
"""
    TINY = 1.0e-20
    if len(args) == 1:  # more than 1D array?
        args = args[0]
        if len(args) == 2:
            x = N.ravel(args[0])
            y = args[1]
        else:
            x = N.ravel(args[:,0])
            y = args[:,1:]
    else:
        x = args[0]
        y = args[1]
    x = x.astype(N.float_)
    y = y.astype(N.float_)
    n = len(x)
    xmean = amean(x)
    ymean = amean(y,0)
    shp = N.ones(len(y.shape))
    shp[0] = len(x)
    x.shape = shp
    print x.shape, y.shape
    r_num = n*(N.add.reduce(x*y,0)) - N.add.reduce(x)*N.add.reduce(y,0)
    r_den = N.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y,0)-asquare_of_sums(y,0)))
    zerodivproblem = N.equal(r_den,0)
    r_den = N.where(zerodivproblem,1,r_den)  # avoid zero-division in 1st place
    r = r_num / r_den  # need to do this nicely for matrix division
    r = N.where(zerodivproblem,0.0,r)
    z = 0.5*N.log((1.0+r+TINY)/(1.0-r+TINY))
    df = n-2
    t = r*N.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = abetai(0.5*df,0.5,df/(df+t*t))

    ss = float(n)*ass(x)-asquare_of_sums(x)
    s_den = N.where(ss==0,1,ss)  # avoid zero-division in 1st place
    slope = r_num / s_den
    intercept = ymean - slope*xmean
    sterrest = N.sqrt(1-r*r)*asamplestdev(y,0)
    return slope, intercept, r, prob, sterrest, n

#####################################
#####  AINFERENTIAL STATISTICS  #####
#####################################

def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
3326
Calculates the t-obtained for the independent samples T-test on ONE group
3327
of scores a, given a population mean. If printit=1, results are printed
3328
to the screen. If printit='filename', the results are output to 'filename'
3329
using the given writemode (default=append). Returns t-value, and prob.
3331
Usage: attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a')
3332
Returns: t-value, two-tailed prob
if type(a) != N.ndarray:
svar = ((n-1)*v) / float(df)
t = (x-popmean)/math.sqrt(svar*(1.0/n))
prob = abetai(0.5*df,0.5,df/(df+t*t))
statname = 'Single-sample T-test.'
outputpairedstats(printit,writemode,
'Population','--',popmean,0,0,0,
name,n,x,v,N.minimum.reduce(N.ravel(a)),
N.maximum.reduce(N.ravel(a)),
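# Hedged pure-Python sketch (not from the original module) of the one-sample
# t computed above: t = (mean - popmean) / sqrt(svar/n), with svar the
# unbiased sample variance (df = n-1).

```python
import math

def t_1samp(scores, popmean):
    # one-sample t statistic against a known population mean
    n = float(len(scores))
    m = sum(scores) / n
    svar = sum((s - m) ** 2 for s in scores) / (n - 1)   # unbiased variance
    return (m - popmean) / math.sqrt(svar / n)
```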
def attest_ind(a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'):
Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
a, and b. From Numerical Recipes, p.483. If printit=1, results are
printed to the screen. If printit='filename', the results are output
to 'filename' using the given writemode (default=append). Dimension
can equal None (ravel array first), or an integer (the dimension over
which to operate on a and b).
Usage: attest_ind (a,b,dimension=None,printit=0,
name1='Samp1',name2='Samp2',writemode='a')
Returns: t-value, two-tailed p-value
if dimension == None:
x1 = amean(a,dimension)
x2 = amean(b,dimension)
v1 = avar(a,dimension)
v2 = avar(b,dimension)
n1 = a.shape[dimension]
n2 = b.shape[dimension]
svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
zerodivproblem = N.equal(svar,0)
svar = N.where(zerodivproblem,1,svar) # avoid zero-division in 1st place
t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2)) # N-D COMPUTATION HERE!!!!!!
t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0
probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
if type(t) == N.ndarray:
probs = N.reshape(probs,t.shape)
if probs.shape == (1,):
if type(t) == N.ndarray:
if type(probs) == N.ndarray:
statname = 'Independent samples T-test.'
outputpairedstats(printit,writemode,
name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)),
N.maximum.reduce(N.ravel(a)),
name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)),
N.maximum.reduce(N.ravel(b)),
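# Hedged sketch (not from the original module) of the pooled-variance
# independent-samples t used above: svar pools the two sample variances over
# df = n1+n2-2, then t = (x1-x2)/sqrt(svar*(1/n1 + 1/n2)).

```python
import math

def t_ind(a, b):
    # independent-samples t with pooled variance
    n1, n2 = float(len(a)), float(len(b))
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    df = n1 + n2 - 2
    svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    return (m1 - m2) / math.sqrt(svar * (1.0 / n1 + 1.0 / n2))
```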
Tries to compute a t-value from a p-value (or pval array) and associated df.
SLOW for large numbers of elements(!) as it re-computes p-values 20 times
(smaller step-sizes) at which point it decides it's done. Keeps the signs
of the input array. Returns 1000 (or -1000) if t>100.
Usage: ap2t(pval,df)
Returns: an array of t-values with the shape of pval
pval = N.array(pval)
signs = N.sign(pval)
t = N.ones(pval.shape,N.float_)*50
step = N.ones(pval.shape,N.float_)*25
print "Initial ap2t() prob calc"
prob = abetai(0.5*df,0.5,float(df)/(df+t*t))
print 'ap2t() iter: ',
t = N.where(pval<prob,t+step,t-step)
prob = abetai(0.5*df,0.5,float(df)/(df+t*t))
# since this is an ugly hack, we get ugly boundaries
t = N.where(t>99.9,1000,t) # hit upper-boundary
return t #, prob, pval
def attest_rel(a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'):
Calculates the t-obtained T-test on TWO RELATED samples of scores, a
and b. From Numerical Recipes, p.483. If printit=1, results are
printed to the screen. If printit='filename', the results are output
to 'filename' using the given writemode (default=append). Dimension
can equal None (ravel array first), or an integer (the dimension over
which to operate on a and b).
Usage: attest_rel(a,b,dimension=None,printit=0,
name1='Samp1',name2='Samp2',writemode='a')
Returns: t-value, two-tailed p-value
if dimension == None:
raise ValueError, 'Unequal length arrays.'
x1 = amean(a,dimension)
x2 = amean(b,dimension)
v1 = avar(a,dimension)
v2 = avar(b,dimension)
n = a.shape[dimension]
d = (a-b).astype('d')
denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df)
zerodivproblem = N.equal(denom,0)
denom = N.where(zerodivproblem,1,denom) # avoid zero-division in 1st place
t = N.add.reduce(d,dimension) / denom # N-D COMPUTATION HERE!!!!!!
t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0
probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
if type(t) == N.ndarray:
probs = N.reshape(probs,t.shape)
if probs.shape == (1,):
statname = 'Related samples T-test.'
outputpairedstats(printit,writemode,
name1,n,x1,v1,N.minimum.reduce(N.ravel(a)),
N.maximum.reduce(N.ravel(a)),
name2,n,x2,v2,N.minimum.reduce(N.ravel(b)),
N.maximum.reduce(N.ravel(b)),
def achisquare(f_obs,f_exp=None):
Calculates a one-way chi square for array of observed frequencies and returns
the result. If no expected frequencies are given, the total N is assumed to
be equally distributed across all groups.
Usage: achisquare(f_obs, f_exp=None) f_obs = array of observed cell freq.
Returns: chisquare-statistic, associated p-value
f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.float_)
f_exp = f_exp.astype(N.float_)
chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp)
return chisq, achisqprob(chisq, k-1)
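# Hedged sketch (not from the original module): the one-way chi-square
# statistic above is just sum((obs-exp)**2 / exp), with expected counts
# spread evenly over the k cells when f_exp is omitted.

```python
def chisq_stat(f_obs, f_exp=None):
    # one-way chi-square statistic; equal expected frequencies by default
    if f_exp is None:
        f_exp = [sum(f_obs) / float(len(f_obs))] * len(f_obs)
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))
```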
def aks_2samp(data1,data2):
Computes the Kolmogorov-Smirnov statistic on 2 samples. Modified from
Numerical Recipes in C, page 493. Returns KS D-value, prob. Not ufunc-like.
Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays
Returns: KS D-value, p-value
j1 = 0 # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE
j2 = 0 # N.zeros(data2.shape[1:])
fn1 = 0.0 # N.zeros(data1.shape[1:],N.float_)
fn2 = 0.0 # N.zeros(data2.shape[1:],N.float_)
d = N.zeros(data1.shape[1:],N.float_)
data1 = N.sort(data1,0)
data2 = N.sort(data2,0)
while j1 < n1 and j2 < n2:
fn1 = (j1)/float(en1)
fn2 = (j2)/float(en2)
if abs(dt) > abs(d):
en = math.sqrt(en1*en2/float(en1+en2))
prob = aksprob((en+0.12+0.11/en)*N.fabs(d))
def amannwhitneyu(x,y):
Calculates a Mann-Whitney U statistic on the provided scores and
returns the result. Use only when the n in each condition is < 20 and
you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is
significant if the u-obtained is LESS THAN or equal to the critical value of U.
Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions
Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
ranked = rankdata(N.concatenate((x,y)))
rankx = ranked[0:n1] # get the x-ranks
ranky = ranked[n1:] # the rest are y-ranks
u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x
u2 = n1*n2 - u1 # remainder is U for y
T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores
raise ValueError, 'All numbers are identical in amannwhitneyu'
sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc
return smallu, 1.0 - azprob(z)
def atiecorrect(rankvals):
Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests.
See Siegel, S. (1956) Nonparametric Statistics for the Behavioral
Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c
Usage: atiecorrect(rankvals)
Returns: T correction factor for U or H
sorted,posn = ashellsort(N.array(rankvals))
if sorted[i] == sorted[i+1]:
while (i<n-1) and (sorted[i] == sorted[i+1]):
T = T + nties**3 - nties
T = T / float(n**3-n)
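# Hedged sketch (not from the original module) of the Siegel tie correction
# accumulated above: for each group of t tied ranks add t**3 - t, then
# T = 1 - sum/(n**3 - n).

```python
from collections import Counter

def tiecorrect_basic(ranks):
    # tie-correction factor for Mann-Whitney U / Kruskal-Wallis H
    n = len(ranks)
    s = sum(t ** 3 - t for t in Counter(ranks).values())
    return 1.0 - s / float(n ** 3 - n)
```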
Calculates the rank sums statistic on the provided scores and returns the result.
Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions
Returns: z-statistic, two-tailed p-value
alldata = N.concatenate((x,y))
ranked = arankdata(alldata)
expected = n1*(n1+n2+1) / 2.0
z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
prob = 2*(1.0 - azprob(abs(z)))
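# Hedged sketch (not from the original module) of the normal approximation
# above: given the observed rank sum s of the first group, z is the deviation
# from its expectation n1*(n1+n2+1)/2 scaled by sqrt(n1*n2*(n1+n2+1)/12).

```python
import math

def ranksums_z(s, n1, n2):
    # z-statistic for the Wilcoxon rank-sums test, normal approximation
    expected = n1 * (n1 + n2 + 1) / 2.0
    return (s - expected) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
```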
def awilcoxont(x,y):
Calculates the Wilcoxon T-test for related samples and returns the
result. A non-parametric T-test.
Usage: awilcoxont(x,y) where x,y are equal-length arrays for 2 conditions
Returns: t-statistic, two-tailed p-value
if len(x) <> len(y):
raise ValueError, 'Unequal N in awilcoxont. Aborting.'
d = N.compress(N.not_equal(d,0),d) # Keep all non-zero differences
absranked = arankdata(absd)
for i in range(len(absd)):
r_minus = r_minus + absranked[i]
r_plus = r_plus + absranked[i]
wt = min(r_plus, r_minus)
mn = count * (count+1) * 0.25
se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
z = math.fabs(wt-mn) / se
prob = 2*(1.0 - zprob(abs(z)))
def akruskalwallish(*args):
The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
groups, requiring at least 5 subjects in each group. This function
calculates the Kruskal-Wallis H and associated p-value for 3 or more
independent samples.
Usage: akruskalwallish(*args) args are separate arrays for 3+ conditions
Returns: H-statistic (corrected for ties), associated p-value
assert len(args) >= 3, "Need at least 3 groups in stats.akruskalwallish()"
for i in range(len(args)):
all = all + args[i].tolist()
ranked = rankdata(all)
T = tiecorrect(ranked)
for i in range(len(args)):
args[i] = ranked[0:n[i]]
for i in range(len(args)):
rsums.append(sum(args[i])**2)
rsums[i] = rsums[i] / float(n[i])
h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
raise ValueError, 'All numbers are identical in akruskalwallish'
return h, chisqprob(h,df)
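# Hedged sketch (not from the original module) of the H statistic above:
# with N total observations, H = 12/(N*(N+1)) * sum(R_i**2/n_i) - 3*(N+1),
# where R_i is the rank sum of group i.

```python
def kruskal_h(rank_sums, ns):
    # Kruskal-Wallis H from per-group rank sums and group sizes (no tie correction)
    N = sum(ns)
    ssbn = sum(r * r / float(n) for r, n in zip(rank_sums, ns))
    return 12.0 / (N * (N + 1)) * ssbn - 3 * (N + 1)
```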
def afriedmanchisquare(*args):
Friedman Chi-Square is a non-parametric, one-way within-subjects
ANOVA. This function calculates the Friedman Chi-square test for
repeated measures and returns the result, along with the associated
probability value. It assumes 3 or more repeated measures. Only 3
levels requires a minimum of 10 subjects in the study. Four levels
requires 5 subjects per level(??).
Usage: afriedmanchisquare(*args) args are separate arrays for 2+ conditions
Returns: chi-square statistic, associated p-value
raise ValueError, '\nLess than 3 levels. Friedman test not appropriate.\n'
data = apply(pstat.aabut,args)
data = data.astype(N.float_)
for i in range(len(data)):
data[i] = arankdata(data[i])
ssbn = asum(asum(data,0)**2) # sum of squared per-condition rank-sums (ranked data, not raw args)
chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
return chisq, achisqprob(chisq,k-1)
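# Hedged sketch (not from the original module) of the Friedman statistic
# above: with n subjects each ranking k conditions, chi-square =
# 12/(k*n*(k+1)) * sum(R_j**2) - 3*n*(k+1), R_j the rank sum of condition j.

```python
def friedman_chisq(rank_sums, n, k):
    # Friedman chi-square from per-condition rank sums, n subjects, k conditions
    ssbn = sum(r * r for r in rank_sums)
    return 12.0 / (k * n * (k + 1)) * ssbn - 3 * n * (k + 1)
```

For example, with 4 subjects all ranking 3 conditions identically the rank sums are 4, 8, 12, and the statistic reaches its maximum n*(k-1) = 8.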
#####################################
#### APROBABILITY CALCULATIONS ####
#####################################
def achisqprob(chisq,df):
Returns the (1-tail) probability value associated with the provided chi-square
value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can
handle multiple dimensions.
Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom
exponents = N.where(N.less(x,-BIG),-BIG,x)
return N.exp(exponents)
if type(chisq) == N.ndarray:
chisq = N.array([chisq])
return N.ones(chisq.shape,N.float)
probs = N.zeros(chisq.shape,N.float_)
probs = N.where(N.less_equal(chisq,0),1.0,probs) # set prob=1 for chisq<0
s = 2.0 * azprob(-N.sqrt(chisq))
chisq = 0.5 * (df - 1.0)
z = N.ones(probs.shape,N.float_)
z = 0.5 *N.ones(probs.shape,N.float_)
e = N.zeros(probs.shape,N.float_)
e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.float_)
mask = N.zeros(probs.shape)
a_big = N.greater(a,BIG)
a_big_frozen = -1 *N.ones(probs.shape,N.float_)
totalelements = N.multiply.reduce(N.array(probs.shape))
while asum(mask)<>totalelements:
newmask = N.greater(z,chisq)
a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen)
mask = N.clip(newmask+mask,0,1)
z = N.ones(probs.shape,N.float_)
e = N.ones(probs.shape,N.float_)
z = 0.5 *N.ones(probs.shape,N.float_)
e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.float_)
mask = N.zeros(probs.shape)
a_notbig_frozen = -1 *N.ones(probs.shape,N.float_)
while asum(mask)<>totalelements:
e = e * (a/z.astype(N.float_))
# print '#2', z, e, c, s, c*y+s2
newmask = N.greater(z,chisq)
a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big),
c*y+s2, a_notbig_frozen)
mask = N.clip(newmask+mask,0,1)
probs = N.where(N.equal(probs,1),1,
N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen))
Returns the complementary error function erfc(x) with fractional error
everywhere less than 1.2e-7. Adapted from Numerical Recipes. Can
handle multiple dimensions.
t = 1.0 / (1.0+0.5*z)
ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
return N.where(N.greater_equal(x,0), ans, 2.0-ans)
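# Scalar sketch (not from the original module) of the Chebyshev-fitted erfc
# approximation above, checked against the stdlib's math.erfc. The constants
# are the same ones used in the array version.

```python
import math

def erfcc(x):
    # complementary error function, Numerical Recipes rational approximation
    z = abs(x)
    t = 1.0 / (1.0 + 0.5 * z)
    ans = t * math.exp(-z * z - 1.26551223 + t * (1.00002368 + t * (0.37409196 +
          t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 + t * (-1.13520398 +
          t * (1.48851587 + t * (-0.82215223 + t * 0.17087277)))))))))
    return ans if x >= 0 else 2.0 - ans
```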
Returns the area under the normal curve 'to the left of' the given z value.
for z<0, zprob(z) = 1-tail probability
for z>0, 1.0-zprob(z) = 1-tail probability
for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions.
Usage: azprob(z) where z is a z-value
x = (((((((((((((-0.000045255659 * y
+0.000152529290) * y -0.000019538132) * y
-0.000676904986) * y +0.001390604284) * y
-0.000794620820) * y -0.002034254874) * y
+0.006549791214) * y -0.010557625006) * y
+0.011630447319) * y -0.009279453341) * y
+0.005353579108) * y -0.002141268741) * y
+0.000535310849) * y +0.999936657524
x = ((((((((0.000124818987 * w
-0.001075204047) * w +0.005198775019) * w
-0.019198292004) * w +0.059054035642) * w
-0.151968751364) * w +0.319152932694) * w
-0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0
Z_MAX = 6.0 # maximum meaningful z-value
x = N.zeros(z.shape,N.float_) # initialize
x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0)) # get x's
x = N.where(N.greater(y,Z_MAX*0.5),1.0,x) # kill those with big Z
prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5)
Returns the probability value for a K-S statistic computed via ks_2samp.
Adapted from Numerical Recipes. Can handle multiple dimensions.
Usage: aksprob(alam)
if type(alam) == N.ndarray:
frozen = -1 *N.ones(alam.shape,N.float64)
alam = alam.astype(N.float64)
frozen = N.array(-1.)
alam = N.array(alam,N.float64)
mask = N.zeros(alam.shape)
fac = 2.0 *N.ones(alam.shape,N.float_)
sum = N.zeros(alam.shape,N.float_)
termbf = N.zeros(alam.shape,N.float_)
a2 = N.array(-2.0*alam*alam,N.float64)
totalelements = N.multiply.reduce(N.array(mask.shape))
for j in range(1,201):
if asum(mask) == totalelements:
exponents = (a2*j*j)
overflowmask = N.less(exponents,-746)
frozen = N.where(overflowmask,0,frozen)
mask = mask+overflowmask
term = fac*N.exp(exponents)
newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) +
N.less(abs(term),1.0e-8*sum), 1, 0)
frozen = N.where(newmask*N.equal(mask,0), sum, frozen)
mask = N.clip(mask+newmask,0,1)
return N.where(N.equal(frozen,-1), 1.0, frozen) # 1.0 if doesn't converge
return N.where(N.equal(frozen,-1), 1.0, frozen)[0] # 1.0 if doesn't converge
def afprob(dfnum, dfden, F):
Returns the 1-tailed significance level (p-value) of an F statistic
given the degrees of freedom for the numerator (dfR-dfF) and the degrees
of freedom for the denominator (dfF). Can handle multiple dims for F.
Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn
if type(F) == N.ndarray:
return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F))
return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
def abetacf(a,b,x,verbose=1):
Evaluates the continued fraction form of the incomplete Beta function,
betai. (Adapted from: Numerical Recipes in C.) Can handle multiple dimensions.
Usage: abetacf(a,b,x,verbose=1)
if type(x) == N.ndarray:
frozen = N.ones(x.shape,N.float_) *-1 #start out w/ -1s, should replace all
frozen = N.array([-1])
mask = N.zeros(x.shape)
for i in range(ITMAX+1):
if N.sum(N.ravel(N.equal(frozen,-1)))==0:
d = em*(b-em)*x/((qam+tem)*(a+tem))
d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
newmask = N.less(abs(az-aold),EPS*abs(az))
frozen = N.where(newmask*N.equal(mask,0), az, frozen)
mask = N.clip(mask+newmask,0,1)
noconverge = asum(N.equal(frozen,-1))
if noconverge <> 0 and verbose:
print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements'
Returns the gamma function of xx.
Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
Adapted from: Numerical Recipes in C. Can handle multiple dims ... but
probably doesn't normally have to.
coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
0.120858003e-2, -0.536382e-5]
tmp = tmp - (x+0.5)*N.log(tmp)
for j in range(len(coeff)):
ser = ser + coeff[j]/x
return -tmp + N.log(2.50662827465*ser)
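# Scalar sketch (not from the original module) of the Lanczos series above,
# for ln(Gamma(xx)), checked against the stdlib's math.lgamma. It uses the
# same six coefficients and the classic Numerical Recipes recurrence.

```python
import math

def gammln_lanczos(xx):
    # log-gamma via the 6-coefficient Lanczos approximation
    coeff = [76.18009173, -86.50532033, 24.01409822,
             -1.231739516, 0.120858003e-2, -0.536382e-5]
    x = xx - 1.0
    tmp = x + 5.5
    tmp = tmp - (x + 0.5) * math.log(tmp)
    ser = 1.0
    for c in coeff:
        x = x + 1.0
        ser = ser + c / x
    return -tmp + math.log(2.50662827465 * ser)
```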
def abetai(a,b,x,verbose=1):
Returns the incomplete beta function:
I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
function of a. The continued fraction formulation is implemented
here, using the betacf function. (Adapted from: Numerical Recipes in
C.) Can handle multiple dimensions.
Usage: abetai(a,b,x,verbose=1)
if type(a) == N.ndarray:
if asum(N.less(x,0)+N.greater(x,1)) <> 0:
raise ValueError, 'Bad x in abetai'
x = N.where(N.equal(x,0),TINY,x)
x = N.where(N.equal(x,1.0),1-TINY,x)
bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1)
exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b*
# 746 (below) is the MAX POSSIBLE BEFORE OVERFLOW
exponents = N.where(N.less(exponents,-740),-740,exponents)
bt = N.exp(exponents)
if type(x) == N.ndarray:
ans = N.where(N.less(x,(a+1)/(a+b+2.0)),
bt*abetacf(a,b,x,verbose)/float(a),
1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b))
if x<(a+1)/(a+b+2.0):
ans = bt*abetacf(a,b,x,verbose)/float(a)
ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)
#####################################
####### AANOVA CALCULATIONS #######
#####################################
import LinearAlgebra, operator
def aglm(data,para):
Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken from:
Peterson et al. Statistical limitations in functional neuroimaging
I. Non-inferential methods and statistical models. Phil Trans Royal Soc
Lond B 354: 1239-1260.
Usage: aglm(data,para)
Returns: statistic, p-value ???
if len(para) <> len(data):
print "data and para must be same length in aglm"
p = pstat.aunique(para)
x = N.zeros((n,len(p))) # design matrix
for l in range(len(p)):
x[:,l] = N.equal(para,p[l])
b = N.dot(N.dot(LA.inv(N.dot(N.transpose(x),x)), # i.e., b=inv(X'X)X'Y
diffs = (data - N.dot(x,b))
s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs)
if len(p) == 2: # ttest_ind
fact = asum(1.0/asum(x,0)) # i.e., 1/n1 + 1/n2 + 1/n3 ...
t = N.dot(c,b) / N.sqrt(s_sq*fact)
probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
def aF_oneway(*args):
Performs a 1-way ANOVA, returning an F-value and probability given
any number of groups. From Heiman, pp.394-7.
Usage: aF_oneway (*args) where *args is 2 or more arrays, one per treatment group
Returns: f-value, probability
na = len(args) # ANOVA on 'na' groups, each in its own array
tmp = map(N.array,args)
means = map(amean,tmp)
vars = map(avar,tmp)
alldata = N.concatenate(args)
sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a))
ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
msb = ssbn/float(dfbn)
msw = sswn/float(dfwn)
prob = fprob(dfbn,dfwn,f)
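# Hedged pure-Python sketch (not from the original module) of the one-way
# ANOVA decomposition above: sstot = sum(x**2) - (sum(x))**2/N, ssbn from the
# per-group squared sums, sswn = sstot - ssbn, F = (ssbn/dfbn)/(sswn/dfwn).

```python
def f_oneway(*groups):
    # one-way ANOVA F statistic from raw scores, two or more groups
    alldata = [x for g in groups for x in g]
    bign = float(len(alldata))
    grand = sum(alldata) ** 2 / bign
    sstot = sum(x * x for x in alldata) - grand
    ssbn = sum(sum(g) ** 2 / float(len(g)) for g in groups) - grand
    sswn = sstot - ssbn
    dfbn = len(groups) - 1
    dfwn = len(alldata) - len(groups)
    return (ssbn / dfbn) / (sswn / dfwn)
```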
def aF_value(ER,EF,dfR,dfF):
Returns an F-statistic given the following:
ER = error associated with the null hypothesis (the Restricted model)
EF = error associated with the alternate hypothesis (the Full model)
dfR = degrees of freedom associated with the Restricted model
dfF = degrees of freedom associated with the Full model
return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF)))
def outputfstats(Enum, Eden, dfnum, dfden, f, prob):
Enum = round(Enum,3)
Eden = round(Eden,3)
dfnum = round(dfnum,3)
dfden = round(dfden,3)
prob = round(prob,3)
suffix = '' # for *s after the p-value
if prob < 0.001: suffix = ' ***'
elif prob < 0.01: suffix = ' **'
elif prob < 0.05: suffix = ' *'
title = [['EF/ER','DF','Mean Square','F-value','prob','']]
lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix],
[Eden, dfden, round(Eden/float(dfden),3),'','','']]
def F_value_multivariate(ER, EF, dfnum, dfden):
Returns an F-statistic given the following:
ER = error associated with the null hypothesis (the Restricted model)
EF = error associated with the alternate hypothesis (the Full model)
dfnum = degrees of freedom associated with the Restricted model
dfden = degrees of freedom associated with the Full model
where ER and EF are matrices from a multivariate F calculation.
if type(ER) in [IntType, FloatType]:
ER = N.array([[ER]])
if type(EF) in [IntType, FloatType]:
EF = N.array([[EF]])
n_um = (LA.det(ER) - LA.det(EF)) / float(dfnum)
d_en = LA.det(EF) / float(dfden)
#####################################
####### ASUPPORT FUNCTIONS ########
#####################################
Returns: array shape of a, with -1 where a<0 and +1 where a>=0
if ((type(a) == type(1.4)) or (type(a) == type(1))):
return a-a-N.less(a,0)+N.greater(a,0)
return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0)
def asum(a, dimension=None,keepdims=0):
An alternative to the Numeric.add.reduce function, which allows one to
(1) collapse over multiple dimensions at once, and/or (2) to retain
all dimensions in the original array (squashing one down to size 1).
Dimension can equal None (ravel array first), an integer (the
dimension over which to operate), or a sequence (operate over multiple
dimensions). If keepdims=1, the resulting array will have as many
dimensions as the input array.
Usage: asum(a, dimension=None, keepdims=0)
Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1
if type(a) == N.ndarray and a.dtype in [N.int_, N.short, N.ubyte]:
a = a.astype(N.float_)
if dimension == None:
s = N.sum(N.ravel(a))
elif type(dimension) in [IntType,FloatType]:
s = N.add.reduce(a, dimension)
s = N.reshape(s,shp)
else: # must be a SEQUENCE of dims to sum over
dims = list(dimension)
s = N.add.reduce(s,dim)
s = N.reshape(s,shp)
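# Hedged modern-NumPy sketch (assuming numpy is importable as np): today's
# np.sum covers both behaviors asum() implements, via an axis tuple for
# multi-dimension collapse and keepdims=True to retain the original rank.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
# collapse dims 0 and 2 at once, keeping all three dims (each squashed axis -> size 1)
s = np.sum(a, axis=(0, 2), keepdims=True)
```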
def acumsum(a,dimension=None):
Returns an array consisting of the cumulative sum of the items in the
passed array. Dimension can equal None (ravel array first), an
integer (the dimension over which to operate), or a sequence (operate
over multiple dimensions, but this last one just barely makes sense).
Usage: acumsum(a,dimension=None)
if dimension == None:
if type(dimension) in [ListType, TupleType, N.ndarray]:
dimension = list(dimension)
a = N.add.accumulate(a,d)
return N.add.accumulate(a,dimension)
def ass(inarray, dimension=None, keepdims=0):
Squares each value in the passed array, adds these squares & returns
the result. Unfortunate function name. :-) Defaults to ALL values in
the array. Dimension can equal None (ravel array first), an integer
(the dimension over which to operate), or a sequence (operate over
multiple dimensions). Set keepdims=1 to maintain the original number
of dimensions.
Usage: ass(inarray, dimension=None, keepdims=0)
Returns: sum-along-'dimension' for (inarray*inarray)
if dimension == None:
inarray = N.ravel(inarray)
return asum(inarray*inarray,dimension,keepdims)
def asummult(array1,array2,dimension=None,keepdims=0):
Multiplies elements in array1 and array2, element by element, and
returns the sum (along 'dimension') of all resulting multiplications.
Dimension can equal None (ravel array first), an integer (the
dimension over which to operate), or a sequence (operate over multiple
dimensions). A trivial function, but included for completeness.
Usage: asummult(array1,array2,dimension=None,keepdims=0)
if dimension == None:
array1 = N.ravel(array1)
array2 = N.ravel(array2)
return asum(array1*array2,dimension,keepdims)
def asquare_of_sums(inarray, dimension=None, keepdims=0):
Adds the values in the passed array, squares that sum, and returns the
result. Dimension can equal None (ravel array first), an integer (the
dimension over which to operate), or a sequence (operate over multiple
dimensions). If keepdims=1, the returned array will have the same
NUMBER of dimensions as the original.
Usage: asquare_of_sums(inarray, dimension=None, keepdims=0)
Returns: the square of the sum over dim(s) in dimension
if dimension == None:
inarray = N.ravel(inarray)
s = asum(inarray,dimension,keepdims)
if type(s) == N.ndarray:
return s.astype(N.float_)*s
def asumdiffsquared(a,b, dimension=None, keepdims=0):
Takes pairwise differences of the values in arrays a and b, squares
these differences, and returns the sum of these squares. Dimension
can equal None (ravel array first), an integer (the dimension over
which to operate), or a sequence (operate over multiple dimensions).
keepdims=1 means the return shape = len(a.shape) = len(b.shape)
Usage: asumdiffsquared(a,b)
Returns: sum[ravel(a-b)**2]
if dimension == None:
inarray = N.ravel(a)
return asum((a-b)**2,dimension,keepdims)
def ashellsort(inarray):
Shellsort algorithm. Sorts a 1D-array.
Usage: ashellsort(inarray)
Returns: sorted-inarray, sorting-index-vector (for original array)
gap = n/2 # integer division needed
for i in range(gap,n):
for j in range(i-gap,-1,-gap):
while j>=0 and svec[j]>svec[j+gap]:
svec[j] = svec[j+gap]
ivec[j] = ivec[j+gap]
gap = gap / 2 # integer division needed
# svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]]
def arankdata(inarray):
Ranks the data in inarray, dealing with ties appropriately. Assumes
a 1D inarray. Adapted from Gary Perlman's |Stat ranksort.
Usage: arankdata(inarray)
Returns: array of length equal to inarray, containing rank scores
svec, ivec = ashellsort(inarray)
newarray = N.zeros(n,N.float_)
sumranks = sumranks + i
dupcount = dupcount + 1
if i==n-1 or svec[i] <> svec[i+1]:
averank = sumranks / float(dupcount) + 1
for j in range(i-dupcount+1,i+1):
newarray[ivec[j]] = averank
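# Hedged pure-Python sketch (not from the original module) of the ranking
# scheme above: 1-based ranks, with tied values all receiving the average of
# the ranks they span.

```python
def rankdata_ties(vals):
    # average-rank tie handling, 1-based ranks
    svec = sorted(range(len(vals)), key=lambda i: vals[i])  # sorting index
    ranks = [0.0] * len(vals)
    i = 0
    while i < len(vals):
        j = i
        while j + 1 < len(vals) and vals[svec[j + 1]] == vals[svec[i]]:
            j += 1                       # extend over the tie group
        avg = (i + j) / 2.0 + 1          # mean of the spanned ranks
        for k in range(i, j + 1):
            ranks[svec[k]] = avg
        i = j + 1
    return ranks
```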
def afindwithin(data):
Returns a binary vector, 1=within-subject factor, 0=between. Input
equals the entire data array (i.e., column 1=random factor, last
column = measured values).
Usage: afindwithin(data) data in |Stat format
numfact = len(data[0])-2
withinvec = [0]*numfact
for col in range(1,numfact+1):
rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0]) # get 1 level of this factor
if len(pstat.unique(pstat.colex(rows,0))) < len(rows): # if fewer subjects than scores on this factor
withinvec[col-1] = 1
#########################################################
#########################################################
###### RE-DEFINE DISPATCHES TO INCLUDE ARRAYS #########
#########################################################
#########################################################
## CENTRAL TENDENCY:
geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)),
(ageometricmean, (N.ndarray,)) )
harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)),
(aharmonicmean, (N.ndarray,)) )
mean = Dispatch ( (lmean, (ListType, TupleType)),
(amean, (N.ndarray,)) )
median = Dispatch ( (lmedian, (ListType, TupleType)),
(amedian, (N.ndarray,)) )
medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)),
(amedianscore, (N.ndarray,)) )
mode = Dispatch ( (lmode, (ListType, TupleType)),
(amode, (N.ndarray,)) )
tmean = Dispatch ( (atmean, (N.ndarray,)) )
tvar = Dispatch ( (atvar, (N.ndarray,)) )
tstdev = Dispatch ( (atstdev, (N.ndarray,)) )
tsem = Dispatch ( (atsem, (N.ndarray,)) )
moment = Dispatch ( (lmoment, (ListType, TupleType)),
(amoment, (N.ndarray,)) )
variation = Dispatch ( (lvariation, (ListType, TupleType)),
(avariation, (N.ndarray,)) )
skew = Dispatch ( (lskew, (ListType, TupleType)),
(askew, (N.ndarray,)) )
kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)),
(akurtosis, (N.ndarray,)) )
describe = Dispatch ( (ldescribe, (ListType, TupleType)),
(adescribe, (N.ndarray,)) )
## DISTRIBUTION TESTS
skewtest = Dispatch ( (askewtest, (ListType, TupleType)),
(askewtest, (N.ndarray,)) )
kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)),
(akurtosistest, (N.ndarray,)) )
normaltest = Dispatch ( (anormaltest, (ListType, TupleType)),
(anormaltest, (N.ndarray,)) )
itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)),
(aitemfreq, (N.ndarray,)) )
scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)),
(ascoreatpercentile, (N.ndarray,)) )
percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)),
(apercentileofscore, (N.ndarray,)) )
histogram = Dispatch ( (lhistogram, (ListType, TupleType)),
(ahistogram, (N.ndarray,)) )
cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)),
(acumfreq, (N.ndarray,)) )
relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)),
(arelfreq, (N.ndarray,)) )
obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)),
(aobrientransform, (N.ndarray,)) )
samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)),
(asamplevar, (N.ndarray,)) )
samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)),
(asamplestdev, (N.ndarray,)) )
signaltonoise = Dispatch( (asignaltonoise, (N.ndarray,)),)
var = Dispatch ( (lvar, (ListType, TupleType)),
(avar, (N.ndarray,)) )
stdev = Dispatch ( (lstdev, (ListType, TupleType)),
(astdev, (N.ndarray,)) )
sterr = Dispatch ( (lsterr, (ListType, TupleType)),
(asterr, (N.ndarray,)) )
sem = Dispatch ( (lsem, (ListType, TupleType)),
(asem, (N.ndarray,)) )
z = Dispatch ( (lz, (ListType, TupleType)),
(az, (N.ndarray,)) )
zs = Dispatch ( (lzs, (ListType, TupleType)),
(azs, (N.ndarray,)) )
threshold = Dispatch( (athreshold, (N.ndarray,)),)
trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)),
(atrimboth, (N.ndarray,)) )
trim1 = Dispatch ( (ltrim1, (ListType, TupleType)),
(atrim1, (N.ndarray,)) )
## CORRELATION FCNS:
paired = Dispatch ( (lpaired, (ListType, TupleType)),
(apaired, (N.ndarray,)) )
lincc = Dispatch ( (llincc, (ListType, TupleType)),
(alincc, (N.ndarray,)) )
pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)),
(apearsonr, (N.ndarray,)) )
spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)),
(aspearmanr, (N.ndarray,)) )
pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)),
(apointbiserialr, (N.ndarray,)) )
kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)),
(akendalltau, (N.ndarray,)) )
linregress = Dispatch ( (llinregress, (ListType, TupleType)),
(alinregress, (N.ndarray,)) )
## INFERENTIAL STATS:
ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)),
(attest_1samp, (N.ndarray,)) )
ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)),
(attest_ind, (N.ndarray,)) )
ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)),
(attest_rel, (N.ndarray,)) )
chisquare = Dispatch ( (lchisquare, (ListType, TupleType)),
(achisquare, (N.ndarray,)) )
ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)),
(aks_2samp, (N.ndarray,)) )
mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)),
(amannwhitneyu, (N.ndarray,)) )
tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)),
(atiecorrect, (N.ndarray,)) )
ranksums = Dispatch ( (lranksums, (ListType, TupleType)),
(aranksums, (N.ndarray,)) )
wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)),
(awilcoxont, (N.ndarray,)) )
kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)),
(akruskalwallish, (N.ndarray,)) )
friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)),
(afriedmanchisquare, (N.ndarray,)) )
## PROBABILITY CALCS:
chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)),
(achisqprob, (N.ndarray,)) )
zprob = Dispatch ( (lzprob, (IntType, FloatType)),
(azprob, (N.ndarray,)) )
4490
ksprob = Dispatch ( (lksprob, (IntType, FloatType)),
4491
(aksprob, (N.ndarray,)) )
4492
fprob = Dispatch ( (lfprob, (IntType, FloatType)),
4493
(afprob, (N.ndarray,)) )
4494
betacf = Dispatch ( (lbetacf, (IntType, FloatType)),
4495
(abetacf, (N.ndarray,)) )
4496
betai = Dispatch ( (lbetai, (IntType, FloatType)),
4497
(abetai, (N.ndarray,)) )
4498
erfcc = Dispatch ( (lerfcc, (IntType, FloatType)),
4499
(aerfcc, (N.ndarray,)) )
4500
gammln = Dispatch ( (lgammln, (IntType, FloatType)),
4501
(agammln, (N.ndarray,)) )
4504
F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)),
4505
(aF_oneway, (N.ndarray,)) )
4506
F_value = Dispatch ( (lF_value, (ListType, TupleType)),
4507
(aF_value, (N.ndarray,)) )
4509
## SUPPORT FUNCTIONS:
4510
incr = Dispatch ( (lincr, (ListType, TupleType, N.ndarray)), )
4511
sum = Dispatch ( (lsum, (ListType, TupleType)),
4512
(asum, (N.ndarray,)) )
4513
cumsum = Dispatch ( (lcumsum, (ListType, TupleType)),
4514
(acumsum, (N.ndarray,)) )
4515
ss = Dispatch ( (lss, (ListType, TupleType)),
4516
(ass, (N.ndarray,)) )
4517
summult = Dispatch ( (lsummult, (ListType, TupleType)),
4518
(asummult, (N.ndarray,)) )
4519
square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)),
4520
(asquare_of_sums, (N.ndarray,)) )
4521
sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)),
4522
(asumdiffsquared, (N.ndarray,)) )
4523
shellsort = Dispatch ( (lshellsort, (ListType, TupleType)),
4524
(ashellsort, (N.ndarray,)) )
4525
rankdata = Dispatch ( (lrankdata, (ListType, TupleType)),
4526
(arankdata, (N.ndarray,)) )
4527
findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)),
4528
(afindwithin, (N.ndarray,)) )
4530
###################### END OF NUMERIC FUNCTION BLOCK #####################
4532
###################### END OF STATISTICAL FUNCTIONS ######################
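The table above binds each unprefixed name to a Dispatch instance that routes a call to the 'l'- or 'a'-prefixed implementation based on the type of the first argument. As a rough illustration of that idiom (the real Dispatch class is defined earlier in this module, c/o David Ascher; the class and handler below are renamed and hypothetical so nothing here shadows the module's own names):

```python
# Minimal sketch of the type-dispatch idiom used by the table above.
# DispatchSketch and _lmean_demo are illustrative stand-ins, not the
# module's actual Dispatch class or any 'l'-prefixed function.
class DispatchSketch:
    def __init__(self, *pairs):
        # Each pair is (handler, tuple_of_accepted_types); build a
        # lookup table mapping argument type -> handler function.
        self._table = {}
        for func, types in pairs:
            for t in types:
                self._table[t] = func

    def __call__(self, arg1, *args, **kwds):
        # Route on the type of the first positional argument.
        func = self._table.get(type(arg1))
        if func is None:
            raise TypeError("don't know how to dispatch %s arguments"
                            % type(arg1))
        return func(arg1, *args, **kwds)


def _lmean_demo(x):
    # Hypothetical list/tuple handler, standing in for an 'l'-prefixed
    # function; an 'a'-prefixed NumPy handler would be registered the
    # same way with (handler, (N.ndarray,)).
    return sum(x) / float(len(x))


mean_demo = DispatchSketch((_lmean_demo, (list, tuple)))
```

Calling `mean_demo([1, 2, 3, 4])` then runs `_lmean_demo`, while an argument of an unregistered type raises TypeError, matching how the dispatched names above fail when NumPy support is absent.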