__VERSION__="ete2-2.0rev96"
"""
stats.py module

(Requires pstat.py module.)

#################################################
#######  Written by:  Gary Strangman  ###########
#######  Last modified:  Dec 18, 2007 ###########
#################################################

A collection of basic statistical functions for python.  The function
names appear below.

IMPORTANT:  There are really *3* sets of functions.  The first set has an 'l'
prefix, which can be used with list or tuple arguments.  The second set has
an 'a' prefix, which can accept NumPy array arguments.  These latter
functions are defined only when NumPy is available on the system.  The third
type has NO prefix (i.e., has the name that appears below).  Functions of
this set are members of a "Dispatch" class, c/o David Ascher.  This class
allows different functions to be called depending on the type of the passed
arguments.  Thus, stats.mean is a member of the Dispatch class and
stats.mean(range(20)) will call stats.lmean(range(20)) while
stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
This is a handy way to keep consistent function names when different
argument types require different functions to be called.  Having
implemented the Dispatch class, however, means that to get info on
a given function, you must use the REAL function name ... that is
"print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine,
while "print stats.mean.__doc__" will print the doc for the Dispatch
class.  NUMPY FUNCTIONS ('a' prefix) generally have more argument options
but should otherwise be consistent with the corresponding list functions.

Disclaimers:  The function list is obviously incomplete and, worse, the
functions are not optimized.  All functions have been tested (some more
so than others), but they are far from bulletproof.  Thus, as with any
free software, no warranty or guarantee is expressed or implied. :-)  A
few extra functions that don't appear in the list below can be found by
interested treasure-hunters.  These functions don't necessarily have
both list and array versions but were deemed useful.

CENTRAL TENDENCY:  geometricmean
                   harmonicmean
                   mean
                   median
                   medianscore
                   mode

MOMENTS:  moment
          variation
          skew
          kurtosis
          skewtest   (for Numpy arrays only)
          kurtosistest (for Numpy arrays only)
          normaltest (for Numpy arrays only)

ALTERED VERSIONS:  tmean  (for Numpy arrays only)
                   tvar   (for Numpy arrays only)
                   tmin   (for Numpy arrays only)
                   tmax   (for Numpy arrays only)
                   tstdev (for Numpy arrays only)
                   tsem   (for Numpy arrays only)
                   describe

FREQUENCY STATS:  itemfreq
                  scoreatpercentile
                  percentileofscore
                  histogram
                  cumfreq
                  relfreq

VARIABILITY:  obrientransform
              samplevar
              samplestdev
              signaltonoise (for Numpy arrays only)
              var
              stdev
              sterr
              sem
              z
              zs
              zmap (for Numpy arrays only)

TRIMMING FCNS:  threshold (for Numpy arrays only)
                trimboth
                trim1
                round (round all vals to 'n' decimals; Numpy only)

CORRELATION FCNS:  covariance  (for Numpy arrays only)
                   correlation (for Numpy arrays only)
                   paired
                   pearsonr
                   spearmanr
                   pointbiserialr
                   kendalltau
                   linregress

INFERENTIAL STATS:  ttest_1samp
                    ttest_ind
                    ttest_rel
                    chisquare
                    ks_2samp
                    mannwhitneyu
                    ranksums
                    wilcoxont
                    kruskalwallish
                    friedmanchisquare

PROBABILITY CALCS:  chisqprob
                    erfcc
                    zprob
                    ksprob
                    fprob
                    betacf
                    gammln
                    betai

ANOVA FUNCTIONS:  F_oneway
                  F_value

SUPPORT FUNCTIONS:  writecc
                    incr
                    sign  (for Numpy arrays only)
                    sum
                    cumsum
                    ss
                    summult
                    sumdiffsquared
                    square_of_sums
                    shellsort
                    rankdata
                    outputpairedstats
                    findwithin
"""
import pstat
import math, string, copy
from types import *

__version__ = 0.6
253
254
255
256
class Dispatch:
    """
    The Dispatch class, care of David Ascher, allows different functions to
    be called depending on the argument types.  This way, there can be one
    function name regardless of the argument type.  To access function doc
    in stats.py module, prefix the function with an 'l' or 'a' for list or
    array arguments, respectively.  That is, print stats.lmean.__doc__ or
    print stats.amean.__doc__ or whatever.
    """

    def __init__(self, *tuples):
        # Build a table mapping each argument type to its handler function.
        self._dispatch = {}
        for func, types in tuples:
            for t in types:
                if t in self._dispatch.keys():
                    raise ValueError("can't have two dispatches on "+str(t))
                self._dispatch[t] = func
        self._types = self._dispatch.keys()

    def __call__(self, arg1, *args, **kw):
        # The type of the FIRST argument alone selects the handler.
        if type(arg1) not in self._types:
            raise TypeError("don't know how to dispatch %s arguments" % type(arg1))
        return self._dispatch[type(arg1)](arg1, *args, **kw)
280
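# Illustrative sketch, not part of the original module: a stand-alone
# version of the type-dispatch idea described above.  The names TypeDispatch,
# list_mean, text_mean and mean_any are hypothetical and exist only for this
# example; the second handler (text input) is an assumption chosen to show
# that one public name can front several implementations.

```python
class TypeDispatch(object):
    """Route a call to a function chosen by the type of the first argument."""

    def __init__(self, *pairs):
        self._table = {}
        for func, types in pairs:
            for t in types:
                if t in self._table:
                    raise ValueError("duplicate dispatch on " + str(t))
                self._table[t] = func

    def __call__(self, first, *args, **kwargs):
        try:
            func = self._table[type(first)]
        except KeyError:
            raise TypeError("no dispatch for %s" % type(first))
        return func(first, *args, **kwargs)


def list_mean(values):
    # Handler for list and tuple input.
    return sum(values) / float(len(values))


def text_mean(text):
    # Hypothetical handler: mean of whitespace-separated numbers in a string.
    return list_mean([float(tok) for tok in text.split()])


mean_any = TypeDispatch((list_mean, (list, tuple)), (text_mean, (str,)))
print(mean_any([1, 2, 3, 6]))
print(mean_any("1 2 3 6"))
```

# As in the module, help for the real implementation lives on the handler
# (list_mean.__doc__), not on the dispatching object.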
281
282
283
284
285
286
287
288
289
290
291
def lgeometricmean(inlist):
    """
    Calculates the geometric mean of the values in the passed list.
    That is:  n-th root of (x1 * x2 * ... * xn).  Assumes a '1D' list.

    Usage:   lgeometricmean(inlist)
    """
    mult = 1.0
    one_over_n = 1.0/len(inlist)
    for item in inlist:
        mult = mult * pow(item,one_over_n)
    return mult


def lharmonicmean(inlist):
    """
    Calculates the harmonic mean of the values in the passed list.
    That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Assumes a '1D' list.

    Usage:   lharmonicmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + 1.0/item
    return len(inlist) / sum


def lmean(inlist):
    """
    Returns the arithmetic mean of the values in the passed list.
    Assumes a '1D' list, but will function on the 1st dim of an array(!).

    Usage:   lmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + item
    return sum/float(len(inlist))
330
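# Illustrative sketch, not part of the original module: the three means
# documented above, written as stand-alone functions so they can be compared
# on one data set.  The function names are hypothetical.  For positive data
# the results should satisfy harmonic <= geometric <= arithmetic.

```python
import math


def arithmetic_mean(values):
    return sum(values) / float(len(values))


def geometric_mean(values):
    # n-th root of the product, accumulated as a product of n-th roots
    # so the intermediate value stays small.
    root = 1.0 / len(values)
    result = 1.0
    for v in values:
        result *= math.pow(v, root)
    return result


def harmonic_mean(values):
    # n / (1/x1 + 1/x2 + ... + 1/xn)
    return len(values) / sum(1.0 / v for v in values)


data = [1, 2, 4]
print(arithmetic_mean(data), geometric_mean(data), harmonic_mean(data))
```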
331
352
353
371
372
def lmode(inlist):
    """
    Returns a list of the modal (most common) score(s) in the passed
    list.  If there is more than one such score, all are returned.  The
    bin-count for the mode(s) is also returned.

    Usage:   lmode(inlist)
    Returns: bin-count for mode(s), a list of modal value(s)
    """

    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    maxfreq = max(freq)
    mode = []
    stillmore = 1
    # Pull out every score whose count equals the maximum count.
    while stillmore:
        try:
            indx = freq.index(maxfreq)
            mode.append(scores[indx])
            del freq[indx]
            del scores[indx]
        except ValueError:
            stillmore=0
    return maxfreq, mode
400
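# Illustrative sketch, not part of the original module: the same
# "count, then keep every value with the top count" idea, using
# collections.Counter instead of pstat.unique.  The name modes is
# hypothetical.

```python
from collections import Counter


def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    # Return the top count first, then all values that reach it.
    return top, sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3, 3, 4]))  # (2, [2, 3])
```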
401
402
403
404
405
def lmoment(inlist,moment=1):
    """
    Calculates the nth moment about the mean for a sample (defaults to
    the 1st moment).  Used to calculate coefficients of skewness and kurtosis.

    Usage:   lmoment(inlist,moment=1)
    Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
    """
    if moment == 1:
        return 0.0
    else:
        mn = mean(inlist)
        n = len(inlist)
        s = 0
        for x in inlist:
            s = s + (x-mn)**moment
        return s/float(n)


def lvariation(inlist):
    """
    Returns the coefficient of variation, as defined in CRC Standard
    Probability and Statistics, p.6.

    Usage:   lvariation(inlist)
    """
    return 100.0*samplestdev(inlist)/float(mean(inlist))


def lskew(inlist):
    """
    Returns the skewness of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage:   lskew(inlist)
    """
    return moment(inlist,3)/pow(moment(inlist,2),1.5)


def lkurtosis(inlist):
    """
    Returns the kurtosis of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage:   lkurtosis(inlist)
    """
    return moment(inlist,4)/pow(moment(inlist,2),2.0)


def ldescribe(inlist):
    """
    Returns some descriptive statistics of the passed list (assumed to be 1D).

    Usage:   ldescribe(inlist)
    Returns: n, (min,max), mean, standard deviation, skew, kurtosis
    """
    n = len(inlist)
    mm = (min(inlist),max(inlist))
    m = mean(inlist)
    sd = stdev(inlist)
    sk = skew(inlist)
    kurt = kurtosis(inlist)
    return n, mm, m, sd, sk, kurt
469
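# Illustrative sketch, not part of the original module: central moments and
# the moment ratios used above for skewness (m3 / m2**1.5) and kurtosis
# (m4 / m2**2).  The function names are hypothetical.  Note that this
# kurtosis is not the "excess" form, so no 3 is subtracted.

```python
def central_moment(values, r):
    # 1/n * SUM((x - mean)**r)
    n = len(values)
    mean = sum(values) / float(n)
    return sum((x - mean) ** r for x in values) / n


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(central_moment(data, 2), skewness(data), kurtosis(data))
```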
470
471
472
473
474
def litemfreq(inlist):
    """
    Returns a list of pairs.  Each pair consists of one of the scores in inlist
    and its frequency count.  Assumes a 1D list is passed.

    Usage:   litemfreq(inlist)
    Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
    """
    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    return pstat.abut(scores, freq)


def lscoreatpercentile(inlist,percent):
    """
    Returns the score at a given percentile relative to the distribution
    given by inlist.

    Usage:   lscoreatpercentile(inlist,percent)
    """
    if percent > 1:
        print "\nDividing percent>1 by 100 in lscoreatpercentile().\n"
        percent = percent / 100.0
    targetcf = percent*len(inlist)
    h, lrl, binsize, extras = histogram(inlist)
    cumhist = cumsum(copy.deepcopy(h))
    for i in range(len(cumhist)):
        if cumhist[i] >= targetcf:
            break
    # Nothing lies below the first bin; cumhist[i-1] would wrap to the total.
    if i > 0:
        below = cumhist[i-1]
    else:
        below = 0
    score = binsize * ((targetcf - below) / float(h[i])) + (lrl+binsize*i)
    return score


def lpercentileofscore(inlist,score,histbins=10,defaultlimits=None):
    """
    Returns the percentile value of a score relative to the distribution
    given by inlist.  Formula depends on the values used to histogram the data(!).

    Usage:   lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
    """

    h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits)
    cumhist = cumsum(copy.deepcopy(h))
    i = int((score - lrl)/float(binsize))
    # Nothing lies below the first bin; cumhist[i-1] would wrap to the total.
    if i > 0:
        below = cumhist[i-1]
    else:
        below = 0
    pct = (below+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
    return pct
524
525
def lhistogram(inlist,numbins=10,defaultreallimits=None,printextras=0):
    """
    Returns (i) a list of histogram bin counts, (ii) the smallest value
    of the histogram binning, (iii) the bin width (the last 2 are not
    necessarily integers), and (iv) the number of points that fell outside
    the bins.  Default number of bins is 10.  If no sequence object
    is given for defaultreallimits, the routine picks (usually non-pretty) bins
    spanning all the numbers in the inlist.

    Usage:   lhistogram(inlist,numbins=10,defaultreallimits=None,printextras=0)
    Returns: list of bin values, lowerreallimit, binsize, extrapoints
    """
    if (defaultreallimits is not None):
        if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1:
            # Only a lower limit was given; derive the upper one from the data.
            lowerreallimit = defaultreallimits
            upperreallimit = 1.000001 * max(inlist)
        else:
            # Both limits were given.
            lowerreallimit = defaultreallimits[0]
            upperreallimit = defaultreallimits[1]
        binsize = (upperreallimit-lowerreallimit)/float(numbins)
    else:
        # No limits given; compute both from the data.
        estbinwidth=(max(inlist)-min(inlist))/float(numbins) +1e-6
        binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins)
        lowerreallimit = min(inlist) - binsize/2
    bins = [0]*(numbins)
    extrapoints = 0
    for num in inlist:
        try:
            if (num-lowerreallimit) < 0:
                extrapoints = extrapoints + 1
            else:
                bintoincrement = int((num-lowerreallimit)/float(binsize))
                bins[bintoincrement] = bins[bintoincrement] + 1
        except:
            # Index past the last bin: count the point as out of range.
            extrapoints = extrapoints + 1
    if (extrapoints > 0 and printextras == 1):
        print '\nPoints outside given histogram range =',extrapoints
    return (bins, lowerreallimit, binsize, extrapoints)


def lcumfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a cumulative frequency histogram, using the histogram function.

    Usage:   lcumfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    cumhist = cumsum(copy.deepcopy(h))
    return cumhist,l,b,e


def lrelfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a relative frequency histogram, using the histogram function.

    Usage:   lrelfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of relfreq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    for i in range(len(h)):
        h[i] = h[i]/float(len(inlist))
    return h,l,b,e
588
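# Illustrative sketch, not part of the original module: equal-width binning
# with a count of out-of-range points, plus the cumulative and relative
# frequencies derived from the bin counts.  The function name and the
# two-element limits argument are assumptions made for this example; the
# default-limit arithmetic mirrors the formula shown in lhistogram.

```python
def histogram(values, numbins=10, limits=None):
    if limits is None:
        lo, hi = min(values), max(values)
        est = (hi - lo) / float(numbins) + 1e-6
        binsize = (hi - lo + est) / float(numbins)
        lower = lo - binsize / 2.0
    else:
        lower, upper = limits
        binsize = (upper - lower) / float(numbins)
    bins = [0] * numbins
    extras = 0
    for v in values:
        idx = int((v - lower) / binsize) if v >= lower else -1
        if 0 <= idx < numbins:
            bins[idx] += 1
        else:
            extras += 1          # below the lower limit or past the last bin
    return bins, lower, binsize, extras


values = [0.5, 1.5, 1.7, 2.5, 9.0]
bins, lower, binsize, extras = histogram(values, numbins=3, limits=(0, 3))

# Cumulative frequencies: running total of the bin counts.
cumulative = []
running = 0
for count in bins:
    running += count
    cumulative.append(running)

# Relative frequencies: each count divided by the number of input values.
relative = [count / float(len(values)) for count in bins]
print(bins, extras, cumulative, relative)
```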
589
590
591
592
593
628
629
def lsamplevar(inlist):
    """
    Returns the variance of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample variance only).

    Usage:   lsamplevar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = []
    for item in inlist:
        deviations.append(item-mn)
    return ss(deviations)/float(n)


def lsamplestdev(inlist):
    """
    Returns the standard deviation of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample stdev only).

    Usage:   lsamplestdev(inlist)
    """
    return math.sqrt(samplevar(inlist))


def lcov(x,y, keepdims=0):
    """
    Returns the estimated covariance of the paired values in the passed
    lists x and y (i.e., using N-1 in the denominator).  The keepdims
    argument is accepted but not used by this list version.

    Usage:   lcov(x,y,keepdims=0)
    """

    n = len(x)
    xmn = mean(x)
    ymn = mean(y)
    xdeviations = [0]*len(x)
    ydeviations = [0]*len(y)
    for i in range(len(x)):
        xdeviations[i] = x[i] - xmn
        ydeviations[i] = y[i] - ymn
    ss = 0.0
    for i in range(len(xdeviations)):
        ss = ss + xdeviations[i]*ydeviations[i]
    return ss/float(n-1)


def lvar(inlist):
    """
    Returns the variance of the values in the passed list using N-1
    for the denominator (i.e., for estimating population variance).

    Usage:   lvar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = [0]*len(inlist)
    for i in range(len(inlist)):
        deviations[i] = inlist[i] - mn
    return ss(deviations)/float(n-1)


def lstdev(inlist):
    """
    Returns the standard deviation of the values in the passed list
    using N-1 in the denominator (i.e., to estimate population stdev).

    Usage:   lstdev(inlist)
    """
    return math.sqrt(var(inlist))


def lsterr(inlist):
    """
    Returns the standard error of the values in the passed list using N-1
    in the denominator (i.e., to estimate population standard error).

    Usage:   lsterr(inlist)
    """
    return stdev(inlist) / float(math.sqrt(len(inlist)))


def lsem(inlist):
    """
    Returns the estimated standard error of the mean (sx-bar) of the
    values in the passed list.  sem = stdev / sqrt(n)

    Usage:   lsem(inlist)
    """
    sd = stdev(inlist)
    n = len(inlist)
    return sd/math.sqrt(n)
725
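# Illustrative sketch, not part of the original module: the N versus N-1
# denominators that separate the "sample" functions from var/stdev, and the
# standard error of the mean built on the N-1 estimate.  The function names
# are hypothetical.

```python
import math


def sum_sq_dev(values):
    # Sum of squared deviations from the mean.
    mean = sum(values) / float(len(values))
    return sum((x - mean) ** 2 for x in values)


def sample_var(values):
    # N in the denominator: describes this sample only.
    return sum_sq_dev(values) / len(values)


def est_var(values):
    # N-1 in the denominator: estimates the population variance.
    return sum_sq_dev(values) / (len(values) - 1)


def sem(values):
    # stdev / sqrt(n), using the N-1 standard deviation.
    return math.sqrt(est_var(values)) / math.sqrt(len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_var(data), est_var(data), sem(data))
```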
726
def lz(inlist, score):
    """
    Returns the z-score for a given input score, given that score and the
    list from which that score came.  Not appropriate for population calculations.

    Usage:   lz(inlist, score)
    """
    z = (score-mean(inlist))/samplestdev(inlist)
    return z


def lzs(inlist):
    """
    Returns a list of z-scores, one for each score in the passed list.

    Usage:   lzs(inlist)
    """
    zscores = []
    for item in inlist:
        zscores.append(z(inlist,item))
    return zscores
748
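# Illustrative sketch, not part of the original module: z-scores computed
# with the N-denominator standard deviation, as lz does through samplestdev.
# The name zscores is hypothetical.  The mean and standard deviation are
# computed once here rather than once per score.

```python
import math


def zscores(values):
    n = float(len(values))
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)  # N denominator
    return [(x - mean) / sd for x in values]


print(zscores([2, 4, 4, 4, 5, 5, 7, 9]))
```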
749
750
751
752
753
def ltrimboth(l,proportiontocut):
    """
    Slices off the passed proportion of items from BOTH ends of the passed
    list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
    10% of scores).  Assumes list is sorted by magnitude.  Slices off LESS if
    proportion results in a non-integer slice index (i.e., conservatively
    slices off proportiontocut).

    Usage:   ltrimboth (l,proportiontocut)
    Returns: trimmed version of list l
    """
    lowercut = int(proportiontocut*len(l))
    uppercut = len(l) - lowercut
    return l[lowercut:uppercut]


def ltrim1(l,proportiontocut,tail='right'):
    """
    Slices off the passed proportion of items from ONE end of the passed
    list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
    10% of scores).  Slices off LESS if proportion results in a non-integer
    slice index (i.e., conservatively slices off proportiontocut).

    Usage:   ltrim1 (l,proportiontocut,tail='right')  or set tail='left'
    Returns: trimmed version of list l
    """
    if tail == 'right':
        lowercut = 0
        uppercut = len(l) - int(proportiontocut*len(l))
    elif tail == 'left':
        lowercut = int(proportiontocut*len(l))
        uppercut = len(l)
    else:
        # Any other value would leave the cut points undefined.
        raise ValueError("tail must be 'left' or 'right'")
    return l[lowercut:uppercut]
787
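# Illustrative sketch, not part of the original module: conservative
# trimming, where int() truncates the cut count so that a fractional count
# removes fewer items rather than more.  The function names are hypothetical.

```python
def trim_both(sorted_values, proportion):
    cut = int(proportion * len(sorted_values))
    return sorted_values[cut:len(sorted_values) - cut]


def trim_one(sorted_values, proportion, tail='right'):
    cut = int(proportion * len(sorted_values))
    if tail == 'right':
        return sorted_values[:len(sorted_values) - cut]
    if tail == 'left':
        return sorted_values[cut:]
    raise ValueError("tail must be 'left' or 'right'")


data = list(range(10))
print(trim_both(data, 0.25))          # 2.5 items requested, 2 cut per side
print(trim_one(data, 0.25, 'left'))
```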
788
789
790
791
792
def lpaired(x,y):
    """
    Interactively determines the type of data and then runs the
    appropriate statistic for paired group data.

    Usage:   lpaired(x,y)
    Returns: appropriate statistic name, value, and probability
    """
    samples = ''
    while samples not in ['i','r','I','R','c','C']:
        print '\nIndependent or related samples, or correlation (i,r,c): ',
        samples = raw_input()

    if samples in ['i','I','r','R']:
        print '\nComparing variances ...',
        # Test homogeneity of variance on the O'Brien-transformed scores.
        r = obrientransform(x,y)
        f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
        if p<0.05:
            vartype='unequal, p='+str(round(p,4))
        else:
            vartype='equal'
        print vartype
        if samples in ['i','I']:
            if vartype[0]=='e':
                t,p = ttest_ind(x,y,0)
                print '\nIndependent samples t-test:  ', round(t,4),round(p,4)
            else:
                if len(x)>20 or len(y)>20:
                    z,p = ranksums(x,y)
                    print '\nRank Sums test (NONparametric, n>20):  ', round(z,4),round(p,4)
                else:
                    u,p = mannwhitneyu(x,y)
                    print '\nMann-Whitney U-test (NONparametric, ns<20):  ', round(u,4),round(p,4)

        else:
            # Related samples.
            if vartype[0]=='e':
                t,p = ttest_rel(x,y,0)
                print '\nRelated samples t-test:  ', round(t,4),round(p,4)
            else:
                # Run the test that the message below names.
                t,p = wilcoxont(x,y)
                print '\nWilcoxon T-test (NONparametric):  ', round(t,4),round(p,4)
    else:
        # Correlation analysis.
        corrtype = ''
        while corrtype not in ['c','C','r','R','d','D']:
            print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d):  ',
            corrtype = raw_input()
        if corrtype in ['c','C']:
            m,b,r,p,see = linregress(x,y)
            print '\nLinear regression for continuous variables ...'
            lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
            pstat.printcc(lol)
        elif corrtype in ['r','R']:
            r,p = spearmanr(x,y)
            print '\nCorrelation for ranked variables ...'
            print "Spearman's r:  ",round(r,4),round(p,4)
        else:
            r,p = pointbiserialr(x,y)
            print '\nAssuming x contains a dichotomous variable ...'
            print 'Point Biserial r:  ',round(r,4),round(p,4)
    print '\n\n'
    return None
855
856
def lpearsonr(x,y):
    """
    Calculates a Pearson correlation coefficient and the associated
    probability value.  Taken from Heiman's Basic Statistics for the Behav.
    Sci (2nd), p.195.

    Usage:   lpearsonr(x,y)      where x and y are equal-length lists
    Returns: Pearson's r value, two-tailed p-value
    """
    TINY = 1.0e-30
    if len(x) != len(y):
        raise ValueError('Input values not paired in pearsonr.  Aborting.')
    n = len(x)
    x = map(float,x)
    y = map(float,y)
    xmean = mean(x)
    ymean = mean(y)
    r_num = n*(summult(x,y)) - sum(x)*sum(y)
    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
    r = (r_num / r_den)
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = betai(0.5*df,0.5,df/float(df+t*t))
    return r, prob


def llincc(x,y):
    """
    Calculates Lin's concordance correlation coefficient.

    Usage:   llincc(x,y)    where x, y are equal-length lists
    Returns: Lin's CC
    """
    # Rescale the N-1 estimates to N-denominator (co)variances.
    covar = lcov(x,y)*(len(x)-1)/float(len(x))
    xvar = lvar(x)*(len(x)-1)/float(len(x))
    yvar = lvar(y)*(len(y)-1)/float(len(y))
    # mean (not amean) so the list version also works without NumPy.
    lincc = (2 * covar) / ((xvar+yvar) +((mean(x)-mean(y))**2))
    return lincc


def lspearmanr(x,y):
    """
    Calculates a Spearman rank-order correlation coefficient.  Taken
    from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.

    Usage:   lspearmanr(x,y)      where x and y are equal-length lists
    Returns: Spearman's r, two-tailed p-value
    """
    TINY = 1e-30
    if len(x) != len(y):
        raise ValueError('Input values not paired in spearmanr.  Aborting.')
    n = len(x)
    rankx = rankdata(x)
    ranky = rankdata(y)
    dsq = sumdiffsquared(rankx,ranky)
    rs = 1 - 6*dsq / float(n*(n**2-1))
    # TINY guards against division by zero when rs is exactly +1 or -1.
    t = rs * math.sqrt((n-2) / ((rs+1.0+TINY)*(1.0-rs+TINY)))
    df = n-2
    probrs = betai(0.5*df,0.5,df/(df+t*t))
    return rs, probrs
919
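# Illustrative sketch, not part of the original module: the rank-difference
# formula rs = 1 - 6*SUM(d**2) / (n*(n**2-1)) used by lspearmanr, with a
# small average-rank helper standing in for rankdata.  Both function names
# are hypothetical, and no p-value is computed here.

```python
def average_ranks(values):
    # 1-based ranks; tied values share the mean of the positions they span.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    dsq = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * dsq / float(n * (n ** 2 - 1))


print(spearman_rs([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))
```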
920
def lpointbiserialr(x,y):
    """
    Calculates a point-biserial correlation coefficient and the associated
    probability value.  Taken from Heiman's Basic Statistics for the Behav.
    Sci (1st), p.194.

    Usage:   lpointbiserialr(x,y)      where x,y are equal-length lists
    Returns: Point-biserial r, two-tailed p-value
    """
    TINY = 1e-30
    if len(x) != len(y):
        raise ValueError('INPUT VALUES NOT PAIRED IN pointbiserialr.  ABORTING.')
    data = pstat.abut(x,y)
    categories = pstat.unique(x)
    if len(categories) != 2:
        raise ValueError("Exactly 2 categories required for pointbiserialr().")
    else:
        # Exactly two categories: split the rows by category and compare means.
        codemap = pstat.abut(categories,range(2))
        recoded = pstat.recode(data,codemap,0)
        x = pstat.linexand(data,0,categories[0])
        y = pstat.linexand(data,0,categories[1])
        xmean = mean(pstat.colex(x,1))
        ymean = mean(pstat.colex(y,1))
        n = len(data)
        adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
        rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust
        df = n-2
        t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
        prob = betai(0.5*df,0.5,df/(df+t*t))
        return rpb, prob


def lkendalltau(x,y):
    """
    Calculates Kendall's tau ... correlation of ordinal data.  Adapted
    from function kendl1 in Numerical Recipes.  Needs good test-routine.@@@

    Usage:   lkendalltau(x,y)
    Returns: Kendall's tau, two-tailed p-value
    """
    n1 = 0
    n2 = 0
    iss = 0
    for j in range(len(x)-1):
        # Start at j+1 so that a point is never paired with itself.
        for k in range(j+1,len(y)):
            a1 = x[j] - x[k]
            a2 = y[j] - y[k]
            aa = a1 * a2
            if (aa):             # neither list has a tie
                n1 = n1 + 1
                n2 = n2 + 1
                if aa > 0:
                    iss = iss + 1
                else:
                    iss = iss -1
            else:
                # Count the pair only for the list(s) in which it is not tied.
                if (a1):
                    n1 = n1 + 1
                if (a2):
                    n2 = n2 + 1
    tau = iss / math.sqrt(n1*n2)
    svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
    z = tau / math.sqrt(svar)
    prob = erfcc(abs(z)/1.4142136)
    return tau, prob


def llinregress(x,y):
    """
    Calculates a regression line on x,y pairs.

    Usage:   llinregress(x,y)      x,y are equal-length lists of x-y coordinates
    Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
    """
    TINY = 1.0e-20
    if len(x) != len(y):
        raise ValueError('Input values not paired in linregress.  Aborting.')
    n = len(x)
    x = map(float,x)
    y = map(float,y)
    xmean = mean(x)
    ymean = mean(y)
    r_num = float(n*(summult(x,y)) - sum(x)*sum(y))
    r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
    r = r_num / r_den
    z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
    df = n-2
    t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
    prob = betai(0.5*df,0.5,df/(df+t*t))
    slope = r_num / float(n*ss(x) - square_of_sums(x))
    intercept = ymean - slope*xmean
    sterrest = math.sqrt(1-r*r)*samplestdev(y)
    return slope, intercept, r, prob, sterrest
1014
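# Illustrative sketch, not part of the original module: Kendall's tau by
# pair counting, following my reading of kendl1 in Numerical Recipes.  Each
# distinct pair (j, k) with j < k is visited once; a pair tied in only one
# list counts toward the other list's total.  The name kendall_tau is
# hypothetical and no p-value is computed.

```python
import math


def kendall_tau(x, y):
    s = 0     # concordant pairs minus discordant pairs
    n1 = 0    # pairs not tied in x
    n2 = 0    # pairs not tied in y
    for j in range(len(x) - 1):
        for k in range(j + 1, len(x)):
            a1 = x[j] - x[k]
            a2 = y[j] - y[k]
            aa = a1 * a2
            if aa:
                n1 += 1
                n2 += 1
                s += 1 if aa > 0 else -1
            else:
                if a1:
                    n1 += 1
                if a2:
                    n2 += 1
    return s / math.sqrt(n1 * n2)


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```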
1015
1016
1017
1018
1019
def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
    """
    Calculates the t-obtained for the single-sample T-test on ONE group
    of scores a, given a population mean.  If printit=1, results are printed
    to the screen.  If printit='filename', the results are output to 'filename'
    using the given writemode (default=append).  Returns t-value, and prob.

    Usage:   lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
    Returns: t-value, two-tailed prob
    """
    x = mean(a)
    v = var(a)
    n = len(a)
    df = n-1
    svar = ((n-1)*v)/float(df)
    t = (x-popmean)/math.sqrt(svar*(1.0/n))
    prob = betai(0.5*df,0.5,float(df)/(df+t*t))

    if printit != 0:
        statname = 'Single-sample T-test.'
        outputpairedstats(printit,writemode,
                          'Population','--',popmean,0,0,0,
                          name,n,x,v,min(a),max(a),
                          statname,t,prob)
    return t,prob


def lttest_ind(a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
    """
    Calculates the t-obtained T-test on TWO INDEPENDENT samples of
    scores a, and b.  From Numerical Recipes, p.483.  If printit=1, results
    are printed to the screen.  If printit='filename', the results are output
    to 'filename' using the given writemode (default=append).  Returns t-value,
    and prob.

    Usage:   lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
    Returns: t-value, two-tailed prob
    """
    x1 = mean(a)
    x2 = mean(b)
    v1 = stdev(a)**2
    v2 = stdev(b)**2
    n1 = len(a)
    n2 = len(b)
    df = n1+n2-2
    # Pooled variance across the two groups.
    svar = ((n1-1)*v1+(n2-1)*v2)/float(df)
    t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2))
    prob = betai(0.5*df,0.5,df/(df+t*t))

    if printit != 0:
        statname = 'Independent samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n1,x1,v1,min(a),max(a),
                          name2,n2,x2,v2,min(b),max(b),
                          statname,t,prob)
    return t,prob


def lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
    """
    Calculates the t-obtained T-test on TWO RELATED samples of scores,
    a and b.  From Numerical Recipes, p.483.  If printit=1, results are
    printed to the screen.  If printit='filename', the results are output to
    'filename' using the given writemode (default=append).  Returns t-value,
    and prob.

    Usage:   lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
    Returns: t-value, two-tailed prob
    """
    if len(a)!=len(b):
        raise ValueError('Unequal length lists in ttest_rel.')
    x1 = mean(a)
    x2 = mean(b)
    v1 = var(a)
    v2 = var(b)
    n = len(a)
    cov = 0
    for i in range(len(a)):
        cov = cov + (a[i]-x1) * (b[i]-x2)
    df = n-1
    cov = cov / float(df)
    sd = math.sqrt((v1+v2 - 2.0*cov)/float(n))
    t = (x1-x2)/sd
    prob = betai(0.5*df,0.5,df/(df+t*t))

    if printit != 0:
        statname = 'Related samples T-test.'
        outputpairedstats(printit,writemode,
                          name1,n,x1,v1,min(a),max(a),
                          name2,n,x2,v2,min(b),max(b),
                          statname,t,prob)
    return t, prob
1112
1113
def lchisquare(f_obs,f_exp=None):
    """
    Calculates a one-way chi square for list of observed frequencies and returns
    the result.  If no expected frequencies are given, the total N is assumed to
    be equally distributed across all groups.

    Usage:   lchisquare(f_obs, f_exp=None)   f_obs = list of observed cell freq.
    Returns: chisquare-statistic, associated p-value
    """
    k = len(f_obs)
    if f_exp is None:
        f_exp = [sum(f_obs)/float(k)] * len(f_obs)
    chisq = 0
    for i in range(len(f_obs)):
        chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i])
    return chisq, chisqprob(chisq, k-1)
1130
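# Illustrative sketch, not part of the original module: the one-way
# chi-square statistic SUM((observed - expected)**2 / expected), with equal
# expected counts when none are supplied.  The name chisquare_stat is
# hypothetical; the p-value step (k-1 degrees of freedom) is left out.

```python
def chisquare_stat(f_obs, f_exp=None):
    k = len(f_obs)
    if f_exp is None:
        # Spread the total count evenly over the k cells.
        f_exp = [sum(f_obs) / float(k)] * k
    return sum((o - e) ** 2 / float(e) for o, e in zip(f_obs, f_exp))


print(chisquare_stat([10, 20, 30]))  # 10.0
```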
1131
def lks_2samp(data1,data2):
    """
    Computes the Kolmogorov-Smirnov statistic on 2 samples.  From
    Numerical Recipes in C, page 493.

    Usage:   lks_2samp(data1,data2)   data1&2 are lists of values for 2 conditions
    Returns: KS D-value, associated p-value
    """
    j1 = 0
    j2 = 0
    fn1 = 0.0
    fn2 = 0.0
    n1 = len(data1)
    n2 = len(data2)
    en1 = n1
    en2 = n2
    d = 0.0
    # Note: both input lists are sorted in place.
    data1.sort()
    data2.sort()
    while j1 < n1 and j2 < n2:
        d1=data1[j1]
        d2=data2[j2]
        if d1 <= d2:
            fn1 = (j1)/float(en1)
            j1 = j1 + 1
        if d2 <= d1:
            fn2 = (j2)/float(en2)
            j2 = j2 + 1
        dt = (fn2-fn1)
        if math.fabs(dt) > math.fabs(d):
            d = dt
    try:
        en = math.sqrt(en1*en2/float(en1+en2))
        prob = ksprob((en+0.12+0.11/en)*abs(d))
    except:
        prob = 1.0
    return d, prob


def lmannwhitneyu(x,y):
    """
    Calculates a Mann-Whitney U statistic on the provided scores and
    returns the result.  Use only when the n in each condition is < 20 and
    you have 2 independent samples of ranks.  NOTE: Mann-Whitney U is
    significant if the u-obtained is LESS THAN or equal to the critical
    value of U found in the tables.  Equivalent to Kruskal-Wallis H with
    just 2 groups.

    Usage:   lmannwhitneyu(x,y)
    Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
    """
    n1 = len(x)
    n2 = len(y)
    ranked = rankdata(x+y)
    rankx = ranked[0:n1]
    ranky = ranked[n1:]
    u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)
    u2 = n1*n2 - u1
    bigu = max(u1,u2)
    smallu = min(u1,u2)
    T = math.sqrt(tiecorrect(ranked))
    if T == 0:
        raise ValueError('All numbers are identical in lmannwhitneyu')
    sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
    z = abs((bigu-n1*n2/2.0) / sd)
    return smallu, 1.0 - zprob(z)


def ltiecorrect(rankvals):
    """
    Corrects for ties in Mann Whitney U and Kruskal Wallis H tests.  See
    Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
    New York: McGraw-Hill.  Code adapted from |Stat rankind.c code.

    Usage:   ltiecorrect(rankvals)
    Returns: T correction factor for U or H
    """
    sorted,posn = shellsort(rankvals)
    n = len(sorted)
    T = 0.0
    i = 0
    while (i<n-1):
        if sorted[i] == sorted[i+1]:
            # Count the length of this run of tied ranks.
            nties = 1
            while (i<n-1) and (sorted[i] == sorted[i+1]):
                nties = nties +1
                i = i +1
            T = T + nties**3 - nties
        i = i+1
    T = T / float(n**3-n)
    return 1.0 - T
1223
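# Illustrative sketch, not part of the original module: the tie correction
# 1 - SUM(t**3 - t) / (n**3 - n), where t is the size of each group of equal
# ranks.  The name tie_correction is hypothetical, and the built-in sorted()
# stands in for shellsort.

```python
def tie_correction(ranks):
    ordered = sorted(ranks)
    n = len(ordered)
    total = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and ordered[j + 1] == ordered[i]:
            j += 1
        t = j - i + 1            # size of this group of equal ranks
        if t > 1:
            total += t ** 3 - t
        i = j + 1
    return 1.0 - total / float(n ** 3 - n)


print(tie_correction([1, 2.5, 2.5, 4]))  # 0.9
```

# A result of 0.0 means every value was identical, which is the case the
# callers above reject with a ValueError.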
1224
def lranksums(x,y):
    """
    Calculates the rank sums statistic on the provided scores and
    returns the result.  Use only when the n in each condition is > 20 and you
    have 2 independent samples of ranks.

    Usage:   lranksums(x,y)
    Returns: a z-statistic, two-tailed p-value
    """
    n1 = len(x)
    n2 = len(y)
    alldata = x+y
    ranked = rankdata(alldata)
    x = ranked[:n1]
    y = ranked[n1:]
    s = sum(x)
    expected = n1*(n1+n2+1) / 2.0
    z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
    prob = 2*(1.0 -zprob(abs(z)))
    return z, prob


def lwilcoxont(x,y):
    """
    Calculates the Wilcoxon T-test for related samples and returns the
    result.  A non-parametric T-test.

    Usage:   lwilcoxont(x,y)
    Returns: a t-statistic, two-tail probability estimate
    """
    if len(x) != len(y):
        raise ValueError('Unequal N in wilcoxont.  Aborting.')
    d=[]
    for i in range(len(x)):
        diff = x[i] - y[i]
        if diff != 0:
            d.append(diff)
    count = len(d)
    absd = map(abs,d)
    absranked = rankdata(absd)
    r_plus = 0.0
    r_minus = 0.0
    for i in range(len(absd)):
        if d[i] < 0:
            r_minus = r_minus + absranked[i]
        else:
            r_plus = r_plus + absranked[i]
    wt = min(r_plus, r_minus)
    mn = count * (count+1) * 0.25
    se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
    z = math.fabs(wt-mn) / se
    prob = 2*(1.0 -zprob(abs(z)))
    return wt, prob
1278
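# Illustrative sketch, not part of the original module: the rank-sum z
# statistic z = (s - expected) / sqrt(n1*n2*(n1+n2+1)/12), where s is the sum
# of the pooled ranks of the first sample.  The name rank_sum_z is
# hypothetical, and the two-tailed p-value here uses math.erf for the normal
# CDF as a stand-in for the module's zprob.

```python
import math


def rank_sum_z(x, y):
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    # Average 1-based rank of each distinct value (ties share the mean rank).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2.0 + 1.0
        i = j + 1
    s = sum(rank_of[v] for v in x)
    expected = n1 * (n1 + n2 + 1) / 2.0
    z = (s - expected) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    normal_cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return z, 2.0 * (1.0 - normal_cdf)


print(rank_sum_z([1, 2, 3], [4, 5, 6]))
```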
1279
def lkruskalwallish(*args):
    """
    The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
    groups, requiring at least 5 subjects in each group. This function
    calculates the Kruskal-Wallis H-test for 3 or more independent samples
    and returns the result.

    Usage: lkruskalwallish(*args)
    Returns: H-statistic (corrected for ties), associated p-value
    """
    args = list(args)
    n = map(len,args)
    all = []
    for i in range(len(args)):
        all = all + args[i]
    ranked = rankdata(all)
    T = tiecorrect(ranked)
    for i in range(len(args)):
        args[i] = ranked[0:n[i]]
        del ranked[0:n[i]]
    rsums = []
    for i in range(len(args)):
        rsums.append(sum(args[i])**2)
        rsums[i] = rsums[i] / float(n[i])
    ssbn = sum(rsums)
    totaln = sum(n)
    h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
    df = len(args) - 1
    if T == 0:
        raise ValueError('All numbers are identical in lkruskalwallish')
    h = h / float(T)
    return h, chisqprob(h,df)

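# Illustrative sketch (not part of the original module): the H computation
# above, reduced to the case with no tied scores, so that the tie-correction
# factor T equals 1. The name kruskal_h_sketch is hypothetical.

```python
def kruskal_h_sketch(*groups):
    """Kruskal-Wallis H for independent groups, assuming no tied scores."""
    pooled = sorted(v for g in groups for v in g)
    # Without ties, the rank of a score is its 1-based position in the pooled sort.
    rank = dict((v, i + 1) for i, v in enumerate(pooled))
    totaln = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    ssbn = sum(sum(rank[v] for v in g) ** 2 / float(len(g)) for g in groups)
    return 12.0 / (totaln * (totaln + 1)) * ssbn - 3 * (totaln + 1)


h = kruskal_h_sketch([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

# With three perfectly separated groups of three, the rank sums are 6, 15 and
# 24, which gives H = 7.2.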
def lfriedmanchisquare(*args):
    """
    Friedman Chi-Square is a non-parametric, one-way within-subjects
    ANOVA. This function calculates the Friedman Chi-square test for repeated
    measures and returns the result, along with the associated probability
    value. It assumes 3 or more repeated measures. Only 3 levels requires a
    minimum of 10 subjects in the study. Four levels requires 5 subjects per
    level(??).

    Usage: lfriedmanchisquare(*args)
    Returns: chi-square statistic, associated p-value
    """
    k = len(args)
    if k < 3:
        raise ValueError('Less than 3 levels. Friedman test not appropriate.')
    n = len(args[0])
    data = pstat.abut(*args)
    for i in range(len(data)):
        data[i] = rankdata(data[i])
    ssbn = 0
    for i in range(k):
        # Sum the within-subject RANKS for each level, not the raw scores.
        ssbn = ssbn + sum(pstat.colex(data,i))**2
    chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
    return chisq, chisqprob(chisq,k-1)

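# Illustrative sketch (not part of the original module): the Friedman
# statistic ranks the scores within each subject and then sums those ranks
# per condition. The name friedman_chisq_sketch is hypothetical, and the
# sketch assumes no ties within a subject.

```python
def friedman_chisq_sketch(*conditions):
    """Friedman chi-square for k repeated measures, assuming no within-subject ties."""
    k = len(conditions)
    n = len(conditions[0])
    rank_sums = [0.0] * k
    for subject in zip(*conditions):
        # Rank this subject's k scores from 1 (smallest) to k (largest).
        order = sorted(range(k), key=lambda j: subject[j])
        for rank, j in enumerate(order):
            rank_sums[j] += rank + 1
    ssbn = sum(r ** 2 for r in rank_sums)
    return 12.0 / (k * n * (k + 1)) * ssbn - 3 * n * (k + 1)


chisq = friedman_chisq_sketch([1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6])
```

# Every subject orders the three conditions the same way here, so the rank
# sums are 4, 8 and 12 and the statistic reaches its maximum n*(k-1) = 8.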
def lchisqprob(chisq,df):
    """
    Returns the (1-tailed) probability value associated with the provided
    chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat.

    Usage: lchisqprob(chisq,df)
    """
    BIG = 20.0
    def ex(x):
        BIG = 20.0
        if x < -BIG:
            return 0.0
        else:
            return math.exp(x)

    if chisq <=0 or df < 1:
        return 1.0
    a = 0.5 * chisq
    if df%2 == 0:
        even = 1
    else:
        even = 0
    if df > 1:
        y = ex(-a)
    if even:
        s = y
    else:
        s = 2.0 * zprob(-math.sqrt(chisq))
    if (df > 2):
        chisq = 0.5 * (df - 1.0)
        if even:
            z = 1.0
        else:
            z = 0.5
        if a > BIG:
            if even:
                e = 0.0
            else:
                e = math.log(math.sqrt(math.pi))
            c = math.log(a)
            while (z <= chisq):
                e = math.log(z) + e
                s = s + ex(c*z-a-e)
                z = z + 1.0
            return s
        else:
            if even:
                e = 1.0
            else:
                e = 1.0 / math.sqrt(math.pi) / math.sqrt(a)
            c = 0.0
            while (z <= chisq):
                e = e * (a/float(z))
                c = c + e
                z = z + 1.0
            return (c*y+s)
    else:
        return s

def lerfcc(x):
    """
    Returns the complementary error function erfc(x) with fractional
    error everywhere less than 1.2e-7. Adapted from Numerical Recipes.

    Usage: lerfcc(x)
    """
    z = abs(x)
    t = 1.0 / (1.0+0.5*z)
    ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
    if x >= 0:
        return ans
    else:
        return 2.0 - ans

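# Illustrative sketch (not part of the original module): the same rational
# approximation written with an explicit Horner loop and checked against the
# standard library's math.erfc. The name erfcc_sketch is hypothetical; the
# coefficients are copied from lerfcc above.

```python
import math

ERFCC_COEFFS = [-1.26551223, 1.00002368, 0.37409196, 0.09678418,
                -0.18628806, 0.27886807, -1.13520398, 1.48851587,
                -0.82215223, 0.17087277]


def erfcc_sketch(x):
    """Approximate erfc(x) with the polynomial-in-t form used by lerfcc."""
    z = abs(x)
    t = 1.0 / (1.0 + 0.5 * z)
    poly = 0.0
    for c in reversed(ERFCC_COEFFS):   # Horner evaluation in t
        poly = poly * t + c
    ans = t * math.exp(-z * z + poly)
    if x >= 0:
        return ans
    return 2.0 - ans                   # erfc(-x) = 2 - erfc(x)


worst = max(abs(erfcc_sketch(v) - math.erfc(v)) / math.erfc(v)
            for v in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0))
```

# Here worst holds the largest relative deviation from math.erfc over the
# sampled points.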
def lzprob(z):
    """
    Returns the area under the normal curve 'to the left of' the given z value.
    Thus,
        for z<0, zprob(z) = 1-tail probability
        for z>0, 1.0-zprob(z) = 1-tail probability
        for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
    Adapted from z.c in Gary Perlman's |Stat.

    Usage: lzprob(z)
    """
    Z_MAX = 6.0
    if z == 0.0:
        x = 0.0
    else:
        y = 0.5 * math.fabs(z)
        if y >= (Z_MAX*0.5):
            x = 1.0
        elif (y < 1.0):
            w = y*y
            x = ((((((((0.000124818987 * w
                        -0.001075204047) * w +0.005198775019) * w
                      -0.019198292004) * w +0.059054035642) * w
                    -0.151968751364) * w +0.319152932694) * w
                  -0.531923007300) * w +0.797884560593) * y * 2.0
        else:
            y = y - 2.0
            x = (((((((((((((-0.000045255659 * y
                             +0.000152529290) * y -0.000019538132) * y
                           -0.000676904986) * y +0.001390604284) * y
                         -0.000794620820) * y -0.002034254874) * y
                       +0.006549791214) * y -0.010557625006) * y
                     +0.011630447319) * y -0.009279453341) * y
                   +0.005353579108) * y -0.002141268741) * y
                 +0.000535310849) * y +0.999936657524
    if z > 0.0:
        prob = ((x+1.0)*0.5)
    else:
        prob = ((1.0-x)*0.5)
    return prob

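# Illustrative sketch (not part of the original module): the tail
# relationships listed in the lzprob docstring, using the standard library's
# math.erfc in place of the polynomial fit. The names normal_cdf_sketch and
# two_tailed_p_sketch are hypothetical.

```python
import math


def normal_cdf_sketch(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def two_tailed_p_sketch(z):
    """Two-tailed probability: 2 * (1 - cdf(|z|))."""
    return 2.0 * (1.0 - normal_cdf_sketch(abs(z)))


left_tail = normal_cdf_sketch(-1.0)         # for z < 0 the cdf is the 1-tail p
right_tail = 1.0 - normal_cdf_sketch(1.0)   # for z > 0 use 1 - cdf
p_two = two_tailed_p_sketch(1.96)           # close to the conventional 0.05
```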
def lksprob(alam):
    """
    Computes a Kolmogorov-Smirnov test significance level. Adapted from
    Numerical Recipes.

    Usage: lksprob(alam)
    """
    fac = 2.0
    sum = 0.0
    termbf = 0.0
    a2 = -2.0*alam*alam
    for j in range(1,201):
        term = fac*math.exp(a2*j*j)
        sum = sum + term
        if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum):
            return sum
        fac = -fac
        termbf = math.fabs(term)
    return 1.0

def lfprob (dfnum, dfden, F):
    """
    Returns the (1-tailed) significance level (p-value) of an F
    statistic given the degrees of freedom for the numerator (dfR-dfF) and
    the degrees of freedom for the denominator (dfF).

    Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn
    """
    p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
    return p

def lbetacf(a,b,x):
    """
    This function evaluates the continued fraction form of the incomplete
    Beta function, betai. (Adapted from: Numerical Recipes in C.)

    Usage: lbetacf(a,b,x)
    """
    ITMAX = 200
    EPS = 3.0e-7

    bm = az = am = 1.0
    qab = a+b
    qap = a+1.0
    qam = a-1.0
    bz = 1.0-qab*x/qap
    for i in range(ITMAX+1):
        em = float(i+1)
        tem = em + em
        d = em*(b-em)*x/((qam+tem)*(a+tem))
        ap = az + d*am
        bp = bz+d*bm
        d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
        app = ap+d*az
        bpp = bp+d*bz
        aold = az
        am = ap/bpp
        bm = bp/bpp
        az = app/bpp
        bz = 1.0
        if (abs(az-aold)<(EPS*abs(az))):
            return az
    print 'a or b too big, or ITMAX too small in Betacf.'

def lgammln(xx):
    """
    Returns the natural logarithm of the gamma function of xx, where
    Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
    (Adapted from: Numerical Recipes in C.)

    Usage: lgammln(xx)
    """

    coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
             0.120858003e-2, -0.536382e-5]
    x = xx - 1.0
    tmp = x + 5.5
    tmp = tmp - (x+0.5)*math.log(tmp)
    ser = 1.0
    for j in range(len(coeff)):
        x = x + 1
        ser = ser + coeff[j]/x
    return -tmp + math.log(2.50662827465*ser)

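# Illustrative sketch (not part of the original module): lgammln returns the
# LOG of the gamma function, which can be checked against the standard
# library's math.lgamma. The name gammln_sketch is hypothetical; the series
# coefficients are copied from lgammln above.

```python
import math


def gammln_sketch(xx):
    """ln(Gamma(xx)) via the six-term series used by lgammln."""
    coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
             0.120858003e-2, -0.536382e-5]
    x = xx - 1.0
    tmp = x + 5.5
    tmp = tmp - (x + 0.5) * math.log(tmp)
    ser = 1.0
    for c in coeff:
        x = x + 1
        ser = ser + c / x
    return -tmp + math.log(2.50662827465 * ser)


# Gamma(5) = 4! = 24, so the result should be close to ln(24).
diff_at_5 = abs(gammln_sketch(5.0) - math.log(24.0))
```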
def lbetai(a,b,x):
    """
    Returns the incomplete beta function:

        I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)

    where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
    function of a. The continued fraction formulation is implemented here,
    using the betacf function. (Adapted from: Numerical Recipes in C.)

    Usage: lbetai(a,b,x)
    """
    if (x<0.0 or x>1.0):
        raise ValueError('Bad x in lbetai')
    if (x==0.0 or x==1.0):
        bt = 0.0
    else:
        bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b*
                      math.log(1.0-x))
    if (x<(a+1.0)/(a+b+2.0)):
        return bt*betacf(a,b,x)/float(a)
    else:
        return 1.0-bt*betacf(b,a,1.0-x)/float(b)

def lF_oneway(*lists):
    """
    Performs a 1-way ANOVA, returning an F-value and probability given
    any number of groups. From Heiman, pp.394-7.

    Usage: F_oneway(*lists) where *lists is any number of lists, one per
           treatment group
    Returns: F value, one-tailed p-value
    """
    a = len(lists)
    means = [0]*a
    vars = [0]*a
    ns = [0]*a
    alldata = []
    tmp = map(N.array,lists)
    means = map(amean,tmp)
    vars = map(avar,tmp)
    ns = map(len,lists)
    for i in range(len(lists)):
        alldata = alldata + lists[i]
    alldata = N.array(alldata)
    bign = len(alldata)
    sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
    ssbn = 0
    for list in lists:
        ssbn = ssbn + asquare_of_sums(N.array(list))/float(len(list))
    ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
    sswn = sstot-ssbn
    dfbn = a-1
    dfwn = bign - a
    msb = ssbn/float(dfbn)
    msw = sswn/float(dfwn)
    f = msb/msw
    prob = fprob(dfbn,dfwn,f)
    return f, prob

def lF_value (ER,EF,dfnum,dfden):
    """
    Returns an F-statistic given the following:
        ER = error associated with the null hypothesis (the Restricted model)
        EF = error associated with the alternate hypothesis (the Full model)
        dfR-dfF = degrees of freedom of the numerator
        dfF = degrees of freedom associated with the denominator/Full model

    Usage: lF_value(ER,EF,dfnum,dfden)
    """
    return ((ER-EF)/float(dfnum) / (EF/float(dfden)))

def writecc (listoflists,file,writetype='w',extra=2):
    """
    Writes a list of lists to a file in columns, customized by the max
    size of items within the columns (max size of items in col, +2 characters)
    to specified file. File-overwrite is the default.

    Usage: writecc (listoflists,file,writetype='w',extra=2)
    Returns: None
    """
    if type(listoflists[0]) not in [ListType,TupleType]:
        listoflists = [listoflists]
    outfile = open(file,writetype)
    rowstokill = []
    list2print = copy.deepcopy(listoflists)
    for i in range(len(listoflists)):
        if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes':
            rowstokill = rowstokill + [i]
    rowstokill.reverse()
    for row in rowstokill:
        del list2print[row]
    maxsize = [0]*len(list2print[0])
    for col in range(len(list2print[0])):
        items = pstat.colex(list2print,col)
        items = map(pstat.makestr,items)
        maxsize[col] = max(map(len,items)) + extra
    for row in listoflists:
        if row == ['\n'] or row == '\n':
            outfile.write('\n')
        elif row == ['dashes'] or row == 'dashes':
            dashes = [0]*len(maxsize)
            for j in range(len(maxsize)):
                dashes[j] = '-'*(maxsize[j]-2)
            outfile.write(pstat.lineincustcols(dashes,maxsize))
        else:
            outfile.write(pstat.lineincustcols(row,maxsize))
        outfile.write('\n')
    outfile.close()
    return None

def lincr(l,cap):
    """
    Simulate a counting system from an n-dimensional list.

    Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n
    Returns: next set of values for list l, OR -1 (if overflow)
    """
    l[0] = l[0] + 1
    for i in range(len(l)):
        if l[i] > cap[i] and i < len(l)-1:
            l[i] = 0
            l[i+1] = l[i+1] + 1
        elif l[i] > cap[i] and i == len(l)-1:
            l = -1
    return l

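# Illustrative sketch (not part of the original module): the counter behaves
# like an odometer whose position 0 is the fastest-moving digit. The name
# incr_sketch is hypothetical; unlike lincr it works on a copy rather than
# modifying its argument in place.

```python
def incr_sketch(counter, cap):
    """Return the next counter value, or -1 once the last position overflows."""
    counter = list(counter)            # copy, so the caller's list is untouched
    counter[0] += 1
    for i in range(len(counter)):
        if counter[i] > cap[i]:
            if i == len(counter) - 1:
                return -1              # overflow of the final position
            counter[i] = 0             # reset this digit and carry
            counter[i + 1] += 1
    return counter


# Walk through every value of a 2-position counter with caps 1 and 2.
seen = []
state = [0, 0]
while state != -1:
    seen.append(state)
    state = incr_sketch(state, [1, 2])
```

# The loop visits all 2 * 3 = 6 combinations before the counter overflows.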
def lsum (inlist):
    """
    Returns the sum of the items in the passed list.

    Usage: lsum(inlist)
    """
    s = 0
    for item in inlist:
        s = s + item
    return s


def lcumsum (inlist):
    """
    Returns a list consisting of the cumulative sum of the items in the
    passed list.

    Usage: lcumsum(inlist)
    """
    newlist = copy.deepcopy(inlist)
    for i in range(1,len(newlist)):
        newlist[i] = newlist[i] + newlist[i-1]
    return newlist


def lss(inlist):
    """
    Squares each value in the passed list, adds up these squares and
    returns the result.

    Usage: lss(inlist)
    """
    ss = 0
    for item in inlist:
        ss = ss + item*item
    return ss


def lsummult (list1,list2):
    """
    Multiplies elements in list1 and list2, element by element, and
    returns the sum of all resulting multiplications. Must provide equal
    length lists.

    Usage: lsummult(list1,list2)
    """
    if len(list1) != len(list2):
        raise ValueError("Lists not equal length in summult.")
    s = 0
    for item1,item2 in pstat.abut(list1,list2):
        s = s + item1*item2
    return s


def lsumdiffsquared(x,y):
    """
    Takes pairwise differences of the values in lists x and y, squares
    these differences, and returns the sum of these squares.

    Usage: lsumdiffsquared(x,y)
    Returns: sum[(x[i]-y[i])**2]
    """
    sds = 0
    for i in range(len(x)):
        sds = sds + (x[i]-y[i])**2
    return sds


def lsquare_of_sums(inlist):
    """
    Adds the values in the passed list, squares the sum, and returns
    the result.

    Usage: lsquare_of_sums(inlist)
    Returns: sum(inlist[i])**2
    """
    s = sum(inlist)
    return float(s)*s

def lshellsort(inlist):
    """
    Shellsort algorithm. Sorts a 1D-list.

    Usage: lshellsort(inlist)
    Returns: sorted-inlist, sorting-index-vector (for original list)
    """
    n = len(inlist)
    svec = copy.deepcopy(inlist)
    ivec = range(n)
    gap = n//2   # integer division is required here
    while gap >0:
        for i in range(gap,n):
            for j in range(i-gap,-1,-gap):
                while j>=0 and svec[j]>svec[j+gap]:
                    temp = svec[j]
                    svec[j] = svec[j+gap]
                    svec[j+gap] = temp
                    itemp = ivec[j]
                    ivec[j] = ivec[j+gap]
                    ivec[j+gap] = itemp
        gap = gap // 2

    return svec, ivec

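# Illustrative sketch (not part of the original module): a compact Shellsort
# that returns the same pair as lshellsort, the sorted values plus an index
# vector such that sorted[k] == original[index[k]]. The name shellsort_sketch
# is hypothetical, and the inner loop is the textbook gapped insertion sort
# rather than a line-for-line copy of the code above.

```python
def shellsort_sketch(values):
    """Return (sorted copy of values, index vector into the original list)."""
    svec = list(values)
    ivec = list(range(len(svec)))
    gap = len(svec) // 2
    while gap > 0:
        for i in range(gap, len(svec)):
            j = i - gap
            # Bubble the element at j+gap down through its gap-spaced chain.
            while j >= 0 and svec[j] > svec[j + gap]:
                svec[j], svec[j + gap] = svec[j + gap], svec[j]
                ivec[j], ivec[j + gap] = ivec[j + gap], ivec[j]
                j -= gap
        gap //= 2
    return svec, ivec


data = [5, 2, 9, 1, 5, 6]
sorted_data, index = shellsort_sketch(data)
```

# The index vector is what lrankdata relies on to write each rank back to the
# position its score occupied in the unsorted input.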
def lrankdata(inlist):
    """
    Ranks the data in inlist, dealing with ties appropriately. Assumes
    a 1D inlist. Adapted from Gary Perlman's |Stat ranksort.

    Usage: lrankdata(inlist)
    Returns: a list of length equal to inlist, containing rank scores
    """
    n = len(inlist)
    svec, ivec = shellsort(inlist)
    sumranks = 0
    dupcount = 0
    newlist = [0]*n
    for i in range(n):
        sumranks = sumranks + i
        dupcount = dupcount + 1
        if i==n-1 or svec[i] != svec[i+1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i-dupcount+1,i+1):
                newlist[ivec[j]] = averank
            sumranks = 0
            dupcount = 0
    return newlist

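# Illustrative sketch (not part of the original module): "dealing with ties
# appropriately" means each group of tied scores receives the average of the
# ranks the group spans. The name average_ranks_sketch is hypothetical, and
# Python's built-in sort stands in for Shellsort here.

```python
def average_ranks_sketch(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        averank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = averank
        i = j + 1
    return ranks


ranks = average_ranks_sketch([10, 20, 20, 30])
```

# The two 20s would occupy ranks 2 and 3, so each is assigned 2.5.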
def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob):
    """
    Prints or writes to a file stats for two groups, using the name, n,
    mean, sterr, min and max for each group, as well as the statistic name,
    its value, and the associated p-value.

    Usage: outputpairedstats(fname,writemode,
                             name1,n1,mean1,stderr1,min1,max1,
                             name2,n2,mean2,stderr2,min2,max2,
                             statname,stat,prob)
    Returns: None
    """
    suffix = ''
    try:
        x = prob.shape
        prob = prob[0]
    except:
        pass
    if prob < 0.001: suffix = ' ***'
    elif prob < 0.01: suffix = ' **'
    elif prob < 0.05: suffix = ' *'
    title = [['Name','N','Mean','SD','Min','Max']]
    lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1],
                  [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]]
    if type(fname)!=StringType or len(fname)==0:
        print
        print statname
        print
        pstat.printcc(lofl)
        print
        try:
            if stat.shape == ():
                stat = stat[0]
            if prob.shape == ():
                prob = prob[0]
        except:
            pass
        print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix
        print
    else:
        file = open(fname,writemode)
        file.write('\n'+statname+'\n\n')
        file.close()
        writecc(lofl,fname,'a')
        file = open(fname,'a')
        try:
            if stat.shape == ():
                stat = stat[0]
            if prob.shape == ():
                prob = prob[0]
        except:
            pass
        file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n']))
        file.close()
    return None

def lfindwithin (data):
    """
    Returns an integer representing a binary vector, where 1=within-
    subject factor, 0=between. Input equals the entire data 2D list (i.e.,
    column 0=random factor, column -1=measured values; those two are skipped).
    Note: input data is in |Stat format ... a list of lists ("2D list") with
    one row per measured value, first column=subject identifier, last column=
    score, one in-between column per factor (these columns contain level
    designations on each factor). See also stats.anova.__doc__.

    Usage: lfindwithin(data) data in |Stat format
    """

    numfact = len(data[0])-1
    withinvec = 0
    for col in range(1,numfact):
        examplelevel = pstat.unique(pstat.colex(data,col))[0]
        rows = pstat.linexand(data,col,examplelevel)
        factsubjs = pstat.unique(pstat.colex(rows,0))
        allsubjs = pstat.unique(pstat.colex(data,0))
        if len(factsubjs) == len(allsubjs):
            withinvec = withinvec + (1 << col)
    return withinvec

geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), )
harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), )
mean = Dispatch ( (lmean, (ListType, TupleType)), )
median = Dispatch ( (lmedian, (ListType, TupleType)), )
medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), )
mode = Dispatch ( (lmode, (ListType, TupleType)), )


moment = Dispatch ( (lmoment, (ListType, TupleType)), )
variation = Dispatch ( (lvariation, (ListType, TupleType)), )
skew = Dispatch ( (lskew, (ListType, TupleType)), )
kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), )
describe = Dispatch ( (ldescribe, (ListType, TupleType)), )


itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), )
scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), )
percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), )
histogram = Dispatch ( (lhistogram, (ListType, TupleType)), )
cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), )
relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), )


obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), )
samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), )
samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), )
var = Dispatch ( (lvar, (ListType, TupleType)), )
stdev = Dispatch ( (lstdev, (ListType, TupleType)), )
sterr = Dispatch ( (lsterr, (ListType, TupleType)), )
sem = Dispatch ( (lsem, (ListType, TupleType)), )
z = Dispatch ( (lz, (ListType, TupleType)), )
zs = Dispatch ( (lzs, (ListType, TupleType)), )


trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), )
trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), )


paired = Dispatch ( (lpaired, (ListType, TupleType)), )
pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), )
spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), )
pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), )
kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), )
linregress = Dispatch ( (llinregress, (ListType, TupleType)), )


ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), )
ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), )
ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), )
chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), )
ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), )
mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), )
ranksums = Dispatch ( (lranksums, (ListType, TupleType)), )
tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), )
wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), )
kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), )
friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), )


chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), )
zprob = Dispatch ( (lzprob, (IntType, FloatType)), )
ksprob = Dispatch ( (lksprob, (IntType, FloatType)), )
fprob = Dispatch ( (lfprob, (IntType, FloatType)), )
betacf = Dispatch ( (lbetacf, (IntType, FloatType)), )
betai = Dispatch ( (lbetai, (IntType, FloatType)), )
erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), )
gammln = Dispatch ( (lgammln, (IntType, FloatType)), )


F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), )
F_value = Dispatch ( (lF_value, (ListType, TupleType)), )


incr = Dispatch ( (lincr, (ListType, TupleType)), )
sum = Dispatch ( (lsum, (ListType, TupleType)), )
cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), )
ss = Dispatch ( (lss, (ListType, TupleType)), )
summult = Dispatch ( (lsummult, (ListType, TupleType)), )
square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), )
sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), )
shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), )
rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), )
findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), )

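# Illustrative sketch (not the module's actual Dispatch class): each name in
# the table above is bound to a callable that picks an implementation from
# the type of its first argument. DispatchSketch, list_mean_sketch and
# mean_sketch are hypothetical names used only for this example.

```python
class DispatchSketch(object):
    """Route a call to the function registered for the first argument's type."""

    def __init__(self, *pairs):
        # Each pair is (function, tuple_of_types), as in the table above.
        self._table = {}
        for func, types in pairs:
            for t in types:
                self._table[t] = func

    def __call__(self, arg1, *args, **kw):
        try:
            func = self._table[type(arg1)]
        except KeyError:
            raise TypeError("don't know how to dispatch %s arguments" % type(arg1))
        return func(arg1, *args, **kw)


def list_mean_sketch(values):
    return sum(values) / float(len(values))


mean_sketch = DispatchSketch((list_mean_sketch, (list, tuple)))
result = mean_sketch([1, 2, 3])
```

# Registering a second (function, types) pair for array types would let the
# same public name serve both list and array callers, which is the purpose
# the module docstring gives for the unprefixed names.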
try:
    import numpy as N
    import numpy.linalg as LA


    def ageometricmean (inarray,dimension=None,keepdims=0):
        """
        Calculates the geometric mean of the values in the passed array.
        That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in
        the passed array. Use dimension=None to flatten array first. REMEMBER: if
        dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
        if dimension is a sequence, it collapses over all specified dimensions. If
        keepdims is set to 1, the resulting array will have as many dimensions as
        inarray, with only 1 'level' per dim that was collapsed over.

        Usage: ageometricmean(inarray,dimension=None,keepdims=0)
        Returns: geometric mean computed over dim(s) listed in dimension
        """
        inarray = N.array(inarray,N.float_)
        if dimension is None:
            inarray = N.ravel(inarray)
            size = len(inarray)
            mult = N.power(inarray,1.0/size)
            mult = N.multiply.reduce(mult)
        elif type(dimension) in [IntType,FloatType]:
            size = inarray.shape[dimension]
            mult = N.power(inarray,1.0/size)
            mult = N.multiply.reduce(mult,dimension)
            if keepdims == 1:
                shp = list(inarray.shape)
                shp[dimension] = 1
                # Reshape the result itself; the name 'sum' is not defined here.
                mult = N.reshape(mult,shp)
        else:
            dims = list(dimension)
            dims.sort()
            dims.reverse()
            size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.float_)
            mult = N.power(inarray,1.0/size)
            for dim in dims:
                mult = N.multiply.reduce(mult,dim)
            if keepdims == 1:
                shp = list(inarray.shape)
                for dim in dims:
                    shp[dim] = 1
                mult = N.reshape(mult,shp)
        return mult

    def aharmonicmean (inarray,dimension=None,keepdims=0):
        """
        Calculates the harmonic mean of the values in the passed array.
        That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in
        the passed array. Use dimension=None to flatten array first. REMEMBER: if
        dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
        if dimension is a sequence, it collapses over all specified dimensions. If
        keepdims is set to 1, the resulting array will have as many dimensions as
        inarray, with only 1 'level' per dim that was collapsed over.

        Usage: aharmonicmean(inarray,dimension=None,keepdims=0)
        Returns: harmonic mean computed over dim(s) in dimension
        """
        inarray = inarray.astype(N.float_)
        if dimension is None:
            inarray = N.ravel(inarray)
            size = len(inarray)
            s = N.add.reduce(1.0 / inarray)
        elif type(dimension) in [IntType,FloatType]:
            size = float(inarray.shape[dimension])
            s = N.add.reduce(1.0/inarray, dimension)
            if keepdims == 1:
                shp = list(inarray.shape)
                shp[dimension] = 1
                s = N.reshape(s,shp)
        else:
            dims = list(dimension)
            dims.sort()
            nondims = []
            for i in range(len(inarray.shape)):
                if i not in dims:
                    nondims.append(i)
            tinarray = N.transpose(inarray,nondims+dims)
            idx = [0] *len(nondims)
            if idx == []:
                size = len(N.ravel(inarray))
                s = asum(1.0 / inarray)
                if keepdims == 1:
                    s = N.reshape([s],N.ones(len(inarray.shape)))
            else:
                idx[0] = -1
                loopcap = N.array(tinarray.shape[0:len(nondims)]) -1
                s = N.zeros(loopcap+1,N.float_)
                while incr(idx,loopcap) != -1:
                    s[idx] = asum(1.0/tinarray[idx])
                size = N.multiply.reduce(N.take(inarray.shape,dims))
                if keepdims == 1:
                    shp = list(inarray.shape)
                    for dim in dims:
                        shp[dim] = 1
                    s = N.reshape(s,shp)
        return size / s

    def amean (inarray,dimension=None,keepdims=0):
        """
        Calculates the arithmetic mean of the values in the passed array.
        That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the
        passed array. Use dimension=None to flatten array first. REMEMBER: if
        dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
        if dimension is a sequence, it collapses over all specified dimensions. If
        keepdims is set to 1, the resulting array will have as many dimensions as
        inarray, with only 1 'level' per dim that was collapsed over.

        Usage: amean(inarray,dimension=None,keepdims=0)
        Returns: arithmetic mean calculated over dim(s) in dimension
        """
        if inarray.dtype in [N.int_, N.short,N.ubyte]:
            inarray = inarray.astype(N.float_)
        if dimension is None:
            inarray = N.ravel(inarray)
            sum = N.add.reduce(inarray)
            denom = float(len(inarray))
        elif type(dimension) in [IntType,FloatType]:
            sum = asum(inarray,dimension)
            denom = float(inarray.shape[dimension])
            if keepdims == 1:
                shp = list(inarray.shape)
                shp[dimension] = 1
                sum = N.reshape(sum,shp)
        else:
            dims = list(dimension)
            dims.sort()
            dims.reverse()
            sum = inarray *1.0
            for dim in dims:
                sum = N.add.reduce(sum,dim)
            denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.float_)
            if keepdims == 1:
                shp = list(inarray.shape)
                for dim in dims:
                    shp[dim] = 1
                sum = N.reshape(sum,shp)
        return sum/denom

    def amode(a, dimension=None):
        """
        Returns an array of the modal (most common) score in the passed array.
        If there is more than one such score, ONLY THE FIRST is returned.
        The bin-count for the modal values is also returned. Operates on whole
        array (dimension=None), or on a given dimension.

        Usage: amode(a, dimension=None)
        Returns: array of bin-counts for mode(s), array of corresponding modal values
        """

        if dimension is None:
            a = N.ravel(a)
            dimension = 0
        scores = pstat.aunique(N.ravel(a))
        testshape = list(a.shape)
        testshape[dimension] = 1
        oldmostfreq = N.zeros(testshape)
        oldcounts = N.zeros(testshape)
        for score in scores:
            template = N.equal(a,score)
            counts = asum(template,dimension,1)
            mostfrequent = N.where(counts>oldcounts,score,oldmostfreq)
            oldcounts = N.where(counts>oldcounts,counts,oldcounts)
            oldmostfreq = mostfrequent
        return oldcounts, mostfrequent

    def atmean(a,limits=None,inclusive=(1,1)):
        """
        Returns the arithmetic mean of all values in an array, ignoring values
        strictly outside the sequence passed to 'limits'. Note: either limit
        in the sequence, or the value of limits itself, can be set to None. The
        inclusive list/tuple determines whether the lower and upper limiting bounds
        (respectively) are open/exclusive (0) or closed/inclusive (1).

        Usage: atmean(a,limits=None,inclusive=(1,1))
        """
        if a.dtype in [N.int_, N.short,N.ubyte]:
            a = a.astype(N.float_)
        if limits is None:
            return mean(a)
        assert type(limits) in [ListType,TupleType,N.ndarray], "Wrong type for limits in atmean"
        if inclusive[0]: lowerfcn = N.greater_equal
        else: lowerfcn = N.greater
        if inclusive[1]: upperfcn = N.less_equal
        else: upperfcn = N.less
        if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
            raise ValueError("No array values within given limits (atmean).")
        elif limits[0] is None and limits[1] is not None:
            mask = upperfcn(a,limits[1])
        elif limits[0] is not None and limits[1] is None:
            mask = lowerfcn(a,limits[0])
        elif limits[0] is not None and limits[1] is not None:
            mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
        s = float(N.add.reduce(N.ravel(a*mask)))
        n = float(N.add.reduce(N.ravel(mask)))
        return s/n

    def atvar(a,limits=None,inclusive=(1,1)):
        """
        Returns the sample variance of values in an array, (i.e., using N-1),
        ignoring values strictly outside the sequence passed to 'limits'.
        Note: either limit in the sequence, or the value of limits itself,
        can be set to None. The inclusive list/tuple determines whether the lower
        and upper limiting bounds (respectively) are open/exclusive (0) or
        closed/inclusive (1). ASSUMES A FLAT ARRAY (OR ELSE PREFLATTENS).

        Usage: atvar(a,limits=None,inclusive=(1,1))
        """
        a = a.astype(N.float_)
        if limits is None or limits == [None,None]:
            return avar(a)
        assert type(limits) in [ListType,TupleType,N.ndarray], "Wrong type for limits in atvar"
        if inclusive[0]: lowerfcn = N.greater_equal
        else: lowerfcn = N.greater
        if inclusive[1]: upperfcn = N.less_equal
        else: upperfcn = N.less
        if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
            raise ValueError("No array values within given limits (atvar).")
        elif limits[0] is None and limits[1] is not None:
            mask = upperfcn(a,limits[1])
        elif limits[0] is not None and limits[1] is None:
            mask = lowerfcn(a,limits[0])
        elif limits[0] is not None and limits[1] is not None:
            mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])

        a = N.compress(mask,a)
        return avar(a)

    def atmin(a,lowerlimit=None,dimension=None,inclusive=1):
        """
        Returns the minimum value of a, along dimension, including only values
        greater than (or equal to, if inclusive=1) lowerlimit. If the limit is
        set to None, all values in the array are used.

        Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1)
        """
        # inclusive=1 must keep values equal to the limit, hence greater_equal.
        if inclusive: lowerfcn = N.greater_equal
        else: lowerfcn = N.greater
        if dimension is None:
            a = N.ravel(a)
            dimension = 0
        if lowerlimit is None:
            lowerlimit = N.minimum.reduce(N.ravel(a))-11
        biggest = N.maximum.reduce(N.ravel(a))
        ta = N.where(lowerfcn(a,lowerlimit),a,biggest)
        return N.minimum.reduce(ta,dimension)

    def atmax(a,upperlimit,dimension=None,inclusive=1):
        """
        Returns the maximum value of a, along dimension, including only values
        less than (or equal to, if inclusive=1) upperlimit. If the limit is set
        to None, a limit larger than the max value in the array is used.

        Usage: atmax(a,upperlimit,dimension=None,inclusive=1)
        """
        # inclusive=1 must keep values equal to the limit, hence less_equal.
        if inclusive: upperfcn = N.less_equal
        else: upperfcn = N.less
        if dimension is None:
            a = N.ravel(a)
            dimension = 0
        if upperlimit is None:
            upperlimit = N.maximum.reduce(N.ravel(a))+1
        smallest = N.minimum.reduce(N.ravel(a))
        ta = N.where(upperfcn(a,upperlimit),a,smallest)
        return N.maximum.reduce(ta,dimension)

    def atstdev(a,limits=None,inclusive=(1,1)):
        """
        Returns the standard deviation of all values in an array, ignoring values
        strictly outside the sequence passed to 'limits'. Note: either limit
        in the sequence, or the value of limits itself, can be set to None. The
        inclusive list/tuple determines whether the lower and upper limiting bounds
        (respectively) are open/exclusive (0) or closed/inclusive (1).

        Usage: atstdev(a,limits=None,inclusive=(1,1))
        """
        return N.sqrt(tvar(a,limits,inclusive))

    def atsem(a,limits=None,inclusive=(1,1)):
        """
        Returns the standard error of the mean for the values in an array,
        (i.e., using N for the denominator), ignoring values strictly outside
        the sequence passed to 'limits'. Note: either limit in the sequence,
        or the value of limits itself, can be set to None. The inclusive list/tuple
        determines whether the lower and upper limiting bounds (respectively) are
        open/exclusive (0) or closed/inclusive (1).

        Usage: atsem(a,limits=None,inclusive=(1,1))
        """
        sd = tstdev(a,limits,inclusive)
        if limits is None or limits == [None,None]:
            n = float(len(N.ravel(a)))
            limits = [min(a)-1, max(a)+1]
        assert type(limits) in [ListType,TupleType,N.ndarray], "Wrong type for limits in atsem"
        if inclusive[0]: lowerfcn = N.greater_equal
        else: lowerfcn = N.greater
        if inclusive[1]: upperfcn = N.less_equal
        else: upperfcn = N.less
        if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
            raise ValueError("No array values within given limits (atsem).")
        elif limits[0] is None and limits[1] is not None:
            mask = upperfcn(a,limits[1])
        elif limits[0] is not None and limits[1] is None:
            mask = lowerfcn(a,limits[0])
        elif limits[0] is not None and limits[1] is not None:
            mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
        term1 = N.add.reduce(N.ravel(a*a*mask))
        n = float(N.add.reduce(N.ravel(mask)))
        return sd/math.sqrt(n)

2395 - def amoment(a,moment=1,dimension=None):
2396 """
2397 Calculates the nth moment about the mean for a sample (defaults to the
2398 1st moment). Generally used to calculate coefficients of skewness and
2399 kurtosis. Dimension can equal None (ravel array first), an integer
2400 (the dimension over which to operate), or a sequence (operate over
2401 multiple dimensions).
2402
2403 Usage: amoment(a,moment=1,dimension=None)
2404 Returns: appropriate moment along given dimension
2405 """
2406 if dimension == None:
2407 a = N.ravel(a)
2408 dimension = 0
2409 if moment == 1:
2410 return 0.0
2411 else:
2412 mn = amean(a,dimension,1)
2413 s = N.power((a-mn),moment)
2414 return amean(s,dimension)
2415
2416
2417 def avariation(a,dimension=None):
2418 """
2419 Returns the coefficient of variation, as defined in CRC Standard
2420 Probability and Statistics, p.6. Dimension can equal None (ravel array
2421 first), an integer (the dimension over which to operate), or a
2422 sequence (operate over multiple dimensions).
2423
2424 Usage: avariation(a,dimension=None)
2425 """
2426 return 100.0*asamplestdev(a,dimension)/amean(a,dimension)
2427
2428
2429 - def askew(a,dimension=None):
2430 """
2431     Returns the skewness of a distribution (normal ==> 0.0; >0 means extra
2432     weight in right tail).  Use askewtest() to see if it's close enough.
2433 Dimension can equal None (ravel array first), an integer (the
2434 dimension over which to operate), or a sequence (operate over multiple
2435 dimensions).
2436
2437 Usage: askew(a, dimension=None)
2438 Returns: skew of vals in a along dimension, returning ZERO where all vals equal
2439 """
2440 denom = N.power(amoment(a,2,dimension),1.5)
2441 zero = N.equal(denom,0)
2442 if type(denom) == N.ndarray and asum(zero) <> 0:
2443 print "Number of zeros in askew: ",asum(zero)
2444 denom = denom + zero
2445 return N.where(zero, 0, amoment(a,3,dimension)/denom)
2446
2447
2448 def akurtosis(a,dimension=None):
2449 """
2450 Returns the kurtosis of a distribution (normal ==> 3.0; >3 means
2451 heavier in the tails, and usually more peaked). Use akurtosistest()
2452 to see if it's close enough. Dimension can equal None (ravel array
2453 first), an integer (the dimension over which to operate), or a
2454 sequence (operate over multiple dimensions).
2455
2456 Usage: akurtosis(a,dimension=None)
2457 Returns: kurtosis of values in a along dimension, and ZERO where all vals equal
2458 """
2459 denom = N.power(amoment(a,2,dimension),2)
2460 zero = N.equal(denom,0)
2461 if type(denom) == N.ndarray and asum(zero) <> 0:
2462 print "Number of zeros in akurtosis: ",asum(zero)
2463 denom = denom + zero
2464 return N.where(zero,0,amoment(a,4,dimension)/denom)
2465
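# Illustrative sketch (not part of this module): amoment(), askew() and
# akurtosis() above reduce to ratios of central moments.  The version below
# assumes Python 3, flat lists, and the standard library only; the function
# names are hypothetical.  Like the array functions, it returns zero when all
# values are equal.

```python
def central_moment(values, k):
    """k-th moment about the mean, dividing by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skew(values):
    """Third central moment over the 1.5 power of the second."""
    m2 = central_moment(values, 2)
    return 0.0 if m2 == 0 else central_moment(values, 3) / m2 ** 1.5


def kurtosis(values):
    """Fourth central moment over the square of the second (normal is near 3)."""
    m2 = central_moment(values, 2)
    return 0.0 if m2 == 0 else central_moment(values, 4) / m2 ** 2


print(skew([1, 2, 3, 4, 5]), kurtosis([1, 2, 3, 4, 5]))
print(skew([1, 1, 1, 10]))  # long right tail gives a positive value
```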
2466
2467 def adescribe(inarray,dimension=None):
2468 """
2469 Returns several descriptive statistics of the passed array. Dimension
2470 can equal None (ravel array first), an integer (the dimension over
2471 which to operate), or a sequence (operate over multiple dimensions).
2472
2473 Usage: adescribe(inarray,dimension=None)
2474 Returns: n, (min,max), mean, standard deviation, skew, kurtosis
2475 """
2476 if dimension == None:
2477 inarray = N.ravel(inarray)
2478 dimension = 0
2479 n = inarray.shape[dimension]
2480 mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray))
2481 m = amean(inarray,dimension)
2482 sd = astdev(inarray,dimension)
2483 skew = askew(inarray,dimension)
2484 kurt = akurtosis(inarray,dimension)
2485 return n, mm, m, sd, skew, kurt
2486
2487
2488
2489
2490
2491
2492 def askewtest(a,dimension=None):
2493 """
2494 Tests whether the skew is significantly different from a normal
2495 distribution. Dimension can equal None (ravel array first), an
2496 integer (the dimension over which to operate), or a sequence (operate
2497 over multiple dimensions).
2498
2499 Usage: askewtest(a,dimension=None)
2500 Returns: z-score and 2-tail z-probability
2501 """
2502 if dimension == None:
2503 a = N.ravel(a)
2504 dimension = 0
2505 b2 = askew(a,dimension)
2506 n = float(a.shape[dimension])
2507 y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) )
2508 beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) )
2509 W2 = -1 + N.sqrt(2*(beta2-1))
2510 delta = 1/N.sqrt(N.log(N.sqrt(W2)))
2511 alpha = N.sqrt(2/(W2-1))
2512 y = N.where(y==0,1,y)
2513 Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1))
2514 return Z, (1.0-zprob(Z))*2
2515
2516
2517 def akurtosistest(a,dimension=None):
2518 """
2519     Tests whether a dataset has normal kurtosis (i.e.,
2520     kurtosis=3(n-1)/(n+1)).  Valid only for n>=20.  Dimension can equal None
2521 (ravel array first), an integer (the dimension over which to operate),
2522 or a sequence (operate over multiple dimensions).
2523
2524 Usage: akurtosistest(a,dimension=None)
2525 Returns: z-score and 2-tail z-probability, returns 0 for bad pixels
2526 """
2527 if dimension == None:
2528 a = N.ravel(a)
2529 dimension = 0
2530 n = float(a.shape[dimension])
2531 if n<20:
2532 print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n
2533 b2 = akurtosis(a,dimension)
2534 E = 3.0*(n-1) /(n+1)
2535 varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5))
2536 x = (b2-E)/N.sqrt(varb2)
2537 sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/
2538 (n*(n-2)*(n-3)))
2539 A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2)))
2540 term1 = 1 -2/(9.0*A)
2541 denom = 1 +x*N.sqrt(2/(A-4.0))
2542 denom = N.where(N.less(denom,0), 99, denom)
2543 term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0))
2544 Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A))
2545 Z = N.where(N.equal(denom,99), 0, Z)
2546 return Z, (1.0-zprob(Z))*2
2547
2548
2549 def anormaltest(a,dimension=None):
2550 """
2551     Tests whether skew and/OR kurtosis of dataset differs from normal
2552     curve.  Dimension can equal None (ravel array first), an integer (the
2553     dimension over which to operate), or a sequence (operate over
2554     multiple dimensions).
2555
2556     Usage:   anormaltest(a,dimension=None)
2557     Returns: K2 statistic (sum of squared skew and kurtosis z-scores), chi-square prob (2 df)
2558 """
2559 if dimension == None:
2560 a = N.ravel(a)
2561 dimension = 0
2562 s,p = askewtest(a,dimension)
2563 k,p = akurtosistest(a,dimension)
2564 k2 = N.power(s,2) + N.power(k,2)
2565 return k2, achisqprob(k2,2)
2566
2567
2568
2569
2570
2571
2572 def aitemfreq(a):
2573 """
2574 Returns a 2D array of item frequencies. Column 1 contains item values,
2575 column 2 contains their respective counts. Assumes a 1D array is passed.
2576 @@@sorting OK?
2577
2578 Usage: aitemfreq(a)
2579 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
2580 """
2581 scores = pstat.aunique(a)
2582 scores = N.sort(scores)
2583 freq = N.zeros(len(scores))
2584 for i in range(len(scores)):
2585 freq[i] = N.add.reduce(N.equal(a,scores[i]))
2586 return N.array(pstat.aabut(scores, freq))
2587
2588
2589 def ascoreatpercentile(inarray, percent):
2590 """
2591 Usage: ascoreatpercentile(inarray,percent) 0<percent<100
2592 Returns: score at given percentile, relative to inarray distribution
2593 """
2594 percent = percent / 100.0
2595 targetcf = percent*len(inarray)
2596 h, lrl, binsize, extras = histogram(inarray)
2597 cumhist = cumsum(h*1)
2598 for i in range(len(cumhist)):
2599 if cumhist[i] >= targetcf:
2600 break
2601 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
2602 return score
2603
2604
2605 def apercentileofscore(inarray,score,histbins=10,defaultlimits=None):
2606 """
2607 Note: result of this function depends on the values used to histogram
2608 the data(!).
2609
2610 Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None)
2611 Returns: percentile-position of score (0-100) relative to inarray
2612 """
2613 h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits)
2614 cumhist = cumsum(h*1)
2615 i = int((score - lrl)/float(binsize))
2616 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100
2617 return pct
2618
2619
2620 - def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1):
2621 """
2622 Returns (i) an array of histogram bin counts, (ii) the smallest value
2623 of the histogram binning, and (iii) the bin width (the last 2 are not
2624 necessarily integers). Default number of bins is 10. Defaultlimits
2625 can be None (the routine picks bins spanning all the numbers in the
2626 inarray) or a 2-sequence (lowerlimit, upperlimit). Returns all of the
2627 following: array of bin values, lowerreallimit, binsize, extrapoints.
2628
2629 Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1)
2630     Returns: (array of bin counts, bin-minimum, bin-width, #-points-outside-range)
2631 """
2632 inarray = N.ravel(inarray)
2633 if (defaultlimits <> None):
2634 lowerreallimit = defaultlimits[0]
2635 upperreallimit = defaultlimits[1]
2636 binsize = (upperreallimit-lowerreallimit) / float(numbins)
2637 else:
2638 Min = N.minimum.reduce(inarray)
2639 Max = N.maximum.reduce(inarray)
2640 estbinwidth = float(Max - Min)/float(numbins) + 1e-6
2641 binsize = (Max-Min+estbinwidth)/float(numbins)
2642 lowerreallimit = Min - binsize/2.0
2643 bins = N.zeros(numbins)
2644 extrapoints = 0
2645 for num in inarray:
2646 try:
2647 if (num-lowerreallimit) < 0:
2648 extrapoints = extrapoints + 1
2649 else:
2650 bintoincrement = int((num-lowerreallimit) / float(binsize))
2651 bins[bintoincrement] = bins[bintoincrement] + 1
2652 except:
2653 extrapoints = extrapoints + 1
2654 if (extrapoints > 0 and printextras == 1):
2655 print '\nPoints outside given histogram range =',extrapoints
2656 return (bins, lowerreallimit, binsize, extrapoints)
2657
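# Illustrative sketch (not part of this module): the binning rule used by
# ahistogram() above, restated for flat Python lists.  It assumes Python 3
# and the standard library only; simple_histogram is a hypothetical name.
# With no limits given, the range is padded so the lowest bin is centred on
# the minimum, as in the code above.

```python
def simple_histogram(values, numbins=10, limits=None):
    """Return (bin counts, lower real limit, bin width, points outside range)."""
    if limits is not None:
        lower, upper = limits
        binsize = (upper - lower) / float(numbins)
    else:
        lo, hi = min(values), max(values)
        # Pad the range slightly so the largest value lands in the last bin.
        estbinwidth = (hi - lo) / float(numbins) + 1e-6
        binsize = (hi - lo + estbinwidth) / float(numbins)
        lower = lo - binsize / 2.0
    bins = [0] * numbins
    extrapoints = 0
    for v in values:
        # Values below the lower limit never map to a valid bin.
        index = int((v - lower) / binsize) if v >= lower else -1
        if 0 <= index < numbins:
            bins[index] += 1
        else:
            extrapoints += 1
    return bins, lower, binsize, extrapoints


print(simple_histogram([0, 1, 2, 3, 4, -1], numbins=4, limits=(0, 4)))
```

# Note that a value equal to the upper limit counts as outside the range,
# because each bin is closed on the left and open on the right.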
2658
2659 - def acumfreq(a,numbins=10,defaultreallimits=None):
2660 """
2661 Returns a cumulative frequency histogram, using the histogram function.
2662 Defaultreallimits can be None (use all data), or a 2-sequence containing
2663 lower and upper limits on values to include.
2664
2665 Usage: acumfreq(a,numbins=10,defaultreallimits=None)
2666 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
2667 """
2668 h,l,b,e = histogram(a,numbins,defaultreallimits)
2669 cumhist = cumsum(h*1)
2670 return cumhist,l,b,e
2671
2672
2673 - def arelfreq(a,numbins=10,defaultreallimits=None):
2674 """
2675 Returns a relative frequency histogram, using the histogram function.
2676 Defaultreallimits can be None (use all data), or a 2-sequence containing
2677 lower and upper limits on values to include.
2678
2679 Usage: arelfreq(a,numbins=10,defaultreallimits=None)
2680     Returns: array of relfreq bin values, lowerreallimit, binsize, extrapoints
2681 """
2682 h,l,b,e = histogram(a,numbins,defaultreallimits)
2683 h = N.array(h/float(a.shape[0]))
2684 return h,l,b,e
2685
2686
2687
2688
2689
2690
2727
2728
2729 - def asamplevar (inarray,dimension=None,keepdims=0):
2730 """
2731     Returns the sample variance of the values in the passed
2732 array (i.e., using N). Dimension can equal None (ravel array first),
2733 an integer (the dimension over which to operate), or a sequence
2734 (operate over multiple dimensions). Set keepdims=1 to return an array
2735 with the same number of dimensions as inarray.
2736
2737 Usage: asamplevar(inarray,dimension=None,keepdims=0)
2738 """
2739 if dimension == None:
2740 inarray = N.ravel(inarray)
2741 dimension = 0
2742 if dimension == 1:
2743         mn = amean(inarray,dimension)[:,N.newaxis]
2744 else:
2745 mn = amean(inarray,dimension,keepdims=1)
2746 deviations = inarray - mn
2747 if type(dimension) == ListType:
2748 n = 1
2749 for d in dimension:
2750 n = n*inarray.shape[d]
2751 else:
2752 n = inarray.shape[dimension]
2753 svar = ass(deviations,dimension,keepdims) / float(n)
2754 return svar
2755
2756
2757 def asamplestdev(inarray, dimension=None, keepdims=0):
2758 """
2759 Returns the sample standard deviation of the values in the passed
2760 array (i.e., using N). Dimension can equal None (ravel array first),
2761 an integer (the dimension over which to operate), or a sequence
2762 (operate over multiple dimensions). Set keepdims=1 to return an array
2763 with the same number of dimensions as inarray.
2764
2765 Usage: asamplestdev(inarray,dimension=None,keepdims=0)
2766 """
2767 return N.sqrt(asamplevar(inarray,dimension,keepdims))
2768
2769
2770 def asignaltonoise(instack,dimension=0):
2771 """
2772 Calculates signal-to-noise. Dimension can equal None (ravel array
2773 first), an integer (the dimension over which to operate), or a
2774 sequence (operate over multiple dimensions).
2775
2776 Usage: asignaltonoise(instack,dimension=0):
2777 Returns: array containing the value of (mean/stdev) along dimension,
2778 or 0 when stdev=0
2779 """
2780 m = mean(instack,dimension)
2781 sd = stdev(instack,dimension)
2782 return N.where(sd==0,0,m/sd)
2783
2784
2785 - def acov (x,y, dimension=None,keepdims=0):
2786 """
2787 Returns the estimated covariance of the values in the passed
2788 array (i.e., N-1). Dimension can equal None (ravel array first), an
2789 integer (the dimension over which to operate), or a sequence (operate
2790 over multiple dimensions). Set keepdims=1 to return an array with the
2791 same number of dimensions as inarray.
2792
2793 Usage: acov(x,y,dimension=None,keepdims=0)
2794 """
2795 if dimension == None:
2796 x = N.ravel(x)
2797 y = N.ravel(y)
2798 dimension = 0
2799 xmn = amean(x,dimension,1)
2800 xdeviations = x - xmn
2801 ymn = amean(y,dimension,1)
2802 ydeviations = y - ymn
2803 if type(dimension) == ListType:
2804 n = 1
2805 for d in dimension:
2806 n = n*x.shape[d]
2807 else:
2808 n = x.shape[dimension]
2809 covar = N.sum(xdeviations*ydeviations)/float(n-1)
2810 return covar
2811
2812
2813 - def avar (inarray, dimension=None,keepdims=0):
2814 """
2815 Returns the estimated population variance of the values in the passed
2816 array (i.e., N-1). Dimension can equal None (ravel array first), an
2817 integer (the dimension over which to operate), or a sequence (operate
2818 over multiple dimensions). Set keepdims=1 to return an array with the
2819 same number of dimensions as inarray.
2820
2821 Usage: avar(inarray,dimension=None,keepdims=0)
2822 """
2823 if dimension == None:
2824 inarray = N.ravel(inarray)
2825 dimension = 0
2826 mn = amean(inarray,dimension,1)
2827 deviations = inarray - mn
2828 if type(dimension) == ListType:
2829 n = 1
2830 for d in dimension:
2831 n = n*inarray.shape[d]
2832 else:
2833 n = inarray.shape[dimension]
2834 var = ass(deviations,dimension,keepdims)/float(n-1)
2835 return var
2836
2837
2838 - def astdev (inarray, dimension=None, keepdims=0):
2839 """
2840 Returns the estimated population standard deviation of the values in
2841 the passed array (i.e., N-1). Dimension can equal None (ravel array
2842 first), an integer (the dimension over which to operate), or a
2843 sequence (operate over multiple dimensions). Set keepdims=1 to return
2844 an array with the same number of dimensions as inarray.
2845
2846 Usage: astdev(inarray,dimension=None,keepdims=0)
2847 """
2848 return N.sqrt(avar(inarray,dimension,keepdims))
2849
2850
2851 - def asterr (inarray, dimension=None, keepdims=0):
2852 """
2853 Returns the estimated population standard error of the values in the
2854 passed array (i.e., N-1). Dimension can equal None (ravel array
2855 first), an integer (the dimension over which to operate), or a
2856 sequence (operate over multiple dimensions). Set keepdims=1 to return
2857 an array with the same number of dimensions as inarray.
2858
2859 Usage: asterr(inarray,dimension=None,keepdims=0)
2860 """
2861 if dimension == None:
2862 inarray = N.ravel(inarray)
2863 dimension = 0
2864 return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension]))
2865
2866
2867 - def asem (inarray, dimension=None, keepdims=0):
2868 """
2869 Returns the standard error of the mean (i.e., using N) of the values
2870 in the passed array. Dimension can equal None (ravel array first), an
2871 integer (the dimension over which to operate), or a sequence (operate
2872 over multiple dimensions). Set keepdims=1 to return an array with the
2873 same number of dimensions as inarray.
2874
2875 Usage: asem(inarray,dimension=None, keepdims=0)
2876 """
2877 if dimension == None:
2878 inarray = N.ravel(inarray)
2879 dimension = 0
2880 if type(dimension) == ListType:
2881 n = 1
2882 for d in dimension:
2883 n = n*inarray.shape[d]
2884 else:
2885 n = inarray.shape[dimension]
2886 s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1)
2887 return s
2888
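# Illustrative sketch (not part of this module): the "sample" functions above
# divide by N, while avar()/astdev() divide by N-1.  The check below assumes
# Python 3, flat lists, and the standard library only, with hypothetical
# function names.  It also shows why asem() may divide the N-based standard
# deviation by sqrt(N-1): that equals the (N-1)-based one over sqrt(N).

```python
import math


def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sample_var(values):
    """Variance using N in the denominator (as in asamplevar)."""
    return sum_sq_dev(values) / len(values)


def est_var(values):
    """Variance using N-1 in the denominator (as in avar)."""
    return sum_sq_dev(values) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
sem_from_sample = math.sqrt(sample_var(data)) / math.sqrt(n - 1)
sem_from_est = math.sqrt(est_var(data)) / math.sqrt(n)
print(sample_var(data), est_var(data), sem_from_sample, sem_from_est)
```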
2889
2890 - def az (a, score):
2891 """
2892     Returns the z-score of a given input score, given the array from which
2893 that score came. Not appropriate for population calculations, nor for
2894 arrays > 1D.
2895
2896 Usage: az(a, score)
2897 """
2898 z = (score-amean(a)) / asamplestdev(a)
2899 return z
2900
2901
2902 def azs(a):
2903 """
2904 Returns a 1D array of z-scores, one for each score in the passed array,
2905 computed relative to the passed array.
2906
2907 Usage: azs(a)
2908 """
2909 zscores = []
2910 for item in a:
2911 zscores.append(z(a,item))
2912 return N.array(zscores)
2913
2914
2915 - def azmap (scores, compare, dimension=0):
2916 """
2917 Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to
2918 array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0
2919 of the compare array.
2920
2921     Usage:   azmap(scores, compare, dimension=0)
2922     """
2923     mns = amean(compare,dimension)
2924     sstd = asamplestdev(compare,dimension)
2925 return (scores - mns) / sstd
2926
2927
2928
2929
2930
2931
2932
2933
2934 - def athreshold(a,threshmin=None,threshmax=None,newval=0):
2935 """
2936     Like Numeric.clip() except that values <threshmin or >threshmax are replaced
2937 by newval instead of by threshmin/threshmax (respectively).
2938
2939 Usage: athreshold(a,threshmin=None,threshmax=None,newval=0)
2940 Returns: a, with values <threshmin or >threshmax replaced with newval
2941 """
2942 mask = N.zeros(a.shape)
2943 if threshmin <> None:
2944 mask = mask + N.where(a<threshmin,1,0)
2945 if threshmax <> None:
2946 mask = mask + N.where(a>threshmax,1,0)
2947 mask = N.clip(mask,0,1)
2948 return N.where(mask,newval,a)
2949
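# Illustrative sketch (not part of this module): what athreshold() does,
# shown element by element on a flat list.  It assumes Python 3 and the
# standard library only; the name threshold is hypothetical.

```python
def threshold(values, threshmin=None, threshmax=None, newval=0):
    """Replace values below threshmin or above threshmax with newval."""
    result = []
    for v in values:
        too_low = threshmin is not None and v < threshmin
        too_high = threshmax is not None and v > threshmax
        result.append(newval if (too_low or too_high) else v)
    return result


print(threshold([1, 5, 10, 15], threshmin=3, threshmax=12))
```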
2950
2951 def atrimboth(a,proportiontocut):
2952 """
2953 Slices off the passed proportion of items from BOTH ends of the passed
2954     array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND
2955     'rightmost' 10% of scores).  You must pre-sort the array if you want
2956 "proper" trimming. Slices off LESS if proportion results in a
2957 non-integer slice index (i.e., conservatively slices off
2958 proportiontocut).
2959
2960 Usage: atrimboth (a,proportiontocut)
2961 Returns: trimmed version of array a
2962 """
2963 lowercut = int(proportiontocut*len(a))
2964 uppercut = len(a) - lowercut
2965 return a[lowercut:uppercut]
2966
2967
2968 - def atrim1 (a,proportiontocut,tail='right'):
2969 """
2970 Slices off the passed proportion of items from ONE end of the passed
2971 array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
2972 10% of scores). Slices off LESS if proportion results in a non-integer
2973 slice index (i.e., conservatively slices off proportiontocut).
2974
2975 Usage: atrim1(a,proportiontocut,tail='right') or set tail='left'
2976 Returns: trimmed version of array a
2977 """
2978 if string.lower(tail) == 'right':
2979 lowercut = 0
2980 uppercut = len(a) - int(proportiontocut*len(a))
2981 elif string.lower(tail) == 'left':
2982 lowercut = int(proportiontocut*len(a))
2983 uppercut = len(a)
2984 return a[lowercut:uppercut]
2985
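# Illustrative sketch (not part of this module): the "slices off LESS" rule
# of atrimboth() and atrim1() comes from truncating the cut index with int().
# The example assumes Python 3 and plain lists; trim_both is a hypothetical
# name.

```python
def trim_both(values, proportiontocut):
    """Drop the same number of items from each end, rounding the count down."""
    lowercut = int(proportiontocut * len(values))
    uppercut = len(values) - lowercut
    return values[lowercut:uppercut]


scores = sorted([7, 3, 9, 1, 5, 8, 2, 6, 4, 0])
print(trim_both(scores, 0.10))  # one item removed from each end
print(trim_both(scores, 0.15))  # 1.5 items rounds down to one per end
```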
2986
2987
2988
2989
2990
2991 def acovariance(X):
2992 """
2993 Computes the covariance matrix of a matrix X. Requires a 2D matrix input.
2994
2995 Usage: acovariance(X)
2996 Returns: covariance matrix of X
2997 """
2998 if len(X.shape) <> 2:
2999 raise TypeError, "acovariance requires 2D matrices"
3000 n = X.shape[0]
3001 mX = amean(X,0)
3002 return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX)
3003
3004
3005 def acorrelation(X):
3006 """
3007 Computes the correlation matrix of a matrix X. Requires a 2D matrix input.
3008
3009 Usage: acorrelation(X)
3010 Returns: correlation matrix of X
3011 """
3012 C = acovariance(X)
3013 V = N.diagonal(C)
3014 return C / N.sqrt(N.multiply.outer(V,V))
3015
3016
3017 def apaired(x,y):
3018 """
3019 Interactively determines the type of data in x and y, and then runs the
3020     appropriate statistic for paired group data.
3021
3022 Usage: apaired(x,y) x,y = the two arrays of values to be compared
3023 Returns: appropriate statistic name, value, and probability
3024 """
3025 samples = ''
3026 while samples not in ['i','r','I','R','c','C']:
3027 print '\nIndependent or related samples, or correlation (i,r,c): ',
3028 samples = raw_input()
3029
3030 if samples in ['i','I','r','R']:
3031 print '\nComparing variances ...',
3032
3033 r = obrientransform(x,y)
3034 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
3035 if p<0.05:
3036 vartype='unequal, p='+str(round(p,4))
3037 else:
3038 vartype='equal'
3039 print vartype
3040 if samples in ['i','I']:
3041 if vartype[0]=='e':
3042 t,p = ttest_ind(x,y,None,0)
3043 print '\nIndependent samples t-test: ', round(t,4),round(p,4)
3044 else:
3045 if len(x)>20 or len(y)>20:
3046 z,p = ranksums(x,y)
3047 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
3048 else:
3049 u,p = mannwhitneyu(x,y)
3050 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)
3051
3052 else:
3053 if vartype[0]=='e':
3054 t,p = ttest_rel(x,y,0)
3055 print '\nRelated samples t-test: ', round(t,4),round(p,4)
3056 else:
3057                 t,p = wilcoxont(x,y)
3058 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
3059 else:
3060 corrtype = ''
3061 while corrtype not in ['c','C','r','R','d','D']:
3062 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
3063 corrtype = raw_input()
3064 if corrtype in ['c','C']:
3065 m,b,r,p,see = linregress(x,y)
3066 print '\nLinear regression for continuous variables ...'
3067 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
3068 pstat.printcc(lol)
3069 elif corrtype in ['r','R']:
3070 r,p = spearmanr(x,y)
3071 print '\nCorrelation for ranked variables ...'
3072 print "Spearman's r: ",round(r,4),round(p,4)
3073 else:
3074 r,p = pointbiserialr(x,y)
3075 print '\nAssuming x contains a dichotomous variable ...'
3076 print 'Point Biserial r: ',round(r,4),round(p,4)
3077 print '\n\n'
3078 return None
3079
3080
3081 def dices(x,y):
3082 """
3083 Calculates Dice's coefficient ... (2*number of common terms)/(number of terms in x +
3084 number of terms in y). Returns a value between 0 (orthogonal) and 1.
3085
3086 Usage: dices(x,y)
3087 """
3088 import sets
3089 x = sets.Set(x)
3090 y = sets.Set(y)
3091 common = len(x.intersection(y))
3092 total = float(len(x) + len(y))
3093 return 2*common/total
3094
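# Illustrative sketch (not part of this module): Dice's coefficient as
# computed by dices(), written with the built-in set type instead of the
# old `sets` module.  It assumes Python 3; dice_coefficient is a hypothetical
# name, and the empty-input check is an addition of this sketch.

```python
def dice_coefficient(x, y):
    """2 * |common terms| / (|terms in x| + |terms in y|), on unique terms."""
    x, y = set(x), set(y)
    total = len(x) + len(y)
    if total == 0:
        raise ValueError("both inputs are empty")
    return 2.0 * len(x & y) / total


print(dice_coefficient([1, 2, 3], [2, 3, 4]))
```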
3095
3096 - def icc(x,y=None,verbose=0):
3097 """
3098 Calculates intraclass correlation coefficients using simple, Type I sums of squares.
3099     If only one variable is passed, it is assumed to be an Nx2 matrix.
3100
3101 Usage: icc(x,y=None,verbose=0)
3102 Returns: icc rho, prob ####PROB IS A GUESS BASED ON PEARSON
3103 """
3104 TINY = 1.0e-20
3105     if y is not None:
3106 all = N.concatenate([x,y],0)
3107 else:
3108 all = x+0
3109 x = all[:,0]
3110 y = all[:,1]
3111 totalss = ass(all-mean(all))
3112 pairmeans = (x+y)/2.
3113 withinss = ass(x-pairmeans) + ass(y-pairmeans)
3114 withindf = float(len(x))
3115 betwdf = float(len(x)-1)
3116 withinms = withinss / withindf
3117 betweenms = (totalss-withinss) / betwdf
3118 rho = (betweenms-withinms)/(withinms+betweenms)
3119 t = rho*math.sqrt(betwdf/((1.0-rho+TINY)*(1.0+rho+TINY)))
3120 prob = abetai(0.5*betwdf,0.5,betwdf/(betwdf+t*t),verbose)
3121 return rho, prob
3122
3123
3124 def alincc(x,y):
3125 """
3126 Calculates Lin's concordance correlation coefficient.
3127
3128 Usage: alincc(x,y) where x, y are equal-length arrays
3129 Returns: Lin's CC
3130 """
3131 x = N.ravel(x)
3132 y = N.ravel(y)
3133 covar = acov(x,y)*(len(x)-1)/float(len(x))
3134 xvar = avar(x)*(len(x)-1)/float(len(x))
3135 yvar = avar(y)*(len(y)-1)/float(len(y))
3136 lincc = (2 * covar) / ((xvar+yvar) +((amean(x)-amean(y))**2))
3137 return lincc
3138
3139
3140 def apearsonr(x,y,verbose=1):
3141 """
3142 Calculates a Pearson correlation coefficient and returns p. Taken
3143 from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195.
3144
3145 Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays
3146 Returns: Pearson's r, two-tailed p-value
3147 """
3148 TINY = 1.0e-20
3149 n = len(x)
3150 xmean = amean(x)
3151 ymean = amean(y)
3152 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
3153 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
3154 r = (r_num / r_den)
3155 df = n-2
3156 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3157 prob = abetai(0.5*df,0.5,df/(df+t*t),verbose)
3158 return r,prob
3159
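# Illustrative sketch (not part of this module): the computational formula
# for r used in apearsonr(), on flat lists.  It assumes Python 3 and the
# standard library only; pearson_r is a hypothetical name, and the p-value
# step (which needs abetai) is left out.

```python
import math


def pearson_r(x, y):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx**2) * (n*Syy - Sy**2))."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    r_num = n * sum_xy - sum_x * sum_y
    r_den = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return r_num / r_den


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(pearson_r([1, 2, 3], [1, 3, 2]))
```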
3160
3161 def aspearmanr(x,y):
3162 """
3163 Calculates a Spearman rank-order correlation coefficient. Taken
3164 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
3165
3166 Usage: aspearmanr(x,y) where x,y are equal-length arrays
3167 Returns: Spearman's r, two-tailed p-value
3168 """
3169 TINY = 1e-30
3170 n = len(x)
3171 rankx = rankdata(x)
3172 ranky = rankdata(y)
3173 dsq = N.add.reduce((rankx-ranky)**2)
3174 rs = 1 - 6*dsq / float(n*(n**2-1))
3175     t = rs * math.sqrt((n-2) / ((rs+1.0+TINY)*(1.0-rs+TINY)))
3176 df = n-2
3177 probrs = abetai(0.5*df,0.5,df/(df+t*t))
3178
3179
3180 return rs, probrs
3181
3182
3183 def apointbiserialr(x,y):
3184 """
3185 Calculates a point-biserial correlation coefficient and the associated
3186 probability value. Taken from Heiman's Basic Statistics for the Behav.
3187 Sci (1st), p.194.
3188
3189 Usage: apointbiserialr(x,y) where x,y are equal length arrays
3190 Returns: Point-biserial r, two-tailed p-value
3191 """
3192 TINY = 1e-30
3193 categories = pstat.aunique(x)
3194 data = pstat.aabut(x,y)
3195 if len(categories) <> 2:
3196 raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()."
3197 else:
3198 codemap = pstat.aabut(categories,N.arange(2))
3199 recoded = pstat.arecode(data,codemap,0)
3200 x = pstat.alinexand(data,0,categories[0])
3201 y = pstat.alinexand(data,0,categories[1])
3202 xmean = amean(pstat.acolex(x,1))
3203 ymean = amean(pstat.acolex(y,1))
3204 n = len(data)
3205 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
3206 rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust
3207 df = n-2
3208 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
3209 prob = abetai(0.5*df,0.5,df/(df+t*t))
3210 return rpb, prob
3211
3212
3213 def akendalltau(x,y):
3214 """
3215 Calculates Kendall's tau ... correlation of ordinal data. Adapted
3216     from function kendl1 in Numerical Recipes.  Needs good test-cases.@@@
3217
3218 Usage: akendalltau(x,y)
3219 Returns: Kendall's tau, two-tailed p-value
3220 """
3221 n1 = 0
3222 n2 = 0
3223 iss = 0
3224 for j in range(len(x)-1):
3225         for k in range(j+1,len(y)):
3226 a1 = x[j] - x[k]
3227 a2 = y[j] - y[k]
3228 aa = a1 * a2
3229 if (aa):
3230 n1 = n1 + 1
3231 n2 = n2 + 1
3232 if aa > 0:
3233 iss = iss + 1
3234 else:
3235 iss = iss -1
3236 else:
3237                 if (a1):
3238                     n1 = n1 + 1
3239                 if (a2):
3240                     n2 = n2 + 1
3241 tau = iss / math.sqrt(n1*n2)
3242 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
3243 z = tau / math.sqrt(svar)
3244 prob = erfcc(abs(z)/1.4142136)
3245 return tau, prob
3246
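# Illustrative sketch (not part of this module): the pair-counting scheme
# behind Kendall's tau, following the kendl1 layout named in the docstring
# above.  It assumes Python 3, flat equal-length lists, and the standard
# library only; kendall_tau is a hypothetical name and the p-value step is
# left out.  Each pair (j, k) with j < k is visited once; a pair tied in both
# x and y counts toward neither total.

```python
import math


def kendall_tau(x, y):
    """(concordant - discordant) / sqrt(n1 * n2) over all distinct pairs."""
    n1 = 0   # pairs not tied in x
    n2 = 0   # pairs not tied in y
    iss = 0  # concordant minus discordant
    n = len(x)
    for j in range(n - 1):
        for k in range(j + 1, n):
            a1 = x[j] - x[k]
            a2 = y[j] - y[k]
            aa = a1 * a2
            if aa:
                n1 += 1
                n2 += 1
                iss += 1 if aa > 0 else -1
            else:
                if a1:
                    n1 += 1
                if a2:
                    n2 += 1
    return iss / math.sqrt(n1 * n2)


print(kendall_tau([1, 2, 3], [1, 3, 2]))
```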
3247
3248 def alinregress(*args):
3249 """
3250 Calculates a regression line on two arrays, x and y, corresponding to x,y
3251 pairs. If a single 2D array is passed, alinregress finds dim with 2 levels
3252 and splits data into x,y pairs along that dim.
3253
3254 Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array
3255 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
3256 """
3257 TINY = 1.0e-20
3258 if len(args) == 1:
3259 args = args[0]
3260 if len(args) == 2:
3261 x = args[0]
3262 y = args[1]
3263 else:
3264 x = args[:,0]
3265 y = args[:,1]
3266 else:
3267 x = args[0]
3268 y = args[1]
3269 n = len(x)
3270 xmean = amean(x)
3271 ymean = amean(y)
3272 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
3273 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
3274 r = r_num / r_den
3275 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
3276 df = n-2
3277 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3278 prob = abetai(0.5*df,0.5,df/(df+t*t))
3279 slope = r_num / (float(n)*ass(x) - asquare_of_sums(x))
3280 intercept = ymean - slope*xmean
3281 sterrest = math.sqrt(1-r*r)*asamplestdev(y)
3282 return slope, intercept, r, prob, sterrest, n
3283
3284 def amasslinregress(*args):
3285 """
3286 Calculates a regression line on one 1D array (x) and one N-D array (y).
3287
3288 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
3289 """
3290 TINY = 1.0e-20
3291 if len(args) == 1:
3292 args = args[0]
3293 if len(args) == 2:
3294 x = N.ravel(args[0])
3295 y = args[1]
3296 else:
3297 x = N.ravel(args[:,0])
3298 y = args[:,1]
3299 else:
3300 x = args[0]
3301 y = args[1]
3302 x = x.astype(N.float_)
3303 y = y.astype(N.float_)
3304 n = len(x)
3305 xmean = amean(x)
3306 ymean = amean(y,0)
3307 shp = N.ones(len(y.shape))
3308 shp[0] = len(x)
3309 x.shape = shp
3310 print x.shape, y.shape
3311 r_num = n*(N.add.reduce(x*y,0)) - N.add.reduce(x)*N.add.reduce(y,0)
3312 r_den = N.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y,0)-asquare_of_sums(y,0)))
3313 zerodivproblem = N.equal(r_den,0)
3314 r_den = N.where(zerodivproblem,1,r_den)
3315 r = r_num / r_den
3316 r = N.where(zerodivproblem,0.0,r)
3317 z = 0.5*N.log((1.0+r+TINY)/(1.0-r+TINY))
3318 df = n-2
3319 t = r*N.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3320 prob = abetai(0.5*df,0.5,df/(df+t*t))
3321
3322 ss = float(n)*ass(x)-asquare_of_sums(x)
3323 s_den = N.where(ss==0,1,ss)
3324 slope = r_num / s_den
3325 intercept = ymean - slope*xmean
3326 sterrest = N.sqrt(1-r*r)*asamplestdev(y,0)
3327 return slope, intercept, r, prob, sterrest, n
3328
3329
3330
3331
3332
3333
3334 - def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
3335 """
3336     Calculates the t-obtained for the single-sample T-test on ONE group
3337 of scores a, given a population mean. If printit=1, results are printed
3338 to the screen. If printit='filename', the results are output to 'filename'
3339 using the given writemode (default=append). Returns t-value, and prob.
3340
3341     Usage:   attest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
3342 Returns: t-value, two-tailed prob
3343 """
3344 if type(a) != N.ndarray:
3345 a = N.array(a)
3346 x = amean(a)
3347 v = avar(a)
3348 n = len(a)
3349 df = n-1
3350 svar = ((n-1)*v) / float(df)
3351 t = (x-popmean)/math.sqrt(svar*(1.0/n))
3352 prob = abetai(0.5*df,0.5,df/(df+t*t))
3353
3354 if printit <> 0:
3355 statname = 'Single-sample T-test.'
3356 outputpairedstats(printit,writemode,
3357 'Population','--',popmean,0,0,0,
3358 name,n,x,v,N.minimum.reduce(N.ravel(a)),
3359 N.maximum.reduce(N.ravel(a)),
3360 statname,t,prob)
3361 return t,prob
3362
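# Illustrative sketch (not part of this module): the t statistic computed by
# attest_1samp(), on a flat list.  It assumes Python 3 and the standard
# library only; t_one_sample is a hypothetical name, and the probability
# step (which needs abetai) is left out.

```python
import math


def t_one_sample(values, popmean):
    """Return (t, df) where t = (mean - popmean) / sqrt(var / n), var over N-1."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    t = (mean - popmean) / math.sqrt(var / n)
    return t, n - 1


print(t_one_sample([2, 4, 4, 4, 5, 5, 7, 9], 4))
```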
3363
3364 - def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'):
3365 """
3366 Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
3367     a, and b.  From Numerical Recipes, p.483.  If printit=1, results are
3368 printed to the screen. If printit='filename', the results are output
3369 to 'filename' using the given writemode (default=append). Dimension
3370 can equal None (ravel array first), or an integer (the dimension over
3371 which to operate on a and b).
3372
3373 Usage: attest_ind (a,b,dimension=None,printit=0,
3374 name1='Samp1',name2='Samp2',writemode='a')
3375 Returns: t-value, two-tailed p-value
3376 """
3377 if dimension == None:
3378 a = N.ravel(a)
3379 b = N.ravel(b)
3380 dimension = 0
3381 x1 = amean(a,dimension)
3382 x2 = amean(b,dimension)
3383 v1 = avar(a,dimension)
3384 v2 = avar(b,dimension)
3385 n1 = a.shape[dimension]
3386 n2 = b.shape[dimension]
3387 df = n1+n2-2
3388 svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
3389 zerodivproblem = N.equal(svar,0)
3390 svar = N.where(zerodivproblem,1,svar)
3391 t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2))
3392 t = N.where(zerodivproblem,1.0,t)
3393 probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3394
3395 if type(t) == N.ndarray:
3396 probs = N.reshape(probs,t.shape)
3397 if probs.shape == (1,):
3398 probs = probs[0]
3399
3400 if printit <> 0:
3401 if type(t) == N.ndarray:
3402 t = t[0]
3403 if type(probs) == N.ndarray:
3404 probs = probs[0]
3405 statname = 'Independent samples T-test.'
3406 outputpairedstats(printit,writemode,
3407 name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)),
3408 N.maximum.reduce(N.ravel(a)),
3409 name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)),
3410 N.maximum.reduce(N.ravel(b)),
3411 statname,t,probs)
3412 return
3413 return t, probs
3414
3415 - def ap2t(pval,df):
3416 """
3417 Tries to compute a t-value from a p-value (or pval array) and associated df.
3418 SLOW for large numbers of elements(!) as it re-computes p-values 10 times
3419 (halving the step size each time), at which point it decides it's done.
3420 Keeps the signs of the input array. Returns 1000 (or -1000) if t>100.
3421
3422 Usage: ap2t(pval,df)
3423 Returns: an array of t-values with the shape of pval
3424 """
3425 pval = N.array(pval)
3426 signs = N.sign(pval)
3427 pval = abs(pval)
3428 t = N.ones(pval.shape,N.float_)*50
3429 step = N.ones(pval.shape,N.float_)*25
3430 print "Initial ap2t() prob calc"
3431 prob = abetai(0.5*df,0.5,float(df)/(df+t*t))
3432 print 'ap2t() iter: ',
3433 for i in range(10):
3434 print i,' ',
3435 t = N.where(pval<prob,t+step,t-step)
3436 prob = abetai(0.5*df,0.5,float(df)/(df+t*t))
3437 step = step/2
3438 print
3439
3440 t = N.where(t>99.9,1000,t)
3441 t = t*signs
3442 return t
3443
3444
3445 - def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'):
3446 """
3447 Calculates the t-obtained T-test on TWO RELATED samples of scores, a
3448 and b. From Numerical Recipes, p.483. If printit=1, results are
3449 printed to the screen. If printit='filename', the results are output
3450 to 'filename' using the given writemode (default=append). Dimension
3451 can equal None (ravel array first), or an integer (the dimension over
3452 which to operate on a and b).
3453
3454 Usage: attest_rel(a,b,dimension=None,printit=0,
3455 name1='Samp1',name2='Samp2',writemode='a')
3456 Returns: t-value, two-tailed p-value
3457 """
3458 if dimension == None:
3459 a = N.ravel(a)
3460 b = N.ravel(b)
3461 dimension = 0
3462 if len(a)<>len(b):
3463 raise ValueError, 'Unequal length arrays.'
3464 x1 = amean(a,dimension)
3465 x2 = amean(b,dimension)
3466 v1 = avar(a,dimension)
3467 v2 = avar(b,dimension)
3468 n = a.shape[dimension]
3469 df = float(n-1)
3470 d = (a-b).astype('d')
3471
3472 denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df)
3473 zerodivproblem = N.equal(denom,0)
3474 denom = N.where(zerodivproblem,1,denom)
3475 t = N.add.reduce(d,dimension) / denom
3476 t = N.where(zerodivproblem,1.0,t)
3477 probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3478 if type(t) == N.ndarray:
3479 probs = N.reshape(probs,t.shape)
3480 if probs.shape == (1,):
3481 probs = probs[0]
3482
3483 if printit <> 0:
3484 statname = 'Related samples T-test.'
3485 outputpairedstats(printit,writemode,
3486 name1,n,x1,v1,N.minimum.reduce(N.ravel(a)),
3487 N.maximum.reduce(N.ravel(a)),
3488 name2,n,x2,v2,N.minimum.reduce(N.ravel(b)),
3489 N.maximum.reduce(N.ravel(b)),
3490 statname,t,probs)
3491 return
3492 return t, probs
3493
3494
3495 def achisquare(f_obs,f_exp=None):
3496 """
3497 Calculates a one-way chi square for array of observed frequencies and returns
3498 the result. If no expected frequencies are given, the total N is assumed to
3499 be equally distributed across all groups.
3500 @@@NOT RIGHT??
3501
3502 Usage: achisquare(f_obs, f_exp=None) f_obs = array of observed cell freq.
3503 Returns: chisquare-statistic, associated p-value
3504 """
3505
3506 k = len(f_obs)
3507 if f_exp == None:
3508 f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.float_)
3509 f_exp = f_exp.astype(N.float_)
3510 chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp)
3511 return chisq, achisqprob(chisq, k-1)
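# Illustration only: a pure-Python sketch of the one-way chi-square statistic
# described in the docstring above. The name chisquare_stat is hypothetical;
# the p-value lookup (achisqprob) is omitted and only the statistic and its
# degrees of freedom are returned.

```python
def chisquare_stat(f_obs, f_exp=None):
    """Return (chisq, df) for observed frequencies f_obs."""
    k = len(f_obs)
    if f_exp is None:
        # Spread the total N equally across the k cells.
        f_exp = [sum(f_obs) / float(k)] * k
    chisq = sum((o - e) ** 2 / float(e) for o, e in zip(f_obs, f_exp))
    return chisq, k - 1

result = chisquare_stat([10, 20, 30])  # -> (10.0, 2)
```

# Here each expected count is 20, so the statistic is (100 + 0 + 100) / 20.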
3512
3513
3514 def aks_2samp(data1,data2):
3515 """
3516 Computes the Kolmogorov-Smirnov statistic on 2 samples. Modified from
3517 Numerical Recipes in C, page 493. Returns KS D-value, prob. Not ufunc-
3518 like.
3519
3520 Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays
3521 Returns: KS D-value, p-value
3522 """
3523 j1 = 0
3524 j2 = 0
3525 fn1 = 0.0
3526 fn2 = 0.0
3527 n1 = data1.shape[0]
3528 n2 = data2.shape[0]
3529 en1 = n1*1
3530 en2 = n2*1
3531 d = N.zeros(data1.shape[1:],N.float_)
3532 data1 = N.sort(data1,0)
3533 data2 = N.sort(data2,0)
3534 while j1 < n1 and j2 < n2:
3535 d1=data1[j1]
3536 d2=data2[j2]
3537 if d1 <= d2:
3538 j1 = j1 + 1
3539 fn1 = (j1)/float(en1)
3540 if d2 <= d1:
3541 j2 = j2 + 1
3542 fn2 = (j2)/float(en2)
3543 dt = (fn2-fn1)
3544 if abs(dt) > abs(d):
3545 d = dt
3546
3547 en = math.sqrt(en1*en2/float(en1+en2))
3548 prob = aksprob((en+0.12+0.11/en)*N.fabs(d))
3549
3550
3551 return d, prob
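# Illustration only: a pure-Python sketch of the two-sample KS D statistic as
# the largest gap between the two empirical CDFs. Each CDF is evaluated after
# stepping past the current value. The name ks_2samp_d is hypothetical, the
# sketch returns the absolute D rather than a signed value, and the
# probability step (aksprob) is omitted.

```python
def ks_2samp_d(data1, data2):
    """Return the maximum absolute difference between two empirical CDFs."""
    s1, s2 = sorted(data1), sorted(data2)
    n1, n2 = len(s1), len(s2)
    j1 = j2 = 0
    d = 0.0
    while j1 < n1 and j2 < n2:
        v1, v2 = s1[j1], s2[j2]
        if v1 <= v2:
            j1 += 1          # step past v1 in sample 1
        if v2 <= v1:
            j2 += 1          # step past v2 in sample 2
        # Empirical CDF values after the step(s) above.
        dt = abs(j2 / float(n2) - j1 / float(n1))
        if dt > d:
            d = dt
    return d

d_disjoint = ks_2samp_d([1, 2, 3], [4, 5, 6])   # completely separated samples
d_same = ks_2samp_d([1, 2, 3], [1, 2, 3])       # identical samples
```

# Completely separated samples give D = 1.0 and identical samples give D = 0.0.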
3552
3553
3554 def amannwhitneyu(x,y):
3555 """
3556 Calculates a Mann-Whitney U statistic on the provided scores and
3557 returns the result. Use only when the n in each condition is < 20 and
3558 you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is
3559 significant if the u-obtained is LESS THAN or equal to the critical
3560 value of U.
3561
3562 Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions
3563 Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
3564 """
3565 n1 = len(x)
3566 n2 = len(y)
3567 ranked = rankdata(N.concatenate((x,y)))
3568 rankx = ranked[0:n1]
3569 ranky = ranked[n1:]
3570 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)
3571 u2 = n1*n2 - u1
3572 bigu = max(u1,u2)
3573 smallu = min(u1,u2)
3574 T = math.sqrt(tiecorrect(ranked))
3575 if T == 0:
3576 raise ValueError, 'All numbers are identical in amannwhitneyu'
3577 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
3578 z = abs((bigu-n1*n2/2.0) / sd)
3579 return smallu, 1.0 - azprob(z)
3580
3581
3582 def atiecorrect(rankvals):
3583 """
3584 Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests.
3585 See Siegel, S. (1956) Nonparametric Statistics for the Behavioral
3586 Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c
3587 code.
3588
3589 Usage: atiecorrect(rankvals)
3590 Returns: T correction factor for U or H
3591 """
3592 sorted,posn = ashellsort(N.array(rankvals))
3593 n = len(sorted)
3594 T = 0.0
3595 i = 0
3596 while (i<n-1):
3597 if sorted[i] == sorted[i+1]:
3598 nties = 1
3599 while (i<n-1) and (sorted[i] == sorted[i+1]):
3600 nties = nties +1
3601 i = i +1
3602 T = T + nties**3 - nties
3603 i = i+1
3604 T = T / float(n**3-n)
3605 return 1.0 - T
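# Illustration only: a pure-Python sketch of the tie-correction factor
# 1 - sum(t**3 - t) / (n**3 - n), where t is the size of each group of tied
# ranks. The name tie_correction is hypothetical and the input is assumed to
# hold at least two rank values.

```python
def tie_correction(rankvals):
    """Return the tie-correction factor for a list of ranks (n >= 2)."""
    svals = sorted(rankvals)
    n = len(svals)
    total = 0.0
    i = 0
    while i < n:
        # Find the end of the run of values equal to svals[i].
        j = i
        while j < n and svals[j] == svals[i]:
            j += 1
        nties = j - i
        if nties > 1:
            total += nties ** 3 - nties
        i = j
    return 1.0 - total / float(n ** 3 - n)

factor = tie_correction([1, 2.5, 2.5, 4])  # one pair of tied ranks
```

# With one tied pair among four ranks the factor is 1 - 6/60 = 0.9; with no
# ties it is exactly 1.0.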
3606
3607
3608 def aranksums(x,y):
3609 """
3610 Calculates the rank sums statistic on the provided scores and returns
3611 the result.
3612
3613 Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions
3614 Returns: z-statistic, two-tailed p-value
3615 """
3616 n1 = len(x)
3617 n2 = len(y)
3618 alldata = N.concatenate((x,y))
3619 ranked = arankdata(alldata)
3620 x = ranked[:n1]
3621 y = ranked[n1:]
3622 s = sum(x)
3623 expected = n1*(n1+n2+1) / 2.0
3624 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
3625 prob = 2*(1.0 - azprob(abs(z)))
3626 return z, prob
3627
3628
3629 def awilcoxont(x,y):
3630 """
3631 Calculates the Wilcoxon T-test for related samples and returns the
3632 result. A non-parametric T-test.
3633
3634 Usage: awilcoxont(x,y) where x,y are equal-length arrays for 2 conditions
3635 Returns: t-statistic, two-tailed p-value
3636 """
3637 if len(x) <> len(y):
3638 raise ValueError, 'Unequal N in awilcoxont. Aborting.'
3639 d = x-y
3640 d = N.compress(N.not_equal(d,0),d)
3641 count = len(d)
3642 absd = abs(d)
3643 absranked = arankdata(absd)
3644 r_plus = 0.0
3645 r_minus = 0.0
3646 for i in range(len(absd)):
3647 if d[i] < 0:
3648 r_minus = r_minus + absranked[i]
3649 else:
3650 r_plus = r_plus + absranked[i]
3651 wt = min(r_plus, r_minus)
3652 mn = count * (count+1) * 0.25
3653 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
3654 z = math.fabs(wt-mn) / se
3656 prob = 2*(1.0 -zprob(abs(z)))
3657 return wt, prob
3658
3659
3660 def akruskalwallish(*args):
3661 """
3662 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
3663 groups, requiring at least 5 subjects in each group. This function
3664 calculates the Kruskal-Wallis H and associated p-value for 3 or more
3665 independent samples.
3666
3667 Usage: akruskalwallish(*args) args are separate arrays for 3+ conditions
3668 Returns: H-statistic (corrected for ties), associated p-value
3669 """
3670 assert len(args) >= 3, "Need at least 3 groups in stats.akruskalwallish()"
3671 args = list(args)
3672 n = [0]*len(args)
3673 n = map(len,args)
3674 all = []
3675 for i in range(len(args)):
3676 all = all + args[i].tolist()
3677 ranked = rankdata(all)
3678 T = tiecorrect(ranked)
3679 for i in range(len(args)):
3680 args[i] = ranked[0:n[i]]
3681 del ranked[0:n[i]]
3682 rsums = []
3683 for i in range(len(args)):
3684 rsums.append(sum(args[i])**2)
3685 rsums[i] = rsums[i] / float(n[i])
3686 ssbn = sum(rsums)
3687 totaln = sum(n)
3688 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
3689 df = len(args) - 1
3690 if T == 0:
3691 raise ValueError, 'All numbers are identical in akruskalwallish'
3692 h = h / float(T)
3693 return h, chisqprob(h,df)
3694
3695
3696 def afriedmanchisquare(*args):
3697 """
3698 Friedman Chi-Square is a non-parametric, one-way within-subjects
3699 ANOVA. This function calculates the Friedman Chi-square test for
3700 repeated measures and returns the result, along with the associated
3701 probability value. It assumes 3 or more repeated measures. Only 3
3702 levels requires a minimum of 10 subjects in the study. Four levels
3703 requires 5 subjects per level(??).
3704
3705 Usage: afriedmanchisquare(*args) args are separate arrays for 3+ conditions
3706 Returns: chi-square statistic, associated p-value
3707 """
3708 k = len(args)
3709 if k < 3:
3710 raise ValueError, '\nLess than 3 levels. Friedman test not appropriate.\n'
3711 n = len(args[0])
3712 data = apply(pstat.aabut,args)
3713 data = data.astype(N.float_)
3714 for i in range(len(data)):
3715 data[i] = arankdata(data[i])
3716 ssbn = asum(asum(data,0)**2)
3717 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
3718 return chisq, achisqprob(chisq,k-1)
3719
3720
3721
3722
3723
3724
3725 def achisqprob(chisq,df):
3726 """
3727 Returns the (1-tail) probability value associated with the provided chi-square
3728 value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can
3729 handle multiple dimensions.
3730
3731 Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom
3732 """
3733 BIG = 200.0
3734 def ex(x):
3735 BIG = 200.0
3736 exponents = N.where(N.less(x,-BIG),-BIG,x)
3737 return N.exp(exponents)
3738
3739 if type(chisq) == N.ndarray:
3740 arrayflag = 1
3741 else:
3742 arrayflag = 0
3743 chisq = N.array([chisq])
3744 if df < 1:
3745 return N.ones(chisq.shape,N.float_)
3746 probs = N.zeros(chisq.shape,N.float_)
3747 probs = N.where(N.less_equal(chisq,0),1.0,probs)
3748 a = 0.5 * chisq
3749 if df > 1:
3750 y = ex(-a)
3751 if df%2 == 0:
3752 even = 1
3753 s = y*1
3754 s2 = s*1
3755 else:
3756 even = 0
3757 s = 2.0 * azprob(-N.sqrt(chisq))
3758 s2 = s*1
3759 if (df > 2):
3760 chisq = 0.5 * (df - 1.0)
3761 if even:
3762 z = N.ones(probs.shape,N.float_)
3763 else:
3764 z = 0.5 *N.ones(probs.shape,N.float_)
3765 if even:
3766 e = N.zeros(probs.shape,N.float_)
3767 else:
3768 e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.float_)
3769 c = N.log(a)
3770 mask = N.zeros(probs.shape)
3771 a_big = N.greater(a,BIG)
3772 a_big_frozen = -1 *N.ones(probs.shape,N.float_)
3773 totalelements = N.multiply.reduce(N.array(probs.shape))
3774 while asum(mask)<>totalelements:
3775 e = N.log(z) + e
3776 s = s + ex(c*z-a-e)
3777 z = z + 1.0
3778
3779 newmask = N.greater(z,chisq)
3780 a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen)
3781 mask = N.clip(newmask+mask,0,1)
3782 if even:
3783 z = N.ones(probs.shape,N.float_)
3784 e = N.ones(probs.shape,N.float_)
3785 else:
3786 z = 0.5 *N.ones(probs.shape,N.float_)
3787 e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.float_)
3788 c = 0.0
3789 mask = N.zeros(probs.shape)
3790 a_notbig_frozen = -1 *N.ones(probs.shape,N.float_)
3791 while asum(mask)<>totalelements:
3792 e = e * (a/z.astype(N.float_))
3793 c = c + e
3794 z = z + 1.0
3795
3796 newmask = N.greater(z,chisq)
3797 a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big),
3798 c*y+s2, a_notbig_frozen)
3799 mask = N.clip(newmask+mask,0,1)
3800 probs = N.where(N.equal(probs,1),1,
3801 N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen))
3802 return probs
3803 else:
3804 return s
3805
3806
3807 def aerfcc(x):
3808 """
3809 Returns the complementary error function erfc(x) with fractional error
3810 everywhere less than 1.2e-7. Adapted from Numerical Recipes. Can
3811 handle multiple dimensions.
3812
3813 Usage: aerfcc(x)
3814 """
3815 z = abs(x)
3816 t = 1.0 / (1.0+0.5*z)
3817 ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
3818 return N.where(N.greater_equal(x,0), ans, 2.0-ans)
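# Illustration only: a scalar, pure-Python version of the polynomial
# approximation used above, so it can be compared against the standard
# library's math.erfc. The name erfcc_scalar is hypothetical; the coefficients
# are copied from the function above.

```python
import math

def erfcc_scalar(x):
    """Approximate erfc(x) for a single float x."""
    z = abs(x)
    t = 1.0 / (1.0 + 0.5 * z)
    poly = -1.26551223 + t*(1.00002368 + t*(0.37409196 + t*(0.09678418 +
           t*(-0.18628806 + t*(0.27886807 + t*(-1.13520398 +
           t*(1.48851587 + t*(-0.82215223 + t*0.17087277))))))))
    ans = t * math.exp(-z * z + poly)
    # erfc(-x) = 2 - erfc(x)
    return ans if x >= 0 else 2.0 - ans

# Largest absolute deviation from math.erfc over a few sample points.
worst = max(abs(erfcc_scalar(x) - math.erfc(x))
            for x in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0))
```

# On these sample points the deviation from math.erfc stays well below 1e-6.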
3819
3820
3821 def azprob(z):
3822 """
3823 Returns the area under the normal curve 'to the left of' the given z value.
3824 Thus,
3825 for z<0, zprob(z) = 1-tail probability
3826 for z>0, 1.0-zprob(z) = 1-tail probability
3827 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
3828 Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions.
3829
3830 Usage: azprob(z) where z is a z-value
3831 """
3832 def yfunc(y):
3833 x = (((((((((((((-0.000045255659 * y
3834 +0.000152529290) * y -0.000019538132) * y
3835 -0.000676904986) * y +0.001390604284) * y
3836 -0.000794620820) * y -0.002034254874) * y
3837 +0.006549791214) * y -0.010557625006) * y
3838 +0.011630447319) * y -0.009279453341) * y
3839 +0.005353579108) * y -0.002141268741) * y
3840 +0.000535310849) * y +0.999936657524
3841 return x
3842
3843 def wfunc(w):
3844 x = ((((((((0.000124818987 * w
3845 -0.001075204047) * w +0.005198775019) * w
3846 -0.019198292004) * w +0.059054035642) * w
3847 -0.151968751364) * w +0.319152932694) * w
3848 -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0
3849 return x
3850
3851 Z_MAX = 6.0
3852 x = N.zeros(z.shape,N.float_)
3853 y = 0.5 * N.fabs(z)
3854 x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0))
3855 x = N.where(N.greater(y,Z_MAX*0.5),1.0,x)
3856 prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5)
3857 return prob
3858
3859
3860 def aksprob(alam):
3861 """
3862 Returns the probability value for a K-S statistic computed via ks_2samp.
3863 Adapted from Numerical Recipes. Can handle multiple dimensions.
3864
3865 Usage: aksprob(alam)
3866 """
3867 if type(alam) == N.ndarray:
3868 frozen = -1 *N.ones(alam.shape,N.float64)
3869 alam = alam.astype(N.float64)
3870 arrayflag = 1
3871 else:
3872 frozen = N.array(-1.)
3873 alam = N.array(alam,N.float64)
3874 arrayflag = 1
3875 mask = N.zeros(alam.shape)
3876 fac = 2.0 *N.ones(alam.shape,N.float_)
3877 sum = N.zeros(alam.shape,N.float_)
3878 termbf = N.zeros(alam.shape,N.float_)
3879 a2 = N.array(-2.0*alam*alam,N.float64)
3880 totalelements = N.multiply.reduce(N.array(mask.shape))
3881 for j in range(1,201):
3882 if asum(mask) == totalelements:
3883 break
3884 exponents = (a2*j*j)
3885 overflowmask = N.less(exponents,-746)
3886 frozen = N.where(overflowmask,0,frozen)
3887 mask = mask+overflowmask
3888 term = fac*N.exp(exponents)
3889 sum = sum + term
3890 newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) +
3891 N.less(abs(term),1.0e-8*sum), 1, 0)
3892 frozen = N.where(newmask*N.equal(mask,0), sum, frozen)
3893 mask = N.clip(mask+newmask,0,1)
3894 fac = -fac
3895 termbf = abs(term)
3896 if arrayflag:
3897 return N.where(N.equal(frozen,-1), 1.0, frozen)
3898 else:
3899 return N.where(N.equal(frozen,-1), 1.0, frozen)[0]
3900
3901
3902 - def afprob (dfnum, dfden, F):
3903 """
3904 Returns the 1-tailed significance level (p-value) of an F statistic
3905 given the degrees of freedom for the numerator (dfR-dfF) and the degrees
3906 of freedom for the denominator (dfF). Can handle multiple dims for F.
3907
3908 Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn
3909 """
3910 if type(F) == N.ndarray:
3911 return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F))
3912 else:
3913 return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
3914
3915
3916 def abetacf(a,b,x,verbose=1):
3917 """
3918 Evaluates the continued fraction form of the incomplete Beta function,
3919 betai. (Adapted from: Numerical Recipes in C.) Can handle multiple
3920 dimensions for x.
3921
3922 Usage: abetacf(a,b,x,verbose=1)
3923 """
3924 ITMAX = 200
3925 EPS = 3.0e-7
3926
3927 arrayflag = 1
3928 if type(x) == N.ndarray:
3929 frozen = N.ones(x.shape,N.float_) *-1
3930 else:
3931 arrayflag = 0
3932 frozen = N.array([-1])
3933 x = N.array([x])
3934 mask = N.zeros(x.shape)
3935 bm = az = am = 1.0
3936 qab = a+b
3937 qap = a+1.0
3938 qam = a-1.0
3939 bz = 1.0-qab*x/qap
3940 for i in range(ITMAX+1):
3941 if N.sum(N.ravel(N.equal(frozen,-1)))==0:
3942 break
3943 em = float(i+1)
3944 tem = em + em
3945 d = em*(b-em)*x/((qam+tem)*(a+tem))
3946 ap = az + d*am
3947 bp = bz+d*bm
3948 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
3949 app = ap+d*az
3950 bpp = bp+d*bz
3951 aold = az*1
3952 am = ap/bpp
3953 bm = bp/bpp
3954 az = app/bpp
3955 bz = 1.0
3956 newmask = N.less(abs(az-aold),EPS*abs(az))
3957 frozen = N.where(newmask*N.equal(mask,0), az, frozen)
3958 mask = N.clip(mask+newmask,0,1)
3959 noconverge = asum(N.equal(frozen,-1))
3960 if noconverge <> 0 and verbose:
3961 print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements'
3962 if arrayflag:
3963 return frozen
3964 else:
3965 return frozen[0]
3966
3967
3968 def agammln(xx):
3969 """
3970 Returns the natural log of the gamma function of xx.
3971 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
3972 Adapted from: Numerical Recipes in C. Can handle multiple dims ... but
3973 probably doesn't normally have to.
3974
3975 Usage: agammln(xx)
3976 """
3977 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
3978 0.120858003e-2, -0.536382e-5]
3979 x = xx - 1.0
3980 tmp = x + 5.5
3981 tmp = tmp - (x+0.5)*N.log(tmp)
3982 ser = 1.0
3983 for j in range(len(coeff)):
3984 x = x + 1
3985 ser = ser + coeff[j]/x
3986 return -tmp + N.log(2.50662827465*ser)
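# Illustration only: a scalar, pure-Python version of the series above, so the
# result can be checked against the standard library's math.lgamma. The name
# gammln_scalar is hypothetical; the coefficients are copied from the function
# above, and the result is the LOG of the gamma function.

```python
import math

def gammln_scalar(xx):
    """Approximate ln(Gamma(xx)) for a single float xx > 0."""
    coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
             0.120858003e-2, -0.536382e-5]
    x = xx - 1.0
    tmp = x + 5.5
    tmp = tmp - (x + 0.5) * math.log(tmp)
    ser = 1.0
    for c in coeff:
        x = x + 1.0
        ser = ser + c / x
    return -tmp + math.log(2.50662827465 * ser)

# Gamma(5) = 4! = 24, so this should be close to ln(24).
approx = gammln_scalar(5.0)
```

# For xx = 5.0 the value agrees with math.lgamma(5.0), i.e. ln(24), to within
# about 1e-5.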
3987
3988
3989 - def abetai(a,b,x,verbose=1):
3990 """
3991 Returns the incomplete beta function:
3992
3993 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
3994
3995 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
3996 function of a. The continued fraction formulation is implemented
3997 here, using the betacf function. (Adapted from: Numerical Recipes in
3998 C.) Can handle multiple dimensions.
3999
4000 Usage: abetai(a,b,x,verbose=1)
4001 """
4002 TINY = 1e-15
4003 if type(a) == N.ndarray:
4004 if asum(N.less(x,0)+N.greater(x,1)) <> 0:
4005 raise ValueError, 'Bad x in abetai'
4006 x = N.where(N.equal(x,0),TINY,x)
4007 x = N.where(N.equal(x,1.0),1-TINY,x)
4008
4009 bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1)
4010 exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b*
4011 N.log(1.0-x) )
4012
4013 exponents = N.where(N.less(exponents,-740),-740,exponents)
4014 bt = N.exp(exponents)
4015 if type(x) == N.ndarray:
4016 ans = N.where(N.less(x,(a+1)/(a+b+2.0)),
4017 bt*abetacf(a,b,x,verbose)/float(a),
4018 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b))
4019 else:
4020 if x<(a+1)/(a+b+2.0):
4021 ans = bt*abetacf(a,b,x,verbose)/float(a)
4022 else:
4023 ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)
4024 return ans
4025
4026
4027
4028
4029
4030
4031 import LinearAlgebra, operator
4032 LA = LinearAlgebra
4033
4034 - def aglm(data,para):
4035 """
4036 Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken
4037 from:
4038 Peterson et al. Statistical limitations in functional neuroimaging
4039 I. Non-inferential methods and statistical models. Phil Trans Royal Soc
4040 Lond B 354: 1239-1260.
4041
4042 Usage: aglm(data,para)
4043 Returns: statistic, p-value ???
4044 """
4045 if len(para) <> len(data):
4046 print "data and para must be same length in aglm"
4047 return
4048 n = len(para)
4049 p = pstat.aunique(para)
4050 x = N.zeros((n,len(p)))
4051 for l in range(len(p)):
4052 x[:,l] = N.equal(para,p[l])
4053 b = N.dot(N.dot(LA.inv(N.dot(N.transpose(x),x)),
4054 N.transpose(x)),
4055 data)
4056 diffs = (data - N.dot(x,b))
4057 s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs)
4058
4059 if len(p) == 2:
4060 c = N.array([1,-1])
4061 df = n-2
4062 fact = asum(1.0/asum(x,0))
4063 t = N.dot(c,b) / N.sqrt(s_sq*fact)
4064 probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
4065 return t, probs
4066
4067
4068 def aF_oneway(*args):
4069 """
4070 Performs a 1-way ANOVA, returning an F-value and probability given
4071 any number of groups. From Heiman, pp.394-7.
4072
4073 Usage: aF_oneway (*args) where *args is 2 or more arrays, one per
4074 treatment group
4075 Returns: f-value, probability
4076 """
4077 na = len(args)
4078 means = [0]*na
4079 vars = [0]*na
4080 ns = [0]*na
4081 alldata = []
4082 tmp = map(N.array,args)
4083 means = map(amean,tmp)
4084 vars = map(avar,tmp)
4085 ns = map(len,args)
4086 alldata = N.concatenate(args)
4087 bign = len(alldata)
4088 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
4089 ssbn = 0
4090 for a in args:
4091 ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a))
4092 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
4093 sswn = sstot-ssbn
4094 dfbn = na-1
4095 dfwn = bign - na
4096 msb = ssbn/float(dfbn)
4097 msw = sswn/float(dfwn)
4098 f = msb/msw
4099 prob = fprob(dfbn,dfwn,f)
4100 return f, prob
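# Illustration only: a pure-Python sketch of the sums-of-squares arithmetic
# behind the one-way ANOVA F-value above. The name f_oneway_sketch is
# hypothetical; the probability step (fprob) is omitted and the degrees of
# freedom are returned instead.

```python
def f_oneway_sketch(*groups):
    """Return (F, df_between, df_within) for two or more groups of scores."""
    k = len(groups)
    alldata = [v for g in groups for v in g]
    bign = len(alldata)
    # Correction term: (sum of all scores)**2 / N.
    correction = sum(alldata) ** 2 / float(bign)
    sstot = sum(v * v for v in alldata) - correction
    ssbn = sum(sum(g) ** 2 / float(len(g)) for g in groups) - correction
    sswn = sstot - ssbn
    dfbn = k - 1
    dfwn = bign - k
    f = (ssbn / float(dfbn)) / (sswn / float(dfwn))
    return f, dfbn, dfwn

f, dfbn, dfwn = f_oneway_sketch([1, 2, 3], [2, 3, 4], [3, 4, 5])
```

# For these three groups SS-between and SS-within are both 6, giving
# F = (6/2) / (6/6) = 3.0 on 2 and 6 degrees of freedom.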
4101
4102
4104 """
4105 Returns an F-statistic given the following:
4106 ER = error associated with the null hypothesis (the Restricted model)
4107 EF = error associated with the alternate hypothesis (the Full model)
4108 dfR = degrees of freedom associated with the Restricted model
4109 dfF = degrees of freedom associated with the Full model
4110 """
4111 return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF)))
4112
4113
4115 Enum = round(Enum,3)
4116 Eden = round(Eden,3)
4117 dfnum = round(dfnum,3)
4118 dfden = round(dfden,3)
4119 f = round(f,3)
4120 prob = round(prob,3)
4121 suffix = ''
4122 if prob < 0.001: suffix = ' ***'
4123 elif prob < 0.01: suffix = ' **'
4124 elif prob < 0.05: suffix = ' *'
4125 title = [['EF/ER','DF','Mean Square','F-value','prob','']]
4126 lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix],
4127 [Eden, dfden, round(Eden/float(dfden),3),'','','']]
4128 pstat.printcc(lofl)
4129 return
4130
4131
4133 """
4134 Returns an F-statistic given the following:
4135 ER = error associated with the null hypothesis (the Restricted model)
4136 EF = error associated with the alternate hypothesis (the Full model)
4137 dfR = degrees of freedom associated with the Restricted model
4138 dfF = degrees of freedom associated with the Full model
4139 where ER and EF are matrices from a multivariate F calculation.
4140 """
4141 if type(ER) in [IntType, FloatType]:
4142 ER = N.array([[ER]])
4143 if type(EF) in [IntType, FloatType]:
4144 EF = N.array([[EF]])
4145 n_um = (LA.det(ER) - LA.det(EF)) / float(dfnum)
4146 d_en = LA.det(EF) / float(dfden)
4147 return n_um / d_en
4148
4149
4150
4151
4152
4153
4154 def asign(a):
4155 """
4156 Usage: asign(a)
4157 Returns: array shape of a, with -1 where a<0 and +1 where a>=0
4158 """
4159 a = N.asarray(a)
4160 if ((type(a) == type(1.4)) or (type(a) == type(1))):
4161 return a-a-N.less(a,0)+N.greater(a,0)
4162 else:
4163 return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0)
4164
4165
4166 - def asum (a, dimension=None,keepdims=0):
4167 """
4168 An alternative to the Numeric.add.reduce function, which allows one to
4169 (1) collapse over multiple dimensions at once, and/or (2) to retain
4170 all dimensions in the original array (squashing one down to size 1).
4171 Dimension can equal None (ravel array first), an integer (the
4172 dimension over which to operate), or a sequence (operate over multiple
4173 dimensions). If keepdims=1, the resulting array will have as many
4174 dimensions as the input array.
4175
4176 Usage: asum(a, dimension=None, keepdims=0)
4177 Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1
4178 """
4179 if type(a) == N.ndarray and a.dtype in [N.int_, N.short, N.ubyte]:
4180 a = a.astype(N.float_)
4181 if dimension == None:
4182 s = N.sum(N.ravel(a))
4183 elif type(dimension) in [IntType,FloatType]:
4184 s = N.add.reduce(a, dimension)
4185 if keepdims == 1:
4186 shp = list(a.shape)
4187 shp[dimension] = 1
4188 s = N.reshape(s,shp)
4189 else:
4190 dims = list(dimension)
4191 dims.sort()
4192 dims.reverse()
4193 s = a *1.0
4194 for dim in dims:
4195 s = N.add.reduce(s,dim)
4196 if keepdims == 1:
4197 shp = list(a.shape)
4198 for dim in dims:
4199 shp[dim] = 1
4200 s = N.reshape(s,shp)
4201 return s
4202
4203
4204 def acumsum(a,dimension=None):
4205 """
4206 Returns an array consisting of the cumulative sum of the items in the
4207 passed array. Dimension can equal None (ravel array first), an
4208 integer (the dimension over which to operate), or a sequence (operate
4209 over multiple dimensions, but this last one just barely makes sense).
4210
4211 Usage: acumsum(a,dimension=None)
4212 """
4213 if dimension == None:
4214 a = N.ravel(a)
4215 dimension = 0
4216 if type(dimension) in [ListType, TupleType, N.ndarray]:
4217 dimension = list(dimension)
4218 dimension.sort()
4219 dimension.reverse()
4220 for d in dimension:
4221 a = N.add.accumulate(a,d)
4222 return a
4223 else:
4224 return N.add.accumulate(a,dimension)
4225
4226
4227 - def ass(inarray, dimension=None, keepdims=0):
4228 """
4229 Squares each value in the passed array, adds these squares & returns
4230 the result. Unfortunate function name. :-) Defaults to ALL values in
4231 the array. Dimension can equal None (ravel array first), an integer
4232 (the dimension over which to operate), or a sequence (operate over
4233 multiple dimensions). Set keepdims=1 to maintain the original number
4234 of dimensions.
4235
4236 Usage: ass(inarray, dimension=None, keepdims=0)
4237 Returns: sum-along-'dimension' for (inarray*inarray)
4238 """
4239 if dimension == None:
4240 inarray = N.ravel(inarray)
4241 dimension = 0
4242 return asum(inarray*inarray,dimension,keepdims)
4243
4244
4245 - def asummult (array1,array2,dimension=None,keepdims=0):
4246 """
4247 Multiplies elements in array1 and array2, element by element, and
4248 returns the sum (along 'dimension') of all resulting multiplications.
4249 Dimension can equal None (ravel array first), an integer (the
4250 dimension over which to operate), or a sequence (operate over multiple
4251 dimensions). A trivial function, but included for completeness.
4252
4253 Usage: asummult(array1,array2,dimension=None,keepdims=0)
4254 """
4255 if dimension == None:
4256 array1 = N.ravel(array1)
4257 array2 = N.ravel(array2)
4258 dimension = 0
4259 return asum(array1*array2,dimension,keepdims)
4260
4261
4262 def asquare_of_sums(inarray, dimension=None, keepdims=0):
4263 """
4264 Adds the values in the passed array, squares that sum, and returns the
4265 result. Dimension can equal None (ravel array first), an integer (the
4266 dimension over which to operate), or a sequence (operate over multiple
4267 dimensions). If keepdims=1, the returned array will have the same
4268 NUMBER of dimensions as the original.
4269
4270 Usage: asquare_of_sums(inarray, dimension=None, keepdims=0)
4271 Returns: the square of the sum over dim(s) in dimension
4272 """
4273 if dimension == None:
4274 inarray = N.ravel(inarray)
4275 dimension = 0
4276 s = asum(inarray,dimension,keepdims)
4277 if type(s) == N.ndarray:
4278 return s.astype(N.float_)*s
4279 else:
4280 return float(s)*s
4281
4282
4283 def asumdiffsquared(a,b, dimension=None, keepdims=0):
4284 """
4285 Takes pairwise differences of the values in arrays a and b, squares
4286 these differences, and returns the sum of these squares. Dimension
4287 can equal None (ravel array first), an integer (the dimension over
4288 which to operate), or a sequence (operate over multiple dimensions).
4289 keepdims=1 means the return shape = len(a.shape) = len(b.shape)
4290
4291 Usage: asumdiffsquared(a,b)
4292 Returns: sum[ravel(a-b)**2]
4293 """
4294 if dimension == None:
4295 a, b = N.ravel(a), N.ravel(b)
4296 dimension = 0
4297 return asum((a-b)**2,dimension,keepdims)
4298
4299
4300 def ashellsort(inarray):
4301 """
4302 Shellsort algorithm. Sorts a 1D-array.
4303
4304 Usage: ashellsort(inarray)
4305 Returns: sorted-inarray, sorting-index-vector (for original array)
4306 """
4307 n = len(inarray)
4308 svec = inarray *1.0
4309 ivec = range(n)
4310 gap = n/2
4311 while gap >0:
4312 for i in range(gap,n):
4313 for j in range(i-gap,-1,-gap):
4314 while j>=0 and svec[j]>svec[j+gap]:
4315 temp = svec[j]
4316 svec[j] = svec[j+gap]
4317 svec[j+gap] = temp
4318 itemp = ivec[j]
4319 ivec[j] = ivec[j+gap]
4320 ivec[j+gap] = itemp
4321 gap = gap / 2
4322
4323 return svec, ivec
4324
4325
4326 def arankdata(inarray):
4327 """
4328 Ranks the data in inarray, dealing with ties appropriately. Assumes
4329 a 1D inarray. Adapted from Gary Perlman's |Stat ranksort.
4330
4331 Usage: arankdata(inarray)
4332 Returns: array of length equal to inarray, containing rank scores
4333 """
4334 n = len(inarray)
4335 svec, ivec = ashellsort(inarray)
4336 sumranks = 0
4337 dupcount = 0
4338 newarray = N.zeros(n,N.float_)
4339 for i in range(n):
4340 sumranks = sumranks + i
4341 dupcount = dupcount + 1
4342 if i==n-1 or svec[i] <> svec[i+1]:
4343 averank = sumranks / float(dupcount) + 1
4344 for j in range(i-dupcount+1,i+1):
4345 newarray[ivec[j]] = averank
4346 sumranks = 0
4347 dupcount = 0
4348 return newarray
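# Illustration only: a pure-Python sketch of ranking with ties, where tied
# values all receive the average of the ranks they span. The name
# rankdata_sketch is hypothetical, and it uses Python's built-in sort rather
# than the shellsort above.

```python
def rankdata_sketch(values):
    """Return 1-based ranks for values, averaging the ranks of ties."""
    n = len(values)
    # Indices of values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to the last position holding the same value as position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i..j (0-based) share the average 1-based rank.
        averank = (i + j) / 2.0 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = averank
        i = j + 1
    return ranks

ranks = rankdata_sketch([10, 20, 20, 30])  # -> [1.0, 2.5, 2.5, 4.0]
```

# The two tied 20s would occupy ranks 2 and 3, so each is assigned 2.5.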
4349
4350
4351 def afindwithin(data):
4352 """
4353 Returns a binary vector, 1=within-subject factor, 0=between. Input
4354 equals the entire data array (i.e., column 1=random factor, last
4355 column = measured values).
4356
4357 Usage: afindwithin(data) data in |Stat format
4358 """
4359 numfact = len(data[0])-2
4360 withinvec = [0]*numfact
4361 for col in range(1,numfact+1):
4362 rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0])
4363 if len(pstat.unique(pstat.colex(rows,0))) < len(rows):
4364 withinvec[col-1] = 1
4365 return withinvec
4366
4367
4368
4369
4370
4371
4372
4373
4374
4375 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)),
4376 (ageometricmean, (N.ndarray,)) )
4377 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)),
4378 (aharmonicmean, (N.ndarray,)) )
4379 mean = Dispatch ( (lmean, (ListType, TupleType)),
4380 (amean, (N.ndarray,)) )
4381 median = Dispatch ( (lmedian, (ListType, TupleType)),
4382 (amedian, (N.ndarray,)) )
4383 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)),
4384 (amedianscore, (N.ndarray,)) )
4385 mode = Dispatch ( (lmode, (ListType, TupleType)),
4386 (amode, (N.ndarray,)) )
4387 tmean = Dispatch ( (atmean, (N.ndarray,)) )
4388 tvar = Dispatch ( (atvar, (N.ndarray,)) )
4389 tstdev = Dispatch ( (atstdev, (N.ndarray,)) )
4390 tsem = Dispatch ( (atsem, (N.ndarray,)) )
4391
4392
4393 moment = Dispatch ( (lmoment, (ListType, TupleType)),
4394 (amoment, (N.ndarray,)) )
4395 variation = Dispatch ( (lvariation, (ListType, TupleType)),
4396 (avariation, (N.ndarray,)) )
4397 skew = Dispatch ( (lskew, (ListType, TupleType)),
4398 (askew, (N.ndarray,)) )
4399 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)),
4400 (akurtosis, (N.ndarray,)) )
4401 describe = Dispatch ( (ldescribe, (ListType, TupleType)),
4402 (adescribe, (N.ndarray,)) )

    # Normality tests: no list-specific versions are registered here, so
    # lists and tuples are routed to the array implementations as well
    skewtest = Dispatch((askewtest, (ListType, TupleType)),
                        (askewtest, (N.ndarray,)))
    kurtosistest = Dispatch((akurtosistest, (ListType, TupleType)),
                            (akurtosistest, (N.ndarray,)))
    normaltest = Dispatch((anormaltest, (ListType, TupleType)),
                          (anormaltest, (N.ndarray,)))

    # Frequency statistics
    itemfreq = Dispatch((litemfreq, (ListType, TupleType)),
                        (aitemfreq, (N.ndarray,)))
    scoreatpercentile = Dispatch((lscoreatpercentile, (ListType, TupleType)),
                                 (ascoreatpercentile, (N.ndarray,)))
    percentileofscore = Dispatch((lpercentileofscore, (ListType, TupleType)),
                                 (apercentileofscore, (N.ndarray,)))
    histogram = Dispatch((lhistogram, (ListType, TupleType)),
                         (ahistogram, (N.ndarray,)))
    cumfreq = Dispatch((lcumfreq, (ListType, TupleType)),
                       (acumfreq, (N.ndarray,)))
    relfreq = Dispatch((lrelfreq, (ListType, TupleType)),
                       (arelfreq, (N.ndarray,)))

    # Variability
    obrientransform = Dispatch((lobrientransform, (ListType, TupleType)),
                               (aobrientransform, (N.ndarray,)))
    samplevar = Dispatch((lsamplevar, (ListType, TupleType)),
                         (asamplevar, (N.ndarray,)))
    samplestdev = Dispatch((lsamplestdev, (ListType, TupleType)),
                           (asamplestdev, (N.ndarray,)))
    # signaltonoise has an array version only
    signaltonoise = Dispatch((asignaltonoise, (N.ndarray,)))
    var = Dispatch((lvar, (ListType, TupleType)),
                   (avar, (N.ndarray,)))
    stdev = Dispatch((lstdev, (ListType, TupleType)),
                     (astdev, (N.ndarray,)))
    sterr = Dispatch((lsterr, (ListType, TupleType)),
                     (asterr, (N.ndarray,)))
    sem = Dispatch((lsem, (ListType, TupleType)),
                   (asem, (N.ndarray,)))
    z = Dispatch((lz, (ListType, TupleType)),
                 (az, (N.ndarray,)))
    zs = Dispatch((lzs, (ListType, TupleType)),
                  (azs, (N.ndarray,)))

    # Trimming functions (threshold has an array version only)
    threshold = Dispatch((athreshold, (N.ndarray,)))
    trimboth = Dispatch((ltrimboth, (ListType, TupleType)),
                        (atrimboth, (N.ndarray,)))
    trim1 = Dispatch((ltrim1, (ListType, TupleType)),
                     (atrim1, (N.ndarray,)))

    # Correlation functions
    paired = Dispatch((lpaired, (ListType, TupleType)),
                      (apaired, (N.ndarray,)))
    lincc = Dispatch((llincc, (ListType, TupleType)),
                     (alincc, (N.ndarray,)))
    pearsonr = Dispatch((lpearsonr, (ListType, TupleType)),
                        (apearsonr, (N.ndarray,)))
    spearmanr = Dispatch((lspearmanr, (ListType, TupleType)),
                         (aspearmanr, (N.ndarray,)))
    pointbiserialr = Dispatch((lpointbiserialr, (ListType, TupleType)),
                              (apointbiserialr, (N.ndarray,)))
    kendalltau = Dispatch((lkendalltau, (ListType, TupleType)),
                          (akendalltau, (N.ndarray,)))
    linregress = Dispatch((llinregress, (ListType, TupleType)),
                          (alinregress, (N.ndarray,)))

    # Inferential statistics
    ttest_1samp = Dispatch((lttest_1samp, (ListType, TupleType)),
                           (attest_1samp, (N.ndarray,)))
    ttest_ind = Dispatch((lttest_ind, (ListType, TupleType)),
                         (attest_ind, (N.ndarray,)))
    ttest_rel = Dispatch((lttest_rel, (ListType, TupleType)),
                         (attest_rel, (N.ndarray,)))
    chisquare = Dispatch((lchisquare, (ListType, TupleType)),
                         (achisquare, (N.ndarray,)))
    ks_2samp = Dispatch((lks_2samp, (ListType, TupleType)),
                        (aks_2samp, (N.ndarray,)))
    mannwhitneyu = Dispatch((lmannwhitneyu, (ListType, TupleType)),
                            (amannwhitneyu, (N.ndarray,)))
    tiecorrect = Dispatch((ltiecorrect, (ListType, TupleType)),
                          (atiecorrect, (N.ndarray,)))
    ranksums = Dispatch((lranksums, (ListType, TupleType)),
                        (aranksums, (N.ndarray,)))
    wilcoxont = Dispatch((lwilcoxont, (ListType, TupleType)),
                         (awilcoxont, (N.ndarray,)))
    kruskalwallish = Dispatch((lkruskalwallish, (ListType, TupleType)),
                              (akruskalwallish, (N.ndarray,)))
    friedmanchisquare = Dispatch((lfriedmanchisquare, (ListType, TupleType)),
                                 (afriedmanchisquare, (N.ndarray,)))

    # Probability calculations: int and float scalars go to the 'l' versions,
    # arrays go to the 'a' versions
    chisqprob = Dispatch((lchisqprob, (IntType, FloatType)),
                         (achisqprob, (N.ndarray,)))
    zprob = Dispatch((lzprob, (IntType, FloatType)),
                     (azprob, (N.ndarray,)))
    ksprob = Dispatch((lksprob, (IntType, FloatType)),
                      (aksprob, (N.ndarray,)))
    fprob = Dispatch((lfprob, (IntType, FloatType)),
                     (afprob, (N.ndarray,)))
    betacf = Dispatch((lbetacf, (IntType, FloatType)),
                      (abetacf, (N.ndarray,)))
    betai = Dispatch((lbetai, (IntType, FloatType)),
                     (abetai, (N.ndarray,)))
    erfcc = Dispatch((lerfcc, (IntType, FloatType)),
                     (aerfcc, (N.ndarray,)))
    gammln = Dispatch((lgammln, (IntType, FloatType)),
                      (agammln, (N.ndarray,)))

    # ANOVA functions
    F_oneway = Dispatch((lF_oneway, (ListType, TupleType)),
                        (aF_oneway, (N.ndarray,)))
    F_value = Dispatch((lF_value, (ListType, TupleType)),
                       (aF_value, (N.ndarray,)))

    # Support functions (incr uses the list version for all sequence types;
    # note that `sum` shadows the builtin of the same name in this module)
    incr = Dispatch((lincr, (ListType, TupleType, N.ndarray)))
    sum = Dispatch((lsum, (ListType, TupleType)),
                   (asum, (N.ndarray,)))
    cumsum = Dispatch((lcumsum, (ListType, TupleType)),
                      (acumsum, (N.ndarray,)))
    ss = Dispatch((lss, (ListType, TupleType)),
                  (ass, (N.ndarray,)))
    summult = Dispatch((lsummult, (ListType, TupleType)),
                       (asummult, (N.ndarray,)))
    square_of_sums = Dispatch((lsquare_of_sums, (ListType, TupleType)),
                              (asquare_of_sums, (N.ndarray,)))
    sumdiffsquared = Dispatch((lsumdiffsquared, (ListType, TupleType)),
                              (asumdiffsquared, (N.ndarray,)))
    shellsort = Dispatch((lshellsort, (ListType, TupleType)),
                         (ashellsort, (N.ndarray,)))
    rankdata = Dispatch((lrankdata, (ListType, TupleType)),
                        (arankdata, (N.ndarray,)))
    findwithin = Dispatch((lfindwithin, (ListType, TupleType)),
                          (afindwithin, (N.ndarray,)))

except ImportError:
    pass