acgc.stats.partial_corr
Partial Correlation in Python
A clone of MATLAB's partialcorr, written by Fabian Pedregosa-Izquierdo, f@bianp.net
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Partial Correlation in Python

A clone of MATLAB's partialcorr, written by Fabian Pedregosa-Izquierdo, f@bianp.net
"""

import numpy as np
from scipy import stats, linalg

__all__ = ['partial_corr']


def partial_corr(C):
    """Returns the sample linear partial correlation coefficients between pairs of
    variables in C, controlling for the remaining variables in C.

    This uses the linear regression approach to compute the partial correlation
    (it may be slow for a large number of variables). The algorithm is detailed here:
        http://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression
    Taking X and Y as the two variables of interest and Z as the matrix of all
    remaining variables, the algorithm can be summarized as:
        1) perform an ordinary linear least-squares regression with X as the target
           and Z as the predictor
        2) calculate the residuals from Step #1
        3) perform an ordinary linear least-squares regression with Y as the target
           and Z as the predictor
        4) calculate the residuals from Step #3
        5) calculate the correlation coefficient between the residuals from Steps #2 and #4
    The result is the partial correlation between X and Y while controlling for the
    effect of Z.

    Parameters
    ----------
    C : array-like, (n, p)
        Data array with n observations of p variables. Each column of C is treated
        as a variable.

    Returns
    -------
    P : ndarray, (p, p)
        P[i, j] contains the partial correlation of C[:, i] and C[:, j], controlling
        for the remaining variables in C.
    """

    # C = np.column_stack([C0, np.ones(C0.shape[0])])

    C = np.asarray(C)
    p = C.shape[1]
    P_corr = np.zeros((p, p), dtype=float)     # np.float was removed in NumPy 1.24
    for i in range(p):
        P_corr[i, i] = 1
        for j in range(i + 1, p):
            # Z = all variables except i and j
            idx = np.ones(p, dtype=bool)       # np.bool was removed in NumPy 1.24
            idx[i] = False
            idx[j] = False
            # Regress each variable of interest on Z
            beta_i = linalg.lstsq(C[:, idx], C[:, j])[0]
            beta_j = linalg.lstsq(C[:, idx], C[:, i])[0]

            # Residuals after removing the linear effect of Z
            res_j = C[:, j] - C[:, idx].dot(beta_i)
            res_i = C[:, i] - C[:, idx].dot(beta_j)

            # Partial correlation is the Pearson correlation of the residuals
            corr = stats.pearsonr(res_i, res_j)[0]
            P_corr[i, j] = corr
            P_corr[j, i] = corr

    return P_corr
```
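Note the commented-out line near the top of the function body: the least-squares fits include no intercept term, so the data are implicitly assumed to be centered (or to already carry a constant column). A minimal sketch of one way to handle this before calling the function; the import path and variable names here are illustrative assumptions, not part of the module:

```python
import numpy as np
from acgc.stats.partial_corr import partial_corr   # assumed import path

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))    # 200 observations of 4 synthetic variables

# Center each column so the no-intercept regressions inside partial_corr
# behave like regressions that include an intercept.
Xc = X - X.mean(axis=0)

P = partial_corr(Xc)
print(P.shape)                   # (4, 4), symmetric with ones on the diagonal
```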
def partial_corr(C)
Returns the sample linear partial correlation coefficients between pairs of variables in C, controlling for the remaining variables in C.
This uses the linear regression approach to compute the partial correlation (it may be slow for a large number of variables). The algorithm is detailed here: http://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression

Taking X and Y as the two variables of interest and Z as the matrix of all remaining variables (everything except X and Y), the algorithm can be summarized as:

1) perform an ordinary linear least-squares regression with X as the target and Z as the predictor
2) calculate the residuals from Step #1
3) perform an ordinary linear least-squares regression with Y as the target and Z as the predictor
4) calculate the residuals from Step #3
5) calculate the correlation coefficient between the residuals from Steps #2 and #4

The result is the partial correlation between X and Y while controlling for the effect of Z.
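For concreteness, here is a minimal sketch of the five steps for a single pair of variables (x, y) controlled for a matrix z of remaining variables. The data are synthetic and the names illustrative; x and y are related only through z, so their partial correlation should be near zero:

```python
import numpy as np
from scipy import stats, linalg

rng = np.random.default_rng(1)
z = rng.normal(size=(500, 3))                       # confounding variables Z
x = z @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=500)
y = z @ np.array([0.7, 0.3, -1.0]) + rng.normal(size=500)

# Steps 1-2: regress x on Z and keep the residuals
beta_x = linalg.lstsq(z, x)[0]
res_x = x - z @ beta_x

# Steps 3-4: regress y on Z and keep the residuals
beta_y = linalg.lstsq(z, y)[0]
res_y = y - z @ beta_y

# Step 5: Pearson correlation of the two residual series
r_xy_given_z, _ = stats.pearsonr(res_x, res_y)
print(r_xy_given_z)   # close to 0, even though corr(x, y) itself is large
```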
Parameters
- C (array-like, shape (n, p)): Data array with n observations of p variables. Each column of C is treated as a variable.
Returns
- P (ndarray, shape (p, p)): P[i, j] contains the partial correlation of C[:, i] and C[:, j], controlling for the remaining variables in C.
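As a sanity check, the regression-based result can be compared against the well-known precision-matrix formulation of partial correlation. A hedged sketch on synthetic, centered data (the import path is an assumption, as above):

```python
import numpy as np
from acgc.stats.partial_corr import partial_corr   # assumed import path

rng = np.random.default_rng(2)
data = rng.normal(size=(1000, 5))
data -= data.mean(axis=0)                # center; the regressions have no intercept

P_reg = partial_corr(data)

# Precision-matrix formulation:
# partial_corr[i, j] = -Q[i, j] / sqrt(Q[i, i] * Q[j, j]), where Q = inv(corrcoef)
Q = np.linalg.inv(np.corrcoef(data, rowvar=False))
P_prec = -Q / np.sqrt(np.outer(np.diag(Q), np.diag(Q)))
np.fill_diagonal(P_prec, 1.0)

print(np.allclose(P_reg, P_prec))        # True, up to floating-point error
```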