acgc.stats.partial_corr

Partial Correlation in Python

Clone of Matlab's partialcorr, written by Fabian Pedregosa-Izquierdo, f@bianp.net

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Partial Correlation in Python

Clone of Matlab's partialcorr, written by Fabian Pedregosa-Izquierdo, f@bianp.net
"""

import numpy as np
from scipy import stats, linalg

__all__ = ['partial_corr']

def partial_corr(C):
    """Return the sample linear partial correlation coefficients between pairs of
    variables in C, controlling for the remaining variables in C.

    This uses the linear-regression approach to compute the partial
    correlation (it may be slow for a large number of variables). The
    algorithm is detailed here:
    http://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression
    Taking X and Y as the two variables of interest and Z as the matrix of all
    remaining variables, the algorithm can be summarized as:
        1) perform a linear least-squares regression with X as the target and Z as the predictor
        2) compute the residuals from Step 1
        3) perform a linear least-squares regression with Y as the target and Z as the predictor
        4) compute the residuals from Step 3
        5) compute the correlation coefficient between the residuals from Steps 2 and 4
    The result is the partial correlation between X and Y while controlling for
    the effect of Z.

    Parameters
    ----------
    C : array-like, (n, p)
        Array of variables. Each column of C is taken as a variable.

    Returns
    -------
    P : array-like, (p, p)
        P[i, j] contains the partial correlation of C[:, i] and C[:, j]
        controlling for the remaining variables in C.
    """
    # The regressions below have no intercept term. To also control for the
    # mean, append a column of ones to the input first, e.g.:
    # C = np.column_stack([C0, np.ones(C0.shape[0])])

    C = np.asarray(C)
    p = C.shape[1]
    P_corr = np.zeros((p, p), dtype=float)  # np.float is removed in NumPy >= 1.24; use the builtin
    for i in range(p):
        P_corr[i, i] = 1
        for j in range(i + 1, p):
            idx = np.ones(p, dtype=bool)  # mask selecting the controlling variables Z
            idx[i] = False
            idx[j] = False
            # Regress each variable of interest on Z and keep the residuals
            beta_i = linalg.lstsq(C[:, idx], C[:, i])[0]
            beta_j = linalg.lstsq(C[:, idx], C[:, j])[0]

            res_i = C[:, i] - C[:, idx].dot(beta_i)
            res_j = C[:, j] - C[:, idx].dot(beta_j)

            corr = stats.pearsonr(res_i, res_j)[0]
            P_corr[i, j] = corr
            P_corr[j, i] = corr

    return P_corr
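As a sanity check, the regression-and-residuals construction can be compared against the well-known precision-matrix identity, `P[i, j] = -Om[i, j] / sqrt(Om[i, i] * Om[j, j])`, where `Om` is the inverse of the sample covariance matrix. The sketch below (randomly generated data; the names `P_prec` and `Om` are introduced here for illustration) inlines the same steps as `partial_corr` on centered data and confirms the two constructions agree:

```python
import numpy as np
from scipy import stats, linalg

rng = np.random.default_rng(1)
C = rng.normal(size=(200, 4))
C = C - C.mean(axis=0)  # center: the regressions below have no intercept term

# Regression approach: residual correlations, as in partial_corr
p = C.shape[1]
P = np.eye(p)
for i in range(p):
    for j in range(i + 1, p):
        idx = np.ones(p, dtype=bool)
        idx[i] = idx[j] = False
        res_i = C[:, i] - C[:, idx].dot(linalg.lstsq(C[:, idx], C[:, i])[0])
        res_j = C[:, j] - C[:, idx].dot(linalg.lstsq(C[:, idx], C[:, j])[0])
        P[i, j] = P[j, i] = stats.pearsonr(res_i, res_j)[0]

# Precision-matrix approach: normalize the negated inverse covariance
Om = np.linalg.inv(np.cov(C, rowvar=False))
P_prec = -Om / np.sqrt(np.outer(np.diag(Om), np.diag(Om)))
np.fill_diagonal(P_prec, 1.0)

print(np.allclose(P, P_prec))  # the two constructions agree
```

The precision-matrix route inverts one p-by-p matrix instead of running two regressions per pair, which is why the docstring notes the regression approach may be slow when p is large.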

Returns the sample linear partial correlation coefficients between pairs of variables in C, controlling for the remaining variables in C.

This uses the linear-regression approach to compute the partial correlation (it may be slow for a large number of variables). The algorithm is detailed here: http://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression Taking X and Y as the two variables of interest and Z as the matrix of all remaining variables, the algorithm can be summarized as:

  1. perform a linear least-squares regression with X as the target and Z as the predictor
  2. compute the residuals from Step 1
  3. perform a linear least-squares regression with Y as the target and Z as the predictor
  4. compute the residuals from Step 3
  5. compute the correlation coefficient between the residuals from Steps 2 and 4

The result is the partial correlation between X and Y while controlling for the effect of Z.

Parameters
  • C (array-like, (n, p)): Array of variables. Each column of C is taken as a variable.
Returns
  • P (array-like, (p, p)): P[i, j] contains the partial correlation of C[:, i] and C[:, j] controlling for the remaining variables in C.
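The five steps above can be run by hand on toy data. In this sketch (variable names and data are illustrative), a confounder z drives both x and y, so their raw correlation is strong, while the partial correlation controlling for z is near zero:

```python
import numpy as np
from scipy import stats, linalg

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = 2.0 * z + 0.1 * rng.normal(size=n)   # x and y are both driven by z
y = -1.5 * z + 0.1 * rng.normal(size=n)

C = np.column_stack([x, y, z])
C = C - C.mean(axis=0)  # center, since the regressions have no intercept

# Steps 1-5 for the pair (x, y), controlling for z
Z = C[:, [2]]
beta_x = linalg.lstsq(Z, C[:, 0])[0]        # Step 1
res_x = C[:, 0] - Z.dot(beta_x)             # Step 2
beta_y = linalg.lstsq(Z, C[:, 1])[0]        # Step 3
res_y = C[:, 1] - Z.dot(beta_y)             # Step 4
partial = stats.pearsonr(res_x, res_y)[0]   # Step 5

raw = stats.pearsonr(x, y)[0]
print(raw)      # strongly negative: x and y share the confounder z
print(partial)  # near zero once z is controlled for
```

The same numbers fall out of `partial_corr(C)[0, 1]`, which simply loops these steps over every pair of columns.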