Correlating Gene Expression: Case vs. Control

I have gene expression data from 77 cancer patients: one set from the patients' blood, one from their tumors, and one from healthy tissue:

data1 <- ExpressionBlood
data2 <- ExpressionCancerTissue
data3 <- ExpressionHealtyTissue

I would like to analyze whether expression in the tumor tissue correlates with expression in the blood, across all my genes. What is the best way to do this?

1 answer

If you are familiar with Python, I would use pandas. It uses "DataFrames" similar to R's, so you can take the concept and apply it in R.

Assuming data1 is a tab-delimited file that looks like this:

GeneName | ExpValue |
gene1       300.0
gene2       250.0

First, read each file into a DataFrame:

import pandas as pd

dfblood = pd.read_csv('path/to/data1',delimiter='\t')
dftissue = pd.read_csv('path/to/data3',delimiter='\t')  # data3: healthy tissue
dftumor = pd.read_csv('path/to/data2',delimiter='\t')   # data2: tumor tissue

Then merge them into a single DataFrame, df.

dftmp = pd.merge(dfblood,dftissue,on='GeneName',how='inner')
df = pd.merge(dftmp,dftumor,on='GeneName',how='inner')

Next, rename the columns so that each one is named after its sample type.

df.columns = ['GeneName','blood','tissue','tumor']
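The read, merge, and rename steps above can be sketched end to end on toy data. This is only an illustration: the three in-memory strings are hypothetical stand-ins for the real files, and the expression values are made up. It also shows why the rename is needed, since every input file shares the column name ExpValue.

```python
import io
import pandas as pd

# Hypothetical tab-separated contents standing in for the three real files.
blood_tsv = "GeneName\tExpValue\ngene1\t300.0\ngene2\t250.0\ngene3\t10.0\n"
tissue_tsv = "GeneName\tExpValue\ngene1\t120.0\ngene2\t80.0\ngene4\t5.0\n"
tumor_tsv = "GeneName\tExpValue\ngene1\t310.0\ngene2\t200.0\ngene3\t40.0\n"

dfblood = pd.read_csv(io.StringIO(blood_tsv), sep="\t")
dftissue = pd.read_csv(io.StringIO(tissue_tsv), sep="\t")
dftumor = pd.read_csv(io.StringIO(tumor_tsv), sep="\t")

# An inner merge keeps only the genes present in every table.
dftmp = pd.merge(dfblood, dftissue, on="GeneName", how="inner")
df = pd.merge(dftmp, dftumor, on="GeneName", how="inner")

# pandas disambiguates the clashing ExpValue columns with suffixes,
# which is why the columns are renamed afterwards.
merged_columns = list(df.columns)
print(merged_columns)

df.columns = ["GeneName", "blood", "tissue", "tumor"]
print(df)
```

Note that with how='inner' any gene missing from one of the files is silently dropped, so it is worth comparing len(df) with the sizes of the inputs.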

Then normalize the values (mean-centered and scaled by each column's range).

df = df.set_index('GeneName') # allows you to perform computations on the entire dataset
df_norm = (df - df.mean()) / (df.max() - df.min())

Calling df_norm.corr() then gives the pairwise correlation matrix; pandas computes it with numpy under the hood.

          blood      tissue       tumor
blood   1.000000    0.395160    0.581629
tissue  0.395160    1.000000    0.840973
tumor   0.581629    0.840973    1.000000
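A small sketch of the normalize-and-correlate step, using made-up expression values rather than real data. It also checks one property worth knowing: Pearson correlation (the default for corr()) is unchanged by shifting and rescaling a column, so the normalized and raw data give the same matrix here.

```python
import numpy as np
import pandas as pd

# Made-up expression values, for illustration only.
df = pd.DataFrame(
    {
        "blood": [300.0, 250.0, 10.0, 75.0],
        "tissue": [120.0, 80.0, 15.0, 60.0],
        "tumor": [310.0, 200.0, 40.0, 90.0],
    },
    index=pd.Index(["gene1", "gene2", "gene3", "gene4"], name="GeneName"),
)

# Same normalization as above: center on the mean, scale by the range.
df_norm = (df - df.mean()) / (df.max() - df.min())

pearson_raw = df.corr()        # Pearson is the default method
pearson_norm = df_norm.corr()

# The two matrices agree up to floating-point error.
same = np.allclose(pearson_raw, pearson_norm)
print(pearson_norm)
print(same)
```

So the normalization mainly helps when comparing or plotting the values themselves; it does not change the correlation coefficients.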

HTH.

For something like a Student's t-test, you would first log-transform the values, for example with numpy.log:

import numpy as np

df[['blood','tissue','tumor']] = df[['blood','tissue','tumor']]+1
# +1 to avoid taking the log of 0
df_log = np.log(df[['blood','tissue','tumor']])

Then, to get the log fold changes, add them as new columns of the df_log DataFrame.

df_log['logFCBloodTumor'] = df_log['blood'] - df_log['tumor']
df_log['logFCBloodTissue'] = df_log['blood'] - df_log['tissue']
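To make the log fold change concrete, here is a minimal sketch on made-up values (including a zero, which is what the +1 guards against). It checks that exponentiating the difference of logs recovers the ratio of the shifted values.

```python
import numpy as np
import pandas as pd

# Made-up expression values; gene3 has a zero in blood.
df = pd.DataFrame(
    {
        "blood": [300.0, 250.0, 0.0],
        "tissue": [120.0, 80.0, 15.0],
        "tumor": [310.0, 200.0, 40.0],
    },
    index=pd.Index(["gene1", "gene2", "gene3"], name="GeneName"),
)

# +1 to avoid taking the log of 0, then natural log of every column.
df_log = np.log(df + 1)

# A difference of logs is the log of a ratio, i.e. a log fold change.
df_log["logFCBloodTumor"] = df_log["blood"] - df_log["tumor"]

# Back-transforming gives (blood + 1) / (tumor + 1).
ratio = np.exp(df_log["logFCBloodTumor"])
expected = (df["blood"] + 1) / (df["tumor"] + 1)
print(df_log)
print(np.allclose(ratio, expected))
```

numpy.log is the natural logarithm; numpy.log2 can be swapped in if base-2 fold changes are preferred.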

Source: https://habr.com/ru/post/1624414/

