Input format for the Kruskal-Wallis test in Python

Question


I am comparing regions of DNA with structural breaks in cancer patients and healthy people. I am trying to run the Kruskal-Wallis test (SciPy Stats) on the number of breaks in each region to find out whether there is a difference between the two distributions. I am not sure whether the input to Kruskal-Wallis should be a single array (as in the documentation) or a list of arrays (as suggested elsewhere on the Internet).

First, I tried passing one array per control/sample pair:

import numpy as np
from scipy import stats

controls = ['1', '2', '3', '4', '5']
samples = ['10', '20', '30', '40', '50']
n=0
for item in controls:
    array_item = np.array([item, samples[n]])
    kw_test = stats.mstats.kruskalwallis(array_item)
    print(kw_test)
    n+=1

This gave me the following result for all elements:

(0.0, nan)

I also tried wrapping each individual data point in its own array and then running the KW test on each pair:

import numpy as np
from scipy import stats

controls = ['1', '2', '3', '4', '5']
samples = ['10', '20', '30', '40', '50']
n=0
kw_results = []
for item in controls:
    array_controls = np.array([item])
    array_samples = np.array([samples[n]])
    kw_test = stats.mstats.kruskalwallis(array_samples, array_controls)
    kw_results.append(kw_test)
    n+=1
print(kw_results)

This gave (1.0, 0.31731050786291404) for every comparison, even when I changed one of the lists substantially.
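For reference, this constant result is exactly what the Kruskal-Wallis formula gives whenever each group holds a single value: with N = 2 observations the ranks are always 1 and 2, so H = 12/(2*3) * (1^2 + 2^2) - 3*3 = 1, and the p-value is the chi-squared tail probability of 1 with 1 degree of freedom, about 0.3173. A small check, using the question's pairs converted to numbers:

```python
from scipy.stats import mstats

controls = [1, 2, 3, 4, 5]
samples = [10, 20, 30, 40, 50]

results = []
for control, sample in zip(controls, samples):
    # Each "group" holds one observation, so the ranks are always 1 and 2
    H, pval = mstats.kruskalwallis([control], [sample])
    results.append((H, pval))
    print(H, pval)

# Closed form: H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with N = 2, n_i = 1
N = 2
H_manual = 12 / (N * (N + 1)) * (1**2 / 1 + 2**2 / 1) - 3 * (N + 1)
print(H_manual)  # 1.0
```

Because the ranks do not depend on the actual values, changing the numbers cannot change the result; the groups need several observations each.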

Since passing a single array returned "(0.0, nan)", I then tried collecting the arrays into a list and passing that list:

controls = ['1', '2', '3', '4', '5']
samples = ['10', '20', '30', '40', '50']
list_ = []
n=0
for item in controls:
    array_item = np.array([item, samples[n]])
    list_.append(array_item)
    n+=1
kw_test = stats.mstats.kruskalwallis(list_)
print(kw_test)

This raised the following error:

TypeError: Not implemented for this type
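The TypeError most likely comes from the values being strings ('1', '10', ...) rather than numbers, which the test cannot rank. For the comparison described in the question (all controls versus all samples), a minimal sketch, assuming the two lists hold the break counts for one region, converts them to numbers and passes each group as a separate argument:

```python
import numpy as np
from scipy.stats import mstats

controls = ['1', '2', '3', '4', '5']
samples = ['10', '20', '30', '40', '50']

# Convert the string values to numbers before testing
control_counts = np.array(controls, dtype=float)
sample_counts = np.array(samples, dtype=float)

# One argument per group: here, two groups (controls vs samples)
H, pval = mstats.kruskalwallis(control_counts, sample_counts)
print("H-statistic:", H)
print("P-value:", pval)
```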


python arrays kruskal-wallis

Annevv 21 '15 12:47


Osian · Answer 1 · 2015-07-25T11:51:04+0000

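scipy.stats.mstats.kruskalwallis takes each group as a separate argument (one array per group), not a single array containing all the data.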

If your data are in a CSV file with one group per column, you can do the following:

import pandas
from scipy.stats import mstats

Data = pandas.read_csv("CSVfile.csv")
Col_1 = Data['Colname1']
Col_2 = Data['Colname2']
Col_3 = Data['Colname3']
Col_4 = Data['Colname4']

print("Kruskal Wallis H-test test:")

H, pval = mstats.kruskalwallis(Col_1, Col_2, Col_3, Col_4)

print("H-statistic:", H)
print("P-Value:", pval)

if pval < 0.05:
    print("Reject the null hypothesis - significant differences exist between groups.")
else:
    print("Fail to reject the null hypothesis - no significant difference detected between groups.")
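If the groups have different sizes, the shorter columns of the CSV contain empty cells that pandas reads as NaN. A sketch, assuming an inline CSV (via io.StringIO) in place of CSVfile.csv and placeholder column names, drops the missing values from each column before testing; it uses scipy.stats.kruskal, which, like mstats.kruskalwallis, takes one array per group:

```python
import io

import pandas
from scipy import stats

# Inline stand-in for CSVfile.csv; column names are placeholders
csv_text = """Colname1,Colname2,Colname3
1,10,5
2,20,6
3,30,7
4,40,
5,,
"""

Data = pandas.read_csv(io.StringIO(csv_text))

# Drop the empty cells (read as NaN) so each group keeps only its own values
groups = [Data[col].dropna() for col in Data.columns]

H, pval = stats.kruskal(*groups)
print("H-statistic:", H)
print("P-Value:", pval)
```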

Patrick · Answer 2 · 2016-03-03T16:35:50+0000

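Here is a generalization of Osian's answer that reads a tab-separated file given on the command line and tests all of its columns, however many there are.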

import pandas, sys
from scipy.stats import mstats

Data = pandas.read_csv(sys.argv[1], index_col=0, sep='\t')
H, pval = mstats.kruskalwallis([Data[col] for col in Data.columns])


print("H-statistic:\t%s\nP-value:\t%s" % (H, pval))
if pval < 0.05:
    print("Reject the null hypothesis - significant differences exist between groups.")
else:
    print("Fail to reject the null hypothesis - no significant difference detected between groups.")

Ricecakes · Answer 3 · 2017-11-29T00:26:46+0000


To pass several groups to kruskal, unpack them as separate arguments with mstats.kruskalwallis(*args) (see the SciPy documentation for the Kruskal-Wallis H-test).

import numpy as np
from scipy.stats import mstats

# Use numbers rather than strings so the values can be ranked
controls = [1, 2, 3, 4, 5]
samples = [10, 20, 30, 70, 50]
list_ = []
for control, sample in zip(controls, samples):
    # One group per (control, sample) pair, as in the question
    list_.append(np.array([control, sample]))
# Unpack the list so each array is passed as a separate group
args = list(list_)
kw_test = mstats.kruskalwallis(*args)
print(kw_test)
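The * in mstats.kruskalwallis(*args) is ordinary Python argument unpacking: each element of the list becomes a separate positional argument, i.e. a separate group. A small sketch with made-up numbers showing that the two calls are equivalent:

```python
from scipy.stats import mstats

# Made-up example groups
group_a = [1, 2, 3, 4, 5]
group_b = [10, 20, 30, 70, 50]
group_c = [6, 7, 8, 9, 11]

args = [group_a, group_b, group_c]

# Unpacking the list...
H_unpacked, p_unpacked = mstats.kruskalwallis(*args)
# ...is the same call as listing the groups one by one
H_explicit, p_explicit = mstats.kruskalwallis(group_a, group_b, group_c)

print(H_unpacked == H_explicit and p_unpacked == p_explicit)  # True
```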

Applied to Patrick's code, the columns are collected into a list and passed to kruskal with *args:

import pandas, sys
from scipy.stats import mstats

Data = pandas.read_csv(sys.argv[1], index_col=0, sep='\t')
args = [Data[col] for col in Data.columns]
H, pval = mstats.kruskalwallis(*args)
