My initial observations look like this:
name analyte
spring 0.1
winter 0.4
To calculate the p-value, I ran a bootstrap simulation:
name analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0
.....
Now I want to calculate the empirical p-value. In the initial data, the winter analyte is 0.4. If the winter analyte in the simulated data was >= 0.4 in, for example, 1 run, and the simulation was run, for example, 100 times, then the empirical p-value for the winter analyte is:
1/100 = 0.01
(The number of times the simulated value was the same as or higher than the value in the original data, divided by the total number of simulated observations.) For the spring analyte, the p-value is:
2/100 = 0.02
I want to calculate these p-values with awk. My solution for spring:
awk -v VAR="spring" '($1==VAR && $2>=0.1) {n++} END {print VAR,"p-value=",n/100}'
spring p-value = 0.02
I need help passing the original file (with the names spring and winter and their analyte values) to awk, so that the observed values and the number of observations are assigned from the data instead of being hard-coded.
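One possible approach, as a minimal sketch: give awk both files and let it read the observed values from the first one before scanning the simulated values in the second. The file names observed.txt and simulated.txt, the function name empirical_p, and the small sample data are assumptions made for illustration. The sketch also assumes that both files start with a "name analyte" header line and that every simulation run contributes exactly one row per name, so the row count per name equals the number of runs.

```shell
# Hypothetical sample files; replace them with the real data.
cat > observed.txt <<'EOF'
name analyte
spring 0.1
winter 0.4
EOF

cat > simulated.txt <<'EOF'
name analyte
spring 0.001
winter 0
spring 0.2
winter 0.5
spring 0.1
winter 0.04
spring 0
winter 0.06
EOF

# empirical_p OBSERVED SIMULATED (hypothetical helper name)
empirical_p() {
    awk '
    FNR == 1  { next }                    # skip the header line of each file
    NR == FNR { obs[$1] = $2; next }      # first file: remember observed value per name
    $1 in obs {
        total[$1]++                       # simulated rows seen for this name
        if ($2 + 0 >= obs[$1] + 0)        # "+ 0" forces a numeric comparison
            hits[$1]++
    }
    END {
        for (name in total)
            print name, "p-value=", hits[name] / total[name]
    }' "$1" "$2"
}

# The order of "for (name in total)" is unspecified, so sort the output.
empirical_p observed.txt simulated.txt | sort
# spring p-value= 0.5
# winter p-value= 0.25
```

With the sample data, 2 of the 4 simulated spring values are >= 0.1 and 1 of the 4 simulated winter values is >= 0.4. If the number of runs is known separately (for example, 100), it could instead be passed with -v and used as the divisor.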