How to calculate the sum of numbers in lines in a space delimited file?

I have a space delimited file that looks like this:

 probeset_id submitted_id chr snp_pos alleleA alleleB 562_201 562_202 562_203 562_204 562_205 562_206 562_207 562_208 562_209 562_210 562_211 562_212 562_213 562_214 562_215 562_216 562_217 562_218 562_219 562_220 562_221 562_222 562_223 562_224 562_225 562_226 562_227 562_228 562_229 562_230 562_231 562_232 562_233 562_234 562_235 562_236 562_237 562_238 562_239 562_240 562_241 562_242 562_243 562_244 562_245 562_246 562_247 562_248 562_249 562_250 562_251 562_252 562_253 562_254 562_255 562_256 562_257 562_258 562_259 562_260 562_261 562_262 562_263 562_264 562_265 562_266 562_267 562_268 562_269 562_270 562_271 562_272 562_273 562_274 562_275 562_276 562_277 562_278 562_279 562_280 562_281 562_283 562_284 562_285 562_289 562_291 562_292 562_294 562_295 562_296 562_400 562_401 562_402 562_403 562_404 562_405
 AX-75448119 Chr1_41908741 1 41908741 T C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 1 1 0 1 0 0 0 0 2 2 0 0 0 0 0 1 0 0 0 0 0
 AX-75448118 Chr1_41908545 1 41908545 T C 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 2 2 2 0 1 1 1 2 -1 1 2 0 0 2 1 1 0 1 0 1 2 1 0 0 1 2 2 1 2 2 0 1 2 2 2 2 2 2 0 1 0 0 0 1 2 2 2 2 0

What I would like is the sum of all the numbers in each line. If a line contains a negative number (only -1 occurs), it should simply be ignored. So I would like to get this as a result:

 AX-75448119 Chr1_41908741 1 41908741 T C 13

(which is 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 2 + 2 + 1)

and

 AX-75448118 Chr1_41908545 1 41908545 T C 98

where, in this case, the -1 is ignored.

I was thinking about using awk on Linux, which I usually use for space-delimited files, but I only know how to use it on columns, not on rows.

4 answers

Perhaps this is what you are looking for, using pure awk:

 awk 'NR >=2 {for (i=7;i<=NF;i++) if ($i !~ /^-/) sum += $i; print $1,$2,$3,$4,$5,$6,sum; sum = 0}' data.txt 

Output:

 AX-75448119 Chr1_41908741 1 41908741 T C 13
 AX-75448118 Chr1_41908545 1 41908545 T C 98
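
For readers who, like the asker, have only used awk column-wise: awk runs its block once per input line, and NF holds the number of fields on that line, so a for loop over $i walks across a row. Below is a commented sketch of the same idea wrapped in a shell function. The function name sum_rows and the awk variable first are illustrative choices of mine, not part of the answer above, and the sketch assumes a header on line 1 and six identifier columns before the numbers.

```shell
#!/bin/sh
# sum_rows (hypothetical helper): print the identifier columns of each data
# row followed by the sum of its non-negative values.
# Assumptions: line 1 is a header; columns 1-6 are identifiers.
sum_rows() {
    awk -v first=7 '
        NR == 1 { next }                      # skip the header line
        {
            sum = 0
            for (i = first; i <= NF; i++)     # NF = number of fields on this line
                if ($i >= 0) sum += $i        # ignore negative values such as -1
            out = $1
            for (i = 2; i < first; i++)       # rebuild the identifier prefix
                out = out " " $i
            print out, sum
        }
    ' "$@"
}
```

Called as sum_rows data.txt (or with the data on standard input), this should print the two sample rows ending in 13 and 98, provided the file really has six identifier columns. If the layout differs, change the value passed with -v first=.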

I would suggest a Perl script:

 #!/usr/bin/env perl
 use strict;
 use warnings;

 while (<>) {
     my ($line, $sum, $next);
     # Repeat while there are two (or more) integers after the "... T C" prefix:
     while (/^(AX-\d+\s+\S+\s+\d+\s+\d+\s+\w+\s+\w+\s+)(\d+)\s+(-?\d+)/) {
         $line = $1;
         $sum  = $2;
         $next = $3;
         $sum += $next if ($next > 0);   # do not add negative numbers
         # Replace the two integers by their sum.
         s/\Q$line\E\d+\s+$next/$line$sum/;
     }
     print;
 }

which you can run as: cat data | ./script.pl

I get:

 probeset_id submitted_id chr snp_pos alleleA alleleB 562_201 562_202 562_203 562_204 562_205 562_206 562_207 562_208 562_209 562_210 562_211 562_212 562_213 562_214 562_215 562_216 562_217 562_218 562_219 562_220 562_221 562_222 562_223 562_224 562_225 562_226 562_227 562_228 562_229 562_230 562_231 562_232 562_233 562_234 562_235 562_236 562_237 562_238 562_239 562_240 562_241 562_242 562_243 562_244 562_245 562_246 562_247 562_248 562_249 562_250 562_251 562_252 562_253 562_254 562_255 562_256 562_257 562_258 562_259 562_260 562_261 562_262 562_263 562_264 562_265 562_266 562_267 562_268 562_269 562_270 562_271 562_272 562_273 562_274 562_275 562_276 562_277 562_278 562_279 562_280 562_281 562_283 562_284 562_285 562_289 562_291 562_292 562_294 562_295 562_296 562_400 562_401 562_402 562_403 562_404 562_405
 AX-75448119 Chr1_41908741 1 41908741 T C 13
 AX-75448118 Chr1_41908545 1 41908545 T C 98

If you really wanted to avoid Perl (why?), you could do this hackish thing, which obviously doesn't work too well:

 while read f1 f2 f3 f4 f5 f6 line
 do
     echo "$f1 $f2 $f3 $f4 $f5 $f6 $(echo "$line" | xargs -n1 | grep -v '^-' | paste -sd+ | bc)"
 done < input

I get:

 AX-75448119 Chr1_41908741 1 41908741 T C 13
 AX-75448118 Chr1_41908545 1 41908545 T C 98
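
One reason a loop like this is slow is that it starts several processes (xargs, grep, paste, bc) for every line. As a rough sketch of an alternative, the same sum can be computed with the shell's built-in arithmetic. The function name sum_rows_sh is hypothetical, and the sketch assumes a header on the first line, six identifier columns, and values that are plain integers without leading zeros.

```shell
#!/bin/sh
# sum_rows_sh (hypothetical helper): read rows on standard input, skip the
# header, and print the six identifier fields plus the sum of the
# non-negative values, using only shell built-ins.
sum_rows_sh() {
    {
        read -r _header                       # discard the header line
        while read -r f1 f2 f3 f4 f5 f6 rest; do
            total=0
            for v in $rest; do                # unquoted on purpose: split on spaces
                case $v in
                    -*) ;;                    # ignore negative values such as -1
                    *)  total=$((total + $v)) ;;
                esac
            done
            echo "$f1 $f2 $f3 $f4 $f5 $f6 $total"
        done
    }
}
```

It would be called as sum_rows_sh < input. For a large file, awk is still likely to be faster than a shell loop.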

A slightly modified version of @steve's awk solution:

 awk '
 NR > 1 {
     s = 0
     for (i = 7; i <= NF; i++) {
         if ($i != -1) {
             s += $i
         }
     }
     for (j = 1; j < 7; j++) {
         printf("%s ", $j)
     }
     print s
 }' file

Test:

 [jaypal:~/Temp] awk '
 NR > 1 {
     s = 0
     for (i = 7; i <= NF; i++) {
         if ($i != -1) {
             s += $i
         }
     }
     for (j = 1; j < 7; j++) {
         printf("%s ", $j)
     }
     print s
 }' file
 AX-75448119 Chr1_41908741 1 41908741 T C 13
 AX-75448118 Chr1_41908545 1 41908545 T C 98

Source: https://habr.com/ru/post/1392435/

