Background
I work at a research institute that studies storm surge computationally, and I am trying to automate some of our HPC commands using Bash. Currently, the process is to download data from NOAA and manually create a batch file, entering the location of each data file along with the time at which the program should read that file and the wind magnification factor. There are hundreds of these data files in each NOAA download, and a download is issued every 6 hours or so while a storm is active. This means that most of our time during a storm is spent creating these batch files.
Problem
I am limited in the tools I can use to automate this process, because I only have a user account and a monthly allocation of time on the supercomputers; I have no privileges to install new software on them. In addition, some of them are Crays, some are IBM, some are HP, and so on. There is no consistent operating system across them; the only common ground is that they are all Unix-based. So I have tools like Bash, Perl, awk, and Python, but not necessarily tools like csh, ksh, zsh, bc, et cetera:
$ bc
-bash: bc: command not found
In addition, my lead scientist has asked that all of the code I write for him be in Bash, because he understands it, with minimal calls to external programs for the things Bash can't do. For example, Bash cannot do floating-point arithmetic, and I need to be able to add floats. I can call out to Perl from Bash, but it's slow:
$ time perl -E 'printf("%.2f", 360.00 + 0.25)'
360.25
real    0m0.052s
user    0m0.015s
sys     0m0.015s
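For comparison, Bash's built-in arithmetic is integer-only, so trying the same addition natively just errors out (this is what my shell prints; the exact wording can differ between Bash versions):

$ echo $((360.00 + 0.25))
-bash: 360.00 + 0.25: syntax error: invalid arithmetic operator (error token is ".00 + 0.25")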
A twentieth of a second doesn't seem like a long time, but when I have to make this call 100 times in a single file, it works out to about 5 seconds to process one file. That is not so bad when we only do one of these every 6 hours. However, if this work is diverted to a larger task, where we run 1,000 synthetic storms in the Atlantic basin at once in order to study what could happen if the storm were stronger or took a different path, those 5 seconds quickly grow to more than an hour spent just processing text files. When you are billed by the hour, that creates a problem.
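To put rough numbers on it (my own back-of-the-envelope estimate): ~0.05 s per call × 100 calls per file ≈ 5 s per file, and 1,000 storm files × ~5 s each ≈ 5,000 s, which is well over an hour of billed wall-clock time spent doing nothing but building batch files.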
Question
What is a good way to speed this up? I currently have this for loop in a script (one that takes 5 seconds to run):
for FORECAST in $DIRNAME; do
    echo $HOURCOUNT" "$WINDMAG" "${FORECAST##*/} >> $FILENAME
    HOURCOUNT=$(echo "$HOURCOUNT $INCREMENT" | awk '{printf "%.2f", $1 + $2}')
done
I know that a single call to awk or Perl that loops over the data files will be hundreds of times faster than calling one of them once for every file in a directory, and that these languages can easily open a file and write to it; the problem I am running into is getting data back and forth between Bash and the embedded program. I have found plenty of resources on these three languages on their own (awk, Perl, Python), but nothing on embedding them in a Bash script. The closest I could come was making this shell of an awk command:
awk -v HOURCOUNT="$HOURCOUNT" -v INCREMENT="$INCREMENT" -v WINDMAG="$WINDMAG" -v DIRNAME="$DIRNAME" -v FILENAME="$FILENAME" 'BEGIN{ for (FORECAST in DIRNAME) do ... }'
But I'm not sure whether this is the correct syntax, whether it is the best way to do this, or whether it will work at all. I've been hitting my head against the wall for a few days now, and decided to ask the Internet before pressing on.
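To make the question concrete, here is roughly the shape of what I have in mind, as a sketch only (using ls to feed the file names and this particular awk body are my guesses, not tested code):

ls "$DIRNAME" | awk -v HOURCOUNT="$HOURCOUNT" -v INCREMENT="$INCREMENT" -v WINDMAG="$WINDMAG" '
{
    # one file name per input line; all arithmetic and formatting
    # happen inside this single awk process instead of once per file
    printf "%.2f %s %s\n", HOURCOUNT, WINDMAG, $0
    HOURCOUNT += INCREMENT
}' >> "$FILENAME"

The idea is that Bash only supplies the starting values and the directory listing, and awk does the per-file work, so there is one external call per batch file instead of one per data file.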