Background
I work at a research institute that studies storm surge computationally, and I am trying to automate some of our HPC commands using Bash. Currently, the process is to download data from NOAA and manually create a batch file, entering the location of each data file along with the time at which the program should read that file and the wind magnification factor. There are hundreds of these data files in each NOAA download, and a download is issued every 6 hours or so while a storm is active. This means that most of our time during a storm is spent creating these batch files.
Problem
I am limited in the tools I can use to automate this process, because I only have a user account and a monthly allocation of time on the supercomputers; I have no privileges to install new software on them. In addition, some of them are Crays, some are IBM, some are HP, and so on. There is no consistent operating system across them; the only common ground is that they are all Unix-based. So I have tools like Bash, Perl, awk, and Python, but not necessarily tools like csh, ksh, zsh, bc, et cetera:
$ bc
-bash: bc: command not found
In addition, my lead scientist has asked that all of the code I write for him be in Bash, because he understands it, with minimal calls to external programs for the things Bash can't do. For example, Bash cannot do floating-point arithmetic, and I need to be able to add floats. I can call out to Perl from Bash, but it's slow:
$ time perl -E 'printf("%.2f", 360.00 + 0.25)'
360.25
real    0m0.052s
user    0m0.015s
sys     0m0.015s
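For comparison, Bash's built-in arithmetic is integer-only, so trying the same addition natively just errors out (this is what my shell prints; the exact wording can differ between Bash versions):

$ echo $((360.00 + 0.25))
-bash: 360.00 + 0.25: syntax error: invalid arithmetic operator (error token is ".00 + 0.25")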
A twentieth of a second doesn't seem like a long time, but when I have to make this call 100 times in a single file, it works out to about 5 seconds to process one file. That is not so bad when we only do one of these every 6 hours. However, if this work is diverted to a larger task, where we run 1,000 synthetic storms in the Atlantic basin at once in order to study what could happen if the storm were stronger or took a different path, those 5 seconds quickly grow to more than an hour spent just processing text files. When you are billed by the hour, that creates a problem.
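To put rough numbers on it (my own back-of-the-envelope estimate): ~0.05 s per call × 100 calls per file ≈ 5 s per file, and 1,000 storm files × ~5 s each ≈ 5,000 s, which is well over an hour of billed wall-clock time spent doing nothing but building batch files.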
Question
What is a good way to speed this up? I currently have this for loop in a script (one that takes 5 seconds to run):
for FORECAST in $DIRNAME; do
    echo $HOURCOUNT" "$WINDMAG" "${FORECAST##*/} >> $FILENAME
    HOURCOUNT=$(echo "$HOURCOUNT $INCREMENT" | awk '{printf "%.2f", $1 + $2}')
done
I know that a single call to awk or Perl that loops over the data files will be hundreds of times faster than calling one of them once for every file in a directory, and that these languages can easily open a file and write to it; the problem I am running into is getting data back and forth between Bash and the embedded program. I have found plenty of resources on these three languages on their own (awk, Perl, Python), but nothing on embedding them in a Bash script. The closest I could come was making this shell of an awk command:
awk -v HOURCOUNT="$HOURCOUNT" -v INCREMENT="$INCREMENT" -v WINDMAG="$WINDMAG" -v DIRNAME="$DIRNAME" -v FILENAME="$FILENAME" 'BEGIN{ for (FORECAST in DIRNAME) do ... }'
But I'm not sure whether this is the correct syntax, whether it is the best way to do this, or whether it will work at all. I've been hitting my head against the wall for a few days now, and decided to ask the Internet before pressing on.
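To make the question concrete, here is roughly the shape of what I have in mind, as a sketch only (using ls to feed the file names and this particular awk body are my guesses, not tested code):

ls "$DIRNAME" | awk -v HOURCOUNT="$HOURCOUNT" -v INCREMENT="$INCREMENT" -v WINDMAG="$WINDMAG" '
{
    # one file name per input line; all arithmetic and formatting
    # happen inside this single awk process instead of once per file
    printf "%.2f %s %s\n", HOURCOUNT, WINDMAG, $0
    HOURCOUNT += INCREMENT
}' >> "$FILENAME"

The idea is that Bash only supplies the starting values and the directory listing, and awk does the per-file work, so there is one external call per batch file instead of one per data file.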