Here's an example bash script:
#!/bin/bash

LOCKFILE=/tmp/rmHugeNumberOfFiles.lock

# this process gets ultra-low priority
if ! ionice -c2 -n7 -p $$ > /dev/null; then
    echo "Could not set disk IO priority. Exiting..."
    exit 1
fi
if ! renice +19 -p $$ > /dev/null; then
    echo "Could not renice process. Exiting..."
    exit 1
fi

# check if there is an instance running already. If so -- exit
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    echo "An instance of this script is already running."
    exit 1
fi

# create a tempfile, and make sure both it and the lockfile
# are removed when we exit. Then: claim the lock
tmp=$(mktemp) || exit 1
trap 'command rm -f -- "$LOCKFILE" "$tmp"; exit' INT TERM EXIT
echo $$ > "$LOCKFILE"

# ----------------------------------------
# option 1
# ----------------------------------------
# find your specific files
find "$1" -type f [INSERT SPECIFIC SEARCH PATTERN HERE] > "$tmp"
xargs rm -- < "$tmp"

# ----------------------------------------
# option 2
# ----------------------------------------
command rm -r "$1"

# remove the lockfile and tempfile
command rm -f -- "$tmp" "$LOCKFILE"
This script starts by setting its process priority and disk I/O priority to very low values, to ensure that other running processes are not affected. It then ensures that it is the only such process running.
The core of the script depends on your preference. You can use rm -r if you are sure the entire directory can be deleted outright (option 2), or you can use find to delete files more selectively (option 1, perhaps passing the search pattern via "$2" and onwards for convenience).
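To illustrate the "$2" and onwards idea, here is a hedged sketch of how extra command-line arguments could be forwarded to find as the search pattern; the directory name and pattern below are made up for the demo, not taken from the original script:

```shell
# Sketch: simulate positional args as a caller might pass them,
# e.g.  ./script /tmp/demo_dir -name '*.log'
set -- /tmp/demo_dir -name '*.log'   # $1 = target dir, rest = find filters
mkdir -p "$1"
touch "$1/a.log" "$1/b.txt"
dir=$1
shift                                # drop $1; "$@" now holds only the filters
# forward the remaining arguments to find as its search expression
find "$dir" -type f "$@" -exec rm -- {} +
```

After this runs, only the files matching the forwarded pattern (here, a.log) are gone; b.txt survives.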
In the implementation above, option 1 (find) first writes everything to the temp file, so rm is invoked only once rather than once per file found by find. When the number of files is really huge, this can be a significant time saver. On the other hand, the size of the temp file may become a problem, though that is only likely if you are deleting literally billions of files. Also, because the disk I/O has such a low priority, using a temp file followed by a single rm may end up being slower overall than the find (...) -exec rm {} \; variant. As always, you should experiment a bit to find out what best suits your needs.
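A related middle ground, not shown in the script above, is find's "+" terminator: it batches many paths into a single rm invocation, avoiding both the tempfile and the one-process-per-file cost of {} \;. A minimal sketch (the demo directory and filenames are made up):

```shell
# create a throwaway directory with a few files
mkdir -p /tmp/demo_batch
touch /tmp/demo_batch/f1 /tmp/demo_batch/f2 /tmp/demo_batch/f3
# "+" appends as many paths as fit into one rm command line,
# instead of spawning one rm per file as "\;" would
find /tmp/demo_batch -type f -exec rm -- {} +
```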
EDIT: as suggested in the comments, you can also skip the tempfile entirely and use find (...) -print0 | xargs -0 rm. Here the file paths travel through memory (a pipe) rather than via a file on disk, with xargs holding batches of paths in RAM while find runs; on the other hand, there is no additional file I/O from writing to the tempfile. Which one to choose depends on your use case.
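The tempfile-free variant can be sketched as follows; -print0 and -0 use NUL separators, so filenames containing spaces or newlines are handled safely (the demo directory and filenames below are made up):

```shell
# create a throwaway directory with awkward filenames
mkdir -p '/tmp/demo xargs'
touch '/tmp/demo xargs/file one' '/tmp/demo xargs/file two'
# stream NUL-delimited paths from find to xargs, which calls rm in batches
find '/tmp/demo xargs' -type f -print0 | xargs -0 rm --
```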