Recursively concatenating (merging) and renaming text files in a directory tree

I am using Mac OS X Lion.

I have a folder, LITERATURE, with the following structure:

    LITERATURE > Y > YATES, DORNFORD > THE BROTHER OF DAPHNE:
        Chapters 01-05.txt
        Chapters 06-10.txt
        Chapters 11-end.txt

I want to recursively merge the chapters of books that are split across several files (not all books are). Then I want to write the concatenated file to the grandparent directory of the split files (their parent's parent). The name of the concatenated file must match the name of the directory that holds the split files.

For example, after running the script (in the folder structure shown above), I should get the following.

    LITERATURE > Y > YATES, DORNFORD:
        THE BROTHER OF DAPHNE.txt
        THE BROTHER OF DAPHNE:
            Chapters 01-05.txt
            Chapters 06-10.txt
            Chapters 11-end.txt

In this example, the parent directory of the split files is THE BROTHER OF DAPHNE, and the grandparent (parent's parent) directory is YATES, DORNFORD.


[Updated March 6: rephrased the question and answer so that they are easier to find and understand.]

4 answers

It's not clear what you mean by "recursively", but this should be enough to get you started.

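    #!/bin/bash

    # Title-case the arguments. The ,, and ^ case conversions need Bash 4 or later.
    titlecase () {
        # adapted from http://stackoverflow.com/a/6969886/874188
        local arr
        arr=("${@,,}")
        echo "${arr[@]^}"
    }

    for book in LITERATURE/?/*/*; do
        # Deliberately unquoted, so that the title is split into words.
        title=$(titlecase ${book##*/})
        # Match only *.txt so that the output file is not picked up as an input.
        for file in "$book"/*.txt; do
            cat "$file"
            echo
        done >"$book/$title"
        echo '# not doing this:' rm "$book"/*.txt
    done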

This loops over LITERATURE/initial/author/BOOK TITLE and creates a file named Book Title (where should the space be added?) from the concatenated files in each book directory. (I would generate it in the parent directory and then remove the book directory completely, assuming it contains nothing else.) There is no recursion, just a loop over this directory structure.

Deleting the part files is a bit risky, so I am not doing it here. To enable it, remove the echo '# not doing this:' prefix from the line after the first done.

If you have book names that contain an asterisk or some other shell metacharacter, this gets rather trickier: the title assignment assumes that the book title can be used unquoted.

Only the parameter expansion with case conversion goes beyond the basics of Bash. Array operations may also be a little scary if you are a complete newbie. A proper understanding of quoting is also often a problem for beginners.
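For Bash 3.2 (the version that OS X Lion ships), where the case-conversion expansions are not available, a minimal sketch along the same lines could keep the directory name as the title and write the merged file to the author directory, as the question asks. The function name merge_books and the throwaway demo tree are assumptions made for illustration only.

```shell
#!/bin/bash
# Sketch only. merge_books is a hypothetical helper, not code from the answer.
merge_books () {
    local root="$1" book author title file
    for book in "$root"/?/*/*/; do
        book=${book%/}                # drop the trailing slash added by the glob
        [ -d "$book" ] || continue    # the pattern matched nothing
        set -- "$book"/*.txt          # part files, in sorted glob order
        [ -f "$1" ] || continue       # no .txt files in this directory
        author=${book%/*}             # directory that receives the merged file
        title=${book##*/}             # book directory name becomes the file name
        for file in "$@"; do
            cat "$file"
            echo                      # blank line between parts
        done >"$author/$title.txt"
    done
}

# Demo on a temporary tree that mirrors the layout in the question.
demo=$(mktemp -d)
book_dir="$demo/LITERATURE/Y/YATES, DORNFORD/THE BROTHER OF DAPHNE"
mkdir -p "$book_dir"
printf 'one\n'   >"$book_dir/Chapters 01-05.txt"
printf 'two\n'   >"$book_dir/Chapters 06-10.txt"
printf 'three\n' >"$book_dir/Chapters 11-end.txt"

merge_books "$demo/LITERATURE"
merged="$demo/LITERATURE/Y/YATES, DORNFORD/THE BROTHER OF DAPHNE.txt"
ls "$demo/LITERATURE/Y/YATES, DORNFORD"
```

Everything is quoted, so titles with spaces or shell metacharacters are handled as plain strings; the part files themselves are left in place.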

    cat Chapters*.txt > FinaleFile.txt.raw
    Chapters="$( ls -1 Chapters*.txt | sed -n 'H;${x;s/\n//g;s/ *Chapters //g;s/\.txt/ /g;s/ *$//p;}' )"
    mv FinaleFile.txt.raw "FinaleFile ${Chapters}.txt"
  • concatenate all the .txt files at once (this relies on the glob expanding in sorted name order)
  • take the chapter numbers / references from the ls listing and use sed to adapt the format
  • rename the concatenated file so that its name includes the chapters
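The same three steps can also be sketched without parsing the output of ls, by letting parameter expansion strip the prefix and the extension from each name. The helper name chapter_suffix and the sample files are hypothetical; this is an illustration under those assumptions, not a replacement for the commands above.

```shell
#!/bin/bash
# Sketch only. chapter_suffix is a hypothetical helper name.
# It turns "Chapters 01-05.txt" "Chapters 06-10.txt" into "01-05 06-10".
chapter_suffix () {
    local file ref suffix=
    for file in "$@"; do
        ref=${file##*/}          # drop any leading directory
        ref=${ref#Chapters }     # drop the "Chapters " prefix
        ref=${ref%.txt}          # drop the extension
        suffix="${suffix:+$suffix }$ref"
    done
    printf '%s\n' "$suffix"
}

# Demo in a temporary directory.
work=$(mktemp -d)
printf 'a\n' >"$work/Chapters 01-05.txt"
printf 'b\n' >"$work/Chapters 06-10.txt"
printf 'c\n' >"$work/Chapters 11-end.txt"

cat "$work"/Chapters*.txt >"$work/FinaleFile.txt.raw"
suffix=$(chapter_suffix "$work"/Chapters*.txt)
mv "$work/FinaleFile.txt.raw" "$work/FinaleFile $suffix.txt"
ls "$work"
```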

Thanks for all your input. It got me thinking, and I was able to merge the files by following these steps:


  1. This script replaces spaces in file names with underscores.

    #!/bin/bash
    # We are going to iterate through the directory tree, up to a maximum depth of 20.
    # Going one level deeper on each pass renames parent directories before their contents.
    for i in $(seq 1 20)
    do
        # In UNIX based systems, files and directories are the same (Everything is a File!).
        # The 'find' command lists all files which contain spaces in their names. The | (pipe) …
        # … forwards the list to a 'while' loop that iterates through each file in the list.
        # 'IFS= read -r' keeps leading spaces and backslashes in the names intact.
        find . -maxdepth "$i" -name '* *' | while IFS= read -r file
        do
            # Here, we use 'sed' to replace spaces in the filename with underscores.
            # The 'echo' prints a message to the console before renaming the file using 'mv'.
            item=$(echo "$file" | sed 's/ /_/g')
            echo "Renaming '$file' to '$item'"
            mv "$file" "$item"
        done
    done

  2. This script merges text files whose names start with Part, Chapter, Section, or Book.

    #!/bin/bash
    # Here, we go through all the directories (up to a depth of 20).
    # Splitting the 'find' output on whitespace is safe only because step 1 removed all spaces.
    for D in $(find . -maxdepth 20 -type d)
    do
        # Check if the parent directory contains any files of interest.
        if ls "$D"/Part*.txt &>/dev/null || ls "$D"/Chapter*.txt &>/dev/null ||
           ls "$D"/Section*.txt &>/dev/null || ls "$D"/Book*.txt &>/dev/null
        then
            # If we get here, then there are split files in the directory; we will concatenate them.
            # First, we trim the full directory path ($D) so that we are left with the path to the …
            # … files' parent parent directory. We will write the concatenated file here. (✝)
            ppdir="$(dirname "$D")"
            # Here, we concatenate the files using 'cat'. The 'basename' command extracts the name of …
            # … the parent directory from the full directory path ($D) and gives us the filename.
            # Finally, we write the concatenated file to its parent parent directory. (✝)
            cat "$D"/*.txt > "$ppdir/$(basename "$D").txt"
        fi
    done

  3. Now we delete all the split files that we have merged, so that their parent directory is left empty.

    • find . -name 'Part*' -delete
    • find . -name 'Chapter*' -delete
    • find . -name 'Section*' -delete
    • find . -name 'Book*' -delete

  4. The following command deletes empty directories. (✝) We wrote the concatenated file to the parent parent directory so that the parent directory is left empty once all the split files have been deleted.

      • find . -type d -empty -delete
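The merge step above can also be sketched as one recursive function that quotes every path, so spaces do not have to be replaced first. This is a minimal sketch, not the procedure described above: merge_tree and the demo tree are hypothetical names, hidden directories are skipped, and the split files are left in place (the find ... -delete commands can still be run afterwards).

```shell
#!/bin/bash
# Sketch only. merge_tree is a hypothetical helper name.
merge_tree () {
    local dir="$1" sub f found=
    # Does this directory hold any split files of interest?
    for f in "$dir"/Part*.txt "$dir"/Chapter*.txt "$dir"/Section*.txt "$dir"/Book*.txt; do
        if [ -f "$f" ]; then found=1; break; fi
    done
    # If so, write "<directory name>.txt" next to the directory itself.
    if [ -n "$found" ]; then
        cat "$dir"/*.txt >"$(dirname "$dir")/$(basename "$dir").txt"
    fi
    # Recurse into subdirectories (the trailing slash matches directories only).
    for sub in "$dir"/*/; do
        if [ -d "$sub" ]; then merge_tree "${sub%/}"; fi
    done
}

# Demo on a temporary tree whose names contain spaces.
demo=$(mktemp -d)
book_dir="$demo/Y/YATES, DORNFORD/THE BROTHER OF DAPHNE"
mkdir -p "$book_dir"
printf 'one\n' >"$book_dir/Chapters 01-05.txt"
printf 'two\n' >"$book_dir/Chapters 06-10.txt"

merge_tree "$demo"
merged="$demo/Y/YATES, DORNFORD/THE BROTHER OF DAPHNE.txt"
cat "$merged"
```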



The shell does not cope well with spaces in file names. However, over the years, Unix has come up with some tricks to help:

 $ find . -name "Chapters*.txt" -type f -print0 | xargs -0 cat >> final_file.txt 

This should do what you want.

find recursively locates all directory entries in the file tree that match the query (in this case, the type must be a regular file and the name must match the Chapters*.txt pattern).

Normally, find separates the names it prints with a newline (NL), but -print0 tells it to separate them with NUL instead. NL is a valid character in a file name, but NUL is not.

The xargs command reads the output of find and processes it. xargs collects all the names and passes them in bulk to the command you give it, in this case cat.

By default xargs splits its input on whitespace, which means that Chapters would be treated as one file and 01-05.txt as another. However, -0 tells xargs to use NUL as the separator, which is exactly what -print0 produces.
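One caveat worth illustrating: find does not promise to list names in sorted order, so the chapters may be concatenated out of sequence. A minimal sketch of one way around this is to sort the NUL-separated list before handing it to xargs. It assumes a sort that supports the -z option (GNU coreutils does; other implementations may not), and the sample files are made up for the demo.

```shell
#!/bin/bash
# Sketch only: NUL-separated names survive spaces, and sort -z fixes the order.
# Assumption: the local 'sort' supports -z (NUL-terminated records).
work=$(mktemp -d)
printf 'first\n'  >"$work/Chapters 01-05.txt"
printf 'second\n' >"$work/Chapters 06-10.txt"
printf 'third\n'  >"$work/Chapters 11-end.txt"

find "$work" -name 'Chapters*.txt' -type f -print0 |
    sort -z |
    xargs -0 cat >"$work/final_file.txt"

cat "$work/final_file.txt"
```

Note that, like the one-liner above, this merges every matching file under the starting directory into a single output file; merging per book still needs a loop over the book directories.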


Source: https://habr.com/ru/post/894460/

