I used HTTrack to download federal rules from a government website, and the resulting HTML files were not intuitively named. Each file has a <TITLE></TITLE> tag pair whose content would serve well as the file name when I assemble the files into e-books. I want to turn these rules into an e-book for my Kindle so that I can move around in them easily and not carry volumes of books with me.
My preferred text / hex editor, UltraEdit Professional 15.20.0.1026, supports scripting through an embedded JavaScript engine. While researching possible solutions to my problem, I found the xmlTitleSave script on the IDM UltraEdit website:
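// Pattern whose first capture group is the text between the title tags.
var regex = "<title>(.*)</title>";
// Ask for the output folder; it must already exist and end with a path separator.
var file_path = UltraEdit.getString("Path to save file at? !! MUST PRE EXIST !!",1);
// Search from the top of the active document.
UltraEdit.activeDocument.top();
UltraEdit.activeDocument.unicodeToASCII();
UltraEdit.activeDocument.findReplace.regExp = true;
UltraEdit.activeDocument.findReplace.find(regex);
// The find leaves the matched tag pair selected; pull the title out of it.
var titl = UltraEdit.activeDocument.selection;
var t = titl.match(regex);
var saveTitle = t[1]+".xml";
// Save the active document under the title-based name.
UltraEdit.saveAs(file_path + saveTitle);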
My question is twofold:
- Can I change this JavaScript to extract the content of <TITLE></TITLE> from an HTML file and rename the file accordingly?
- If the JavaScript cannot be easily changed, is there a script / program / black magic / animal sacrifice that can do the same thing?
EDIT: I was able to get the script to work as desired by deleting the line UltraEdit.activeDocument.unicodeToASCII(); and changing the file extension to .html. My only remaining problem is that the script works on an individual open file but does not batch-process all the files in a directory.