Using JavaScript to rename multiple HTML files using the <TITLE></TITLE> in each file

I used HTTRACK to download federal rules from a government website, and the resulting HTML files are not intuitively named. Each file contains a pair of <TITLE></TITLE> tags whose content would serve well as a file name, which in turn would help when assembling the files into an e-book. I want to turn these rules into an e-book for my Kindle so that I can easily look things up without carrying volumes of books with me.

My preferred text/hex editor, UltraEdit Professional 15.20.0.1026, supports scripting through an embedded JavaScript engine. While researching possible solutions to my problem, I found xmlTitleSave on the IDM UltraEdit website.

// ----------------------------------------------------------------------------
// Script Name: xmlTitleSave.js
// Creation Date: 2008-06-09
// Last Modified: 
// Copyright: none
// Purpose: find the <title> value in an XML document, then saves the file as the 
// title.xml in a user-specified directory
// ----------------------------------------------------------------------------

//Some variables we need
var regex = "<title>(.*)</title>" //Perl regular expression to find title string
var file_path = UltraEdit.getString("Path to save file at? !! MUST PRE EXIST !!",1);

// Start at the beginning of the file
UltraEdit.activeDocument.top();

UltraEdit.activeDocument.unicodeToASCII();

// Turn on regular expressions
UltraEdit.activeDocument.findReplace.regExp = true;

// Find it
UltraEdit.activeDocument.findReplace.find(regex);

// Load it into a selection
var titl = UltraEdit.activeDocument.selection;

// Javascript function 'match' will match the regex within the javascript engine 
// so we can extract the actual title via array
t = titl.match(regex);

// 't' is an array of the match from 'titl' based on the var 'regex'
// the 2nd value of the array gives us what we need... then append '.xml'
saveTitle = t[1]+".xml";

UltraEdit.saveAs(file_path + saveTitle);

// Uncomment for debugging
// UltraEdit.outputWindow.write("titl = " + titl);
// UltraEdit.outputWindow.write("t = " + t);

My question is twofold:

  • Can this JavaScript be modified to extract the <TITLE></TITLE> contents from an HTML file and rename the files accordingly?
  • If the JavaScript cannot be easily modified, is there a script / program / black magic / animal sacrifice that can do the same thing?

EDIT: I got the script to work as desired by deleting the line UltraEdit.activeDocument.unicodeToASCII(); and changing the file extension to .html. My only remaining problem is that the script works on individual open files but does not batch-process a directory.

4 answers

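In the UltraEdit script, the regex <title>(.*)</title> is not specific to XML; it matches the title in HTML as well.

Just change the extension from .xml to .html:

saveTitle = t[1]+".html";

With that change, the script (run in UltraEdit) should work on HTML files.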
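
One caveat with the match step in the script: when String.match is given a plain string, JavaScript builds a case-sensitive regular expression from it, so an uppercase <TITLE> tag would not match and t[1] would fail. Below is a minimal sketch of a more tolerant extraction in plain JavaScript. The function name extractTitle and the sample string are made up for illustration and are not part of the IDM script.

```javascript
// Hypothetical helper: return the text between <title> and </title>.
// The "i" flag accepts <TITLE> as well as <title>; [\s\S]*? is non-greedy
// and, unlike ".", also spans line breaks.
function extractTitle(html) {
  var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (match === null) {
    return null; // no title element found
  }
  // Collapse runs of whitespace (including newlines) into single spaces.
  return match[1].replace(/\s+/g, " ").trim();
}

// Made-up sample input with an uppercase tag and a line break in the title.
var sample = "<HTML><HEAD><TITLE>Example Title\n Part 1</TITLE></HEAD></HTML>";
console.log(extractTitle(sample)); // prints "Example Title Part 1"
```

Returning null instead of throwing lets the caller skip files that have no title rather than aborting the whole run.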

+1

As for "another script or program": this is easy in most scripting languages. In Ruby, for example:

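require 'fileutils'

dir = "/your/directory"
files = Dir["#{dir}/*.html"]

files.each do |file|
  html = IO.read(file)
  # Skip files that have no usable <title>
  next unless html =~ /<title>([^<]+)<\/title>/i
  title = $1.strip
  # FileUtils.mv takes source and destination as two separate arguments
  FileUtils.mv(file, "#{dir}/#{title}.html")
  puts "Renamed #{file} to #{title}.html."
end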

The UltraEdit script works too, but for a whole directory a standalone script like this, run from your own environment, is simpler.
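
Since the question asks for JavaScript, the same directory-wide rename can be sketched in that language as well. This assumes Node.js is installed and uses its standard fs and path modules; it does not use UltraEdit's scripting API. The function name renameByTitle and the directory path are placeholders.

```javascript
// Assumes Node.js; this is not an UltraEdit script.
var fs = require("fs");
var path = require("path");

// Hypothetical helper: rename every .html file in `dir` after its <title>.
// Returns a list of "old -> new" strings for the files it renamed.
function renameByTitle(dir) {
  var renamed = [];
  fs.readdirSync(dir).forEach(function (name) {
    var oldPath = path.join(dir, name);
    if (path.extname(name).toLowerCase() !== ".html" ||
        !fs.statSync(oldPath).isFile()) {
      return; // skip anything that is not a regular .html file
    }
    var html = fs.readFileSync(oldPath, "utf8");
    var match = /<title[^>]*>([^<]+)<\/title>/i.exec(html);
    if (match === null) {
      return; // leave files without a title untouched
    }
    // Replace characters that Windows does not allow in file names,
    // then collapse whitespace.
    var title = match[1]
      .replace(/[\\\/:*?"<>|]/g, "_")
      .replace(/\s+/g, " ")
      .trim();
    if (title === "") {
      return;
    }
    var newPath = path.join(dir, title + ".html");
    if (newPath === oldPath || fs.existsSync(newPath)) {
      return; // never overwrite an existing file
    }
    fs.renameSync(oldPath, newPath);
    renamed.push(name + " -> " + title + ".html");
  });
  return renamed;
}

// Usage (the path is a placeholder):
// console.log(renameByTitle("/your/directory"));
```

Titles often contain characters such as ":" or "?" that are invalid in Windows file names, which is why the sketch substitutes an underscore for them. Two files with the same title are not merged; the second one simply keeps its old name.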

+2

XML and HTML are similar enough that the script should work for both; all you need to do is change this line:

saveTitle = t[1]+".xml";

to:

saveTitle = t[1]+".html";
+1

For Windows, I found a program that can rename files by their TITLE tag: Flexible Renamer 8.3, available at http://hp.vector.co.jp/authors/VA014830/english/FlexRena/. Thanks to @coreyward and @Yuji for the help.

0

Source: https://habr.com/ru/post/1785569/

