Regex - search for elements with attribute name but not id

This problem came up today. I work on a web project (Struts 2) with a lot of JSPs, and most input, select, table and a elements are defined with only the name attribute and no id, for example:

<input name="myname" class="myclass" value="" type="text"/>

So far, so good, except that, unfortunately, there is a lot of JavaScript validation for these fields and, as far as I could tell from the code I read before leaving, most of it refers to the elements with document.getElementById.

The catch is that this is an old application (not that old, actually), compatible only with IE 6 and IE 7. (I did not search the web to understand how IE manages to find an element that has only the name attribute, but I assume it does something.) No wonder every other browser complains and cries.

So I'm looking for a simple solution: find all the JSPs that define input, select, table and a elements with a name attribute but no id, so that I can fix the HTML.

Using my good friend http://rubular.com I came up with the following:

/<(?:(input|select|a|table))\s+((?!id).)*>

This captures every one of those elements that has no id. But how can I require that only those with a name are matched?

Oh, another important point: each element is defined on a single line, so most likely there is nothing like this:

 <input name="..."
        class="..."/>
4 answers

Try the following:

<(?:input|select|a|table)\s+(?=[^>]*\bname\s*=)(?![^>]*\bid\s*=)[^>]*>

Explanation:

 <                          "<"
 (?:input|select|a|table)   One of "input", "select", "a", "table"
 \s+                        Whitespace
 (?=                        Positive lookahead
   [^>]*                    Anything up to but excluding ">"
   \b                       Word boundary
   name                     "name"
   \s*                      Possible whitespace
   =                        "="
 )
 (?!                        Negative lookahead
   [^>]*                    Anything up to but excluding ">"
   \b                       Word boundary
   id                       "id"
   \s*                      Possible whitespace
   =                        "="
 )
 [^>]*                      Anything up to but excluding ">"
 >                          ">"
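To see how the two lookaheads behave, here is a small JavaScript sketch that runs the pattern against a few sample tags. The sample lines are made up for illustration; only the pattern itself comes from this answer.

```javascript
// Pattern from the answer: a tag with a name attribute but no id attribute.
const tagPattern =
  /<(?:input|select|a|table)\s+(?=[^>]*\bname\s*=)(?![^>]*\bid\s*=)[^>]*>/;

// Hypothetical sample lines, as they might appear in a JSP.
const samples = [
  '<input name="myname" class="myclass" value="" type="text"/>', // name only: match
  '<input id="myname" name="myname" type="text"/>',              // name and id: no match
  '<select class="wide">',                                        // no name: no match
  '<table name="grid" width="100%">',                             // "width" is not an id attribute: match
];

// One boolean per sample: true where the tag still needs an id.
const results = samples.map((line) => tagPattern.test(line));
console.log(results);
```

One caveat worth knowing: because "-" is not a word character, `\bid\s*=` also matches inside `data-id=`, so a tag carrying only a `data-id` attribute would be treated as if it already had an id.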

Disclaimer:

Everyone will tell you NOT to use regex for parsing HTML, and they are right. However, the following regex solution should do a pretty decent job for a one-time task (if 100% reliability is not a concern).

Regex to match a tag that has one attribute but not another:

The following PHP test script uses a (fully commented) regular expression to match the start tags of INPUT, SELECT, TABLE and A elements that have a NAME attribute but no ID attribute. The script inserts into each matching start tag a new ID attribute whose value is the same as the existing NAME attribute:

 <?php // test.php Rev:20121107_2100
 $re = '%
     # Match HTML 4.01 element start tags with NAME but no ID attrib.
     (                         # $1: Everything up to tag close delimiter.
       <                       # Start tag open delimiter.
       (?:input|select|table|a)\b  # Element name.
       (?:                     # Zero or more attributes before NAME.
         \s+                   # Attributes are separated by whitespace.
         (?!name\b|id\b)       # Only non-NAME, non-ID before NAME attrib.
         [A-Za-z][\w\-:.]*     # Attribute name is required.
         (?:                   # Attribute value is optional.
           \s*=\s*             # Name and value separated by =
           (?:                 # Group for value alternatives.
             "[^"]*"           # Either a double-quoted string,
           | \'[^\']*\'        # or a single-quoted string,
           | [\w\-:.]+         # or a non-quoted string.
           )                   # End group of value alternatives.
         )?                    # Attribute value is optional.
       )*                      # Zero or more attributes before NAME.
       \s+                     # NAME attribute is separated by whitespace.
       name                    # NAME attribute name is required.
       \s*=\s*                 # Name and value separated by =
       (                       # $2: NAME value.
         "[^"]*"               # Either a double-quoted string,
       | \'[^\']*\'            # or a single-quoted string,
       | [\w\-:.]+             # or a non-quoted string.
       )                       # $2: NAME value.
       (?:                     # Zero or more attributes after NAME.
         \s+                   # Attributes are separated by whitespace.
         (?!id\b)              # Only non-ID attribs after NAME attrib.
         [A-Za-z][\w\-:.]*     # Attribute name is required.
         (?:                   # Attribute value is optional.
           \s*=\s*             # Name and value separated by =
           (?:                 # Group for value alternatives.
             "[^"]*"           # Either a double-quoted string,
           | \'[^\']*\'        # or a single-quoted string,
           | [\w\-:.]+         # or a non-quoted string.
           )                   # End group of value alternatives.
         )?                    # Attribute value is optional.
       )*                      # Zero or more attributes after NAME.
     )                         # $1: Everything up to close delimiter.
                               # Insert missing ID attribute here...
     (\s*/?>)                  # $3: Start tag close delimiter.
     %ix';
 $html = file_get_contents('testdata.html');
 $html = preg_replace($re, "$1 id=$2$3", $html);
 file_put_contents('testdata_out.html', $html);
 ?>
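For readers who want to try the same idea outside PHP, here is a rough JavaScript port. It is only a sketch: JavaScript's RegExp has no free-spacing mode like PCRE's x flag, so the pattern is assembled from commented string pieces instead. The helper names (valueAlt, attrs, addMissingIds) are made up for this example and do not appear in the PHP script.

```javascript
// Attribute value alternatives: double-quoted, single-quoted, or unquoted.
const valueAlt = String.raw`"[^"]*"|'[^']*'|[\w\-:.]+`;

// Zero or more attributes whose names are not excluded by `guard`
// (a negative lookahead); each attribute value is optional.
const attrs = (guard) =>
  String.raw`(?:\s+${guard}[A-Za-z][\w\-:.]*(?:\s*=\s*(?:${valueAlt}))?)*`;

const pattern = new RegExp(
  "(" +                                        // $1: everything up to the tag close
    String.raw`<(?:input|select|table|a)\b` +  // tag open delimiter and element name
    attrs(String.raw`(?!name\b|id\b)`) +       // attributes before NAME (no NAME, no ID)
    String.raw`\s+name\s*=\s*(${valueAlt})` +  // NAME attribute; $2 is its value
    attrs(String.raw`(?!id\b)`) +              // attributes after NAME (no ID)
  ")" +
  String.raw`(\s*/?>)`,                        // $3: start tag close delimiter
  "gi"
);

// Insert id=<name value> just before the close of each matching start tag.
function addMissingIds(html) {
  return html.replace(pattern, "$1 id=$2$3");
}

console.log(addMissingIds('<input name="myname" class="myclass" value="" type="text"/>'));
```

As in the PHP version, the new id reuses the NAME value exactly as written, quotes included, and a tag that already has an id anywhere in it is left untouched.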

In Internet Explorer, getElementById in JavaScript also finds an element whose name attribute equals the requested id, but this does not work in the other browsers (Firefox, Chrome, Safari, etc.).

This can be fixed with the following code, which copies each element's name into its id when the id is missing.

 // Copy the name attribute into id when the element has a name but no id.
 function includeIdIfNotExist(element) {
     var id = element.getAttribute('id');
     var name = element.getAttribute('name');
     if (name && !id) {
         element.id = name;
     }
 }

 // Add the missing id to every input and select element on the page.
 function addMissingId() {
     var elementsToAddId = ['input', 'select'];
     for (var j = 0; j < elementsToAddId.length; j++) {
         var inputElements = document.getElementsByTagName(elementsToAddId[j]);
         for (var i = 0; i < inputElements.length; i++) {
             includeIdIfNotExist(inputElements[i]);
         }
     }
 }

 // Assign the function itself; calling it here would run it before the page has loaded.
 window.onload = addMissingId;
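The rule inside includeIdIfNotExist can be checked without a browser. The sketch below repeats that function and feeds it stand-in objects; fakeElement is a made-up helper that mimics only the two things the function touches (getAttribute and the id property), not a real DOM API.

```javascript
// Same rule as in the answer: copy name into id only when id is missing.
function includeIdIfNotExist(element) {
  const id = element.getAttribute('id');
  const name = element.getAttribute('name');
  if (name && !id) {
    element.id = name;
  }
}

// Hypothetical stand-in for a DOM element, just enough for this check.
function fakeElement(attrs) {
  return {
    id: attrs.id || '',
    getAttribute: (key) => (key in attrs ? attrs[key] : null),
  };
}

const nameOnly = fakeElement({ name: 'myname' });                // gets id "myname"
const both = fakeElement({ name: 'myname', id: 'existing' });    // keeps id "existing"
const neither = fakeElement({});                                 // stays without an id

[nameOnly, both, neither].forEach(includeIdIfNotExist);
console.log([nameOnly.id, both.id, neither.id]);
```

Keep in mind that elements sharing one name, such as the radio buttons of a group, would all receive the same id under this rule.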

If you want to search for elements with a name, but not id, and set id equal to the name, you can find and replace as follows:

To find:

 (<(?:input|select|table|form|textarea)\s+)(?=[^>]*\bname\s*="(\w+)")((?![^>]*\bid\s*=)[^>]*>) 

It is designed to work with input, select, table, form and textarea. You can add or remove HTML tags in the input|select|table|form|textarea alternation. It also checks that the element does not already have an id attribute.

Replace with:

 $1id="$2" $3 

This adds id="[nameValue]" to each matched HTML tag that has a name but no id.
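As a quick way to try this find and replace pair, here is a JavaScript sketch that applies it with String.prototype.replace. The sample tags and the function name withIds are made up for illustration; the pattern and the replacement string are the ones given in this answer.

```javascript
// Find pattern from the answer; $2 is captured inside the lookahead.
const findPattern =
  /(<(?:input|select|table|form|textarea)\s+)(?=[^>]*\bname\s*="(\w+)")((?![^>]*\bid\s*=)[^>]*>)/g;

// Apply the answer's replacement: put id="<name value>" right after the tag name.
function withIds(html) {
  return html.replace(findPattern, '$1id="$2" $3');
}

console.log(withIds('<input name="myname" class="myclass" value="" type="text"/>'));
console.log(withIds('<select name="country" id="country">'));
```

The first tag gains id="myname" in front of its existing attributes, and the second is returned unchanged because it already has an id. Since the name value must match "(\w+)", names with single quotes, no quotes, or characters such as dots (for example name="user.name") are not picked up by this pattern.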

Hope this helps!


Source: https://habr.com/ru/post/1444972/

