Visual Studio regex to remove all comments and blank lines in VB.NET code using a macro

Question

I am trying to delete all comments and blank lines in a file using a macro. So far I have come up with this solution, which removes the comments (with a bug described below) but cannot delete the blank lines between them:

    Sub CleanCode()
        Dim regexComment As String = "(REM [\d\D]*?[\r\n])|(?<SL>\'[\d\D]*?[\r\n])"
        Dim regexBlank As String = "^[\s|\t]*$\n"
        Dim replace As String = ""
        Dim selection As EnvDTE.TextSelection = DTE.ActiveDocument.Selection
        Dim editPoint As EnvDTE.EditPoint
        selection.StartOfDocument()
        selection.EndOfDocument(True)
        DTE.UndoContext.Open("Custom regex replace")
        Try
            Dim content As String = selection.Text
            Dim resultComment As String = System.Text.RegularExpressions.Regex.Replace(content, regexComment, replace)
            Dim resultBlank As String = System.Text.RegularExpressions.Regex.Replace(resultComment, regexBlank, replace)
            selection.Delete()
            selection.Collapse()
            Dim ed As EditPoint = selection.TopPoint.CreateEditPoint()
            ed.Insert(resultBlank)
        Catch ex As Exception
            DTE.StatusBar.Text = "Regex Find/Replace could not complete"
        Finally
            DTE.UndoContext.Close()
            DTE.StatusBar.Text = "Regex Find/Replace complete"
        End Try
    End Sub

So, this is what it should look like before and after running the macro.

Before

    Public Class Class1
        Public Sub New()
            ''asdasdas
            Dim a As String = "" ''asdasd
            ''' asd ad asd
        End Sub

        Public Sub New(ByVal strg As String)
            Dim a As String = ""
        End Sub
    End Class

After

    Public Class Class1
        Public Sub New()
            Dim a As String = ""
        End Sub
        Public Sub New(ByVal strg As String)
            Dim a As String = ""
        End Sub
    End Class

There are two main problems with the macro:

It cannot remove the blank lines left between the lines of code.
If there is a line of code that looks like this

 Dim a as String = "Name='Soham'"

Then, after running the macro, it becomes

 Dim a as String = "Name='"

+4

comments regex replace visual-studio

Soham dasgupta Mar 01 '12 at 6:16


3 answers

I just checked with the two examples above: '+{.+}$ should do it. If you prefer, you can use ('|'')+{.+}$ instead, but note that the first solution also removes the XML documentation comments.

    ''' <summary>
    ''' Method Description
    ''' </summary>
    ''' <remarks></remarks>
    Sub Main()
        ''first comment
        Dim a As String = "" 'second comment
    End Sub

Edit: if you use ('+{.+}$|^$\n), it removes a) all comments and b) all empty lines. However, if a comment is followed by End Sub or End Function, that statement is pulled up onto the previous line, which causes a compiler error.

Before

    ''' <summary>
    '''
    ''' </summary>
    ''' <remarks></remarks>
    Sub Main()
        ''first comment
        Dim a As String = "" 'second comment
    End Sub
    ''' <summary>
    '''
    ''' </summary>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Function asdf() As String
        Return "" ' returns nothing
    End Function

After

    Sub Main()
        Dim a As String = ""
    End Sub
    Public Function asdf() As String
        Return ""
    End Function

Edit: to remove any blank lines, use Search and Replace to replace the regex ^$\n with nothing.

0

Alex Mar 01 '12 at 7:26


Delete the comments first using this regex:

 '+\s*(\W|\w).+

'+ - one or more ' characters, which start each comment.

\s* - any whitespace after the comment marker.

(\W|\w).+ - everything that follows, up to the line terminator.

Then delete the blank lines that are left, using the regular expression provided by Mr. Alan Moore.

0

John Aaron Alcoseba Mar 09 '18 at 6:22


Alan Moore · Accepted Answer · 2012-03-05T14:31:21+0000

To get rid of a line that contains only whitespace or nothing at all, you can use this regex:

 (?m)^[ \t]*[\r\n]+

Your regex ^[\s|\t]*$\n will work if you specify multiline mode ((?m)), but it is still incorrect. For one thing, | matches a literal |; there is no need to specify "or" inside a character class. For another, \s matches any whitespace character, including TAB (\t), carriage return (\r) and linefeed (\n), which makes the \t redundant and the pattern inefficient. For example, at the first blank line (after the end of the first Sub), ^[\s|\t]* will initially try to match everything up to the word Public, then backtrack to the end of the previous line, where $\n can match.
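
The points above can be checked with a small sketch. It uses Python's re module as a stand-in for the .NET regex engine (an assumption made here for convenience; the constructs involved behave the same way in both as far as this example goes), and the sample text is made up for illustration.

```python
import re

# The question's blank-line pattern and the simpler one from this answer.
ORIGINAL_BLANK = r"^[\s|\t]*$\n"
SIMPLER_BLANK = r"(?m)^[ \t]*[\r\n]+"

# Hypothetical sample: an empty line and a whitespace-only line
# between two statements.
source = "Sub A()\n\n   \t\nEnd Sub\n"

# Inside a character class, | is a literal pipe, not alternation.
pipe_is_literal = re.fullmatch(r"[\s|\t]", "|") is not None

# Without multiline mode, ^ matches only at the very start of the text,
# so nothing is removed from this sample.
without_multiline = re.sub(ORIGINAL_BLANK, "", source)

# With multiline mode the original pattern does remove the blank lines here.
with_multiline = re.sub(ORIGINAL_BLANK, "", source, flags=re.MULTILINE)

# The simpler pattern gives the same result on this sample.
simpler = re.sub(SIMPLER_BLANK, "", source)

print(pipe_is_literal)
print(repr(without_multiline))
print(repr(with_multiline))
print(repr(simpler))
```

The sample uses bare \n line endings; text with \r\n endings behaves differently under the original pattern.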

But a blank line, besides being empty or containing only horizontal whitespace (spaces or TABs), may also contain nothing but a comment. I prefer to treat these comment-only lines as blank lines, because that is relatively easy to do, and it simplifies the task of matching comments in non-blank lines, which is much harder. Here is my regex:

 ^[ \t]*(?:(?:REM|')[^\r\n]*)?[\r\n]+

After consuming any leading horizontal whitespace, if I see REM or ' signifying a comment, I consume that and everything after it up to the next line separator. Notice that the only thing required to be present is the line separator itself. Also notice the lack of the end anchor, $. It is never needed when you are explicitly matching line separators, and in this case it would break the regex. In multiline mode, $ matches only before the linefeed (\n), not before the carriage return (\r). (This behavior of the .NET flavor is incorrect and rather surprising, given Microsoft's long-standing preference for \r\n as a line separator.)
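
A minimal sketch of this step, again using Python's re module as a stand-in for .NET (an assumption, not the original demo). The (?m) prefix is added here so that ^ matches at every line start, and the helper name and sample text are hypothetical.

```python
import re

# The answer's pattern, with (?m) prepended so ^ matches at each line start.
BLANK_OR_COMMENT_LINE = re.compile(r"(?m)^[ \t]*(?:(?:REM|')[^\r\n]*)?[\r\n]+")


def drop_blank_and_comment_lines(code: str) -> str:
    """Remove lines that are empty, whitespace-only, or comment-only."""
    return BLANK_OR_COMMENT_LINE.sub("", code)


# Hypothetical sample with Windows-style \r\n line endings.
sample = (
    "Public Sub New()\r\n"
    "    ''asdasdas\r\n"
    "\r\n"
    "    REM another comment\r\n"
    "    Dim a As String = \"\"\r\n"
    "End Sub\r\n"
)
cleaned = drop_blank_and_comment_lines(sample)
print(repr(cleaned))

# The $ pitfall: with \r\n endings, $ does not match before \r,
# so a pattern that relies on $\n leaves the blank line in place.
crlf_text = "a\r\n\r\nb"
with_dollar = re.sub(r"(?m)^[ \t]*$\n", "", crlf_text)
without_dollar = re.sub(r"(?m)^[ \t]*[\r\n]+", "", crlf_text)
print(repr(with_dollar))
print(repr(without_dollar))
```

One limitation worth keeping in mind: the pattern treats any line that starts with the letters REM as a comment, so a line beginning with an identifier such as REMOVE would also be dropped.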

Matching the remaining comments is a fundamentally different task. As you have discovered, simply searching for REM or ' does not work, because you may find it inside a string literal, where it does not mark the beginning of a comment. What you need to do is start at the beginning of the line, consuming and capturing everything that is not the beginning of a comment or a string literal. If you find a double quote, go ahead and consume the whole string literal. If you find REM or ', stop capturing and consume the rest of the line. Then you replace the whole line with just the captured part, i.e. everything before the comment. Here's the regex:

 (?mn)^(?<line>[^\r\n"R']*(("[^"]*"|(?!REM)R)[^\r\n"R']*)*)(REM|')[^\r\n]*

Or, more readably:

    (?mn)                 # Multiline and ExplicitCapture modes
    ^                     # beginning of line
    (?<line>              # capture in group "line"
      [^\r\n"R']*         # any number of "safe" characters
      (
        (
          "[^"]*"         # a string literal
          |
          (?!REM)R        # 'R' if it's not the beginning of 'REM'
        )
        [^\r\n"R']*       # more "safe" characters
      )*
    )                     # stop capturing
    (?:REM|')             # a comment sigil
    [^\r\n]*              # consume the rest of the line

The replacement string will be "${line}" . Some other notes:

Note that this regex does not end with [\r\n]+ to consume the line separator, as the "blank lines" regex does.
It does not end with $ either, for the same reason as before. [^\r\n]* greedily consumes everything before the line separator, so no anchor is needed.
The only thing required to be present is the REM or '; we do not bother matching any line that does not contain a comment.
ExplicitCapture mode means I can use (...) instead of (?:...) for all the groups I don't want to capture, while the named group (?<line>...) still works.
Ugly as it is, this regex would be much worse if VB supported multiline comments, or if its string literals supported backslash escapes.

I don't do VB, but here is a demo in C#.
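
As a rough illustration of the two-pass approach described in this answer (not the original C# demo), here is a Python sketch. It assumes Python's re module as a stand-in for .NET: Python has no ExplicitCapture flag, so the groups are written as (?:...) and the named group as (?P<line>...), and the replacement is \g<line> rather than ${line}. The function name and sample code are hypothetical.

```python
import re

# Pass 1: lines that are empty, whitespace-only, or comment-only.
BLANK_OR_COMMENT_LINE = re.compile(r"(?m)^[ \t]*(?:(?:REM|')[^\r\n]*)?[\r\n]+")

# Pass 2: a trailing comment after real code. Everything before the
# comment sigil, including whole string literals, is captured in "line".
TRAILING_COMMENT = re.compile(
    r"""(?m)^(?P<line>[^\r\n"R']*(?:(?:"[^"]*"|(?!REM)R)[^\r\n"R']*)*)(?:REM|')[^\r\n]*"""
)


def strip_vb_comments(code: str) -> str:
    """Drop blank/comment-only lines, then cut trailing comments."""
    code = BLANK_OR_COMMENT_LINE.sub("", code)
    return TRAILING_COMMENT.sub(r"\g<line>", code)


sample = (
    "Public Class Class1\n"
    "    Public Sub New()\n"
    "        ''asdasdas\n"
    "        Dim a As String = \"Name='Soham'\" ''asdasd\n"
    "        ''' asd ad asd\n"
    "    End Sub\n"
    "\n"
    "    Public Function asdf() As String\n"
    "        Return \"\" ' returns nothing\n"
    "    End Function\n"
    "End Class\n"
)

result = strip_vb_comments(sample)
print(result)
```

The apostrophes inside the "Name='Soham'" literal are left alone because the literal is consumed as a unit before the comment sigil is looked for. Whitespace that stood between the code and a trailing comment is kept, so a further trim of trailing spaces may be wanted.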
