Removing comments from C++ source

I have C++ code with /* */ and // style comments, and I want a way to remove them all automatically. At first glance, an editor (for example, UltraEdit) with a regular expression that searches for /*, */ and // should do the job. On closer inspection, a complete solution is not that simple, because the sequences /* or // may not start a comment if they appear inside another comment, a string literal, or a character literal. For example:

printf(" \" \" " "  /* this is not a comment and is surrounded by an unknown number of double-quotes */");

is a comment sequence inside a double-quoted string literal, and determining whether text lies inside a valid pair of double quotes is not easy. Meanwhile, this:

// this is a single line comment /* <--- this does not start a comment block 
// this is a second comment line with an */ within

contains comment sequences inside other comments.

Is there a more complete way to remove comments from C++ source that takes string literals and comments into account? For example, can the preprocessor be instructed to remove comments without carrying out, say, #include directives?

+3
6 answers

Could someone vote up my own answer to my own question?

Thanks to Martin York's idea, I found that in Visual Studio the solution looks very simple (subject to further testing). Just rename ALL preprocessor directives to something else (something that is not valid C++ syntax is fine) and run cl.exe with /P:

cl target.cpp /P

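This produces a target.i file that contains the source without the comments. Rename the directives back, and you are done. You will probably also need to remove the #line directive generated by cl.exe.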

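This works because, according to MSDN, the phases of translation are:

Character mapping: Characters in the source file are mapped to the internal source representation. Trigraph sequences are converted to single-character internal representation in this phase.

Line splicing: All lines ending in a backslash (\) and immediately followed by a newline character are joined with the next line in the source file, forming logical lines from the physical lines. Unless it is empty, a source file must end in a newline character that is not preceded by a backslash.

Tokenization: The source file is broken into preprocessing tokens and white-space characters. Comments in the source file are replaced with one space character each. Newline characters are retained.

Preprocessing: Preprocessing directives are executed and macros are expanded into the source file. The #include statement invokes translation starting with the preceding three translation steps on any included text.

Character-set mapping: All source character set members and escape sequences are converted to their equivalents in the execution character set. For Microsoft C and C++, both the source and the execution character sets are ASCII.

String concatenation: All adjacent string and wide-string literals are concatenated. For example, "String " "concatenation" becomes "String concatenation".

Translation: All tokens are analyzed syntactically and semantically; these tokens are converted into object code.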

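Comments are removed during Tokenization, before the Preprocessing phase. So just make sure that nothing is left to process during preprocessing (by disabling all the directives), and the output should be just the result of the first 3 phases.

For user-defined .h files, use the /FI option to include them manually. The resulting .i file is a combination of the .cpp and .h files, without comments. Each piece is preceded by a #line with the proper file name, so it is easy to split them apart in an editor. If you do not want to split them manually, you will probably need the macro/scripting facilities of some editor to do it automatically.

So now we do not have to care about any preprocessor directives. Even better, line continuation (backslash) is handled.

For example: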

// vc8.cpp : Defines the entry point for the console application.
//

-#include "stdafx.h"
-#include <windows.h>
-#define NOERR
-#ifdef NOERR
  /* comment here */
 whatever error line is ok
-#else
  some error line if NOERR not defined
      // comment here
-#endif
void pr() ;
int _tmain(int argc, _TCHAR* argv[])
{
    pr();
    return 0;
}

/*comment*/

void pr() {
    printf(" /* "); /* comment inside string " */
    // comment terminated by \
    continue a comment line
    printf(" "); /** " " string inside comment */
    printf/* this is valid comment within line continuation */\
("some weird lines \
with line continuation");
}

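After running cl.exe vc8.cpp /P, the output looks like this, and it can be fed to cl.exe again (after restoring the # directives):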

#line 1 "vc8.cpp"



-#include "stdafx.h"
-#include <windows.h>
-#define NOERR
-#ifdef NOERR

 whatever error line is ok
-#else
  some error line if NOERR not defined

-#endif
void pr() ;
int _tmain(int argc, _TCHAR* argv[])
{
    pr();
    return 0;
}



void pr() {
    printf(" /* "); 


    printf(" "); 
    printf\
("some weird lines \
with line continuation");
}
+1

Run it through the C preprocessor.

Edit:

Updated to handle MACROS and #if.

> cat t.cpp
/*
 * Normal comment
 */
// this is a single line comment /* <--- this does not start a comment block 
// this is a second comment line with an */ within
#include <stdio.h>

#if __SIZEOF_LONG__ == 4
int bits = 32;
#else
int bits = 16;
#endif

int main()
{
    printf(" \" \" " " /* this is not a comment and is surrounded by an unknown number of double-quotes */");
    /*
     * comment with a single // line comment embedded.
     */
    int x;
    // A single line comment /* Normal embedded */ Comment
}

Because the code uses macros, #if, and so on, we need the default definitions.
These can be obtained with cpp -E -dM.

Then feed those #defines back in ahead of the source, hiding the #include lines so that they are not expanded.

> cpp -E -dM t.cpp > /tmp/def
> cat /tmp/def t.cpp | sed -e s/^#inc/-#inc/ | cpp - | sed s/^-#inc/#inc/
# 1 "t.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "t.cpp"






#include <stdio.h>


int bits = 32;




int main()
{
    printf(" \" \" " " /* this is not a comment and is surrounded by an unknown number of double-quotes */");    



    int x;

}
+3

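The SD C++ Formatter has an option to pretty-print the source text and remove all comments. It uses a full C++ front end to parse the text, so it is not confused by whitespace, line breaks, string literals, or preprocessor issues, and its formatting changes will not break the code.

If you are removing comments, you may be trying to obfuscate the source code. The Formatter also comes in an obfuscating version.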

+2

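You can use a rule-based parser (e.g., boost::spirit) to write syntax rules for comments. Depending on your compiler, you will need to decide whether to handle nested comments. The semantic actions for removing a comment should then be fairly straightforward.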

+1

Regular expressions are not meant for parsing languages; using them for this is a frustrating attempt at best.

To do this, you need a full-blown parser. You might want to consider Clang: rewriting is an explicit goal of the Clang set of libraries, and there are already existing rewriters that you can take inspiration from.

+1
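#include <fstream>
#include <iostream>

// Strips // and /* */ comments from input.txt and writes the result to
// output.txt. String and character literals are copied verbatim.
// Not handled: trigraphs, raw string literals, digit separators (1'000),
// and a backslash-newline that splits the "/*", "*/" or "//" tokens.
int main() {
    std::ifstream fin("input.txt");
    std::ofstream fout("output.txt");
    if (!fin || !fout) {
        std::cerr << "cannot open input.txt or output.txt\n";
        return 1;
    }
    enum State { Code, Slash, Line, Block, BlockStar, Str, Chr };
    State state = Code;
    bool escaped = false;  // previous character was a backslash
    char ch;
    while (fin.get(ch)) {  // loop ends at EOF instead of reusing a stale ch
        switch (state) {
        case Code:
            if (ch == '/') { state = Slash; break; }
            if (ch == '"') state = Str;
            else if (ch == '\'') state = Chr;
            fout << ch;
            break;
        case Slash:  // saw one '/', not yet known whether a comment starts
            if (ch == '/') state = Line;
            else if (ch == '*') state = Block;
            else { fout << '/'; fin.unget(); state = Code; }  // plain '/'
            break;
        case Line:  // a backslash before the newline continues the comment
            if (ch == '\n' && !escaped) { fout << ch; state = Code; }
            escaped = (ch == '\\');
            break;
        case Block:
            if (ch == '*') state = BlockStar;
            break;
        case BlockStar:  // saw '*' inside a block comment
            if (ch == '/') { fout << ' '; state = Code; }  // comment -> one space
            else if (ch != '*') state = Block;
            break;
        case Str:
        case Chr:
            fout << ch;
            if (escaped) escaped = false;
            else if (ch == '\\') escaped = true;
            else if (ch == (state == Str ? '"' : '\'')) state = Code;
            break;
        }
    }
    if (state == Slash) fout << '/';  // file ended right after a lone '/'
    return 0;
}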
-2

Source: https://habr.com/ru/post/1779621/