I was wondering how Stack Overflow parses so many different kinds of code and identifies keywords, special characters, whitespace formatting, etc. It does this for most of the languages I have looked at, and I noticed that it is even sophisticated enough to understand how these elements relate to each other, for example:
String mystring1 = "inquotes";
String mystring2 = "inquotes//incomment";
String mystring3 =
Many IDEs do this too. How is it done?
Edit: Further explanation. I'm not asking how to parse the text itself; my question is about what comes after that part. Is there something like a universal XML schema or a cross-language format hierarchy that describes which strings are keywords, which characters indicate comments, text strings, logical operators, etc.? Or do I have to become a syntax guru for every language that I want to parse correctly?
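To make concrete what I mean by a "description" of a language, here is a rough sketch in Java of the kind of data-driven setup I have in mind. Every name in it (`SyntaxSketch`, `LanguageSpec`, `Token`, `classify`) is hypothetical and made up for illustration; it is not an existing library or format, and it is certainly not how Stack Overflow actually does it. It assumes a single line-comment marker and a single string quote character, and it ignores escapes, block comments, and multi-line constructs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SyntaxSketch {
    // Hypothetical per-language description: pure data, no parsing logic.
    static final class LanguageSpec {
        final Set<String> keywords;
        final String lineComment;   // assumed non-empty, e.g. "//"
        final char stringQuote;     // e.g. '"'
        LanguageSpec(Set<String> keywords, String lineComment, char stringQuote) {
            this.keywords = keywords;
            this.lineComment = lineComment;
            this.stringQuote = stringQuote;
        }
    }

    static final class Token {
        final String kind;
        final String text;
        Token(String kind, String text) { this.kind = kind; this.text = text; }
    }

    // Generic scanner: the same code serves any language the spec can describe.
    static List<Token> classify(String line, LanguageSpec spec) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (line.startsWith(spec.lineComment, i)) {
                // Only reached outside a string, so "//" inside quotes is not a comment.
                tokens.add(new Token("comment", line.substring(i)));
                break;
            }
            if (c == spec.stringQuote) {
                int end = line.indexOf(spec.stringQuote, i + 1);
                if (end < 0) end = line.length() - 1; // unterminated: take the rest
                tokens.add(new Token("string", line.substring(i, end + 1)));
                i = end + 1;
                continue;
            }
            if (Character.isLetterOrDigit(c) || c == '_') {
                int start = i;
                while (i < line.length()
                        && (Character.isLetterOrDigit(line.charAt(i)) || line.charAt(i) == '_')) {
                    i++;
                }
                String word = line.substring(start, i);
                tokens.add(new Token(spec.keywords.contains(word) ? "keyword" : "identifier", word));
                continue;
            }
            tokens.add(new Token("symbol", String.valueOf(c)));
            i++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        LanguageSpec javaLike = new LanguageSpec(Set.of("final", "int", "return"), "//", '"');
        for (Token t : classify("final String s = \"a//b\"; // note", javaLike)) {
            System.out.println(t.kind + "\t" + t.text);
        }
    }
}
```

In this sketch, supporting another language would mean writing only a new `LanguageSpec`, not new scanning code. What I am asking is whether a standard, shared format for that kind of description already exists.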