Parsing a CS:GO language file with encoding in Python

This topic is related to parsing a CS:GO script file in Python, but it raises a different problem. I am working with content from CS:GO, and I am building a Python tool that imports all data from the /scripts/ folder into Python dictionaries.

The next step after parsing that data is to parse the language resource file from /resources and link the dictionaries to the language strings.

Here is the original English localization file: https://github.com/spec45as/PySteamBot/blob/master/csgo_english.txt

The file format is similar to the previous task, but I ran into other problems. All language files are encoded as UTF-16-LE, and I could not work out how to handle encoded files and strings in Python (I mainly work with Java). I tried a few approaches based on open(fileName, encoding='utf-16-le').read(), but I don't know how to use such encoded strings with pyparsing.

pyparsing.ParseException: Expected quoted string, starting with " ending with " (at char 0), (line:1, col:1)
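One possible cause of a failure at char 0 is a byte order mark (BOM) at the start of the file. The sketch below uses only the standard library and assumes the file begins with a UTF-16-LE BOM; the file name and sample content are made up for illustration. It shows that the 'utf-16-le' codec keeps the BOM as a U+FEFF character, while the 'utf-16' codec consumes it.

```python
import codecs
import os
import tempfile

sample = '"lang"\n{\n"Language" "English"\n}\n'

# Write a small UTF-16-LE file that starts with a BOM (assumed layout).
path = os.path.join(tempfile.mkdtemp(), "sample_english.txt")  # hypothetical name
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + sample.encode("utf-16-le"))

# 'utf-16-le' decodes the BOM as a literal U+FEFF character at position 0.
with open(path, encoding="utf-16-le") as f:
    text_le = f.read()

# 'utf-16' uses the BOM to detect the byte order and then drops it.
with open(path, encoding="utf-16") as f:
    text_auto = f.read()

print(repr(text_le[0]))    # the BOM character
print(repr(text_auto[0]))  # the opening double quote
```

If the BOM is the culprit, either reading with encoding='utf-16' or stripping a leading '\ufeff' before parsing should give the parser a string that starts with the opening quote.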

Another problem is lines containing escape sequences, for example:

 "musickit_midnightriders_01_desc" "\"HAPPY HOLIDAYS, ****ERS!\"\n -Midnight Riders" 

How can I parse these characters if I want to keep such lines exactly as they are?

1 answer

There are several new wrinkles in this input file that were not in the original CS: GO example:

  • embedded \" escaped quotes in some of the value strings
  • some quoted value strings that span multiple lines
  • some values that end with a trailing environment condition (for example, [$WIN32], [$OSX])
  • embedded comments in the file, marked with a leading '//'

The first two are addressed by changing the definition of value_qs. Since value strings are now more full-featured than keys, I decided to use separate QuotedString definitions for them:

    key_qs = QuotedString('"').setName("key_qs")
    value_qs = QuotedString('"', escChar='\\', multiline=True).setName("value_qs")
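As a quick check of these two definitions, here is a minimal sketch that parses one line with escaped quotes and one value that spans two lines. The sample strings are shortened, made-up variants of the entries in the language file.

```python
from pyparsing import QuotedString

key_qs = QuotedString('"').setName("key_qs")
value_qs = QuotedString('"', escChar='\\', multiline=True).setName("value_qs")

# A key followed by a value that contains escaped double quotes.
line = r'"musickit_desc" "\"HAPPY HOLIDAYS!\" -Midnight Riders"'
key, value = (key_qs + value_qs).parseString(line)
print(key)
print(value)

# A value that continues on a second line; multiline=True allows this.
multi = '"first line\nsecond line"'
multi_value = value_qs.parseString(multi)[0]
print(multi_value)
```

The escaped quotes come back as plain quote characters inside the value, and the line break inside the second value is preserved.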

The third one requires a bit of refactoring. These qualifying conditions work much like #ifdef macros in C: they enable or disable a definition depending on whether the environment meets the condition. Some of the conditions are even Boolean expressions:

  • [!$PS3]
  • [$WIN32||$X360||$OSX]
  • [!$X360&&!$PS3]
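To make the semantics of these conditions concrete, here is a small standard-library sketch that evaluates such an expression against a set of defined symbols. The function name eval_condition is hypothetical, and the operator precedence ('!' binds tighter than '&&', which binds tighter than '||') is an assumption borrowed from C, not something the language file format documents.

```python
import re

def eval_condition(expr, defined):
    """Evaluate a condition such as '!$X360&&!$PS3' against defined symbols."""
    # Split into operators and $NAME symbols; brackets and spaces are skipped.
    tokens = re.findall(r"\|\||&&|!|\$\w+", expr)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "||":
            pos += 1
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_not()
        while pos < len(tokens) and tokens[pos] == "&&":
            pos += 1
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():
        nonlocal pos
        if tokens[pos] == "!":
            pos += 1
            return not parse_not()
        name = tokens[pos][1:]  # strip the leading '$'
        pos += 1
        return name in defined

    return parse_or()

print(eval_condition("[!$PS3]", {"WIN32"}))
print(eval_condition("[$WIN32||$X360||$OSX]", {"OSX"}))
print(eval_condition("[!$X360&&!$PS3]", {"PS3"}))
```

With the symbol sets shown, the first two conditions hold and the third does not.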

This can lead to duplicate keys in the definition file, for example, in the following lines:

    "Menu_Dlg_Leaderboards_Lost_Connection" "You must be connected to Xbox LIVE to view Leaderboards. Please check your connection and try again." [$X360]
    "Menu_Dlg_Leaderboards_Lost_Connection" "You must be connected to PlayStation®Network and Steam to view Leaderboards. Please check your connection and try again." [$PS3]
    "Menu_Dlg_Leaderboards_Lost_Connection" "You must be connected to Steam to view Leaderboards. Please check your connection and try again."

which contain 3 definitions for the key "Menu_Dlg_Leaderboards_Lost_Connection", depending on which environment values have been set.

To avoid losing these values when parsing the file, I chose to modify the key at parse time, appending the condition if one is present. This code implements the change:

    LBRACK,RBRACK = map(Suppress, "[]")
    qualExpr = Word(alphanums+'$!&|')
    qualExprCondition = LBRACK + qualExpr + RBRACK

    key_value = Group(key_qs + value + Optional(qualExprCondition("qual")))
    def addQualifierToKey(tokens):
        tt = tokens[0]
        if 'qual' in tt:
            tt[0] += '/' + tt.pop(-1)
    key_value.setParseAction(addQualifierToKey)

So, in the above example, you will get 3 keys:

  • Menu_Dlg_Leaderboards_Lost_Connection/$X360
  • Menu_Dlg_Leaderboards_Lost_Connection/$PS3
  • Menu_Dlg_Leaderboards_Lost_Connection
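Once the keys carry their qualifier, picking the right string for a given platform is a plain dictionary lookup. The sketch below is one way to do it, assuming single-symbol qualifiers only; the helper name resolve and the shortened message texts are made up for illustration.

```python
def resolve(tokens, key, platform=None):
    """Return the value for key, preferring the platform-qualified entry."""
    if platform is not None:
        qualified = "{}/${}".format(key, platform)
        if qualified in tokens:
            return tokens[qualified]
    # Fall back to the unqualified definition.
    return tokens[key]

# Shortened stand-ins for the three parsed definitions.
tokens = {
    "Menu_Dlg_Leaderboards_Lost_Connection/$X360": "Xbox LIVE message",
    "Menu_Dlg_Leaderboards_Lost_Connection/$PS3": "PlayStation Network message",
    "Menu_Dlg_Leaderboards_Lost_Connection": "Steam message",
}

key = "Menu_Dlg_Leaderboards_Lost_Connection"
print(resolve(tokens, key, "X360"))
print(resolve(tokens, key, "WIN32"))
print(resolve(tokens, key))
```

Keys qualified with a Boolean expression would need the expression evaluated against the current environment instead of a direct string match.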

Finally, comment handling is probably the easiest part. Pyparsing has built-in support for skipping over comments, just as it does for whitespace. You only need to define an expression for the comment and have the top-level parser ignore it. To support this feature, several common comment forms are predefined in pyparsing. In this case, the solution is just to add this line after the final parser definition:

 parser.ignore(dblSlashComment) 
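A minimal sketch of the effect, using a deliberately simplified key/value grammar rather than the full parser; the sample text is made up.

```python
from pyparsing import Dict, Group, QuotedString, ZeroOrMore, dblSlashComment

qs = QuotedString('"')
pairs = Dict(ZeroOrMore(Group(qs + qs)))

# Skip '//' comments wherever they appear between tokens.
pairs.ignore(dblSlashComment)

text = '''
// header comment
"a" "1"  // trailing comment
"b" "2"
'''

result = pairs.parseString(text)
print(result.asDict())
```

Both the full-line comment and the trailing comment are skipped, so only the two key/value pairs end up in the result.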

And lastly, there is a small bug in the implementation of QuotedString: standard string escape sequences such as \t and \n are not processed, and are simply treated as an unnecessarily escaped 't' or 'n'. So currently, when this line is parsed:

 "SFUI_SteamOverlay_Text" "This feature requires Steam Community In-Game to be enabled.\n\nYou might need to restart the game after you enable this feature in Steam:\nSteam -> File -> Settings -> In-Game: Enable Steam Community In-Game\n" [$WIN32] 

the value string you get is just:

 This feature requires Steam Community In-Game to be enabled.nnYou might need to restart the game after you enable this feature in Steam:nSteam -> File -> Settings -> In-Game: Enable Steam Community In-Gamen 

instead of:

    This feature requires Steam Community In-Game to be enabled.

    You might need to restart the game after you enable this feature in Steam:
    Steam -> File -> Settings -> In-Game: Enable Steam Community In-Game

I will need to fix this behavior in the next version of pyparsing.
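Until that is fixed, one possible workaround is to do the unescaping yourself. The sketch below uses only the standard library and assumes you have the raw, still-escaped text of the value (for instance by constructing QuotedString with unquoteResults=False and stripping the surrounding quotes in a parse action). The helper name unescape and its escape table are illustrative and not part of pyparsing.

```python
import re

# Escape sequences to translate; anything else keeps the escaped character.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}

def unescape(raw):
    """Translate backslash escapes in the raw text of a quoted value."""
    return re.sub(
        r"\\(.)",
        lambda m: _ESCAPES.get(m.group(1), m.group(1)),
        raw,
        flags=re.DOTALL,
    )

raw_value = r"to be enabled.\n\nYou might need to restart the game"
print(unescape(raw_value))
```

Here the two \n sequences become real line breaks instead of stray 'n' characters.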

Here is the final parser code:

    from pyparsing import (Suppress, QuotedString, Forward, Group, Dict,
                           ZeroOrMore, Word, alphanums, Optional, dblSlashComment)

    LBRACE,RBRACE = map(Suppress, "{}")

    key_qs = QuotedString('"').setName("key_qs")
    value_qs = QuotedString('"', escChar='\\', multiline=True).setName("value_qs")

    # use this code to convert integer values to ints at parse time
    def convert_integers(tokens):
        if tokens[0].isdigit():
            tokens[0] = int(tokens[0])
    value_qs.setParseAction(convert_integers)

    LBRACK,RBRACK = map(Suppress, "[]")
    qualExpr = Word(alphanums+'$!&|')
    qualExprCondition = LBRACK + qualExpr + RBRACK

    value = Forward()
    key_value = Group(key_qs + value + Optional(qualExprCondition("qual")))
    def addQualifierToKey(tokens):
        tt = tokens[0]
        if 'qual' in tt:
            tt[0] += '/' + tt.pop(-1)
    key_value.setParseAction(addQualifierToKey)

    struct = (LBRACE + Dict(ZeroOrMore(key_value)) + RBRACE).setName("struct")
    value <<= (value_qs | struct)
    parser = Dict(key_value)
    parser.ignore(dblSlashComment)

    sample = open('cs_go_sample2.txt').read()
    config = parser.parseString(sample)

    print (config.keys())
    for k in config.lang.keys():
        print ('- ' + k)

    #~ config.lang.pprint()
    print (config.lang.Tokens.StickerKit_comm01_burn_them_all)
    print (config.lang.Tokens['SFUI_SteamOverlay_Text/$WIN32'])

Source: https://habr.com/ru/post/1209521/

