How can I populate C# classes from an XML document with some inline data?

I have an API that returned this:

http://services.aonaware.com/DictService/DictService.asmx?op=DefineInDict

    <?xml version="1.0" encoding="utf-8"?>
    <WordDefinition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                    xmlns="http://services.aonaware.com/webservices/">
      <Word>abandon</Word>
      <Definitions>
        <Definition>
          <Word>abandon</Word>
          <Dictionary>
            <Id>wn</Id>
            <Name>WordNet (r) 2.0</Name>
          </Dictionary>
          <WordDefinition>abandon
         n 1: the trait of lacking restraint or control; freedom from inhibition or worry; "she danced with abandon" [syn: {wantonness}, {unconstraint}]
         2: a feeling of extreme emotional intensity; "the wildness of his anger" [syn: {wildness}]
         v 1: forsake, leave behind; "We abandoned the old car in the empty parking lot"
         2: stop maintaining or insisting on; of ideas, claims, etc.; "He abandoned the thought of asking for her hand in marriage"; "Both sides have to give up some calims in these negociations" [syn: {give up}]
         3: give up with the intent of never claiming again; "Abandon your life to God"; "She gave up her children to her ex-husband when she moved to Tahiti"; "We gave the drowning victim up for dead" [syn: {give up}]
         4: leave behind empty; move out of; "You must vacate your office by tonight" [syn: {vacate}, {empty}]
         5: leave someone who needs or counts on you; leave in the lurch; "The mother deserted her children" [syn: {forsake}, {desolate}, {desert}]
          </WordDefinition>
        </Definition>
      </Definitions>
    </WordDefinition>

Here is the code I used to retrieve the XML data:

    WebRequest request = WebRequest.Create("http://services.aonaware.com/DictService/DictService.asmx/DefineInDict");
    request.Method = "POST";
    string postData = "dictId=wn&word=abandon";
    byte[] byteArray = Encoding.UTF8.GetBytes(postData);
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = byteArray.Length;
    Stream dataStream = request.GetRequestStream();
    dataStream.Write(byteArray, 0, byteArray.Length);
    dataStream.Close();
    WebResponse response = request.GetResponse();
    Console.WriteLine(((HttpWebResponse)response).StatusDescription);
    dataStream = response.GetResponseStream();
    StreamReader reader = new StreamReader(dataStream);
    string responseFromServer = reader.ReadToEnd();
    Console.WriteLine(responseFromServer);
    reader.Close();
    dataStream.Close();
    response.Close();

I would like to extract the data from the XML into a list of definitions, where the classes look like this:

    public class Def
    {
        public string text { get; set; }
        public List<string> synonym { get; set; }
    }

    public class Definition
    {
        public string type { get; set; } // single character: n or v or a
        public List<Def> Def { get; set; }
    }

Can someone give me some tips on how to do this, and show me what options are available for picking the elements out of the XML and putting them into these classes?

I think this question could be useful to many other people, so I will open a large bounty in the hope that someone takes the time to come up with a good example.

Update:

Sorry, I made a mistake with the synonym property and have changed it now. I hope this makes more sense. The synonyms are just a list of strings. I have also highlighted what I need, since the two answers so far do not seem to answer the question at all. Thanks.

3 answers

I created a simple parser for the word definition (I'm pretty sure there is room for improvement):

Solution 1.0

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    class ParseyMcParseface
    {
        /// <summary>
        /// Word definition lines
        /// </summary>
        private string[] _text;

        /// <summary>
        /// Constructor (takes the InnerText of the WordDefinition tag as input)
        /// </summary>
        /// <param name="text">InnerText of the WordDefinition</param>
        public ParseyMcParseface(string text)
        {
            _text = text.Split(new [] {'\n'}, StringSplitOptions.RemoveEmptyEntries)
                .Skip(1) // Skip the first line where the word is mentioned
                .ToArray();
        }

        /// <summary>
        /// Convert from single letter type to full human readable type
        /// </summary>
        /// <param name="c">Type letter: a, n or v</param>
        /// <returns>Human readable type name</returns>
        private string CharToType(char c)
        {
            switch (c)
            {
                case 'a':
                    return "Adjective";
                case 'n':
                    return "Noun";
                case 'v':
                    return "Verb";
                default:
                    return "Unknown";
            }
        }

        /// <summary>
        /// Reorganize the data for easier parsing
        /// </summary>
        /// <param name="text">Lines of text</param>
        /// <returns>One list of definition strings per word type</returns>
        private static List<List<string>> MakeLists(IEnumerable<string> text)
        {
            List<List<string>> types = new List<List<string>>();
            int i = -1; // index of the current type
            int j = 0;  // index of the current definition within that type
            foreach (var line in text)
            {
                // New type (Noun, Verb, Adj.)
                if (Regex.IsMatch(line.Trim(), "^[avn]{1}\\ \\d+"))
                {
                    types.Add(new List<string> { line.Trim() });
                    i++;
                    j = 0;
                }
                // New definition in the previous type
                else if (Regex.IsMatch(line.Trim(), "^\\d+"))
                {
                    j++;
                    types[i].Add(line.Trim());
                }
                // New line of the same definition
                else
                {
                    types[i][j] = types[i][j] + " " + line.Trim();
                }
            }
            return types;
        }

        public List<Definition> Parse()
        {
            var definitionsLines = MakeLists(_text);
            List<Definition> definitions = new List<Definition>();
            foreach (var type in definitionsLines)
            {
                var defs = new List<Def>();
                foreach (var def in type)
                {
                    // Definition text: everything after ": " up to an optional '['
                    var match = Regex.Match(def.Trim(), "(?:\\:\\ )(\\w|\\ |;|\"|,|\\.|-)*[\\[]{0,1}");
                    // Synonyms: every {word} group
                    MatchCollection syns = Regex.Matches(def.Trim(), "\\{(\\w|\\ )+\\}");
                    List<string> synonymes = new List<string>();
                    foreach (Match syn in syns)
                    {
                        synonymes.Add(syn.Value.Trim('{', '}'));
                    }
                    defs.Add(new Def()
                    {
                        text = match.Value.Trim(':', '[', ' '),
                        synonym = synonymes
                    });
                }
                definitions.Add(new Definition
                {
                    type = CharToType(type[0][0]),
                    Def = defs
                });
            }
            return definitions;
        }
    }

And here is a usage example:

    WebRequest request = WebRequest.Create("http://services.aonaware.com/DictService/DictService.asmx/DefineInDict");
    request.Method = "POST";
    string postData = "dictId=wn&word=abandon";
    byte[] byteArray = Encoding.UTF8.GetBytes(postData);
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = byteArray.Length;
    Stream dataStream = request.GetRequestStream();
    dataStream.Write(byteArray, 0, byteArray.Length);
    dataStream.Close();
    WebResponse response = request.GetResponse();
    Console.WriteLine(((HttpWebResponse)response).StatusDescription);
    dataStream = response.GetResponseStream();
    StreamReader reader = new StreamReader(dataStream);
    string responseFromServer = reader.ReadToEnd();

    var doc = new XmlDocument();
    doc.LoadXml(responseFromServer);
    // el[0] is the root element; el[1] is the nested WordDefinition holding the text.
    var el = doc.GetElementsByTagName("WordDefinition");

    ParseyMcParseface parseyMcParseface = new ParseyMcParseface(el[1].InnerText);
    var parsingResult = parseyMcParseface.Parse();
    // parsingResult will contain a list of Definitions
    // per the format specified in the question.

And here is a live demo: https://dotnetfiddle.net/24IQ67
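
The usage example picks the nested WordDefinition element by position (el[1]). As a sketch of an alternative, assuming the response always has the shape shown in the question, LINQ to XML can select that element by name and namespace instead. The class and method names here (WordDefinitionXml, ExtractDefinitionTexts) are hypothetical and not part of the parser above:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class WordDefinitionXml
{
    // Default namespace declared on the root element of the service response.
    static readonly XNamespace Ns = "http://services.aonaware.com/webservices/";

    // Returns the text of each <WordDefinition> that sits directly inside a <Definition>.
    public static string[] ExtractDefinitionTexts(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants(Ns + "Definition")
            // The cast yields null when the child element is missing.
            .Select(d => (string)d.Element(Ns + "WordDefinition"))
            .Where(text => text != null)
            .ToArray();
    }
}
```

Each returned string could then be passed to the ParseyMcParseface constructor in place of el[1].InnerText.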

You can also avoid manually retrieving and then parsing the XML by adding a service reference to this web service.

Solution 2.0

I made a small application that does this and then parses the definition. It is posted on GitHub (it's too big to post in full here on Stack Overflow):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    public enum WordTypes
    {
        Noun,
        Verb,
        Adjective,
        Adverb,
        Unknown
    }

    public class Definition
    {
        public Definition()
        {
            Synonyms = new List<string>();
            Anotnyms = new List<string>();
        }

        public WordTypes WordType { get; set; }
        public string DefinitionText { get; set; }
        public List<string> Synonyms { get; set; }
        public List<string> Anotnyms { get; set; }
    }

    static class DefinitionParser
    {
        public static List<Definition> Parse(string wordDefinition)
        {
            var wordDefinitionLines = wordDefinition.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
                .Skip(1) // Skip the first line where the word is mentioned
                .Select(x => x.Trim())
                .ToList();

            var flatenedList = MakeLists(wordDefinitionLines).SelectMany(x => x).ToList();

            var result = new List<Definition>();
            foreach (var wd in flatenedList)
            {
                // Named groups: matchType, definition, wordSyn (synonyms), wordAnt (antonyms)
                var foundMatch = Regex.Match(wd, @"^(?<matchType>adv|adj|v|n){0,1}\s*(\d*): (?<definition>[\w\s;""',\.\(\)\!\-]+)(?<extraInfoSyns>\[syn: ((?<wordSyn>\{[\w\s\-]+\})|(?:[,\ ]))*\]){0,1}\s*(?<extraInfoAnts>\[ant: ((?<wordAnt>\{[\w\s-]+\})|(?:[,\ ]))*\]){0,1}");

                var def = new Definition();

                if (foundMatch.Groups["matchType"].Success)
                {
                    var matchType = foundMatch.Groups["matchType"];
                    def.WordType = DefinitionTypeToEnum(matchType.Value);
                }

                if (foundMatch.Groups["definition"].Success)
                {
                    var definition = foundMatch.Groups["definition"];
                    def.DefinitionText = definition.Value;
                }

                if (foundMatch.Groups["extraInfoSyns"].Success && foundMatch.Groups["wordSyn"].Success)
                {
                    foreach (Capture capture in foundMatch.Groups["wordSyn"].Captures)
                    {
                        def.Synonyms.Add(capture.Value.Trim('{', '}'));
                    }
                }

                if (foundMatch.Groups["extraInfoAnts"].Success && foundMatch.Groups["wordAnt"].Success)
                {
                    foreach (Capture capture in foundMatch.Groups["wordAnt"].Captures)
                    {
                        def.Anotnyms.Add(capture.Value.Trim('{', '}'));
                    }
                }

                result.Add(def);
            }
            return result;
        }

        private static List<List<string>> MakeLists(IEnumerable<string> text)
        {
            List<List<string>> types = new List<List<string>>();
            int i = -1; // index of the current type
            int j = 0;  // index of the current definition within that type
            foreach (var line in text)
            {
                // New type (Noun, Verb, Adj.)
                if (Regex.IsMatch(line, "^(adj|v|n|adv){1}\\s\\d*"))
                {
                    types.Add(new List<string> { line });
                    i++;
                    j = 0;
                }
                // New definition in the previous type
                else if (Regex.IsMatch(line, "^\\d+"))
                {
                    j++;
                    types[i].Add(line);
                }
                // New line of the same definition
                else
                {
                    types[i][j] = types[i][j] + " " + line;
                }
            }
            return types;
        }

        private static WordTypes DefinitionTypeToEnum(string input)
        {
            switch (input)
            {
                case "adj":
                    return WordTypes.Adjective;
                case "adv":
                    return WordTypes.Adverb;
                case "n":
                    return WordTypes.Noun;
                case "v":
                    return WordTypes.Verb;
                default:
                    return WordTypes.Unknown;
            }
        }
    }


Notes:

  • This should work as expected.
  • Parsing free-form text is not reliable.
  • You should add the service reference (as pointed out in another answer) instead of parsing the XML manually.

Alexander Petrov's answer would be ideal for you, except that you are dealing with a poorly designed XML schema. If WordNet is a legitimate outfit, they should redesign the schema to remove the nested WordDefinition elements and add new elements for the main parts of the definition.

This quick fix will work for the specific test case you provided, but it relies on many assumptions about the nature of the text. It also uses string manipulation and regular expressions, which are considered inefficient, so it may be too slow and error prone for your requirements. You may get better solutions for this task if you reframe your question as a string manipulation problem. But the correct solution is to get a better XML schema.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;
    using System.Xml;

    namespace DefinitionTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                List<Definition> definitions = new List<Definition>();

                // The starting point after your web service call.
                string responseFromServer = EmulateWebService();

                // Load the string into this object in order to parse the xml.
                XmlDocument doc = new XmlDocument();
                doc.LoadXml(responseFromServer);

                XmlNodeList elemList = doc.GetElementsByTagName("WordDefinition");
                for (int i = 0; i < elemList.Count; i++)
                {
                    XmlNode def = elemList[i];
                    // We only want WordDefinition elements that have just one child which is the content we need.
                    // Any WordDefinition that has zero children or more than one child is either empty or a parent element.
                    if (def.ChildNodes.Count == 1)
                    {
                        Console.WriteLine(string.Format("Content of WordDefinition {0}", i));
                        Console.WriteLine();
                        Console.WriteLine(def.InnerXml);
                        Console.WriteLine();

                        definitions.Add(ParseWordDefinition(def.InnerXml));

                        foreach (Definition dd in definitions)
                        {
                            Console.WriteLine(string.Format("Parsed Word Definition for \"{0}\"", dd.wordDefined));
                            Console.WriteLine();
                            foreach (Def d in dd.Defs)
                            {
                                string type = string.Empty;
                                switch (d.type)
                                {
                                    case "a":
                                        type = "Adjective";
                                        break;
                                    case "n":
                                        type = "Noun";
                                        break;
                                    case "v":
                                        type = "Verb";
                                        break;
                                    default:
                                        type = "";
                                        break;
                                }
                                Console.WriteLine(string.Format("Type \"{0}\"", type));
                                Console.WriteLine();
                                Console.WriteLine(string.Format("\tDefinition \"{0}\"", d.text));
                                Console.WriteLine();
                                if (d.Synonym != null && d.Synonym.Count > 0)
                                {
                                    Console.WriteLine("\tSynonyms");
                                    foreach (string syn in d.Synonym)
                                        Console.WriteLine("\t\t" + syn);
                                }
                            }
                        }
                    }
                }
            }

            static string EmulateWebService()
            {
                string result = string.Empty;
                // The "definition.xml" file is a copy of the test data you provided.
                using (StreamReader reader = new StreamReader(@"c:\projects\definitiontest\definitiontest\definition.xml"))
                {
                    result = reader.ReadToEnd();
                }
                return result;
            }

            static Definition ParseWordDefinition(string xmlDef)
            {
                // Replace any carriage return/line feed characters with spaces.
                string oneLine = xmlDef.Replace(System.Environment.NewLine, " ");

                // Squeeze internal white space.
                string squeezedLine = Regex.Replace(oneLine, @"\s{2,}", " ");

                // Assumption 1: The first word in the string is always the word being defined.
                string[] wordAndDefs = squeezedLine.Split(new char[] { ' ' }, StringSplitOptions.None);
                string wordDefined = wordAndDefs[0];
                string allDefinitions = string.Join(" ", wordAndDefs, 1, wordAndDefs.Length - 1);

                Definition parsedDefinition = new Definition();
                parsedDefinition.wordDefined = wordDefined;
                parsedDefinition.Defs = new List<Def>();

                string type = string.Empty;

                // Assumption 2: All definitions are delimited by a type letter, a number and a ':' character.
                // Because the pattern has a capturing group, Regex.Split also returns the matched type letters.
                string[] subDefinitions = Regex.Split(allDefinitions, @"(n|v|a){0,1}\s\d{1,}:");
                foreach (string definitionPart in subDefinitions)
                {
                    if (string.IsNullOrEmpty(definitionPart))
                        continue;

                    if (definitionPart == "n" || definitionPart == "v" || definitionPart == "a")
                    {
                        type = definitionPart;
                    }
                    else
                    {
                        Def def = new Def();
                        def.type = type;

                        // Assumption 3. Synonyms always use the [syn: {..},... ] pattern.
                        string realDef = (Regex.Split(definitionPart, @"\[\s*syn:"))[0];
                        def.text = realDef;

                        MatchCollection syns = Regex.Matches(definitionPart, @"\{([a-zA-Z\s]{1,})\}");
                        if (syns.Count > 0)
                            def.Synonym = new List<string>();

                        foreach (Match match in syns)
                        {
                            // Group 1 holds the text between the braces, so the
                            // braces do not need to be stripped afterwards.
                            def.Synonym.Add(match.Groups[1].Value.Trim());
                        }
                        parsedDefinition.Defs.Add(def);
                    }
                }
                return parsedDefinition;
            }
        }

        public class Def
        {
            // Moved your type from Definition to Def, since it made more sense to me.
            public string type { get; set; } // single character: n or v or a
            public string text { get; set; }
            // Changed your synonym definition here.
            public List<string> Synonym { get; set; }
        }

        public class Definition
        {
            public string wordDefined { get; set; }
            // Changed Def to Defs.
            public List<Def> Defs { get; set; }
        }
    }
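
Assumption 3 in the code above can also be isolated into a small helper that only looks inside the [syn: ...] block, so braces elsewhere in a definition are not picked up. This is a minimal sketch under that same assumption; the names SynonymSketch and ExtractSynonyms are hypothetical and not part of the answer's code:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SynonymSketch
{
    // Matches one {synonym}; group 1 holds the text between the braces.
    static readonly Regex Braced = new Regex(@"\{([^{}]+)\}");

    // Matches the whole [syn: ...] block; group 1 holds its contents.
    static readonly Regex SynBlock = new Regex(@"\[syn:\s*([^\]]*)\]");

    public static List<string> ExtractSynonyms(string definitionLine)
    {
        var result = new List<string>();

        Match block = SynBlock.Match(definitionLine);
        if (!block.Success)
            return result; // no synonym block on this line

        foreach (Match m in Braced.Matches(block.Groups[1].Value))
            result.Add(m.Groups[1].Value.Trim());

        return result;
    }
}
```

For a line with no [syn: ...] block the helper returns an empty list rather than null, which spares the caller the null check used in the printing loop.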

Why do it by hand? Let's do everything automatically; we are programmers, after all!

Right-click on the project and select Add Service Reference.
Enter http://services.aonaware.com/DictService/DictService.asmx in the Address field.
Set the desired namespace.
You can also specify additional settings by clicking the "Advanced" button.
Click OK.

A set of classes will be created for working with the service.
Then just use these classes.

Note that the settings needed to use the service are added to your application's App.config or Web.config. We will use them below.

An example of using these classes (do not forget to specify the namespace to use):

    var client = new DictServiceSoapClient("DictServiceSoap");
    var wordDefinition = client.DefineInDict("wn", "abandon");

That's all!

In the DictServiceSoapClient constructor we specify the name, taken from the configuration, that is used for the binding.

wordDefinition now holds the result of the query. Let's read the information from it:

    Console.WriteLine(wordDefinition.Word);
    Console.WriteLine();

    foreach (var definition in wordDefinition.Definitions)
    {
        Console.WriteLine("Word: " + definition.Word);
        Console.WriteLine("Word Definition: " + definition.WordDefinition);

        Console.WriteLine("Id: " + definition.Dictionary.Id);
        Console.WriteLine("Name: " + definition.Dictionary.Name);
    }

Source: https://habr.com/ru/post/1247414/

