Parsing a large string literal in JS with a regular expression into an object of arrays

I'm new to programming and trying to learn JavaScript. I am currently working on a project where I need to parse a large text file (Shakespeare's 154 sonnets, found here) into an object with the following data structure:

    var obj = {
      'property 1': ['value 1', 'value 2'],
      'property 2': ['value 1', 'value 2']
      // and so on
    };

Here the Roman numerals are the property names of the object, and each line of a sonnet is one value in that property's array.

I have to use regular expressions to parse the text. So far I have been looking for the right regular expression to delimit the sonnets, but I don't know whether I am going about it correctly. Ultimately, I want to create a drop-down menu in which each entry in the list is a sonnet.

Edit: I now take the source text from this URL: http://pizzaboys.biz/xxx/sonnets.php

and do the same as above, but instead of calling $.get, I put the text in a variable ...

I tried this:

    $(document).ready(function(){
      var data = new SonnetizerArray();
    });

    function SonnetizerArray(){
      this.data = [];
      var rawText = "text from above link"
      var rx = /^\\n[CDILVX]/$\\n/g;
      var array_of_sonnets = rawText.exec(rx);
      for (var i = 0; i < array_of_sonnets.length; i ++){
        var s = $.split(array_of_sonnets[i]);
        if (s.length > 0)
          this.data.push(s);
      }
    }
1 answer

Description

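This regular expression splits the text into a Roman numeral and a body. The body can then be split into lines on the newline character \n. It assumes the multiline and dot-matches-newline options, with \Z marking the end of the input. JavaScript has no \Z anchor and only gained a dot-matches-newline flag in ES2018, so there you can write [\s\S] in place of . and $(?![\s\S]) in place of \Z.

^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|\Z)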


Capture groups

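Group 0 gets the entire matched section

  • Group 1 gets the Roman numeral
  • Group 2 gets the body of the section, not counting the Roman numeral
  • Group 3 gets the Roman numeral of the following section (captured inside the lookahead); it is empty for the last section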
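
To get from those groups to the object shape asked for in the question, something along these lines could work. This is only a sketch: parseSonnets and the shortened sample text are made up for illustration, the body is captured with a single lazy group instead of the longer form above, and [\s\S] and $(?![\s\S]) stand in for dot-matches-newline and \Z. Like the expression above, it assumes each numeral sits alone on a line and is preceded by whitespace (indentation or a blank line).

```javascript
// Hypothetical helper, not part of the original answer.
function parseSonnets(text) {
  // g: find every match; m: ^ and $ also match at line breaks.
  // Group 1 is the Roman numeral, group 2 is the body up to the next numeral line.
  var re = /^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)([\s\S]*?)(?=^\s+\b[CDMLXVI]{1,12}\b(?:\r|\n|$)|$(?![\s\S]))/gm;
  var sonnets = {};
  var match;
  while ((match = re.exec(text)) !== null) {
    // Split the body into lines, trim indentation, and drop blank lines.
    var lines = match[2].split(/\r?\n/)
      .map(function (line) { return line.trim(); })
      .filter(function (line) { return line.length > 0; });
    sonnets[match[1]] = lines;
  }
  return sonnets;
}

// Shortened sample in the assumed layout: indented numeral, then verse lines.
var sample = [
  "  VII",
  "Lo! in the orient when the gracious light",
  "Lifts up his burning head, each under eye",
  "",
  "  VIII",
  "Music to hear, why hear'st thou music sadly?",
  "Sweets with sweets war not, joy delights in joy:"
].join("\n");

var parsed = parseSonnets(sample);
console.log(Object.keys(parsed)); // the numerals found, in order
console.log(parsed.VIII[0]);      // first line of sonnet VIII
```

Each property of the returned object is keyed by its numeral, so parsed.VII is the array of lines for sonnet VII.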

Javascript Code Example:

Sample text extracted from your link

      VII
    Lo! in the orient when the gracious light
    Lifts up his burning head, each under eye
    Doth homage to his new-appearing sight,

      VIII
    Music to hear, why hear'st thou music sadly?
    Sweets with sweets war not, joy delights in joy:
    Why lov'st thou that which thou receiv'st not gladly,
    Or else receiv'st with pleasure thine annoy?

      IX
    Is it for fear to wet a widow eye,
    That thou consum'st thy self in single life?
    Ah! if thou issueless shalt hap to die,
    The world will wail thee like a makeless wife;

Code example

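    <script type="text/javascript">
      // Flags: g finds every match, m makes ^ and $ match at line breaks.
      // [\s\S] matches any character including newlines, and $(?![\s\S])
      // marks the end of the input (JavaScript has no \Z anchor).
      var re = /^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)[\s\S]*?(?:^[\s\S]*?)(^[\s\S]*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|$(?![\s\S]))/gm;
      var sourcestring = "source string to match with pattern";
      var results = [];
      var matches;
      // With the g flag, each exec call resumes after the previous match.
      while ((matches = re.exec(sourcestring)) !== null) {
        results.push(matches);
        for (var j = 0; j < matches.length; j++) {
          console.log("results[" + (results.length - 1) + "][" + j + "] = " + matches[j]);
        }
      }
    </script>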

Output example

    $matches Array:
    (
        [0] => Array
            (
                [0] => VII Lo! in the orient when the gracious light Lifts up his burning head, each under eye Doth homage to his new-appearing sight,
                [1] => VIII Music to hear, why hear'st thou music sadly? Sweets with sweets war not, joy delights in joy: Why lov'st thou that which thou receiv'st not gladly, Or else receiv'st with pleasure thine annoy?
                [2] => IX Is it for fear to wet a widow eye, That thou consum'st thy self in single life? Ah! if thou issueless shalt hap to die, The world will wail thee like a makeless wife;
            )
        [1] => Array
            (
                [0] => VII
                [1] => VIII
                [2] => IX
            )
        [2] => Array
            (
                [0] => Lo! in the orient when the gracious light Lifts up his burning head, each under eye Doth homage to his new-appearing sight,
                [1] => Music to hear, why hear'st thou music sadly? Sweets with sweets war not, joy delights in joy: Why lov'st thou that which thou receiv'st not gladly, Or else receiv'st with pleasure thine annoy?
                [2] => Is it for fear to wet a widow eye, That thou consum'st thy self in single life? Ah! if thou issueless shalt hap to die, The world will wail thee like a makeless wife;
            )
        [3] => Array
            (
                [0] => VIII
                [1] => IX
                [2] =>
            )
    )

Validating the Roman numerals

The expression above only checks that the heading consists of Roman numeral characters; it does not verify that they form a valid number. If you also need to confirm that the Roman numeral is well formed, you can use this expression instead. Note that its extra parentheses shift the group numbers: the numeral is still group 1, but the body moves to group 5 and the following numeral to group 6.

^\s+\b(M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|\Z)
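
If you only want to test a single heading on its own, the numeral part of that expression can be anchored and used by itself. A rough sketch; isValidRoman is a hypothetical helper name, and the extra length check is there because every part of the strict pattern is optional, so on its own it would also accept an empty string.

```javascript
// Numeral part of the strict expression, anchored so that the
// whole string has to be a well-formed Roman numeral.
var strictRoman = /^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/;

// Hypothetical helper: rejects the empty string explicitly,
// since the pattern alone would match it.
function isValidRoman(s) {
  return s.length > 0 && strictRoman.test(s);
}

console.log(isValidRoman("CLIV")); // true: 154, the last sonnet
console.log(isValidRoman("IIII")); // false: four I's in a row
console.log(isValidRoman("VX"));   // false: V cannot precede X
console.log(isValidRoman(""));     // false: empty input
```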



Source: https://habr.com/ru/post/1485913/

