Recommended Way to Parse Dates Presented in Various Formats

I have a set of dates, stored as strings, that users entered over a period of time. Because they were typed in by people with little or no validation, the date formats vary greatly. The following are some examples (leading numbers are for reference only):

  1. 20th, 21st August 1897
  2. 31st May, 1st June 1909
  3. 29th January 2007
  4. 10th, 11th, 12th May 1954
  5. 26th, 27th, 28th, 29th, 30th March 2006
  6. 27th, 28th, 29th, 30th November, 1st December 2006

I would like to parse these dates in C# to end up with sets of DateTime objects, with one DateTime object per day. Thus, (1) above will result in 2 DateTime objects and (6) will result in 5 DateTime objects.

2 answers

After thinking about it, the solution became obvious: tokenize the string and parse the tokens in reverse order. That way the year is retrieved first, then the month, then the day(s). Here is my solution:

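using System;
using System.Collections;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// **** Start definition of the class MyGlobals ****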
public static class MyGlobals
{
    static Dictionary<string, int> _month2Int = new Dictionary<string, int>
    {
        {"January", 1},
        {"February", 2},
        {"March", 3},
        {"April", 4},
        {"May", 5},
        {"June", 6},
        {"July", 7},
        {"August", 8},
        {"September", 9},
        {"October", 10},
        {"November", 11},
        {"December", 12}
    };
    static public int GetMonthAsInt(string month)
    {
        return( _month2Int[month] );
    }
}


public class MyClass
{
    static char[] gDateSeparators = new char[2] { ',', ' ' };

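    // Anchored so a token must match in full; an unanchored month pattern would accept
    // a token such as "Mayor" and then fail in the dictionary lookup
    static Regex gDayRegex = new Regex("^[0-9][0-9]?(st|nd|rd|th)$");
    static Regex gMonthRegex = new Regex("^(January|February|March|April|May|June|July|August|September|October|November|December)$");
    static Regex gYearRegex = new Regex("^[0-9]{4}$");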

    public void ParseMatchDate(string matchDate)
    {
        Stack<DateTime> matchDateTimes = new Stack<DateTime>();
        string[] tokens = matchDate.Split(gDateSeparators,StringSplitOptions.RemoveEmptyEntries);
        int curYear = int.MinValue;
        int curMonth = int.MinValue;
        int curDay = int.MinValue;

        for (int pos = tokens.Length-1; pos >= 0; --pos)
        {
            if (gYearRegex.IsMatch(tokens[pos]))
            {
                curYear = int.Parse(tokens[pos]);
            }
            else if (gMonthRegex.IsMatch(tokens[pos]))
            {
                curMonth = MyGlobals.GetMonthAsInt(tokens[pos]);
            }
            else if (gDayRegex.IsMatch(tokens[pos]))
            {
                string tok = tokens[pos];
                curDay = int.Parse(tok.Substring(0,(tok.Length-2)));
                // Dates are in reverse order, so using a stack means we'll pull em off in the correct order
                matchDateTimes.Push(new DateTime(curYear, curMonth, curDay));
            }
        }

        // Now get the datetimes; Pop() removes each entry, so the loop terminates
        while (matchDateTimes.Count > 0)
        {
            DateTime date = (DateTime)matchDateTimes.Pop();
            // Do something with date here
        }
    }

}
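
To check the reverse-order idea against the examples in the question, the same walk can be packaged as a method that returns the dates. This is only a sketch under assumptions: the class name ReverseDateParser and its Parse method are made up for illustration, month names are taken from the invariant culture, and any token that is not a year, a month name, or a day with an ordinal suffix is rejected.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the answer above.
public static class ReverseDateParser
{
    static readonly Regex DayToken = new Regex("^([0-9]{1,2})(?:st|nd|rd|th)$");
    static readonly Regex YearToken = new Regex("^[0-9]{4}$");
    // English month names: "January" .. "December", plus one empty trailing entry.
    static readonly string[] MonthNames = CultureInfo.InvariantCulture.DateTimeFormat.MonthNames;

    public static List<DateTime> Parse(string text)
    {
        var dates = new List<DateTime>();
        string[] tokens = text.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int year = 0, month = 0;

        // Walk right to left so the year and month are known before their days.
        for (int pos = tokens.Length - 1; pos >= 0; --pos)
        {
            string tok = tokens[pos];
            Match day = DayToken.Match(tok);
            int monthIndex = Array.FindIndex(MonthNames,
                m => string.Equals(m, tok, StringComparison.OrdinalIgnoreCase));

            if (YearToken.IsMatch(tok))
                year = int.Parse(tok, CultureInfo.InvariantCulture);
            else if (monthIndex >= 0)
                month = monthIndex + 1;
            else if (day.Success && year > 0 && month > 0)
                dates.Add(new DateTime(year, month,
                    int.Parse(day.Groups[1].Value, CultureInfo.InvariantCulture)));
            else
                throw new FormatException("Unexpected token: " + tok);
        }

        dates.Reverse(); // days were collected right to left; restore reading order
        return dates;
    }
}
```

Calling ReverseDateParser.Parse("27th, 28th, 29th, 30th November, 1st December 2006") yields five dates, from 27 November 2006 to 1 December 2006, which is the count the question expects for example (6).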


I would recommend generalizing them first (basically, replacing the numbers and names with placeholders) and then grouping them by format, so that you have a set of sample formats to work with.

For example, 20th, 21st August 1987 then becomes [number][postfix], [number][postfix] [month] [year] (provided that <number><st|th|rd|nd> is recognized as a number plus postfix, that the months are obvious, and that the years are four-digit numbers).
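
The generalization step could look roughly like the following in C#. It is a sketch, not the answerer's code: the class DateShapes, the methods ToTemplate and GroupByTemplate, and the exact placeholder spellings are assumptions chosen to mirror the template shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; names and placeholder strings are assumptions.
public static class DateShapes
{
    static readonly Regex Day = new Regex(@"\b[0-9]{1,2}(?:st|nd|rd|th)\b");
    static readonly Regex Year = new Regex(@"\b[0-9]{4}\b");
    static readonly Regex Month = new Regex(
        @"\b(?:January|February|March|April|May|June|July|August|September|October|November|December)\b",
        RegexOptions.IgnoreCase);

    // Replace every day, year, and month with a placeholder so that only the
    // shape of the string (order and punctuation) remains.
    public static string ToTemplate(string text)
    {
        string shape = Day.Replace(text, "[number][postfix]");
        shape = Year.Replace(shape, "[year]");
        shape = Month.Replace(shape, "[month]");
        return shape;
    }

    // Group the raw strings by their shape to see which formats actually occur.
    public static Dictionary<string, List<string>> GroupByTemplate(IEnumerable<string> samples)
    {
        return samples
            .GroupBy(s => ToTemplate(s))
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

With this sketch, "29th January 2007" and "10th March 2006" fall into the same group, [number][postfix] [month] [year], so one parsing rule can cover both.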


The year and the month can then be picked out with patterns along these lines:

(.*?)([0-9]{4})(?:, |$)

(.*?)(January|February|...)(?:, |$)

The days can be matched with:

(?:([0-9]{1,2})(?:st|nd|rd|th)(?:, )?)*(?:, |$)
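
Since the question asks about C#, here is a rough sketch of how patterns like these could be applied in layers: split the string into year chunks, split each chunk into month chunks, then collect the days. Several things are assumptions: the class name LayeredDateParser, the spelled-out month list, the simplified one-day-at-a-time pattern used in place of the repeated group above, and the Trim() call that removes the space left in front of the year.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not the answerer's implementation.
public static class LayeredDateParser
{
    static readonly string[] Months =
    {
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    };

    // Everything up to a four-digit year that is followed by ", " or the end.
    static readonly Regex YearChunk = new Regex(@"(.*?)([0-9]{4})(?:, |$)");
    // Everything up to a month name that is followed by ", " or the end.
    static readonly Regex MonthChunk =
        new Regex(@"(.*?)(" + string.Join("|", Months) + @")(?:, |$)");
    // A single day number with its ordinal suffix.
    static readonly Regex DayItem = new Regex(@"([0-9]{1,2})(?:st|nd|rd|th)");

    public static List<DateTime> Parse(string text)
    {
        var dates = new List<DateTime>();
        foreach (Match y in YearChunk.Matches(text))
        {
            int year = int.Parse(y.Groups[2].Value);
            // Trim the space before the year so the month can sit at the end of the chunk.
            foreach (Match m in MonthChunk.Matches(y.Groups[1].Value.Trim()))
            {
                int month = Array.IndexOf(Months, m.Groups[2].Value) + 1;
                foreach (Match d in DayItem.Matches(m.Groups[1].Value))
                    dates.Add(new DateTime(year, month, int.Parse(d.Groups[1].Value)));
            }
        }
        return dates;
    }
}
```

Because the month pattern is applied repeatedly within each year chunk, "27th, 28th, 29th, 30th November, 1st December 2006" produces all five dates, including the November days that share the year with December.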

Here is a demonstration in PHP:

<?php
  $samples = array(
    '20th, 21st August 1897',
    '31st May, 1st June 1909',
    '29th January 2007',
    '10th, 11th, 12th May 1954',
    '26th, 27th, 28th, 29th, 30th March 2006',
    '27th, 28th, 29th, 30th November, 1st December 2006',
    '30th, 31st, December 2010, 1st, 2nd January 2011'
  );

  //header('Content-Type: text/plain');

  $months = array('january','february','march','april','may','june','july','august','september','october','november','december');

  foreach ($samples as $sample)
  {
    $dates = array();

    // find yearly information first
    $yearly = null;
    if (preg_match_all('/(?:^|\s)(?<month>.*?)\s?(?<year>[0-9]{4})(?:$|,)/',$sample,$yearly))
    {//var_dump($yearly);
      for ($y = 0; $y < count($yearly[0]); $y++)
      {
        $year = $yearly['year'][$y];
        //echo "year: {$year}\r\n";

        $monthly = null;
        if (preg_match_all('/(?<days>(?:(?:^|\s)[0-9]{1,2}(?:st|nd|rd|th),?)*)\s?(?<month>'.implode('|',$months).')$/i',$yearly['month'][$y],$monthly))
        {//var_dump($monthly);
          for ($m = 0; $m < count($monthly[0]); $m++)
          {
            $month = $monthly['month'][$m];
            //echo "month: {$month}\r\n";

            $daily = null;
            if (preg_match_all('/(?:^|\s)(?<day>[0-9]{1,2})(?:st|nd|rd|th)(?:,|$)/i',$monthly['days'][$m],$daily))
            {//var_dump($daily);
              for ($d = 0; $d < count($daily[0]); $d++)
              {
                $day = $daily['day'][$d];
                //echo "day: {$day}\r\n";

                $dates[] = sprintf("%d-%d-%d", array_search(strtolower($month),$months)+1, $day, $year);
              }
            }
          }
        }
        $data = $yearly[1];
      }
    }

    echo "<p><b>{$sample}</b> was parsed to include:</p><ul>\r\n";
    foreach ($dates as $date)
      echo "<li>{$date}</li>\r\n";
    echo "</ul>\r\n";
  }
?>

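20th, 21st August 1897 was parsed to include:

  • 8-20-1897
  • 8-21-1897

31st May, 1st June 1909 was parsed to include:

  • 6-1-1909

29th January 2007 was parsed to include:

  • 1-29-2007

10th, 11th, 12th May 1954 was parsed to include:

  • 5-10-1954
  • 5-11-1954
  • 5-12-1954

26th, 27th, 28th, 29th, 30th March 2006 was parsed to include:

  • 3-26-2006
  • 3-27-2006
  • 3-28-2006
  • 3-29-2006
  • 3-30-2006

27th, 28th, 29th, 30th November, 1st December 2006 was parsed to include:

  • 12-1-2006

30th, 31st, December 2010, 1st, 2nd January 2011 was parsed to include:

  • 12-30-2010
  • 12-31-2010
  • 1-1-2011
  • 1-2-2011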

Demo: http://www.ideone.com/GGMaH

Source: https://habr.com/ru/post/1792686/

