Regular expression for legal citations

How would you write a regular expression to capture a legal citation? Here is a paragraph that shows two typical legal citations:

We have insisted on strict scrutiny in every context, even for so-called “benign” racial classifications, such as race-conscious university admissions policies, see Grutter v. Bollinger, 539 US 306, 326 (2003), race-based preferences in government contracts, see Adarand, supra, at 226, and race-based districting intended to improve minority representation, see Shaw v. Reno, 509 US 630, 650 (1993).

A citation will be preceded by either a comma and a space, a period and a space, or a “signal” such as “see” or “see, e.g.,” followed by a space. I'm having trouble figuring out how to reliably identify the beginning of a citation.

I am most familiar with Perl regular expressions, but I can also understand examples from other languages.

+6
7 answers

In your example, you've preceded the citations with what the BlueBook deems a “signal” (Rule 1.2 on page 54 of the nineteenth edition). Other signals include, but are not limited to: e.g., accord, also, cf., compare, and, with, contra, and but. These can be combined in surprising and unexpected ways, for example: See also, e.g., Watts v. United States, 394 US 705 (1969) (per curiam). Of course, there are also citations that are not preceded by signals.

Then you'll also want to handle case citations with unexpected case names:

See v. Seattle, 387 US 541 (1967)

Others have attacked this particular problem by first identifying the reporter reference (i.e. 387 US 541) with a regular expression like (\d+)\s(.+?)\s(\d+), and then trying to expand the range from there. Case citations can be arbitrarily complex, so this path is not without its own pitfalls. Reporter references can also take some interesting forms under the BlueBook rules:

Jones vs. Smith, _ F.3d _ (2011)

That form is used, for example, for decisions that have not yet been published. Of course, authors will use variations of the above, including (but not limited to) --- F.3d ---
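The reporter-first idea described above can be sketched in Python roughly as follows. This is only an illustration, not a complete solution: the names `NUM_OR_BLANK`, `REPORTERS`, `REPORTER_REF` and `find_reporter_refs` are made up for this example, and the reporter list is a deliberately tiny assumption (a real list would be much longer). The placeholder handling covers the underscore and hyphen variants mentioned above.

```python
import re

# Volume and page: digits, or a run of underscores/hyphens used as a
# placeholder in citations to decisions that are not yet published.
NUM_OR_BLANK = r"(?:\d+|_+|-+)"

# Hypothetical, deliberately short reporter list (assumption for this sketch).
REPORTERS = r"(?:U\.\s?S\.|US|F\.\s?[23]d|S\.\s?Ct\.)"

REPORTER_REF = re.compile(
    rf"(?P<volume>{NUM_OR_BLANK})\s(?P<reporter>{REPORTERS})\s(?P<page>{NUM_OR_BLANK})"
)


def find_reporter_refs(text):
    """Return a (volume, reporter, page) tuple for each reporter reference."""
    return [(m["volume"], m["reporter"], m["page"]) for m in REPORTER_REF.finditer(text)]
```

Anchoring on a known reporter abbreviation, rather than the open-ended (.+?) in the middle, trades coverage for fewer false positives; expanding outward from each hit to find the case name is still the hard part.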

+3

This is certainly not ideal, but without more examples to test against, it's the best I can come up with. Thanks to @Paul H. for adding additional signal words.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sample text. The single-quoted heredoc keeps the backslashes literal.
    my $search_text = <<'EOD';
    "We have insisted on strict scrutiny in every context, even for so-called "benign" racial classifications, such as race-conscious university admissions policies, see Grutter v. Bollinger, 539 US 306, 326 (2003), race-based preferences in government contracts, see Adarand, supra, at 226, and race-based districting intended to improve minority representation, see Shaw v. Reno, 509 US 630, 650 (1993)."

    In your example, you've preceded the citations with what the BlueBook deems a 'signal' (Rule 1.2 on page 54 of the nineteenth edition). Other signals include but are not limited to: e.g., accord, also, cf., compare, and, with, contra, and but. These can be combined in surprising and unexpected ways . . . See also, e.g. Watts v. United States, 394 US 705 (1969) (per curiam). Of course, there are also citations that are not preceded by signals

    Then you'll also want to handle case citations with unexpected case names:

    See v. Seattle, 387 US 541 (1967)

    Others have attacked this particular problem by first identifying the reporter reference (i.e. 387 US 541) with a regular expression like (\d+)\s(.+?)\s(\d+) and then trying to expand the range from there. Case citations can be arbitrarily complex so this path is not without its own pitfalls. Reporter references can also take on some interesting forms as per BlueBook rules:
    EOD

    # Citations introduced by one or more signal words; group 12 is the citation.
    while ($search_text =~ m/(\, |\. |\; )?(see(\,|\.|\;)? |e\.g\.(\,|\.|\;)? |accord(\,|\.|\;)? |also(\,|\.|\;)? |cf\.(\,|\.|\;)? |compare(\,|\.|\;)? |with(\,|\.|\;)? |contra(\,|\.|\;)? |but(\,|\.|\;)? )+(.{0,100}\d+ \(\d{4}\))/g) {
        print "$12\n";
    }

    # Citations with no signal that start on a line of their own.
    while ($search_text =~ m/[\n\t]+(.{0,100}\d+ \(\d{4}\))/ig) {
        print "$1\n";
    }

Output:

    Grutter v. Bollinger, 539 US 306, 326 (2003)
    Shaw v. Reno, 509 US 630, 650 (1993)
    Watts v. United States, 394 US 705 (1969)
    See v. Seattle, 387 US 541 (1967)
+3

OK, you can use the following to match the beginning. You will need more patterns for other beginnings.

 /(, )|(see )/ 

The end will be a bigger problem. For example, in “see Adarand, supra, at 226, and race-based ...” there is no clear indicator of where the citation ends. I suspect that regular expressions are not enough for this task and that you need a higher form of language analysis. Otherwise, be content with matching only a subset of all citations, or with sometimes matching too much.

+2

I used http://gskinner.com/RegExr/ to check this syntax:

 (?<=see )\w+ v. \w+, \d{3} U\.S\. \d{3}, \d{3} \(\d{4}\) 

As you can see, I am using a positive lookbehind.
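A rough Python equivalent of the lookbehind idea is sketched below. It is an illustration only: the name `CITATION_AFTER_SEE` is invented here, and the pattern is loosened slightly as an assumption (optional periods in "U.S.", an optional pin cite, one- to four-digit pages) so that it matches the sample text as written in the question. Python requires lookbehinds to be fixed-width, which "see " is.

```python
import re

# (?<=see ) asserts that "see " comes right before the match without
# including it in the matched text.
CITATION_AFTER_SEE = re.compile(
    r"(?<=see )\w+ v\. \w+, \d{1,3} U\.?\s?S\.? \d{1,4}(?:, \d{1,4})? \(\d{4}\)"
)

text = (
    "such as race-conscious university admissions policies, "
    "see Grutter v. Bollinger, 539 US 306, 326 (2003), race-based preferences "
    "in government contracts, see Adarand, supra, at 226, and race-based "
    "districting, see Shaw v. Reno, 509 US 630, 650 (1993)."
)

# No capturing groups, so findall returns the full matched citations.
matches = CITATION_AFTER_SEE.findall(text)
```

The short-form reference "see Adarand, supra, at 226" is skipped because it has no "v." and no reporter reference, so it needs a pattern of its own.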

0

I wrote a pattern (built for JavaScript, since you did not specify a language) that can be used to match the two citations mentioned:

 var regex = /(\w+\sv.\s\w+,\s\d*\s?[\w.]*[\d,\s]*\(\d{4}\))/ig; 

You can see it in action here.

It will match other citations as long as they follow the same format:

 Name v. Name, 999 AAA.. 999, 999 (1999) 

although some of the parts are optional. If this does not meet your needs, please provide more information about the citations that fail to match this pattern.
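To get a feel for what this pattern does and does not catch, here is the same expression ported to Python as a small sketch. The sample sentence and the names `CASE_CITATION` and `found` are made up for illustration; the regex itself is unchanged from the JavaScript version, with the `i` flag mapped to `re.IGNORECASE`.

```python
import re

# Same expression as the JavaScript pattern above.
CASE_CITATION = re.compile(
    r"(\w+\sv.\s\w+,\s\d*\s?[\w.]*[\d,\s]*\(\d{4}\))", re.IGNORECASE
)

sample = (
    "see Grutter v. Bollinger, 539 US 306, 326 (2003), and "
    "See v. Seattle, 387 US 541 (1967); but see also "
    "Watts v. United States, 394 US 705 (1969)."
)

# One capturing group, so findall returns the text of that group.
found = CASE_CITATION.findall(sample)
```

Here the Watts citation is missed, because each party name is limited to a single `\w+` word and "United States" has two. That is the kind of case the pattern would need to be extended for.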

0

For this kind of potentially complex regular expression, I try to break it down into simple parts that can be individually tested and developed.

I use REL, a DSL (in Scala) that lets you assemble and reuse regex elements. You could define your regex as follows:

    val NAME = α+
    val VS   = """v\.?""" ~ """s\.?""".?
    val CASE = NAME("name1") ~ " " ~ VS ~ " " ~ NAME("name2")

    val NUM  = ß ~ (δ+) ~ ß
    val US   = """U\.? ?S\.? """
    val YEAR = ( ("1[7-9]" | "20") ~ δ{2} )("year")

    val REF  = CASE ~ ", " ~      // "Doe v. Smith, "
      (NUM ~ " ").? ~             // "123 " (optional)
      US ~ NUM ~                  // "US 456"
      (", " ~ NUM).* ~            // ", 678" 0 to N times
      " \\(" ~ YEAR ~ "\\)"       // "(1999)"

Then test each bit as follows:

    "NUM" should {
      "Match 1+ digits" in {
        "1"      must be matching(NUM)
        "12"     must be matching(NUM)
        "123"    must be matching(NUM)
        "1234"   must be matching(NUM)
        "12345"  must be matching(NUM)
        "123456" must be matching(NUM)
      }
      "Match only standalone digits" in {
        NUM.findFirstIn(" 123 ") must beSome("123")
        NUM.findFirstIn(" n123 ") must beNone
      }
    }

As a bonus, your unit/spec tests can double as documentation for each piece of the regular expression, showing what matches and what doesn't (which tends to be important when working with regular expressions).
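The same build-from-small-tested-parts idea works without REL. The sketch below uses plain Python string composition rather than the REL DSL, so it is a loose analogue and not a translation: the fragment names mirror the Scala ones, but the exact patterns (for example `[A-Za-z]+` for a name and `\b\d+\b` for a standalone number) are assumptions made for this example.

```python
import re

# Small fragments, each simple enough to test on its own.
NAME = r"[A-Za-z]+"
VS = r"vs?\.?"                      # "v", "v.", "vs" or "vs."
NUM = r"\b\d+\b"                    # standalone run of digits
US = r"U\.? ?S\.?"
YEAR = r"(?P<year>(?:1[7-9]|20)\d{2})"

CASE = rf"(?P<name1>{NAME}) {VS} (?P<name2>{NAME})"

# "Doe v. Smith, " + optional "123 " + "US 456" + any ", 678" + " (1999)"
REF = re.compile(rf"{CASE}, (?:{NUM} )?{US} {NUM}(?:, {NUM})* \({YEAR}\)")

# Fragment-level checks, in the spirit of the spec above.
assert re.fullmatch(NUM, "123456")
assert re.search(NUM, " 123 ").group() == "123"
assert re.search(NUM, " n123 ") is None

m = REF.search("see Grutter v. Bollinger, 539 US 306, 326 (2003)")
```

With named groups, `m["name1"]`, `m["name2"]` and `m["year"]` give the two parties and the year of a matched citation.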

I made a gist for this example with the first naive implementation.

In the next version of REL (0.3), you will be able to export the regex directly in, say, PCRE flavor and use it independently. For now, only JavaScript and .NET translations are implemented, so you can just run the sample using SBT, and it will output a Java-flavored regular expression (it is simple enough that I think it can be copied and pasted into Perl).

0

I am not too familiar with Perl, but if I wanted to do this, I would lean on web lookups. First, I would find a good pattern to search for.

I went with this regex:

 (\d{3})\sU\.S\.\s(\d{3}) 

Regular expression breakdown:

  • (\d{3}) → looks for 3 digits and puts them in $1
  • \sU\.S\.\s → looks for whitespace, followed by U.S., followed by more whitespace
  • (\d{3}) → looks for 3 digits again and puts them in $2

What this means is that it searches for a pattern such as 539 U.S. 306 and places the two numbers in capture groups. This results in the following values in the variables:

    $1 = 539
    $2 = 306

I would go through and find each instance of the pattern, then use something to fetch this page from the web:

 http://supreme.justia.com/cases/federal/us/$1/$2/case.html 

Which in this case would become:

 http://supreme.justia.com/cases/federal/us/539/306/case.html 
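The URL-building step could look roughly like this in Python. It is a sketch under stated assumptions: the URL template is the one quoted above, the function name `justia_urls` is invented for this example, and the pattern is loosened (optional periods, one to three digits for the volume, one to four for the page) so that both "539 US 306" and "539 U.S. 306" are picked up. No request is made here; fetching the page is left to whatever HTTP client you prefer.

```python
import re

# Volume, "U.S." (periods optional in this sketch), first page.
US_REPORTS_REF = re.compile(r"\b(\d{1,3})\sU\.?\s?S\.?\s(\d{1,4})\b")

URL_TEMPLATE = "http://supreme.justia.com/cases/federal/us/{volume}/{page}/case.html"


def justia_urls(text):
    """Build one lookup URL per U.S. Reports reference found in the text."""
    return [
        URL_TEMPLATE.format(volume=volume, page=page)
        for volume, page in US_REPORTS_REF.findall(text)
    ]
```

For example, `justia_urls("see Grutter v. Bollinger, 539 US 306, 326 (2003)")` yields the single URL shown above.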

Once I had that, I could walk the page's tree down to the following element (I've included the whole path here, because how you get there may vary by language):

    <body>
      <div id="main">
        <div id="container">
          <div id="maincontent">
            <h1> HERE IS THE TITLE OF THE CASE </h1>

The XPath for this element is //*[@id="maincontent"]/h1.
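As a minimal sketch of that lookup, the snippet below runs the XPath against a small, well-formed stand-in for the page using Python's standard library. The markup in `SAMPLE_PAGE` and the function name `case_title` are assumptions for illustration; a real page is unlikely to be well-formed XML, so in practice a proper HTML parser would take the place of `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the fetched page.
SAMPLE_PAGE = """
<body>
  <div id="main">
    <div id="container">
      <div id="maincontent">
        <h1> Grutter v. Bollinger - 539 US 306 (2003) </h1>
      </div>
    </div>
  </div>
</body>
"""


def case_title(page_markup):
    """Return the text of the h1 under the element with id 'maincontent'."""
    root = ET.fromstring(page_markup.strip())
    # ElementTree supports a limited XPath subset; paths are relative to root.
    heading = root.find(".//*[@id='maincontent']/h1")
    if heading is None or heading.text is None:
        return None
    return heading.text.strip()
```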

Now you have the full citation:

 Grutter v. Bollinger - 539 US 306 (2003) 

I'm not a lawyer, so I don't know whether there are other ways to write these (one of the other answers mentioned something like F.3d); if there are, capturing them will take a different approach. If I get some time later, I may write this up in PowerShell to see how I would do it.

0

Source: https://habr.com/ru/post/895000/

