I am not too familiar with Perl, but if I wanted to do this, I would use some web lookups. First, I would find a good set of patterns.
I went with this regex:
(\d{3})\sU\.S\.\s(\d{3})
Regex breakdown:
- (\d{3}) → matches 3 digits and captures them in $1
- \sU\.S\.\s → matches a whitespace character, then the literal text U.S., then another whitespace character
- (\d{3}) → matches 3 digits again and captures them in $2
This means the regex searches for a pattern such as 539 U.S. 306 and places the two numbers in capture groups. This results in the following values in the variables:
$1 = 539 $2 = 306
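As a rough illustration of the matching step, here is a minimal Python sketch using the same regex. The function name find_citations and the sample sentence are my own, made up for the example. Note that \d{3} only handles three-digit volumes and page numbers, so other citations would need a looser pattern.

```python
import re

# Same pattern as above: three digits, " U.S. ", three digits.
CITATION_RE = re.compile(r"(\d{3})\sU\.S\.\s(\d{3})")


def find_citations(text):
    """Return a list of (volume, page) string pairs found in text."""
    # With two capture groups, findall returns one tuple per match.
    return CITATION_RE.findall(text)


# Hypothetical sample text, not taken from a real document.
sample = "See Grutter v. Bollinger, 539 U.S. 306 (2003)."
print(find_citations(sample))
```

Each tuple plays the role of $1 and $2 from the Perl-style description.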
I would go through the text and find each instance of the pattern, then use something to fetch this page from the Internet:
http://supreme.justia.com/cases/federal/us/$1/$2/case.html
Which in this case would become:
http://supreme.justia.com/cases/federal/us/539/306/case.html
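Building the URL and fetching the page could look something like the sketch below. The helper names case_url and fetch_page are hypothetical, and I am assuming the site still serves pages at this address and allows plain requests; it may redirect, block scripts, or have changed its layout.

```python
from urllib.request import urlopen

# URL layout taken from the answer above; {volume} and {page} stand in for $1 and $2.
URL_TEMPLATE = "http://supreme.justia.com/cases/federal/us/{volume}/{page}/case.html"


def case_url(volume, page):
    """Fill the two captured numbers into the URL template."""
    return URL_TEMPLATE.format(volume=volume, page=page)


def fetch_page(url, timeout=10):
    """Download a page and return its body as text (assumes UTF-8 content)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(case_url("539", "306"))
```

fetch_page is defined but not called here, so the example runs without network access.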
Once I had the page, I could walk down the document tree to the following element (I put the whole tree here because, depending on the language, how you do it may differ):
<body> <div id="main"> <div id="container"> <div id="maincontent"> <h1> HERE IS THE TITLE OF THE CASE </h1>
The XPath for this element is //*[@id="maincontent"]/h1 .
Now you have the full citation:
Grutter v. Bollinger - 539 US 306 (2003)
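To pull the title out of the downloaded HTML without extra libraries, something like the following sketch might do. It uses Python's standard html.parser rather than a real XPath engine, so it only approximates the XPath above: it takes the first h1 that appears after the element with id="maincontent" opens. The names TitleParser and extract_title are made up for this example, and the sample HTML is just the tree shown earlier with closing tags added.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <h1> after the id="maincontent" element opens."""

    def __init__(self):
        super().__init__()
        self.seen_main = False  # becomes True once #maincontent has opened
        self.in_h1 = False      # True while inside the target <h1>
        self.title = None       # final result, whitespace-normalized
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == "maincontent":
            self.seen_main = True
        elif tag == "h1" and self.seen_main and self.title is None:
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_h1:
            self.in_h1 = False
            # Join the pieces and collapse runs of whitespace.
            self.title = " ".join("".join(self._parts).split())

    def handle_data(self, data):
        if self.in_h1:
            self._parts.append(data)


def extract_title(html):
    """Return the case title, or None if no matching <h1> is found."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title


sample_html = (
    '<body> <div id="main"> <div id="container"> <div id="maincontent"> '
    "<h1> HERE IS THE TITLE OF THE CASE </h1> </div> </div> </div> </body>"
)
print(extract_title(sample_html))
```

If a third-party library is acceptable, an XPath-capable parser would let you use the expression above directly instead of this approximation.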
I'm not a lawyer, so I don't know whether there are other ways to cite cases (one of the other answers mentioned something like F.3d); those would need a different pattern to capture. If I get some time later, I may write this in PowerShell to see how I would do it.