How to use regex with multiple data formats

I am trying to extract CVE identifiers from a website that I control. My regex worked when the format in row[0] was like this: (Ref # 8957501) (CVE-2015-3600), but it broke when the format changed to this: (Ref # 555237/92073 / CVE-2015 -9042)

How can I extract the CVE string from both formats?

Here is my current regex code:

cve_pattern = re.compile(r'(CVE-1999-\d{4,7}|CVE-(200[0-9])-\d{4,7}|CVE-(201[0-9])-\d{4,7})', re.IGNORECASE)
for cve_number_pattern_match in cve_pattern.finditer(row[0]):
    if cve_number_pattern_match is not None:
        logger.info(cve_number_pattern_match.group(0) + " is located on row " + str(row_num))
        cve_number_list[row_num] = cve_number_pattern_match.group(0)
1 answer

You can use

r'\bCVE[\d-]+'

to match a word boundary, the substring CVE, and then one or more digits or hyphens.
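As a quick check, here is a minimal sketch that runs this loose pattern against the two sample strings from the question. The names loose_cve and samples are illustrative and not from the original code, and the second sample is written without the space before the last hyphen, which is an assumption about the intended format.

```python
import re

# Loose pattern from the answer: "CVE" followed by any run of digits and hyphens.
loose_cve = re.compile(r'\bCVE[\d-]+', re.IGNORECASE)

# Sample strings modeled on the question (second one assumes no stray space).
samples = [
    "(Ref # 8957501) (CVE-2015-3600)",
    "(Ref # 555237/92073 / CVE-2015-9042)",
]

for text in samples:
    # findall returns every non-overlapping match as a list of strings.
    print(loose_cve.findall(text))
```

Because the character class accepts any mix of digits and hyphens, this pattern would also accept strings such as CVE--1, which is why the stricter form may be preferable.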

Or you can use the more precise

r'\bCVE-\d+(?:-\d+)?'

More details

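  • \b - a leading word boundary
  • CVE- - the literal string CVE-
  • \d+ - one or more digits
  • (?:-\d+)? - an optional non-capturing group matching:
    • - - a hyphen
    • \d+ - one or more digits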
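To see how the stricter pattern behaves on both formats from the question, here is a small sketch. Note that if the space in "CVE-2015 -9042" is really present in the data, the pattern stops at CVE-2015. The spaced_cve pattern and the extract_cves helper below are hypothetical additions, not part of the answer; they assume a four-digit year and reuse the 4 to 7 digit sequence length from the question's own pattern.

```python
import re

# Stricter pattern from the answer: CVE-<digits>, optionally followed by -<digits>.
strict_cve = re.compile(r'\bCVE-\d+(?:-\d+)?', re.IGNORECASE)

# Hypothetical variant (not from the answer): tolerate stray whitespace around
# the second hyphen, as in "CVE-2015 -9042", and capture both numeric parts.
spaced_cve = re.compile(r'\bCVE-(\d{4})\s*-\s*(\d{4,7})\b', re.IGNORECASE)


def extract_cves(text):
    """Return normalized CVE IDs found in text, e.g. 'CVE-2015-9042'."""
    return [f"CVE-{year}-{number}" for year, number in spaced_cve.findall(text)]


print(strict_cve.findall("(Ref # 8957501) (CVE-2015-3600)"))        # ['CVE-2015-3600']
print(strict_cve.findall("(Ref # 555237/92073 / CVE-2015 -9042)"))  # ['CVE-2015']
print(extract_cves("(Ref # 555237/92073 / CVE-2015 -9042)"))        # ['CVE-2015-9042']
```

Rebuilding the identifier from the captured groups normalizes the output, so both formats produce the same CVE-YYYY-NNNN string.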


Source: https://habr.com/ru/post/1663250/

