How to get groups of numbers separated by commas in Python?

I have the following text:

Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,
       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,
       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,
       270, 282, 296, 318, 319, 323, 326, 341}
Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,
       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,
       336}
Cluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}
Cluster 10: {94, 123, 147}

I want to extract each cluster label together with the numbers in its set.

I tried using a regex without much success:

regex="(Cluster \d+): \{((\d+)[,\}][\n ]+)+|(?:(\d+),[\n ])"

But the groups do not match.

I need output like:

["Cluster 7", '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', "Cluster 8", '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', "Cluster 9", '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', "Cluster 10", "94", "123", "147"]

Or maybe regex is not the best approach for this.

Thanks.

3 answers

You can create a more general regular expression:

import re
s = '\nCluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n       270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n       336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}\n'
data = re.findall(r'Cluster \d+|\d+', s)

Output:

['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
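If the numbers need to stay attached to their cluster, a two-step approach may be easier to reason about than one large pattern. In Python's `re`, a repeated capturing group keeps only its last match, which is likely why the groups in the question's pattern did not hold every number. The sketch below is one possible way to do it; the names `block_pattern` and `clusters` and the shortened sample text are illustrative, not from the original post.

```python
import re

# Shortened sample in the same layout as the text in the question.
text = """Cluster 7: {4, 15, 21,
       28, 33}
Cluster 8: {9, 10,
       11}
Cluster 10: {94, 123, 147}
"""

# Step 1: capture each label and the raw body between its braces.
# [^}]* also matches newlines, so sets wrapped over several lines are handled.
block_pattern = re.compile(r"(Cluster \d+): \{([^}]*)\}")

clusters = {}
for match in block_pattern.finditer(text):
    label, body = match.groups()
    # Step 2: pull every number out of the body as a string.
    clusters[label] = re.findall(r"\d+", body)

# Flatten to the single-list shape asked for in the question.
flat = [item for label, numbers in clusters.items() for item in (label, *numbers)]

print(clusters)
print(flat)
```

Keeping the intermediate dict makes it simple to look up one cluster later; the flat list is only built at the end.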

I would not use regex for this. Your text follows the YAML spec and can be loaded directly using a YAML library.

import oyaml as yaml   # pip install oyaml
data = yaml.safe_load(text)   # text is the string from the question

To unpack this dict into the desired "flat" structure, a list comprehension is enough:

[x for (k, v) in data.items() for x in (k, *v)]
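To see what the comprehension does without installing anything, here is a small sketch on a hand-built dict. It assumes the loader returns each cluster's set as a mapping (or other iterable) whose items are integers; the exact type depends on the YAML library, so `data` below is a hypothetical stand-in rather than real loader output.

```python
# Hypothetical stand-in for the loaded data: each cluster name maps to a
# mapping whose keys are the numbers (the values are unused here).
data = {
    "Cluster 9": {3: None, 64: None, 83: None},
    "Cluster 10": {94: None, 123: None, 147: None},
}

# (k, *v) builds a tuple of the cluster name followed by the items of v;
# iterating a dict yields its keys, so the numbers are unpacked in order.
flat = [x for (k, v) in data.items() for x in (k, *v)]
print(flat)

# The numbers come out as ints here; convert them if the all-string
# list from the question is needed.
flat_str = [str(x) for x in flat]
print(flat_str)
```

Note that with this route the numbers are parsed values, not the strings a regex would return.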

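You can use this regex:

\w+(?: +\w+)?
  • \w+ matches one or more word characters
  • (?: +\w+)? is an optional non-capturing group, made of:
    •  + one or more spaces
    • \w+ one or more word characters

In Python: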

import re

s = "Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n       270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n       336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}"
print(re.findall(r"\w+(?: +\w+)?", s))

Output:

['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
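One caveat worth checking before relying on this pattern: it pairs any two word tokens separated only by spaces, so it depends on the commas between the numbers. The short comparison below is a sketch on a made-up input (`no_commas` is not from the original post) that contrasts it with the narrower `Cluster \d+|\d+` pattern.

```python
import re

# Made-up input where the numbers are separated by spaces instead of commas.
no_commas = "Cluster 7: {4 15}"

# The generic pattern joins "4 15" into one token because only spaces
# separate the two runs of word characters.
generic = re.findall(r"\w+(?: +\w+)?", no_commas)

# The narrower pattern only allows a space after the literal word "Cluster".
narrow = re.findall(r"Cluster \d+|\d+", no_commas)

print(generic)  # ['Cluster 7', '4 15']
print(narrow)   # ['Cluster 7', '4', '15']
```

For the comma-separated text in the question both patterns give the same list, so the choice mainly matters if the input format may vary.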

Source: https://habr.com/ru/post/1696223/
