I am scraping a table using Python with requests and lxml. The data is extracted with tree.xpath and collected into lists, which are then written to a CSV file. Unfortunately, the cells in one of the table's columns contain commas, which changes the number of values in the list.
Example:
from lxml import html
import requests
page = requests.get('http://url.com/table')
tree = html.fromstring(page.content)
list1 = tree.xpath('//*[@id="block"]/div/tr[*]/td[1]/a/text()')
list2 = tree.xpath('//*[@id="block"]/div/tr[*]/td[2]/a/text()')
The table I'm scraping:
Column1 | Column2
A,B,C   | X
D,E     | Y
F,G,H   | Z
Current output:
print list1
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
print list2
['X', 'Y', 'Z']
Preferred Output:
print list1
['a b c', 'd e', 'f g h']
print list2
['x', 'y', 'z']
I am having trouble finding the right solution. Is there an easy way to strip the commas from the values, or to keep the commas by using a different separator for the list? Thanks for the help!
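One possible approach, sketched under assumptions: if each cell in the first column holds several <a> elements separated by commas, a single XPath over the whole table returns one string per link, which would explain the flat list above. Selecting the rows first and then joining the link texts per cell keeps one value per row. The SAMPLE markup and the column_values helper are hypothetical stand-ins, not the real page; the row XPath is loosened to //tr so it works on the sample. The code is written to run on Python 3.

```python
from lxml import html

# Hypothetical markup standing in for the real page (assumption: every cell
# wraps each value in its own <a>, with commas as plain text between them).
SAMPLE = """
<div id="block"><table>
<tr><td><a>A</a>, <a>B</a>, <a>C</a></td><td><a>X</a></td></tr>
<tr><td><a>D</a>, <a>E</a></td><td><a>Y</a></td></tr>
<tr><td><a>F</a>, <a>G</a>, <a>H</a></td><td><a>Z</a></td></tr>
</table></div>
"""


def column_values(tree, column):
    """Return one space-joined string per table row for the given column."""
    values = []
    # Select the rows first, so the link texts stay grouped by row.
    for row in tree.xpath('//*[@id="block"]//tr'):
        # Relative XPath: only the links inside this row's chosen cell.
        parts = row.xpath('./td[%d]//a/text()' % column)
        values.append(' '.join(part.strip() for part in parts))
    return values


tree = html.fromstring(SAMPLE)
list1 = column_values(tree, 1)
list2 = column_values(tree, 2)
print(list1)
print(list2)
```

With the sample markup, both lists end up with three entries, one per row. If the lowercase form shown in the preferred output is wanted, calling .lower() on each joined string would do it.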
Edit: Here is the code that creates the CSV.
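import csv
csv_out = open('file.csv', 'wb')
writer = csv.writer(csv_out, dialect='excel-tab')
# zip pairs the column lists into rows; writerows takes a single iterable of rows
writer.writerows(zip(list1, list2))  # etc. for any further column lists
csv_out.close()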
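As a rough illustration of the "different separator" option: with the excel-tab dialect the delimiter is a tab, so commas inside a value are written through unchanged and do not split the field. The sketch below assumes the column lists already hold one value per row; write_columns is a hypothetical helper, and an in-memory buffer stands in for the file so the result can be inspected. It targets Python 3, where a real file would be opened with open('file.csv', 'w', newline='') rather than 'wb'.

```python
import csv
import io

# Assumed input: one entry per table row, commas left in place.
list1 = ['A,B,C', 'D,E', 'F,G,H']
list2 = ['X', 'Y', 'Z']


def write_columns(handle, *columns):
    """Write parallel column lists as tab-separated rows."""
    writer = csv.writer(handle, dialect='excel-tab')
    # zip(*columns) turns the column lists into (col1, col2, ...) row tuples.
    writer.writerows(zip(*columns))


buffer = io.StringIO()
write_columns(buffer, list1, list2)
print(buffer.getvalue())
```

Each row comes out as two tab-separated fields, with the commas in the first field intact.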