I'm working on Windows 7 in the Scrapy interactive console (based on IPython).
I'm following the "Trying Selectors in the Shell" step of the tutorial.
If I fetch a site whose title contains only English letters, everything works as in the tutorial:
In [5]: hxs.select('//title/text()').re('(\w+):')
Out[5]: [u'Computers', u'Programming', u'Languages', u'Python']
But if I fetch a site whose title contains non-English letters (Russian, Unicode), the re() method returns nothing:
In [25]: hxs.select('//title/text()').re('(\w+)')
Out[25]: []
The title does contain text; it is not empty:
In [24]: hxs.select('//title/text()').extract()
Out[24]: [u'\u041b\u043e\u043a\u0430\u0446\u0438\u043e\u043d\u043d\u044b\u0439 \u043f\u043e\u0438\u0441\u043a \u0430\u0431\u043e\u043d\u0435\u043d\u0442\u043e\u0432']
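To narrow this down, here is a minimal sketch that reproduces the same behaviour with the standard `re` module alone, outside Scrapy. It assumes the cause is that `\w` is ASCII-only unless `re.UNICODE` is set, which is the Python 2 default. The variable names are mine, and `getattr(re, 'ASCII', 0)` is only there so the same script also runs on Python 3, where ASCII-only matching has to be requested explicitly.

```python
# -*- coding: utf-8 -*-
import re

# The title text from Out[24], as a unicode string.
title = (u'\u041b\u043e\u043a\u0430\u0446\u0438\u043e\u043d\u043d\u044b\u0439 '
         u'\u043f\u043e\u0438\u0441\u043a '
         u'\u0430\u0431\u043e\u043d\u0435\u043d\u0442\u043e\u0432')

# ASCII-only \w: the default on Python 2. On Python 3 this behaviour
# must be requested with re.ASCII (which Python 2 lacks, hence getattr).
ascii_only = re.compile(u'(\\w+)', getattr(re, 'ASCII', 0))

# Unicode-aware \w: must be requested with re.UNICODE on Python 2
# (it is already the default for str patterns on Python 3).
unicode_aware = re.compile(u'(\\w+)', re.UNICODE)

ascii_matches = ascii_only.findall(title)
unicode_matches = unicode_aware.findall(title)

print(ascii_matches)         # no Cyrillic letter counts as \w here
print(len(unicode_matches))  # one match per word in the title
```

If this is indeed the cause, the selector's re() would need the pattern compiled with `re.UNICODE`. Whether this Scrapy version's re() accepts a precompiled pattern or an inline `(?u)` flag is an assumption I have not verified.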
Can I use Scrapy's re() method with Unicode characters?