A simple question about Python / Beautiful Soup

Question

I am trying to perform some simple manipulations on the href attribute of a hyperlink retrieved with Beautiful Soup:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<a href="http://www.some-site.com/">Some Hyperlink</a>')
href = soup.find("a")["href"]
print href
print href[href.indexOf('/'):]

All I get is:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print href[href.indexOf('/'):]
AttributeError: 'unicode' object has no attribute 'indexOf'

How do I convert href to a normal string?

+3

python string beautifulsoup

Justin Jul 20 '09 at 12:05

3 answers

href is a unicode string. To convert it to a regular string, use:

regular_string = str(href)

0

Marius Jul 20 '09 at 12:11

You want find(), not indexOf().

Python strings have no indexOf() method.
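A minimal sketch of the find() approach, written for Python 3. The hard-coded string is an assumption standing in for the value that soup.find("a")["href"] returns in the question; no Beautiful Soup import is needed to show the string method itself.

```python
# Assumption: href holds the attribute value Beautiful Soup returned for the <a> tag.
href = "http://www.some-site.com/"

# str.find() is Python's closest counterpart to JavaScript's indexOf():
# it returns the index of the first match, or -1 if there is none.
pos = href.find('/')
print(pos)          # 5
print(href[pos:])   # //www.some-site.com/
```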

0

hughdbrown Jul 20 '09 at 13:47

codeape · Accepted Answer · 2009-07-20T12:08:10+0000

Python strings do not have a method indexOf.

Use href.index('/')

href.find('/') is similar, but find returns -1 if the substring is not found, whereas index raises a ValueError.
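The difference between the two methods only shows up when the substring is missing. This short Python 3 sketch uses the URL from the question and '#' as an arbitrary character that does not occur in it:

```python
href = "http://www.some-site.com/"

# Both methods agree when the substring is present.
print(href.find('/'), href.index('/'))  # 5 5

# When it is absent, find() returns the sentinel -1 ...
missing = href.find('#')
print(missing)  # -1

# ... while index() raises ValueError instead.
try:
    href.index('#')
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```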

So index is the better choice here: if find returns -1, the slice silently starts at the last character (since "..."[-1] returns the last character of the string) instead of failing.
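To illustrate the pitfall, here is a sketch with a hypothetical href value that contains no '/'. The helper tail_from_slash is also hypothetical, not part of Python or Beautiful Soup; it is one way to turn the ValueError from index into an explicit "not found" result.

```python
href = "www.some-site.com"  # hypothetical value with no '/'

# With find(), a missing '/' yields -1, and href[-1:] silently
# returns just the last character instead of signalling an error.
print(repr(href[href.find('/'):]))  # 'm'

def tail_from_slash(s):
    """Hypothetical helper: return s from the first '/' on, or None if absent."""
    try:
        return s[s.index('/'):]
    except ValueError:
        return None

print(tail_from_slash(href))                         # None
print(tail_from_slash("http://www.some-site.com/"))  # //www.some-site.com/
```

Returning None is just one design choice; letting the ValueError propagate is equally reasonable when a missing '/' should be treated as a bug.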
