beautifulsoup - Python: How to find the text of the first anchor tag using BeautifulSoup

Question:

I have an HTML structure like this:

<p class="title">
  <a href="abc.com">
   Story
  </a>
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>

I want to extract the text of the first anchor tag, which is "Story".

Here is how I am using BeautifulSoup to extract the text from the anchor tag:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

for link in soup.find_all(class_='title'):
    print(link.find_next('a').text)

And the output is:

Story

Comments

But I want to extract only the text of the first anchor tag, "Story".

How can I do this using BeautifulSoup in Python?

Answer:

You can access the first <a> tag inside each matched element directly:

print(link.a.text)

To strip the surrounding whitespace:

link.a.text.strip()
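Applied to the loop from the question, this answer might look like the following sketch. It assumes bs4 is installed and that `html` holds the markup from the question; `titles` is a hypothetical name used only to collect the results.

```python
from bs4 import BeautifulSoup

# Markup from the question
html = """
<p class="title">
  <a href="abc.com">
   Story
  </a>
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
for link in soup.find_all(class_="title"):
    # link.a is the first <a> descendant of this <p>, so the
    # "comments" link inside <span class="domain"> is never reached
    first = link.a
    if first is not None:  # guard against a title element with no anchor
        titles.append(first.text.strip())

print(titles)  # ['Story']
```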
Answer:

You can do that by chaining the find() calls and using the get_text() method:

soup.find("p", class_="title").a.get_text(strip=True)

where .a is equivalent to .find("a") in BeautifulSoup.
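A small sketch of that equivalence, assuming the markup from the question. It also shows why narrowing to the <a> first matters: calling get_text(strip=True) on the whole <p> joins every stripped string with an empty separator by default.

```python
from bs4 import BeautifulSoup

html = """
<p class="title">
  <a href="abc.com">
   Story
  </a>
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("p", class_="title")

# .a is shorthand for .find("a"): both return the first <a> descendant
text_a = title.a.get_text(strip=True)
text_find = title.find("a").get_text(strip=True)
print(text_a, text_find)  # Story Story

# On the whole <p>, the text of both anchors is collected
whole_joined = title.get_text(strip=True)       # default separator is ""
whole_spaced = title.get_text(" ", strip=True)  # explicit separator
print(whole_joined)  # Storycomments
print(whole_spaced)  # Story comments
```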

Or, with a CSS selector:

soup.select_one("p.title > a").get_text(strip=True)
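The ">" in that selector is the CSS child combinator, which only matches <a> elements that are direct children of p.title. A sketch contrasting it with the descendant combinator (a plain space), assuming the markup from the question and a bs4 version with CSS selector support:

```python
from bs4 import BeautifulSoup

html = """
<p class="title">
  <a href="abc.com">
   Story
  </a>
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: only <a> elements directly inside p.title
direct = [a.get_text(strip=True) for a in soup.select("p.title > a")]
# Descendant combinator: also matches the <a> nested in <span class="domain">
nested = [a.get_text(strip=True) for a in soup.select("p.title a")]

print(direct)  # ['Story']
print(nested)  # ['Story', 'comments']
```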
Answer:

If you only want the text of the first anchor, then you don't need to search by class at all.

Your question doesn't say that matching class="title" is required.

In [9]: html = """
<p class="title">
  <a href="abc.com">
   Story
  </a>
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>
"""
In [10]: soup = BeautifulSoup(html, "html.parser")
In [11]: soup.a.text.strip()
Out[11]: u'Story'
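One caveat: soup.a returns the first <a> anywhere in the document, so this only works when the story link really is the first anchor on the page. The sketch below uses a hypothetical page with a navigation link ahead of the title paragraph to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical page: a navigation link appears before the title paragraph
html = """
<div class="nav"><a href="/">Home</a></div>
<p class="title">
  <a href="abc.com">
   Story
  </a>
  <span class="domain">
    <a href="xyz.com">comments</a>
  </span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.a is the first <a> in the whole document
first_anywhere = soup.a.text.strip()
# Scoping to the title paragraph first still finds the story link
first_in_title = soup.find("p", class_="title").a.text.strip()

print(first_anywhere)  # Home
print(first_in_title)  # Story
```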