
php - Ignore html tags on preg_match

Question:

I'm scraping a site with the following HTML:

<a class="name" href="/link" data-hovercard-id="charshere"><span class="highlighted">War</span> World</a>

<a class="name" href="/link" data-hovercard-id="charshere"> World of <span class="highlighted">fun</span></a>

<a class="name" href="/link" data-hovercard-id="charshere">Save the<br>world</a>

<a class="name" href="/link" data-hovercard-id="charshere">world of warcraft</a>

Using this code, I get the text of the links:

preg_match_all('/<a class="name" href=".*?" data-hovercard-id=".*?">(.*)<\/a>/i', $file_string, $titles);

but the result is:

<span class="highlighted">War</span> World

World of <span class="highlighted">fun</span>

Save the<br>world

world of warcraft

How do I ignore the HTML tags inside the links, so that the result looks like this?

War World

World of fun

Save the world

world of warcraft

A DOMDocument could be better. Thanks; I've been trying to use DOMDocument, but I'm not familiar with how to use its XPath queries.
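Since the question mentions DOMDocument, here is a minimal sketch using PHP's DOM extension together with DOMXPath (DOMDocument supports XPath 1.0 queries, not XQuery). It assumes the links can be selected with the exact attribute test `@class="name"` and that a `<br>` should read as a space; the variable names are illustrative only.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<a class="name" href="/link" data-hovercard-id="charshere"><span class="highlighted">War</span> World</a>
<a class="name" href="/link" data-hovercard-id="charshere"> World of <span class="highlighted">fun</span></a>
<a class="name" href="/link" data-hovercard-id="charshere">Save the<br>world</a>
<a class="name" href="/link" data-hovercard-id="charshere">world of warcraft</a>
HTML;

$doc = new DOMDocument();
// Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$titles = [];
foreach ($xpath->query('//a[@class="name"]') as $link) {
    // Assumption: <br> means a word break, so replace each one with a space text node.
    // iterator_to_array() copies the live node list before we modify the tree.
    foreach (iterator_to_array($link->getElementsByTagName('br')) as $br) {
        $br->parentNode->replaceChild($doc->createTextNode(' '), $br);
    }
    // textContent concatenates all descendant text nodes, so the tags are ignored.
    $titles[] = trim($link->textContent);
}

print_r($titles);
```

Because `textContent` already drops the markup, no regex is needed here. If the `class` attribute can hold several classes, a test such as `contains(concat(' ', normalize-space(@class), ' '), ' name ')` is more robust than `@class="name"`.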

Answer:

Use strip_tags(). Here is an example:

<?php
$html = <<<EOF
<span class="highlighted">War</span> World
 World of <span class="highlighted">fun</span>
Save the<br>world
world of warcraft
EOF;

echo strip_tags($html);

Output:

War World
 World of fun
Save theworld
world of warcraft
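Note that strip_tags() deletes `<br>` outright, which is why the output above reads "Save theworld" rather than the "Save the world" the question asks for. A sketch that combines the question's preg_match_all() with strip_tags(), assuming `<br>` should count as a space and that leading/trailing whitespace should be trimmed. The capture group is also made lazy (`(.*?)`) so that a single match cannot run across several links on one line; whether the real page needs that is an assumption.

```php
<?php
// Sample markup from the question.
$file_string = <<<'HTML'
<a class="name" href="/link" data-hovercard-id="charshere"><span class="highlighted">War</span> World</a>
<a class="name" href="/link" data-hovercard-id="charshere"> World of <span class="highlighted">fun</span></a>
<a class="name" href="/link" data-hovercard-id="charshere">Save the<br>world</a>
<a class="name" href="/link" data-hovercard-id="charshere">world of warcraft</a>
HTML;

// Capture the inner HTML of each link; (.*?) stops at the first closing </a>.
preg_match_all('/<a class="name" href=".*?" data-hovercard-id=".*?">(.*?)<\/a>/i', $file_string, $titles);

$clean = array_map(function ($inner) {
    // Turn <br>, <br/> and <br /> into a space before the tags are stripped.
    $inner = preg_replace('/<br\s*\/?>/i', ' ', $inner);
    // Remove the remaining tags, collapse runs of whitespace, then trim.
    return trim(preg_replace('/\s+/', ' ', strip_tags($inner)));
}, $titles[1]);

print_r($clean);
```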

Answer:

Just remove the tags after you get the text:

<?php
$string = '<span class="highlighted">War</span> World
 World of <span class="highlighted">fun</span>
Save the<br>world
world of warcraft';
$convert = preg_replace('/<.*?>/','', $string);
print $convert;

Prints:

War World
 World of fun
Save theworld
world of warcraft

Answer:

You can remove the HTML tags after you've matched the link text. For example:

$str = preg_replace('/<[^>]+>/', '', $html);
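preg_replace() also accepts arrays for the pattern, replacement and subject, so all captured link texts (`$titles[1]` from the question) can be cleaned in one call. A sketch, assuming `<br>` should become a space; the `$matches` array is hypothetical and simply hard-codes the four captured strings for illustration.

```php
<?php
// Stand-in for $titles[1] as captured by the question's preg_match_all().
$matches = [
    '<span class="highlighted">War</span> World',
    ' World of <span class="highlighted">fun</span>',
    'Save the<br>world',
    'world of warcraft',
];

// Patterns are applied in order to every subject:
// first <br> becomes a space, then every other tag is removed.
$clean = preg_replace(['/<br\s*\/?>/i', '/<[^>]+>/'], [' ', ''], $matches);

// Drop the leading space left over from " World of ...".
$clean = array_map('trim', $clean);

print_r($clean);
```

Like the other regex answers, this only handles simple markup; a `>` inside an attribute value would break it, which is where the DOM approach is safer.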