
PHP regex conditional get content and link from HTML anchor tag

Question:

I am trying to get all the anchor tags from a given HTML document whose content is more than 30 characters long. For example, given this HTML:

<td><a href="anything">Content is more than 30 chars........</a>

<a href="anything">another link</a>

</td>

I have written this regex for it:

preg_match_all("/<a href=\"(.*)\"[^>]*>([a-zA-Z0-9]{30,999})<\\/[a-zA-Z]+>/si", $match[0], $posts);

Here the 30 sets a minimum of 30 characters for the anchor tag content, but unfortunately this is not working.

Can anyone point out what I have done wrong?

Thanks

Note: I am trying to fetch the URLs from this page:

This Link

Answer:

Would something as simple as

<a.*?>.{30,}?</a>

not suffice? The pattern above looks for anchor tags whose content is 30 characters or more. It does not attempt to validate the href attribute or any other attribute of the link. It can be altered if that is required.

Translated into a preg_match_all call, this becomes (thanks to @php_nub_qq):

preg_match_all("#<a.*?>.{30,}?</a>#", $match[0],$posts);

The URL you have linked contains letters, numbers, and non-alphanumeric characters in the URL string. As you have little control over the source, it might be best to generalise the pattern as above rather than attempt to whitelist on a per-character basis.
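As a minimal sketch of the generalised approach above, the snippet below runs a pattern of the same shape over sample HTML modelled on the question. The variable names ($html, $pattern, $count) and the tightened sub-patterns (`<a\b[^>]*>` and `[^<]{30,}`) are assumptions of this sketch, not part of the original answer. Using `[^<]` is one way to keep a match from running past one link's closing tag into the next link when several anchors sit on the same line.

```php
<?php
// Sample HTML modelled on the question (hypothetical input).
$html = '<td><a href="anything">Content is more than 30 chars........</a>'
      . '<a href="anything">another link</a></td>';

// <a\b[^>]*>  : an opening anchor tag with any attributes
// ([^<]{30,}) : at least 30 characters of link text, none of them "<"
// </a>        : the closing anchor tag
$pattern = '#<a\b[^>]*>([^<]{30,})</a>#i';

// preg_match_all returns the number of full pattern matches.
$count = preg_match_all($pattern, $html, $posts);

// $posts[0] holds the full matches, $posts[1] the captured link text.
print_r($posts[1]);
```

Only the first link should be reported here, since "another link" is shorter than 30 characters. Note that `[^<]` also rejects link text containing nested tags such as `<b>`; whether that matters depends on the page being scraped.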

Answer:

Try this:

preg_match_all("/<a href=\"(.*)\"[^>]*>([a-z\d\s]{30,})<\\/[a-z]+>/si", $match[0],$posts);

Since you have the i case-insensitive modifier, you don't need both a-z and A-Z in your classes. And if you're just setting a minimum length of the content, you don't need to specify a maximum of 999; {30,} means 30 or more.
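To make the two capture groups concrete, here is a hedged sketch in the spirit of the pattern above. Two things differ from the answer and are assumptions of this sketch: the href capture uses `[^"]*` instead of `.*`, because a greedy `.*` under the `s` modifier can run on to a later double quote in the input, and the closing tag is written literally as `</a>`. The sample input and its paths are hypothetical; the link text contains only letters, digits, and whitespace, since that is all the `[a-z\d\s]` class accepts.

```php
<?php
// Hypothetical input: one long link title and one short one.
$html = '<a href="/posts/1">A long link title with more than thirty characters</a>'
      . '<a href="/posts/2">short title</a>';

// Group 1: the href value (anything up to the next double quote).
// Group 2: link text of 30 or more letters, digits, or whitespace characters.
$pattern = '/<a href="([^"]*)"[^>]*>([a-z\d\s]{30,})<\/a>/i';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($pattern, $html, $posts, PREG_SET_ORDER);

foreach ($posts as $post) {
    // $post[1] is the href value, $post[2] is the link text.
    echo $post[1], ' => ', $post[2], PHP_EOL;
}
```

With PREG_SET_ORDER each entry of $posts pairs a link with its text, which is convenient when both the URL and the content are needed together.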
