
How to use preg_match_all in PHP?

Question:

Hi, I want to retrieve certain information from a website.

This is what is displayed on the website, including the HTML tags:

 <a href="ProductDisplay?catalogId=10051&amp;storeId=90001&amp;productId=258033&amp;langId=-1" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">

SALT - Fine

</a>

What I want to extract is "SALT - Fine" using preg_match_all, but I don't know why it doesn't work. Is it because the parts are on different lines? I noticed that if everything is on a single line, I can actually retrieve what I want.

This is my code:

$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*<\/a>/';

preg_match_all($pattern, $response, $match);

print_r($match);

I do not get anything in my array. If everything is on a single line, it works. Why is that?

Answer:

Have a look at:

http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

especially the m and s modifiers.
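A minimal sketch of what the s modifier changes, assuming $response holds exactly the markup from the question (pasted into a single-quoted string). The same pattern is run with and without s, and preg_match_all() reports how many full matches it found:

```php
<?php
// The sample markup from the question: the link text sits on its own line.
$response = '<a href="ProductDisplay?catalogId=10051&amp;storeId=90001&amp;productId=258033&amp;langId=-1" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">

SALT - Fine

</a>';

$withoutS = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*<\/a>/';
$withS    = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*<\/a>/s';

// preg_match_all() returns the number of full matches found.
$countWithoutS = preg_match_all($withoutS, $response, $m1);
$countWithS    = preg_match_all($withS, $response, $m2);

echo "without s: $countWithoutS match(es)\n"; // . stops at the first newline
echo "with s:    $countWithS match(es)\n";    // . also consumes the newlines
```

Without s, .* cannot get past the line break after the opening tag, so </a> is never reached and the result array stays empty.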

Also, I would recommend changing the pattern to something like:

$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*?)<\/a>/ims';

Otherwise, your match will also include the end of the opening a-tag, not just the link text.
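A minimal sketch of reading the captured text, assuming $response holds the markup from the question. The lazy (.*?) is a choice of this sketch so that, on a page with several links, the capture stops at the first </a> instead of running on to the last one; m is left out because the pattern uses no ^ or $ anchors:

```php
<?php
$response = '<a href="ProductDisplay?catalogId=10051&amp;storeId=90001&amp;productId=258033&amp;langId=-1" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">

SALT - Fine

</a>';

// [^>]* skips the rest of the opening tag; (.*?) captures only the link text.
$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3"[^>]*>(.*?)<\/a>/is';

$names = [];
if (preg_match_all($pattern, $response, $match)) {
    // $match[0] holds the full matches, $match[1] the first capture group.
    foreach ($match[1] as $text) {
        $names[] = trim($text); // strip the surrounding blank lines
    }
}

print_r($names);
```

The capture group still contains the blank lines around the name, which is why each entry is passed through trim().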

And on a side note: don't use regex to parse HTML/XML.

Something like this:

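<?php
$dom = new DOMDocument();
// loadHTML() is an instance method; the static call form is not supported on current PHP versions.
// Collect the parser warnings that real-world HTML usually triggers instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($response);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$node = $xpath->query('//*[@id="WC_CatalogSearchResultDisplay_Link_6_3"]/text()')->item(0);
if ($node instanceof DOMText) {
    echo trim($node->nodeValue);
}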

will also work, and will be a lot more robust.
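A variant of the DOM approach, as a sketch under the assumption that $response holds the markup from the question: instead of selecting the text() node, it reads the element's textContent, which concatenates all descendant text and so keeps working if the name is later wrapped in an inner tag such as a span:

```php
<?php
$response = '<a href="ProductDisplay?catalogId=10051&amp;storeId=90001&amp;productId=258033&amp;langId=-1" id="WC_CatalogSearchResultDisplay_Link_6_3" class="s_result_name">

SALT - Fine

</a>';

$dom = new DOMDocument();
// Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($response);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a[@id="WC_CatalogSearchResultDisplay_Link_6_3"]');

$name = null;
if ($nodes !== false && $nodes->length > 0) {
    // textContent includes text from nested elements, not just direct text children.
    $name = trim($nodes->item(0)->textContent);
}

echo $name, "\n";
```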

Answer:

You should wrap the part you want to capture in parentheses (). So I guess your pattern would then become:

$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3(.*)<\/a>/';

However, I don't fully see how you arrived at this pattern, since it would be simpler to just match everything enclosed by a-tags.

Edit: As Yoshi mentioned, you also need the s modifier so that . matches newlines. I would thus suggest this code:

$pattern = '/<a[^>]*>(.+?)<\/a>/si';
preg_match_all($pattern, $response, $match);
print_r($match);
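A quick check of why a lazy quantifier matters once s is on. The fragment below is hypothetical (two links, the second product name invented for illustration): with a greedy .+ the match runs from the first <a to the last </a>, while .+? yields one match per link:

```php
<?php
// Hypothetical page fragment with two product links on separate lines.
$response = '<a href="#1" class="s_result_name">
SALT - Fine
</a>
<a href="#2" class="s_result_name">
SUGAR - Coarse
</a>';

$greedy = '/<a[^>]*>(.+)<\/a>/si';   // .+ runs on to the LAST </a>
$lazy   = '/<a[^>]*>(.+?)<\/a>/si';  // .+? stops at the FIRST </a>

$greedyCount = preg_match_all($greedy, $response, $g);
$lazyCount   = preg_match_all($lazy, $response, $l);

// Trim each captured group to drop the newlines around the link text.
$names = array_map('trim', $l[1]);

echo "greedy: $greedyCount match(es)\n";
echo "lazy:   $lazyCount match(es)\n";
print_r($names);
```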

Answer:

You're right, it's because it's a multi-line input string.

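You need to add the s modifier to the regex pattern so it can match across lines (the m modifier below does no harm, but it is not what fixes this):

$pattern = '/id="WC_CatalogSearchResultDisplay_Link_6_3.*?<\/a>/ms';

The m modifier makes ^ and $ match at the start and end of each line rather than only at the start and end of the whole string. This pattern uses neither, so m has no effect here.

The s modifier makes the . dot match newline characters as well as all others (by default it doesn't match newlines). That is why the pattern only worked when everything was on one line.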
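A small sketch separating the two modifiers on a plain two-line string (the string is only an illustrative assumption): m changes where ^ can match, while s changes whether . can cross a newline:

```php
<?php
$text = "first line\nsecond line";

// Without m, ^ only matches at the very start of the subject string.
$noM   = preg_match_all('/^\w+/', $text, $a);
// With m, ^ also matches right after every newline.
$withM = preg_match_all('/^\w+/m', $text, $b);

// Without s, . never matches "\n", so this cannot span both lines.
$noS   = preg_match('/first.*second/', $text);
// With s, . matches "\n" too.
$withS = preg_match('/first.*second/s', $text);

printf("no m: %d, m: %d, no s: %d, s: %d\n", $noM, $withM, $noS, $withS);
```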
