python - XHTML namespace issues with cssselect in lxml

Question:

I have problems using cssselect with XHTML (or XML with a namespace). Although the documentation explains how to use namespaces with cssselect (see "cssselect namespaces"), I do not understand it.

My Input XHTML string:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Teststylesheet</title>
<style type="text/css">
/*<![CDATA[*/
ol{margin:0;padding:0}
/*]]>*/
</style>
</head>
<body>
</body>
</html>

My Python Script:

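from lxml import etree
from lxml.cssselect import CSSSelector

# xhtmlstring holds the XHTML document shown above
parser = etree.XMLParser()
tree = etree.fromstring(xhtmlstring, parser).getroottree()
for style in CSSSelector("style")(tree):
    print("HAVE CSS!")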

The Python script never prints "HAVE CSS!". Using etree.HTMLParser instead of etree.XMLParser works, but I really want to use the XMLParser and keep everything (namespace, structure) of the XHTML intact.

Can anybody help me with this namespace problem?

Answer:

The doc string for cssselect.CSSSelector (version 2.0) shows how to use namespaces:

class CSSSelector(etree.XPath):
    """ ...
    To use CSS namespaces, you need to pass a prefix-to-namespace
    mapping as ``namespaces`` keyword argument::

        >>> rdfns = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
        >>> select_ns = cssselect.CSSSelector('root > rdf|Description',
        ...                                   namespaces={'rdf': rdfns})

        >>> rdf = etree.XML((
        ...     '<root xmlns:rdf="%s">'
        ...       '<rdf:Description>blah</rdf:Description>'
        ...     '</root>') % rdfns)
        >>> [(el.tag, el.text) for el in select_ns(rdf)]
        [('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description', 'blah')]
    """

If you've tried this but your version of cssselect.CSSSelector does not have a namespaces parameter, then your version of lxml may need to be upgraded.
