I am using xpath in Python 2.7 with lxml:
from lxml import html
tree = html.fromstring(source)
results = tree.xpath(...xpath string...)
The problem is the XPath string, and I am getting quite lost in it. I am trying to get all the nodes from one path, like this:
There are no missing entries in this part, and it works fine. But I'm also trying to get a part relative to it, like so:
This works fine by itself, but (2) may or may not have nodes for each node in (1). What I would like is a default value, say "absent", for each node in (1) where (2) is missing or empty. This sounds straightforward, and maybe it is, but I'm hitting a brick wall here.
With '(1) | (2)' I get all the values I need, but I have no way to match them up. '(1) | concat((2), "absent")' doesn't work either: concat doesn't seem to work in Python, though I've read that it is valid XPath. I saw here the "Becker method", but that doesn't work either (or I can't get it to).
Hopefully someone can shed some light on how to get this working, or whether it's even possible.
Don't make this more complicated than it is:
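path1 = '//a[@class="hyperlinkClass"]/span'
path2 = './following-sibling::div[@class="divClassName"]/span[@class="spanClassName"]'

for link in tree.xpath(path1):
    # Evaluate the second path relative to the current node from the first path.
    other_node = link.xpath(path2)
    if len(other_node):
        # xpath() returns a list, so take the first match before reading .text.
        print(link.text, other_node[0].text)
    else:
        print(link.text, 'n/a')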
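
To try the loop above without the real page, here is a minimal self-contained sketch. The markup, the simplified paths, and the helper name pair_with_default are all made up for illustration (the question does not show the actual document), so adjust them to the real structure. It collects the pairs in a list instead of printing them and uses "absent" as the default.

```python
from lxml import html

# Hypothetical markup: the second link has no matching sibling span.
source = (
    '<ul>'
    '<li><a class="hyperlinkClass">first</a>'
    '<span class="spanClassName">extra one</span></li>'
    '<li><a class="hyperlinkClass">second</a></li>'
    '</ul>'
)

# Simplified stand-ins for paths (1) and (2) from the question.
outer_path = '//a[@class="hyperlinkClass"]'
inner_path = './following-sibling::span[@class="spanClassName"]'


def pair_with_default(tree, outer_path, inner_path, default="absent"):
    """Pair each outer node's text with its first relative match, or a default."""
    pairs = []
    for node in tree.xpath(outer_path):
        # The leading "./" makes this path relative to the current node.
        matches = node.xpath(inner_path)
        pairs.append((node.text, matches[0].text if matches else default))
    return pairs


tree = html.fromstring(source)
pairs = pair_with_default(tree, outer_path, inner_path)
print(pairs)
```

Because each relative query runs on one node at a time, the pairing is explicit: the first link is matched with its sibling's text and the second link falls back to "absent".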