I have a situation where I have a lot of <b> tags:
As you can see, the second-to-last tag is empty. When I call:
Which gives me:
['12', '13', '14', '121']
I would like to have:
['12', '13', '14', '', '121']
Is there a way to get the empty value?
My current workaround is to call:
and then parse each HTML tag myself (the empty tags are included there, which is what I want).
This is a case where it is okay to strip the tags manually and get the text. You can use the remove_tags() function provided by w3lib:
    >>> from w3lib.html import remove_tags
    >>> map(remove_tags, sel.xpath('//b').extract())
    [u'12', u'13', u'14', u'', u'121']
w3lib is a Scrapy dependency and is used internally. No need to install it separately.
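To illustrate why processing each element separately keeps the empty entry, here is a minimal standard-library sketch. It is not Scrapy code: the markup is a hypothetical reconstruction chosen to match the output in the question, and ElementTree stands in for the selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, chosen to match the output shown in the question.
html = "<div><b>12</b><b>13</b><b>14</b><b></b><b>121</b></div>"
root = ET.fromstring(html)

# Visit every <b> element and join its text pieces. Each element yields
# exactly one entry, so the empty tag contributes '' instead of vanishing.
values = ["".join(b.itertext()) for b in root.iter("b")]
print(values)  # ['12', '13', '14', '', '121']
```

Selecting text nodes directly skips the empty element because it has no text node at all, while iterating over the elements always produces one result per tag. Note that on Python 3, map() returns an iterator, so the w3lib call above would need to be wrapped in list(...) to display the same list.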
Also, it would be better to use Scrapy Input and Output Processors here. Continue using sel.xpath('//b') and define an input processor. For example, you can define it for specific Fields of the Item class:
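    # Newer Scrapy versions ship the processors in the itemloaders package:
    #     from itemloaders.processors import MapCompose
    from scrapy.contrib.loader.processor import MapCompose
    from scrapy.item import Item, Field
    from w3lib.html import remove_tags

    class MyItem(Item):
        # remove_tags is applied to every value extracted for this field
        my_field = Field(input_processor=MapCompose(remove_tags))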
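As a rough picture of what such an input processor does with the extracted values, here is a self-contained sketch. Both helpers are hypothetical stand-ins written for illustration: map_compose is a simplified imitation of MapCompose, not Scrapy's implementation, and strip_tags is a naive regex substitute for w3lib's remove_tags.

```python
import re

def strip_tags(text):
    # Naive stand-in for remove_tags: drops anything that looks like a tag.
    return re.sub(r"<[^>]+>", "", text)

def map_compose(*functions):
    # Simplified stand-in for MapCompose: run every value through each
    # function in order, dropping values for which a function returns None.
    def process(values):
        for func in functions:
            values = [r for r in (func(v) for v in values) if r is not None]
        return values
    return process

# What extracting the <b> elements might hand to the processor.
extracted = ["<b>12</b>", "<b>13</b>", "<b>14</b>", "<b></b>", "<b>121</b>"]
processor = map_compose(strip_tags)
cleaned = processor(extracted)
print(cleaned)  # ['12', '13', '14', '', '121']
```

The empty string survives because only None results are dropped in this sketch, so the field ends up with one value per tag.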