
python - lookbehind in a for loop

Question:

How are you doing?

I'm kind of stuck on this problem. I need to use a for loop to find a word that ends with 'ing' and is preceded by a word tagged IN. I come from a background in C and Java, where this is easy to do, but I can't yet grasp how to do it in Python!

I searched around, and here is what I think I need to do:

for word, tag in list:
    if word.endswith('ing'):
        # use a regular expression here, something like '(?<=\bIN\b)ing'
        pass

Now, of course, there are some problems with that. First, I need to look at the previous tag, not the word; second, the regular expression is probably wrong; and, more importantly, this just seems too complicated. Am I missing something here? Is there a way to just use the index of the word ending in 'ing' to look at the tag before it, as I would in Java, for example?
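A quick check supports the suspicion that the regular expression is wrong. In this minimal sketch (the `word/TAG` strings are a hypothetical encoding, not something pos_tag produces), the lookbehind `(?<=\bIN\b)ing` never matches: the `\b` after `IN` requires a non-word character next, but `ing` starts with a word character. Applied to a single word, a regex also has no way to see the previous tag.

```python
import re

# The lookbehind from the question: "ing" immediately preceded by the whole word "IN".
pattern = re.compile(r'(?<=\bIN\b)ing')

# Hypothetical inputs: a bare word, a "word/TAG" string, and the literal "INing".
samples = ['observing', 'by/IN observing/NN', 'INing']

for text in samples:
    # A \b between "N" and "i" is impossible (both are word characters),
    # so search() returns None for every sample.
    print(repr(text), pattern.search(text))
```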

Thank you in advance, and sorry if it's a silly question. It's only my second time writing Python, and I'm still rusty with it =)

EDIT: Here is more explanation of what I need to do, with an example. Sometimes pos_tag mistakes a VBG for a noun, so I need to write a method that, given a tagged list (for example [('Cultivate', 'NNP'), ('peace', 'NN'), ('by', 'IN'), ('observing', 'NN'), ('justice', 'NN')]), corrects this problem and returns [('Cultivate', 'NNP'), ('peace', 'NN'), ('by', 'IN'), ('observing', 'VBG'), ('justice', 'NN')]. Notice how the tag of observing has changed.
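Since the question asks for Java-style index access, here is a minimal sketch of that style in Python, assuming the input is a list of (word, tag) tuples as in the example above. The name `fix_vbg` is hypothetical; the function builds a new list instead of modifying its input.

```python
def fix_vbg(tagged):
    """Return a copy of tagged where an 'ing' word preceded by an IN tag is retagged VBG."""
    fixed = []
    for i in range(len(tagged)):
        word, tag = tagged[i]
        # Look one position back, just as tagged[i - 1] would in Java.
        # The i > 0 guard matters: tagged[-1] is the LAST element in Python, not an error.
        if i > 0 and word.endswith('ing') and tagged[i - 1][1] == 'IN':
            tag = 'VBG'
        fixed.append((word, tag))
    return fixed


sentence = [('Cultivate', 'NNP'), ('peace', 'NN'), ('by', 'IN'),
            ('observing', 'NN'), ('justice', 'NN')]
print(fix_vbg(sentence))
# [('Cultivate', 'NNP'), ('peace', 'NN'), ('by', 'IN'), ('observing', 'VBG'), ('justice', 'NN')]
```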

EDIT2: Problem solved. Here is the solution:

def transform(li):
    for i in range(len(li)):
        # an 'ing' word directly after an IN tag is retagged as VBG
        if li[i][0].endswith('ing') and i > 0 and li[i-1][1] == 'IN':
            li[i] = (li[i][0], 'VBG')

Thank you all for your help =D, I appreciate it.

Answer:

Based on your comment, it sounds like you want this:

def transform(li):
    new_li = []
    prev_tag = None
    for word, tag in li:
        # retag an 'ing' word that directly follows a word tagged IN
        if word.endswith('ing') and prev_tag == 'IN':
            tag = 'VBG'
        new_li.append((word, tag))
        prev_tag = tag
    return new_li

You can also do this in place:

def transform(li):
    for i in range(len(li)):
        if li[i][0].endswith('ing') and i > 0 and li[i-1][1] == 'IN':
            # keep the word, replace only the tag
            li[i] = (li[i][0], 'VBG')
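One consequence of the in-place version is worth spelling out: it changes the caller's list and returns None, so assigning its result is a common mistake. A minimal sketch, using a self-contained copy of the function (the name `transform_in_place` is only for illustration):

```python
def transform_in_place(li):
    # Same rule: an 'ing' word right after an IN tag becomes VBG.
    # Starting at 1 replaces the i > 0 check.
    for i in range(1, len(li)):
        if li[i][0].endswith('ing') and li[i - 1][1] == 'IN':
            li[i] = (li[i][0], 'VBG')


words = [('by', 'IN'), ('observing', 'NN')]
alias = words                      # a second name for the same list object
result = transform_in_place(words)

print(result)   # None: the function has no return statement
print(alias)    # [('by', 'IN'), ('observing', 'VBG')]: visible through every reference
```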

Note that I renamed list to li. list is the name of Python's built-in list type, and shadowing it is a bad idea.
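To see why shadowing `list` hurts, here is a minimal sketch run at module level: once the name is rebound to a list, calling the built-in constructor fails until the shadowing name is removed.

```python
# Rebinding the name "list" hides the built-in type in this module.
list = [('by', 'IN'), ('observing', 'NN')]

shadow_error = None
try:
    list('abc')                # meant to call the built-in list() constructor
except TypeError as err:       # the name now refers to a list object, which is not callable
    shadow_error = err
print('TypeError:', shadow_error)

del list                       # drop the module-level name; the built-in is visible again
print(list('abc'))             # ['a', 'b', 'c']
```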

Answer:

This does the change in place:

for index, (word, _tag) in enumerate(li):
    if word.endswith('ing') and index > 0 and li[index-1][1] == 'IN':
        li[index] = word, 'VBG'

enumerate lets you iterate over a list foreach-style while also giving you the current index. I quite like it, but I sometimes worry that I overuse it and should use something like for i in range(10): ... instead.
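What enumerate hands back can be seen directly. A small sketch, assuming the same (word, tag) list format:

```python
li = [('by', 'IN'), ('observing', 'NN')]

# enumerate yields (index, item) pairs; here each item is itself a (word, tag) tuple.
pairs = list(enumerate(li))
print(pairs)   # [(0, ('by', 'IN')), (1, ('observing', 'NN'))]

# The nested parentheses in the loop header unpack both levels at once.
for index, (word, _tag) in enumerate(li):
    print(index, word)
# 0 by
# 1 observing
```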

Answer:

previousWord = ""
previousTag = ""

for word, tag in list:
    if word.endswith('ing'):
        # use a regular expression here, something like '(?<=\bIN\b)ing'
        # use previousWord and previousTag here, e.g. test previousTag == 'IN'
        pass
    previousWord = word
    previousTag = tag
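The previousWord/previousTag bookkeeping can also be done without manual state by pairing each item with its predecessor through zip. This is a different technique from the loop above, shown as a minimal sketch; the function name `retag_after_in` is hypothetical.

```python
def retag_after_in(li):
    """Return a new list in which an 'ing' word right after an IN tag is tagged VBG."""
    fixed = li[:1]  # the first item has no predecessor, so it is copied unchanged
    # zip(li, li[1:]) yields (previous, current) pairs and stops at the shorter list
    for (_prev_word, prev_tag), (word, tag) in zip(li, li[1:]):
        if word.endswith('ing') and prev_tag == 'IN':
            tag = 'VBG'
        fixed.append((word, tag))
    return fixed


sentence = [('Cultivate', 'NNP'), ('peace', 'NN'), ('by', 'IN'),
            ('observing', 'NN'), ('justice', 'NN')]
print(retag_after_in(sentence))
# [('Cultivate', 'NNP'), ('peace', 'NN'), ('by', 'IN'), ('observing', 'VBG'), ('justice', 'NN')]
```

On Python 3.10 and later, itertools.pairwise(li) yields the same (previous, current) pairs as zip(li, li[1:]).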

Answer:

Your solution is shaped by the fact that the data pairs in your list are immutable tuples. The easiest approach, then, is to build the new list you want from scratch:

li = [('Cultivate', 'NNP'),
      ('peace', 'NN'),
      ('by', 'IN'),
      ('observing', 'NN'),
      ('justice', 'NN')]

lnew = []

for word, tag in li:
    # note: this retags every NN word ending in 'ing'; it does not look at the preceding tag
    if word.endswith('ing') and tag == 'NN':
        tag = 'VBG'
    lnew.append((word, tag))

for word, tag in lnew:
    print(word, tag)

This is somewhat wasteful if you have thousands or millions of pairs, since it builds a complete second list.
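The immutability this answer starts from can be checked directly: a tuple inside the list cannot be changed, but the list slot holding it can be pointed at a new tuple. A minimal sketch:

```python
li = [('by', 'IN'), ('observing', 'NN')]

tuple_error = None
try:
    li[1][1] = 'VBG'              # try to change the tag inside the tuple
except TypeError as err:          # tuples do not support item assignment
    tuple_error = err
print('TypeError:', tuple_error)

li[1] = (li[1][0], 'VBG')         # replacing the whole tuple works: the list itself is mutable
print(li)                         # [('by', 'IN'), ('observing', 'VBG')]
```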

If this is your data, in a format you control, you may wish to consider using a dictionary instead of a list of tuples. Then you can loop through the dictionary more naturally and modify it in place:

ld = {'justice': 'NN', 'Cultivate': 'NNP', 'peace': 'NN',
      'observing': 'NN', 'by': 'IN'}

for word, tag in ld.items():
    if word.endswith('ing') and tag == 'NN':
        ld[word] = 'VBG'

For large data sets, modifying the dictionary in place avoids building a second list, which can save memory. Consider that.
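One trade-off to keep in mind with the dictionary layout: converting a tagged sentence with dict() keeps only one tag per distinct word, so repeated words collapse. A minimal sketch with a made-up sentence (the data is hypothetical, not pos_tag output):

```python
# Hypothetical tagged sentence in which "saw" appears twice with different tags.
tagged = [('I', 'PRP'), ('saw', 'VBD'), ('the', 'DT'), ('saw', 'NN')]

as_dict = dict(tagged)
print(len(tagged), len(as_dict))   # 4 3
print(as_dict['saw'])              # NN: the later pair overwrote the earlier one
```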
