
python - How to use NLTK to generate sentences from an induced grammar?

Question:

I have a (large) list of parsed sentences (which were parsed using the Stanford parser), for example, the sentence "Now you can be entertained" has the following tree:

(ROOT
  (S
    (ADVP (RB Now))
    (, ,)
    (NP (PRP you))
    (VP (MD can)
      (VP (VB be)
        (VP (VBN entertained))))
    (. .)))

I am using the set of sentence trees to induce a grammar using nltk:

import nltk

# ... for each sentence tree t, add its productions to allProductions
allProductions += t.productions()

# Induce the grammar
S = nltk.Nonterminal('S')
grammar = nltk.induce_pcfg(S, allProductions)

Now I would like to use grammar to generate new, random sentences. My hope is that since the grammar was learned from a specific set of input examples, then the generated sentences will be semantically similar. Can I do this in nltk?

If I can't use nltk to do this, do any other tools exist that can take the (possibly reformatted) grammar and generate sentences?

Answer:

In NLTK 2.0 you can use nltk.parse.generate to generate all possible sentences for a given grammar.

This code defines a function which should generate a single sentence based on the production rules in a (P)CFG.

# This example uses choice to pick one expansion per nonterminal
from random import choice
from nltk import Nonterminal

# This function is loosely based on _generate_all() in nltk.parse.generate
def generate_sample(grammar, items=None):
    """Return the list of terminals produced by one random derivation."""
    if items is None:
        # Start from the grammar's start symbol (a Nonterminal, not a string)
        items = [grammar.start()]
    frags = []
    for item in items:
        if isinstance(item, Nonterminal):
            # This is where we need to make our changes:
            # choose ONE production for this nonterminal and expand it
            prod = choice(grammar.productions(lhs=item))
            frags.extend(generate_sample(grammar, prod.rhs()))
        else:
            # Terminals are emitted as-is
            frags.append(item)
    return frags

To make use of the weights in your PCFG, you'll obviously want to use a better sampling method than choice(), which implicitly assumes all expansions of the current node are equiprobable.
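As a rough illustration of weighted sampling, here is a minimal sketch that replaces choice() with random.choices() and per-rule weights. It deliberately avoids NLTK and uses a plain dict as a stand-in grammar; TOY_PCFG and sample_sentence are hypothetical names invented for this example. With a real NLTK PCFG, the weights would presumably come from calling prob() on each production returned by grammar.productions(lhs=...).

```python
import random

# Hypothetical toy PCFG (not from the question's data): each nonterminal maps
# to a list of (right-hand side, probability) pairs.
TOY_PCFG = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("you",), 0.7), (("we",), 0.3)],
    "VP": [(("can", "VP"), 0.4), (("sleep",), 0.6)],
}

def sample_sentence(rules, symbol="S", rng=random):
    """Return a list of words from one derivation, sampled by rule probability."""
    if symbol not in rules:
        # Anything without a rule is treated as a terminal
        return [symbol]
    expansions = rules[symbol]
    rhs_options = [rhs for rhs, _ in expansions]
    weights = [prob for _, prob in expansions]
    # random.choices draws proportionally to the weights, unlike choice()
    chosen = rng.choices(rhs_options, weights=weights, k=1)[0]
    words = []
    for child in chosen:
        words.extend(sample_sentence(rules, child, rng))
    return words

if __name__ == "__main__":
    print(" ".join(sample_sentence(TOY_PCFG, rng=random.Random(0))))
```

Note that a grammar with heavily weighted recursive rules can produce very long derivations, so a depth limit may be worth adding for grammars induced from real treebanks.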

Answer:

First of all, if you generate random sentences, they may be grammatically correct, but they will probably lose their sense.

(This sounds a bit like what those MIT students did with SCIgen, their program that automatically generates scientific papers. Very interesting, by the way.)

Anyway, I have never done it myself, but it seems possible with nltk.bigrams; you may want to have a look at Generating Random Text with Bigrams.

You can also generate all subtrees of a given tree, though I'm not sure that is what you want either.
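To make the bigram idea concrete, here is a small sketch in plain Python. It is not the NLTK implementation; build_bigram_model, generate_text, the <s> and </s> boundary markers, and the two-sentence corpus are all assumptions made for illustration. Each next word is drawn from the words that followed the current word in the training sentences.

```python
import random
from collections import defaultdict

def build_bigram_model(sentences):
    """Map each token to the list of tokens observed right after it."""
    model = defaultdict(list)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev].append(nxt)
    return model

def generate_text(model, rng=random, max_words=20):
    """Walk the bigram table from <s> until </s> or max_words is reached."""
    word, out = "<s>", []
    for _ in range(max_words):
        # Repeated successors appear more than once, so frequent
        # bigrams are proportionally more likely to be chosen
        word = rng.choice(model[word])
        if word == "</s>":
            break
        out.append(word)
    return out

if __name__ == "__main__":
    # Hypothetical corpus, used only for illustration
    corpus = ["now you can be entertained", "you can be here now"]
    model = build_bigram_model(corpus)
    print(" ".join(generate_text(model, rng=random.Random(1))))
```

Because only adjacent word pairs are modelled, the output tends to be locally plausible but can drift over longer spans, which matches the caveat about generated sentences losing their sense.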

Answer:

With an nltk Text object you can call generate() on it, which will "Print random text, generated using a trigram language model." See http://nltk.org/_modules/nltk/text.html
