
python - How to iterate through top words in BigARTM?

Question:

I want to print each topic name together with the top words for that topic.

BigARTM has been updated from v0.7.6 to v0.8.0, and the old code below has stopped working:

for topic_name in model_artm.topic_names:
    print topic_name + ': ',
    for word in model_artm.score_tracker["top_words"].last_topic_info[topic_name].tokens:
        print word,
    print

The problem is in the inner loop: last_topic_info no longer exists. According to the official manual, we now need artm.score_tracker.TopTokensScoreTracker, so I tried something like this:

model_artm.score_tracker["topTokes1"].last_tokens[topic_name].value #doesn't work.

Any ideas what is wrong?

Answer:

There was a small change to the BigARTM Score Tracker API between v0.7.9 and v0.8.0. The following example should work with v0.8.0:

import artm
batch_vectorizer = artm.BatchVectorizer(data_path=r'D:\Datasets\kos',
                                        data_format='batches')
dictionary = artm.Dictionary(data_path=r'D:\Datasets\kos')
model = artm.ARTM(num_topics=15,
                  num_document_passes=5,
                  dictionary=dictionary,
                  scores=[artm.TopTokensScore(name='top_tokens_score')])

model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=3)

top_tokens = model.score_tracker['top_tokens_score']
for topic_name in model.topic_names:
    print '\n', topic_name
    for (token, weight) in zip(top_tokens.last_tokens[topic_name],
                               top_tokens.last_weights[topic_name]):
        print token, '-', weight

For other changes in BigARTM Python API see the release notes: http://docs.bigartm.org/en/stable/release_notes/python.html
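
To get back the one-line-per-topic output that the question's old code produced, the loop can be wrapped in a small helper. This is only a sketch: format_top_tokens is a hypothetical name, not part of BigARTM, and it assumes the tracker exposes last_tokens as a dict mapping each topic name to a plain list of token strings, as the answer's example uses it. The SimpleNamespace object and its tokens are made-up stand-ins, so the snippet runs without artm or a dataset; with a real model you would pass model.score_tracker['top_tokens_score'] and model.topic_names instead.

```python
from types import SimpleNamespace


def format_top_tokens(tracker, topic_names):
    """Return one 'topic_name: token token ...' string per topic.

    Hypothetical helper; assumes tracker.last_tokens is a dict of
    topic name -> list of token strings.
    """
    lines = []
    for topic_name in topic_names:
        # A topic without an entry yields an empty token list.
        tokens = tracker.last_tokens.get(topic_name, [])
        lines.append("{}: {}".format(topic_name, " ".join(tokens)))
    return lines


# Stand-in for model.score_tracker['top_tokens_score'];
# the topic names and tokens are placeholders, not real model output.
fake_tracker = SimpleNamespace(
    last_tokens={
        "topic_0": ["alpha", "beta"],
        "topic_1": ["gamma", "delta"],
    },
)

for line in format_top_tokens(fake_tracker, ["topic_0", "topic_1"]):
    print(line)
```

Note that last_tokens[topic_name] is indexed directly as a list here, which is why appending .value to it, as in the question's attempt, fails.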
