
perl - Tokenizer in moses-SMT system stuck even with 10 sentences

Problem description:

I am trying to build a baseline MT system. To see how it works, I made source (S) and target (T) language corpora of just 2000 sentences. The first step is to prepare the data for the machine translation (MT) system, which starts with tokenization as described in the Baseline SMT guide. I used these commands:

~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en \
    < ~/corpus/training/news-commentary-v8.fr-en.en \
    > ~/corpus/news-commentary-v8.fr-en.tok.en

~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr \
    < ~/corpus/training/news-commentary-v8.fr-en.fr \
    > ~/corpus/news-commentary-v8.fr-en.tok.fr

(Here S = French and T = English.)

When I checked after 2 hours, it was still running, which I did not expect. I then tried with just ten sentences. To my surprise, 30 minutes later it is still running.
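The ten-sentence check can be sketched roughly as below. This is only an illustration, not part of Moses: `smoke_test_filter` is a hypothetical helper name, and it assumes bash and GNU coreutils `timeout` are available (as on Ubuntu). It feeds the first few lines of a file to any command that reads standard input and gives up after a time limit, so a hang shows up in seconds rather than hours. It also refuses to run on a missing or empty input file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Moses): run a stdin-reading command on the
# first N lines of a file, killing it if it exceeds a time limit.
# Usage: smoke_test_filter INPUT_FILE N_LINES TIME_LIMIT COMMAND [ARGS...]
smoke_test_filter() {
  local input="$1" lines="$2" limit="$3"
  shift 3
  # Stop early if the input file is missing or empty.
  if [ ! -s "$input" ]; then
    echo "input missing or empty: $input" >&2
    return 2
  fi
  # Pipe the sample into the command; `timeout` exits with status 124
  # if the command is still running when the limit expires.
  head -n "$lines" "$input" | timeout "$limit" "$@"
}
```

For example, `smoke_test_filter ~/corpus/training/news-commentary-v8.fr-en.en 10 60 ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en` would print the tokenized sample to the terminal. An exit status of 124 would mean the command was still running after 60 seconds.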

Did I do anything wrong?

PS: OS = Ubuntu 14.04.5 LTS

Sony ultrabook

No dual boot.
