Morphology In Statistical Machine Translation From English To Highly Inflectional Language
Keywords: natural language processing, statistical machine translation, inflectional language, morphology
Abstract: In this paper, we investigate the role of morphology in phrase-based statistical machine translation from English to Slovenian, a highly inflectional language. Translation into an inflectional language is challenging because of its morphological complexity: rich morphology increases data sparsity and lowers the quality of statistical machine translation. To address this issue, we added morphological information in the form of morphosyntactic description (MSD) tags attached to words. An MSD tag encodes all morphosyntactic information in position-dependent attributes. Tags were assigned to words by TreeTagger. Several experiments were performed using MSD tags to improve the translation results. First, factored translation was studied in different configurations. The results show that factored translation improves the modeling of short-distance collocations. To capture long-distance dependencies, operation sequence models (OSM) were added in the second set of experiments, which yielded an additional improvement. The overall results show that the morphosyntactic information of an inflectional language is an important factor in translation. Factored translation with OSM brought a 9% relative improvement. The conclusions of our work can be generalized to other Balto-Slavic languages, as they share, to some extent, the same morphological characteristics.
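To make the idea of position-dependent MSD attributes and word-level tag attachment concrete, the following is a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: the attribute table covers nouns only and is a simplified illustration rather than the full tagset, the `word|tag` layout with `|` as separator is assumed as the factored input format, and the function names `decode_noun_msd` and `to_factored` as well as the example tags are hypothetical.

```python
# Simplified, illustrative attribute table for noun tags only (an assumption,
# not the complete tagset). Each position after the category letter maps a
# one-character code to a feature value.
NOUN_POSITIONS = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "d": "dual", "p": "plural"}),
    ("case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
]


def decode_noun_msd(tag):
    """Decode a noun MSD tag into a feature dictionary (hypothetical helper)."""
    if not tag or tag[0] != "N":
        raise ValueError(f"not a noun tag: {tag!r}")
    features = {"category": "noun"}
    # Positions are read left to right; "-" marks an unspecified attribute.
    for code, (name, values) in zip(tag[1:], NOUN_POSITIONS):
        if code == "-":
            continue
        # Unknown codes are kept verbatim instead of being guessed.
        features[name] = values.get(code, code)
    return features


def to_factored(tokens, tags, sep="|"):
    """Attach one tag to each token as 'word|tag' (assumed factored layout)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    for item in list(tokens) + list(tags):
        if sep in item:
            raise ValueError(f"separator {sep!r} occurs inside {item!r}")
    return " ".join(f"{word}{sep}{tag}" for word, tag in zip(tokens, tags))


if __name__ == "__main__":
    # Illustrative tags for the Slovenian phrase "velika miza" (a big table).
    print(decode_noun_msd("Ncfsn"))
    print(to_factored(["velika", "miza"], ["Agpfsn", "Ncfsn"]))
```

Under these assumptions, a single tag such as `Ncfsn` carries type, gender, number, and case at once, which is the kind of information a factored model can condition on in addition to the surface word form.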
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.