Morphology In Statistical Machine Translation From English To Highly Inflectional Language

Mirjam Sepesy Maučec, Gregor Donaj

Abstract


In this paper, we investigate the role of morphology in phrase-based statistical machine translation from English into Slovenian, a highly inflectional language. Translation into an inflectional language is a challenging task because of its morphological complexity. Rich morphology increases data sparsity and degrades the quality of statistical machine translation. To address this issue, we added morphological information in the form of morphosyntactic description (MSD) tags attached to words. An MSD tag encodes all morphosyntactic information in position-dependent attributes. Tags were assigned to words by TreeTagger. Several experiments were performed using MSD tags to improve the translation results. First, factored translation was studied, and different configurations were tested. The results show that factored translation improves the modeling of short-distance collocations. To capture long-distance dependencies, operation sequence models (OSM) were added in the second set of experiments, which yielded an additional improvement. The overall results show that the morphosyntactic information of an inflectional language is an important factor in translation. Factored translation with OSM models brought a 9% relative improvement. The conclusions of our work can be generalized to other Balto-Slavic languages, as they share, to some extent, the same morphological characteristics.
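As an illustration of what position-dependent tags attached to words can look like, the following minimal Python sketch decodes a positional tag and joins words with their tags. It is not the authors' pipeline: the attribute order in `NOUN_ATTRIBUTES`, the function names `decode_msd` and `to_factored`, the `|` separator, and the example tags are all assumptions made for this example.

```python
# Hypothetical attribute order for noun tags; a real tagset defines its own
# order and value codes for each word category.
NOUN_ATTRIBUTES = ["type", "gender", "number", "case"]


def decode_msd(tag, attributes=NOUN_ATTRIBUTES):
    """Split a positional tag into named attributes.

    The first character is read as the word category; each following
    character is the value of the attribute at that position. A "-" is
    treated as "attribute not specified" and skipped.
    """
    decoded = {"category": tag[0]}
    for name, value in zip(attributes, tag[1:]):
        if value != "-":
            decoded[name] = value
    return decoded


def to_factored(words, tags, sep="|"):
    """Attach one tag to each word, giving a 'word|tag' token sequence."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return " ".join(f"{word}{sep}{tag}" for word, tag in zip(words, tags))


# Example with assumed tags for a two-word Slovenian phrase.
print(decode_msd("Ncfsn"))
print(to_factored(["velika", "hiša"], ["Agpfsn", "Ncfsn"]))
```

Because each attribute has a fixed position, two word forms that differ only in, say, case differ in a single character of the tag, which lets a model over tag sequences generalize across surface forms.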

DOI: http://dx.doi.org/10.5755/j01.itc.47.1.17887


Keywords


natural language processing; statistical machine translation; inflectional language; morphology


Print ISSN: 1392-124X 
Online ISSN: 2335-884X