Morphology In Statistical Machine Translation From English To Highly Inflectional Language

  • Mirjam Sepesy Maučec, University of Maribor, Faculty of Electrical Engineering and Computer Science
  • Gregor Donaj, University of Maribor, Faculty of Electrical Engineering and Computer Science
Keywords: natural language processing, statistical machine translation, inflectional language, morphology

Abstract

In this paper, we investigate the role of morphology in phrase-based statistical machine translation from English to the highly inflectional Slovenian language. Translation into an inflectional language is challenging because of its morphological complexity: rich morphology increases data sparsity and degrades the quality of statistical machine translation. To address this issue, we added morphological information in the form of MSD (morphosyntactic description) tags attached to words. An MSD tag encodes all morphosyntactic information in position-dependent attributes. The tags were attached to words using TreeTagger. Several experiments were performed using MSD tags to improve the translation results. First, factored translation was studied, and different configurations were tested. The results show that factored translation improves the modeling of short-distance collocations. To capture long-distance dependencies, operation sequence models (OSM) were added in the second set of experiments, which yielded a further improvement. The overall results show that the morphosyntactic information of an inflectional language is an important factor in translation. Factored translation with OSM brought a 9% relative improvement. The conclusions of our work can be generalized to other Balto-Slavic languages, as they share, to some extent, the same morphological characteristics.
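The abstract describes MSD tags as encoding morphosyntactic information in position-dependent attributes and as being attached to words before factored translation. The Python sketch below illustrates both ideas under two assumptions that do not come from the paper. First, it uses a MULTEXT-East-style layout for Slovene noun tags, where position 1 is the category and the following positions are Type, Gender, Number, Case and Animate. Only a small illustrative subset of attribute values is included. Second, it writes tokens in the Moses-style factored `word|tag` format. The function names `decode_noun_msd` and `to_factored` are hypothetical, and the example tags are illustrative; the sketch does not reproduce the authors' TreeTagger pipeline.

```python
# Illustrative subset of a MULTEXT-East-style attribute table for Slovene nouns.
# This table is an assumption for demonstration, not a complete specification.
NOUN_ATTRIBUTES = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "d": "dual", "p": "plural"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
    ("Animate", {"n": "no", "y": "yes"}),
]


def decode_noun_msd(tag: str) -> dict[str, str]:
    """Hypothetical helper: decode a noun MSD tag (e.g. 'Ncfsn') position by position.

    Each character after the leading 'N' is interpreted according to its position.
    An unknown value code raises KeyError.
    """
    if not tag.startswith("N"):
        raise ValueError(f"not a noun tag: {tag!r}")
    features = {"Category": "noun"}
    # zip stops at the shorter sequence, so trailing attributes may be omitted.
    for (name, values), code in zip(NOUN_ATTRIBUTES, tag[1:]):
        if code == "-":  # '-' marks an attribute that does not apply
            continue
        features[name] = values[code]
    return features


def to_factored(tokens: list[tuple[str, str]]) -> str:
    """Hypothetical helper: join (word, MSD tag) pairs as space-separated 'word|tag' tokens."""
    return " ".join(f"{word}|{tag}" for word, tag in tokens)


if __name__ == "__main__":
    # Illustrative tagged tokens; in the paper, tags were produced by TreeTagger.
    sentence = [("hiša", "Ncfsn"), ("je", "Va-r3s-n")]
    print(to_factored(sentence))      # hiša|Ncfsn je|Va-r3s-n
    print(decode_noun_msd("Ncfsn"))
    # {'Category': 'noun', 'Type': 'common', 'Gender': 'feminine',
    #  'Number': 'singular', 'Case': 'nominative'}
```

Attaching the whole tag as a single factor keeps every attribute available to the translation model. The positional decoding shows why a single tag can separate forms that share a surface stem, such as the nominative and accusative singular of the same noun.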

DOI: http://dx.doi.org/10.5755/j01.itc.47.1.17887

Author Biographies

Mirjam Sepesy Maučec, University of Maribor, Faculty of Electrical Engineering and Computer Science

Associate Professor

Laboratory for Digital Signal Processing

Institute of Electronic and Telecommunications

Faculty of Electrical Engineering and Computer Science

University of Maribor

Gregor Donaj, University of Maribor, Faculty of Electrical Engineering and Computer Science

Teaching Assistant

Laboratory for Digital Signal Processing

Institute of Electronic and Telecommunications

Faculty of Electrical Engineering and Computer Science

University of Maribor

Published
2018-02-05
Section
Articles