Attention based English-Bodo Neural Machine Translation System for Tourism Domain

3rd International Conference on Computing Methodologies and Communication (ICCMC)

March, 2019

Sanjib Narzary, Maharaj Brahma, Bobita Singha, Rangjali Brahma, Bonali Dibragede, Sunita Barman, Sukumar Nandi and Bidisha Som

Abstract

Bodo is a relatively low-resource language. Other than textbooks, novels, and some printed newspapers, very few resources appear to be available in the public domain. As technology becomes more affordable, the number of active Bodo internet users is growing, and these users need technology that can bring information to them in their own language. Machine translation appears to be a promising solution for that purpose. In this work we build an English-Bodo Neural Machine Translation (NMT) system using two-layer bidirectional Long Short-Term Memory (LSTM) cells, which can capture long-term dependencies. As very little work has been done on English-Bodo NMT, we build our own baseline model, which produced a BLEU score of 11.8. We then gradually improve on the baseline by introducing several attention mechanisms. We achieved a BLEU score of 16.71 using the approach presented by Bahdanau et al. Furthermore, we obtained a better BLEU score of 17.9 when we introduced beam search with a beam width of 5. We found that the model performs well despite the limited data available.
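As a rough illustration of the beam search decoding mentioned in the abstract, here is a minimal sketch in plain Python. It is not the authors' implementation: `beam_search`, `toy_step`, and the probability table `TOY_TABLE` are hypothetical names and values invented for this example, and a real NMT decoder would query the trained model for next-token probabilities instead of a lookup table. Length normalization is omitted for brevity.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, List[str]]  # (cumulative log-probability, tokens)


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    bos: str = "<s>",
    eos: str = "</s>",
    beam_width: int = 5,
    max_len: int = 20,
) -> Hypothesis:
    """Return the highest-scoring hypothesis found with the given beam width.

    step_fn maps a partial sequence to next-token probabilities
    (an assumed interface standing in for a trained decoder).
    """
    beams: List[Hypothesis] = [(0.0, [bos])]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for score, seq in beams:
            for token, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), seq + [token]))
        if not candidates:
            break
        # Keep only the beam_width best candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    # Hypotheses cut off at max_len still compete with finished ones.
    finished.extend(beams)
    return max(finished, key=lambda c: c[0])


# Hypothetical next-token probabilities, keyed by the last token only.
TOY_TABLE: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "</s>": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}


def toy_step(seq: List[str]) -> Dict[str, float]:
    """Toy stand-in for a decoder step: depends only on the last token."""
    return TOY_TABLE[seq[-1]]


greedy = beam_search(toy_step, beam_width=1)  # beam width 1 is greedy decoding
wide = beam_search(toy_step, beam_width=5)    # beam width 5, as in the abstract
```

In this toy table, greedy decoding commits to the locally best first token `a` and ends with total probability 0.3, while a beam width of 5 keeps `b` alive and finds the sequence `b z` with total probability 0.36. This is the general reason a wider beam can improve translation quality, although the gain on real data depends on the model.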

Citation

@INPROCEEDINGS{8819699,
  author={Narzary, Sanjib and Brahma, Maharaj and Singha, Bobita and Brahma, Rangjali and Dibragede, Bonali and Barman, Sunita and Nandi, Sukumar and Som, Bidisha},
  booktitle={2019 3rd International Conference on Computing Methodologies and Communication (ICCMC)}, 
  title={Attention based English-Bodo Neural Machine Translation System for Tourism Domain}, 
  year={2019},
  pages={335--343},
  abstract={Bodo is a relatively low-resource language. Other than textbooks, novels, and some printed newspapers, very few resources appear to be available in the public domain. As technology becomes more affordable, the number of active Bodo internet users is growing, and these users need technology that can bring information to them in their own language. Machine translation appears to be a promising solution for that purpose. In this work we build an English-Bodo Neural Machine Translation (NMT) system using two-layer bidirectional Long Short-Term Memory (LSTM) cells, which can capture long-term dependencies. As very little work has been done on English-Bodo NMT, we build our own baseline model, which produced a BLEU score of 11.8. We then gradually improve on the baseline by introducing several attention mechanisms. We achieved a BLEU score of 16.71 using the approach presented by Bahdanau et al. Furthermore, we obtained a better BLEU score of 17.9 when we introduced beam search with a beam width of 5. We found that the model performs well despite the limited data available.},
  keywords={Decoding;Task analysis;Conferences;Neural Machine Translation;Natural Language Processing;Bodo Language;Low Resource Indian Language;Low Resource Indian Languages},
  doi={10.1109/ICCMC.2019.8819699},
  month={March}
}