Deep Learning based Named Entity Recognition for the Bodo Language

Procedia Computer Science Journal

Feb, 2024

Sanjib Narzary, Anjali Brahma, Sukumar Nandi and Bidisha Som

Abstract

Named Entity Recognition (NER) is one of the important applications of natural language processing (NLP). It automatically recognises and categorises named entities in a document, such as the names of individuals, groups, and places. NER is crucial to the success of applications including text summarization, machine translation, and information extraction and retrieval, and it is useful across a variety of topics and languages. Despite its widespread use and effectiveness in English, NER is still under investigation for Indian languages such as Bodo. The lack of resources and of a high-quality dataset makes NER in Bodo a difficult task. In this research, a deep learning-based NER tagger is investigated for the Bodo language. An NER-tagged dataset is created for Bodo using Doccano, and its size is enlarged with a data augmentation technique. As there is no Bodo NER baseline model to compare with, we employed several deep learning techniques for a Bodo NER system and compared their results. We achieved an accuracy of 99.62%, precision of 99.75%, recall of 98.74% and F-score of 99.35% with a character-based LSTM model. This study also highlights the performance of GRU- and CNN-based models on the Bodo NER task.
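
For readers unfamiliar with the four metrics quoted in the abstract, the following is a minimal sketch of how accuracy, precision, recall and F-score can be computed at the token level for a tagged sequence. It is not the authors' evaluation code: the function names, the BIO-style tags and the toy sequences are all assumptions made for illustration, and the paper may use a different tag set or averaging scheme.

```python
from typing import List, Tuple


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def token_prf(gold: List[str], pred: List[str],
              outside: str = "O") -> Tuple[float, float, float]:
    """Token-level precision, recall and F1 over entity (non-outside) tags.

    A token counts as a true positive only when the predicted tag is an
    entity tag and matches the gold tag exactly.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    true_pos = sum(1 for g, p in zip(gold, pred) if p != outside and g == p)
    pred_pos = sum(1 for p in pred if p != outside)  # tokens tagged as entities
    gold_pos = sum(1 for g in gold if g != outside)  # tokens that are entities
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tags for a five-token sentence.
gold_tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred_tags = ["B-PER", "O", "O", "B-LOC", "B-LOC"]

accuracy = token_accuracy(gold_tags, pred_tags)
precision, recall, f1 = token_prf(gold_tags, pred_tags)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Accuracy counts the many correctly predicted `O` tokens, while precision and recall as defined here consider only entity tokens, so the two kinds of score can diverge on real data.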

Citation

@article{NARZARY20242405,
title = {Deep Learning based Named Entity Recognition for the Bodo Language},
journal = {Procedia Computer Science},
volume = {235},
pages = {2405--2421},
year = {2024},
note = {International Conference on Machine Learning and Data Engineering (ICMLDE 2023)},
issn = {1877-0509},
doi = {10.1016/j.procs.2024.04.228},
url = {https://www.sciencedirect.com/science/article/pii/S1877050924009049},
author = {Sanjib Narzary and Anjali Brahma and Sukumar Nandi and Bidisha Som},
keywords = {Named Entity Recognition, Natural Language Processing (NLP), Long Short Term Memory (LSTM), BiLSTM, CRF},
abstract = {Named Entity Recognition (NER) is one of the important applications of natural language processing (NLP). It automatically recognises and categorises named entities in a document, such as the names of individuals, groups, and places. NER is crucial to the success of applications including text summarization, machine translation, and information extraction and retrieval, and it is useful across a variety of topics and languages. Despite its widespread use and effectiveness in English, NER is still under investigation for Indian languages such as Bodo. The lack of resources and of a high-quality dataset makes NER in Bodo a difficult task. In this research, a deep learning-based NER tagger is investigated for the Bodo language. An NER-tagged dataset is created for Bodo using Doccano, and its size is enlarged with a data augmentation technique. As there is no Bodo NER baseline model to compare with, we employed several deep learning techniques for a Bodo NER system and compared their results. We achieved an accuracy of 99.62%, precision of 99.75%, recall of 98.74% and F-score of 99.35% with a character-based LSTM model. This study also highlights the performance of GRU- and CNN-based models on the Bodo NER task.}
}

Paper link (ScienceDirect): https://www.sciencedirect.com/science/article/pii/S1877050924009049