A MICROBLOG AUXILIARY PART-OF-SPEECH TAGGER BASED ON BAYESIAN NETWORKS
Silvia Golia; Paola Zola
2019-01-01
Abstract
Part-of-speech (POS) tagging underlies many Natural Language Processing tasks, and several algorithms now exist that can determine the POS tag of a given word. However, the growing use of the Internet and the explosion of blogs and microblogs have changed the way people communicate, and POS taggers trained on structured corpora are no longer able to capture this new trend. The proposed algorithm is an auxiliary POS tagger that aims to predict unknown POS tags. It is based on Bayesian networks and uses information about the POS tags that precede and follow the unknown POS tag. The proposed methodology is tested on the well-known Brown Corpus and the more recent Ark dataset.
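To make the idea of predicting an unknown tag from its neighbours concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest possible network, in which the unknown tag depends only on the single tag before it and the single tag after it, and it estimates that one conditional probability table from counts with add-one smoothing. The abstract does not specify the network structure, the context size, or the estimation method, so all of these are assumptions. The function names `train` and `predict`, the boundary markers `<s>` and `</s>`, and the toy tag sequences are hypothetical.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count how often each tag occurs between a (previous, next) tag pair."""
    context_counts = defaultdict(Counter)
    tagset = set()
    for tags in tagged_sentences:
        # Pad with boundary markers so the first and last tags have a context.
        padded = ["<s>"] + list(tags) + ["</s>"]
        for prev_tag, tag, next_tag in zip(padded, padded[1:], padded[2:]):
            context_counts[(prev_tag, next_tag)][tag] += 1
            tagset.add(tag)
    return context_counts, sorted(tagset)


def predict(context_counts, tagset, prev_tag, next_tag, alpha=1.0):
    """Return the most probable tag and the smoothed P(tag | prev, next)."""
    counts = context_counts.get((prev_tag, next_tag), Counter())
    total = sum(counts.values()) + alpha * len(tagset)
    probs = {t: (counts[t] + alpha) / total for t in tagset}
    best = max(probs, key=probs.get)
    return best, probs


# Toy tag sequences, invented for illustration (not taken from Brown or Ark).
toy = [
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
]
model, tags = train(toy)
best, probs = predict(model, tags, prev_tag="DET", next_tag="VERB")
print(best, probs)
```

In this sketch the tagger is "auxiliary" only in the sense that it would be called for positions where a primary tagger has failed to assign a tag, using the neighbouring tags that the primary tagger did assign. A wider context window or a richer network structure would require a different table layout.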