Needle in a haystack: an eggscellent eggvanture

7/4/2023

Finding the needle in a haystack: Extraction of Informative COVID-19 Danish Tweets

Cite (ACL): Benjamin Olsen and Barbara Plank. 2021. Finding the needle in a haystack: Extraction of Informative COVID-19 Danish Tweets. In Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), pages 11–19, Online. Association for Computational Linguistics.

Anthology ID: 2021.wnut-1.2
Volume: Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Month: November
Year: 2021
Address: Online
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 11–19
DOI: 10.18653/v1/2021.wnut-1.2
Bibkey: olsen-plank-2021-finding

Abstract

Finding informative COVID-19 posts in a stream of tweets is very useful to monitor health-related updates. Prior work focused on a balanced data setup and on English, but informative tweets are rare, and English is only one of the many languages spoken in the world. In this work, we introduce a new dataset of 5,000 tweets for finding informative COVID-19 tweets for Danish. In contrast to prior work, which balances the label distribution, we model the problem by keeping its natural distribution. We examine how well a simple probabilistic model and a convolutional neural network (CNN) perform on this task. We find a weighted CNN to work well but it is sensitive to embedding and hyperparameter choices. We hope the contributed dataset is a starting point for further work in this direction.
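The abstract mentions a weighted CNN trained on the natural, imbalanced label distribution, but it does not say how the weights are chosen. As a rough illustration only, the sketch below assumes inverse-frequency class weights and a weighted cross-entropy loss. The function names, the label strings, and the toy 9-to-1 label split are all hypothetical and are not taken from the paper or its dataset.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight}, where weight = n_samples / (n_classes * count).

    Rare classes get a weight above 1 and frequent classes a weight below 1.
    This is one common heuristic, assumed here for illustration.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def weighted_cross_entropy(true_class_probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the total weight."""
    total = sum(weights[y] * -math.log(p) for p, y in zip(true_class_probs, labels))
    return total / sum(weights[y] for y in labels)


# Toy, made-up label distribution: informative tweets are the rare class.
labels = ["uninformative"] * 9 + ["informative"]
weights = inverse_frequency_weights(labels)

# Hypothetical model outputs: probability assigned to the correct class.
# The model is confident on the frequent class and poor on the rare one.
true_class_probs = [0.9] * 9 + [0.2]

print(weights)
print(weighted_cross_entropy(true_class_probs, labels, weights))
```

With these weights, a mistake on the single rare example contributes as much to the loss as mistakes on all nine frequent examples combined, which is the intuition behind weighting when the label distribution is left unbalanced.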
