
D4.1.2: Analysis Tools for Multimodal Born-digital Social Media

Project: FIN-CLARIAH
Grant agreement: Research Council of Finland no. 358720
Start date: 01-01-2024
Duration: 24 months

WP 4.1: Report on analysis tools for multimodal born-digital social media: Nordic Tweet Stream (NTS)
Date of reporting: 18-12-2024

Report author: Mikko Laitinen (UEF)
Contributors: Paula Rautionaho (UEF), Masoud Fatemi (UEF), Mehrdad Salimi (UEF)
Deliverable location: https://nordictweetstream.fi/

Description

The Nordic Tweet Stream (NTS) is a monitor corpus of geolocated tweets and associated metadata from the Nordic region, covering the period from 2013 to 2023. It is accessible through a graphical interface that allows users to search, subset, visualize, and download extremely large-scale user-generated data from one social media application.

The objective of this digital interface is to enable easy access to and distribution of born-digital data for basic research. We have recently witnessed the closing down of free access to various digital sources because of the APIcalypse (Bruns 2019) and feel that, despite restrictive measures by social media giants, it is extremely important to store cultural heritage from social media. We operate according to the FAIR Data Principles, which aim at making data findable, accessible, interoperable, and reusable (Wilkinson et al. 2016).

The NTS provides data from January 2013 to May 2023, encompassing over 900 million tokens from more than 73 million messages generated by nearly 900,000 individuals. The dataset includes content in 73 languages, the largest of which are Swedish (c. 31 %), English (c. 26 %), and Finnish (c. 13 %). Detailed information about the material is available on the Statistics pages of the interface.

The NTS dataset is intended for use by researchers across various disciplines, including sociolinguistics, dialectology, social sciences, and cultural studies. It can serve as both primary data and supplementary material alongside structured corpus data. The interface is designed for users seeking quick access to the data. Advanced users, however, may prefer to use the download function to retrieve the data for further processing in other environments.
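As an illustration of such further processing, the following minimal Python sketch computes the share of messages per language from a downloaded subset. The export format is an assumption made for this example only: a CSV file with a `lang` column is hypothetical, the sample rows are invented, and the actual NTS download may be structured differently.

```python
import csv
import io
from collections import Counter

# Hypothetical export: the real NTS download format and column names may differ.
# The rows below are invented sample data, not taken from the corpus.
SAMPLE_EXPORT = """id,lang,text
1,sv,Hej från Stockholm
2,en,Good morning from Oslo
3,fi,Hyvää huomenta
4,sv,God morgon
"""


def language_shares(csv_file):
    """Return {language code: share of messages in percent}.

    Assumes each row has a 'lang' column (an assumption, see above).
    """
    counts = Counter(row["lang"] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    return {lang: 100 * n / total for lang, n in counts.items()}


shares = language_shares(io.StringIO(SAMPLE_EXPORT))

# Print languages from largest to smallest share.
for lang, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {share:.1f} %")
```

For a real download, the in-memory sample would be replaced by an opened file, and the column name adjusted to match the actual export.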

Publications

Laitinen, M., Lundberg, J., Levin, M., & Martins, R. M. 2018. The Nordic Tweet Stream: A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data. In DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, pp. 349–362. https://erepo.uef.fi/handle/123456789/6697

Events

The NTS was presented at the following event:

References

  • Bruns, A. 2019. After the ‘APIcalypse’: Social media platforms and their fight against critical scholarly research. Information, Communication & Society 22(11), 1544–1566. doi:10.1080/1369118X.2019.1637447
  • Wilkinson, M. D. et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018. doi:10.1038/sdata.2016.18

 
The FIN-CLARIAH project has received funding from the European Union – NextGenerationEU instrument and is funded by the Research Council of Finland under grant number 358720.
 
