Korp moved to a new server, with some fixes and changes
The Korp service of the Language Bank of Finland was moved to a new server on 12 November 2024. Korp also received a few minor fixes and changes, listed below. We apologize for some features having been broken for a long time.
If something does not work as before, please send feedback either via the feedback form or by email to fin-clarin (at) helsinki.fi.
Fixes and changes:
- The time interval selector (text attribute time interval) in the extended search works again.
- The representation of the text attributes containing the identified language(s) of a sentence, paragraph and text has been changed. The changes affect the Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland version 2 and the Suomi24 2018–2020 corpus. The internal representations of the attributes are unchanged, so they can be used in the CQP expressions of the advanced search as before. The changes are as follows:
  - A language is always represented by its three-letter ISO 639-3 code.
  - If a language code has a translation, it is shown as a tooltip when hovering over the code in the sidebar of the KWIC result.
  - A language code in the KWIC sidebar is a link to the page of the language in question on SIL's ISO 639-3 site.
  - The extended search has a selection list of language codes (for the sentence-level attribute only).
  - The attribute label includes the language code standard (ISO 639-3).
- In the Suomi24 2001–2020 corpus, the text attribute name sentence polarity has been changed to sentence sentiment polarity, and the internal name of the attribute (used, e.g., in the extended search) has been changed from sentence_polarity to sentence_sentiment_polarity.
- In The Finnish Dialect Corpus of the Syntax Archive (LA-murre), The Corpus of English as a Lingua Franca in Academic Settings (ELFA) and ScotsCorr, the results of the extended search also include matches in which punctuation marks or annotations represented as tokens occur between the tokens explicitly specified in the search. Such tokens therefore need not be taken into account in the extended search expression. This feature was also present in the “old Korp” (Korp 5), which was shut down in June 2024.
- The ScotsCorr corpus finally works in this Korp version. In addition, the name of the text attribute script type (secondary) is now shown correctly.
- The video links in the Route to A wing Corpus (Reittidemo) work again.
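Users who have saved advanced-search queries that refer to the old internal attribute name of the Suomi24 2001–2020 corpus need to update them for the renaming above. The following is a minimal Python sketch of such an update, assuming the queries are stored as plain strings and that the structural attribute appears in CQP as `_.sentence_polarity`; the function name `update_query` and the attribute value `"positive"` in the example are hypothetical.

```python
import re

# Old and new internal attribute names, as given in the list above.
OLD_NAME = "sentence_polarity"
NEW_NAME = "sentence_sentiment_polarity"

# \b makes the pattern match only the whole attribute name, so an identifier
# such as "text_sentence_polarity" (where "_" precedes the name) is left alone.
_OLD_NAME_RE = re.compile(r"\b" + re.escape(OLD_NAME) + r"\b")


def update_query(cqp: str) -> str:
    """Return the CQP query with the old attribute name replaced by the new one.

    Hypothetical helper: queries already using the new name are unchanged,
    because "sentence_sentiment_polarity" does not contain "sentence_polarity".
    """
    return _OLD_NAME_RE.sub(NEW_NAME, cqp)


if __name__ == "__main__":
    # Hypothetical saved query; the attribute value is only an illustration.
    saved = '[word = "hyvä" & _.sentence_polarity = "positive"]'
    print(update_query(saved))
```

Using a word-boundary regular expression rather than a plain string replacement keeps the update safe to run repeatedly over the same set of saved queries.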
For more details, please see the corresponding news items on the Korp newsdesk.