Resources and Datasets

  • SB-10k: German Sentiment Corpus

SB-10k is a publicly available corpus that contains 9738 German tweets, each labeled by 3 annotators with “positive”, “negative”, “neutral”, “mixed”, or “unknown”. It was created by SpinningBytes in collaboration with the Zurich University of Applied Sciences (ZHAW).
Details and Download

  • SB-CH: Swiss German Sentiment Corpus

SB-CH is a publicly available corpus that contains 165,916 Swiss German sentences, of which 2799 are labeled by 5 annotators with “positive”, “negative”, “neutral”, “mixed”, or “unknown”. It was created by SpinningBytes in collaboration with the Zurich University of Applied Sciences (ZHAW).
Details and Download

  • Supplementary Material for Publications

We provide supplementary material for some of our publications, including data and code. Please click the publication title to access the material.

  • Word Embeddings

Word embeddings map words to vector representations and are widely used in Natural Language Processing (NLP), for instance in deep learning models for named entity extraction, sentiment analysis, or chatbots.
We provide publicly available word embeddings for several languages, including English, German, and French.
Details and Download
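
To illustrate what mapping words to vectors means in practice, here is a minimal sketch in Python. The three-dimensional vectors, the `EMBEDDINGS` table, and the `similarity` helper are invented for illustration; they are not taken from the embeddings offered above, which would be loaded from the downloaded files and have many more dimensions.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings typically have tens to hundreds of dimensions.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.3, 0.1],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity(word_1, word_2, embeddings=EMBEDDINGS):
    """Look up both words and compare their vectors."""
    return cosine_similarity(embeddings[word_1], embeddings[word_2])


if __name__ == "__main__":
    print(similarity("good", "great"))
    print(similarity("good", "bad"))
```

In this toy table, "good" and "great" point in nearly the same direction, so their cosine similarity is higher than that of "good" and "bad". Downstream models such as sentiment classifiers take these vectors as input in place of the raw words.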
