Resources and Datasets

  • SB-10k: German Sentiment Corpus

SB-10k is a publicly available corpus that contains 9738 German tweets, each labeled by 3 annotators with “positive”, “negative”, “neutral”, “mixed”, or “unknown”. It was created by SpinningBytes in collaboration with the Zurich University of Applied Sciences (ZHAW).
Details and Download

  • SB-CH: Swiss German Sentiment Corpus

SB-CH is a publicly available corpus that contains 165,916 German sentences, of which 2799 are labeled by 5 annotators with “positive”, “negative”, “neutral”, “mixed”, or “unknown”. It was created by SpinningBytes in collaboration with the Zurich University of Applied Sciences (ZHAW).
Details and Download

  • Supplementary Material for Publications

We provide supplementary material for some of our publications, including data and code. Please click the publication title to access the material.

  • Word Embeddings

Word embeddings are used in Natural Language Processing (NLP) to map words to vector representations. They serve, for instance, as input to deep learning algorithms for named entity extraction, sentiment analysis, or chatbots.
We provide publicly available word embeddings for several languages, including English, German, and French.
Details and Download
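
To illustrate what "mapping words to vector representations" means in practice, here is a minimal sketch in Python. The three-dimensional vectors and the names `toy_embeddings` and `cosine_similarity` are made up for this example; they are not taken from the embeddings offered above, which would have many more dimensions and would be loaded from the downloaded files.

```python
import math

# Toy vectors invented for illustration only. Real embeddings typically
# have tens to hundreds of dimensions and are learned from large corpora.
toy_embeddings = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "terrible": [-0.7, 0.3, 0.1],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Words with similar meaning are expected to point in similar directions.
sim_close = cosine_similarity(toy_embeddings["good"], toy_embeddings["great"])
sim_far = cosine_similarity(toy_embeddings["good"], toy_embeddings["terrible"])

print(f"good vs. great:    {sim_close:.2f}")
print(f"good vs. terrible: {sim_far:.2f}")
```

In this toy setup, "good" and "great" score close to 1, while "good" and "terrible" score below 0. Downstream models such as sentiment classifiers can use this kind of geometric structure as input features.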
