SB-10k: German Sentiment Corpus

SB-10k is a publicly available corpus that contains 9738 German tweets, each labeled by 3 annotators with “positive”, “negative”, “neutral”, “mixed”, or “unknown”. It was created by SpinningBytes in collaboration with the Zurich University of Applied Sciences (ZHAW).


The corpus is provided under the Creative Commons license CC BY 4.0.
This means that it is free to use and distribute, even commercially, as long as appropriate credit is given to the reference below.
Human-readable format: Link
Licence Contract: Link


If you use the corpus, please make sure to reference the following publication:


  • A Twitter Corpus and Benchmark Resources for German Sentiment Analysis, by Mark Cieliebak, Jan Deriu, Fatih Uzdilli, and Dominic Egger. In “Proceedings of the 4th International Workshop on Natural Language Processing for Social Media (SocialNLP 2017)”, Valencia, Spain, 2017.


A detailed description of the corpus and how it was constructed can be found in the reference above.


To use the corpus, download the annotations below. Since Twitter does not allow the content of tweets to be distributed, the dataset contains only tweet IDs (first column) and the corresponding annotations (second column). A Python script to download the tweet content for the IDs can be found here*.
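As a rough illustration of the two-column layout described above, the following Python sketch reads (tweet ID, annotation) pairs and counts the labels. The tab delimiter, the sample rows, and the function name `read_annotations` are assumptions made for this example; check the downloaded file for its actual separator and label spelling.

```python
import csv
import io
from collections import Counter


def read_annotations(lines, delimiter="\t"):
    """Yield (tweet_id, label) pairs from a two-column annotation file.

    The delimiter is an assumption; adjust it to match the real file.
    Tweet IDs are kept as strings so they are passed on unchanged.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        if len(row) < 2:
            # Skip blank or malformed lines.
            continue
        yield row[0].strip(), row[1].strip()


# Made-up sample rows standing in for the downloaded annotation file.
sample = io.StringIO("1001\tpositive\n1002\tneutral\n\n1003\tneutral\n")
pairs = list(read_annotations(sample))
label_counts = Counter(label for _, label in pairs)
```

With a real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) instead of the in-memory sample.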

*On Windows, you might have to comment out the “signal.alarm(…)” calls to get the script to work.
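The Windows caveat arises because `signal.alarm` and `signal.SIGALRM` exist only on Unix-like systems. As an alternative to commenting the calls out by hand, a timeout can be guarded so that it is skipped where alarms are unavailable. The sketch below is not taken from the download script; `call_with_timeout`, `FetchTimeout`, and `HAS_ALARM` are hypothetical names used only for illustration.

```python
import signal

# signal.alarm and SIGALRM are only defined on Unix-like platforms.
HAS_ALARM = hasattr(signal, "SIGALRM") and hasattr(signal, "alarm")


class FetchTimeout(Exception):
    """Raised when the wrapped call exceeds its time budget."""


def _on_alarm(signum, frame):
    raise FetchTimeout()


def call_with_timeout(func, seconds, *args):
    """Run func(*args) with an alarm-based timeout where supported.

    On platforms without signal.alarm (e.g. Windows) the call simply
    runs without a timeout. Must be called from the main thread.
    """
    if not HAS_ALARM:
        return func(*args)
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return func(*args)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore old handler


result = call_with_timeout(lambda x: x + 1, 5, 1)
```

The trade-off is that on Windows a stalled request is no longer interrupted, so a library-level timeout on the network call itself would be needed there.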
