Your Text Analytics Project
Our Three-Phase Approach
You have a new
We show you how to realize it
Our onboarding workshop quickly assesses whether the problem can be solved with text analytics and how many resources will be needed.
- Duration: 2 hours
- Cost: CHF 600.00 plus taxes
You want to
Test your Idea
We build a running prototype
We develop a first prototype on your data, evaluate the resulting quality, and give a recommendation for implementing a production solution. This usually takes 5-15 days and shows how good the results will be.
- Duration: 5-15 days
- Cost: Fixed
You want to
We develop a software solution
We implement a software component that solves your text analytics task. This includes the definition of the architecture and interfaces (e.g. REST), performance optimization (quality and runtime), and deployment in your infrastructure. Time and cost depend on the size of the project.
- Duration: flexible
- Cost: project-dependent
We are multilingual
We can easily apply our technology to any language. Usually we just need a good set of training examples in the new language. If no such training data exists, we can help you generate it.
We have already implemented solutions in many languages, including German, Swiss German, French, Italian, and Arabic.
Topic Categorization for Swiss News Articles
The Swiss Economic Archive has gathered more than 2.5 million news documents on Swiss economics, politics, and companies since 1910. We developed an automatic topic categorization for approximately 600 pre-defined topics.
Generation of Company Reports
Our partner developed an online platform for an Indian professional service provider with 3000+ employees. We implemented machine learning algorithms that support automatic generation of company reports by finding static company data (e.g. founding year), identifying competitors, and generating company descriptions from online news.
Real-time Integration of Social Media
We automatically matched entries from a customer database with Facebook fans, enriched the data with sentiment analysis on the posts, and integrated the results into the CRM.
German Sentiment Corpus
We manually annotated 10’000 German and 3’000 Swiss German texts with sentiment labels (positive, negative, neutral). The corpora are publicly available.
Statistics of Job Offerings in Switzerland
For the annual report of JobCloud, we analyzed sentiment and word statistics for more than 140’000 job offerings.
Our sentiment analysis system uses deep learning to detect the tonality of a text (positive, negative, or neutral). It won two international competitions, SemEval-2016 and Evalita-2016.
Health Risk Prediction from Twitter
We gathered 150 million tweets and correlated public health data on heart disease and diabetes with frequent Twitter topics. Our model can predict mortality rates based only on users' tweets.
Chatbot for Movie Recommendations
We implemented a chatbot for movie recommendations that communicates with the user in natural language and selects the most appropriate movies based on the user's preferences.
The largest set of data we
worked with contained over
Frequently Asked Questions (FAQ)
Do you have your own software products?
YES. We have developed solutions for many classical text analytics tasks, such as sentiment analysis, named entity recognition, language detection, and topic modeling. They are state of the art, as we have shown in several international competitions. We can use our solutions to realize your project, but we also integrate third-party services and libraries, e.g. from Microsoft, IBM, or other service providers.
The latest software we’ve developed is Interscriber, a tool for automatic transcription of audio files in several languages. We combine 5 speech processing engines and consolidate their transcription results for better quality. We also offer the option to run exclusively on local servers to enforce data privacy and security.
Can I get the source code?
YES. We usually hand over the complete source code to our clients. You get all rights to modify and re-use it. This allows you, for instance, to re-train the machine learning models on your own when your data changes over time.
How can we integrate your solution in our infrastructure?
We always aim to deliver a software component with a very simple interface. For instance, our sentiment solution takes a plain text (e.g. a tweet) as input and outputs its sentiment (positive, negative, neutral) and some meta-data in a simple JSON format.
We usually use Docker to package our solutions, which makes it easy for you to deploy and scale them on your infrastructure.
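As a rough illustration of the plain-text-in, JSON-out interface described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name `analyze_sentiment`, the JSON field names, and the tiny word lists, which stand in for a trained model and are not our actual sentiment system.

```python
import json

# Hypothetical word lists used as a stand-in for a trained model.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "poor", "sad"}


def analyze_sentiment(text: str) -> str:
    """Return a JSON string with the sentiment of `text` and some metadata.

    The field names ("text", "sentiment", "meta") are illustrative only.
    """
    # Split on whitespace and strip common punctuation from each token.
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    # Count positive hits minus negative hits.
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(
        t in NEGATIVE_WORDS for t in tokens
    )
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    result = {
        "text": text,
        "sentiment": label,
        "meta": {"num_tokens": len(tokens), "score": score},
    }
    return json.dumps(result, ensure_ascii=False)


if __name__ == "__main__":
    print(analyze_sentiment("I love this great movie!"))
```

A caller only needs to pass a string and parse the returned JSON, which is what keeps the integration effort low regardless of what runs behind the interface.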
Do you also do images, audio or other data types?
Our primary focus is on text analytics, which is where we do most of our work. But our team also has experience in other domains, e.g. speaker detection, image classification, or predictive analysis.
In addition, we have a huge network of partners in both academia and industry, where we can usually find a suitable expert.
How good will the results be?
There are three aspects to consider:
- In general, the more time we invest, the better the solution gets. Every text analytics algorithm has several parameters that we can optimize, and tuning every step in the process usually improves the results.
- Every problem has an intrinsic complexity. For instance, the best solutions for sentiment analysis have an accuracy of approx. 73% (on tweets, with 3 classes: positive, negative, neutral), whereas named entity extraction can reach up to 93%.
- Your underlying data is important. Many tasks in text analytics are solved with machine learning, and these algorithms are trained and optimized on your data. The more data they have for training, the better the results.
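To make the accuracy figures mentioned above more concrete, here is a minimal sketch of how accuracy is computed for a 3-class sentiment task: the share of texts whose predicted label matches the manually assigned one. The labels and predictions below are made up for illustration and do not come from any of our projects.

```python
def accuracy(gold, predicted):
    """Fraction of items where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Made-up example: 8 texts with manually assigned (gold) labels
# and the labels predicted by some classifier.
gold = ["positive", "negative", "neutral", "neutral",
        "positive", "negative", "neutral", "positive"]
predicted = ["positive", "negative", "neutral", "positive",
             "positive", "neutral", "neutral", "positive"]

if __name__ == "__main__":
    # 6 of the 8 predictions match the gold labels.
    print(f"Accuracy: {accuracy(gold, predicted):.0%}")
```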
- On-premises data analysis: none of your data leaves your premises
- Swiss Made Software
- Respect for data protection and privacy
We build technologies for other types of data as well!