Great chatbot performance depends on the questions you create and approve in your dataset. They are the basis on which the chatbot calculates similarity and provides automatic responses to new incoming questions. Here are a few tips on what makes a good (and bad) question:

Overly long questions can confuse the NLP engine when it picks a category, because the query may contain many keywords and irrelevant topics.

Short and precise questions allow the NLP engine to better process the meaning behind them and, in the future, to provide better responses to incoming questions.


Overly general questions can confuse the NLP engine when it picks a category, because the category-defining keywords are missing. A general question could belong to several categories, so please avoid adding such questions to any category.

Single words or commands are not questions. Because a particular word can belong to several categories, it should never be trained into just one: doing so gives that category too much “power” and may lead to it overpowering other, similar categories.
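The length-related tips above can be turned into a quick pre-check before questions are approved. The following is only a minimal sketch: the function name `check_question` and the 20-word limit are assumptions made for illustration, not part of the chatbot product, and the check cannot tell whether a question is too general.

```python
def check_question(text, max_words=20):
    """Return a list of warnings for a candidate training question.

    max_words is an illustrative threshold (an assumption), not a
    documented limit of the NLP engine.
    """
    words = text.split()
    warnings = []
    if len(words) <= 1:
        # Empty input, a single word, or a one-word command.
        warnings.append("empty or single word")
    elif len(words) > max_words:
        warnings.append("too long")
    return warnings


print(check_question("refund"))
print(check_question("How do I reset my password?"))
```

A question that returns an empty list has only passed these two simple checks; it still needs a human review for relevance and clarity.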

The exact same or very similar questions trained into different categories may cause confusion: the chatbot will not be able to decide where similar questions belong, and it may start providing wrong answers to new incoming questions. If you notice such an overlap, review the categories and move the similar questions into one category.
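If your dataset can be exported as categories with their questions, overlaps like this can be spotted with a rough script. The sketch below assumes a plain dictionary of category names to question lists and uses character-level similarity from Python's standard `difflib` module. This is not the similarity measure the chatbot itself uses, and the `find_overlaps` name and the 0.85 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_overlaps(dataset, threshold=0.85):
    """Find near-identical questions that sit in different categories.

    dataset: dict mapping a category name to a list of question strings.
    threshold: illustrative cut-off for the character-level similarity ratio.
    """
    labeled = [(cat, q) for cat, questions in dataset.items() for q in questions]
    overlaps = []
    for (cat_a, q_a), (cat_b, q_b) in combinations(labeled, 2):
        if cat_a == cat_b:
            continue  # similar questions inside one category are fine
        ratio = SequenceMatcher(None, q_a.lower(), q_b.lower()).ratio()
        if ratio >= threshold:
            overlaps.append((cat_a, q_a, cat_b, q_b, round(ratio, 2)))
    return overlaps


example = {
    "billing": ["How do I change my payment method?"],
    "account": ["How do I change my payment method"],
    "shipping": ["Where is my order?"],
}
for hit in find_overlaps(example):
    print(hit)
```

Each reported pair is a candidate for review. Character-level matching misses paraphrases, so treat it as a first pass only.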

Adding personal information will not only confuse the chatbot, it is also against privacy guidelines. Always avoid adding questions with any kind of personal information, such as names, contact information, addresses, payment information, and so on.
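A simple pattern check can catch some obvious personal data before a question is approved. The sketch below is an assumption-laden illustration: the function name `flag_personal_info` and both patterns are hypothetical, and regular expressions cannot reliably detect names or postal addresses, so manual review is still needed.

```python
import re

# Rough, illustrative patterns (assumptions), not an exhaustive PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Seven or more digits, optionally separated by spaces or hyphens,
# as found in phone numbers or card numbers.
LONG_NUMBER_RE = re.compile(r"(?:\d[ -]?){7,}")


def flag_personal_info(text):
    """Return labels for the kinds of personal data the text appears to contain."""
    flags = []
    if EMAIL_RE.search(text):
        flags.append("email")
    if LONG_NUMBER_RE.search(text):
        flags.append("long number")
    return flags


print(flag_personal_info("My email is jane.doe@example.com, why can't I log in?"))
print(flag_personal_info("How do I reset my password?"))
```

Questions that get flagged should be rewritten without the personal details or left out of the dataset.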

Questions in languages other than your chatbot’s language should not be trained or approved, since every dataset works within a single-language framework (that is, a chatbot can fully understand just one selected language). The chatbot has limited knowledge of other languages, which is why you may sometimes see it assigning correct categories to foreign-language questions. Under no circumstances, however, should you approve those questions into your dataset, since doing so may badly hurt the chatbot’s performance.

Small talk questions and insults are handled before they arrive at your dashboard, so creating small talk categories and adding small talk questions is unnecessary. We currently handle several small talk categories; you can read more about them here.

Learn more about creating good datasets in the next articles: