The FAQ dataset is divided into categories, each with a set of questions attached to it. Our NLP engine compares every incoming question with all questions available in the dataset. When a similarity is detected, the incoming question is matched to the appropriate category, and the chatbot gives the category’s answer to the user.
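The matching flow described above can be illustrated with a small sketch. It is not the actual NLP engine: it swaps in a simple word-overlap (Jaccard) score as the similarity measure, and the dataset layout, the function names (`match_category`, `answer`), the sample questions and answers, and the `threshold` value are all assumptions made for illustration.

```python
import re

# Hypothetical FAQ dataset: each category has one answer and several questions.
FAQ = {
    "internship_application": {
        "answer": "You can apply for an internship through our careers page.",
        "questions": [
            "How do I apply for an internship?",
            "Where can I send my internship application?",
        ],
    },
    "internship_salary": {
        "answer": "Interns receive a monthly salary.",
        "questions": [
            "Is the internship paid?",
            "How much do interns earn?",
        ],
    },
}


def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard word overlap: a stand-in for the real engine's similarity."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_category(incoming, faq, threshold=0.3):
    """Compare the incoming question with every question in the dataset
    and return (category, score) for the best match, or (None, score)
    if nothing reaches the assumed threshold."""
    best_category, best_score = None, 0.0
    for category, entry in faq.items():
        for question in entry["questions"]:
            score = similarity(incoming, question)
            if score > best_score:
                best_category, best_score = category, score
    if best_score < threshold:
        return None, best_score
    return best_category, best_score


def answer(incoming, faq, threshold=0.3):
    """Return the matched category's answer, or None if there is no match."""
    category, _ = match_category(incoming, faq, threshold)
    return faq[category]["answer"] if category else None
```

In this sketch, `answer("How can I apply for an internship?", FAQ)` returns the answer of the hypothetical `internship_application` category, while a question that shares no words with the dataset returns `None`.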

For this process to work as well as possible, it is important to have a clean FAQ dataset with well-defined categories. The categories and the questions within them should not overlap: very similar questions should not belong to different categories.
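One way to check a dataset for this kind of overlap is to compare every pair of questions from different categories and flag the pairs that are very similar. The sketch below is only an illustration: it again uses a simple word-overlap score rather than the real engine's similarity, and the function name `find_overlaps`, the sample data, and the `threshold` of 0.6 are assumptions.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard word overlap between two questions (illustrative only)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_overlaps(faq, threshold=0.6):
    """Return (category_a, question_a, category_b, question_b) tuples for
    very similar questions that sit in different categories."""
    labelled = [(cat, q) for cat, questions in faq.items() for q in questions]
    overlaps = []
    for (cat_a, q_a), (cat_b, q_b) in combinations(labelled, 2):
        if cat_a != cat_b and similarity(q_a, q_b) >= threshold:
            overlaps.append((cat_a, q_a, cat_b, q_b))
    return overlaps


# Hypothetical dataset: category name -> list of questions.
faq = {
    "internship": ["How do I apply for an internship?"],
    "application": ["How do I apply for an internship position?"],
    "insurance": ["Do I get health insurance?"],
}

for cat_a, q_a, cat_b, q_b in find_overlaps(faq):
    print(f"Overlap: [{cat_a}] {q_a!r} vs [{cat_b}] {q_b!r}")
```

Each flagged pair is a candidate for moving one question to the other category or for merging the two categories.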

How to create good categories?

Keep in mind that it can make sense to split bigger topics into several categories, for example:

In this case, it is important to include the category-defining keywords (such as "internship" or "insurance") in the questions. That way, the categories stay clean and distinct, as in the examples below:

Also keep in mind that for job-specific answers you can use contexts instead of creating separate categories.


Learn more about creating good datasets in the following articles: