...
For every ML application, and for NLP in particular, data quality is the biggest factor in performance. This matters even more for chatbots, where datasets tend to be small and domain-specific. That is why we have developed a rigorous approach to cleaning and improving our datasets, alongside our ongoing work to extend the technical capabilities of our in-house ML/NLP service. For a deeper dive into the data quality process, see our research article and the related post on the jobpal developer blog: Why (and How) Explainable AI Matters for Chatbot Design.
We also work on improving the in-house service itself through a variety of state-of-the-art methods. A more technical article explains how we approach automated evaluation and monitoring of ML/NLP performance. A shorter description and poster slides are available in our blog post: Plausible Negative Examples for Better Multi-Class Classifier Evaluation.