I mean, even if we resort to using a neural network for checking “is the conversation finished?” That hyper-specialised NN would likely be orders of magnitude cheaper to run than your standard LLM, so you could likely save quite a bit of power/money by using it to filter for the actual LLM, no?
I mean, even if we resort to using a neural network for checking “is the conversation finished?” That hyper-specialised NN would likely be orders of magnitude cheaper to run than your standard LLM, so you could likely save quite a bit of power/money by using it to filter for the actual LLM, no?