How to evaluate AI inference
Running AI models is typically expensive: large language models like those behind ChatGPT require substantial computing power to perform the millions of calculations involved in each response.
As much as 90% of a model's computing time can be spent on inference, which continues for as long as the model is deployed; training, by contrast, is a one-time cost. For models in active use, inference costs therefore tend to outweigh training costs.
There are several ways to evaluate the inference capabilities of large language models. Speed is a key differentiator: a model that performs inference more efficiently delivers faster response times.
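One way to compare inference speed is simply to time how long each model takes to produce a response. The sketch below uses placeholder functions in place of real model calls (which in practice would be API requests or local-model invocations), so the names `model_a` and `model_b` are purely illustrative:

```python
import time

# Hypothetical stand-ins for two models' inference calls; real code would
# invoke an actual model or API here.
def model_a(prompt: str) -> str:
    return prompt.upper()

def model_b(prompt: str) -> str:
    return "".join(reversed(prompt))

def time_inference(model, prompt: str, runs: int = 5) -> float:
    """Return the average time in seconds per response over several runs."""
    start = time.perf_counter()
    for _ in range(runs):
        model(prompt)
    return (time.perf_counter() - start) / runs

prompt = "Summarize the benefits of efficient inference."
for name, model in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: {time_inference(model, prompt):.6f} s per response")
```

Averaging over several runs smooths out one-off delays, which matters because real inference latency varies with load.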
Cost is another area worth evaluating, especially if you're paying to run the model, and there are tools and techniques that can help reduce the cost of running expensive AI models. Arm's (ARM) CPU architecture, used in almost every smartphone, is being adopted for AI workloads because of its power efficiency.
Accuracy is also a valuable measure of inference capability, and both training and inference determine how accurate a model is in practice. A simple test is to pose the same questions to several models and see which ones give the best answers and make the fewest mistakes.
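The side-by-side test described above can be sketched as a small script that scores each model against a reference answer key. Everything here is a toy assumption: the question set, the answer key, and the `model_a`/`model_b` functions stand in for real model calls:

```python
# Hypothetical answer key: questions paired with their expected answers.
questions = {
    "What is 2 + 2?": "4",
    "What is the capital of France?": "Paris",
}

# Placeholder "models" for illustration only.
def model_a(question: str) -> str:
    answers = {"What is 2 + 2?": "4", "What is the capital of France?": "Paris"}
    return answers.get(question, "I don't know")

def model_b(question: str) -> str:
    return "4"  # always answers "4", so it is right only once

def score(model) -> int:
    """Count how many of the model's answers match the reference key."""
    return sum(model(q).strip() == a for q, a in questions.items())

print("model_a:", score(model_a), "/", len(questions))  # model_a: 2 / 2
print("model_b:", score(model_b), "/", len(questions))  # model_b: 1 / 2
```

Real evaluations use the same idea at a much larger scale, with standardized question sets and more forgiving answer matching than exact string comparison.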