PyLadies Amsterdam
Online,
It is easy to build a PoC with LLMs, but hard to turn it into a production-grade LLM application. To succeed, you need a robust evaluation framework both during development and after deployment, and you need to automate it, because manual reviews rarely scale. This workshop focuses on evaluating LLM-based applications with open-source Python libraries. We start by exploring different evaluation techniques. Next, we incorporate them into our test suite. Finally, we lay the foundation for monitoring.