How We Evaluate Large Language Models

Patrycja Cieplicka

Tooploox

Abstract

Good evaluation helps us understand what large language models actually do. This talk gives an accessible overview of how large language models are evaluated in practice, surveying common open-source benchmarks and tools used to test model behaviour and capabilities. It also covers recent research trends, common pitfalls, and practical tips for real-world evaluation.
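As a taste of what "evaluation in practice" can look like, here is a minimal sketch of a benchmark-style scoring loop using exact-match accuracy, one of the simplest metrics used by open-source evaluation harnesses. The function names and example data are illustrative assumptions, not taken from the talk.

```python
def normalize(text: str) -> str:
    """Lowercase and strip whitespace so trivial formatting
    differences are not counted as errors."""
    return text.strip().lower()


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of model predictions that exactly match the gold
    reference after normalization."""
    assert len(predictions) == len(references), "one reference per prediction"
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)


# Toy example: two of three answers match after normalization.
predictions = ["Paris", " paris ", "Lyon"]
references = ["Paris", "Paris", "Marseille"]
print(exact_match_accuracy(predictions, references))
```

Real harnesses add prompt templating, generation control, and more robust metrics (F1, log-likelihood scoring, LLM-as-judge), but the core loop of comparing model outputs against references is the same.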

Bio

Patrycja Cieplicka is a Machine Learning Engineer with around six years of experience. At Tooploox, she focuses on Large Language Models, especially post-training, evaluation, and optimization. She holds a degree in Computer Science from the Warsaw University of Technology. In 2024, she was named one of the TOP100 Women in Data Science in Poland.
