How We Evaluate Large Language Models
Patrycja Cieplicka
Tooploox
Abstract
Good evaluation helps us understand what large language models can really do. This talk gives an accessible overview of how large language models are evaluated in practice. It covers common open-source benchmarks and tools used to test model behaviour and capabilities, along with recent research trends, common pitfalls, and practical tips for real-world evaluation.
Bio
Patrycja Cieplicka is a Machine Learning Engineer with around six years of experience. At Tooploox, she focuses on Large Language Models, especially post-training, evaluation, and optimization. She holds a degree in Computer Science from the Warsaw University of Technology. In 2024, she was named one of the TOP100 Women in Data Science in Poland.