Latest publications

Do metrics currently used to measure model performance yield an adequate reflection of progress in AI?

Whether benchmarking adequately reflects current AI capacities depends not only on the quality and validity of benchmark datasets but also on the properties of the metrics used to assess performance. As a first step in deriving insights from the Intelligence Task Ontology, we analysed the prevalence of performance metrics currently used to measure progress in AI, based on data covering 32,209 reported performance results across 2,298 distinct benchmark datasets.