I have a dataset for which only the categories are known; there are no annotations for the individual images. How can I evaluate on it?