Creating datasets and benchmarks ideally requires large quantities of data curated to a high standard by humans (e.g., for labeling or annotating data items). Strategies for incentivizing, funding, organizing, and quality-checking such human curation efforts could have high leverage: they directly increase the utility of the data, and they are also likely to improve AI model training and benchmarking.

Impact: Very high
- Improving funding for data curation efforts
- Improving workflows for large-scale data curation efforts
- Creating and establishing good standard benchmarks
- Improving the semantic annotation of data
- Improving annotation of datasets with metadata
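As a minimal sketch of what quality-checking and metadata annotation might look like in practice, the following Python snippet collects labels from multiple human annotators, attaches dataset metadata, and flags items with low inter-annotator agreement for re-review. All names here (`AnnotatedItem`, `consensus_label`, the metadata fields) are hypothetical, chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class AnnotatedItem:
    """A data item carrying labels from several human annotators plus metadata."""
    text: str
    labels: list               # one label per human annotator
    metadata: dict = field(default_factory=dict)

def consensus_label(item: AnnotatedItem, min_agreement: float = 0.5):
    """Return the majority-vote label, or None when agreement falls below
    the threshold (so the item can be routed back for re-annotation)."""
    label, count = Counter(item.labels).most_common(1)[0]
    agreement = count / len(item.labels)
    return label if agreement >= min_agreement else None

item = AnnotatedItem(
    text="example sentence",
    labels=["positive", "positive", "negative"],
    metadata={"source": "hypothetical-survey", "license": "CC-BY"},
)
print(consensus_label(item))  # prints "positive" (2/3 agreement)
```

A real large-scale workflow would add annotator tracking, calibration tasks, and richer agreement statistics (e.g., Krippendorff's alpha), but even this simple majority-vote-with-threshold pattern illustrates how quality checks and metadata can be built into the curation pipeline from the start.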