100,000 Fine-Tuning Text Data for English LLM General-Domain SFT
300 million pairs of high-quality image-caption data
7 Million Sets - High-Quality Video Caption Dataset
Large Language Model content safety considerations text data
100,000 Instruction-Following Evaluation SFT Text Data for Chinese LLMs
200,000 Sets of Multi-country Landmark Buildings Image Caption Data
32 million - Science Subjects Questions Text Parsing And Processing Data
6.03 Million - Majors Questions Text Parsing And Processing Data
6.9 million - Chinese Multi-disciplinary Questions Text Parsing And Processing Data
Japanese OKWAVE Q&A platform Text Parsing and Processing Data
200,000 text data in German, Spanish, French, and Italian
250,000 English Animal Medical Dataset
140,000,000 - Chinese Judgment Documents Text Parsing And Processing Data
480,000 corrected texts in German, Spanish, French, and Italian
2.4 million - Korean Test Questions Structured Analysis Processing Data