
200,000 Sets of Multi-country Landmark Buildings Image Caption Data

600 Hours - Greek Real-world Casual Conversation and Monologue speech dataset

600 Hours - Norwegian Real-world Casual Conversation and Monologue speech dataset

3D High-Fidelity Synthetic Data - DMS

Japanese OKWAVE Q&A platform Text Parsing and Processing Data

500 Hours - Tamil Scripted Monologue Smartphone Speech Dataset

500 Hours - Lao Scripted Monologue Smartphone Speech Dataset

Chinese Multi-emotional Modal particle and Natural Conversation Speech Synthesis Corpus

Gujarati (India) Scripted dialogue speech dataset

100 Hours - Burmese Speech Data by Mobile Phone_Reading

100,000 Fine-Tuning text data set for English LLM General Domain SFT

30 Million High-quality Video Data

80 Million Vector Image Data

200 Million High-quality Image Data

2 People - Cantonese Multi-emotional Natural Conversation Speech Synthesis Corpus

500,000 Images - Natural Scenes and Documents OCR Data

30,000 Images - Natural Scenes OCR Data in Southeast Asian Languages

100,000 Sets of ICONS Image Caption Data

6.9 million - Chinese Multi-disciplinary Questions Text Parsing And Processing Data

1 million - Chinese Code Questions Text Parsing And Processing Data
. . .