Meta works on open-source Vietnamese-language dataset for AI development

Monday, March 17, 2025 | 19:18:39

(VOVWORLD) - Meta Corporation and the Vietnam National Innovation Center have launched project ViGen to create high-quality open-source Vietnamese datasets to enhance Vietnamese language representation in AI, while fuelling rapid and sustainable economic growth in Vietnam.

The launch ceremony of project ViGen, part of the Vietnam Innovation Challenge 2025 (Photo: chinhphu.vn)

At the launch ceremony in Hanoi last Friday, Sarim Aziz, Director of Public Policy at Meta, said that the initiative aims to elevate the performance and adoption of AI technologies in Vietnam.

“We ensure free and easy access for researchers in Vietnam, developers, startups, and businesses and those around the world that want to serve Vietnam. More importantly this data platform will not only accelerate AI research in Vietnam, but also foster an inclusive community that will drive innovation in both private and public sectors. It’ll propel Vietnam into a new era of the nation's rise,” said Aziz.

Meta will contribute open-source datasets from its AI and public-interest data programs, including insights on mobility and social connectivity, as well as training data from AI-powered population maps.

More than 99% of the data used for AI is in English and other languages, while less than 1% is in Vietnamese. Developing large-scale, high-quality, open-source Vietnamese-language datasets for AI training and evaluation has therefore become an urgent priority.