MOSTLY AI debuts synthetic text functionality for training LLMs without exposing PII

MOSTLY AI, a pioneer in structured synthetic data, is unveiling a new synthetic text functionality that preserves the privacy of proprietary data assets for training AI models. This capability enables organizations to leverage a vast range of text data—such as emails, customer support transcripts, chatbot conversations, and more—to train and fine-tune large language models (LLMs) without risking a privacy breach. 

Sensitive data, whether customer information or proprietary business assets, is as common a component of modern business as it is critical to protect. As the push toward AI implementation remains strong, protecting personally identifiable information (PII) from being exposed to LLMs is a top priority. Yet LLMs inherently need ample amounts of data to improve their outputs, which poses a significant challenge for many enterprises.

Enter synthetic data: artificially generated data that mimics the statistical properties of real data, reflecting both the text and its underlying structured insights without exposing PII. Synthetic data also offers an efficiency advantage in training LLMs because it serves as an optimized version of data that doesn’t require tedious manual labor to create.
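
To make the idea of "mimicking statistical properties" concrete, here is a deliberately simplified sketch in Python. It is not MOSTLY AI's method and offers no privacy guarantee on its own; the column of ages, the Gaussian assumption, and the function names are all hypothetical choices made for illustration. The sketch summarizes a small numeric column by its mean and standard deviation, then draws new values from that summary rather than copying any original record.

```python
import random
import statistics

def fit_gaussian(values):
    """Summarize a numeric column by its sample mean and standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def sample_synthetic(params, n, seed=0):
    """Draw n new values from the fitted Gaussian (a toy stand-in for a generative model)."""
    mean, stdev = params
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.gauss(mean, stdev) for _ in range(n)]

# Hypothetical "real" column; these values are invented for this example.
real_ages = [23, 35, 41, 29, 52, 38, 47, 31, 44, 36]

params = fit_gaussian(real_ages)
synthetic_ages = sample_synthetic(params, n=1000)

print("real mean / stdev:     ", params)
print("synthetic mean / stdev:",
      (statistics.mean(synthetic_ages), statistics.stdev(synthetic_ages)))
```

The synthetic column tracks the mean and spread of the original without reusing its rows. Production systems model far richer structure, such as correlations across columns and the content of free text, and add explicit privacy safeguards that this toy example omits.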

“Synthetic data allows companies to create realistic, anonymized datasets that can be used for testing, model training, and analytics without exposing sensitive information. This helps maintain compliance with regulations while still enabling AI and analytics innovation,” explained Tobias Hann, CEO of MOSTLY AI. “In many enterprise settings, especially in industries like finance or healthcare, collecting real-world data can be difficult or expensive. Synthetic data helps fill the gap where real-world data is either scarce or unavailable, allowing businesses to train AI models on large datasets without waiting for real events to generate the necessary data.”

MOSTLY AI’s synthetic text functionality now empowers enterprises to securely create a complete, statistically accurate representation of their proprietary data assets that can be used to fine-tune models and deliver high-quality generative AI (GenAI) solutions without compromising privacy. Beyond its promise of safety and compliance, MOSTLY AI’s synthetic text functionality delivers a performance improvement of as much as 35% compared to text produced by GPT-4o-mini, according to the company.

“Only very few vendors in the synthetic data category offer synthetic text functionality. Enterprises choose MOSTLY AI because we are able to provide the highest quality synthetic data available on the market today, and our ease of use and flexibility within the platform is unparalleled,” said Hann. “MOSTLY AI is also the only platform to offer structured data, the combination of structured and text data, and the ability to easily select LLMs from Hugging Face.”

With the ability to take any model from Hugging Face and fine-tune it with proprietary text data to generate synthetic data, enterprises benefit from the streamlining of a traditionally complex process. This is particularly beneficial for large organizations that host high volumes of data, which can now more easily leverage the power of creative, private, high-quality synthetic text, according to MOSTLY AI.

“This new functionality is an integral part of the MOSTLY AI Platform,” noted Hann. “One important aspect of the platform is its ability to run in isolation within a secure enterprise environment, a capability that extends even to synthetic text. We allow users to select and combine various Generative AI models—including LLM models from Hugging Face and proprietary MOSTLY AI models—to produce synthetic data of the highest quality and with the highest privacy guarantees. Additionally, many companies today are unable to fully leverage the power of GenAI for their use cases because of privacy restrictions. This new functionality within the MOSTLY AI Platform removes those barriers, enabling innovation across different areas including customer service.” 

