Vocal Image Raises USD 3.6M to Expand AI-Powered Soft-Skills Coaching Globally

  • Tallinn-based Vocal Image transforms voice training into AI-driven soft-skills coaching for global users
  • Founders leverage expertise in voice, engineering, and business to scale personalized coaching
  • App adapts to accents, emotional state, and professional contexts for individualized improvement
  • Following USD 3.6M Seed investment, Vocal Image’s plans include expanding the development team, adding languages, and preparing for Series A

This September, Estonian AI-driven voice-training app provider Vocal Image raised a USD 3.6M Seed round. Paris-based fund EduCapital (an investor in Preply, among others) led the round, joined by Specialist VC (an investor in Flowstep, among others) and Generations Fund.

From YouTube Beginnings to AI-Powered Voice Coaching

Nick Lahoika, Co-Founder and CEO at Vocal Image

Vocal Image is an app built around proprietary AI that develops personalized voice-training programs and tracks each user’s progress. The project began in 2018 as a modest Russian-language YouTube channel offering short vocal exercises. As the audience rapidly expanded, co-founders Nick Lahoika (CEO), Mikalai Karaliou (CTO), and Maryna Shukiurava (Voice Guru) turned it into the largest channel in its niche.

The growing community soon began requesting customized advice that went far beyond what the YouTube format could provide. It became evident that the team needed an intelligent system capable of delivering individualized guidance at scale. This insight led to the creation of the Vocal Image app, launched in early 2021.

Maryna Shukiurava, a seasoned vocal coach and performer, brings deep expertise to the platform. Russian-speaking audiences may know her voice from Warner Brothers productions, her band Shuma, or her work training Sviatlana Tsikhanouskaya’s team. Paired with Mikalai Karaliou’s two decades of full-stack engineering experience and Nick Lahoika’s business background, the founders combined their strengths to transform a simple YouTube initiative into a comprehensive AI-powered vocal training solution.

Expanding into Full Soft-Skills Coaching

Vocal Image spent the next few years growing from a simple voice-training tool into a fully fledged soft-skills coaching system. The team realized that improving the voice alone wasn’t enough: what people actually needed was help with the entire spectrum of communication, including clarity, rhythm, posture, emotional tone, and confidence. Expanding in this direction was a natural step.

‘The market pushed us there as well. Soft-skills training is a USD 39B industry, but most AI efforts still focus on transcription or synthetic voices. We chose a different path. We are using AI to understand how people actually sound, and how they can sound better. Coaching is about repetition and feedback, and technology allows both to be consistent. It remembers where you struggle, nudges you when you need it, and tracks progress long after a traditional course would have been forgotten,’ Mr Lahoika shares.

Operationally, Vocal Image has also matured significantly as a company. It has won a number of notable startup competitions, including the Meta x Hugging Face European AI Startup Program and the AWS AI Challenge. The team now includes people who previously built products at Bolt, ClassPass, Joom, Manychat, Palta, PandaDoc, Prisma Labs, and Wargaming.

Comprehensive Communication Coaching with a Focus on Voice-Centered Soft Skills

Géraldine Fillet, Associate at EduCapital

Today, the Vocal Image app goes beyond improving pitch, tone, or articulation and helps users develop broader communication skills essential in professional and social contexts. These include clarity, confidence, pacing, emotional expressiveness, and structured speaking. The system combines AI-driven analysis with personalized exercises and progress tracking, so users receive individualized guidance that adapts to their performance over time. With Ms Shukiurava’s professional vocal coaching expertise behind it, the app can coach subtleties of voice delivery, presence, and persuasive communication, which largely bears out Mr Lahoika’s description of the product.

In practice, this means a focus on voice-centered soft skills, such as articulation, pacing, confidence, emphasis, and expressiveness, which are measurable and trainable through AI. Vocal Image thus bridges traditional voice training and communication coaching, offering a scalable way to improve presentation and speaking skills with voice as the medium.

AI-Enabled Personalization

Vocal Image’s method tailors coaching closely to the individual, taking into account nuances such as speech impediments, regional accents, emotional state, and anxiety levels. There are also specialized tracks (e.g. legal, medical, performing arts) for professions that require different styles of speaking.

According to the team, this is one of the core strengths of AI coaching. It allows users to create the exact situations they need to practice, instead of being limited to generic templates or the curriculum of a traditional course. While it is possible to find a live coach online without much difficulty, finding a single person who can cover professional context, accent work, and body language at the same time remains unrealistic. Each area usually requires a different specialist.

‘What we are building is a different way to work on soft skills, a system that lets you train multiple dimensions of communication in one place, all adapted to individual needs,’ Mr Lahoika states.

Building Reliable Feedback

Mr Lahoika maintains that the app’s AI feedback (on clarity, confidence, etc.) does not systematically favor certain accents, dialects, or speech patterns over others. According to Vocal Image, such bias is impossible because AI doesn’t have preferences and cannot ‘like’ one accent more than another. Instead, it recognizes how a delivery sounds at a particular moment: clearer, more confident, or less so.

‘Our model is trained on human feedback, it reflects how people perceive a delivery, not an opinion of its own. It notes what the result is this time and how it changes with practice. The accent itself is simply reported as part of the signal. To keep this objective, we rely on scale. We used the Law of Large Numbers to stabilize highly subjective qualities. With a large dataset of human ratings, individual biases cancel out, and the assessment becomes consistent across accents and speaking styles,’ Mr Lahoika explains.

The primary difficulty as Vocal Image adds more languages lies in building a reliable rating system: some users rate carefully, while others simply click through. The system must distinguish quality feedback from noise, and Vocal Image does this by giving more weight to users with consistent participation and reliable judgment and by filtering out sporadic input. In other words, a curated community of genuine soft-skills learners is essential. Because this process is continuous, the model keeps learning the nuances of each new language directly from the people who use it. This makes the coaching feel natural and relevant as the platform expands.
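The weighting idea described above can be illustrated with a minimal Python sketch. It is not Vocal Image's implementation, which the article does not describe; the names `Rating`, `rater_weights`, and `weighted_clip_scores`, the 1 to 5 score scale, the `min_ratings` threshold, and the agreement-based weight formula are all assumptions made for illustration. Raters with too few ratings get zero weight, and the rest are weighted by how closely they agree with the average rating of each clip.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    rater_id: str
    clip_id: str
    score: float  # hypothetical 1-5 rating of a recording


def rater_weights(ratings, min_ratings=3):
    """Weight raters by participation and by agreement with per-clip means."""
    by_clip = defaultdict(list)
    for r in ratings:
        by_clip[r.clip_id].append(r.score)
    clip_mean = {clip: sum(s) / len(s) for clip, s in by_clip.items()}

    # Absolute deviation of each rating from the mean rating of its clip.
    errors = defaultdict(list)
    for r in ratings:
        errors[r.rater_id].append(abs(r.score - clip_mean[r.clip_id]))

    weights = {}
    for rater, errs in errors.items():
        if len(errs) < min_ratings:
            weights[rater] = 0.0  # sporadic input is filtered out
        else:
            # Assumed formula: smaller average deviation gives larger weight.
            weights[rater] = 1.0 / (1.0 + sum(errs) / len(errs))
    return weights


def weighted_clip_scores(ratings, weights):
    """Weighted mean score per clip; clips with zero total weight are skipped."""
    num = defaultdict(float)
    den = defaultdict(float)
    for r in ratings:
        w = weights.get(r.rater_id, 0.0)
        num[r.clip_id] += w * r.score
        den[r.clip_id] += w
    return {clip: num[clip] / den[clip] for clip in num if den[clip] > 0}


# Toy data: "a" and "b" rate carefully, "c" always clicks 5, "d" rates once.
ratings = [
    Rating("a", "c1", 2), Rating("a", "c2", 4), Rating("a", "c3", 3),
    Rating("b", "c1", 2), Rating("b", "c2", 4), Rating("b", "c3", 3),
    Rating("c", "c1", 5), Rating("c", "c2", 5), Rating("c", "c3", 5),
    Rating("d", "c1", 5),
]
weights = rater_weights(ratings)
scores = weighted_clip_scores(ratings, weights)
```

In this toy data the click-through rater ends up with a lower weight than the careful raters, and the one-off rater is ignored entirely. One caveat of this kind of scheme is that it rewards agreement with the majority, so it reduces random noise but would not by itself remove a bias that most raters share.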

GDPR Compliance and Security Against Misuse

Marie-Christine Levet, Co-Founder and Partner at EduCapital

To manage consent, storage, anonymization, and potential misuse of the voice data collected from over a million unique human voices, Vocal Image follows GDPR. According to the company, data is stored only while a user stays with the platform and for six months after they leave. It is used for training only in aggregate, not as identifiable samples. The recordings are anonymous and not linked to identities (Vocal Image doesn’t even require users to give their real names), a design that prevents a specific voice from being traced back to a specific person. Users, in turn, can permanently delete their recordings through a straightforward process.

‘Any voice can be misused today. Even a few seconds from a phone call is enough to create a deepfake. What we can control is access. Our dataset is stored on secure AWS servers in Europe and is fully anonymous, so individual voices cannot be linked back to specific people. Because of this design, the data cannot be extracted or used by malicious actors,’ Mr Lahoika adds.

Investor Perspectives

EduCapital’s co-founder Marie-Christine Levet believes that AI is a tsunami for the education and training world. With more than 4 million users across 190+ countries and USD 12M in ARR today, Vocal Image, which recently joined the fund’s portfolio, is a significant part of this tsunami.

‘We are proud to back Vocal Image in its mission to democratize communication coaching through conversational AI. Their mission perfectly resonates with our investment thesis which focuses on human capital and social impact,’ EduCapital’s associate Géraldine Fillet adds.

Sebastian Wahl, Partner at Generations Fund

‘At Generations Fund, we saw Vocal Image as a true disruptor of the traditional voice and communication coaching industry. Where conventional training is costly, limited, and difficult to scale, Vocal Image leverages AI to deliver personalized, accessible, and highly effective voice training to millions globally. The team has already proven strong traction, and with its multi-language capabilities it is uniquely positioned to capture huge underserved markets. We believe their technology creates a powerful moat through data and personalization, and we are excited to back a company that can redefine how people everywhere build confidence and communicate in the digital age,’ Generations Fund’s partner Sebastian Wahl tells ITKeyMedia.

Funding to Power Further Expansion and Enhancements

The new funding enables Vocal Image to expand its development team and accelerate platform enhancements, including language support beyond English, Spanish, German, French, Ukrainian, and Russian. With these upgrades, the app can respond to the growing demand for personalized AI coaching, become more globally accessible, and deliver a more rounded user experience.

‘We’re entering an important new phase and are now meeting with investors as we prepare for our Series A. The company is scaling quickly, and this next stage is about supporting that growth,’ Mr Lahoika wraps up.

Having transformed voice training into a comprehensive AI-driven soft-skills coaching platform, Vocal Image helps users improve clarity, confidence, and overall communication across multiple languages. With its recent Seed funding of USD 3.6M from EduCapital, Specialist VC, and Generations Fund, the company is positioned to enhance the app further and reach a larger audience globally. Leveraging personalized AI feedback and a vast dataset, Vocal Image is democratizing high-quality communication coaching and positioning itself as a major player in the growing soft-skills market.
