Google Cloud has added Lyria, its text-to-music model, to the Vertex AI platform. According to Google, this makes Vertex AI the first platform to offer generative AI models across video, image, speech, and music. The move signals Google’s intent to give businesses a complete AI-powered creative production pipeline.
Along with Lyria, Google released updates to its existing media models. Veo 2 gets new editing features, Chirp 3 adds voice cloning capabilities, and Imagen 3 improves image manipulation. These updates transform Vertex AI from a collection of standalone generative tools into an end-to-end media creation platform.
The Complete Media Creation Pipeline
With these additions, businesses can now create a full media asset starting with just a text prompt. A marketing team could generate an image with Imagen 3, expand it into a video with Veo 2, add narration with Chirp 3, and finish with background music from Lyria, all within the same platform.
This unified approach targets a key problem for enterprise teams: the need to piece together separate tools for different media types. Previously, companies might use one vendor for image generation, another for video, and purchase music through stock libraries. Google’s strategy aims to remove this fragmentation.
Taken together, the release is pitched as a comprehensive media creation suite. Keeping visual and brand consistency across media types within one platform could change how marketing operations function, with greater efficiency and cohesion across campaigns.
Lyria: AI Music Generation Enters the Enterprise
Lyria marks Google’s entry into AI music generation for business use. Unlike consumer-focused music AI tools, Lyria is built for enterprise applications in marketing, content creation, and brand experiences.
The model can produce music across genres based on text prompts that specify instruments, tempo, mood, and style. For example, a prompt for “a high-octane bebop tune with dizzying saxophone and trumpet solos” produces jazz music with those specific elements.
For businesses, this offers two main benefits. First, companies can create custom soundtracks aligned with their brand identity without licensing fees. Second, video teams can generate background music tailored to the tone and pacing of their content.
This capability fills a major gap in enterprise content creation. While stock music libraries offer convenience, they lack customization. Custom music production, on the other hand, is expensive and time-consuming. Lyria sits in the middle, offering customization at scale.
Veo 2: From Generator to Video Studio
Google has transformed Veo 2 from a text-to-video generator into what it calls a “comprehensive video creation and editing platform.” New features include inpainting for removing unwanted elements from videos, outpainting for extending video frames to different aspect ratios, camera controls for directing shot composition and creating effects like timelapses, and interpolation for connecting two video clips with AI-generated transitions.
These features address practical production challenges that businesses face. For example, the ability to convert landscape videos to portrait format solves a common problem for social media teams who need to repurpose content across platforms.
L’Oreal Groupe reports that implementing Veo has changed its creative process, allowing it to expand video production across 20 additional countries and languages. Similarly, Kraft Heinz says creative workflows that once took eight weeks now take eight hours.
These case studies suggest that AI video tools are moving beyond novelty to deliver measurable business value, particularly for global enterprises that need to scale content production across markets.
Chirp 3: Voice Cloning and Speaker Identification
Google’s updates to Chirp 3 focus on two key features: Instant Custom Voice and Transcription with Diarization.
Instant Custom Voice can create a synthetic voice from just 10 seconds of audio input. This allows companies to create consistent voice experiences across customer touchpoints without requiring voice actors to record every piece of content.
For call centers, this means creating personalized automated responses that maintain brand voice consistency. For content creators, it enables efficient localization of audio content across languages.
The Transcription with Diarization feature separates and identifies individual speakers in recordings. This addresses the needs of businesses that handle large volumes of audio from meetings, interviews, or customer calls.
Compared with transcription tools that do not label speakers, Chirp 3’s speaker identification reduces the manual work of tagging speakers in transcripts. It is a modest change, but a useful one for teams that process large amounts of audio.
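To make the workflow benefit concrete, here is a minimal Python sketch of what a team might do with diarized output: merge consecutive segments from the same speaker into readable turns. The Segment shape and the function names are hypothetical and chosen for illustration; they are not Chirp 3’s actual response format, which should be checked against Google’s documentation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape for one diarized chunk; not Chirp 3's real schema.
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


def merge_turns(segments):
    """Merge consecutive segments from the same speaker into single turns."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Same speaker kept talking: extend the current turn.
            turns[-1] = Segment(
                last.speaker,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


def format_transcript(turns):
    """Render turns as one 'timestamp speaker: text' line each."""
    return "\n".join(f"[{t.start:.1f}s] {t.speaker}: {t.text}" for t in turns)
```

With speaker labels already attached to each segment, producing a readable transcript becomes a formatting step rather than a manual tagging job.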
Imagen 3: Enhanced Image Editing
Google’s improvements to Imagen 3 focus on editing capabilities rather than generation. The model now offers better inpainting for removing objects and reconstructing missing parts of images.
This shift toward editing tools suggests that businesses are moving beyond basic image generation to more sophisticated workflows that combine AI generation with precise human direction.
For e-commerce teams, this means faster product image editing. For marketing departments, it enables quick adaptation of visual assets across campaigns without starting from scratch.
Enterprise Safeguards and Responsible AI
Google has built several safeguards into these media models. Digital watermarking through SynthID embeds invisible watermarks in all media created by Imagen, Veo, and Lyria. Safety filters provide built-in protections against harmful content creation. Data governance ensures customer data is not used for model training. Copyright indemnity offers protection for businesses against third-party IP claims.
The indemnification policy stands out as particularly important for enterprise adoption. Many businesses have hesitated to use generative AI due to copyright concerns. Google’s willingness to absorb this legal risk removes a significant barrier to adoption.
Strategic Market Positioning
Google’s release positions Vertex AI against Amazon Bedrock, which offers similar generative AI capabilities. However, Google’s emphasis on a complete media creation suite sets it apart in the enterprise AI market.
By focusing on enterprise media creation rather than general-purpose AI, Google has carved out a specific use case where Vertex AI excels. This strategic focus on creative workflows addresses specific pain points for marketing, content, and brand teams.
Integration Considerations for Technical Teams
For technical teams considering implementation, several factors should guide decision-making. Consider how these models fit into existing content creation processes and what APIs or connectors are needed. Examine Google’s usage-based pricing model (for example, Veo 2 costs around 50 cents per second of generated video).
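Because pricing is per second of output, a rough budget can be worked out before any integration work starts. The Python sketch below is an illustration only: it assumes a flat rate of 50 cents per second (the approximate Veo 2 figure cited above) and a hypothetical takes_per_clip parameter to account for regenerating clips that miss the brief. Actual rates and billing rules should be confirmed against Google’s current pricing page.

```python
from decimal import Decimal

# Assumption: flat per-second rate, taken from the approximate figure in this article.
VEO2_USD_PER_SECOND = Decimal("0.50")


def estimate_video_cost(clip_seconds, clips, takes_per_clip=1,
                        rate=VEO2_USD_PER_SECOND):
    """Estimate generation cost in USD for a batch of clips.

    takes_per_clip is a hypothetical planning factor: how many generations
    are expected, on average, before one clip is accepted.
    """
    if clip_seconds < 0 or clips < 0 or takes_per_clip < 1:
        raise ValueError("durations and counts must be non-negative, takes >= 1")
    total_seconds = Decimal(str(clip_seconds)) * clips * takes_per_clip
    return total_seconds * rate


if __name__ == "__main__":
    # 20 eight-second clips, three takes each: 480 generated seconds.
    print(estimate_video_cost(8, 20, takes_per_clip=3))
```

Under these assumptions, 20 eight-second clips at three takes each come to 480 generated seconds, or 240 dollars. The retry factor often matters more than the headline rate, since rejected generations are still generated seconds.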
Technical teams should also implement review processes for AI-generated content to maintain brand standards. Assess whether your team needs prompt engineering expertise to effectively use these models. Finally, evaluate if your current cloud infrastructure supports these AI workloads.
Looking Ahead: The Future of AI Media Creation
The introduction of comprehensive media creation tools on Vertex AI signals a shift toward more integrated approaches to content production. As these tools mature, we can expect several developments.
Pre-built prompts optimized for specific business sectors will likely emerge. We’ll see greater emphasis on AI as an assistant rather than a replacement for human creativity. The ability to use one media type (like an image) to generate another (like music that matches its mood) will become more common. Better tools for maintaining brand identity across all generated media will also develop.
What This Means For Your Business
For businesses evaluating how to use these new AI capabilities, start by identifying specific content bottlenecks that could benefit from automation. Common use cases include social media content production, localization of marketing assets, product visualization, internal training materials, and customer support content.
Most importantly, approach implementation with clear goals rather than adopting AI for its own sake. The businesses reporting the greatest return, like L’Oreal and Kraft Heinz, are those using AI to solve specific content production challenges.
As these tools continue to evolve, the key to success will be finding the right balance between AI automation and human creativity. The most effective implementations will use AI to handle repetitive content production tasks while freeing human teams to focus on strategic creative decisions.
What steps will you take to explore how these new AI media creation tools might fit into your content production workflow?