Vertex Client¶
Thin wrapper around three Google SDK clients, providing a unified interface for text generation and multi-speaker audio synthesis.
Clients¶
| Client | SDK | Purpose |
|---|---|---|
| `tts_client` | `google.genai` (regional, `us-central1`) | Vertex AI TTS via `gemini-3.1-flash-tts-preview` |
| `genai_client` | `google.genai` (global) | Gemini transcript generation |
| `anthropic_client` | `anthropic.AnthropicVertex` (global) | Optional Claude routing |
The regional/global split exists because Vertex AI TTS is only available in certain regional endpoints, while Gemini model access benefits from the global endpoint.
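For orientation, a minimal construction sketch; the project ID below is a placeholder, and `us-central1` simply matches the regional TTS endpoint mentioned above:

```python
from the_curator.utils.vertex_client import VertexClient

# "my-gcp-project" is a hypothetical project ID; us-central1 matches the
# regional TTS endpoint described above.
client = VertexClient(project="my-gcp-project", location="us-central1")
```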
Model routing¶
`generate_content()` routes automatically:

- Models prefixed with `claude` → `anthropic_client` (AnthropicVertex)
- All other models → `genai_client` (Google GenAI)
This lets `PodcastGeneration` swap between Gemini and Claude for transcript generation by changing the `model` argument.
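A minimal sketch of how that dispatch could be written, assuming the standard `AnthropicVertex` Messages API and the `google.genai` `generate_content` call; the `max_tokens` value and the Claude response unpacking are assumptions, not documented behavior:

```python
def generate_content(self, model: str, contents: str) -> str:
    if model.startswith("claude"):
        # Claude-prefixed models go through AnthropicVertex.
        response = self.anthropic_client.messages.create(
            model=model,
            max_tokens=8192,  # assumed; the wrapper's actual limit is not documented
            messages=[{"role": "user", "content": contents}],
        )
        return response.content[0].text
    # Everything else goes through the Google GenAI client.
    response = self.genai_client.models.generate_content(model=model, contents=contents)
    return response.text
```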
Audio synthesis¶
`synthesize_conversation()` synthesizes each (speaker, text) turn individually:

- Looks up the voice name from `speaker_map`.
- Prepends `[short pause]` to all turns after the first (natural inter-speaker breathing room).
- Calls the TTS model with `temperature=0.7` and a `PrebuiltVoiceConfig`.
- Extracts raw PCM bytes from `response.candidates[0].content.parts[0].inline_data`.
- Accumulates all PCM frames, then writes a single WAV file.
A 500 ms sleep between turns respects TTS API rate limits.
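A condensed sketch of that loop, using the `google.genai` speech-config types; the WAV parameters (mono, 16-bit, 24 kHz) are assumptions about the raw PCM format, not values documented above:

```python
import time
import wave

from google.genai import types

def synthesize_conversation(self, transcript, speaker_map, filename="conversation.wav"):
    pcm_frames = b""
    for i, (speaker, text) in enumerate(transcript):
        voice = speaker_map[speaker]  # voice lookup
        if i > 0:
            text = "[short pause] " + text  # breathing room between speakers
        response = self.tts_client.models.generate_content(
            model="gemini-3.1-flash-tts-preview",
            contents=text,
            config=types.GenerateContentConfig(
                temperature=0.7,
                response_modalities=["AUDIO"],
                speech_config=types.SpeechConfig(
                    voice_config=types.VoiceConfig(
                        prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name=voice)
                    )
                ),
            ),
        )
        pcm_frames += response.candidates[0].content.parts[0].inline_data.data
        time.sleep(0.5)  # respect TTS API rate limits
    with wave.open(filename, "wb") as wav:
        wav.setnchannels(1)      # mono (assumed)
        wav.setsampwidth(2)      # 16-bit samples (assumed)
        wav.setframerate(24000)  # 24 kHz (assumed)
        wav.writeframes(pcm_frames)
    return filename
```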
Module reference¶
the_curator.utils.vertex_client¶

VertexClient¶

`tts_client = genai.Client(vertexai=True, project=project, location=location, http_options=HttpOptions(api_version='v1'))` instance-attribute¶

`genai_client = genai.Client(vertexai=True, project=project, location='global', http_options=HttpOptions(api_version='v1'))` instance-attribute¶

`anthropic_client = anthropic.AnthropicVertex(project_id=project, region='global')` instance-attribute¶

`__init__(project: str, location: str)`¶

`generate_content(model: str, contents: str) -> str`¶

`synthesize_conversation(transcript: list[tuple[str, str]], speaker_map: dict[str, str], filename: str = 'conversation.wav') -> str`¶
Example usage¶
```python
my_transcript = [
    ("Eli", "Curator, are the vault logs ready for review?"),
    ("System", "Affirmative. The logs have been decrypted and are "
               "currently being indexed for your terminal."),
    ("Eli", "Excellent. Proceed with the primary sequence."),
]

my_voices = {
    "Eli": "Charon",   # Deep/Human
    "System": "Puck",  # Brighter/System-like
}
```

Args:

- `transcript`: List of tuples, e.g. `[("Speaker1", "Text"), ("Speaker2", "Text")]`
- `speaker_map`: Dict mapping speaker names to Vertex AI voice names, e.g. `{"Eli": "Charon", "System": "Puck"}`
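Feeding those into the client from the constructor sketch above; the output filename is arbitrary:

```python
client = VertexClient(project="my-gcp-project", location="us-central1")
wav_path = client.synthesize_conversation(
    transcript=my_transcript,
    speaker_map=my_voices,
    filename="vault_briefing.wav",  # hypothetical output name
)
print(f"Conversation written to {wav_path}")
```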