Vertex Client

Thin wrapper around three Google SDK clients, providing a unified interface for text generation and multi-speaker audio synthesis.

Clients

Client            SDK                                   Purpose
tts_client        google.genai (regional, us-central1)  Vertex AI TTS via gemini-3.1-flash-tts-preview
genai_client      google.genai (global)                 Gemini transcript generation
anthropic_client  anthropic.AnthropicVertex (global)    Optional Claude routing

The regional/global split exists because Vertex AI TTS is only available in certain regional endpoints, while Gemini model access benefits from the global endpoint.

Model routing

generate_content() routes automatically:

  • Models prefixed with claude → anthropic_client (AnthropicVertex)
  • All other models → genai_client (Google GenAI)

This lets PodcastGeneration swap between Gemini and Claude for transcript generation by changing the model argument.
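The routing rule can be sketched as a small predicate. The function name and the string return values below are illustrative only, not part of the actual API; the real method dispatches to the client objects directly:

```python
def pick_client(model: str) -> str:
    # Mirrors generate_content()'s routing rule: model names that start
    # with "claude" go to the AnthropicVertex client; everything else
    # goes to the Google GenAI client.
    return "anthropic_client" if model.startswith("claude") else "genai_client"

print(pick_client("claude-sonnet-4"))   # anthropic_client
print(pick_client("gemini-2.0-flash"))  # genai_client
```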

Audio synthesis

synthesize_conversation() synthesizes each (speaker, text) turn individually:

  1. Looks up the voice name from speaker_map.
  2. Prepends [short pause] to all turns after the first (natural inter-speaker breathing room).
  3. Calls the TTS model with temperature=0.7 and a PrebuiltVoiceConfig.
  4. Extracts raw PCM bytes from response.candidates[0].content.parts[0].inline_data.
  5. Accumulates all PCM frames, then writes a single WAV file.

A 500 ms sleep between turns respects TTS API rate limits.

Module reference

the_curator.utils.vertex_client

VertexClient

tts_client = genai.Client(vertexai=True, project=project, location=location, http_options=(HttpOptions(api_version='v1'))) instance-attribute

genai_client = genai.Client(vertexai=True, project=project, location='global', http_options=(HttpOptions(api_version='v1'))) instance-attribute

anthropic_client = anthropic.AnthropicVertex(project_id=project, region='global') instance-attribute

__init__(project: str, location: str)

generate_content(model: str, contents: str) -> str

synthesize_conversation(transcript: list[tuple[str, str]], speaker_map: dict[str, str], filename: str = 'conversation.wav') -> str

Example usage

my_transcript = [
    ("Eli", "Curator, are the vault logs ready for review?"),
    ("System", "Affirmative. The logs have been decrypted and are "
               "currently being indexed for your terminal."),
    ("Eli", "Excellent. Proceed with the primary sequence."),
]

my_voices = {
    "Eli": "Charon",   # Deep/Human
    "System": "Puck",  # Brighter/System-like
}

Args:
    transcript: List of tuples [("Speaker1", "Text"), ("Speaker2", "Text")]
    speaker_map: Dict mapping speaker names to Vertex AI voice names,
        e.g. {"Eli": "Charon", "System": "Puck"}