ondewo/t2s/text-to-speech.proto

Services

Text2Speech

Text2Speech service provides endpoints for text-to-speech generation.

Service Methods

Synthesize

rpc Synthesize (SynthesizeRequest) returns (SynthesizeResponse)
Synthesize RPC Synthesizes the text sent in the request using the provided configuration and returns a response that includes the synthesized audio and the requested configuration.

BatchSynthesize

rpc BatchSynthesize (BatchSynthesizeRequest) returns (BatchSynthesizeResponse)
BatchSynthesize RPC Performs batch synthesis by accepting a batch of synthesis requests and returning a batch response. This can be more efficient for generating predictions on the AI model in bulk.

StreamingSynthesize

rpc StreamingSynthesize (stream StreamingSynthesizeRequest) returns (stream StreamingSynthesizeResponse)
StreamingSynthesize RPC Performs streaming synthesis by accepting a stream of input text and returning a stream of generated audio.

NormalizeText

rpc NormalizeText (NormalizeTextRequest) returns (NormalizeTextResponse)
NormalizeText RPC Normalizes a text according to the specific pipeline's normalization rules.

GetT2sPipeline

rpc GetT2sPipeline (T2sPipelineId) returns (Text2SpeechConfig)
GetT2sPipeline RPC Retrieves the configuration of the specified text-to-speech pipeline.

CreateT2sPipeline

rpc CreateT2sPipeline (Text2SpeechConfig) returns (T2sPipelineId)
CreateT2sPipeline RPC Creates a new text-to-speech pipeline with the provided configuration and returns its pipeline ID.

DeleteT2sPipeline

rpc DeleteT2sPipeline (T2sPipelineId) returns (.google.protobuf.Empty)
DeleteT2sPipeline RPC Deletes the specified text-to-speech pipeline.

UpdateT2sPipeline

rpc UpdateT2sPipeline (Text2SpeechConfig) returns (.google.protobuf.Empty)
UpdateT2sPipeline RPC Updates the specified text-to-speech pipeline with the given configuration.

ListT2sPipelines

rpc ListT2sPipelines (ListT2sPipelinesRequest) returns (ListT2sPipelinesResponse)
ListT2sPipelines RPC Retrieves a list of text-to-speech pipelines based on specific requirements.

ListT2sLanguages

rpc ListT2sLanguages (ListT2sLanguagesRequest) returns (ListT2sLanguagesResponse)
ListT2sLanguages RPC Retrieves a list of languages available based on specific configuration requirements.

ListT2sDomains

rpc ListT2sDomains (ListT2sDomainsRequest) returns (ListT2sDomainsResponse)
ListT2sDomains RPC Retrieves a list of domains available based on specific configuration requirements.

ListT2sNormalizationPipelines

rpc ListT2sNormalizationPipelines (ListT2sNormalizationPipelinesRequest) returns (ListT2sNormalizationPipelinesResponse)
ListT2sNormalizationPipelines RPC Retrieves a list of normalization pipelines based on specific requirements.

GetServiceInfo

rpc GetServiceInfo (.google.protobuf.Empty) returns (T2SGetServiceInfoResponse)
GetServiceInfo RPC Retrieves the version information of the running text-to-speech server.

GetCustomPhonemizer

rpc GetCustomPhonemizer (PhonemizerId) returns (CustomPhonemizerProto)
GetCustomPhonemizer RPC Retrieves a custom phonemizer based on the provided PhonemizerId.

CreateCustomPhonemizer

rpc CreateCustomPhonemizer (CreateCustomPhonemizerRequest) returns (PhonemizerId)
CreateCustomPhonemizer RPC Creates a custom phonemizer based on the provided CreateCustomPhonemizerRequest. Returns the PhonemizerId associated with the created custom phonemizer.

DeleteCustomPhonemizer

rpc DeleteCustomPhonemizer (PhonemizerId) returns (.google.protobuf.Empty)
DeleteCustomPhonemizer RPC Deletes a custom phonemizer based on the provided PhonemizerId. Returns an Empty response upon successful deletion.

UpdateCustomPhonemizer

rpc UpdateCustomPhonemizer (UpdateCustomPhonemizerRequest) returns (CustomPhonemizerProto)
UpdateCustomPhonemizer RPC Updates the specified custom phonemizer with the provided configuration.

ListCustomPhonemizer

rpc ListCustomPhonemizer (ListCustomPhonemizerRequest) returns (ListCustomPhonemizerResponse)
ListCustomPhonemizer RPC Retrieves a list of custom phonemizers based on specific requirements.
Messages

Apodization

Apodization message contains settings for apodization postprocessing.

FieldTypeLabelDescription
apodization_secs float

The duration of apodization in seconds.

BatchSynthesizeRequest

BatchSynthesizeRequest message is used to send a batch request for synthesis.

FieldTypeLabelDescription
batch_request SynthesizeRequest repeated

Repeated field holding individual synthesis requests that make up the batch request.
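
As a rough illustration of how a batch is assembled, the sketch below builds the batch_request list from several texts using plain Python dicts that mirror the message fields. The helper name build_batch_request and the pipeline ID value are hypothetical; a real client would use the generated protobuf classes rather than dicts.

```python
def build_batch_request(texts, t2s_pipeline_id):
    """Mirror BatchSynthesizeRequest: one SynthesizeRequest-like dict per text."""
    return {
        "batch_request": [
            # Each entry carries the same fields as SynthesizeRequest: text and config.
            {"text": text, "config": {"t2s_pipeline_id": t2s_pipeline_id}}
            for text in texts
        ]
    }


# "YOUR_PIPELINE_ID" is a placeholder, not a real pipeline.
batch = build_batch_request(["Hello, how are you?", "Goodbye."], "YOUR_PIPELINE_ID")
print(len(batch["batch_request"]))
```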

BatchSynthesizeResponse

BatchSynthesizeResponse message is used to store the responses for a batch synthesis request.

FieldTypeLabelDescription
batch_response SynthesizeResponse repeated

Repeated field holding individual synthesis responses that correspond to the input requests in the batch.

Caching

Caching message contains settings for caching.

FieldTypeLabelDescription
active bool

Flag indicating whether caching is active.

memory_cache_max_size int64

The maximum size of the memory cache.

sampling_rate int64

The sampling rate for caching.

load_cache bool

Flag indicating whether to load cache.

save_cache bool

Flag indicating whether to save cache.

cache_save_dir string

The directory path to save the cache.

CompositeInference

CompositeInference message combines text-to-mel and mel-to-audio inference settings.

FieldTypeLabelDescription
text2mel Text2Mel

Text-to-mel inference settings.

mel2audio Mel2Audio

Mel-to-audio inference settings.

CreateCustomPhonemizerRequest

CreateCustomPhonemizerRequest message represents the request for creating a custom phonemizer.

FieldTypeLabelDescription
prefix string

The prefix for the custom phonemizer ID.

maps Map repeated

Repeated field of Map messages representing word-to-phoneme mappings.

CustomPhonemizerProto

CustomPhonemizerProto message represents a custom phonemizer.

FieldTypeLabelDescription
id string

The ID of the custom phonemizer.

maps Map repeated

Repeated field of Map messages representing word-to-phoneme mappings.

GlowTTS

GlowTTS message contains settings for the GlowTTS inference.

FieldTypeLabelDescription
batch_size int64

The batch size for inference.

use_gpu bool

Flag indicating whether to use GPU for inference.

length_scale float

The length scale for inference.

noise_scale float

The noise scale for inference.

path string

The path to the GlowTTS model.

cleaners string repeated

Repeated field containing the cleaners for text normalization.

param_config_path string

The path to the parameter configuration.

GlowTTSTriton

GlowTTSTriton message contains settings for the GlowTTS Triton inference.

FieldTypeLabelDescription
batch_size int64

The batch size for inference.

length_scale float

The length scale for inference.

noise_scale float

The noise scale for inference.

cleaners string repeated

Repeated field containing the cleaners for text normalization.

max_text_length int64

The maximum text length allowed.

param_config_path string

The path to the parameter configuration.

triton_model_name string

The name of the Triton model.

triton_server_host string

The host of the Triton inference server which serves the model.

triton_server_port int64

The port of the Triton inference server which serves the model.

HiFiGan

HiFiGan message contains settings for the HiFiGan inference.

FieldTypeLabelDescription
use_gpu bool

Flag indicating whether to use GPU for inference.

batch_size int64

The batch size for inference.

config_path string

The path to the HiFiGan configuration.

model_path string

The path to the HiFiGan model.

HiFiGanTriton

HiFiGanTriton message contains settings for the HiFiGan Triton inference.

FieldTypeLabelDescription
config_path string

The path to the HiFiGan Triton configuration.

triton_model_name string

The name of the Triton model.

triton_server_host string

The host of the Triton inference server which serves the model.

triton_server_port int64

The port of the Triton inference server which serves the model.

ListCustomPhonemizerRequest

ListCustomPhonemizerRequest message represents the request for listing custom phonemizers.

FieldTypeLabelDescription
pipeline_ids string repeated

Repeated field of pipeline IDs to filter the list of custom phonemizers.

ListCustomPhonemizerResponse

ListCustomPhonemizerResponse message represents the response for listing custom phonemizers.

FieldTypeLabelDescription
phonemizers CustomPhonemizerProto repeated

Repeated field of CustomPhonemizerProto messages representing the custom phonemizers.

ListT2sDomainsRequest

Domain Request representation.

The request message for ListT2sDomains.

Filters the domains of pipelines by the attributes given in the request.

FieldTypeLabelDescription
speaker_sexes string repeated

Optional. Defines the speaker sex(es).

pipeline_owners string repeated

Optional. Defines the pipeline owner(s).

speaker_names string repeated

Optional. Defines the speaker name(s).

languages string repeated

Optional. Defines the language(s).

ListT2sDomainsResponse

Domains Response representation.

The response message for ListT2sDomains.

FieldTypeLabelDescription
domains string repeated

Required. The domain(s) that satisfy the specifications in the ListT2sDomainsRequest.

ListT2sLanguagesRequest

Language Request representation.

The request message for ListT2sLanguages.

Filters the languages of pipelines by the attributes given in the request.

FieldTypeLabelDescription
speaker_sexes string repeated

Optional. Defines the speaker sex(es).

pipeline_owners string repeated

Optional. Defines the pipeline owner(s).

speaker_names string repeated

Optional. Defines the speaker name(s).

domains string repeated

Optional. Defines the domain(s).

ListT2sLanguagesResponse

Language Response representation.

The response message for ListT2sLanguages.

FieldTypeLabelDescription
languages string repeated

Required. The language(s) that satisfy the specifications in the ListT2sLanguagesRequest.

ListT2sNormalizationPipelinesRequest

The request message for ListT2sNormalizationPipelines.

Filters pipelines by the attributes given in the request.

FieldTypeLabelDescription
language string

Optional. Define the language.

ListT2sNormalizationPipelinesResponse

Pipeline Response representation.

The response message for ListT2sNormalizationPipelines.

FieldTypeLabelDescription
t2s_normalization_pipelines string repeated

Required. List of normalization pipeline configurations returned by ListT2sNormalizationPipelines that match the specifications in the ListT2sNormalizationPipelinesRequest.

ListT2sPipelinesRequest

Pipeline Request representation.

The request message for ListT2sPipelines.

Filters pipelines by the attributes given in the request.

FieldTypeLabelDescription
languages string repeated

Optional. Defines the language(s).

speaker_sexes string repeated

Optional. Defines the speaker sex(es).

pipeline_owners string repeated

Optional. Defines the pipeline owner(s).

speaker_names string repeated

Optional. Defines the speaker name(s).

domains string repeated

Optional. Defines the domain(s).
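
The filtering this request describes could behave roughly like the sketch below. It is only an illustration under stated assumptions, not the server's implementation: it assumes that an empty filter list places no restriction, that the filters are combined with AND, and that each filter is compared against the matching T2SDescription field. The function name matches_filters and the sample values are hypothetical.

```python
# Request filter name -> T2SDescription field it is compared against (assumed mapping).
FILTER_TO_DESCRIPTION_FIELD = {
    "languages": "language",
    "speaker_sexes": "speaker_sex",
    "pipeline_owners": "pipeline_owner",
    "speaker_names": "speaker_name",
    "domains": "domain",
}


def matches_filters(description: dict, request: dict) -> bool:
    """Return True if the description passes every non-empty filter in the request."""
    for filter_name, field in FILTER_TO_DESCRIPTION_FIELD.items():
        allowed = request.get(filter_name, [])
        # Assumption: an empty or missing filter does not restrict the result.
        if allowed and description.get(field) not in allowed:
            return False
    return True


# Made-up pipeline descriptions for illustration only.
descriptions = [
    {"language": "en", "speaker_sex": "female", "domain": "general"},
    {"language": "de", "speaker_sex": "male", "domain": "general"},
]
request = {"languages": ["en"], "domains": ["general"]}
selected = [d for d in descriptions if matches_filters(d, request)]
print(selected)
```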

ListT2sPipelinesResponse

Pipeline Response representation.

The response message for ListT2sPipelines.

FieldTypeLabelDescription
pipelines Text2SpeechConfig repeated

Required. List of pipeline configurations returned by ListT2sPipelines that match the specifications in the ListT2sPipelinesRequest.

Logmnse

Logmnse message contains settings for Logmnse postprocessing.

FieldTypeLabelDescription
initial_noise int64

The initial noise value.

window_size int64

The window size.

noise_threshold float

The noise threshold.

Map

Map message represents a word-to-phoneme mapping in a custom phonemizer.

FieldTypeLabelDescription
word string

The word to be mapped.

phoneme_groups string

The phoneme groups associated with the word.

MbMelganTriton

MbMelganTriton message contains settings for the MbMelgan Triton inference.

FieldTypeLabelDescription
config_path string

The path to the MbMelgan Triton configuration.

stats_path string

The path to the MbMelgan statistics.

triton_model_name string

The name of the Triton model.

triton_server_host string

The host of the Triton inference server which serves the model.

triton_server_port int64

The port of the Triton inference server which serves the model.

Mel2Audio

Mel2Audio message contains settings for mel-to-audio inference.

FieldTypeLabelDescription
type string

The type of mel-to-audio inference.

mb_melgan_triton MbMelganTriton

MbMelgan Triton inference settings.

hifi_gan HiFiGan

HiFiGan inference settings.

hifi_gan_triton HiFiGanTriton

HiFiGan Triton inference settings.

NormalizeTextRequest

NormalizeTextRequest message is used to request text normalization.

FieldTypeLabelDescription
t2s_pipeline_id string

The ID of the text-to-speech pipeline.

text string

The text to be normalized.

NormalizeTextResponse

NormalizeTextResponse message is used to store the normalized text response.

FieldTypeLabelDescription
normalized_text string

The normalized text.

PhonemizerId

PhonemizerId message represents the ID of a phonemizer.

FieldTypeLabelDescription
id string

The ID of the phonemizer.

Postprocessing

Postprocessing message contains settings for postprocessing.

FieldTypeLabelDescription
silence_secs float

The duration of silence in seconds.

pipeline string repeated

Repeated field containing pipeline names.

logmmse Logmnse

Logmnse postprocessing settings.

wiener Wiener

Wiener postprocessing settings.

apodization Apodization

Apodization postprocessing settings.

RequestConfig

Represents a Configuration for the text to speech conversion.

FieldTypeLabelDescription
t2s_pipeline_id string

Required. Represents the pipeline id of the model configuration that will be used.

length_scale float

Optional. This parameter is used for time stretching, which is the process of changing the speed or duration of audio. It should be much more than 1.0. 0 is not a valid number for this variable. The default value is 1.

noise_scale float

Optional. Defines the noise in the generated audio. It should be between 0.0 and 1.0. The default value is 0.0.

sample_rate int32

Optional. Defines the sample rate of the generated wav file. The default value is 22050.

pcm Pcm

Optional. Defines the pulse-code modulation of the wav file. The default value is PCM_16.

audio_format AudioFormat

Optional. Defines the format of the desired audio. The default value is wav.

use_cache bool

Optional. Define if cache should be used or not. The default value is False.

t2s_service_config google.protobuf.Struct optional

Optional. t2s_service_config provides the configuration of the service, such as API keys, bearer tokens, JWTs, and other header information, as key-value pairs, e.g.,

MY_API_KEY='LKJDIFe244LKJOI'

A. For the Amazon T2S service, the following arguments should be passed:
A1. aws_access_key_id (required) Access key ID to access Amazon Web Services.
A2. aws_secret_access_key (required) Secret access key to access Amazon Web Services.
A3. region (required) Region name of the Amazon server.
Example: t2s_service_config={'aws_access_key_id': 'YOUR_AWS_ACCESS_KEY_ID', 'aws_secret_access_key': 'YOUR_AWS_SECRET_ACCESS_KEY', 'region': 'YOUR_AMAZON_SERVER_REGION_NAME'}

B. For the ElevenLabs T2S service, the following arguments should be passed:
B1. api_key (required) API key of the ElevenLabs cloud provider to access its T2S service.
Example: t2s_service_config={'api_key': 'YOUR_ELEVENLABS_API_KEY'}

C. For the Google Cloud T2S service, the following arguments should be passed:
C1. api_key (required) API key of the Google Cloud provider to access its T2S service.
C2. api_endpoint (optional) Regional API endpoint of the Google Cloud T2S service. (Defaults to 'eu-texttospeech.googleapis.com')
Example: t2s_service_config={'api_key': 'YOUR_GOOGLE_CLOUD_API_KEY', 'api_endpoint': 'YOUR_GOOGLE_CLOUD_API_ENDPOINT'}

D. For the Microsoft Azure T2S service, the following arguments should be passed:
D1. subscription_key (required) Subscription key to access the Microsoft Azure service.
D2. region (required) Region name of the Microsoft Azure server.
Example: t2s_service_config={'subscription_key': 'YOUR_MICROSOFT_AZURE_SUBSCRIPTION_KEY', 'region': 'YOUR_MICROSOFT_AZURE_SERVER_REGION_NAME'}

Note: ondewo-t2s will raise an error if any of the required arguments above is missing.

t2s_cloud_provider_config T2sCloudProviderConfig optional

Optional. Defines the cloud provider's specific configuration for using text-to-speech cloud services. The default value is None.

t2s_normalization T2SNormalization

Optional. Defines t2s_normalization config parameters for this specific request. The default values are set in the config file; values set via RequestConfig apply only to this request and do not update the pipeline.

word_to_phoneme_mapping google.protobuf.Struct optional

Optional. Defines a dict which specifies the phonemes for special words.
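
Since the service raises an error when a required t2s_service_config argument is missing, a client may want to check the keys before sending a request. The sketch below is one possible client-side check using plain Python dicts. The provider labels ("amazon", "elevenlabs", "google", "microsoft") and the function name missing_service_config_keys are hypothetical and are not part of the API; only the key names come from the field description above.

```python
# Required t2s_service_config keys per provider, as listed in the field description.
# The provider labels used as dict keys here are illustrative only.
REQUIRED_SERVICE_CONFIG_KEYS = {
    "amazon": ("aws_access_key_id", "aws_secret_access_key", "region"),
    "elevenlabs": ("api_key",),
    "google": ("api_key",),  # api_endpoint is optional
    "microsoft": ("subscription_key", "region"),
}


def missing_service_config_keys(provider: str, service_config: dict) -> list:
    """Return the required keys that are absent or empty for the given provider."""
    required = REQUIRED_SERVICE_CONFIG_KEYS[provider]
    return [key for key in required if not service_config.get(key)]


google_config = {"api_key": "YOUR_GOOGLE_CLOUD_API_KEY"}
azure_config = {"subscription_key": "YOUR_MICROSOFT_AZURE_SUBSCRIPTION_KEY"}
print(missing_service_config_keys("google", google_config))
print(missing_service_config_keys("microsoft", azure_config))
```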

SingleInference

SingleInference message contains inference settings of text2audio models.

FieldTypeLabelDescription
text2audio Text2Audio

Text-to-audio inference settings.

StreamingSynthesizeRequest

StreamingSynthesizeRequest message is used to perform streaming synthesis.

FieldTypeLabelDescription
text string

Required. Represents the text that will be transformed to speech. All properties of the input text described in SynthesizeRequest also apply here.

config RequestConfig

Required. Represents the specifications needed to do the text to speech transformation.

StreamingSynthesizeResponse

Represents a Streaming Synthesize Response.

A Streaming Synthesize Response contains the generated audio, requested text and all other properties of this generated audio.

FieldTypeLabelDescription
audio_uuid string

Required. Represents the unique identifier (UUID) of the generated audio.

audio bytes

Required. Generated file with the parameters described in request.

generation_time float

Required. Time to generate audio.

audio_length float

Required. Audio length.

text string

Required. Text from which audio was generated.

config RequestConfig

Required. Configuration from which audio was generated.

normalized_text string

Optional. Normalized text.

sample_rate float

Optional. Sampling rate of the generated audio.

SynthesizeRequest

Represents a Synthesize Request.

A Synthesize Request contains the information needed to perform a text-to-speech conversion.

FieldTypeLabelDescription
text string

Required. Represents the text that will be transformed to speech.

Synthesize text:

- Simple text:
Hello, how are you?

Examples to modulate the voice based on SSML tags and Arpabet phonemes:

- SSML Tag Phone:
<say-as interpret-as="phone">+12354321</say-as>
- SSML Tag Email:
<say-as interpret-as="email">voices@ondewo.com</say-as>
- SSML Tag URL:
<say-as interpret-as="url">ondewo.com/en/</say-as>
- SSML Tag Spell:
<say-as interpret-as="spell">AP732</say-as>
- SSML Tag Spell With Names:
<say-as interpret-as="spell-with-names">AHO32</say-as>
- SSML Tag Callsigns Short:
<say-as interpret-as="callsign-short">AUA439</say-as>
- SSML Tag Callsigns Long:
<say-as interpret-as="callsign-long">AAL439</say-as>
- SSML Tag Break Tag:
I am going to take a 2 seconds break <break time="2s"/> done
- Arpabet Phonemes:
Hello I am {AE2 L EH0 G Z AE1 N D R AH0}

config RequestConfig

Required. Represents the specifications needed to do the text to speech transformation.
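
The say-as examples listed for the text field can be generated with a small helper such as the one sketched below. The function name say_as is hypothetical, and escaping the tag content as XML is an assumption about what the service accepts rather than documented behavior. The request is shown as a plain dict that mirrors SynthesizeRequest; the pipeline ID is a placeholder.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(interpret_as: str, content: str) -> str:
    """Wrap content in an SSML say-as tag with the given interpret-as value."""
    # quoteattr adds the surrounding quotes; escape protects <, > and & in the content.
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(content)}</say-as>"


text = "Please call " + say_as("phone", "+12354321")
# Plain-dict stand-in for SynthesizeRequest (text plus config).
request = {"text": text, "config": {"t2s_pipeline_id": "YOUR_PIPELINE_ID"}}
print(request["text"])
```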

SynthesizeResponse

Represents a Synthesize Response.

A Synthesize Response contains the generated audio, requested text and all other properties of this generated audio.

FieldTypeLabelDescription
audio_uuid string

Required. Represents the unique identifier (UUID) of the generated audio.

audio bytes

Required. Generated file with the parameters described in request.

generation_time float

Required. Time to generate audio.

audio_length float

Required. Audio length.

text string

Required. Text from which audio was generated.

config RequestConfig

Required. Configuration from which audio was generated.

normalized_text string

Optional. Normalized text.

sample_rate float

Optional. Sampling rate of the generated audio.

T2SCustomLengthScales

T2SCustomLengthScales message contains custom length scales for text types.

FieldTypeLabelDescription
text float

The custom length scale for general text.

email float

The custom length scale for email text.

url float

The custom length scale for URL text.

phone float

The custom length scale for phone number text.

spell float

The custom length scale for spelled-out text.

spell_with_names float

The custom length scale for spelled-out text with names.

callsign_long float

The custom length scale for long callsigns.

callsign_short float

The custom length scale for short callsigns.

T2SDescription

T2SDescription message is used to describe the text-to-speech service.

FieldTypeLabelDescription
language string

The language supported by the service.

speaker_sex string

The sex of the speaker.

pipeline_owner string

The owner of the text-to-speech pipeline.

comments string

Additional comments or notes.

speaker_name string

The name of the speaker.

domain string

The domain or context of the service.

T2SGetServiceInfoResponse

Version information of the service.

FieldTypeLabelDescription
version string

The version number.

T2SInference

T2SInference message is used to specify the text-to-speech inference settings.

FieldTypeLabelDescription
type string

The type of inference.

composite_inference CompositeInference

Composite inference settings.

single_inference SingleInference

Single inference settings.

caching Caching

Caching settings.

T2SNormalization

Represents the configuration for text-to-speech normalization.

FieldTypeLabelDescription
language string

The language for which the normalization is applied.

pipeline string repeated

The pipeline(s) used for normalization.

custom_phonemizer_id string

The ID of the custom phonemizer, if used.

custom_length_scales T2SCustomLengthScales

Custom length scales for different text types.

arpabet_mapping string

The mapping for Arpabet phonemes.

numeric_mapping string

The mapping for numeric expressions.

callsigns_mapping string

The mapping for callsigns.

phoneme_correction_mapping string

The mapping for phoneme correction.

T2sCloudProviderConfig

Configuration for cloud provider settings for Text-to-Speech (T2S).

FieldTypeLabelDescription
t2s_cloud_provider_config_elevenlabs T2sCloudProviderConfigElevenLabs

Configuration for Eleven Labs text-to-speech provider.

t2s_cloud_provider_config_google T2sCloudProviderConfigGoogle

Configuration for Google text-to-speech provider.

t2s_cloud_provider_config_microsoft T2sCloudProviderConfigMicrosoft

Configuration for Microsoft text-to-speech provider.

T2sCloudProviderConfigElevenLabs

Configuration details specific to the Eleven Labs text-to-speech provider.

FieldTypeLabelDescription
stability float

Stability level for inference, influencing consistency of generated speech. It is in the range [0.0, 1.0].

similarity_boost float

Boost value for similarity to enhance the similarity of the generated voice to a target voice. It is in the range [0.0, 1.0].

style float

Style parameter to control the expression or emotion in speech. It is in the range [0.0, 1.0].

use_speaker_boost bool

Enables or disables speaker boost for emphasis on clarity and loudness.

apply_text_normalization string

Specifies type of text normalization to apply during processing. Available options are 'auto', 'on', and 'off'.

T2sCloudProviderConfigGoogle

Configuration details specific to the Google text-to-speech provider.

FieldTypeLabelDescription
speaking_rate float

Speaking rate for inference, controlling the speed of generated speech. It is in the range [0.25, 4.0].

volume_gain_db float

Volume gain in dB applied to the generated speech. It is in the range [-96.0, 16.0].

pitch float

Pitch adjustment for inference, allowing control over voice pitch. It is in the range [-20.0, 20.0].
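
The ranges documented for T2sCloudProviderConfigGoogle can be checked before a request is sent. The sketch below does this on a plain dict; the function name out_of_range_fields is hypothetical, and whether the service itself rejects or clamps out-of-range values is not stated in this reference.

```python
# Inclusive ranges taken from the field descriptions of T2sCloudProviderConfigGoogle.
GOOGLE_RANGES = {
    "speaking_rate": (0.25, 4.0),
    "volume_gain_db": (-96.0, 16.0),
    "pitch": (-20.0, 20.0),
}


def out_of_range_fields(config: dict) -> dict:
    """Map each out-of-range field to its allowed (low, high) range."""
    problems = {}
    for field, (low, high) in GOOGLE_RANGES.items():
        # Fields that are not set in the config are skipped.
        if field in config and not (low <= config[field] <= high):
            problems[field] = (low, high)
    return problems


problems = out_of_range_fields({"speaking_rate": 1.0, "pitch": 25.0})
print(problems)
```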

T2sCloudProviderConfigMicrosoft

Configuration details specific to the Microsoft text-to-speech provider.

FieldTypeLabelDescription
use_default_speaker bool

Determines whether to use the default speaker voice.

T2sCloudServiceAmazon

T2sCloudServiceAmazon message contains settings for the Amazon Cloud service inference.

FieldTypeLabelDescription
voice_id string

Voice ID indicating the speaker.

model_id string

Model ID for the inference server.

T2sCloudServiceElevenLabs

T2sCloudServiceElevenLabs message contains settings for the ElevenLabs Cloud service inference.

FieldTypeLabelDescription
language_code string

Language of the generated audio. It should be a 4-letter language code.

model_id string

Model ID indicating the name of the model.

voice_id string

Voice ID indicating the speaker.

voice_settings VoiceSettings

Voice settings of the inference.

apply_text_normalization string

Flag indicating whether to apply text normalization.

T2sCloudServiceGoogle

T2sCloudServiceGoogle message contains settings for the Google Cloud service inference.

FieldTypeLabelDescription
voice_id string

Voice ID indicating the speaker.

speaking_rate float

Speaking rate to control the speed of the audio.

volume_gain_db float

Volume gain in dB to control the volume of the audio.

pitch float

Pitch value of the audio.

T2sCloudServiceMicrosoft

T2sCloudServiceMicrosoft message contains settings for the Microsoft Cloud service inference.

FieldTypeLabelDescription
voice_id string

Voice ID indicating the speaker.

use_default_speaker bool

Flag to indicate using the default speaker.

T2sPipelineId

Pipeline Id representation.

Used when creating, deleting and getting pipelines.

FieldTypeLabelDescription
id string

Required. Defines the id of the pipeline.

Text2Audio

Text2Audio message contains settings for text-to-audio inference.

FieldTypeLabelDescription
type string

The type of text-to-audio inference.

vits Vits

Vits inference settings.

vits_triton VitsTriton

Vits Triton inference settings.

t2s_cloud_service_elevenlabs T2sCloudServiceElevenLabs

ElevenLabs cloud service inference settings.

t2s_cloud_service_amazon T2sCloudServiceAmazon

Amazon cloud service inference settings.

t2s_cloud_service_google T2sCloudServiceGoogle

Google cloud service inference settings.

t2s_cloud_service_microsoft T2sCloudServiceMicrosoft

Microsoft cloud service inference settings.

Text2Mel

Text2Mel message contains settings for text-to-mel inference.

FieldTypeLabelDescription
type string

The type of text-to-mel inference.

glow_tts GlowTTS

GlowTTS inference settings.

glow_tts_triton GlowTTSTriton

GlowTTS Triton inference settings.

Text2SpeechConfig

Configuration of text-to-speech models representation.

FieldTypeLabelDescription
id string

Required. Defines the id of the pipeline.

description T2SDescription

Required. Defines the description of the pipeline representation.

active bool

Required. Defines if the pipeline is active or inactive.

inference T2SInference

Required. Defines the inference of the pipeline representation.

normalization T2SNormalization

Required. Defines the normalization process of the pipeline representation.

postprocessing Postprocessing

Required. Defines the postprocessing process of the pipeline representation.

UpdateCustomPhonemizerRequest

UpdateCustomPhonemizerRequest message represents the request for updating a custom phonemizer.

FieldTypeLabelDescription
id string

The ID of the custom phonemizer to be updated.

update_method UpdateCustomPhonemizerRequest.UpdateMethod

The update method.

maps Map repeated

Repeated field of Map messages representing word-to-phoneme mappings.

Vits

Vits message contains settings for the Vits inference.

FieldTypeLabelDescription
batch_size int64

The batch size for inference.

use_gpu bool

Flag indicating whether to use GPU for inference.

length_scale float

The length scale for inference.

noise_scale float

The noise scale for inference.

path string

The path to the Vits model.

cleaners string repeated

Repeated field containing the cleaners for text normalization.

param_config_path string

The path to the parameter configuration.

VitsTriton

VitsTriton message contains settings for the Vits Triton inference.

FieldTypeLabelDescription
batch_size int64

The batch size for inference.

length_scale float

The length scale for inference.

noise_scale float

The noise scale for inference.

cleaners string repeated

Repeated field containing the cleaners for text normalization.

max_text_length int64

The maximum text length allowed.

param_config_path string

The path to the parameter configuration.

triton_model_name string

The name of the Triton model.

triton_server_host string

The host of the Triton inference server which serves the model.

triton_server_port int64

The port of the Triton inference server which serves the model.

VoiceSettings

VoiceSettings message contains settings for ElevenLabs inference.

FieldTypeLabelDescription
stability float

Stability value for ElevenLabs inference.

similarity_boost float

Similarity boost value for ElevenLabs inference.

style float

Style boost value for ElevenLabs inference.

use_speaker_boost bool

Flag indicating whether to use speaker boost.

Wiener

Wiener message contains settings for Wiener postprocessing.

FieldTypeLabelDescription
frame_len int64

The frame length.

lpc_order int64

The LPC order.

iterations int64

The number of iterations.

alpha float

The alpha value.

thresh float

The threshold value.

Enums

AudioFormat

AudioFormat enum represents various audio file formats for storing digital audio data.

NameNumberDescription
wav 0

Waveform Audio File Format (WAV)

flac 1

Free Lossless Audio Codec (FLAC)

caf 2

Core Audio Format (CAF)

mp3 3

MPEG Audio Layer III (MP3)

aac 4

Advanced Audio Coding (AAC)

ogg 5

Ogg Vorbis (OGG)

wma 6

Windows Media Audio (WMA)

Pcm

Represents a pulse-code modulation technique.

NameNumberDescription
PCM_16 0

16-bit pulse-code modulation.

PCM_24 1

24-bit pulse-code modulation.

PCM_32 2

32-bit pulse-code modulation.

PCM_S8 3

Signed 8-bit pulse-code modulation.

PCM_U8 4

Unsigned 8-bit pulse-code modulation.

FLOAT 5

Floating-point (32-bit) pulse-code modulation.

DOUBLE 6

Floating-point (64-bit) pulse-code modulation.
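
To give a feel for what the Pcm choice means in practice, the sketch below estimates the size of the raw sample data for a given duration. The bit widths come from the enum descriptions above; the function name raw_audio_bytes is hypothetical, and the estimate assumes mono audio and ignores any container header, neither of which is stated in this reference.

```python
# Bits per sample for each Pcm value, taken from the enum descriptions.
PCM_BITS = {
    "PCM_16": 16,
    "PCM_24": 24,
    "PCM_32": 32,
    "PCM_S8": 8,
    "PCM_U8": 8,
    "FLOAT": 32,
    "DOUBLE": 64,
}


def raw_audio_bytes(pcm: str, sample_rate: int, seconds: float, channels: int = 1) -> int:
    """Estimate the raw sample data size in bytes (no file header, mono by default)."""
    samples = int(sample_rate * seconds * channels)
    return samples * PCM_BITS[pcm] // 8


# 22050 Hz and PCM_16 are the documented RequestConfig defaults.
print(raw_audio_bytes("PCM_16", 22050, 2.0))
print(raw_audio_bytes("PCM_24", 22050, 2.0))
```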

UpdateCustomPhonemizerRequest.UpdateMethod

The update method to be used.

NameNumberDescription
extend_hard 0

Add new words, replacing existing ones.

extend_soft 1

Add new words if they are not already present.

replace 2

Replace all words in the phonemizer with new ones.
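
The three update methods can be pictured as dictionary operations on the word-to-phoneme mappings. The sketch below is an interpretation of the descriptions above, not the server's code: it models a phonemizer as a plain dict from word to phoneme string, and the function name apply_update and the sample phoneme strings are placeholders.

```python
def apply_update(existing: dict, new_maps: dict, method: str) -> dict:
    """Return the word-to-phoneme mapping after applying one update method."""
    if method == "extend_hard":
        # Add new words; entries for words that already exist are overwritten.
        return {**existing, **new_maps}
    if method == "extend_soft":
        # Add new words only if they are not already present.
        return {**new_maps, **existing}
    if method == "replace":
        # Discard all existing words and keep only the new ones.
        return dict(new_maps)
    raise ValueError(f"unknown update method: {method}")


# Placeholder phoneme strings for illustration only.
existing = {"ondewo": "OLD_PHONEMES"}
new_maps = {"ondewo": "NEW_PHONEMES", "t2s": "T2S_PHONEMES"}
for method in ("extend_hard", "extend_soft", "replace"):
    print(method, apply_update(existing, new_maps, method))
```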

Scalar Value Types

.proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby
double |  | double | double | float | float64 | double | float | Float
float |  | float | float | float | float32 | float | float | Float
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required)
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum
uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required)
uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required)
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required)
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required)
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum
sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required)
sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum
bool |  | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8)
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT)