Text2Speech service provides endpoints for text-to-speech generation.
rpc Synthesize (SynthesizeRequest) returns (SynthesizeResponse) | Synthesizes the text sent in the request using the provided configuration and returns a response containing the synthesized audio and the requested configuration.
rpc BatchSynthesize (BatchSynthesizeRequest) returns (BatchSynthesizeResponse) | Performs batch synthesis by accepting a batch of synthesis requests and returning a batch response. This can be more efficient when generating predictions from the model in bulk.
rpc StreamingSynthesize (stream StreamingSynthesizeRequest) returns (stream StreamingSynthesizeResponse) | Performs streaming synthesis by accepting a stream of input text and returning a stream of generated audio.
rpc NormalizeText (NormalizeTextRequest) returns (NormalizeTextResponse) | Normalizes a text according to the normalization rules of the specified pipeline.
rpc GetT2sPipeline (T2sPipelineId) returns (Text2SpeechConfig) | Retrieves the configuration of the specified text-to-speech pipeline.
rpc CreateT2sPipeline (Text2SpeechConfig) returns (T2sPipelineId) | Creates a new text-to-speech pipeline with the provided configuration and returns its pipeline ID.
rpc DeleteT2sPipeline (T2sPipelineId) returns (.google.protobuf.Empty) | Deletes the specified text-to-speech pipeline.
rpc UpdateT2sPipeline (Text2SpeechConfig) returns (.google.protobuf.Empty) | Updates the specified text-to-speech pipeline with the given configuration.
rpc ListT2sPipelines (ListT2sPipelinesRequest) returns (ListT2sPipelinesResponse) | Retrieves a list of text-to-speech pipelines based on specific requirements.
rpc ListT2sLanguages (ListT2sLanguagesRequest) returns (ListT2sLanguagesResponse) | Retrieves a list of available languages based on specific configuration requirements.
rpc ListT2sDomains (ListT2sDomainsRequest) returns (ListT2sDomainsResponse) | Retrieves a list of available domains based on specific configuration requirements.
rpc ListT2sNormalizationPipelines (ListT2sNormalizationPipelinesRequest) returns (ListT2sNormalizationPipelinesResponse) | Retrieves a list of normalization pipelines based on specific requirements.
rpc GetServiceInfo (.google.protobuf.Empty) returns (T2SGetServiceInfoResponse) | Retrieves the version information of the running text-to-speech server.
rpc GetCustomPhonemizer (PhonemizerId) returns (CustomPhonemizerProto) | Retrieves a custom phonemizer based on the provided PhonemizerId.
rpc CreateCustomPhonemizer (CreateCustomPhonemizerRequest) returns (PhonemizerId) | Creates a custom phonemizer based on the provided CreateCustomPhonemizerRequest. Returns the PhonemizerId associated with the created custom phonemizer.
rpc DeleteCustomPhonemizer (PhonemizerId) returns (.google.protobuf.Empty) | Deletes a custom phonemizer based on the provided PhonemizerId. Returns an Empty response upon successful deletion.
rpc UpdateCustomPhonemizer (UpdateCustomPhonemizerRequest) returns (CustomPhonemizerProto) | Updates the specified custom phonemizer with the provided configuration.
rpc ListCustomPhonemizer (ListCustomPhonemizerRequest) returns (ListCustomPhonemizerResponse) | Retrieves a list of custom phonemizers based on specific requirements.
Apodization message contains settings for apodization postprocessing.
Field | Type | Label | Description |
apodization_secs | float | The duration of apodization in seconds. |
BatchSynthesizeRequest message is used to send a batch request for synthesis.
Field | Type | Label | Description |
batch_request | SynthesizeRequest | repeated | Repeated field holding individual synthesis requests that make up the batch request. |
BatchSynthesizeResponse message is used to store the responses for a batch synthesis request.
Field | Type | Label | Description |
batch_response | SynthesizeResponse | repeated | Repeated field holding individual synthesis responses that correspond to the input requests in the batch. |
Caching message contains settings for caching.
Field | Type | Label | Description |
active | bool | Flag indicating whether caching is active. |
|
memory_cache_max_size | int64 | The maximum size of the memory cache. |
|
sampling_rate | int64 | The sampling rate for caching. |
|
load_cache | bool | Flag indicating whether to load cache. |
|
save_cache | bool | Flag indicating whether to save cache. |
|
cache_save_dir | string | The directory path to save the cache. |
CompositeInference message combines text-to-mel and mel-to-audio inference settings.
Field | Type | Label | Description |
text2mel | Text2Mel | Text-to-mel inference settings. |
|
mel2audio | Mel2Audio | Mel-to-audio inference settings. |
CreateCustomPhonemizerRequest message represents the request for creating a custom phonemizer.
Field | Type | Label | Description |
prefix | string | The prefix for the custom phonemizer ID. |
|
maps | Map | repeated | Repeated field of Map messages representing word-to-phoneme mappings. |
CustomPhonemizerProto message represents a custom phonemizer.
Field | Type | Label | Description |
id | string | The ID of the custom phonemizer. |
|
maps | Map | repeated | Repeated field of Map messages representing word-to-phoneme mappings. |
GlowTTS message contains settings for the GlowTTS inference.
Field | Type | Label | Description |
batch_size | int64 | The batch size for inference. |
|
use_gpu | bool | Flag indicating whether to use GPU for inference. |
|
length_scale | float | The length scale for inference. |
|
noise_scale | float | The noise scale for inference. |
|
path | string | The path to the GlowTTS model. |
|
cleaners | string | repeated | Repeated field containing the cleaners for text normalization. |
param_config_path | string | The path to the parameter configuration. |
GlowTTSTriton message contains settings for the GlowTTS Triton inference.
Field | Type | Label | Description |
batch_size | int64 | The batch size for inference. |
|
length_scale | float | The length scale for inference. |
|
noise_scale | float | The noise scale for inference. |
|
cleaners | string | repeated | Repeated field containing the cleaners for text normalization. |
max_text_length | int64 | The maximum text length allowed. |
|
param_config_path | string | The path to the parameter configuration. |
|
triton_model_name | string | The name of the Triton model. |
|
triton_server_host | string | The host of the Triton inference server that serves the model. |
|
triton_server_port | int64 | The port of the Triton inference server that serves the model. |
HiFiGan message contains settings for the HiFiGan inference.
Field | Type | Label | Description |
use_gpu | bool | Flag indicating whether to use GPU for inference. |
|
batch_size | int64 | The batch size for inference. |
|
config_path | string | The path to the HiFiGan configuration. |
|
model_path | string | The path to the HiFiGan model. |
HiFiGanTriton message contains settings for the HiFiGan Triton inference.
Field | Type | Label | Description |
config_path | string | The path to the HiFiGan Triton configuration. |
|
triton_model_name | string | The name of the Triton model. |
|
triton_server_host | string | The host of the Triton inference server that serves the model. |
|
triton_server_port | int64 | The port of the Triton inference server that serves the model. |
ListCustomPhonemizerRequest message represents the request for listing custom phonemizers.
Field | Type | Label | Description |
pipeline_ids | string | repeated | Repeated field of pipeline IDs to filter the list of custom phonemizers. |
ListCustomPhonemizerResponse message represents the response for listing custom phonemizers.
Field | Type | Label | Description |
phonemizers | CustomPhonemizerProto | repeated | Repeated field of CustomPhonemizerProto messages representing the custom phonemizers. |
Domain Request representation.
The request message for ListT2sDomains.
Filters the domains of pipelines by the attributes given in the request.
Field | Type | Label | Description |
speaker_sexes | string | repeated | Optional. The speaker sex(es) to filter by. |
pipeline_owners | string | repeated | Optional. The pipeline owner(s) to filter by. |
speaker_names | string | repeated | Optional. The speaker name(s) to filter by. |
languages | string | repeated | Optional. The language(s) to filter by. |
Domains Response representation.
The response message for ListT2sDomains.
Field | Type | Label | Description |
domains | string | repeated | Required. The domain(s) that satisfy the specifications in the ListT2sDomainsRequest. |
Language Request representation.
The request message for ListT2sLanguages.
Filters the languages of pipelines by the attributes given in the request.
Field | Type | Label | Description |
speaker_sexes | string | repeated | Optional. The speaker sex(es) to filter by. |
pipeline_owners | string | repeated | Optional. The pipeline owner(s) to filter by. |
speaker_names | string | repeated | Optional. The speaker name(s) to filter by. |
domains | string | repeated | Optional. The domain(s) to filter by. |
Language Response representation.
The response message for ListT2sLanguages.
Field | Type | Label | Description |
languages | string | repeated | Required. The language(s) that satisfy the specifications in the ListT2sLanguagesRequest. |
The request message for ListT2sNormalizationPipelines.
Filters normalization pipelines by the attributes given in the request.
Field | Type | Label | Description |
language | string | Optional. The language to filter by. |
Pipeline Response representation.
The response message for ListT2sNormalizationPipelines.
Field | Type | Label | Description |
t2s_normalization_pipelines | string | repeated | Required. The list of normalization pipeline configurations returned by ListT2sNormalizationPipelines that match the specifications in the ListT2sNormalizationPipelinesRequest. |
Pipeline Request representation.
The request message for ListT2sPipelines.
Filters pipelines by the attributes given in the request.
Field | Type | Label | Description |
languages | string | repeated | Optional. The language(s) to filter by. |
speaker_sexes | string | repeated | Optional. The speaker sex(es) to filter by. |
pipeline_owners | string | repeated | Optional. The pipeline owner(s) to filter by. |
speaker_names | string | repeated | Optional. The speaker name(s) to filter by. |
domains | string | repeated | Optional. The domain(s) to filter by. |
Pipeline Response representation.
The response message for ListT2sPipelines.
Field | Type | Label | Description |
pipelines | Text2SpeechConfig | repeated | Required. The list of pipeline configurations returned by ListT2sPipelines that match the specifications in the ListT2sPipelinesRequest. |
Logmnse message contains settings for Logmnse postprocessing.
Field | Type | Label | Description |
initial_noise | int64 | The initial noise value. |
|
window_size | int64 | The window size. |
|
noise_threshold | float | The noise threshold. |
Map message represents a word-to-phoneme mapping in a custom phonemizer.
Field | Type | Label | Description |
word | string | The word to be mapped. |
|
phoneme_groups | string | The phoneme groups associated with the word. |
MbMelganTriton message contains settings for the MbMelgan Triton inference.
Field | Type | Label | Description |
config_path | string | The path to the MbMelgan Triton configuration. |
|
stats_path | string | The path to the MbMelgan statistics. |
|
triton_model_name | string | The name of the Triton model. |
|
triton_server_host | string | The host of the Triton inference server that serves the model. |
|
triton_server_port | int64 | The port of the Triton inference server that serves the model. |
Mel2Audio message contains settings for mel-to-audio inference.
Field | Type | Label | Description |
type | string | The type of mel-to-audio inference. |
|
mb_melgan_triton | MbMelganTriton | MbMelgan Triton inference settings. |
|
hifi_gan | HiFiGan | HiFiGan inference settings. |
|
hifi_gan_triton | HiFiGanTriton | HiFiGan Triton inference settings. |
NormalizeTextRequest message is used to request text normalization.
Field | Type | Label | Description |
t2s_pipeline_id | string | The ID of the text-to-speech pipeline. |
|
text | string | The text to be normalized. |
NormalizeTextResponse message is used to store the normalized text response.
Field | Type | Label | Description |
normalized_text | string | The normalized text. |
PhonemizerId message represents the ID of a phonemizer.
Field | Type | Label | Description |
id | string | The ID of the phonemizer. |
Postprocessing message contains settings for postprocessing.
Field | Type | Label | Description |
silence_secs | float | The duration of silence in seconds. |
|
pipeline | string | repeated | Repeated field containing pipeline names. |
logmmse | Logmnse | Logmnse postprocessing settings. |
|
wiener | Wiener | Wiener postprocessing settings. |
|
apodization | Apodization | Apodization postprocessing settings. |
Represents a Configuration for the text to speech conversion.
Field | Type | Label | Description |
t2s_pipeline_id | string | Required. Represents the pipeline id of the model configuration that will be used. |
|
length_scale | float | Optional. This parameter is used for time stretching, which is the process of changing the speed or duration of audio. It should be much more than 1.0. 0 is not a valid value for this parameter. The default value is 1. |
|
noise_scale | float | Optional. Defines the noise in the generated audio. It should be between 0.0 and 1.0. The default value is 0.0. |
|
sample_rate | int32 | Optional. Defines the sample rate of the generated wav file. The default value is 22050. |
|
pcm | Pcm | Optional. Defines the pulse-code modulation of the wav file. The default value is PCM_16. |
|
audio_format | AudioFormat | Optional. Defines the format of the desired audio. The default value is wav. |
|
use_cache | bool | Optional. Defines whether the cache should be used. The default value is False. |
|
t2s_service_config | google.protobuf.Struct | optional | Optional. Provides the configuration of the service, such as API keys, bearer tokens, JWTs, and other header information, as key-value pairs, e.g.,
A. For the Amazon T2S service, the following arguments should be passed:
A1. aws_access_key_id (required) Access key ID for Amazon Web Services.
A2. aws_secret_access_key (required) Secret access key for Amazon Web Services.
A3. region (required) Region name of the Amazon server.
Example:
t2s_service_config={'aws_access_key_id': 'YOUR_AWS_ACCESS_KEY_ID', 'aws_secret_access_key': 'YOUR_AWS_SECRET_ACCESS_KEY', 'region': 'YOUR_AMAZON_SERVER_REGION_NAME'}
B. For the ElevenLabs T2S service, the following arguments should be passed:
B1. api_key (required) API key of the ElevenLabs cloud provider to access its T2S service.
Example:
t2s_service_config={'api_key': 'YOUR_ELEVENLABS_API_KEY'}
C. For the Google Cloud T2S service, the following arguments should be passed:
C1. api_key (required) API key of the Google Cloud provider to access its T2S service.
C2. api_endpoint (optional) Regional API endpoint of the Google Cloud T2S service.
(Defaults to 'eu-texttospeech.googleapis.com')
Example:
t2s_service_config={'api_key': 'YOUR_GOOGLE_CLOUD_API_KEY', 'api_endpoint': 'YOUR_GOOGLE_CLOUD_API_ENDPOINT'}
D. For the Microsoft Azure T2S service, the following arguments should be passed:
D1. subscription_key (required) Subscription key to access the Microsoft Azure service.
D2. region (required) Region name of the Microsoft Azure server.
Example:
t2s_service_config={'subscription_key': 'YOUR_MICROSOFT_AZURE_SUBSCRIPTION_KEY', 'region': 'YOUR_MICROSOFT_AZURE_SERVER_REGION_NAME'}
Note: ondewo-t2s will raise an error if any of the required arguments above is missing. |
t2s_cloud_provider_config | T2sCloudProviderConfig | optional | Optional. Defines the cloud provider's specific configuration for using text-to-speech cloud services. The default value is None. |
t2s_normalization | T2SNormalization | Optional. Defines the t2s_normalization config parameters for this specific request. The default values are set in the config file; values set via RequestConfig apply only to this request and do not update the pipeline. |
|
word_to_phoneme_mapping | google.protobuf.Struct | optional | Optional. Defines a dict that specifies the phonemes for special words. |
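The required and optional keys listed for t2s_service_config can be checked on the client side before a request is sent. The following is a minimal sketch only: the helper name `check_service_config` and the provider labels used as dictionary keys are hypothetical and not part of ondewo-t2s; only the key names and the Google default endpoint come from the description above.

```python
# Hypothetical client-side check; not part of the ondewo-t2s API.
# Key names are taken from the t2s_service_config description.
REQUIRED_KEYS = {
    "amazon": {"aws_access_key_id", "aws_secret_access_key", "region"},
    "elevenlabs": {"api_key"},
    "google": {"api_key"},
    "microsoft": {"subscription_key", "region"},
}
GOOGLE_DEFAULT_ENDPOINT = "eu-texttospeech.googleapis.com"


def check_service_config(provider: str, config: dict) -> dict:
    """Return a copy of config, raising ValueError if a required key is missing."""
    missing = REQUIRED_KEYS[provider] - set(config)
    if missing:
        raise ValueError(f"missing required keys for {provider}: {sorted(missing)}")
    checked = dict(config)
    if provider == "google":
        # api_endpoint is optional and falls back to the documented default.
        checked.setdefault("api_endpoint", GOOGLE_DEFAULT_ENDPOINT)
    return checked


google_config = check_service_config("google", {"api_key": "YOUR_GOOGLE_CLOUD_API_KEY"})
```

The server performs its own validation and raises an error for missing arguments; a check like this only surfaces the problem earlier.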
SingleInference message contains inference settings for text2audio models.
Field | Type | Label | Description |
text2audio | Text2Audio | Text-to-audio inference settings. |
StreamingSynthesizeRequest is used to perform streaming synthesis.
Field | Type | Label | Description |
text | string | Required. Represents the text that will be transformed to speech. All properties of the input text described for SynthesizeRequest also apply here. |
|
config | RequestConfig | Required. Represents the specifications needed to do the text to speech transformation. |
Represents a Streaming Synthesize Response.
A Streaming Synthesize Response contains the generated audio, the requested text, and all other properties of this generated audio.
Field | Type | Label | Description |
audio_uuid | string | Required. Unique identifier of the generated audio. |
|
audio | bytes | Required. Generated file with the parameters described in request. |
|
generation_time | float | Required. Time to generate audio. |
|
audio_length | float | Required. Audio length. |
|
text | string | Required. Text from which audio was generated. |
|
config | RequestConfig | Required. Configuration from which audio was generated. |
|
normalized_text | string | Optional. Normalized text. |
|
sample_rate | float | Optional. The sampling rate of the generated audio. |
Represents a Synthesize Request.
A Synthesize Request contains the information needed to perform a text-to-speech conversion.
Field | Type | Label | Description |
text | string | Required. Represents the text that will be transformed to speech. Synthesize text: - Simple text:
Examples to modulate the voice based on SSML tags and Arpabet phonemes: - SSML Tag Phone:
- SSML Tag Email:
- SSML Tag URL:
- SSML Tag Spell:
- SSML Tag Spell With Names:
- SSML Tag Callsigns Short:
- SSML Tag Callsigns Long:
- SSML Tag Break Tag:
- Arpabet Phonemes: |
|
config | RequestConfig | Required. Represents the specifications needed to do the text to speech transformation. |
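To make the shape of a Synthesize request concrete, the sketch below mirrors the `text` and `config` fields with plain Python dataclasses and fills in the defaults documented for RequestConfig. The class names ending in `Sketch` and the pipeline ID `my-pipeline-id` are hypothetical; real clients would use the message classes generated from the proto files rather than these stand-ins.

```python
from dataclasses import asdict, dataclass


@dataclass
class RequestConfigSketch:
    """Stand-in for RequestConfig, using the documented default values."""
    t2s_pipeline_id: str          # required, no default
    length_scale: float = 1.0
    noise_scale: float = 0.0
    sample_rate: int = 22050
    pcm: str = "PCM_16"           # name of a Pcm enum value
    audio_format: str = "wav"     # name of an AudioFormat enum value
    use_cache: bool = False


@dataclass
class SynthesizeRequestSketch:
    """Stand-in for SynthesizeRequest: the text plus its configuration."""
    text: str
    config: RequestConfigSketch


request = SynthesizeRequestSketch(
    text="Hello world",
    config=RequestConfigSketch(t2s_pipeline_id="my-pipeline-id"),  # hypothetical ID
)
payload = asdict(request)  # nested dict with the same field names as the messages
```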
Represents a Synthesize Response.
A Synthesize Response contains the generated audio, requested text and all other properties of this generated audio.
Field | Type | Label | Description |
audio_uuid | string | Required. Unique identifier of the generated audio. |
|
audio | bytes | Required. Generated file with the parameters described in request. |
|
generation_time | float | Required. Time to generate audio. |
|
audio_length | float | Required. Audio length. |
|
text | string | Required. Text from which audio was generated. |
|
config | RequestConfig | Required. Configuration from which audio was generated. |
|
normalized_text | string | Optional. Normalized text. |
|
sample_rate | float | Optional. The sampling rate of the generated audio. |
T2SCustomLengthScales message contains custom length scales for text types.
Field | Type | Label | Description |
text | float | The custom length scale for general text. |
|
email | float | The custom length scale for email text. |
|
url | float | The custom length scale for URL text. |
|
phone | float | The custom length scale for phone number text. |
|
spell | float | The custom length scale for spelled-out text. |
|
spell_with_names | float | The custom length scale for spelled-out text with names. |
|
callsign_long | float | The custom length scale for long callsigns. |
|
callsign_short | float | The custom length scale for short callsigns. |
T2SDescription message is used to describe the text-to-speech service.
Field | Type | Label | Description |
language | string | The language supported by the service. |
|
speaker_sex | string |
|
|
pipeline_owner | string | The owner of the text-to-speech pipeline. |
|
comments | string | Additional comments or notes. |
|
speaker_name | string | The name of the speaker. |
|
domain | string | The domain or context of the service. |
Version information of the service
Field | Type | Label | Description |
version | string | version number |
T2SInference message is used to specify the text-to-speech inference settings.
Field | Type | Label | Description |
type | string | The type of inference. |
|
composite_inference | CompositeInference | Composite inference settings. |
|
single_inference | SingleInference | Single inference settings. |
|
caching | Caching | Caching settings. |
Represents the configuration for text-to-speech normalization.
Field | Type | Label | Description |
language | string | The language for which the normalization is applied. |
|
pipeline | string | repeated | The pipeline(s) used for normalization. |
custom_phonemizer_id | string | The ID of the custom phonemizer, if used. |
|
custom_length_scales | T2SCustomLengthScales | Custom length scales for different text types. |
|
arpabet_mapping | string | The mapping for Arpabet phonemes. |
|
numeric_mapping | string | The mapping for numeric expressions. |
|
callsigns_mapping | string | The mapping for callsigns. |
|
phoneme_correction_mapping | string | The mapping for phoneme correction. |
Configuration for cloud provider settings for Text-to-Speech (T2S).
Field | Type | Label | Description |
t2s_cloud_provider_config_elevenlabs | T2sCloudProviderConfigElevenLabs | Configuration for Eleven Labs text-to-speech provider. |
|
t2s_cloud_provider_config_google | T2sCloudProviderConfigGoogle | Configuration for Google text-to-speech provider. |
|
t2s_cloud_provider_config_microsoft | T2sCloudProviderConfigMicrosoft | Configuration for Microsoft text-to-speech provider. |
Configuration details specific to the Eleven Labs text-to-speech provider.
Field | Type | Label | Description |
stability | float | Stability level for inference, influencing consistency of generated speech. It is in the range [0.0, 1.0]. |
|
similarity_boost | float | Boost value for similarity to enhance the similarity of the generated voice to a target voice. It is in the range [0.0, 1.0]. |
|
style | float | Style parameter to control the expression or emotion in speech. It is in the range [0.0, 1.0]. |
|
use_speaker_boost | bool | Enables or disables speaker boost for emphasis on clarity and loudness. |
|
apply_text_normalization | string | Specifies type of text normalization to apply during processing. Available options are 'auto', 'on', and 'off'. |
Configuration details specific to the Google text-to-speech provider.
Field | Type | Label | Description |
speaking_rate | float | Speaking rate for inference, controlling the speed of generated speech. It is in the range [0.25, 4.0]. |
|
volume_gain_db | float | Volume gain in dB applied to the generated speech. It is in the range [-96.0, 16.0]. |
|
pitch | float | Pitch adjustment for inference, allowing control over voice pitch. It is in the range [-20.0, 20.0]. |
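The value ranges documented for the ElevenLabs and Google provider configurations can be expressed as a small lookup table. This is an illustrative sketch: the function `out_of_range` is hypothetical, and it assumes the documented ranges are closed intervals, as the bracket notation suggests.

```python
# Closed ranges as documented for the provider-specific configurations.
ELEVENLABS_RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
}
GOOGLE_RANGES = {
    "speaking_rate": (0.25, 4.0),
    "volume_gain_db": (-96.0, 16.0),
    "pitch": (-20.0, 20.0),
}


def out_of_range(values: dict, ranges: dict) -> list:
    """Return the names of fields whose value lies outside its documented range."""
    return [
        name
        for name, value in values.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    ]


bad_fields = out_of_range({"speaking_rate": 5.0, "pitch": 0.0}, GOOGLE_RANGES)
```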
Configuration details specific to the Microsoft text-to-speech provider.
Field | Type | Label | Description |
use_default_speaker | bool | Determines whether to use the default speaker voice. |
T2sCloudServiceAmazon message contains settings for the Amazon Cloud service inference.
Field | Type | Label | Description |
voice_id | string | Voice ID indicating the speaker |
|
model_id | string | Model id for the inference server. |
T2sCloudServiceElevenLabs message contains settings for the ElevenLabs Cloud service inference.
Field | Type | Label | Description |
language_code | string | Language of the generated audio. It should be a 4-letter language code. |
|
model_id | string | Model ID indicating the name of the model |
|
voice_id | string | Voice ID indicating the speaker |
|
voice_settings | VoiceSettings | Voice setting of the inference |
|
apply_text_normalization | string | Flag to indicate applying text normalization |
T2sCloudServiceGoogle message contains settings for the Google Cloud service inference.
Field | Type | Label | Description |
voice_id | string | Voice ID indicating the speaker |
|
speaking_rate | float | Speaking rate to control the speed of audio. |
|
volume_gain_db | float | Volume gain in db to control volume of the audio. |
|
pitch | float | pitch value of the audio |
T2sCloudServiceMicrosoft message contains settings for the Microsoft Cloud service inference.
Field | Type | Label | Description |
voice_id | string | Voice ID indicating the speaker. |
|
use_default_speaker | bool | Flag to indicate using the default speaker. |
Pipeline Id representation.
Used when creating, deleting, and retrieving pipelines.
Field | Type | Label | Description |
id | string | Required. Defines the id of the pipeline. |
Text2Audio message contains settings for text-to-audio inference.
Field | Type | Label | Description |
type | string | The type of text-to-audio inference. |
|
vits | Vits | Vits inference settings. |
|
vits_triton | VitsTriton | Vits Triton inference settings. |
|
t2s_cloud_service_elevenlabs | T2sCloudServiceElevenLabs | ElevenLabs cloud service inference settings. |
|
t2s_cloud_service_amazon | T2sCloudServiceAmazon | Amazon cloud service inference settings. |
|
t2s_cloud_service_google | T2sCloudServiceGoogle | Google cloud service inference settings. |
|
t2s_cloud_service_microsoft | T2sCloudServiceMicrosoft | Microsoft cloud service inference settings. |
Text2Mel message contains settings for text-to-mel inference.
Field | Type | Label | Description |
type | string | The type of text-to-mel inference. |
|
glow_tts | GlowTTS | GlowTTS inference settings. |
|
glow_tts_triton | GlowTTSTriton | GlowTTS Triton inference settings. |
Configuration of text-to-speech models representation.
Field | Type | Label | Description |
id | string | Required. Defines the id of the pipeline. |
|
description | T2SDescription | Required. Defines the description of the pipeline representation. |
|
active | bool | Required. Defines if the pipeline is active or inactive. |
|
inference | T2SInference | Required. Defines the inference of the pipeline representation. |
|
normalization | T2SNormalization | Required. Defines the normalization process of the pipeline representation. |
|
postprocessing | Postprocessing | Required. Defines the postprocessing process of the pipeline representation. |
UpdateCustomPhonemizerRequest message represents the request for updating a custom phonemizer.
Field | Type | Label | Description |
id | string | The ID of the custom phonemizer to be updated. |
|
update_method | UpdateCustomPhonemizerRequest.UpdateMethod | The update method. |
|
maps | Map | repeated | Repeated field of Map messages representing word-to-phoneme mappings. |
Vits message contains settings for the Vits inference.
Field | Type | Label | Description |
batch_size | int64 | The batch size for inference. |
|
use_gpu | bool | Flag indicating whether to use GPU for inference. |
|
length_scale | float | The length scale for inference. |
|
noise_scale | float | The noise scale for inference. |
|
path | string | The path to the Vits model. |
|
cleaners | string | repeated | Repeated field containing the cleaners for text normalization. |
param_config_path | string | The path to the parameter configuration. |
VitsTriton message contains settings for the Vits Triton inference.
Field | Type | Label | Description |
batch_size | int64 | The batch size for inference. |
|
length_scale | float | The length scale for inference. |
|
noise_scale | float | The noise scale for inference. |
|
cleaners | string | repeated | Repeated field containing the cleaners for text normalization. |
max_text_length | int64 | The maximum text length allowed. |
|
param_config_path | string | The path to the parameter configuration. |
|
triton_model_name | string | The name of the Triton model. |
|
triton_server_host | string | The host of the Triton inference server that serves the model. |
|
triton_server_port | int64 | The port of the Triton inference server that serves the model. |
VoiceSettings message contains settings for ElevenLabs inference.
Field | Type | Label | Description |
stability | float | Stability value for ElevenLabs inference. |
|
similarity_boost | float | similarity boost value for ElevenLabs inference. |
|
style | float | style boost value for ElevenLabs inference. |
|
use_speaker_boost | bool | Flag to indicate speaker boost |
Wiener message contains settings for Wiener postprocessing.
Field | Type | Label | Description |
frame_len | int64 | The frame length. |
|
lpc_order | int64 | The LPC order. |
|
iterations | int64 | The number of iterations. |
|
alpha | float | The alpha value. |
|
thresh | float | The threshold value. |
AudioFormat enum represents various audio file formats for storing digital audio data.
Name | Number | Description |
wav | 0 | Waveform Audio File Format (WAV) |
flac | 1 | Free Lossless Audio Codec (FLAC) |
caf | 2 | Core Audio Format (CAF) |
mp3 | 3 | MPEG Audio Layer III (MP3) |
aac | 4 | Advanced Audio Coding (AAC) |
ogg | 5 | Ogg Vorbis (OGG) |
wma | 6 | Windows Media Audio (WMA) |
Represents a pulse-code modulation technique.
Name | Number | Description |
PCM_16 | 0 | 16-bit pulse-code modulation. |
PCM_24 | 1 | 24-bit pulse-code modulation. |
PCM_32 | 2 | 32-bit pulse-code modulation. |
PCM_S8 | 3 | Signed 8-bit pulse-code modulation. |
PCM_U8 | 4 | Unsigned 8-bit pulse-code modulation. |
FLOAT | 5 | Floating-point (32-bit) pulse-code modulation. |
DOUBLE | 6 | Floating-point (64-bit) pulse-code modulation. |
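The bit depth of each Pcm value determines how many bytes one sample occupies, which gives a rough estimate of the raw audio size. The sketch below assumes uncompressed, single-channel audio and ignores any container header; the names `BYTES_PER_SAMPLE` and `raw_audio_bytes` are hypothetical.

```python
# Bytes per sample for each Pcm enum value, derived from the stated bit depths.
BYTES_PER_SAMPLE = {
    "PCM_16": 2,
    "PCM_24": 3,
    "PCM_32": 4,
    "PCM_S8": 1,
    "PCM_U8": 1,
    "FLOAT": 4,
    "DOUBLE": 8,
}


def raw_audio_bytes(sample_rate: int, seconds: float, pcm: str, channels: int = 1) -> int:
    """Estimate the size of uncompressed sample data (no file header)."""
    samples = round(sample_rate * seconds)
    return samples * BYTES_PER_SAMPLE[pcm] * channels


# Two seconds at 22050 Hz in PCM_16: 44100 samples of 2 bytes each.
size = raw_audio_bytes(22050, 2.0, "PCM_16")
```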
The update method to be used.
Name | Number | Description |
extend_hard | 0 | Add new words, replacing existing ones. |
extend_soft | 1 | Add new words if they are not already present. |
replace | 2 | Replace all words in the phonemizer with new ones. |
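The three update methods differ in how they treat words that already exist in the phonemizer. The sketch below models the word-to-phoneme mappings as a plain dict, which is a simplification of the repeated Map field, and follows the descriptions in the table above; `apply_update` and the sample phoneme strings are hypothetical and do not reflect the server's implementation.

```python
def apply_update(existing: dict, new: dict, method: str) -> dict:
    """Combine word-to-phoneme mappings according to the named update method."""
    if method == "extend_hard":
        # Add new words; entries in `new` overwrite existing ones.
        return {**existing, **new}
    if method == "extend_soft":
        # Add new words only if they are not already present.
        return {**new, **existing}
    if method == "replace":
        # Discard all existing words and keep only the new ones.
        return dict(new)
    raise ValueError(f"unknown update method: {method}")


existing = {"hello": "HH AH L OW", "data": "D EY T AH"}
new = {"data": "D AE T AH", "world": "W ER L D"}

hard = apply_update(existing, new, "extend_hard")
soft = apply_update(existing, new, "extend_soft")
replaced = apply_update(existing, new, "replace")
```

With these inputs, `extend_hard` keeps "hello" and takes the new entry for "data", `extend_soft` keeps the old entry for "data", and `replace` drops "hello" entirely.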
.proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
double | | double | double | float | float64 | double | float | Float |
float | | float | float | float | float32 | float | float | Float |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
bool | | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |
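The notes on int32 and sint32 in the table above can be illustrated with a small pure-Python encoder. This is a sketch of the protobuf wire encoding for demonstration only, not the protobuf library itself; the function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value: int) -> bytes:
    """sint32: ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... before encoding."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return encode_varint(zigzag)


int32_size = len(encode_int32(-1))    # ten bytes on the wire
sint32_size = len(encode_sint32(-1))  # a single byte
```

This is why sint32 and sint64 are recommended for fields that are likely to hold negative values.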