Making Celebrity Voice

Our first complete Chroma JS demo app

March 6, 2023

Celebrity Voice is a demo app built with the Chroma JS client. You upload an audio clip of yourself talking; the clip is embedded with an audio embedding model, and Chroma compares that embedding against the VoxCeleb dataset of celebrity voices. Chroma makes it easy to scale from a simple prototype in a Jupyter notebook to a deployed application. Today we’ll walk you through how Celebrity Voice worked as a quick prototype and then give an overview of the deployed version.

The VoxCeleb dataset consists of 145,265 utterances (short clips of spoken audio) from 1,251 speakers. Each utterance is a few seconds long and stored as a WAV file. The initial prototype of Celebrity Voice was just a few lines in a Jupyter notebook. It uses the pyannote embedding model, which embeds audio for diarization.
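
As a rough sketch of the comparison step, matching a voice amounts to a nearest-neighbor search over utterance embeddings. The example below uses plain Python with tiny made-up 3-dimensional vectors and hypothetical speaker names in place of real pyannote embeddings and a Chroma collection. `cosine_similarity` and `top_matches` are illustrative helpers, not part of either library, and cosine similarity is an assumption here since the post does not say which distance metric the app uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query, index, n_results=2):
    """Return the n_results (utterance_id, speaker, score) rows most similar to query."""
    scored = [
        (utt_id, speaker, cosine_similarity(query, emb))
        for utt_id, speaker, emb in index
    ]
    # Highest similarity first.
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:n_results]


# Made-up stand-ins for stored utterance embeddings (hypothetical IDs and speakers).
index = [
    ("utt-001", "speaker_a", [0.9, 0.1, 0.0]),
    ("utt-002", "speaker_a", [0.8, 0.2, 0.1]),
    ("utt-003", "speaker_b", [0.0, 0.9, 0.4]),
]
# Made-up stand-in for the embedding of the uploaded clip.
query_embedding = [1.0, 0.1, 0.0]

for utt_id, speaker, score in top_matches(query_embedding, index):
    print(f"{utt_id} {speaker} {score:.3f}")
```

A brute-force scan like this is fine for a handful of vectors; handing the search to a database such as Chroma is what makes the same idea practical across the full dataset.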

In its entirety, Celebrity Voice consists of two pieces: the pyannote embedding model deployed on AWS Lambda, which turns user audio files into embeddings, and a deployed version of Chroma running in client/server mode, which is queried with those embeddings. The database is accessed via the chroma-js client through AWS API Gateway.

Client audio recording & Lambda embed service

The client records audio with the MediaRecorder API, which yields a blob of the user's recording in the browser's recording format. It then base64-encodes that data and uploads it to the Lambda, which sits behind an API Gateway. The Lambda uses ffmpeg to convert the file to WAV, which is what the embedding model expects, computes the embedding, and sends it back to the client. There are many ways to optimize this path, but for a demo application this works great!
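
The Lambda side of this path might look roughly like the sketch below, which covers only the decode and conversion steps. Everything specific in it is an assumption made for illustration: the JSON body with an `audio` field, the helper names `decode_upload` and `ffmpeg_to_wav_command`, the `/tmp` paths, and the mono 16 kHz output settings. The post does not describe the actual request format or ffmpeg options.

```python
import base64
import json


def decode_upload(event):
    """Extract raw audio bytes from an API Gateway-style event.

    Assumes a JSON body with a hypothetical "audio" field holding base64 text.
    """
    payload = json.loads(event["body"])
    return base64.b64decode(payload["audio"])


def ffmpeg_to_wav_command(src_path, dst_path, sample_rate=16000):
    """Build an ffmpeg argument list that converts src_path to a WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", src_path,           # input in the browser's recording format
        "-ac", "1",               # one audio channel (assumed setting)
        "-ar", str(sample_rate),  # output sample rate in Hz (assumed setting)
        dst_path,
    ]


# Simulate what the client would send: base64 text wrapped in JSON.
recording = b"placeholder bytes standing in for a browser recording"
event = {"body": json.dumps({"audio": base64.b64encode(recording).decode("ascii")})}

audio_bytes = decode_upload(event)
command = ffmpeg_to_wav_command("/tmp/input.webm", "/tmp/input.wav")
print(len(audio_bytes), " ".join(command))
```

In a real handler, the decoded bytes would be written to the input path and the command run with something like `subprocess.run(command, check=True)` before passing the WAV file to the embedding model.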

Chroma-js & Deployed Chroma

We first deploy Chroma using the CloudFormation template described in the documentation. The template deploys onto an m6i.xlarge instance and sets up a client/server-ready version of Chroma. We then configure an API Gateway route with authorization around the routes we want to expose to the client, namely the /query route. After that, we configure Chroma JS with the URL of our API Gateway and use it to query our database for the top matches for the embeddings we get from the embed service. These top matches correspond to clips that the model thinks are from the same speaker!
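
To illustrate how a set of top matches can become a guess about the speaker, the sketch below tallies matches per speaker. It assumes a result shaped like a Chroma query response, with one inner list per query under `ids`, `distances`, and `metadatas`, plus a hypothetical `speaker` metadata key. The IDs, speaker names, and distances are made up, `rank_speakers` is an illustrative helper, and the majority-vote rule is one simple option rather than what the demo necessarily does.

```python
from collections import Counter


def rank_speakers(result, query_index=0):
    """Count how many of the top matches belong to each speaker.

    Returns (speaker, count) pairs, most frequent first.
    """
    metadatas = result["metadatas"][query_index]
    counts = Counter(meta["speaker"] for meta in metadatas)
    return counts.most_common()


# Made-up result for a single query embedding (hypothetical IDs and speakers).
result = {
    "ids": [["utt-014", "utt-203", "utt-015"]],
    "distances": [[0.12, 0.19, 0.21]],
    "metadatas": [[
        {"speaker": "speaker_a"},
        {"speaker": "speaker_b"},
        {"speaker": "speaker_a"},
    ]],
}

ranking = rank_speakers(result)
best_speaker, votes = ranking[0]
print(best_speaker, votes)
```

Voting over several matches is a little more forgiving than trusting only the single closest clip, since one noisy utterance cannot decide the outcome by itself.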

Further Reading