Project Goals

This project was started to address two basic concerns:

  • the difficulty of offering an HTTP connection from outside a local subnet
  • the cost of running cloud instances robust enough to handle speech-to-text (STT) and text-to-speech (TTS) workloads

The first is solved by a cloud-based socket relay server: producers connect out to it and wait for work, so nothing on the local subnet has to accept inbound connections. The second is solved by a distributed network of real (not virtual) low-cost hardware, such as an ordinary laptop or desktop, capable of running STT and TTS. These producers use the cloud-based socket relay server to receive STT and/or TTS requests and return responses.
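The relay idea described above can be illustrated with a minimal sketch. It is not the project's actual protocol: the line-based `PRODUCER` / `REQUEST` messages, the function names, and the use of `str.upper` as a stand-in for a real STT or TTS engine are all assumptions made for illustration. What it shows is the shape of the design: the producer dials out to the relay and waits on that socket, and the relay hands each incoming request to an idle producer.

```python
import asyncio


async def handle_connection(reader, writer, idle):
    """Relay side. The first line tells us whether the peer is a producer or a client.

    The "PRODUCER" / "REQUEST <text>" line protocol is hypothetical.
    """
    line = (await reader.readline()).decode().strip()
    role, _, payload = line.partition(" ")
    if role == "PRODUCER":
        # Park the producer's outbound connection until a request arrives.
        await idle.put((reader, writer))
        return
    # Client request: forward it to the next idle producer and relay the answer.
    p_reader, p_writer = await idle.get()
    p_writer.write((payload + "\n").encode())
    await p_writer.drain()
    result = await p_reader.readline()
    await idle.put((p_reader, p_writer))  # the producer is idle again
    writer.write(result)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def producer(host, port, work):
    """Producer side. Connects out to the relay, then waits for jobs on that socket."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        writer.write(b"PRODUCER\n")
        await writer.drain()
        while line := await reader.readline():  # empty bytes: the relay hung up
            writer.write((work(line.decode().strip()) + "\n").encode())
            await writer.drain()
    finally:
        writer.close()


async def request(host, port, text):
    """Client side. Sends one request to the relay and returns the reply."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(f"REQUEST {text}\n".encode())
    await writer.drain()
    reply = (await reader.readline()).decode().strip()
    writer.close()
    await writer.wait_closed()
    return reply


async def demo():
    idle = asyncio.Queue()  # producers currently waiting for work
    server = await asyncio.start_server(
        lambda r, w: handle_connection(r, w, idle), "127.0.0.1", 0
    )
    host, port = server.sockets[0].getsockname()[:2]
    # str.upper stands in for a real STT/TTS engine (assumption).
    worker = asyncio.create_task(producer(host, port, str.upper))
    reply = await request(host, port, "hello world")
    # Shut down: closing the parked connections makes the producer see EOF and exit.
    while not idle.empty():
        idle.get_nowait()[1].close()
    await asyncio.wait_for(worker, timeout=5)
    server.close()
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the producer opens the connection itself, it can sit behind a home router without any port forwarding. Only the relay needs a publicly reachable address.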

To better understand the problems being solved, consider an AWS instance capable of reasonable STT and TTS performance. Such an instance could cost close to $100 a day (about $36,500 a year), while a small producer farm of 2-8 machines on a local subnet would cost almost nothing and deliver nearly the same low-end throughput. High-quality, transcription-grade STT would still require the larger machines, but the balance between the two kinds of resource can be managed.

Setting up your own private producer farm, or connecting one to our voice exchange, is a relatively simple exercise that should take no longer than 15 minutes. Producer farms can be 'constant' or temporary and should cost no more to run than a standard internet connection, such as your Wi-Fi network at home. In fact, if you have two or more old laptops or desktops lying around, try setting up a small temporary producer farm to see what it's like to contribute to the voice exchange. You will also accumulate quality speech seconds (QSS), which will offset your own usage.


See the PriVox Repository for more detailed information and code examples.