Technology, Security, Privacy and Cost

Speech recognition requires a device to listen continuously. While it listens, the device quietly collects audio in a buffer; only when it detects silence does it wake up and begin processing what it has collected. If the device is capable of it, it can convert the buffered audio to a text string at this point, and the process is over. If all voice applications worked like this, the world would be a much safer place, because your data would only ever be exchanged between you and your local device. Unfortunately, this is not how most voice applications work.

Most voice applications (voice assistants, smart speakers, etc.) run on hardware that is not capable of speech-to-text (STT) transcription. STT requires significant processing power and is typically delegated to server farms in the cloud. Voice applications save audio in a buffer until they detect silence, then send that audio to the cloud, where the speech-to-text conversion is actually performed.
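
The buffer-until-silence behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: it assumes audio arrives as short frames of 16-bit PCM samples, treats a frame as silent when its RMS energy falls below a threshold, and the names `rms`, `utterances`, `threshold` and `max_silent_frames` are all hypothetical. Real devices generally use more sophisticated voice activity detection.

```python
import math
from typing import Iterable, Iterator, List, Sequence

Frame = Sequence[int]  # one short chunk of 16-bit PCM samples (assumed format)


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def utterances(frames: Iterable[Frame],
               threshold: float = 500.0,
               max_silent_frames: int = 3) -> Iterator[List[int]]:
    """Yield one buffered utterance each time enough silence follows speech.

    threshold and max_silent_frames are illustrative values, not tuned ones.
    """
    buffer: List[int] = []
    silent_run = 0
    heard_speech = False
    for frame in frames:
        if rms(frame) >= threshold:
            # Speech: keep buffering and reset the silence counter.
            heard_speech = True
            silent_run = 0
            buffer.extend(frame)
        elif heard_speech:
            # Silence after speech: keep short pauses inside the utterance.
            silent_run += 1
            buffer.extend(frame)
            if silent_run >= max_silent_frames:
                # Enough silence: hand the buffer off and start over.
                yield buffer
                buffer, silent_run, heard_speech = [], 0, False
        # Silence before any speech is simply discarded.
    if heard_speech:
        yield buffer  # flush if the stream ends mid-utterance


if __name__ == "__main__":
    loud = [1000, -1000] * 4   # well above the threshold
    quiet = [10, -10] * 4      # well below the threshold
    stream = [quiet, loud, loud, quiet, quiet, quiet, quiet, loud]
    for chunk in utterances(stream):
        print(len(chunk), "samples ready for transcription")
```

Each list yielded by `utterances` is the point at which a device would either transcribe locally or, in the common case described above, upload the audio to a cloud service.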

This can pose a security risk, so most cloud connections are protected by TLS (Transport Layer Security), still widely referred to by the name of its predecessor, SSL (Secure Sockets Layer). Put simply, it encrypts your data while it is in transit. This safeguard against malicious actors, while reasonably effective, does not address what becomes of your data once it reaches its ultimate destination in the cloud. That is up to the company you (or, technically, your listening device) entrusted it with, and a close read of the terms and conditions will often show that it is now associated with you personally in a database somewhere, and for sale.
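
To make "encrypted in transit" concrete, here is a minimal sketch using Python's standard `ssl` module (which implements TLS, the successor to SSL). The helper name `make_tls_context` is hypothetical, and the choice of TLS 1.2 as a floor is an assumption made for illustration. Nothing here connects to a real service.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate checks enabled."""
    # create_default_context() turns on certificate and hostname
    # verification, so the client knows which server it is talking to.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_tls_context()
    print("certificates verified:", ctx.verify_mode == ssl.CERT_REQUIRED)
    print("hostname checked:", ctx.check_hostname)
```

A context like this protects the audio only on the wire. Once the server decrypts it, nothing in the protocol constrains what the recipient stores or does with it.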

So securing one's data from hackers, snoops and the like, while reasonably effective, is useless if the voice service provider you have entrusted with your data ultimately disrespects your privacy. Since everything you say is sent to the cloud to be converted from audio to text, it is typically saved in a cloud-based data store, where it is first associated with you (via IP address, UBIDs, and other fingerprinting technologies) and then used for targeted advertising, behavioral categorization, surveillance and so on. To put it succinctly, you and your information have now become the product.

This can appear to be a fair exchange. After all, there are significant costs associated with providing leading-edge technology as a service, so it seems reasonable for the voice provider to use whatever means are necessary to maintain a robust, scalable product. In fact, most providers of voice services charge an hourly rate; however, this is not where they make their money. The real money is in harvesting and selling your personal information. Voice-enabled applications are the collection point for your private information, and ultimately there is not much you can do about it. A quick read of the terms and conditions of nearly all voice service providers will show that your only option, if you don't like it, is not to use them. To summarize: the price you pay to use voice services is your privacy.

However, an alternative exists.