Whisper (Speech to Text ML model) Project Ideas - STEAL these!

October 05, 2022

Speech-to-text machine learning models are surging in popularity right now as they become more accessible and increase in quality. A great example is OpenAI's new open-source speech-to-text model Whisper, an automatic speech recognition model that was trained on 680,000 hours of multilingual data from the internet.

Whisper is significant because it is fully open source, which lets anyone in the world build with it, and the model's output has been very impressive so far.

Whisper (Speech to Text) Use Cases

Transcription Services

Transcription services are the primary use case of speech-to-text machine learning models like Whisper. That said, there are many creative project ideas that could leverage transcription to solve a specific need for customers and turn into a thriving business for you.

As for the new Whisper model, its strengths are in these areas:

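Multi-language Transcription

Taking non-English audio and processing it into text in the same language. Whisper is not designed to turn English audio into text in other languages, so that direction needs a separate translation step on top of the English transcript.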

Multi-language to English Transcription

Taking non-English audio and processing it into English text.

English Transcription

Processing English audio and converting it into English text.
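To make these modes concrete, here is a minimal sketch using the open-source `whisper` Python package, which exposes a `transcribe` task (text in the spoken language) and a `translate` task (text in English). The helper name `speech_to_text`, the `"base"` model size, and the file name in the usage comment are assumptions for illustration, not part of any official recipe.

```python
def speech_to_text(audio_path, to_english=False, model=None, model_name="base"):
    """Return the text spoken in an audio file.

    to_english=False -> transcribe in the language that was spoken.
    to_english=True  -> translate the speech into English text.
    """
    if model is None:
        # Imported lazily so the helper can be tried out with a stand-in model.
        import whisper  # pip install openai-whisper

        model = whisper.load_model(model_name)

    task = "translate" if to_english else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return result["text"].strip()


# Example usage ("interview.mp3" is a placeholder path):
#   print(speech_to_text("interview.mp3"))                   # same-language text
#   print(speech_to_text("interview.mp3", to_english=True))  # English text
```

Passing an already loaded model through the `model` argument avoids reloading the weights on every call, which matters once you process more than one file.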

Whisper (Speech to Text) Project Ideas

Let's get into it! Here are a few creative project ideas waiting to be built with speech-to-text models.

Video Game Audio Controller

Instead of controlling your video game with a keyboard or hand controller, you could control it with voice commands alone. There could be some fun adaptations of current games with this control style, and it would also be very useful for accessibility, for example for players with limited motor skills in their hands.
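One way to picture the core of such a controller: once the speech has been transcribed, map the text to game actions. The sketch below assumes the transcript is already available as a string; the `COMMANDS` table and the action names are hypothetical, and a real game would define its own.

```python
import re

# Hypothetical phrase -> action table; a real game would define its own.
COMMANDS = {
    "jump": "JUMP",
    "move left": "MOVE_LEFT",
    "move right": "MOVE_RIGHT",
    "fire": "FIRE",
    "pause": "PAUSE",
}
MAX_PHRASE_WORDS = 2  # longest phrase in COMMANDS


def tokenize(text):
    """Lowercase and drop punctuation so 'Jump!' matches 'jump'."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def commands_from_transcript(text):
    """Return the actions named in a transcript, in spoken order."""
    words = tokenize(text)
    actions = []
    i = 0
    while i < len(words):
        # Try the longest phrase first so "move left" wins over a shorter match.
        for length in range(MAX_PHRASE_WORDS, 0, -1):
            window = words[i:i + length]
            phrase = " ".join(window)
            if len(window) == length and phrase in COMMANDS:
                actions.append(COMMANDS[phrase])
                i += length
                break
        else:
            i += 1  # no command starts at this word; skip it
    return actions
```

Words that are not part of any command are simply ignored, so filler like "then" or "okay" does not trigger anything.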

CRM Plugin for Sales Reps

We all know sales reps love to talk. But not all of them love to do data entry or spend time typing at their computer when they could be out selling... and we don't blame them! The reality, though, is that sales reps can fall behind on updating customer records in the CRM, or they enter short, low-quality customer notes to save time.

This can cause friction with customer service or marketing teams that rely on this CRM data for their respective roles. Create a CRM plugin that allows sales reps to dictate their customer notes and have them transcribed and saved in the CRM. To save even more time, it could also let the sales rep say out loud which of the customer's data fields need to be changed.
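As a rough sketch of the field-update part, the snippet below pulls "set FIELD to VALUE" statements out of a dictated note. The phrasing convention, the function name `parse_dictation`, and the field names are all assumptions for illustration; a real plugin would map them to the fields of whichever CRM it targets.

```python
import re

# Assumed spoken convention: "set <field> to <value>", ended by punctuation.
UPDATE_PATTERN = re.compile(
    r"\bset (?P<field>[a-z ]+?) to (?P<value>[^.,;]+)", re.IGNORECASE
)


def parse_dictation(transcript):
    """Split a dictated note into the full note text and requested field updates."""
    updates = {}
    for match in UPDATE_PATTERN.finditer(transcript):
        # "follow up date" -> "follow_up_date"
        field = "_".join(match.group("field").lower().split())
        updates[field] = match.group("value").strip()
    return {"note": transcript.strip(), "updates": updates}
```

For a transcript such as "Great call with Acme. Set deal stage to negotiation.", the `updates` dictionary would contain `deal_stage` mapped to `negotiation`, while the whole note is kept for the customer record.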

Plugin for Doctor Record-Keeping Software

Another career where turning speech into text could be extremely useful and efficient is in the medical world. Note taking and record keeping is notoriously one of the slowest and most tedious parts of the day for a doctor. For general practitioners (family doctors) in particular, the day can be jam-packed with back-to-back patient appointments.

Create a plugin or tool for doctors that lets them dictate their patient notes and have them transcribed and saved in their record-keeping software. A tool like this could shorten the average appointment time, enabling the doctor to see more patients or to create more space within their day to avoid burnout.

Content Moderation

Livestream services, chat rooms, video games, and other places where audio communication takes place online typically have rules of engagement or moderation guidelines that you are required to abide by. Solutions exist for moderating text communication, but what if you could create a product that helps with audio moderation? Create a tool that takes in the audio communication and transcribes it to text. Then pass that text through moderation controls to flag sections that may not be following the moderation guidelines.
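The flagging step could look something like the sketch below. It assumes the audio has already been transcribed into segments shaped like the ones in Whisper's `result["segments"]` (each with `start`, `end`, and `text`), and it uses a simple word list as a stand-in for real moderation controls. The terms in `BLOCKED_TERMS` are placeholders.

```python
import re

# Placeholder terms; a real deployment would load its own policy list.
BLOCKED_TERMS = {"scam", "idiot"}


def flag_segments(segments, blocked_terms=BLOCKED_TERMS):
    """Return the segments whose text contains at least one blocked term."""
    blocked = set(blocked_terms)
    flagged = []
    for seg in segments:
        words = set(re.findall(r"[a-z']+", seg["text"].lower()))
        hits = sorted(words & blocked)
        if hits:
            flagged.append(
                {
                    "start": seg["start"],  # seconds into the audio
                    "end": seg["end"],
                    "text": seg["text"].strip(),
                    "hits": hits,
                }
            )
    return flagged
```

Because each flagged entry keeps its start and end time, a moderator can jump straight to that part of the audio instead of listening to the whole recording.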

Court & Legal Transcription Services

Rather than have a human sit in court recording everything that is said for the record, you could use a speech-to-text model like Whisper to do that work for you. Your solution would take in audio, then transcribe, organize, and save it in the required record-keeping format.

Speech to Text/Dictation Browser Extension

Build a browser extension that takes in audio and performs browser commands based on what was dictated. For example, you could ask your browser to "open Twitter and follow @Banana_Dev", to log in to LinkedIn and pull up new engineering job listings, or to pull up content on how to deploy Whisper to production. Another obvious use case is speech-to-text in messaging tools or text fields within the browser.

Multi-Language Video Transcription

Create a tool that makes it super easy to input an English video and have the audio transcribed and formatted nicely in multiple languages. Since Whisper itself only translates into English, you would pair its English transcript with a separate translation step. This would be especially useful for content creators (like Mr. Beast!) who are looking to grow their audience in non-English-speaking countries.
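For the "formatted nicely" part, one common target is the SRT subtitle format. The sketch below turns timestamped segments (shaped like Whisper's `result["segments"]`, with `start`, `end`, and `text`) into SRT text. Translating each segment into the target languages is assumed to happen in a separate step that is not shown here, and the function names are illustrative.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```

Running the same function once per language, on the translated segments, would give one subtitle file per audience while keeping the timing identical.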

Real-time Transcription Services

This could be an amazing accessibility tool to support people with hearing impairment or folks traveling in a country where they don't speak the native language. Anywhere there are important instructions for the general population to understand, you could have people tune in to a radio frequency or go to a webpage where they can follow the audio being played in a language they can comprehend. Pre-flight instructions on airplanes, subway and train travel instructions, and large public events are all examples of where a tool like this could be useful.
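Whisper works on recorded audio rather than a live stream, so one simple way to approximate real time is to cut the incoming audio into short overlapping windows and transcribe each one as it fills up. The sketch below only computes the window boundaries. The 5-second window and 1-second overlap are assumptions to tune, while the 16 kHz sample rate matches what Whisper expects as input.

```python
SAMPLE_RATE = 16_000  # Whisper expects 16 kHz audio


def chunk_bounds(total_samples, chunk_seconds=5.0, overlap_seconds=1.0,
                 sample_rate=SAMPLE_RATE):
    """Return (start, end) sample indices for overlapping windows."""
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")

    bounds = []
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        bounds.append((start, end))
        if end == total_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds
```

The overlap is there so that a word cut off at the end of one window still appears whole in the next one. Merging the overlapping text back together is a separate problem this sketch does not attempt.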

Want to build one of these projects? Banana can help! We have a detailed tutorial on how to deploy Whisper to production, which you can view here. Don't forget to jump into our Discord community. Hundreds of ML & AI builders hang out there, ready to connect and help you build your next project!