Automatically Classify your Call Recordings using Natural Language Processing – Nexmo

Nexmo’s Voice API makes it simple to record inbound and outbound telephone calls. However, finding every call in which people discuss a particular topic, such as “Computer Science”, can quickly become time-consuming if you have to listen to every audio file yourself.

In this tutorial, we’ll show you how to use Natural Language Processing, via Google Cloud services, to automatically classify the content of each recording so that you can quickly identify calls about specific topics.


All the code for this tutorial is available on GitHub. It uses pipenv to manage dependencies and requires Python 3.6.4. You can create a virtual environment and install the dependencies by running pipenv install in the project's root directory.

We’re going to be using the Nexmo Voice API, specifically the record action. Before continuing with this tutorial, you should read through our voice building blocks as well as some of our previous tutorials on creating Voice Applications.

We also use two Google Cloud APIs in this tutorial: Cloud Speech-to-Text and Cloud Natural Language. Create a new Google Cloud Platform (GCP) project and make sure you enable both the Speech-to-Text and Natural Language APIs for it.

Remember to download your GCP credentials (a service account key file) and store them somewhere your script can access them. I added mine to the root of the project and named the file google_private.json.
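Google's client libraries locate credentials through Application Default Credentials, which checks the GOOGLE_APPLICATION_CREDENTIALS environment variable for the path to a key file. The sketch below shows one way to point that variable at google_private.json. The helper name configure_google_credentials is hypothetical, and the original project may simply set the variable in its .env file instead.

```python
import os
from pathlib import Path


def configure_google_credentials(key_path="google_private.json"):
    """Point Google's client libraries at a service account key file.

    Hypothetical helper: Application Default Credentials read the path
    stored in GOOGLE_APPLICATION_CREDENTIALS.
    """
    existing = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if existing:
        # Respect a value that was already exported (e.g. loaded from .env).
        return existing
    path = Path(key_path).resolve()
    if not path.is_file():
        # Fail early with a clear message rather than deep inside an API call.
        raise FileNotFoundError(f"GCP credentials not found at {path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

Calling configure_google_credentials() once at startup, before creating any Google client, is enough; the clients read the variable when they are constructed.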

There’s a .env.example file in the root of the project that lists the environment variables the application expects. Copy this file to a new file named .env and fill in your own values. Pipenv automatically loads any values set in .env into your environment when you run pipenv shell or pipenv run.
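Pipenv does this loading for you, but to make “loaded into your environment” concrete, here is a deliberately simplified, hypothetical .env loader that uses only the standard library. It handles KEY=VALUE lines, comments, an optional export prefix and surrounding quotes. It is not pipenv's implementation and skips features such as variable expansion and multi-line values.

```python
import os


def parse_env_lines(lines):
    """Parse simple KEY=VALUE lines into a dict (a simplified .env format)."""
    values = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        # Allow an optional leading "export ", as in shell scripts.
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            continue  # not a usable KEY=VALUE line; ignore it
        value = value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path=".env", override=False):
    """Copy values from a .env file into os.environ; return what was parsed."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_lines(handle)
    for key, value in values.items():
        # Existing environment variables win unless override is requested.
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

For example, if your .env contained a (hypothetical) NEXMO_APPLICATION_ID entry, load_env_file(".env") would make it available as os.environ["NEXMO_APPLICATION_ID"].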

Recording our Call

By now you should be familiar with NCCOs (Nexmo Call Control Objects). Our first Flask route serves our NCCO, which instructs the Nexmo Voice API to record any calls to our Virtual Number.

There are two main things to note about this NCCO:

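  1. The event_url points at our local Flask server, which must be reachable from the internet (for example, through a tunnel such as ngrok) so that Nexmo can send requests to it. The handler for this route is discussed later in the tutorial.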
  2. The recording format is set to wav. By default, Nexmo provides recordings as MP3 files, but the Google Speech-to-Text service accepts WAV audio, so we set the recording format to match.
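The NCCO itself isn't reproduced here, so the following is a minimal sketch of what it might contain, built as a plain Python list of actions. The action names and the eventUrl, format, beepStart and endOnSilence fields come from Nexmo's NCCO reference; the greeting, the voicemail-style flow, the three-second silence threshold and the build_recording_ncco name are assumptions for illustration. A Flask route would return this list serialized as JSON.

```python
import json


def build_recording_ncco(event_url, greeting="Please leave your message after the beep."):
    """Return an NCCO (a list of actions) that greets the caller and records them.

    Hypothetical helper: field names follow Nexmo's NCCO reference, but the
    greeting and silence threshold are illustrative choices.
    """
    return [
        {"action": "talk", "text": greeting},
        {
            "action": "record",
            # Nexmo sends the recording details to this URL (must be public).
            "eventUrl": [event_url],
            # Ask for WAV instead of the default MP3 for Speech-to-Text.
            "format": "wav",
            "beepStart": True,
            # Assumed voicemail-style flow: stop after 3 seconds of silence.
            "endOnSilence": 3,
        },
    ]


if __name__ == "__main__":
    # Print the JSON that the Flask route would serve.
    print(json.dumps(build_recording_ncco("https://example.com/recordings"), indent=2))
```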

New Recording Webhook

Whenever a call completes, the Nexmo Voice API sends a POST request to our event_url with details of the new recording. I’ve moved most of the heavy lifting out of the Flask view handler and into a series of background tasks using Huey, so the handler itself stays short.
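The view handler isn't reproduced here either. As a sketch of the data flow, assuming the webhook body is JSON containing recording_url and recording_uuid (fields Nexmo includes in its recording event), extract_recording_details pulls out what the pipeline needs, and run_pipeline is a synchronous, plain-Python stand-in for a Huey pipeline in which a dictionary returned by one stage becomes the next stage's keyword arguments. Both names are hypothetical; in the real application Huey runs each stage as a background task, chained in recent Huey versions with methods such as .s() and .then().

```python
from typing import Any, Callable, Dict, Iterable

# Fields the rest of the pipeline needs from Nexmo's recording event.
REQUIRED_FIELDS = ("recording_url", "recording_uuid")


def extract_recording_details(payload: Dict[str, Any]) -> Dict[str, str]:
    """Return only the recording fields the pipeline needs, or raise ValueError."""
    missing = [field for field in REQUIRED_FIELDS if not payload.get(field)]
    if missing:
        raise ValueError("webhook payload is missing: " + ", ".join(missing))
    return {field: payload[field] for field in REQUIRED_FIELDS}


def run_pipeline(stages: Iterable[Callable[..., Any]], **kwargs: Any) -> Any:
    """Call each stage in turn, feeding a returned dict to the next as kwargs.

    A synchronous stand-in for a Huey pipeline, used only to show the data
    flow; Huey would run each stage as a separate background task.
    """
    stage_list = list(stages)
    result: Any = None
    for index, stage in enumerate(stage_list):
        result = stage(**kwargs)
        if index < len(stage_list) - 1:
            # This stand-in only supports dicts between stages.
            if not isinstance(result, dict):
                raise TypeError(f"stage {index} must return a dict")
            kwargs = result
    return result
```

Keeping the view this thin means it can acknowledge Nexmo's request immediately, while the slower download, transcription and classification steps happen in the background.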

Download the Recording

The get_recording method on the Nexmo Python client is new, so if you’ve installed the Python client before, you’ll likely need to upgrade the nexmo package.

After retrieving the WAV file from Nexmo, the application saves it into the recordings directory. The download_recording function returns the recording_uuid inside a dictionary because Huey passes a dictionary returned by one task in a pipeline to the next task as keyword arguments.
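Here is a minimal sketch of download_recording, assuming the Nexmo client's get_recording method takes the recording URL and returns the audio as bytes. The fetch parameter is our own addition so the function can be tested without network access, and naming each file after its recording UUID is an assumption about the original layout.

```python
from pathlib import Path
from typing import Callable, Dict


def download_recording(
    recording_url: str,
    recording_uuid: str,
    fetch: Callable[[str], bytes],
    recordings_dir: str = "recordings",
) -> Dict[str, str]:
    """Fetch a recording and save it as <recordings_dir>/<recording_uuid>.wav."""
    directory = Path(recordings_dir)
    # Create the recordings directory on first use.
    directory.mkdir(parents=True, exist_ok=True)
    # In the app, fetch would be the Nexmo client's get_recording (assumed to return bytes).
    audio = fetch(recording_url)
    (directory / f"{recording_uuid}.wav").write_bytes(audio)
    # Returning a dict lets a pipeline pass recording_uuid on as a keyword argument.
    return {"recording_uuid": recording_uuid}
```

In the application you might call download_recording(url, uuid, fetch=client.get_recording), where client is a configured nexmo.Client. Inside a Huey task, the client would more naturally come from module-level configuration than be passed through the pipeline.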

Transcribe the Recording

Before we can apply Natural Language Processing to the content of our audio file, we need to convert it to text with Cloud Speech-to-Text.
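A minimal sketch of this step, assuming version 2 or later of the google-cloud-speech client library (older releases organised these classes differently) and a short, 16-bit WAV recording. It reads the sample rate and channel count from the WAV header with Python's wave module so that the RecognitionConfig matches the file; transcribe_wav, wav_properties and the en-US default are our own choices. The synchronous recognize call only handles short audio (roughly a minute), so longer calls would need long_running_recognize instead.

```python
import wave
from typing import Dict


def wav_properties(path: str) -> Dict[str, int]:
    """Read the format details Speech-to-Text needs from a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return {
            "sample_rate_hertz": wav_file.getframerate(),
            "audio_channel_count": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
        }


def transcribe_wav(path: str, language_code: str = "en-US") -> str:
    """Transcribe a short WAV recording with Google Cloud Speech-to-Text."""
    # Imported lazily so wav_properties works without the Google library.
    from google.cloud import speech

    props = wav_properties(path)
    if props["sample_width_bytes"] != 2:
        raise ValueError("LINEAR16 encoding expects 16-bit samples")
    with open(path, "rb") as audio_file:
        content = audio_file.read()

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=props["sample_rate_hertz"],
        audio_channel_count=props["audio_channel_count"],
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(content=content)
    # Synchronous recognition: suitable for short recordings only.
    response = client.recognize(config=config, audio=audio)
    # Results cover consecutive portions of the audio; keep each top alternative.
    return " ".join(
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    )
```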

You can read more about the Google Cloud Speech-to-Text API in Google’s documentation. Once the audio file has been converted to text, the next function in the pipeline is triggered.

Classifying the Recording

The application makes one final API call, this time to the Google Cloud Natural Language API, to classify the transcript into content categories.
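A minimal sketch of the classification call, and of using its output to find calls about a topic, assuming version 2 or later of the google-cloud-language client library. classify_text returns categories whose names are slash-separated paths, each with a confidence score; matches_topic, its case-insensitive matching rule and the 0.5 confidence threshold are hypothetical choices for this example. Google’s documentation also notes that classification needs a minimum amount of text (around 20 tokens), so very short calls may be rejected.

```python
from typing import Iterable, List, Tuple

Category = Tuple[str, float]  # (category name, confidence)


def classify_transcript(text: str) -> List[Category]:
    """Classify text into content categories with Cloud Natural Language."""
    # Imported lazily so matches_topic works without the Google library.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.classify_text(request={"document": document})
    return [(category.name, category.confidence) for category in response.categories]


def matches_topic(
    categories: Iterable[Category], topic: str, min_confidence: float = 0.5
) -> bool:
    """Return True if any category path has a segment equal to `topic`."""
    wanted = topic.lower()
    for name, confidence in categories:
        # Category names are slash-separated paths; compare each segment.
        segments = [segment.lower() for segment in name.strip("/").split("/")]
        if wanted in segments and confidence >= min_confidence:
            return True
    return False
```

For example, matches_topic(classify_transcript(transcript), "Computer Science") would flag a call whose transcript Google places under a category path containing a Computer Science segment.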

This API can do much more than classify text: it can also analyze the sentiment of the text, or break it down into sentences and tokens using syntactic analysis. See Google’s documentation for more details.

Further Reading

We hope this tutorial has given you some idea of what is possible when you combine the Nexmo Voice API with Google Cloud. If you’d like to learn about other things you can build with the Nexmo Voice API, take a look at our other Voice API tutorials.

