Creating Bots That Sound Like Humans With Natural Language Processing

We’ve all been there:

You want to change your cell phone plan, or figure out why your damn TV isn’t working (seriously, this is way too complicated sometimes), so you go ahead and dial customer service.

As you put the phone to your ear, you hear the dreaded,

“Please hold. All our customer service representatives are busy right now. A representative will be with you shortly.”

*Cue distorted phone music*

10 agonizing minutes pass.

Then 20.

Then 30.

The point is, customer service SUCKS.

Now, why am I reminding you of this annoying problem?

Because we shouldn’t have to deal with this crap.

Thanks to some recent advances in Natural Language Processing, customer service and a million other things are about to change forever.

Natural What-?

Natural Language Processing (NLP) is the field of getting computers to understand human language.

It’s how Siri understands you when you want to know the weather forecast, how Google Translate translates French to English, and it’s how we could cut customer service wait times to 0 seconds.

Imagine a virtual customer service representative so smart that you could have entire conversations with them, and not even know they’re not human. Now imagine having hundreds if not thousands of them available at the press of a button.

That’s the power of NLP.

And we’re actually making some headway. Google recently created a voice assistant, Google Duplex, that sounds almost indistinguishable from a human. In Google’s demo, you can hear it booking an appointment for a haircut at a real salon.

I thought that was super cool, so I decided to use an NLP model to create my own chatbot.

Here’s how I did it.


The funny thing about machine learning is that even when the data comes almost perfectly preprocessed, you still have to preprocess it to make it usable by an algorithm.

In this case, the data (the Cornell Movie-Dialogs Corpus, a collection of conversations from movie scripts) came like this:

b'L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n'
b'L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!\n'
b'L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.\n'
b'L984 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ She okay?\n'
b"L925 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Let's go.\n"
b'L924 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ Wow\n'
b"L872 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Okay -- you're gonna need to learn how to lie.\n"
b'L871 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ No\n'
b'L870 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I\'m kidding. You know how sometimes you just become this "persona"? And you don\'t know how to quit?\n'
b'L869 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Like my fear of wearing pastels?\n'

I’ll quickly explain how I preprocessed the data since it’s super boring and not the reason why you clicked on this article (feel free to skip this section).

First, I had to strip out the +++$+++ separators. They split each line into fields: the line ID, character ID, movie ID, character name and the text itself. I kept only the parts I needed. Then I grouped the lines into conversations, each taken from a single movie. Finally, I split every conversation into consecutive statement-and-response pairs. This way the algorithm can learn the relationship between a line and the line that answers it.
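Here’s a minimal sketch of those two steps in Python. It assumes the conversations are listed in a second file that uses the same +++$+++ separator, with two character IDs, a movie ID and the ordered line IDs. That is how the Cornell Movie-Dialogs Corpus stores them. The two conversation records below are made up from the sample IDs purely for illustration, and the function names are hypothetical.

```python
import ast

SEP = " +++$+++ "  # field separator used in the raw files

# A few raw lines from the sample above, decoded from bytes to str.
raw_lines = [
    "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n",
    "L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!\n",
    "L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.\n",
    "L984 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ She okay?\n",
]

# Hypothetical conversation records: character IDs, movie ID, ordered line IDs.
raw_conversations = [
    "u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L1044', 'L1045']\n",
    "u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L984', 'L985']\n",
]


def parse_lines(lines):
    """Split each raw line on the separator and map line ID -> text."""
    id_to_text = {}
    for line in lines:
        # maxsplit=4 keeps the text intact even if it contained the separator
        line_id, _char_id, _movie_id, _name, text = line.rstrip("\n").split(SEP, 4)
        id_to_text[line_id] = text
    return id_to_text


def build_pairs(conversations, id_to_text):
    """Turn each conversation into consecutive (statement, response) pairs."""
    pairs = []
    for record in conversations:
        # The last field is a Python-style list literal of line IDs.
        line_ids = ast.literal_eval(record.rstrip("\n").split(SEP)[-1])
        for first, second in zip(line_ids, line_ids[1:]):
            pairs.append((id_to_text[first], id_to_text[second]))
    return pairs


pairs = build_pairs(raw_conversations, parse_lines(raw_lines))
for statement, response in pairs:
    print(f"{statement!r} -> {response!r}")
```

Running it prints two pairs: 'They do to!' -> 'They do not!' and 'She okay?' -> 'I hope so.'. Each pair becomes one training example.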

After this, the words had to be converted to numbers, since the algorithm can’t take words as input.

Luckily, this is pretty easy. All we have to do is create a Python dictionary (read: index) where each word is assigned a corresponding number.

So, for example, if this is our dictionary:

dog → 3

It’s → 14

a → 52

then the sentence “It’s a dog” would become “14 52 3”.
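A minimal way to build that dictionary and encode a sentence in Python is sketched below. Indices are assigned in order of first appearance, so they won’t match the example numbers above. Three pieces are assumptions borrowed from common Seq2Seq setups, not necessarily what this project used: the reserved PAD/SOS/EOS tokens, the offset that keeps real words clear of them, and the punctuation-splitting `normalize` step.

```python
import re

# Reserved indices (an assumption): padding, start-of-sentence, end-of-sentence.
PAD_TOKEN, SOS_TOKEN, EOS_TOKEN = 0, 1, 2


def normalize(sentence):
    """Lowercase and split sentence-final punctuation into its own token."""
    return re.sub(r"([.!?])", r" \1", sentence.lower()).split()


def build_vocab(sentences):
    """Assign each new word the next free integer index."""
    word2index = {}
    for sentence in sentences:
        for word in normalize(sentence):
            if word not in word2index:
                # Offset by 3 so real words never collide with reserved tokens
                word2index[word] = len(word2index) + 3
    return word2index


def encode(sentence, word2index):
    """Convert a sentence to indices and mark its end with EOS."""
    return [word2index[word] for word in normalize(sentence)] + [EOS_TOKEN]


vocab = build_vocab(["They do not!", "It's a dog"])
print(encode("It's a dog", vocab))  # [7, 8, 9, 2]
```

In this sketch, unknown words raise a `KeyError`. A real pipeline would map them to a reserved unknown-word token or filter out sentences that contain rare words.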

Now we’re finally ready to design and train our model.

Before we can decide what model to use, we need to understand what we actually need to do.

All a chatbot really does is take a sequence of words, try to understand it, and output a sequence of words.

For this type of task, we can use a, wait for it, sequence-to-sequence (Seq2Seq) model!

Seq2Seq Model - How does it work?

As I mentioned above, a Seq2Seq model takes in an input sequence one component at a time (in this case, a word) and outputs a sequence one component at a time.

The Seq2Seq implementation I used has 3 main parts that work together to create a chatbot:

The Encoder Network → The Attention Mechanism → The Decoder Network

The Encoder

The Encoder network (an RNN) runs through the input sentence one word at a time. For every word, it outputs two vectors: an output vector and a hidden-state vector.

The hidden-state vector is passed to the next time step (iteration) as an input, and the output vector is recorded. Together, these two vectors contain information about the word and its relation to the words that came before it in the sentence.

It also condenses what it has read into a vector called the “context”, which the decoder network then decodes to produce our desired output.
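To make the loop concrete, here’s a tiny, untrained GRU encoder in plain Python, with no deep learning library. The sizes, random weights and made-up embedding table are purely illustrative, and biases are left out for brevity. The sketch shows three things:

* the hidden state being carried from one word to the next
* an output being recorded at each step
* the final hidden state serving as the context

```python
import math
import random

random.seed(0)
EMBED, HIDDEN = 4, 3  # tiny sizes, chosen only for illustration


def rand_matrix(rows, cols):
    """Random (untrained) weights in [-0.5, 0.5]."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# One input matrix (W) and one hidden matrix (U) per gate: reset, update, candidate
W = {gate: rand_matrix(HIDDEN, EMBED) for gate in ("r", "z", "n")}
U = {gate: rand_matrix(HIDDEN, HIDDEN) for gate in ("r", "z", "n")}


def gru_cell(x, h):
    """One GRU time step (biases omitted): returns the new hidden state."""
    r = [sigmoid(a + b) for a, b in zip(matvec(W["r"], x), matvec(U["r"], h))]  # reset gate
    z = [sigmoid(a + b) for a, b in zip(matvec(W["z"], x), matvec(U["z"], h))]  # update gate
    n = [math.tanh(a + ri * b)  # candidate state
         for a, ri, b in zip(matvec(W["n"], x), r, matvec(U["n"], h))]
    # Blend the old state and the candidate, controlled by the update gate
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def run_encoder(embedded_words):
    """Run the GRU over the sentence, recording an output at every time step."""
    h = [0.0] * HIDDEN
    outputs = []
    for x in embedded_words:
        h = gru_cell(x, h)  # hidden state is carried to the next time step
        outputs.append(h)   # for a one-layer GRU, the output equals h
    return outputs, h


embedding = rand_matrix(10, EMBED)  # hypothetical 10-word vocabulary
sentence = [7, 8, 9]                # e.g. an encoded "it's a dog"
outputs, final_hidden = run_encoder([embedding[i] for i in sentence])
print(len(outputs), len(final_hidden))  # 3 3
```

For a single-layer GRU, the per-step output and the hidden state are the same vector. That is why `outputs[-1]` is identical to `final_hidden`. Libraries such as PyTorch (`torch.nn.GRU`) do this math for you, batched and on a GPU.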

The encoder uses a bidirectional GRU (Gated Recurrent Unit) architecture, which is two GRU-based RNNs “flowing” in opposite directions. One of the RNNs takes in the sentence in normal order, while the other one takes in the sentence in reverse order.

At each time step, the outputs of the two RNNs are summed into a single output vector. At each position, the forward RNN has seen every word before it and the backward RNN has seen every word after it. The summed output therefore encodes each word in the context of the whole sentence, not just the words that came before it.

This is useful because, in language, words at the beginning can have a direct effect on the meaning of the words at the end of a sentence. If we want to build a chatbot that can actually make sense of what the user types in, we need this capability.
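Here’s a rough sketch of the bidirectional trick in plain Python. To keep it short, a plain tanh RNN cell stands in for the GRU; this is a simplification, not the article’s architecture. The weights and word vectors are random placeholders. The key detail is that the backward RNN’s outputs are reversed again before the sum. That way, position i in both lists refers to the same word.

```python
import math
import random

random.seed(1)
EMBED, HIDDEN = 4, 3  # tiny sizes, chosen only for illustration


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def make_rnn():
    """A plain tanh RNN; it stands in for a GRU to keep the sketch short."""
    W, U = rand_matrix(HIDDEN, EMBED), rand_matrix(HIDDEN, HIDDEN)

    def run(inputs):
        h, outputs = [0.0] * HIDDEN, []
        for x in inputs:
            h = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]
            outputs.append(h)
        return outputs

    return run


forward_rnn, backward_rnn = make_rnn(), make_rnn()  # separate weights per direction


def bidirectional_encode(inputs):
    forward = forward_rnn(inputs)                # reads words left to right
    backward = backward_rnn(inputs[::-1])[::-1]  # reads right to left, then re-aligned
    # Sum the two directions position by position
    return [[f + b for f, b in zip(fw, bw)] for fw, bw in zip(forward, backward)]


words = [rand_matrix(1, EMBED)[0] for _ in range(5)]  # 5 made-up word vectors
outputs = bidirectional_encode(words)
print(len(outputs), len(outputs[0]))  # 5 3
```

The first summed output combines the forward RNN’s state after one word with the backward RNN’s state after reading the whole sentence from the end. That is how early words pick up information from later ones.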

The Attention Mechanism

The attention mechanism tells the decoder which part of the input sequence to pay attention to at each time step when generating a word. The decoder then doesn’t have to rely on a single context vector to remember the entire sentence. Instead, it can focus on the input words that matter most for the word it is currently generating.

Imagine you have to give a summary of one specific paragraph in an essay, but instead of only reading that one paragraph, you read the entire essay, paying equal attention to every single paragraph. That would be really hard and take a really long time compared to just reading that one paragraph.

The attention mechanism prevents this by telling the decoder which parts it should focus on to generate the correct word.

Since the decoder outputs a single word at a time, the attention changes with every word. The attention mechanism uses the decoder’s current hidden state and the encoder’s outputs to determine which parts of the input sequence the network should pay attention to when generating the current word.
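The scoring step can be sketched in plain Python as below. It scores each encoder output with a simple dot product against the decoder’s hidden state. That is one common scoring function among several; the article doesn’t say which one its implementation used. The vectors are made-up two-dimensional examples.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_hidden, encoder_outputs):
    """Score each encoder output against the decoder state, then blend them."""
    scores = [sum(d * e for d, e in zip(decoder_hidden, out)) for out in encoder_outputs]
    weights = softmax(scores)  # one weight per input word, summing to 1
    # Context = attention-weighted sum of the encoder outputs
    context = [
        sum(w * out[i] for w, out in zip(weights, encoder_outputs))
        for i in range(len(decoder_hidden))
    ]
    return weights, context


encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # made-up, one per input word
decoder_hidden = [2.0, 0.0]                             # made-up current decoder state
weights, context = dot_attention(decoder_hidden, encoder_outputs)
print([round(w, 3) for w in weights])
```

The decoder state points in the same direction as the first encoder output. So the first input word gets the largest weight (about 0.67), and the context vector leans towards it. On the next step, a different decoder state produces a different set of weights.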

The Decoder

The Decoder (also an RNN) generates the output sentence one word at a time. Its first hidden state comes from the encoder’s context vector. At each step, it takes three inputs:

* the word it generated previously
* its own hidden state from the previous step
* the context produced by the attention mechanism (a weighted summary of the encoder’s outputs)

From these, it produces a new hidden state and the next word in the sequence.

In simpler terms, the decoder outputs each word based on the previous word it generated and the input sequence that the user entered.
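The “feed the previous word back in” loop can be sketched like this. The `toy_step` function is a hypothetical stand-in for a trained decoder. A real one would combine its hidden state, the previous word’s embedding and the attention context. Picking the single highest-scoring word at each step, known as greedy decoding, is also an assumption; beam search is a common alternative.

```python
SOS, EOS = "<sos>", "<eos>"


def greedy_decode(step, hidden, max_length=10):
    """Generate words one at a time, feeding each choice back into the decoder."""
    word, output = SOS, []
    for _ in range(max_length):
        scores, hidden = step(word, hidden)  # one decoder time step
        word = max(scores, key=scores.get)   # greedy: keep the best-scoring word
        if word == EOS:
            break
        output.append(word)
    return output


# Hypothetical stand-in for a trained decoder: next-word scores keyed on the
# previous word. A real decoder would also use the attention context here.
TOY_SCORES = {
    SOS: {"hello": 0.9, EOS: 0.1},
    "hello": {"there": 0.6, EOS: 0.4},
    "there": {EOS: 0.8, "hello": 0.2},
}


def toy_step(previous_word, hidden):
    return TOY_SCORES[previous_word], hidden + 1  # hidden here just counts steps


print(greedy_decode(toy_step, hidden=0))  # ['hello', 'there']
```

The `max_length` cap stops generation even if the model never produces the end-of-sentence token.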

Putting everything together

Put these three components together, and you have yourself a Seq2Seq model capable of conversation.

Here’s a conversation I had with my chatbot:

> Hello
Bot: hello . . . . .
> What is your name?
Bot: edward . . . . .
> Is your name really edward?
Bot: yes . . . . .
> Are you sure!
Bot: i m fine . . . . !
> Why do you think that?
Bot: i don t know . . .
> Can I help you?
Bot: i m not going to work . .
> Why not?
Bot: because you re a good experiment . .
> I'm an experiment!
Bot: you re a fool ! ! !
> I'm a fool!
Bot: i m sorry . ! ! !

As you can see, it’s kinda not-so-great. Its responses to what I’m typing generally make logical sense, but it’s still far from perfect.

Even so, with more training, a model like this could eventually be used for customer support. In fact, Google researchers trained a Seq2Seq model on IT helpdesk conversations, and it was able to help users troubleshoot technical problems. The paper is “A Neural Conversational Model” (Vinyals and Le, 2015).

Natural Language Processing is still a developing field, but we’re already starting to see some exciting new innovations, and it’ll only get better from here.

If you thought Siri and Amazon Alexa were cool, just wait until you’re having casual, seamless conversations with your TV about sports and, more importantly, getting instant customer service when it breaks.


  • Natural Language Processing is the field of getting computers to understand human language.
  • Natural Language Processing is what makes things like Siri, Amazon Alexa, and Google Translate possible.
  • A Seq2Seq chatbot model takes in a sentence, processes it one word at a time, and outputs a sentence one word at a time.
  • A Seq2Seq model is made of three parts: an encoder, an attention mechanism, and a decoder.

If you enjoyed this article, feel free to reach out to me on , and check out my personal website where you can see my other projects and sign up for my newsletter! Thanks!

