The human brain is often described as the most powerful computer in the known universe. Loosely inspired by the structure and function of biological neurons, artificial neural networks are nonlinear statistical models designed to detect complex correlations and patterns.
The first artificial neural network, the SNARC machine, was built by Marvin Minsky and Dean Edmonds in 1951. It is often cited as the first machine that could learn from experience. Over time, the sophistication of artificial neural networks has grown immensely. In 1980, Kunihiko Fukushima published the neocognitron, an artificial neural network for recognizing patterns such as handwritten characters, which became the main inspiration for convolutional neural networks (CNNs).
A CNN uses a stack of hidden layers to uncover specific features or patterns. It transforms the data in stages: early layers detect simple features that are easy to extract, such as edges, while later layers combine them into more complex and more specific features. After these features are extracted, the CNN’s last layer classifies the input by assigning a probability to each possible class.
In 1982, John Hopfield described a form of recurrent neural network (RNN) that acts as a content-addressable memory system; recurrent architectures later became a standard way to help algorithms interpret sequences. In 1997, Sepp Hochreiter and Jürgen Schmidhuber improved the performance and efficiency of RNNs by developing the long short-term memory (LSTM) recurrent neural network.
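To make the staged processing of a CNN more concrete, here is a minimal sketch in plain Python. It is not a trained network: the two 3x3 filters are hand-picked rather than learned, and the names `conv2d`, `classify`, `VERTICAL_EDGE`, and `HORIZONTAL_EDGE` are illustrative assumptions. It shows only the general shape of the pipeline: a convolution extracts simple features, pooling summarizes them, and a final softmax turns the scores into class probabilities.

```python
import math

def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hand-picked filters (assumption: a real CNN would learn these from data).
VERTICAL_EDGE = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

# Toy 5x5 image: dark on the left, bright on the right (a vertical edge).
IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]

def classify(image):
    """Return [P(vertical edge), P(horizontal edge)] for a toy image."""
    # Stage 1: extract simple features with one filter per class.
    vertical = relu(conv2d(image, VERTICAL_EDGE))
    horizontal = relu(conv2d(image, HORIZONTAL_EDGE))
    # Stage 2: global max pooling reduces each feature map to one score.
    scores = [
        max(max(row) for row in vertical),
        max(max(row) for row in horizontal),
    ]
    # Final stage: turn the scores into class probabilities.
    return softmax(scores)

if __name__ == "__main__":
    # The vertical-edge class receives the larger probability for IMAGE.
    print(classify(IMAGE))
```

In a real CNN there would be many filters per layer, several layers stacked on top of each other, and filter values learned by training rather than written by hand.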
Today, CNNs handle a wide range of tasks, most notably image classification, and also appear in some language models for natural language processing (NLP). Because an RNN carries state from one step of a sequence to the next, it can capture temporal dynamic behavior, which is particularly helpful in speech recognition, grammar learning, and music composition.
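One way to see what "temporal dynamic behavior" means is a minimal sketch of a single-unit recurrent network in plain Python. The weights `w_in` and `w_rec` and the function name `rnn_final_state` are arbitrary assumptions chosen for illustration, and nothing is trained here. The point is only that the hidden state depends on the order of the inputs, not just on which inputs were seen.

```python
import math

def rnn_final_state(inputs, w_in=0.5, w_rec=0.9, h0=0.0):
    """Run a one-unit recurrent network over a sequence.

    At each step the new hidden state mixes the current input with the
    previous hidden state: h_t = tanh(w_in * x_t + w_rec * h_{t-1}).
    """
    h = h0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
    return h

if __name__ == "__main__":
    # Same values, different order: the final hidden states differ.
    early_signal = rnn_final_state([1.0, 0.0, 0.0])
    late_signal = rnn_final_state([0.0, 0.0, 1.0])
    print(early_signal, late_signal)
```

A signal seen early in the sequence fades as it passes through the recurrent weight at each step, which is the kind of weakness over long sequences that LSTM networks were designed to reduce.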
Neural networks enable a machine to understand and generate language by capturing intent, entity, and context.
Every communication has at least one intent, and many have several. For example, a single email can both share travel details and request a meeting. Knowing the intent(s) of a communication improves the accuracy of the context that a neural network builds.
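The idea that one email can carry several intents can be sketched as a multi-label lookup. The example below uses simple keyword matching as a stand-in for a neural classifier, and the intent names and keyword sets in `INTENT_KEYWORDS` are hypothetical; a production system would learn these associations from labeled data.

```python
import re

# Hypothetical intents and cue words, chosen only for illustration.
INTENT_KEYWORDS = {
    "travel_details": {"flight", "hotel", "itinerary", "boarding"},
    "meeting_request": {"meeting", "schedule", "calendar", "availability"},
}

def detect_intents(text):
    """Return a sorted list of every intent whose cue words appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        intent
        for intent, keywords in INTENT_KEYWORDS.items()
        if tokens & keywords
    )

if __name__ == "__main__":
    email = "My flight lands at 3pm. Can we schedule a meeting for the next morning?"
    print(detect_intents(email))  # ['meeting_request', 'travel_details']
```

Returning a list rather than a single label reflects the point above: the detector should be free to report more than one intent per message.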
Tags that identify a “part of speech” or an “entity” are also important components for building context. For instance, knowing whether the word 'apple' refers to the company or the fruit is a necessary step in understanding the context. To learn sentence structure effectively and mimic the tone of the user behind an email, an application also needs to know each word's part of speech (e.g., noun, verb, or adjective). In addition to word usage, the flow of the email is important. For example, people take many different approaches when they start an email. One may offer a warm greeting, another may lead with a relevant news hook, and yet another may prefer an event recap. Identifying these patterns improves an algorithm's understanding of each user's unique writing style for a particular recipient.
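The 'apple' example above can be illustrated with a small sketch that uses surrounding words to pick an entity tag. This is a counting heuristic, not a neural tagger, and the cue-word sets and the tag names `ORG`, `FOOD`, and `UNKNOWN` are assumptions made for the example.

```python
import re

# Hypothetical context cues; a trained model would learn these from data.
COMPANY_CUES = {"iphone", "stock", "shares", "ceo", "launch", "inc"}
FRUIT_CUES = {"eat", "pie", "juice", "orchard", "ripe", "fruit"}

def disambiguate_apple(sentence):
    """Guess whether 'apple' in a sentence is a company or a fruit.

    Counts how many cue words from each set appear in the sentence and
    returns the tag with more matches, or 'UNKNOWN' on a tie.
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    company_hits = len(tokens & COMPANY_CUES)
    fruit_hits = len(tokens & FRUIT_CUES)
    if company_hits > fruit_hits:
        return "ORG"
    if fruit_hits > company_hits:
        return "FOOD"
    return "UNKNOWN"

if __name__ == "__main__":
    print(disambiguate_apple("Apple shares rose after the iPhone launch."))  # ORG
    print(disambiguate_apple("She baked an apple pie with ripe fruit."))     # FOOD
```

The tie case matters: when the sentence offers no usable context, reporting `UNKNOWN` is safer than guessing, because a wrong entity tag would distort the context built on top of it.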