Splunk gets talking to Alexa, via Insight Engines

One of the ‘star turns’ at the recent Splunk .conf2017 event in Washington DC didn’t come from the company but instead from one of its newer partners, and one which Splunk has – probably wisely – taken a stake in, Insight Engines.

This, according to CEO Grant Wernick, now offers capabilities in building search queries that have never really been available before. It takes what is normally considered a highly skilled coding job – building complex queries based on the analysis of machine logs – and makes it a straightforward process. Not only can it make a good fist of deciphering a poorly phrased query attempt, it can then write the required code to deliver a result and format it to the requirements of any dashboard display.

The drive behind the original development work has been in the area of security analysis, hence its current name, Cyber Security Investigator, but Wernick readily acknowledges that it can – and will – apply equally to the Biz/Ops marketplace.
The system is based on a new natural language analysis tool developed by Wernick and his team with the original goal of re-interpreting written natural language queries as code written in Splunk’s SPL query language. The objective is to move query writing, and in particular those initial queries that are intended to give some idea of the scope of a particular problem area, from being a skilled job to one that a wide range of people can manage.

The original development work assumed that applications built with the tool would run in the cloud, but the first applications users were interested in were in the security area, which operates almost exclusively on-premise, said Wernick:

It is really difficult to build an on-prem agent that can understand normal English and then parse your phrases. That is what the Cyber Security Investigator that runs on Splunk does. This has saved us a huge headache with organisations because the need is so massive, including amongst the Federal Government. There, staff are often rotated every six months, so there is a real deficit while people train up.

The natural language compiler allows users to be very lax in the language they use, accepting questions such as 'show me the network traffic yesterday around 5pm to China'. It can then decipher the concepts in that statement, such as time, location, traffic sources and the rest:
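For illustration, a question like that might compile down into SPL along these lines (a hypothetical sketch only – the index, sourcetype and field names here are assumptions, not Insight Engines' actual output):

```
index=network sourcetype=firewall dest_country="China"
  earliest=-1d@d+16h latest=-1d@d+18h
| stats sum(bytes) AS total_bytes count BY src_ip dest_ip
| sort - total_bytes
```

Even in this toy form, the query shows the concepts the compiler has to extract: a time window ('yesterday around 5pm' becomes snap-to time modifiers), a destination ('China' becomes a field filter) and an implied aggregation over traffic sources.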

Humans are really good at building things in multiple perspectives. They can look at a visualisation and rapidly see that something is out of the ordinary, but a machine would take a long time to train to do that. To write a query like this is really difficult, but people need to use something like this as a jumping-off point for deeper queries.

The system can also expose the underlying Splunk query language that it builds, so it is also a good training tool to help developers understand what the Splunk system can do and how to make it achieve those ends. They can, in effect, learn Splunk 'backwards', from a working query to the underlying code.

Wernick and his team demonstrated the system performing correlations and joins. In current circumstances it takes the right person about half a day to produce the right joins. This example was a join across two different data sets: a user authentication log and the malware records. The idea is to triage these systems by seeing the malware on a system and the number of times a user has logged in to it. That way it becomes possible to determine who is at risk of being impersonated by the attacker. Wernick said:

You can then ask the question, 'What did this user do next, what resources did the user touch?' You can triage the systems on the basis of the most log-ins or the most infections. You can then eyeball which system should be triaged first. The point here is we are helping security teams decide what they need to focus on next without having to build the SPL routines needed to do such a job. And when requirements change, which they often do, it would normally mean that code has to be rewritten. Here they just change the query.
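A join of that shape might be sketched in SPL as follows (hedged: the index names, sourcetypes and fields are assumptions for illustration, not the demo's actual code):

```
index=malware sourcetype=endpoint_alerts
| stats count AS infections BY host
| join type=inner host
    [ search index=auth sourcetype=login_events
      | stats count AS logins dc(user) AS users BY host ]
| sort - infections - logins
```

Under this sketch, changing the triage criterion – say, ranking by log-ins first rather than infections – means editing one clause of a query rather than rebuilding a half-day's worth of SPL.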

Users can phrase queries as one would when talking to a colleague and enter them using the keyboard. Regular queries can also be easily turned into a component of a dashboard. The latest version of the home page now allows the creation of up to six individual dashboards for different roles within a security team. It also ensures that team members don’t encroach on the domains of others by noting what tools are in use by those members and only allowing them access to their own domain areas.

It also logs actions taken by team members and makes them available to all other team members regardless of location, sharing knowledge around and allowing newer team members to pick up on tips’n’tricks much faster. Wernick added:

We have also released a new tool called AutoPilot. It is like an intelligent assistant, and because it can go and look at what sources of data are available it can start asking questions that a security team might never have thought of asking. It changes and augments the thresholds every time it asks a question and can then come up with new questions. It gets a lot more randomness into the operations center, which is kinda helpful when you live in the Great-Wall-Of-China world we live in today. It gets people out of their comfort zone.

Further out, the plan is to extend this into a much broader AI-based environment which will not only know what queries to set, but also understand the answers and be able to directly manage remedial action.


The best demo is also the spookiest, for at the time of the Splunk conference in Washington, the company had just announced voice interaction, having integrated the system with Amazon Alexa. So the command could be given:

Alexa, ask CSI to show me the network traffic for yesterday.

And though in a hotel room, with some doubts about WiFi connectivity, up came the results on the laptop from the company's servers. Then came more complex, convoluted questions drilling down into the data, such as specific network traffic to Germany.

Right now the capabilities are limited because the necessary security syntaxes and taxonomies are not part of the Alexa repertoire yet. But as Wernick suggested, even the demo is sufficient to get many people started on querying their security services.

The interesting part here is that this has applications well beyond security services. He is already looking with increasing inquisitiveness at the possibilities of applying the Investigator’s capabilities to other areas, such as Salesforce and IoT applications:

Imagine asking, ‘Show me account managers in the Pacific North West that closed over $1 million last quarter’. Or in IoT and industrial control, asking, ‘Tell me if boiler five is still running hot, and hotter than yesterday?’ And this is something that works with blue collar workers, not replaces them. They are still the experts and can be part of the new economy.
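The boiler question could, under similar assumptions, map to something like the following SPL (a speculative sketch with hypothetical index and field names):

```
index=iot sourcetype=boiler_telemetry boiler_id=5 earliest=-2d@d
| timechart span=1h avg(temp_c) AS avg_temp
| timewrap 1d
```

which would chart today's hourly average temperatures alongside yesterday's for direct comparison – the 'hotter than yesterday?' part of the question.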

In Wernick’s view the user community has barely tapped what can be done with machine logs across a wide range of general management capabilities. And he sees real potential in the ability to exploit machine log querying to achieve faster, richer delivery of results. As an example, a spoken query – 'has company X paid all its bills to date?' – could collect the results from all databases and spreadsheets at the same time, rather than requiring each to be queried in its own prescribed manner.

My take

Even as a consumer 'novelty', the Alexa speech recognition and synthesis system has already proved to be both flexible and productive. Coupling it with what is, on first sight, a powerful and effective natural language processing system for writing queries certainly has that 'wow' potential. With the right taxonomies programmed into it, it could not only be a good tool for security analysis and management, but a real boon in the Biz/Ops space.

Image credit – Amazon
