Author: Jay Jacobs and Bob Rudis
Reviewer: Ben Rothke
Summary: Superb book for effective use of data for information security
There is a not so fine line between data dashboards and other information displays that provide pretty but otherwise useless and unactionable information; and those that provide effective answers to key questions. Data-Driven Security: Analysis, Visualization and Dashboardsis all about the later.
In this extremely valuable book, authors Jay Jacobs and Bob Rudis show you how to find security patterns in your data logs and extract enough information from it to create effective information security countermeasures. By using data correctly and truly understanding what that data means, the authors show how you can achieve much greater levels of security.
The book is meant for a serious reader who is willing to put in the time and effort to learn the programming necessary (mainly in Python and R) to truly understand what information exists deep in the recesses of their logs. As to R, it is a GNU project and a free software programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. For analysis the level of which Jacobs and Rudis prescribe, R is a godsend.
The following are the 12 densely packed chapters in the book:
1 : The Journey to Data-Driven Security
2 : Building Your Analytics Toolbox: A Primer on Using R and Python for Security Analysis
3 : Learning the "Hello World" of Security Data Analysis
4 : Performing Exploratory Security Data Analysis
5 : From Maps to Regression
6 : Visualizing Security Data
7 : Learning from Security Breaches
8 : Breaking Up with Your Relational Database
9 : Demystifying Machine Learning
10 : Designing Effective Security Dashboards
11 : Building Interactive Security Visualizations
12 : Moving Toward Data-Driven Security
After completing the book, the reader will have the ability to know which questions to ask to gain security insights, and use that data to ensure the overall security of their data and networks. Getting to that level is not a trivial at all a trivial task; even if there are vendors who can promise to do that.
For many people performing data analysis, the dependable Excel spreadsheet is their basic choice for data manipulation. The book calls the spreadsheet a gateway tool between a text editor and programming. The book notes that spreadsheets work as long as the data is not too large or complex. The book quotes a 2013 report to shareholders from J.P. Morgan in which parts of their 2012 $6 billion in losses was due in part to problems with their Excel spreadsheets.
The authors suggest using Excel as a temporary solution for quick one-shot tasks. For those that have repeating analytical tasks or models that are used repeatedly, it's best to move to some type of structured programming language, specifically those that the book suggest and for provides significant amounts of code examples; all of which are available on the companion website here.
The goal of all data extraction is to use data analysis to answer real questions. A large part of the book focuses on how to ask the right question. In chapter 1, the authors write that every good data analysis project begins with setting a goal and creating one or more research questions. Without a well-formed question guiding the analysis, you may wasting time and energy seeking convenient answers in the data, or worse, you may end up answering a question that nobody was asking in the first place.
The value of the book is that it shows the reader how to focus on context and purpose of the data analysis by setting the research question appropriately; rather than simply parsing large amounts of data. It's ultimately irrelevant if you can use Hadoop to process petabytes of data if you don't know what you are looking for.
Visualization is a large part of what this book is about, and in chapter 6 — Visualizing Security Data, the book notes that the most efficient path to human understanding is via the visual sense. It goes on to details the many advantages data visualization has, and the key to making it work.
As important as visualization is, describing the data is equally important. In chapter 7, the book introduces the VERIS(Vocabulary for Event Recording and Incident Sharing) framework. VERIS is a set of metrics designed to provide a common language for describing security incidents in a structured and repeatable manner. VERIS helps organizations collect useful incident-related information and to share that information, anonymously and responsibly with others.
The book shows how you can use dashboards for effective data visualization. But the authors warn that a dashboard is notan art show. They caution that given the graphical nature of dashboards, it's easy to fall into the trap of making them look like pieces of modern or fringe art; when they are far more akin to architectural and industrial diagrams that require more controlled, deliberate and constrained design.
As to dashboards the authors do not like, they consider the Cyber Security Situational Awarenessto be glitzy but not informative. Personally, I thought the dashboard has a lot of good information.
The book uses the definition of dashboardaccording to Stephen Few, in that it's a "visual display of the most important information needed to achieve one or more objectives that has been consolidated in a single computer screen so it can be monitored at a glance". The book enables the reader to create dashboards like that.
Data-Driven Security: Analysis, Visualization and Dashboardsis a superb book written by two experts who provide significant amounts of valuable information in every chapter. For those that are willing to put the time and effort into the serious amount of work that the book requires, they will find it a vital resource that will certainly help them achieve much higher levels of security.
Reviewed by Ben Rothke"