For power users, we may support “bring your own detection algorithms” and ensemble methods. We also want to support multi-dimensional detection methods. In addition, the algorithms should be able to optionally account for seasonality in the data, and should detect not only first- and second-order changes in magnitude, but also non-distributional effects such as changes in frequency content and phase. Our goal is to build algorithms that are: (1) lightweight, i.e., streaming, (2) unsupervised, (3) accurate, (4) interactive, i.e., requiring little training time before they can detect anomalies, and (5) embarrassingly parallelizable and elastic. In this document, we propose to build anomaly detection as a feature on Elasticsearch.
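To make the "lightweight, streaming, unsupervised" goals a bit more concrete, here is a minimal sketch of a detector that sees each point once and keeps only constant state. It is deliberately not Random Cut Forest: it is a simple running z-score (using Welford's online mean/variance update) swapped in for illustration. The class name, the `threshold` and `warmup` parameters, and their defaults are all hypothetical. It only reacts to shifts in magnitude, and does nothing about seasonality, frequency content, or phase.

```python
import math


class StreamingZScoreDetector:
    """Streaming, unsupervised detector based on a running mean and variance.

    Illustrative stand-in only; this is NOT Random Cut Forest.
    """

    def __init__(self, threshold=4.0, warmup=30):
        self.threshold = threshold  # hypothetical cutoff, in standard deviations
        self.warmup = warmup        # points to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations (Welford)

    def score(self, x):
        """Return the absolute z-score of x against the points seen so far."""
        if self.count < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.count - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

    def update(self, x):
        """Fold x into the running statistics in O(1) time and memory."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def process(self, x):
        """Score x first, then learn from it. Returns (score, is_anomaly)."""
        s = self.score(x)
        is_anomaly = self.count >= self.warmup and s > self.threshold
        self.update(x)
        return s, is_anomaly


def detect(stream, **kwargs):
    """Return the indices of points flagged as anomalous in one pass."""
    detector = StreamingZScoreDetector(**kwargs)
    return [i for i, x in enumerate(stream) if detector.process(x)[1]]


if __name__ == "__main__":
    # Synthetic, made-up data: a small repeating pattern with one injected spike.
    stream = [10.0 + (i % 5) * 0.1 for i in range(100)]
    stream[60] = 50.0
    print(detect(stream))
```

Because each detector holds only three numbers of state, one instance per field or per entity can run independently, which is the sense in which a design like this parallelizes trivially. A real detector would need a richer model to meet the accuracy and seasonality goals listed above.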
Analytics data continues to increase exponentially with time. An exponential data surge reduces the usefulness of the traditional analytics user workflow, which has been a set of canned queries and dashboards. This is because actionable queries require keeping track of data changes and the distribution of each field over time, which is harder to achieve when data volumes increase significantly. Heterogeneity (e.g., attack behavior in security traces) complicates understanding the data itself. Our goal is to build a set of real-time analytics features for Elasticsearch that make it easier for Open Distro for Elasticsearch users to automatically mine real-time patterns across data streams at ingestion. We want to provide users with an interactive and guided exploration of data without having to worry about tuning the analytics “blackbox” that includes the models, hyperparameters, and labels (with an option, however, for semi-supervised approaches).

The first step towards actionable data analytics is anomaly detection. We started off by building and releasing anomaly detection as an Open Distro for Elasticsearch feature. The feature includes a nice mix of machine learning algorithms, statistical methods, systems work, visualization and UI, and enterprise primitives (for working on anomalies). In this blog, we discuss the foundational aspects upon which anomaly detection is built: the Random Cut Forest (RCF) machine learning algorithms underpinning the detection, the system architecture, and the workflow.
Today, we released Anomaly Detection (preview) on Open Distro for Elasticsearch. We are excited to continue our work on anomaly detection as part of Open Distro for Elasticsearch in the coming months, and we invite developers in the larger search community to join in and co-develop parts of it.

This post was imported from the Open Distro for Elasticsearch blog, a predecessor project of OpenSearch.