Anomaly detection in a distributed system using generated log files

2019-02-24 15:56:06

Summary

I am developing an AI tool for anomaly detection in a distributed system. The system supports an interface that combines several individual logs into a single log file, generating about 7000 entries/min. The log entries are partly system-generated (D-Bus, IPC, …) and partly human-written text messages (Status not received, initialized successfully, …). The developers use the generated log for debugging. The entries are configured to have a similar format depending on the generating system (timestamp, ids, component, context, verbosity level, description, …).

Background:
1. The history of identified anomalies is minimal and not archived.
2. There are not many similar event templates in the log files.
3. Software execution rules are not clearly documented.
4. The log events are correlated.

Which algorithms (statistical, NLP, ML, neural networks) are recommended for efficiently extracting patterns from the entries and identifying existing and new anomalous behavior?

Full text

Anomaly Detection in distributed system using generated log file

Asked 7 years ago Modified today Viewed 361 times

I am developing an AI tool for anomaly detection in a distributed system. The system supports an interface that combines several individual logs into a single log file, generating approx. 7000 entries/min. The log entries are partly system-generated (D-Bus, IPC, …) and partly human-written statements (Status not received, initialized successfully, …). The developers use the generated log for debugging. The entries have been configured to have a similar format depending on the generating system (timestamp, ids, component, context, verbosity level, description, …).

Background:
1. The history of identified anomalies is minimal and not archived.
2. There are not many similar event templates in the log files.
3. Software execution rules are not clearly documented.
4. The log events are correlated.

What are the recommended algorithms (Statistical, NLP, ML, Neural networks) that can be used to efficiently perform pattern extraction on the entries and identify existing and new anomalous behavior?
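
A common first step for pattern extraction on entries like these is to collapse each description into an event template by masking the parts that vary. The following is only a minimal sketch under assumptions: the masking rules, the placeholder names `<HEX>` and `<NUM>`, and the sample messages are hypothetical and would need to be adapted to the real log format.

```python
import re
from collections import Counter

# Hypothetical masking rules: which tokens count as "variable" depends on
# the actual log format, which is not specified in the question.
_MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),  # hexadecimal ids or addresses
    (re.compile(r"\d+"), "<NUM>"),             # any remaining run of digits
]


def to_template(description: str) -> str:
    """Replace variable-looking tokens so similar messages share one template."""
    for pattern, placeholder in _MASKS:
        description = pattern.sub(placeholder, description)
    # Normalise whitespace so spacing differences do not create new templates.
    return " ".join(description.split())


def count_templates(descriptions):
    """Count how often each template occurs in an iterable of descriptions."""
    return Counter(to_template(d) for d in descriptions)
```

For example, `count_templates(["Status not received from node 17", "Status not received from node 3"])` groups both messages under the single template `Status not received from node <NUM>`. The resulting template ids can then serve as the discrete events that a statistical or ML method works on.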

neural-networks machine-learning natural-language-processing pattern-recognition

Asked Feb 24, 2019 at 15:56 by Ben; edited Mar 2, 2019 at 12:38

Are your categories only "anomalous" and "not anomalous", or do you want to perform anomaly detection and categorisation (as two different tasks)? (nbro, commented Feb 24, 2019 at 16:28)

The main goal is to perform anomaly detection. By categorization, I meant the extraction of features from the log events relevant to the identification of the anomalous behavior. (Ben, commented Feb 24, 2019 at 17:28)

Is the data going to be continuously provided (that is, will you keep receiving a stream of data) or is the data contained in a set which will not change? (nbro, commented Feb 24, 2019 at 17:29)

I am looking into the stream of data already present (offline). But in the future, it is desired to extend the method to work on a stream of incoming data (online). (Ben, commented Feb 24, 2019 at 17:34)

What percentage of the user logs are expected to be attacks? (Cloud Cho, commented Nov 7, 2023 at 0:12)
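
The comments above mention starting offline and moving to an online stream later. One way to keep both options open, sketched here under assumptions, is to write the detector so it only ever sees one value at a time. The example uses Welford's online algorithm for running mean and variance on per-minute entry counts; the `warmup` parameter, the choice of entry counts as the signal, and the z-score itself are illustrative choices, not something the thread prescribes.

```python
import math


class RunningStats:
    """Welford's online algorithm for the running mean and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values were seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def z_scores(counts, warmup=5):
    """Yield (count, z) pairs, scoring each count against earlier counts only.

    `counts` can be a list (offline replay) or any iterator (online stream).
    """
    stats = RunningStats()
    for c in counts:
        std = stats.std()
        z = (c - stats.mean) / std if stats.n >= warmup and std > 0 else 0.0
        yield c, z
        stats.update(c)
```

Because `z_scores` accepts any iterable, the same function can replay an archived log file today and consume a live feed later. A large absolute z-score only says that the volume is unusual; it does not say which entries caused it.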

1 Answer

In the paper "Unsupervised real-time anomaly detection for streaming data" (by Subutai Ahmad, Alexander Lavin, Scott Purdy and Zuha Agha, 2017), an algorithm for anomaly detection is described that is particularly suited to cases where a stream of data is continuously provided. This algorithm is based on Numenta's Hierarchical Temporal Memory model.

I've actually never used it, but I know that Numenta's work is particularly suited for anomaly detection. You can have a look at it and see if it fits your needs. Also have a look at the Numenta Anomaly Benchmark (NAB).
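
To illustrate the general idea of scoring a stream by how surprising the next event is, here is a deliberately simplified stand-in. It is not Hierarchical Temporal Memory and not the algorithm from the paper: it is a first-order transition model over event ids, and the class name, method names, and event labels are all hypothetical.

```python
from collections import Counter, defaultdict


class TransitionModel:
    """First-order model of which event tends to follow which."""

    def __init__(self):
        # _counts[prev][nxt] = how often `nxt` directly followed `prev`.
        self._counts = defaultdict(Counter)

    def fit(self, events):
        """Count transitions in a list of event ids taken from normal operation."""
        for prev, nxt in zip(events, events[1:]):
            self._counts[prev][nxt] += 1
        return self

    def surprise(self, prev, nxt) -> float:
        """Return 1 - P(nxt | prev); 1.0 if `prev` was never seen in training."""
        seen = self._counts.get(prev)
        if not seen:
            return 1.0
        return 1.0 - seen[nxt] / sum(seen.values())

    def score(self, events):
        """Surprise for each consecutive pair in a list of event ids."""
        return [self.surprise(p, n) for p, n in zip(events, events[1:])]
```

A model fitted on a repeating `init`, `ready`, `request`, `reply` cycle gives a surprise of 0.0 for `init` followed by `ready` and 1.0 for `ready` followed by `reply`. Since the question says the log events are correlated, a sequence-aware score of this kind may catch anomalies that per-entry checks miss, although a single-step model will not capture longer-range dependencies.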

Answered Feb 24, 2019 at 17:47 by nbro

Thank you for the reference. I am new to machine-learning-related topics. I am a bit confused about the datasets (logs) I have. Do I need to start by thinking about how to label them (anomalous/normal behavior)? Or can I perform anomaly detection on the unlabeled dataset? (Ben, commented Feb 24, 2019 at 20:01)

@Ben In this case, I don't think you will need labelled data. Anomaly detection only consists in finding certain patterns in the data (so this is an unsupervised learning technique). (nbro, commented Feb 24, 2019 at 20:19)

Do you have suggestions for any other unsupervised learning techniques? The Numenta approach cannot be utilized for my work due to license issues. (Ben, commented Mar 2, 2019 at 12:37)

@Ben I would not have other suggestions (right now). But why is the license an issue? I think they have an open-source option. (nbro, commented Mar 2, 2019 at 12:38)

Unfortunately, it is blacklisted for use in my company. (Ben, commented Mar 2, 2019 at 12:40)
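
As a concrete illustration of working without labels, and one that needs only the Python standard library, the sketch below scores each event by how rare it is in the data seen so far. This is an assumption-laden example rather than a recommendation from the thread: the function names are hypothetical, and add-one smoothing is just one simple way to give unseen events a finite score.

```python
import math
from collections import Counter


def fit_frequencies(templates):
    """Count occurrences of each event template in unlabeled data."""
    counts = Counter(templates)
    return counts, sum(counts.values())


def rarity(template, counts, total):
    """Negative log-probability with add-one smoothing.

    Frequent templates score low, rare ones higher, and templates that
    were never seen score highest.
    """
    vocab = len(counts) + 1  # one extra slot for "never seen before"
    p = (counts.get(template, 0) + 1) / (total + vocab)
    return -math.log(p)
```

No entry has to be marked as anomalous or normal beforehand; the only assumption is that anomalies are rare relative to normal behavior. A threshold on the score still has to be chosen, for instance by reviewing the highest-scoring entries with the developers who use the log for debugging.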

$\begingroup$ Thank you for the reference. I am new to machine learning related topics. I am a bit confused here about the datasets(logs) I have. Do I need to start by thinking about how to label them (anomalous/normal behavior)? Or can I perform anomaly detection on the unlabeled dataset? $\endgroup$ Ben – Ben 2019-02-24 20:01:49 +00:00 Commented Feb 24, 2019 at 20:01

$\begingroup$ Thank you for the reference. I am new to machine learning related topics. I am a bit confused here about the datasets(logs) I have. Do I need to start by thinking about how to label them (anomalous/normal behavior)? Or can I perform anomaly detection on the unlabeled dataset? $\endgroup$ Ben – Ben 2019-02-24 20:01:49 +00:00 Commented Feb 24, 2019 at 20:01

2019-02-24 20:01:49 +00:00

$\begingroup$ @Ben In this case, I don't think you will need labelled data. Anomaly detection only consists in finding certain patterns in the data (so this is a unsupervised learning technique). $\endgroup$ nbro – nbro 2019-02-24 20:19:06 +00:00 Commented Feb 24, 2019 at 20:19

$\begingroup$ @Ben In this case, I don't think you will need labelled data. Anomaly detection only consists in finding certain patterns in the data (so this is a unsupervised learning technique). $\endgroup$ nbro – nbro 2019-02-24 20:19:06 +00:00 Commented Feb 24, 2019 at 20:19

2019-02-24 20:19:06 +00:00

$\begingroup$ Do you have suggestions for any other unsupervised learning techniques? The Numenta approach cannot be utilized for my work due to license issues. $\endgroup$ Ben – Ben 2019-03-02 12:37:02 +00:00 Commented Mar 2, 2019 at 12:37

$\begingroup$ Do you have suggestions for any other unsupervised learning techniques? The Numenta approach cannot be utilized for my work due to license issues. $\endgroup$ Ben – Ben 2019-03-02 12:37:02 +00:00 Commented Mar 2, 2019 at 12:37

2019-03-02 12:37:02 +00:00

$\begingroup$ @Ben I would not have other suggestions (right now). But why is the license an issue? I think they have an open-source option. $\endgroup$ nbro – nbro 2019-03-02 12:38:24 +00:00 Commented Mar 2, 2019 at 12:38

$\begingroup$ @Ben I would not have other suggestions (right now). But why is the license an issue? I think they have an open-source option. $\endgroup$ nbro – nbro 2019-03-02 12:38:24 +00:00 Commented Mar 2, 2019 at 12:38

2019-03-02 12:38:24 +00:00

$\begingroup$ Unfortunately, it is blacklisted for use in my company. $\endgroup$ Ben – Ben 2019-03-02 12:40:51 +00:00 Commented Mar 2, 2019 at 12:40

$\begingroup$ Unfortunately, it is blacklisted for use in my company. $\endgroup$ Ben – Ben 2019-03-02 12:40:51 +00:00 Commented Mar 2, 2019 at 12:40

2019-03-02 12:40:51 +00:00

| Show 5 more comments
