Explore chapters and articles related to this topic
Biological Data Mining:
Published in Wahiba Ben Abdessalem Karaa, Nilanjan Dey, Mining Multimedia Documents, 2017
Amira S. Ashour, Nilanjan Dey, Dac-Nhuong Le
Understanding DNA is closely related to computational problems in pattern recognition, machine learning, and information extraction through data mining. Researchers are interested in intelligent systems that address leading computational genomics problems, such as genome annotation to identify and classify genes, computational comparative genomics to compare complete genomic sequences at different levels, and genomic pattern discovery, including the identification of regulatory regions in sequence data. These problems are essential to understanding how biological organisms function and interact with their environment. Understanding genes facilitates the development of new treatments for genetic diseases, innovative antibiotics, and other drugs. Biological sequence mining is applied to discover a precise model of an organism's genome structure, providing informative characteristics of a sequence together with its meaning.
Discovery of effective infrequent sequences based on maximum probability path
Published in Connection Science, 2022
Ke Lu, Xianwen Fang, Na Fang, Esther Asare
Some anomalous activities occur relatively infrequently but have a serious impact, which is why several studies deal with anomalous activity specifically. Ghionna et al. (2008) use a two-step method to detect outliers in infrequent logs: log clusters are computed and their average size is obtained, and individuals that hardly belong to any cluster, or that belong to clusters smaller than the average size, are treated as noise activities. In Sani et al. (2018), a sequence mining algorithm is used to discover the flow relationships between sequential patterns and rules, as well as between long-distance activities. This filtering method detects and removes abnormal behaviour more accurately, especially for event data with heavy parallelism and long-term dependencies. Tax et al. (2019) propose a technique derived from information theory and Bayesian statistics to filter chaotic activities out of event logs. Experiments show that the filtering technique outperforms the frequency-based algorithm, but the performance of the four activity filtering techniques depends heavily on the characteristics of the event logs. Krajsic and Franczyk (2020) perform unsupervised anomaly detection on event streams in an online setting, resulting in a better model.
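The cluster-size idea attributed to Ghionna et al. (2008) can be illustrated with a minimal sketch. It is not the authors' algorithm: as a loudly simplified assumption, a "cluster" here is just the group of identical traces (a trace variant), and the function name `flag_small_cluster_traces` and the sample log are hypothetical. The sketch only shows the second step, in which traces belonging to clusters smaller than the average cluster size are flagged as candidate noise.

```python
from collections import Counter


def flag_small_cluster_traces(traces):
    """Return the distinct traces whose cluster is smaller than the average cluster size."""
    # Assumption: identical traces form one cluster. A real method would
    # cluster similar, not only identical, traces.
    clusters = Counter(tuple(trace) for trace in traces)
    if not clusters:
        return []
    # Average number of traces per cluster.
    average_size = sum(clusters.values()) / len(clusters)
    # Traces in clusters below the average size are treated as candidate noise.
    return [list(variant) for variant, size in clusters.items() if size < average_size]


# Hypothetical event log: each trace is a sequence of activity names.
log = [
    ["register", "check", "approve"],
    ["register", "check", "approve"],
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "reject"],
    ["register", "approve", "check", "approve"],
]

noisy = flag_small_cluster_traces(log)
print(noisy)
```

With this log the cluster sizes are 3, 2, and 1, so the average is 2 and only the single-occurrence trace falls below it. Using a strict "smaller than" comparison is a design choice of the sketch. A cluster exactly at the average size is kept.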