DIC can be visualized as a train running over the data, with stops at fixed intervals of transactions. The passengers on the train are itemsets: while an itemset is on the train, we count its occurrences in the transactions that are read.
When the Apriori algorithm is considered in this metaphor, all itemsets get on at the start of a pass and get off at the end: the 1-itemsets take the first pass, the 2-itemsets take the second pass, and so on.
In DIC, there is the added flexibility of allowing itemsets to get on at any stop, as long as they get off at the same stop the next time the train goes around. By that point, the itemset has seen all the transactions in the file.
We will start counting the 1-itemsets immediately, but we will begin counting 2-itemsets only after the first 10,000 transactions have been read, and 3-itemsets after 20,000 transactions. For now, let us assume that there are no 4-itemsets we need to count. Once we get to the end of the file, we will stop counting the 1-itemsets and go back to the start of the file to continue counting the 2- and 3-itemsets. After the next 10,000 transactions, we will finish counting the 2-itemsets, and after 20,000 transactions we will finish counting the 3-itemsets.
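Under assumed figures (a 40,000-transaction file with stops every 10,000 transactions; both numbers are illustrative and not stated in the text), the schedule above can be checked with a few lines of arithmetic:

```python
# Hypothetical figures for illustration only.
DB_SIZE = 40_000        # assumed number of transactions in the file
STOP_INTERVAL = 10_000  # assumed spacing between train stops

def transactions_read_until_done(k: int) -> int:
    """Total transactions read before the k-itemsets finish counting.

    k-itemsets board (k - 1) stops into the file and must then ride
    until the same stop on the next lap, i.e. DB_SIZE more transactions.
    """
    start = (k - 1) * STOP_INTERVAL
    return start + DB_SIZE

# The 3-itemsets board 20,000 transactions in and finish 20,000
# transactions into the second lap.
total_read = transactions_read_until_done(3)
passes = total_read / DB_SIZE
print(passes)  # 1.5
```

With these assumed figures the whole computation finishes in 1.5 passes over the data, compared with three full passes for level-wise Apriori counting 1-, 2-, and 3-itemsets.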
In total, we have made fewer than two passes over the data. For notational convenience, we assign numbers to each stop sequentially. We then define four different structures: the dashed box, the dashed circle, the solid box, and the solid circle. Each of these structures maintains a list of itemsets. The counter keeps track of the support value of the corresponding itemset, and the stop number keeps track of whether an itemset has completed one full pass over the database. The itemsets in the solid box are the confirmed frequent sets.
The itemsets in the solid circle are the confirmed infrequent sets. The algorithm counts the support values of the itemsets in the dashed structures as it moves from one stop point to the next.
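Concretely, the four structures can be sketched as plain containers (the names follow the box/circle metaphor and are illustrative, not the authors' code). Each itemset in a dashed structure carries its support counter and the stop number at which it boarded:

```python
# Dashed structures hold itemsets still being counted:
#   itemset -> [counter, stop_number]
dashed_circle = {}  # suspected infrequent
dashed_box = {}     # suspected frequent
# Solid structures hold itemsets whose counting is finished.
solid_circle = set()  # confirmed infrequent
solid_box = set()     # confirmed frequent

# At the start of DIC, every 1-itemset boards at stop 0 with a zero counter.
for item in ("a", "b", "c"):  # assumed toy item universe
    dashed_circle[frozenset({item})] = [0, 0]
```

Frozensets are used as keys so that itemsets can be looked up regardless of item order.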
During the execution of the algorithm, the following events take place at each stop point. Itemsets move from the dashed circle to the dashed box: these are the itemsets whose support counts reach the minimum support threshold during this iteration, that is, while reading the records between two consecutive stops.
New candidate itemsets are added to the dashed circle: these are essentially the supersets of the itemsets that just moved from the dashed circle to the dashed box.
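A minimal sketch of this stop-point bookkeeping might look as follows. The names are assumed, and one simplification is made: instead of comparing stop numbers, each dashed entry is `[counter, stops_left]`, a countdown of the remaining stops before the itemset has seen the whole file, which is equivalent to checking whether the train is back at the itemset's boarding stop:

```python
def process_stop(dashed_circle, dashed_box, solid_circle, solid_box,
                 minsup, all_items, total_stops):
    """One stop of a DIC-style loop (illustrative sketch, not the paper's code).

    Dashed structures map itemset -> [counter, stops_left]; minsup is an
    absolute support threshold; total_stops is the number of stops per lap.
    """
    # Every dashed itemset was just counted through one more interval.
    for structure in (dashed_circle, dashed_box):
        for entry in structure.values():
            entry[1] -= 1
    # Itemsets whose counts reached minsup move from the dashed circle
    # to the dashed box, and their supersets become new candidates.
    for itemset in [i for i, (cnt, _) in dashed_circle.items() if cnt >= minsup]:
        dashed_box[itemset] = dashed_circle.pop(itemset)
        for item in all_items - itemset:
            candidate = itemset | {item}
            looks_frequent = dashed_box.keys() | solid_box
            # Only add a candidate whose immediate subsets all look frequent.
            if (candidate not in dashed_circle
                    and candidate not in solid_circle
                    and all(candidate - {x} in looks_frequent for x in candidate)):
                dashed_circle[candidate] = [0, total_stops]  # rides a full lap
    # Itemsets that have now seen every transaction become solid.
    for dashed, solid in ((dashed_circle, solid_circle), (dashed_box, solid_box)):
        for itemset in [i for i, (_, left) in dashed.items() if left <= 0]:
            dashed.pop(itemset)
            solid.add(itemset)
```

The subset check is slightly stronger than the prose above ("essentially the supersets"), since DIC only counts a candidate once all of its immediate subsets are suspected or confirmed frequent.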