Algorithm about selection of the characteristic element in a clustering process's set
UDC
004.424.4, 004.93.14, 004.021, 0Abstract
People use classification for objects organization into groups since ancient times. In one of his articles Robert Sokal notes that classification is high level of intellectual activity and it helps to understand the nature. Clustering is result of software algorithms applying to classification. This approach allows deploying data mining to classified information. The article describes an algorithm for a cluster characteristic element selection and its formal requirements definition. One of areas for the algorithm’s applying is intellectual text search systems. A main purpose of the article is description of an algorithm for characteristic element selection. The algorithm should have less asymptotic estimate operating time than enumeration of all elements. A main idea based on the classical method of branches and borders. An original part of the algorithm is errors estimates comparison for selected characteristic element. Also, the article describes two algorithms for random test data generation. Showed results of these tests illustrate and explain advantages of the main algorithm in comparison with the enumeration algorithm. An empirical assessment of the proposed algorithm convergence demonstrates its better efficiency. We plan to use the article results in intellectual text search area. Clustering and neural networks are main approaches used in this area.
Keywords:
branch and bound method, text search, graph models, Damerau-Lowenstein metricAcknowledgement
References
- Sokel R.R. Klaster-analiz i klassifikatsiya: predposylki i osnovnye napravleniya [Cluster analysis and classification: the background and the main directions]. In Dzh. Ven Rayzina (Ed.). Klassifikatsiya i klaster [Classification and cluster]. Moscow, Mir Publ., 1980, pp. 7-19. (In Russian)
- Levenshteyn V. I. Dvoichnye kody s ispravleniem vypadeniy, vstavok i zameshcheniy simvolov [Binary codes with correction for deletions, insertions and substitutions of characters]. Doklady Akademii nauk SSSR [Rep. of the Academy of Sciences of the USSR], 1965, vol. 163, no. 4, pp. 845-848. (In Russian)
- Land A.H., Doig A.G. An automatic method of solving discrete programming problems. Econometrica, 1960, pp. 497-520. doi: 10.1.1.308.7332.
- Bard G.V. Spelling-error tolerant, order-independent pass-phrases via the Damerau-Levenshtein string-edit distance metric. Proc. of ACSW '07 Proceedings of the Fifth Australasian symposium on ACSW frontiers, 2007, vol. 68, pp. 117-124.
- Garnaga V.V., Kol'tsov Yu.V., Trofimov B.I. Postroenie mekhanizma neyrosetevogo poiska na osnove algoritma rasshiryayushchegosya neyronnogo gaza [The construction of the mechanism of neural network based search algorithm expanding neural gas]. Izvestiya vuzov. Severo-Kavkazskiy region. Tekhnicheskie nauki [Proc. of the universities. North Caucasus region. Technical science], 2014, iss. 6, pp. 12-17. doi: 10.17213/0321-2653-2014-6-12-17
Downloads
Issue
Pages
Submitted
Published
How to Cite
Copyright (c) 2015 Trofimov B.I., Koltsov Yu.V., Garnaga V.V.
![Creative Commons License](http://i.creativecommons.org/l/by/4.0/88x31.png)
This work is licensed under a Creative Commons Attribution 4.0 International License.