Algorithm about selection of the characteristic element in a clustering process's set
UDC
004.424.4, 004.93.14, 004.021, 0EDN
VHRPNBAbstract
People use classification for objects organization into groups since ancient times. In one of his articles Robert Sokal notes that classification is high level of intellectual activity and it helps to understand the nature. Clustering is result of software algorithms applying to classification. This approach allows deploying data mining to classified information. The article describes an algorithm for a cluster characteristic element selection and its formal requirements definition. One of areas for the algorithm’s applying is intellectual text search systems. A main purpose of the article is description of an algorithm for characteristic element selection. The algorithm should have less asymptotic estimate operating time than enumeration of all elements. A main idea based on the classical method of branches and borders. An original part of the algorithm is errors estimates comparison for selected characteristic element. Also, the article describes two algorithms for random test data generation. Showed results of these tests illustrate and explain advantages of the main algorithm in comparison with the enumeration algorithm. An empirical assessment of the proposed algorithm convergence demonstrates its better efficiency. We plan to use the article results in intellectual text search area. Clustering and neural networks are main approaches used in this area.
Keywords:
branch and bound method, text search, graph models, Damerau-Lowenstein metricFunding information
Работа выполнена при поддержке РФФИ (13-01-00807).
References
- Сокэл Р.Р. Кластер-анализ и классификация: предпосылки и основные направления. В кн.: Классификация и кластер / Под ред. Дж. Вэн Райзина. М.: Мир, 1980. С. 7-19. [Sokel R.R. Klaster-analiz i klassifikatsiya: predposylki i osnovnye napravleniya [Cluster analysis and classification: the background and the main directions]. In Dzh. Ven Rayzina (Ed.). Klassifikatsiya i klaster [Classification and cluster]. Moscow, Mir Publ., 1980, pp. 7-19. (In Russian)]
- Левенштейн В. И. Двоичные коды с исправлением выпадений, вставок и замещений символов // ДАН СССР. 1965. Т. 163. № 4. С. 845-848. [Levenshteyn V. I. Dvoichnye kody s ispravleniem vypadeniy, vstavok i zameshcheniy simvolov [Binary codes with correction for deletions, insertions and substitutions of characters]. Doklady Akademii nauk SSSR [Rep. of the Academy of Sciences of the USSR], 1965, vol. 163, no. 4, pp. 845-848. (In Russian)]
- Land A.H., Doig A.G. An automatic method of solving discrete programming problems // Econometrica. 1960. С. 497-520. doi: 10.1.1.308.7332.
- Bard G.V. Spelling-error tolerant, order-independent pass-phrases via the Damerau-Levenshtein string-edit distance metric // Proc. of ACSW '07 Proceedings of the Fifth Australasian symposium on ACSW frontiers. 2007. Vol. 68. P. 117-124.
- Гарнага В.В., Кольцов Ю.В., Трофимов Б.И. Построение механизма нейросетевого поиска на основе алгоритма расширяющегося нейронного газа // Известия вузов. Северо-Кавказский регион. Технические науки. 2014. Вып. 6. С. 12-17. doi: 10.17213/0321-2653-2014-6-12-17 [Garnaga V.V., Kol'tsov Yu.V., Trofimov B.I. Postroenie mekhanizma neyrosetevogo poiska na osnove algoritma rasshiryayushchegosya neyronnogo gaza [The construction of the mechanism of neural network based search algorithm expanding neural gas]. Izvestiya vuzov. Severo-Kavkazskiy region. Tekhnicheskie nauki [Proc. of the universities. North Caucasus region. Technical science], 2014, iss. 6, pp. 12-17. doi: 10.17213/0321-2653-2014-6-12-17]
Downloads
Downloads
Dates
Submitted
Accepted
Published
How to Cite
License
Copyright (c) 2015 Трофимов Б.И., Кольцов Ю.В., Гарнага В.В.

This work is licensed under a Creative Commons Attribution 4.0 International License.