Masich I.S.
Siberian State Aerospace University
named after academician M. F. Reshetnev, Russia
Pattern
search in data for solving
practical
problems of
recognition
À logical method for solving recognition problem is considered in the paper. A decision making
model consists of logical rules that describe patterns in the studied phenomenon or system. The main task is to
find these patterns and to make it suitable for decision making support.
Creation and
use of logical algorithms for recognition are based on identifying patterns in
the original data. A set of these patterns forms a decision function. Search of
patterns can be considered as a problem of combinatorial optimization. For more
effective solutions the choice of optimization algorithm should be made on the
basis of the characteristic features specific to the considered optimization
problem. In this paper some properties of optimization problems that solved for
finding logical patterns in the data are considered.
Consider the
recognition problem for objects described by binary attributes and divided into
two classes . The object is describe by a binary vector and can be presented as a point in the subcube of
binary attribute space .
Denote by
a pattern that cover some object . The variables that fixed in are equal to the corresponding values of the attributes of
the object a. To define the pattern introduce binary
variables
:
Patterns are
the elementary building blocks for logical recognition algorithms. Most useful
for this purpose are the patterns with the highest coverage (maximum pattern),
that is, those for which is maximum.
The problem
of finding the maximum patterns can be written as the problem of finding such
values , so that the resulting pattern covers as many points as possible and does not cover any points [1]:
(1)
for all (2)
This problem
is a problem of constrained pseudo-Boolean optimization, i.e. an optimization
problem in which the objective function and the functions in the left side of
the constraints are pseudo-Boolean functions – real functions of Boolean
variables [2, 3].
Research
results show that the search space has sets of constancy of the objective
function. That complicates work of optimization algorithms which start search
from a feasible point and make search among the neighboring points, since the
calculation of the objective function in the system of neighborhoods consisting
of neighboring points provides no information on best direction of the search. When
solving practical problems of higher dimensions a set of constancy may be such
that it covers most of the points of the feasible region. That makes it
difficult or impossible to work for such algorithms as the genetic algorithm, multi-start
of local search.
The main
difficulty in the presence of sets of constancy, that is connected sets, in
which the function takes the same value, is the absence of information about
what direction should look for the optimal or suboptimal solutions. This
concerns the behavior of the objective function (1) and the constraint function
(2) in the space of binary variables. One way to improve the situation is use
the information not only about the pattern covers objects of the sample, but
also use data on the distance to objects uncovered yet.
An
experimental comparison of the two algorithms for search rules is made on a
number of problems of recognition [4, 5].
The
experimental results show that the use of closeness of sampling objects to the
pattern allows us to overcome the difficulties that are associated with the
characteristic features of the solved optimization problem and manifest
themselves in the presence of sets of constancy. That makes possible to find better
patterns in the data to be used in solving the problems of recognition.
Literature
1. Hammer P.L.,
Bonates T. Logical Analysis of Data: From Combinatorial Optimization to Medical
Applications. RUTCOR Research Report 10-2005, 2005.
2. A.N. Antamoshkin,
I.S. Masich. Identification of pseudo-Boolean function conditional
optimization. Engineering & automation problems. N. 2, 2007, 66-69.
3. A.N. Antamoshkin,
I.S. Masich. Heuristic search algorithms for monotone pseudo-boolean function
conditional optimization. Engineering
& automation problems.
N. 3, 2007, 41-45.
4. UCI Machine
Learning Repository. http://archive.ics.uci.edu/ml/index.html.
5. Complications of
myocardial infarction: a database for testing systems of recognition and
forecasting / S.E. Golovenkin, A.N. Gorban, V.A. Shulman and others.
Krasnoyarsk, Computing Center of RAS, Preprint 6, 1997.