Martyshenko S.N., Egorov E.A.
Vladivostok State University of Economics and Service, Russia
Intellectual system of non-structured data processing
In recent years Russia has been taking active steps to build an information society. Given the vast territory of our country, there is no way to establish sustainable relationships in economics and management other than to develop this area rapidly.
Scientific work on new methods and technologies of data processing is therefore very important. The processing of qualitative and unstructured data is one of the new research directions in this field.
To test the information technology we have developed, we use data from mass questionnaires that include free-answer questions along with the usual ones. Whereas earlier we could process only simple phrases, the results of our research now allow us to process answers consisting of several sentences formulated in any form.
To represent non-structured or poorly structured data, we have extended the classical tabular “object-property” form of data presentation [1] by introducing the concept of a combined marker [2,3].
A combined marker appears when a respondent can give several answers to the same question. For instance, a respondent can name several cities in reply to the question “What large cities have you visited in the last three years?”. Thus, a combined marker consists of more than one simple answer. To identify a combined marker we introduce a single divider symbol that separates its simple answers. A simple answer may consist of several words or even a whole sentence.
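The splitting of a combined marker into simple answers can be illustrated with a minimal Python sketch. The semicolon divider and the function name `split_combined_marker` are assumptions made for illustration; the paper does not name the divider symbol, and the authors' tool is implemented as EXCEL macros.

```python
DIVIDER = ";"  # assumed divider symbol; the paper does not specify one


def split_combined_marker(value, divider=DIVIDER):
    """Split a raw answer into its simple answers.

    Surrounding whitespace is trimmed and empty parts are dropped, so an
    ordinary (non-combined) answer comes back as a one-element list.
    """
    return [part.strip() for part in value.split(divider) if part.strip()]


# A simple answer may itself contain several words.
cities = split_combined_marker("Moscow; Saint Petersburg ;Vladivostok")
print(cities)
```

Choosing a divider that cannot occur inside a simple answer keeps the split unambiguous even when a simple answer is a whole sentence.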
The transformation from a non-structured data representation to a structured representation is based on the typification operation.
The typification operation replaces a source simple statement (in text form) with a generalizing statement (also in text form) of similar or close meaning. The operation is performed with the help of a tabulated “marker meaning list”. One column of this table contains all the unique meanings of the source marker.
When a combined marker is subject to typification, all the simple statements that make up the combined statement are included in the “marker meaning list”. The “marker meaning list” table also contains a column in which the frequency of each meaning is calculated. The typification operation is applied to the data of the “marker meaning list” table rather than to the data of the “object-marker” table.
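A minimal Python sketch of this idea follows: it builds a “marker meaning list” with frequencies from one column of an “object-marker” table, and then carries a typification defined on the unique meanings back to the column. The divider, the function names and the sample answers are all hypothetical; this is not the authors' EXCEL implementation.

```python
from collections import Counter

DIVIDER = ";"  # assumed divider symbol, not specified in the paper


def simple_statements(cell, divider=DIVIDER):
    """Split one table cell into trimmed, non-empty simple statements."""
    return [p.strip() for p in cell.split(divider) if p.strip()]


def build_marker_meaning_list(column, divider=DIVIDER):
    """Return (meaning, frequency) rows, most frequent first."""
    counts = Counter()
    for cell in column:
        counts.update(simple_statements(cell, divider))
    return counts.most_common()


def apply_typification(column, mapping, divider=DIVIDER):
    """Rewrite each cell using a meaning -> generalized meaning mapping.

    Meanings absent from the mapping are kept unchanged; duplicates that
    arise inside one cell after generalization are collapsed.
    """
    result = []
    for cell in column:
        typed = []
        for statement in simple_statements(cell, divider):
            general = mapping.get(statement, statement)
            if general not in typed:
                typed.append(general)
        result.append(f"{divider} ".join(typed))
    return result


# Hypothetical answers to an open question.
column = ["sea; clean air", "sea", "clean air; fishing", "the sea"]
meaning_list = build_marker_meaning_list(column)

# The researcher edits the short list of unique meanings, not every row.
mapping = {"the sea": "sea"}
typed_column = apply_typification(column, mapping)
```

Working on the list of unique meanings means the researcher makes one decision per distinct statement, however many respondents gave it.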
There are three levels of typification, each with a different degree of generalization of the initial values (they correspond to columns of the “marker meaning list” table). The information technology we have developed helps to automate the user's work, but the main intellectual work remains the responsibility of the researcher. The knowledge base formed during processing plays a special role in automation. It is particularly useful for mass and repeated surveys. The knowledge base is represented by three dictionaries: “Redundant information dictionary”, “Substitution dictionary” and “Key word dictionary”.
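The paper does not describe how the three dictionaries are applied, so the following Python sketch shows only one possible reading: redundant words are dropped, substitutions normalize the remaining words, and the first key word found yields a suggested generalized statement. The dictionary contents, the order of the steps and the function names are assumptions for illustration.

```python
import re

# Hypothetical dictionary contents; in practice they would accumulate
# as the researcher performs typification.
REDUNDANT = {"i", "think", "the", "a", "very"}
SUBSTITUTIONS = {"seaside": "sea", "coast": "sea"}
KEY_WORDS = {"sea": "Beach recreation", "ski": "Winter sports"}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[\w']+", text.lower())


def suggest_typification(statement):
    """Return a suggested generalized statement, or None if no hint exists."""
    # Step 1 (assumed): discard words listed as redundant information.
    words = [w for w in tokenize(statement) if w not in REDUNDANT]
    # Step 2 (assumed): normalize the remaining words via substitutions.
    words = [SUBSTITUTIONS.get(w, w) for w in words]
    # Step 3 (assumed): the first key word found determines the hint.
    for word in words:
        if word in KEY_WORDS:
            return KEY_WORDS[word]
    return None  # no hint: the decision stays with the researcher


hint = suggest_typification("I think the seaside is very nice")
```

Returning `None` when no key word matches reflects the point made above: the dictionaries only produce hints, and the final decision remains with the researcher.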
As a result of processing open-question data we obtain the following output:
- three new representations of the marker (property), included in the source data table, which can be processed further to obtain informative conclusions;
- a “marker meaning list” table, which can be used for a repeated questionnaire survey or for typification with any other questionnaires developed to investigate the same process;
- a knowledge base in the form of three dictionaries: “substitution dictionary”, “key word dictionary” and “redundant information dictionary”.
The technology of processing unstructured data is shown in Fig. 1. The efficiency of computer technology for qualitative data processing can be increased by creating and using a knowledge base. Computer technologies that make use of a knowledge base belong to the class of expert systems. The main distinctive feature of an expert system is that it is capable of correct forecasting. By giving the user various hints during operation, specialized software can save much time [1]. User hints are generated from special dictionaries. The dictionaries are formed during typification operations. They store the user's experience gained in solving typification tasks for qualitative markers. All dictionaries are stored in a single Access database file.
Fig. 1. Qualitative data processing computer technology scheme
The extended structure of the “object-property” data table required the creation of special processing tools for obtaining informative conclusions. The proposed technology covers the full cycle of non-structured data processing. It is implemented as EXCEL macros, which allows a wide range of users to use all the features of this package.
The technology has been validated by processing large questionnaires. We used questionnaire data collected by the Department of Marketing and Commerce of Vladivostok State University of Economics and Service and developed a series of typologies of consumers of the recreational resources of Primorye. Some of the processing results can be found in [4,5].
However, the technology can be used not only for questionnaire data processing but also for computer processing of complaints from the population or of electoral mandates.
Creating a knowledge base requires a lot of time. Once created, however, it can be very useful not only to its author but also to other researchers working on similar problems. A knowledge base created by one researcher can be passed on as a file.
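Passing a knowledge base between researchers can be sketched as saving and loading a file. The authors store the dictionaries in an Access database; the JSON format, the file name and the function names below are stand-ins chosen only to keep the example self-contained.

```python
import json
import os
import tempfile


def save_knowledge_base(path, knowledge_base):
    """Write the three dictionaries to a single UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(knowledge_base, f, ensure_ascii=False, indent=2)


def load_knowledge_base(path):
    """Read the dictionaries back from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical knowledge base with the three dictionaries named in the text.
kb = {
    "redundant_information": ["i", "think", "very"],
    "substitution": {"seaside": "sea"},
    "key_word": {"sea": "Beach recreation"},
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "knowledge_base.json")
    save_knowledge_base(path, kb)          # first researcher exports
    restored = load_knowledge_base(path)   # second researcher imports
```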
The technology described above is constantly being improved. The authors plan to expand the intellectual capabilities of the knowledge base.
References
1. Zagoruiko N.G. Applied methods of data and knowledge analysis. Novosibirsk: Publishing house of Mathematics Institute, 1999. 270 p.
2. Martyshenko S.N., Martyshenko N.S., Kustov D.A. Improvement of mathematical and software processing of primary data in economic and sociological researches // Bulletin of the Pacific State Economic University. 2006. No. 2. P. 91-103.
3. Martyshenko S.N., Martyshenko N.S., Kustov D.A. Development tools of typologies by questionnaires data in EXCEL // Academic journal of western Siberia. 2007. No. 1. P. 115-117.
4. Martyshenko N.S. Formation of the tourism cluster and management of its development in the Primorye territory // Region: systems, economics, management. 2008. No. 1. P. 122-132.
5. Martyshenko N.S. Formation of the development strategy for tourism enterprises: Monograph. Dal'nauka, 2009. P. 214.