Martyshenko S.N., Egorov E.A.

Vladivostok State University of Economics and Service, Russia

Intelligent system for processing non-structured data

In recent years Russia has been taking active steps to create an information society. In a country as large as ours, sustainable relationships in economics and management cannot be established without rapid development of this research area.

Scientific work on new methods and technologies of data processing is very important in these circumstances. Processing of qualitative and unstructured data is one of the new research directions in data processing.

To test the information technology we have developed, we use data from mass questionnaires that include, along with the usual questions, free-answer (open) questions. Whereas earlier we could process only simple phrases, the results of our research now allow us to process answers consisting of several sentences formulated in any form.

To represent non-structured or poorly structured data, we have extended the classical tabular “object-property” form of data presentation [1] by introducing the concept of a combined marker [2,3].

A combined marker appears when a respondent can give several answers to the same question. For instance, a respondent can name several cities in reply to the question “What large cities have you visited in the last three years?”. Thus, a combined marker consists of more than one simple answer. To identify a combined marker, we introduce a single divider symbol that separates the simple answers. A simple answer may consist of several words or even a whole sentence.
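The role of the divider symbol can be illustrated with a minimal Python sketch (the technology itself is implemented as EXCEL macros). The choice of ";" as the divider and the function names are assumptions made for this illustration; the source does not specify them.

```python
DIVIDER = ";"  # assumed divider symbol; the source does not name a specific one


def split_combined_marker(answer: str, divider: str = DIVIDER) -> list[str]:
    """Split one respondent's answer into its simple answers."""
    parts = (part.strip() for part in answer.split(divider))
    # Drop empty fragments produced by stray or trailing dividers.
    return [part for part in parts if part]


def is_combined(answer: str, divider: str = DIVIDER) -> bool:
    """A marker value is combined when it holds more than one simple answer."""
    return len(split_combined_marker(answer, divider)) > 1


answer = "Moscow; Saint Petersburg; Khabarovsk"
print(split_combined_marker(answer))
print(is_combined(answer), is_combined("Moscow"))
```

Because only the divider separates simple answers, each simple answer may still contain spaces or form a whole sentence.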

The transformation from non-structured data representation to structured representation is based on the typification operation.

The typification operation is the substitution of a source simple statement (in the form of text) by a generalizing statement (also in the form of text) with a similar or close meaning. It is performed with the help of a tabulated “marker meaning list”. One of the columns of this table contains all the unique meanings of the source marker.

When a combined marker is subject to the typification operation, all the simple statements of a complex or combined statement are included in the “marker meaning list”. The “marker meaning list” table also contains a column in which the frequency of each meaning is calculated. The typification operation is applied to the data of the “marker meaning list” table rather than to the data of the “object-marker” table.
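As an illustration, a “marker meaning list” with a frequency column could be built as in the following Python sketch. The divider ";", the sample answers and the sort order are assumptions made for this example, not details of our implementation.

```python
from collections import Counter


def build_marker_meaning_list(column, divider=";"):
    """Return (unique meaning, frequency) rows for one marker column."""
    counts = Counter()
    for answer in column:
        # Every simple statement of a combined statement is counted separately.
        for simple in answer.split(divider):
            simple = simple.strip()
            if simple:
                counts[simple] += 1
    # Most frequent meanings first; ties are ordered alphabetically.
    return sorted(counts.items(), key=lambda row: (-row[1], row[0]))


# Hypothetical answers of three respondents to one open question.
column = ["Moscow; Khabarovsk", "Khabarovsk", "Moscow; Harbin; Khabarovsk"]
for meaning, frequency in build_marker_meaning_list(column):
    print(meaning, frequency)
```

Typification is then carried out on these rows, so each unique meaning needs to be generalized only once, however many respondents gave it.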

There are three levels of typification, which differ in the degree of generalization of the initial values (they correspond to columns of the “marker meaning list” table). The developed information technology helps to automate the user’s work, but the main intellectual work remains with the researcher. The knowledge base formed during processing plays a special role in automation; it is particularly useful for mass and repeated surveys. The knowledge base is represented by three dictionaries: “redundant information dictionary”, “substitution dictionary” and “key word dictionary”.
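One possible way in which the three dictionaries could support three levels of typification is sketched below in Python. The mapping of each dictionary to one level (removal of redundant words, then substitution, then reduction to a key word category) is an assumption made for illustration, and the dictionary contents are invented examples.

```python
# Hypothetical dictionary contents, invented for illustration only.
REDUNDANT_INFORMATION = {"i", "think", "the", "very", "probably"}
SUBSTITUTION = {"pricey": "expensive", "costly": "expensive"}
KEY_WORD = {"expensive": "high prices", "dirty": "poor cleanliness"}


def typify(statement: str) -> tuple[str, str, str]:
    """Return three increasingly generalized forms of a simple statement."""
    words = [word.strip(".,!?") for word in statement.lower().split()]
    words = [word for word in words if word]
    # Level 1 (assumed): drop words listed as redundant information.
    level1 = [word for word in words if word not in REDUNDANT_INFORMATION]
    # Level 2 (assumed): replace synonyms by one agreed wording.
    level2 = [SUBSTITUTION.get(word, word) for word in level1]
    # Level 3 (assumed): the first key word found decides the category;
    # if none is found, the level 2 text is kept unchanged.
    level3 = next(
        (KEY_WORD[word] for word in level2 if word in KEY_WORD),
        " ".join(level2),
    )
    return " ".join(level1), " ".join(level2), level3


print(typify("I think the hotels are very pricey."))
```

In practice the researcher would review and correct each proposed value, since the main intellectual work is not automated.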

As a result of processing open-question data we obtain the following output:

- three new representations of marker (property) included into the source data table, which can be subject to subsequent processing for obtaining informative conclusions;

- a “marker meaning list” table, which can be used for a repeated questionnaire survey or for typification of any other questionnaires developed to investigate the same process;

- knowledge base in the form of three dictionaries: “substitution dictionary”, “key word dictionary” and “redundant information dictionary”.

The technology of processing unstructured data is shown in Fig. 1. The efficiency of computer technology for qualitative data processing can be increased by creating and using a knowledge base. Computer technologies that make use of a knowledge base belong to the class of expert systems. The main distinctive feature of an expert system is that it is capable of correct forecasting. By giving various hints to the user during operation, specialized software can save much time [1].

User hints are generated from special dictionaries, which are formed during typification operations. These dictionaries store the user’s experience gained in solving qualitative marker typification tasks. All dictionaries are stored in a single Access database file.
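How a stored dictionary might produce a hint for a new statement can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of our macros: the dictionary is an in-memory mapping with invented entries rather than an Access table, and approximate matching with the standard `difflib` module is a technique chosen for the example.

```python
import difflib

# Hypothetical record of earlier decisions: source statement -> generalized statement.
substitution_dictionary = {
    "high prices": "prices",
    "expensive hotels": "prices",
    "bad roads": "infrastructure",
}


def suggest_hint(statement, dictionary, cutoff=0.6):
    """Propose a generalized statement based on earlier typification decisions."""
    key = statement.strip().lower()
    if key in dictionary:
        return dictionary[key]  # exactly this statement was typified before
    # Otherwise look for the most similar statement seen before.
    close = difflib.get_close_matches(key, list(dictionary), n=1, cutoff=cutoff)
    return dictionary[close[0]] if close else None


print(suggest_hint("High prices", substitution_dictionary))
print(suggest_hint("bad road", substitution_dictionary))
print(suggest_hint("qqq", substitution_dictionary))
```

A result of `None` means that no hint is available and the user has to typify the statement manually, after which the decision can be added to the dictionary.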

Fig. 1. Scheme of the computer technology for qualitative data processing

The extended structure of the “object-property” data table required the creation of special processing tools for obtaining informative conclusions. The proposed technology covers the full cycle of non-structured data processing. It is implemented as EXCEL macros, which allows a wide range of users to use all features of this package.

The technology has been validated by processing large questionnaires. Using questionnaire data collected by the Department of Marketing and Commerce of Vladivostok State University of Economics and Service, we developed a series of typologies of consumers of the recreational resources of Primorye. Some of the processing results can be found in [4,5].

However, the technology can be used not only for questionnaire data processing, but also for computer processing of citizens’ complaints or voters’ mandates.

Creating a knowledge base requires a lot of time. Once created, however, it can be very useful not only to its author but also to other researchers working on similar problems. A knowledge base created by one researcher can be passed on as a file.
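The idea of passing a knowledge base on as a file can be sketched in Python as follows. JSON is used here only as an assumed, convenient exchange format for the illustration; the key names and dictionary entries are likewise hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_knowledge_base(knowledge_base: dict, path: Path) -> None:
    """Write the three dictionaries to one file."""
    text = json.dumps(knowledge_base, ensure_ascii=False, indent=2)
    path.write_text(text, encoding="utf-8")


def load_knowledge_base(path: Path) -> dict:
    """Read the three dictionaries back from a file."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical contents of the three dictionaries.
knowledge_base = {
    "redundant_information": ["i", "think", "very"],
    "substitution": {"pricey": "expensive"},
    "key_word": {"expensive": "high prices"},
}

# A temporary folder stands in for the place where the file is exchanged.
with tempfile.TemporaryDirectory() as folder:
    file_path = Path(folder) / "knowledge_base.json"
    save_knowledge_base(knowledge_base, file_path)
    restored = load_knowledge_base(file_path)

print(restored == knowledge_base)
```

A researcher who receives such a file can load it and continue typification with the accumulated dictionaries instead of starting from empty ones.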

The technology is constantly being improved. The authors plan to expand the intellectual abilities of the knowledge base.

References

1.        Zagoruiko N.G. Applied methods of data and knowledge analysis. Novosibirsk: Publishing house of Mathematics Institute, 1999. 270 p.

2.        Martyshenko S.N., Martyshenko N.S., Kustov D.A. Improvement of mathematical and software processing of primary data in economic and sociological researches // Bulletin of the Pacific State Economic University. 2006. No. 2. P. 91-103.

3.        Martyshenko S.N., Martyshenko N.S., Kustov D.A. Development tools of typologies by questionnaires data in EXCEL // Academic journal of western Siberia. 2007. No. 1. P. 115-117.

4.        Martyshenko N.S. Formation of the tourism cluster and management of its development in the Primorye territory // Region: systems, economics, management. 2008. No. 1. P. 122-132.

5.        Martyshenko N.S. Formation of the development strategy for tourism enterprises: Monograph. Dal'nauka, 2009. 214 p.