Pedagogy

THE ROLE OF CORPUS LINGUISTICS AND CORPUS APPROACH IN STUDENTS’ AUTONOMOUS WORK

Darya N. Asanova, Elena O. Akkerman, Togzhan S. Toleuova

The article considers the "corpus-based approach" in foreign language teaching and the basic characteristics of this method that determine its reliability. World practice in the development of corpus linguistics (CL) proves the efficiency of its applications, though in present-day Kazakhstan the rich opportunities of CL methods have not been fully realized in applied linguistics, linguistic pedagogy, or native and foreign language teaching. We specify the types of corpora which can be used in the practice of language teaching and present examples of corpora accessible to a teacher of foreign languages.

Key words: corpus-based approach, corpus linguistics, linguistic pedagogy, autonomous work, concordance.

 

I. Introduction. At the current stage in the development of applied linguistics, corpus linguistics is becoming increasingly important. This discipline is the foundation and an integral part of most linguistic studies. Several decades ago corpora were compiled only by hand, which took a great deal of time, effort and money. They were therefore created only when a wide range of stakeholders was interested. Now it is much easier to organize and synchronize any material: the "cost" of this process has fallen sharply thanks to the development first of computers and then of corpus linguistics. A corpus of texts can be created with little difficulty for almost any area of science and life, and for a much narrower range of users, which is highly relevant for research in these areas.

II. The setting of objectives. With the ever wider spread and availability of personal computers and the Internet, and with the conversion of texts into electronic form, a new linguistic field has emerged: corpus linguistics, based on the study of a practically unlimited array of texts. As V.P. Zakharov notes in his article "The Czech National corpus: the organization and methods of use", this field took shape over the past two decades; it relies on computers and studies the construction of linguistic corpora, methods of processing the data in corpora, and the methodology of creating and using corpora. The term "corpus", the key concept of corpus linguistics, was already applied to the first collections of English texts in electronic form: the Brown Corpus (about 1 million tokens of American English, 1961) and the London-Lund Corpus (about 500 thousand tokens of British English, collected in the late 1960s and early 1970s). The British researchers T. McEnery and A. Wilson propose a general definition of "corpus": 1) (loose usage) any text; 2) (the most common usage) a text in electronic form; 3) (specialized usage) texts in electronic form selected to represent all functional styles of a language.

One might say that a linguistic, or language, corpus of texts is a "large, electronically stored, unified, structured, annotated, philologically competent array of language data intended for solving specific linguistic problems". The concept of "corpus" also includes a system for managing textual and linguistic data, called a corpus manager. This is a specialized search system that includes software tools for retrieving data from the corpus, obtaining statistical information and presenting results to the user in a convenient form. A modern linguist needs corpora: they open new opportunities for research, save time and give instant access to a very large volume of information. With a corpus one can learn the frequency of word forms, lexemes, etc., and trace the co-occurrence of words, in particular their collocability and government. Searching the corpus makes it possible to build a concordance for any word: a list of all occurrences of the word in context, with a reference to the source. Corpora can be used to obtain various reference and statistical data on units of language and speech. A representative array of language data for a certain period allows us to study the dynamics of change in the lexical composition of a language, to analyze the lexical and grammatical characteristics of different genres and different authors, and so on [1,2].
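As an illustration of the frequency and co-occurrence counts mentioned above, the following minimal Python sketch may be helpful. It is not the interface of any particular corpus manager: the function names, the simple tokenization rule and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word form occurs."""
    return Counter(tokens)


def cooccurrences(tokens, node, window=2):
    """Count the words found within `window` positions of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


# Invented sample text standing in for a corpus (not real corpus data).
sample = ("The corpus is large. The corpus manager searches the corpus "
          "and shows every word in context.")
tokens = tokenize(sample)
freq = word_frequencies(tokens)
neighbours = cooccurrences(tokens, "corpus")

print(freq.most_common(3))
print(neighbours.most_common(3))
```

A real corpus would of course need proper tokenization and lemmatization; the sketch only shows that both kinds of count reduce to simple tallies over a token list.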

The history of corpus linguistics goes back to the 1960s, when the first linguistic text corpus appeared. In 1963 W. Francis and H. Kučera, scholars at Brown University in the United States, created the first large corpus of texts on machine-readable media (the Brown Corpus). It comprises five hundred printed English texts of two thousand words each. The texts belong to the fifteen most widespread genres of printed American English prose and were published in 1961; the corpus is accompanied by frequency and alphabetical-frequency word lists and a variety of statistical distributions.

This was followed by the Lancaster-Oslo-Bergen Corpus (LOB) of English and the Uppsala corpus of Russian. Among modern English corpora the best known are the British National Corpus, the International Corpus of English, the Bank of English, etc. Corpora have now been created for many of the world's languages. Work is also under way on the Russian National Corpus (RNC). In the first half of the 1990s corpus linguistics finally took shape as a separate branch of the science of language. It nevertheless interacts closely with computational linguistics, drawing on its achievements and enriching it in turn.

In Russia, corpus linguistics began to emerge in the early 1980s, when active work was carried out on the Machine Fund of the Russian Language project. Its main feature was the theoretical nature of the work: questions such as what a Russian electronic corpus should be, how it should be compiled and what linguistic information it must contain were considered in detail. The practical side of the research, however, clearly lags behind: there is not enough support for the actual practice of creating corpora.

According to V.P. Zakharov, there are two main ways of dividing corpora into classes:

1) Contrasting corpora that cover the language as a whole with corpora of a particular sublanguage (a genre, a style, the language of a certain age or social group, etc.);

2) Dividing corpora by type of linguistic annotation: morphological, syntactic, semantic, prosodic, etc. (the types of annotation are discussed in more detail later). In this respect, most corpora are of the morphological or syntactic type. In addition, annotated corpora are contrasted with unannotated ones.

By type of data, corpora are divided into written, spoken and mixed; by genre, into literary, dramatic, journalistic, etc. By structure, central, archive, core and peripheral corpora are distinguished. Many other criteria can be used for classification, for example the volume of text, accessibility, dynamism, and others. The variety of corpora is determined by the variety of research and applied tasks for which they are created.

Currently, corpus linguistics and the corpus-based approach to written and oral texts are successfully used in foreign language teaching and in linguistic pedagogy. Corpus-derived word lists are used to build students' active vocabulary, frequency lists of terms for professional courses, etc. Developers of academic dictionaries and textbooks rely on arrays of authentic texts (corpora). In addition, collections, libraries and arrays of texts reflect the actual functioning of a language, and transferring them to the computer environment only increases their practical value and their use in applied linguistics.

World practice in this field proves the effectiveness of such applications, although at present the possibilities of corpus linguistics methods have not yet been adequately realized in Kazakhstan in applied linguistics, linguistic pedagogy, or the teaching of native and foreign languages. This article identifies the types of corpora that can be used in the practice of teaching foreign languages and gives examples of corpora available to foreign language teachers. As an example, it considers the practical use of parallel corpora in language training and translation teaching, and of learner corpora in studies of foreign language acquisition. It also shows the effectiveness of computer tools of corpus linguistics such as concordance programs in language tasks, including computer-assisted foreign language learning. In conclusion, it notes the real applications of corpus linguistics and of the corpus-based method of analysis in linguistic studies and in the practice of teaching foreign languages.

Corpus linguistics provides material for various studies of language and its varieties, and defines the basic method of analyzing texts on the basis of a corpus (Corpus-Based Approach) [3,4].

The Corpus-Based Approach, or method of linguistic research based on a text corpus, is focused on the applied study of language and its functioning in real environments and texts, which is important for language teaching. For example, corpus-based lexicographic analysis helps to reveal the patterns of use of certain words, especially synonyms (e.g., small / little, big / large), the frequency of their combination with other words and their regularity in various styles, and to define their semantics more precisely.
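The comparison of synonyms such as big / large can be illustrated with a short Python sketch that lists the word immediately following each adjective. This is only one simple way of looking at combinability; the function name and the sample sentences are invented for the example and are not taken from any real corpus.

```python
import re
from collections import Counter


def right_collocates(text, adjective):
    """Count the word that immediately follows each occurrence of `adjective`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == adjective)


# Invented sentences standing in for corpus text (not real corpus data).
sample = ("A large number of students took part. She made a big mistake. "
          "A large amount of data was collected. It was a big surprise. "
          "He has a big house and a large number of books.")

big_collocates = right_collocates(sample, "big")
large_collocates = right_collocates(sample, "large")

for adjective, collocates in (("big", big_collocates), ("large", large_collocates)):
    print(adjective, collocates.most_common())
```

On a corpus of realistic size, lists of this kind give students concrete evidence of which nouns each synonym tends to combine with, which they can then interpret themselves.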

III. Results. The main characteristics of the method that determine its reliability and validity are the following:

- It is empirical and analyzes the real usage of language in a natural environment,

- It uses a sufficiently large, representative selection of texts,

- It actively uses computers and special concordance software for analysis in automatic and interactive modes,

- It is based on methods of statistical and qualitative text analysis,

- It is goal-oriented, i.e. focused on actual application and results.

One important feature of corpus-based analysis is that it studies not only purely linguistic phenomena (grammatical or lexical features of words, their relationships with other lexemes) but also such phenomena as, for example, the frequency of lexical or grammatical constructions in different genres and dialects.
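To compare the frequency of a construction across genres, raw counts are usually normalized by text length. The Python sketch below computes occurrences per 1,000 tokens; the function name, the two tiny "genre" samples and the deliberately crude pattern for passive-like forms are assumptions made purely for illustration.

```python
import re


def per_thousand(text, pattern):
    """Return the number of matches of `pattern` per 1,000 word tokens of `text`."""
    tokens = re.findall(r"[A-Za-z']+", text)
    if not tokens:
        return 0.0
    hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return 1000.0 * hits / len(tokens)


# Invented mini-samples standing in for genre subcorpora (not real corpus data).
genres = {
    "news": "The bill was passed on Monday. The minister was asked to resign.",
    "fiction": "She opened the door and smiled. He was happy but he kept walking.",
}

# Crude approximation of a passive: "was"/"were" followed by a word ending in -ed.
PASSIVE = r"\b(?:was|were)\s+\w+ed\b"

rates = {genre: per_thousand(text, PASSIVE) for genre, text in genres.items()}
for genre, rate in rates.items():
    print(f"{genre}: {rate:.1f} per 1,000 tokens")
```

A surface pattern like this will also match adjectives ending in "-ed" after "was" or "were", so on real data the results would need manual checking or a morphologically annotated corpus.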

Electronic corpora provide rich linguistic material for educational and research purposes. A number of classic electronic corpora in foreign languages are currently available on the Internet. The best known are the British National Corpus and the American National Corpus of English, and the German corpora LIMAS and COSMAS. The most accessible for the ordinary user, a foreign language teacher, are the Gutenberg texts, the British National Corpus Sampler, the Longman Corpus, LIMAS, corpora of Reuters news, and the electronic archives of major newspapers (for example, The Times).

As to the kinds of corpora, applied linguistics can make use of the following types:

Research - for studying various aspects of the language system;

Illustrative, including learner corpora (Learner Corpus) - for confirming and substantiating linguistic facts;

Monitor - for studying the dynamics of language material and conducting content analysis, for example a corpus of journalism;

Static - for studying styles, for example an author's corpus, i.e. a corpus of a writer's texts;

Multimedia - text + video + audio;

Parallel text corpora - for comparative "original-translation" analysis and for teaching methods and techniques of translation. There are two basic forms of these corpora: "original - translation(s)" (unidirectional) and "original - translation - back translation" (bidirectional, or reciprocal), aligned in parallel.

In this article, as an example, we consider the practical use of parallel corpora in language training and of learner corpora in studies of foreign language acquisition.

In the methodology of language teaching (the grammar-translation method) and of translation teaching, an interesting application is the development of parallel electronic text corpora (Parallel Corpora) and the use of concordance software for parallel texts. Such tools are still under development in Russia, although parallel texts have long been used for comparative translation study and for teaching [5].

In practice, a translator has to work within the limits of post-editing, comparing and evaluating different strategies and interpretations in context. Translators (especially beginners) need resources that can serve as standards for interpreting and evaluating a translation under particular "standard" conditions. By some estimates about 50% of the time spent on translation, and at the initial stage of training up to 80%, goes to consulting reference sources such as dictionaries. Electronic corpora and parallel-text computer technology can significantly reduce this time and provide samples of professional translation for studying translation techniques and methods.
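How a parallel concordance can supply such samples may be sketched in Python as follows. The sketch assumes a unidirectional "original - translation" corpus that has already been aligned sentence by sentence; the function name and the three English-Russian pairs are invented for the example.

```python
import re


def parallel_concordance(pairs, query):
    """Return the aligned (source, target) pairs whose source contains `query` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if pattern.search(src)]


# Invented sentence-aligned pairs (original, translation); not real corpus data.
aligned_pairs = [
    ("The book is on the table.", "Книга лежит на столе."),
    ("She reads a book every week.", "Она читает книгу каждую неделю."),
    ("The table is new.", "Стол новый."),
]

hits = parallel_concordance(aligned_pairs, "book")
for source, target in hits:
    print(f"{source}  |  {target}")
```

The student sees each occurrence of the search word together with its translation in context, which a dictionary entry alone does not provide.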

At present, parallel corpora (or parallel texts) of fiction are particularly common, although for teaching translation at university, corpora of different genres and styles should be developed, focusing primarily on scientific, technical, journalistic and business texts.

A learner corpus (Learner Corpus) is an electronic corpus of texts produced by people studying a foreign language. The main purpose of compiling learner corpora is to analyze them in order to identify the ways and the effectiveness of acquiring the language being studied (Language Acquisition).

Corpora of this kind can be used, for example, for linguistic analysis aimed at identifying lexical or syntactic errors made while learning a foreign language. This approach helps to establish the frequency of certain types of language errors and their typical contexts, which is necessary for developing teaching plans and techniques for subsequent correction in language learning [6,7].
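Once errors in a learner corpus have been annotated, establishing their frequency and typical contexts is a matter of simple counting and grouping. The Python sketch below assumes a hypothetical annotation format, a list of (error type, sentence) pairs; the error labels and the learner sentences are invented and do not reproduce the scheme of any existing learner corpus.

```python
from collections import Counter, defaultdict

# Hypothetical error annotations: (error type, sentence in which the error occurred).
annotations = [
    ("article", "She bought new car yesterday."),
    ("tense", "I am living here since 2010."),
    ("article", "He is best student in the class."),
    ("word choice", "I made my homework."),
    ("article", "We saw film at the weekend."),
]


def error_profile(items):
    """Return (counts per error type, contexts grouped by error type)."""
    counts = Counter(kind for kind, _ in items)
    contexts = defaultdict(list)
    for kind, sentence in items:
        contexts[kind].append(sentence)
    return counts, dict(contexts)


counts, contexts = error_profile(annotations)
print(counts.most_common())
print(contexts["article"])
```

The most frequent error types and their collected contexts can then be used directly as material for remedial exercises.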

Learner corpora are most common in Asia and Europe. The best known is the International Corpus of Learner English (ICLE), a corpus of essays by students with an advanced level of the language [4, 5]. This corpus is mostly used for discourse analysis, statistical analysis of students' vocabulary, and comparative research. It is a showcase for developments in corpus and applied linguistics.

In applied linguistics, concordances have gained special recognition among linguists because they enable new studies of language and effective processing of the lexical material of texts of various kinds. In recent years, computer concordances have been actively used in computer-assisted language learning (CALL).

A concordance program is a special text-processing program that is given a linguistic task: to find a particular morpheme, word or phrase in context. For example, for a corpus of English texts, the task may be to find all variants of the indefinite article, or all words ending in "-ing". The concordance program will then return all the words with this ending together with their context, as a rule a line of text.

Thus, the teacher obtains many examples of the grammatical and lexical forms of words (in this example, verbal nouns, gerunds, the present participle (participle I), etc.). The student, in turn, receives natural examples that demonstrate a given grammatical or lexical phenomenon and can independently carry out linguistic investigation and engage in research [8,9].
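The search for words ending in "-ing" described above can be sketched in a few lines of Python. This is a minimal keyword-in-context illustration, not the algorithm of any particular concordance program; the function name, the context width and the sample text are assumptions made for the example.

```python
import re


def kwic(text, pattern, width=30):
    """Return one keyword-in-context line for every match of `pattern` in `text`."""
    lines = []
    for match in re.finditer(pattern, text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text standing in for a corpus (not real corpus data).
sample = ("Reading is useful. The students were writing essays while "
          "the teacher was checking the morning tests.")

# Any word ending in "-ing".
ING_WORD = r"\b\w+ing\b"

concordance_lines = kwic(sample, ING_WORD)
for line in concordance_lines:
    print(line)
```

Note that a search by ending alone also returns words such as "morning", which are neither gerunds nor participles; sorting such cases out is itself a useful exercise for the student.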

When studying the grammar of a foreign language, students can be asked to find and analyze the forms and uses of difficult tenses (e.g., the Perfect), modal verbs and their role in the sentence, the position of adverbs in a sentence, etc. In vocabulary, they can, for example, find and explain examples of words that often cause difficulty, such as MAKE / DO, RISE / RAISE, TELL / SAY, LIE / LAY, etc. In syntax, they can, for example, examine the punctuation of the language and identify differences from their native language. Sources for such work may be not only special corpora of electronic texts but also various electronic publications and digital libraries (e.g., on the Internet).

IV. Conclusion. In conclusion, we note that concordances are a modern, efficient tool for text analysis that should be actively used in the practice of language teaching and in solving linguistic problems.

The analysis of corpora and the methods and achievements of corpus linguistics represent a promising direction in foreign language teaching. World practice in this field proves the effectiveness of such applications, although at present the possibilities of corpus linguistics methods have not yet been properly realized in applied linguistics, linguistic pedagogy, or the teaching of native and foreign languages.

 

References:

1) Apresyan Y.D., Iomdin L.L., Sannikov A.V., Sizov V.G. Semantic markup in a deeply annotated corpus of the Russian language // Proceedings of the International Conference “MegaLing'2005. Applied Linguistics in search of new ways”. - 2005.

2) Vakhitova D.T. Creating a corpus of Corpus Linguistics, course work, 2nd year. 2006.

3) Zakharov V.P. Corpus linguistics: a study guide. St. Petersburg, 2005.

4) Leontiev N.B. Incompleteness and meaningful contraction in the text corpus // Proceedings of the International Conference “MegaLing'2005. Applied Linguistics in search of new ways”. - 2005.

5) Marchuk Y.N. Corpus and extra-linguistic data base // Proceedings of the International Conference on Corpus Linguistics - 2002. - Publishing of St. Petersburg University, 2002.

6) Milchonoka E.P. Overview of the resources of Latvian language at the Institute of Mathematics and Computer Science, University of Latvia // Proceedings of the International Conference on Corpus Linguistics - 2002. - Publishing of St. Petersburg University, 2002.

7) Shimkova M.Z. Representativeness of the corpus as a linguistic problem // Proceedings of the International Conference “MegaLing'2005. Applied Linguistics in search of new ways”. - 2005.

Web sites:

8) Averin A.N. Problems of grammatical analysis // http://corpspb.narod.ru/Docs/Averin.doc

9) Rykov V.V. Pragmatically oriented corpus // http://rykov-cl.narod.ru/t.html