Darya N. Asanova, Elena O. Akkerman, Togzhan S. Toleuova
The article considers the corpus-based approach in foreign language teaching
and the basic characteristics of this method that determine its reliability.
World practice in the development of corpus linguistics (CL) proves the
efficiency of such applications, though in Kazakhstan today the rich
opportunities of CL methods have not yet been fully realized in applied
linguistics, linguistic pedagogy, or native and foreign language teaching. We
specify the types of corpora that can be used in the practice of language
teaching and present examples of corpora accessible to a teacher of foreign
languages.
Key words: corpus-based approach, corpus linguistics, linguistic pedagogy,
autonomous work, concordance.
I. Introduction. At the present stage in the development of applied
linguistics, corpus linguistics is becoming increasingly important. It is the
foundation and an integral part of most linguistic studies. Several decades ago
corpora were compiled only by hand, which took a great deal of time, effort,
and money. They were therefore created only when a wide range of stakeholders
needed them. Today it is much easier to organize and synchronize any material,
and the "cost" of the process has fallen sharply thanks to the development
first of computers and then of corpus linguistics. A corpus of texts can now be
created with little difficulty for almost any area of science and life, and for
a much narrower range of users, which is highly relevant for research in these
areas.
II. The setting of objectives. With the ever wider availability of personal
computers and the Internet and the conversion of texts into electronic format,
a new linguistic discipline has emerged: corpus linguistics, which is based on
the study of almost unlimited arrays of texts. As V.P. Zakharov notes in his
article "The Czech National Corpus: the organization and methods of use", the
discipline took shape over the past two decades on the basis of computer
technology; it studies the construction of linguistic corpora, methods of
processing corpus data, and the methodology proper for creating and using
corpora.
The term "corpus", the key concept of corpus linguistics, was already applied
to the first collections of English texts in electronic form: the Brown Corpus
(about 1 million tokens of American English, 1961) and the London-Lund Corpus
(about 500 thousand tokens of British English, collected in the late 1960s and
early 1970s). The British researchers T. McEnery and A. Wilson give a general
definition of "corpus": 1) (loose use) any text; 2) (the most common use) a
text in electronic form; 3) (special use) texts in electronic form which are
selected to represent all functional styles of a language.
One might say that a linguistic, or language, corpus is "a large, unified,
structured, annotated, philologically competent array of language data,
presented in electronic form and intended for solving specific linguistic
problems". The concept of "corpus" also includes a system for managing textual
and linguistic data, called a corpus manager. This is a specialized search
system that includes software tools for retrieving data from the corpus,
obtaining statistical information, and presenting the results to the user in a
convenient form. Corpora are indispensable to the modern linguist: they open
new opportunities for research, save time, and give instant access to a very
large volume of information. With a corpus one can learn the frequency of word
forms, lexemes, etc., and trace the co-occurrence of words, in particular their
collocability and government. Searching the corpus makes it possible to build a
concordance for any word, that is, a list of all occurrences of the word in
context with a reference to the source. Corpora can be used to obtain various
reference and statistical data on units of language and speech. A
representative array of language data for a certain period makes it possible to
study the dynamics of change in the lexical composition of the language and to
analyze the lexical and grammatical characteristics of different genres and
different authors, and so on [1,2].
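The two basic operations of a corpus manager described above, counting word-form frequencies and building a concordance with a reference to the source, can be illustrated with a minimal Python sketch. The two-text mini-corpus, the simple tokenizer, and the function names are assumptions made for illustration only; a real corpus manager works with millions of tokens and far richer annotation.

```python
import re
from collections import Counter

# Hypothetical two-text mini-corpus; the keys play the role of source references.
CORPUS = {
    "text_a": "The corpus is large. A corpus manager searches the corpus.",
    "text_b": "Teachers use a corpus to find examples.",
}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(corpus):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in corpus.values():
        counts.update(tokenize(text))
    return counts

def concordance(corpus, word, width=2):
    """Return (source, left context, keyword, right context) for every hit."""
    hits = []
    for source, text in corpus.items():
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == word:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((source, left, token, right))
    return hits

freqs = word_frequencies(CORPUS)
lines = concordance(CORPUS, "corpus")
for source, left, key, right in lines:
    # Align the keyword in a column, as concordance listings usually do.
    print(f"{source}: {left:>15} [{key}] {right}")
```

Here the context window is measured in tokens (`width=2`); a window measured in characters or a whole sentence would be an equally reasonable choice.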
The history of corpus linguistics goes back to the 1960s, when the first
linguistic text corpus appeared. In 1963, W. Francis and H. Kučera, U.S.
scholars at Brown University, created the first large corpus of texts on
machine-readable media (the Brown Corpus). It is a set of five hundred printed
texts in English of two thousand words each. The texts belong to the fifteen
most widespread genres of printed English prose of the United States and were
published in 1961. The corpus is accompanied by frequency and
alphabetical-frequency dictionaries and a variety of statistical distributions.
This was followed by the Lancaster corpus of English (Lancaster-Oslo-Bergen
Corpus, LOB) and the Uppsala corpus of the Russian language. Among modern
English corpora the best known are the British National Corpus, the
International Corpus of English, the Bank of English, etc. At present, corpora
are being set up for many languages around the world. Work is also under way on
the creation of the Russian National Corpus (RNC). In the first half of the
1990s corpus linguistics finally took shape as a separate branch of the science
of language. It nevertheless works closely with computational linguistics,
using its achievements and in turn enriching it.
In Russia, corpus linguistics began to emerge in the early 1980s, when active
work was under way on the Machine Fund of the Russian Language project. Its
main feature was the theoretical nature of the work: researchers considered in
detail what a Russian electronic corpus should be, how it should be formed,
what linguistic information it should contain, etc. The practical side of the
research, however, clearly lagged behind: there was not enough support for the
concrete practice of creating corpora.
According to V.P. Zakharov, there are two main ways of dividing corpora into
classes:
1) Contrasting corpora that represent the language as a whole with those that
represent a sub-language (a genre, a style, the language of a certain age or
social group, etc.);
2) Dividing corpora by type of linguistic markup: morphological, syntactic,
semantic, prosodic, etc. (the types of markup are discussed in more detail
later). In this respect, most corpora are of the morphological or syntactic
type. Moreover, annotated corpora are contrasted with unannotated ones.
By type of data, corpora are divided into written, spoken, and mixed; by genre,
into literary, dramatic, journalistic, etc. By structure, a distinction is
drawn between central and archive, core and peripheral corpora. Many other
criteria can be used for classification, for example the volume of texts,
accessibility, dynamism, and others. The variety of corpora is determined by
the variety of research and applied tasks for which they are created.
Currently, corpus linguistics and the corpus-based approach to written and oral
texts are being used successfully in foreign language teaching and in
linguistic pedagogy. Corpus-derived lists serve as the basis for students'
active vocabulary, frequency lists of terms for use in professional courses,
etc. Developers of academic dictionaries and textbooks rely on arrays of
authentic texts (corpora). In addition, collections, libraries, and arrays of
texts reflect the actual functioning of a language, and their transfer to the
computer environment only enhances their practical value and broadens their use
in applied linguistics.
World practice in this field proves the effectiveness of such applications,
although at present the possibilities of corpus linguistics methods have not
yet been adequately realized in Kazakhstan in applied linguistics, linguistic
pedagogy, or the teaching of native and foreign languages. This article
identifies the types of corpora that can be used in the practice of teaching
foreign languages and gives examples of corpora available to classroom teachers
of foreign languages. As an example, it considers the practical use of parallel
corpora in language teaching and translation training, and of learner corpora
in studies of foreign language acquisition. The article also shows the
effectiveness of computer tools of corpus linguistics such as concordance
programs in language tasks, including computer-assisted foreign language
learning. In conclusion, it notes the real applications of corpus linguistics
as a method of structural analysis in linguistic studies and in the practice of
teaching foreign languages.
Corpus linguistics provides material for various studies of language and its
variants, and defines the basic method of analyzing texts on the basis of a
corpus (the corpus-based approach) [3,4].
The corpus-based approach, or the method of linguistic research based on a text
corpus, is oriented toward the applied study of language and its functioning in
real settings and texts, which is important for language teaching. For example,
corpus-based lexicographic analysis helps to reveal how certain words are
actually used, especially synonyms (e.g., small / little, big / large), the
frequency with which they combine with other words, and their regularity in
various styles, and to define their semantics clearly.
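The comparison of synonyms by the words they combine with can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample sentences are invented, and the function `right_collocates` (a hypothetical name) counts only the word immediately to the right of the node word, whereas real collocation studies use wider windows and statistical association measures.

```python
import re
from collections import Counter

# Hypothetical sample sentences standing in for a real corpus.
SAMPLE = (
    "She lives in a small town. He has a little brother. "
    "They rent a small flat. Wait a little while. "
    "A small town can be quiet. My little brother is here."
)

def right_collocates(text, node):
    """Count the words that immediately follow `node` within each sentence."""
    counts = Counter()
    for sentence in re.split(r"[.!?]", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        # Pair every token with its right-hand neighbour.
        for current, following in zip(tokens, tokens[1:]):
            if current == node:
                counts[following] += 1
    return counts

small = right_collocates(SAMPLE, "small")
little = right_collocates(SAMPLE, "little")
print("small :", small.most_common())
print("little:", little.most_common())
```

Even on this toy sample the two adjectives attract different nouns, which is the kind of evidence a teacher can use to explain the difference between near-synonyms.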
III. Results. The main characteristics of the method that determine its
reliability and validity are the following:
- It is empirical and analyzes the real use of language in a natural
environment,
- It uses a sufficiently large, representative selection of texts,
- It makes active use of computers and special concordance programs for
analysis in automatic and interactive modes,
- It relies on methods of statistical and qualitative analysis of text,
- It is goal-oriented, i.e., it is focused on actual application and results.
One important feature of corpus-based analysis is that it studies not only
purely linguistic phenomena (the grammatical or lexical features of words and
their relationships with other tokens), but also such phenomena as, for
example, the frequency of lexical or grammatical constructions in different
genres and dialects.
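Comparing the frequency of an item across genres requires normalizing the raw counts, since sub-corpora usually differ in size. The sketch below shows one common way of doing this, a rate per 1,000 tokens. The two genre samples and the function name `relative_frequency` are assumptions made for illustration and are not taken from any particular corpus.

```python
import re

# Hypothetical genre sub-corpora; the texts are invented for illustration.
GENRES = {
    "news": "The minister said the bill was passed. The vote was delayed.",
    "fiction": "She opened the door and smiled. He was waiting for her.",
}

def relative_frequency(text, word, per=1000):
    """Occurrences of `word` per `per` tokens, so that texts of different
    length can be compared on the same scale."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word) / len(tokens) * per

rates = {genre: relative_frequency(text, "was") for genre, text in GENRES.items()}
for genre, rate in rates.items():
    print(f"{genre:8} {rate:.1f} per 1000 tokens")
```

The same normalization applies to grammatical constructions once the corpus carries markup that allows them to be counted.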
Electronic corpora provide rich linguistic material for educational and
research purposes. A number of classic electronic corpora in foreign languages
are currently available on the Internet. The best known are the British and
American National Corpora of English and the German corpora LIMAS and COSMAS.
Among those most accessible to the ordinary user, a teacher of foreign
languages, are the Gutenberg Texts, the British National Corpus Sampler, the
Longman Corpus, LIMAS, corpora of Reuters news, and the electronic archives of
major newspapers (for example, The Times).
As to the kinds of corpora, applied linguistics can make use of the following
types:
Research - to study various aspects of the language system;
Illustrative, including learner corpora (Learner Corpus) - to confirm and
justify linguistic facts;
Monitor - to study the dynamics of language material and to conduct content
analysis, for example a corpus of journalistic texts;
Static - to study styles, for example an author's corpus, or a corpus of texts
by particular writers;
Multimedia - text + video + audio;
Parallel text corpora - for comparative "original-translation" analysis and for
teaching the methods and techniques of translation. There are two basic forms
of these corpora: "original - translation(s)" (unidirectional) and "original -
translation - back translation" (bidirectional, or reciprocal), arranged in
parallel.
In this article, as an example, we consider the practical use of parallel
corpora in language teaching and of learner corpora in studies of foreign
language acquisition.
In the methodology of language teaching (the grammar-translation method) and of
teaching translation, an interesting application is the development of parallel
electronic text corpora (Parallel Corpora) and the use of concordance programs
for parallel texts. In Russia such tools are still at the development stage,
although parallel texts have long been used for comparative translation and
learning [5].
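What a concordance program for parallel texts does can be shown with a minimal Python sketch: it searches the original side of a sentence-aligned corpus and returns each hit together with its translation. The three English-German sentence pairs and the function name `parallel_concordance` are assumptions made for illustration; building the sentence alignment itself is a separate task and is not shown here.

```python
# Hypothetical sentence-aligned English-German pairs (original, translation).
PARALLEL = [
    ("The corpus is large.", "Das Korpus ist umfangreich."),
    ("We study the language.", "Wir lernen die Sprache."),
    ("The corpus contains texts.", "Das Korpus umfasst Texte."),
]

def parallel_concordance(pairs, query):
    """Return every aligned pair whose original side contains `query` as a word."""
    query = query.lower()
    hits = []
    for original, translation in pairs:
        # Strip surrounding punctuation so that "corpus." still matches "corpus".
        words = [w.strip(".,;:!?").lower() for w in original.split()]
        if query in words:
            hits.append((original, translation))
    return hits

results = parallel_concordance(PARALLEL, "corpus")
for original, translation in results:
    print(f"{original}  |  {translation}")
```

Seeing the same source word next to several professional renderings is what lets a student compare translation decisions in context.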
In practice, translation has to work within the limits of post-editing,
comparing and evaluating different strategies and interpretations in context. A
translator (especially a beginner) needs resources that can serve as standards
for interpreting and evaluating a translation under particular "standard"
conditions. By some estimates about 50% of translation time, and at the initial
stage of training up to 80%, is spent consulting reference sources such as
dictionaries. Electronic parallel corpora and linguistic computer technology
can significantly reduce this time and provide samples of professional
translation for the study of translation techniques and methods.
At present, parallel corpora (or parallel texts) of fiction are particularly
common, although for teaching translation at university corpora of different
genres and styles should be developed, with the primary focus on scientific,
technical, journalistic, and business texts.
A learner corpus (Learner Corpus) is an electronic corpus of texts produced by
people studying a foreign language. The main purpose of compiling learner
corpora is to analyze them in order to identify how, and how efficiently, the
studied language is acquired (Language Acquisition).
Corpora of this kind can be used, for example, for linguistic analysis that
identifies lexical or syntactic errors made while a foreign language is being
acquired. This approach helps to establish the frequency of certain types of
language errors and their typical contexts, which is necessary for formulating
plans and teaching techniques for their further correction in language learning
[6,7].
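Establishing the frequency of error types and their typical contexts can be sketched as follows, assuming the learner corpus has already been annotated with error tags. The sentences, the tag names, and both function names are hypothetical and do not reproduce the tagset of any real learner corpus.

```python
from collections import Counter

# Hypothetical error-annotated learner sentences: (sentence, [error tags]).
# The tag names are invented for this sketch, not taken from a real tagset.
LEARNER_CORPUS = [
    ("He go to school every day.", ["verb_agreement"]),
    ("I am agree with you.", ["verb_form"]),
    ("She have many informations.", ["verb_agreement", "noun_number"]),
    ("We discussed about the plan.", ["preposition"]),
]

def error_profile(corpus):
    """Return each error type with its count and its share of all tagged errors."""
    counts = Counter(tag for _, tags in corpus for tag in tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.most_common()}

def contexts(corpus, tag):
    """List the sentences in which a given error type was tagged."""
    return [sentence for sentence, tags in corpus if tag in tags]

profile = error_profile(LEARNER_CORPUS)
agreement_examples = contexts(LEARNER_CORPUS, "verb_agreement")
for tag, (count, share) in profile.items():
    print(f"{tag:15} {count}  {share:.0%}")
```

A profile of this kind shows which error types deserve the most classroom time, and the list of contexts supplies ready examples for correction exercises.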
Learner corpora are most common in Asia and Europe. The best known is the
International Corpus of Learner English (ICLE), a corpus of essays by students
with an advanced level of the language [4, 5]. This corpus is mostly used for
discourse analysis, statistical analysis of students' vocabulary, and
comparative research. It is a model for developments in corpus and applied
linguistics.
In applied linguistics, concordances (Concordances) have won special
recognition among linguists because they open up new ways of studying language
and of efficiently processing the lexical material of texts of various kinds.
In recent years, computer concordances have been actively used in
computer-assisted language learning (CALL - Computer Assisted Language
Learning).
A concordance program is a special text-processing program that is given a
linguistic task: to find a particular morpheme, word, or phrase in context. For
example, in the case of English, the task may be to find in a given group of
texts the variants of the indefinite article, or all words ending in "-ing". As
a result, the concordance program returns all the words with this ending
together with their context, as a rule one line of text.
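The "-ing" search described above can be sketched with a regular expression in Python. The sample text and the function name `suffix_concordance` are assumptions made for illustration; the context here is a fixed number of characters on each side rather than a full line of text.

```python
import re

# Hypothetical sample text; any plain-text corpus file could be read in instead.
TEXT = (
    "Reading is useful. The student is writing an essay. "
    "Nothing helps more than listening to native speakers."
)

def suffix_concordance(text, suffix, width=20):
    """Return (left context, match, right context) for each word ending in `suffix`."""
    pattern = re.compile(r"\b\w+" + re.escape(suffix) + r"\b", re.IGNORECASE)
    rows = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        rows.append((left, match.group(), right))
    return rows

ing_rows = suffix_concordance(TEXT, "ing")
for left, word, right in ing_rows:
    print(f"{left:>20} {word} {right}")
```

Because the search is purely formal, it also returns words such as "Nothing" that are not verb forms at all. Sorting out such hits is itself a useful exercise for the student, or a reason to search a morphologically annotated corpus instead.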
Thus, the teacher obtains many examples of the use of grammatical and lexical
forms of words (in this example, verbal nouns, gerunds, participle I, etc.).
The student, in turn, receives natural examples that demonstrate particular
grammatical or lexical phenomena, and can carry out linguistic research
independently [8,9].
In studying the grammar of a foreign language, the student can be asked to find
and analyze the forms and uses of difficult tenses (e.g., the Perfect), modal
verbs and their role in the sentence, the position of adverbs in a sentence,
etc. In vocabulary, for example, the student can find and explain examples of
frequently confused words such as MAKE / DO, RISE / RAISE, TELL / SAY, LIE /
LAY, etc. In syntax, for example, the student can examine the punctuation of
the language and identify how it differs from that of the native language.
Sources for such work can be not only special corpora of electronic texts, but
also various electronic publications and digital libraries (e.g., on the
Internet).
IV. Conclusion. In conclusion, we note that concordances are a modern,
efficient tool for text analysis that should be actively used in the practice
of language teaching and in solving language problems.
The analysis of corpora and the methods and achievements of corpus linguistics
are a promising direction in the teaching of foreign languages. World practice
in this field proves the effectiveness of such applications, although at
present the possibilities of corpus linguistics methods have not yet been
properly realized in applied linguistics, linguistic pedagogy, or the teaching
of native and foreign languages.
1) Apresyan Y.D., Iomdin L.L., Sannikov A.V., Sizov V.G. Semantic markup in a
deeply annotated corpus of the Russian language // Proceedings of the
International Conference “MegaLing'2005. Applied Linguistics in search of new
ways”. 2005.
2) Vakhitova D.T. Creating a corpus of Corpus Linguistics: course work, 2nd
year. 2006.
3) Zakharov V.P. Corpus linguistics: a study guide. St. Petersburg, 2005.
4) Leontiev N.B. Incompleteness and meaningful contraction in a text corpus //
Proceedings of the International Conference “MegaLing'2005. Applied Linguistics
in search of new ways”. 2005.
5) Marchuk Y.N. Corpus and extra-linguistic data base // Proceedings of the
International Conference on Corpus Linguistics - 2002. Publishing of St.
Petersburg University, 2002.
6) Milchonoka E.P. Overview of the resources of the Latvian language at the
Institute of Mathematics and Computer Science, University of Latvia //
Proceedings of the International Conference on Corpus Linguistics - 2002.
Publishing of St. Petersburg University, 2002.
7) Shimkova M.Z. Representativeness of a corpus as a linguistic problem //
Proceedings of the International Conference “MegaLing'2005. Applied Linguistics
in search of new ways”. 2005.
Web sites:
8) Averin A.N. “Problems of grammatical analysis” //
http://corpspb.narod.ru/Docs/Averin.doc
9) Rykov V.V. Pragmatically oriented corpus //
http://rykov-cl.narod.ru/t.html