2.4.6 Corpus-driven Study of Cross-linguistic Equivalence
In the field of translation studies, the concept of equivalence comes under regular scrutiny. It has been interpreted from so many different perspectives that a survey of them alone would fill a book. Sager, for example, argues that “fundamental to all theories of translation is the concept of equivalence” (1994:142). However, it is notoriously difficult to offer a definition of equivalence that is widely accepted in translation studies. An obvious reason is that there are various levels of equivalence (e.g. Baker 1992), and there is no consensus as to which levels, and how many, should be taken into consideration as parameters of equivalence.
Such difficulty is partially due to the fact that “the conceptual world evolves differently in different languages” (Altenberg & Granger 2002:21). Different historical, cultural, geographical and social settings make complete equivalence between expressions in different languages extremely difficult to achieve. A typical example is the encoding of kinship in English and Chinese. Machines are normally incapable of pinpointing the most suitable equivalent of uncle, because the right choice among a number of possible Chinese counterparts (e.g. 舅舅, 叔叔, 姑父, etc.) can only be determined by the relationship relative to other family members.
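The kinship example can be made concrete with a minimal sketch. The lookup table and the function name `translate_uncle` below are hypothetical and serve only as an illustration; the terms 伯伯 and 姨父 are added to the three counterparts cited above, and the glosses reflect common Mandarin usage rather than an exhaustive account of the kinship system.

```python
# Hypothetical lookup: (parent through whom the uncle is related, link to that parent)
# mapped to a Chinese kinship term. Illustrative only, not an exhaustive table.
UNCLE_TERMS = {
    ("father", "elder brother"): "伯伯",
    ("father", "younger brother"): "叔叔",
    ("father", "sister's husband"): "姑父",
    ("mother", "brother"): "舅舅",
    ("mother", "sister's husband"): "姨父",
}


def translate_uncle(parent=None, link=None):
    """Return a Chinese equivalent of 'uncle', or None when the
    family relationship is not specified and no choice can be made."""
    return UNCLE_TERMS.get((parent, link))


print(translate_uncle("mother", "brother"))          # relationship known
print(translate_uncle("father", "younger brother"))  # relationship known
print(translate_uncle())                             # bare 'uncle': undetermined
```

The last call returns `None`: without information about the family relationship, the word uncle alone does not select any one of the candidates.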
Another difficulty relates to the controversy over the unit of meaning. Any cross-linguistic comparison presupposes that the compared items are in some sense similar or comparable. For lexical contrastive study, the basis of the comparison is primarily semantic or functional (e.g. James 1980). The simplest way of dealing with equivalence is the model of the bilingual or multilingual dictionary, in which all possible equivalents are listed against all possible meanings of a word. However, as we have argued previously, word-by-word bilingual equivalence is severely limited.
The limitation is noted by Sinclair when he deals with different models of language organization. Sinclair et al. observe that only a few open-class words in the large vocabulary of English “have detachable meaning” and “usually have little to do with the phraseology that surrounds them” (1996a:176). These items are at one end of the language spectrum, reflecting what Sinclair calls the ‘terminological tendency’ (2004:29) and a few closely associated notions, such as the open-choice principle (1991), ‘sublanguage’ (2004:151), the ‘Academic approach’ (ibid:156), and the ‘extended term bank’ (ibid:162). In scientific and technical language, for example, attempts are frequently made to keep the meaning of terms constant, unique, and unrepeatable so that they can be understood by all users of the language in all contexts.
The terminological tendency provides an appropriate model for a lexicon in which the meaning of each term is clear. However, there are clear drawbacks and risks in developing such a lexicon. For example, many of the grammatical or function words in the language have to be excluded from the analysis. Words often make meaning through their combinations, and language patterning then appears to be in line with another tendency, the ‘phraseological tendency’ (Sinclair 2004:29), and a few related concepts such as the idiom principle (Sinclair 1991), the ‘Thespian approach’ (Sinclair 2004:156), and the ‘empty lexicon’ (ibid:162). In fact, most open-class words lie somewhere between the two extremes, and their tendency to be polysemous remains the biggest obstacle for machine translation. If we assume that most words can be disambiguated by their syntactic structures and co-texts, then the same assumption applies equally well to translation, which is seen by Sinclair et al. as “a kind of disambiguation, with the differentiation of meaning shown by the way a word is translated” (1996a:176).
Cross-linguistic applications of this corpus-driven approach, which explore the possibility of identifying equivalent units, have been tested in several European projects, the most notable being Sinclair et al. (1996a). The general procedure is as follows (see also Altenberg & Granger 2002:32):
(1) A number of pre-selected words are examined on the basis of concordances from monolingual corpora representing the compared languages.
(2) Recurrent contextual patterns specifying the different meanings of a given word are identified.
(3) A translation equivalent is defined for each meaning of the given word.
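Steps (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually implemented in the projects cited: the three-sentence English "corpus" is invented, the function names `tokenize`, `concordance` and `collocates` are hypothetical, and sentence boundaries are ignored for simplicity.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, node, span=3):
    """Step (1): collect every occurrence of `node` with `span` words of
    left and right context, as in a keyword-in-context (KWIC) display."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            lines.append((left, tok, right))
    return lines


def collocates(lines):
    """Step (2), first approximation: count the words that recur in the
    context window; recurrent items hint at contextual patterns."""
    counts = Counter()
    for left, _, right in lines:
        counts.update(left)
        counts.update(right)
    return counts


# Invented sample text; a real study would draw on a large monolingual corpus.
corpus = (
    "We deposited the cheque at the bank on Monday. "
    "They walked along the river bank at dusk. "
    "The bank raised its interest rates again."
)

lines = concordance(tokenize(corpus), "bank")
for left, node, right in lines:
    print(f"{' '.join(left):>25}  [{node}]  {' '.join(right)}")
print(collocates(lines).most_common(3))
```

In this toy sample, contexts such as river bank and the bank raised its interest rates point to different meanings of the node word. Step (3), assigning a translation equivalent to each meaning, remains an analytical decision that the sketch does not attempt.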
This project (Sinclair et al. 1996a) set out in 1990 to produce a sample of a multilingual dictionary on the basis of evidence drawn from corpora in several European languages. It was based on a hypothesis which can be stated as follows:
[t]ranslation equivalences exist between two languages. There are likely to be parallels between the textual environment of a word in one language and a word that is used to translate it in another. The computer should be able to detect such equivalences, to identify the environment and thus to establish which occurrences of the word in the first language are possible candidates for translation by the same equivalent word in the second language. Considerable proportions of the central patterning of the language can be prepared for lexicography in this way. (Sinclair 1996b:179)
Based on the same principle, Tognini-Bonelli (2001, 2002) postulates a linking element that stands between the source language and the target language. The procedure for identifying translation equivalence thus involves three steps. Step one identifies and classifies the formal patterning of a given expression against the evidence of the source language and identifies a meaning/function pairing for each pattern. Step two posits a prima facie translation equivalent for each meaning/function on the basis of a translation corpus, reference books (such as dictionaries and grammars), or the past experience of the analyst. Step three analyzes the prima facie translation equivalent in terms of its formal features.
We can see that a comprehensive and systematic re-description of each compared language is both a prerequisite and a natural consequence of this approach, given the inadequacy of past research. Teubert, for example, states that “further improvement depends on re-analyzing the languages involved from scratch with the aid of multilingual corpora” (1996:238).
Teubert is right in pointing out the role of multilingual corpora in lexical contrastive study. In fact, in addition to the theoretical contribution made by what we might call ‘British contextualism’, another factor, one that has exerted the greatest influence upon the contrastive study of lexis, is the development of multilingual corpora. Several types of multilingual corpora are distinguished and used in cross-linguistic research. Unfortunately, the terminology is not entirely consistent, which sometimes leads to a great deal of confusion. A widely used typology can be seen in Granger (2003) and is shown in Figure 2.3.
Figure 2.3 Corpora in cross-linguistic research (adapted from Granger 2003)
As shown in Figure 2.3, there are two types of multilingual corpora. Translation corpora contain source texts and their translations and may be either unidirectional or bi/multidirectional. The term parallel corpus is used in this book to refer to aligned translation corpora where a unit in the original text is linked to the corresponding unit in the translation. Another type of multilingual corpora, comparable corpora, consist of original texts in each language, matched as far as possible in terms of text type, subject matter, etc.
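The notion of alignment can be illustrated with a minimal sketch of a parallel corpus and a bilingual concordance drawn from it. Everything below is an assumption made for illustration: the class `AlignedUnit`, the functions `bilingual_concordance` and `candidate_counts`, and the four invented English-Chinese sentence pairs, which are aligned at sentence level.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedUnit:
    source: str  # unit in the original text
    target: str  # corresponding unit in the translation


def bilingual_concordance(units, node):
    """Return the aligned units whose source side contains `node` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(node)}\b", re.IGNORECASE)
    return [u for u in units if pattern.search(u.source)]


def candidate_counts(hits, candidates):
    """Count how often each candidate equivalent occurs on the target side."""
    counts = Counter()
    for unit in hits:
        for cand in candidates:
            if cand in unit.target:
                counts[cand] += 1
    return counts


# Invented sample pairs; a real parallel corpus would hold many aligned texts.
units = [
    AlignedUnit("My uncle lives in Beijing.", "我叔叔住在北京。"),
    AlignedUnit("Her uncle is a doctor.", "她舅舅是医生。"),
    AlignedUnit("The uncle of the bride gave a speech.", "新娘的舅舅致了辞。"),
    AlignedUnit("We visited the museum.", "我们参观了博物馆。"),
]

hits = bilingual_concordance(units, "uncle")
for unit in hits:
    print(f"{unit.source}  |  {unit.target}")
print(candidate_counts(hits, ["舅舅", "叔叔", "姑父"]))
```

Because each source unit is linked to its translation, the target side of every hit can be inspected directly, which a comparable corpus of unrelated original texts does not allow.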
There is another type of comparable corpora (monolingual corpora), which contain texts in one and the same language. They are useful for investigating the differences between, for example, Chinese original texts and Chinese texts translated from other languages, and between texts produced by Chinese native speakers and by learners of Chinese. However, this type of comparable corpora is irrelevant to our current purposes, and we shall use the term monolingual corpus unambiguously to refer to the two corpora (one English corpus and one Chinese corpus, both consisting of texts belonging to the academic register) that constitute the comparable corpora.
The advantages and disadvantages of these types of corpora have been extensively discussed (e.g. Aijmer, Altenberg & Johansson 1996; Altenberg & Granger 2002; Teubert 1996; Tognini-Bonelli 2001, 2002). Table 2.3 compares the two types of corpora.
Table 2.3 A comparison of two types of multilingual corpora
The role of intuition must not be underestimated, as it was heavily relied upon in a number of studies carried out when well-constructed translation corpora were not yet available. However, it should be emphasized that reliance on intuition is only a stopgap measure in the face of a lack of electronic sources. We believe that bilingual concordances generated from the parallel corpus are much more reliable, as they yield a wealth of information on the most frequent items and their equivalents in the other language.