Workshop: Corpus part II

Corpus-guided editing and translation, part 2
Building specialised corpora to guide translation and editing: concepts and tools

As noted in MET’s introductory corpus workshop, many communication facilitators work in non-Anglophone settings, or work intensively with English texts written by non-native speakers in specialist knowledge fields. We need efficient ways to research language use or update our understanding of it. Although the WWW can be a valuable source of field knowledge, undisciplined or uninformed language research can lead to register violations, patchy style, outright errors, or simply translationese.

This follow-up workshop shows how to build a specialist corpus based on genre criteria, with genre defined as a text type used by a particular discourse community to share knowledge, either among its members or with outside readers or listeners. Genres may belong to academic knowledge fields, or to companies and institutions such as those created by governments, foundations or social movements.

We discuss how to quickly assemble criteria for planning a corpus of model texts, then gather the material and, if necessary, clean it of artifacts. We discuss the concept of a “quick corpus” and frankly assess when the further effort of cleaning and logging a more stable corpus might be warranted. We address the question of size, including when and how to further check a “small-corpus hypothesis” against surrogate corpora (through specialist search engines) or against the WWW. Finally, in hands-on tasks we compare corpus analysis with an inexpensive desktop indexer (which requires no pre-processing of model-text collections) with “traditional” applied-linguistics-style analysis using concordancing software (such as the freeware AntConc, which we recommend).
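To make the “traditional” concordancer view concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It only illustrates the kind of aligned display a concordancer such as AntConc produces; it is not how AntConc itself works. The function name `kwic`, the default window of 30 characters and the sample sentences are our own illustrative assumptions.

```python
import re

def kwic(texts, term, width=30):
    """Return keyword-in-context lines for every occurrence of `term`.

    Matching is case-insensitive and on whole words only; `width` is the
    number of characters of context kept on each side (an assumed default).
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            # Collapse runs of whitespace so line breaks do not spoil the layout.
            left = " ".join(text[max(0, match.start() - width):match.start()].split())
            right = " ".join(text[match.end():match.end() + width].split())
            # Right-align the left context so the node words line up in one column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical model-text snippets standing in for a small specialist corpus.
corpus = [
    "The results were statistically significant in both cohorts.",
    "No significant differences were found between the groups.",
]

for line in kwic(corpus, "significant"):
    print(line)
```

Lining the search term up in a central column is what lets a translator scan left and right neighbours (here, “statistically significant” versus “significant differences”) at a glance.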

Ailish Maher
Mary Ellen Kerans
Stephen Waller

Purpose: To establish the concept of using genre-based criteria when gathering a corpus. To show how and where texts can be obtained in different fields. To recommend simple storing, labeling and (if necessary) logging and cleaning practices for corpus input.

Description: Essential premises will be presented through interactive discussion, and you’ll soon be working with the tools on practical examples on your own computer.


  • Brief discussion of the notion of genre as used in applied linguistics: light theory bolstered by examples
  • Demonstration of the differences between quick corpora and cleaned corpora, and of when and how to verify hunches based on small corpora, with tasks
  • Planning the corpus and gathering model texts; storing them; logging and cleaning them if necessary
  • Using an indexer instead of, or in addition to, a concordancer (of the type introduced in MET’s previous corpus workshop, “Mining target-language corpora to guide English editing and translation: an introduction to a problem-solving approach”)
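A desktop indexer works differently from a concordancer: it records which documents contain which words, so a search returns a list of matching files rather than aligned context lines. The Python sketch below is a toy inverted index over in-memory texts, offered only to illustrate that idea; real desktop search programs also handle file formats, stemming and ranking, which it does not attempt. The function names `build_index` and `search` and the business-style documents are hypothetical.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lower-cased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return, sorted, the documents that contain every word in `query` (AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    # .get avoids adding empty entries to the defaultdict for unknown words.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical model texts from a business genre.
documents = {
    "annual_report.txt": "Operating profit rose while net debt fell.",
    "press_release.txt": "The board approved the annual dividend.",
    "board_minutes.txt": "The board discussed net debt and the dividend policy.",
}

index = build_index(documents)
print(search(index, "net debt"))   # ['annual_report.txt', 'board_minutes.txt']
print(search(index, "dividend"))   # ['board_minutes.txt', 'press_release.txt']
```

Because the index is built straight from the raw texts, no tagging or other pre-processing is needed; the trade-off is that the user then opens each matching file to read the term in context.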

Who should attend? Anyone who has taken the previous corpus workshop. Anyone familiar with the use of corpora who has not yet perfected corpus-building practices for specialised translation or editing. Instructors of English for specific or academic purposes will also find the skills presented in this and the previous workshop useful.

Outcome skills: On the basis of practical examples, hands-on experience and discussion, participants will:

  • Understand the issues involved in building a corpus based on genre criteria
  • Know the steps to take to build a corpus
  • Recognise that the key to this approach is the quality of the corpus: even a simple indexer can be used to analyse the right corpus of model texts for translation or editing, whatever their format

Pre-meeting information:

See an example (from the field of business) that shows how an indexer (or desktop search program) can be used to research unfamiliar terminology.

In contrast, see an example of how a “concordancer” displays corpus results, helping a translator resolve a doubt. (Concordancing was covered in part 1 of this workshop series.)

The pre-meeting information from part 1 of this workshop series will lead you to more examples and further information about the corpus-guided approach.

Further reading

Our 2008 Journal of Specialised Translation article on the corpus approach is available online at That article has links to further practical reading suggestions. Kevin Lossner, a technical translator based in Germany, has posted a review of the article with further suggestions at

