A keyword corpus to go: exploring the potential of WebBootCat
As cobwebs form on our dictionaries and encyclopedias and trips to the local library become a distant memory, translators and editors have been learning to mine the vast amounts of information available on the Internet, with its instant access to texts about the subject we are working on. A simple Google search may give us the answer we need in many cases, but it has its pitfalls. Can we trust the number of hits a search brings up? How confident can we be of a decision based on the results shown on the first page or two? Internet search algorithms also change constantly. Do we need to keep learning ever more sophisticated search strategies?
Another (re)search strategy that is slowly but surely gaining ground is corpus mining. Text corpora (collections of texts that can be analysed with concordancing software) help us to study patterns in the specialist language we need to learn about fast so we can do our jobs. MET has shared highly specialised corpora in past workshops, and online corpora are also available. However, as we move from field to field we often feel the need for a set of texts even more closely matched to the job in hand to help guide terminology and phrasing choices. And we need such a corpus fast. In an age where time is money, we want a healthy balance between time spent researching and time spent doing billable work.
WebBootCat is an online algorithm-based tool that can select a body of subject-specific texts almost instantly (in about 2 minutes) and deliver it in a format that can be analysed in a concordancer. Its usefulness depends on numerous factors, including the keywords we feed into the tool, our choice of settings, and a basic understanding of its output. In this hands-on workshop, we will learn to use WebBootCat efficiently and explore its potential.
Purpose:
To learn the basic features of WebBootCat; create, download, and use a range of text (.txt) corpora; explore how to choose keyword combinations that work; create additional corpora in the same subject area with just a few extra clicks; learn to limit sampling to certain URLs; and discuss the potential usefulness and limitations of working with a keyword corpus.*
Developer and facilitator: Anne Murray
Structure:
We will start with a brief description of the tool and a discussion of what a keyword corpus is and how it differs from other corpus types. We will then move on to using the tool, which is the main focus of this workshop, and close with a discussion of lessons learned and possible models for best practices in relation to WebBootCat and keyword corpora.
Who should attend:
Generalist or specialist translators and editors, as well as writers, researchers, and educators curious to learn how a tool like WebBootCat might be of use in their work. Some familiarity with corpus mining would be useful but is by no means necessary. If you’re new to corpus mining, you can prepare by taking Mary Ellen Kerans’s workshop “Corpus-guided decision-making for translators and editors” and/or reading the texts Mary Ellen recommends for her workshop (see reading list below).
Participants will learn how to create a customised keyword corpus and to use it critically.
About the facilitator:
Anne Murray is a freelance medical translator and editor based in a little village surrounded by vineyards in Tarragona, Spain. She has a degree in translation and a foundation certificate in medical writing from the European Medical Writers Association. She has served on MET’s council since 2006 and is currently its Vice Chair.
My sincerest thanks to Ailish Maher for introducing me to this simple yet sometimes lifesaving tool many years ago.
*As this workshop is still a work in progress, its specific purpose and structure may change as I research the tool in greater depth to answer the many questions that I and others have posed in recent months.