Note: this is a copy of the post I wrote for CELFS at the University of Bristol, updated and edited to make the information available for my students in China.
My research involved a very useful tool for teaching and learning academic vocabulary. In this blog post I will provide a ‘walkthrough’ example of how this tool can be used by both English for Academic Purposes (EAP) teachers and, more importantly, students (this version assumes quite advanced students).
Demonstration
For the purpose of this demonstrative ‘walkthrough’, I have chosen to offer an answer to a question raised in training sessions on the pre-sessional course at the University of Bristol in 2016. This is a question that often gets raised in discussions between EAP tutors – whether we can use first person pronouns in academic writing.
For clarity I have used italics for lexical items explored through the corpus tools and bold to signify words with operational functions on the Sketch Engine website.
Conducting a Simple query
First navigate to the Sketch Engine and select BAWE from the list of options (Figure 1).
Then select concordance from the list of options (Figure 2).
Next, simply enter the word or words that you want to investigate and click Search (Figure 3).
After entering the word we and running this search, students who have been told that academic writing does not use we are in for a bit of a shock. The simple query search includes instances of us as well as we and combined they occur 15,718 times (or 1,885.50 per million words) in the corpus (Figure 4). These words are in fact used at a much greater frequency than some words EAP teachers often actively encourage students to use (try searching furthermore and you’ll find it occurs 1,319 times or 158.22 per million words, and in conclusion occurs 428 times or 51.34 per million words). This indicates quite clearly that the instruction not to use first person pronouns is overly simplistic.
KWIC and Concordancing
Having discovered that we is frequently used in successful academic assignments, students could usefully explore how we and us are used in context. To do this, look at the key word in context (KWIC) which appears in red embedded in the listed concordance lines. These concordance lines can be studied and patterns of usage observed (Figure 5).
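For readers who are curious about what a concordancer does behind the scenes, here is a minimal sketch in Python. It is not how Sketch Engine is implemented. The function name `kwic`, the `width` parameter and the sample sentences are all invented for illustration, and the sample text is not taken from BAWE.

```python
import re

def kwic(text, keyword, width=30):
    """Return concordance lines with the key word shown in its context."""
    lines = []
    # \b stops 'we' from matching inside longer words such as 'were'
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the key words line up in a column
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample sentences (an assumption, not corpus data)
sample = (
    "In this essay we argue that the model is incomplete. "
    "From the results we can see a clear trend. "
    "We were unable to replicate the earlier findings."
)

for line in kwic(sample, "we"):
    print(line)
```

Each printed line places the key word in square brackets with a fixed amount of context on either side, which is the same basic layout as the concordance lines in the corpus tool.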
Note that clicking on the blue words to the left of the concordance lines opens a pop-up box that gives you more information about the text (Figure 6). This is how I discovered that the only instances of pay attention on in the corpus were all written by Chinese students, so the phrase can safely be interpreted as a Chinglish grammar error. You can also see a bigger sample of the writing in the pop-up box by clicking on the red key word in the middle of the concordance lines.
To start a new search, click on this icon on the menu at the left of the screen:
The material below needs updating but please do go ahead and explore the functions the software has to offer 🙂
Returning to the case of we, for some direction on interpreting the functions of we, I recommend Tang and John (1999) who categorise ‘the writer identity in student academic writing through the first person pronoun’. They identify 6 different functions, including positioning the writer as: representative, guide through the essay, architect of the essay, recounter of the research process, opinion-holder and originator (with representative and guide being the most frequent). If you are a student who is now puzzled why your respected English teacher told you not to use we in an academic essay, try a new search using this string of words that is frequently used in Chinese student writing: as we all know. The result will show you that there are ways of using we that are not acceptable in academic writing. So, if you cannot understand the difference between as we all know and the examples of we in the concordance lines generated by the first search, it would be advisable to follow your teacher’s advice and avoid writing we.
To search for a specific word form (e.g. to exclude us), click on query type – word and enter your search item in the word form field (Figure 5). This returns 13,222 instances of we (1,586.08 per million). So we is clearly a lot more commonly used than us.
It may also be useful to know that you can control the word form of a search item to some extent just by being aware of what you type into the simple query. For example, a search of maintains will give you only instances of maintains, whereas searching maintain will return instances of multiple word forms (maintains, maintained, maintaining). You can also use * to indicate missing letters which allows you to enter the basic stem of a given word and broaden your search for different word forms (try it by running a search for analyse and another for analys*). The * can also be used to indicate a missing word so, for example, you could check what adverbs might be appropriate to put between is and argued by running a search for is * argued (compare the results with a search for is argued).
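The two uses of * described above can be imitated in a few lines of Python. This is only a rough sketch under stated assumptions: the helper names `match_wildcard` and `match_phrase` and the sample data are invented, and the code does not reproduce the way the simple query also matches related word forms (such as maintain returning maintained). It only shows the wildcard idea.

```python
from fnmatch import fnmatchcase

def match_wildcard(pattern, words):
    """Return the words matching a pattern in which * stands for any letters."""
    return [w for w in words if fnmatchcase(w, pattern)]

def match_phrase(pattern, text):
    """Find word sequences where each * in the pattern matches exactly one word."""
    slots = pattern.lower().split()
    tokens = text.lower().split()
    hits = []
    for i in range(len(tokens) - len(slots) + 1):
        window = tokens[i:i + len(slots)]
        if all(fnmatchcase(tok, slot) for tok, slot in zip(window, slots)):
            hits.append(" ".join(window))
    return hits

# Invented examples (assumptions, not corpus data)
words = ["analyse", "analysed", "analysis", "analyst", "analytic", "maintain"]
text = "It is often argued that costs rise and it is widely argued that they fall"

print(match_wildcard("analyse", words))   # exact form only
print(match_wildcard("analys*", words))   # any word beginning with 'analys'
print(match_phrase("is * argued", text))  # one word between 'is' and 'argued'
```

Comparing the first two printed lists shows why a stem plus * broadens a search, and the third shows how a * between two words collects the adverbs that fill the gap.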
Collocations Tool
To further analyse the use of the word we, after running the simple query search, use the options down the left-hand side of the Sketch Engine screen. A useful tool is Collocations (near the bottom of the options) and if the default settings are used (just click Make candidate list when some complex looking options appear), it is very clear that we strongly collocates with can and see (Figure 6), which suggests the collocation we can see is frequently used (this could be tested by entering we can see into the simple query and running another search as in Figure 7).
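The basic idea behind a collocation candidate list can be sketched as counting the words that appear near the search word. The Python below is a simplified illustration only: it uses raw counts rather than the statistical association scores that the Collocations tool computes, and the function name `collocate_counts`, the `window` parameter and the sample sentence are invented for this example.

```python
from collections import Counter

def collocate_counts(tokens, node, window=2):
    """Count the words found within `window` positions of each node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            # Words to the left and right of the node, excluding the node itself
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample text (an assumption, not corpus data)
tokens = (
    "from the graph we can see a trend and we can see that we can also compare"
).split()

counts = collocate_counts(tokens, "we")
for word, n in counts.most_common(3):
    print(word, n)
```

Raw counts like these favour very common words, which is one reason real corpus tools rank candidates with association measures instead.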
Filters
The texts that comprise the corpus can be filtered in various ways. For example, listed under Frequency (in the list of options on the left-hand side of the screen), the Text types option will give you graphical data of the breakdown of usage across different types of discipline, text genre and author so you can, for example, discover that:
- we is used more often by 1st and 2nd year undergraduates than 3rd years
- we is used most often in Philosophy and Mathematics, but very rarely in Planning
- we is used most often in the ‘Methodology recount + Narrative recount’ genres
- we is used in greater relative frequency by L1 Welsh and Mongolian speakers
This provides plenty of data about different specific contexts in which we is used and by whom in academic writing but is probably of limited interest to most students.
Any of the filters can be used to narrow the range of the corpus in simple query searches (scroll down and below the simple query input field you will see lists of check box options). Again, this is probably of limited interest to most students but it does allow you to discover information such as we occurs 623 times (or 74.73 per million words) in Social Sciences – Economics, and occurs 98 times (11.76 per million words) in Physical Sciences – Chemistry. When comparing the statistics between sub-corpora like this, it is important to use the per million words figure rather than the number of instances so as to take account of the fact that employing different filters could create sub-corpora of significantly different sizes (convert per million into per thousand or a percentage if it makes it easier to conceptualise).
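The arithmetic behind these normalised figures is simple enough to sketch in Python. The two per million values below are the ones quoted above. The function names and the sub-corpus sizes in the last example are hypothetical and chosen only to show why raw counts can mislead.

```python
def per_million(hits, corpus_size):
    """Normalise a raw count to a frequency per million words."""
    return hits / corpus_size * 1_000_000

def convert(per_million_value):
    """Express a per million figure per thousand words and as a percentage."""
    return {
        "per_thousand": per_million_value / 1_000,
        "percent": per_million_value / 10_000,
    }

# Per million figures for we quoted in this post
economics = 74.73
chemistry = 11.76

print(convert(economics))
print(convert(chemistry))
print(round(economics / chemistry, 2))  # how many times more frequent

# Hypothetical sub-corpus sizes: the same raw count, very different rates
print(per_million(100, 500_000))
print(per_million(100, 2_000_000))
```

The last two lines show the same raw count of 100 producing quite different rates once the size of each sub-corpus is taken into account.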
A Caveat
Whilst BAWE provides a very useful reference corpus for exploring the use of lexis in successful academic student writing, it is important to remember that it is still non-expert writing and may contain grammar, spelling and other language errors. It cannot be assumed that the assignments are models of excellent writing, only that they are of sufficiently good quality to have been deemed successful. Furthermore, BAWE is clearly not exhaustive so if a particular search item does not return many hits, this does not necessarily mean that it is not academic. There could be other explanations such as a lack of papers on a particular topic area or insufficient instances of a particular word to reveal a comprehensive list of collocates. Nevertheless, if a search item turns up few or no instances and no other reason is identifiable, it certainly does suggest that it might be sensible to avoid using it.
Taking it further
There ends my demonstration of how the BAWE reference corpus can be used by EAP teachers and students. Given the high numbers of Chinese students who currently make up the student body on EAP pre-sessionals, teachers and students might find it helpful to run searches on common clichés like every coin has two sides, what’s more and with the development of the society/technology/economy. I have developed an editable web page for this at: http://mushroom-scholars.org/learning-community/docs/exploring-academic-vocabulary-2/ (a version is also available for those who can access Google Docs).
For teachers and students who are really interested in exploring corpora further, whole texts can be compared to each other using tools available on Compleat Lexical Tutor. This website also allows you to conduct Key Word Analyses against a selection of reference corpora. There is also, of course, the mighty Google Book Corpus but this isn’t readily accessible in China.
Additional Useful Resources
Sinclair (2004) Corpus and Text: Basic Principles, in Wynne (Ed.) Developing Linguistic Corpora: a Guide to Good Practice
Tom Cobb’s Compleat Lexical Tutor
Mark Davies’ (formerly) Word and Phrase (requires registration)
References
Nesi, H. & Gardner, S. (2012) Genres across the Disciplines: Student Writing in Higher Education, Cambridge University Press.
Robb, T. (2003) ‘Google as a Quick ‘n Dirty Corpus Tool’, The Electronic Journal for English as a Second Language, 7:2, http://www.tesl-ej.org/wordpress/issues/volume7/ej26/ej26int/
Tang, R. and John, S. (1999) ‘The ‘I’ in identity: Exploring writer identity in student academic writing through the first person pronoun’, English for Specific Purposes 18, pp.S23-S39.