Morphological & Morphosyntactic Analysis

Morphological Analysis

Normally, a ‘search tool’ such as a concordance program may not strike us as particularly useful when it comes to conducting morphological analyses. However, since what we want to investigate in the area of morphology often involves something that occurs either at the beginning of words (i.e. prefixes) or at the end (i.e. suffixes), we can certainly make use of the features of concordance programs in order to select appropriate examples.

In order to select specific prefixes, we can either limit our searches by specifying particular patterns via regular expressions or start out with alphabetically sorted frequency lists, which will then allow us to run concordances on particular words. A useful starting pattern for finding negative prefixes may thus be ‘\b(in|im|un)[\-a-z]+\b’. Note that this pattern also retrieves words that merely begin with these letters, such as ‘interesting’ or ‘under’, so the results still need to be checked manually.
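Outside a concordancer, the same pattern can be tried out in a few lines of Python. The following is only a minimal sketch: the function name and the sample sentence are invented for illustration, and the case-insensitive flag is an assumption that mirrors the default behaviour of many concordance programs.

```python
import re

# Pattern from the text; IGNORECASE is an assumption so that
# sentence-initial forms such as "Unhappy" are matched as well.
NEG_PREFIX = re.compile(r"\b(in|im|un)[\-a-z]+\b", re.IGNORECASE)

def find_prefix_candidates(text):
    """Return all word forms in text that begin with in-, im- or un-."""
    return [m.group(0) for m in NEG_PREFIX.finditer(text)]

# Hypothetical sample sentence, not taken from any corpus.
sample = "It is impossible to be unhappy in an interesting, informal place."
candidates = find_prefix_candidates(sample)
print(candidates)
# ['impossible', 'unhappy', 'interesting', 'informal']
```

The bare preposition ‘in’ is not returned, because the pattern requires at least one further character after the prefix, whereas ‘interesting’ is returned although it contains no negative prefix, which is why the hits have to be filtered by hand.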

Conversely, when investigating suffixes, such as {-ing}, we could start out with a search pattern like ‘\b[\-a-z]+(ing)\b’ and then sort the search results. However, this depends on us actually knowing which suffix(es) we want to investigate. But what if we wanted to be ‘inspired’ by our data and build up a list of potential suffixes? In this case, we can make use of another feature many concordancers offer, that of creating reverse sorted word lists, i.e. lists in which the words are sorted according to their endings, so that words sharing a suffix appear next to each other. If you want to do this in AntConc, all you need to do is to generate an ordinary word list and then re-sort it according to the ‘Sort by Word End’ option. Note: although you should normally be able to create a reverse sorted list from scratch, the program seems to hang if you do this, so it’s best to re-sort ordinary frequency lists.
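To illustrate what a reverse sorted word list looks like, here is a minimal Python sketch. It is not how AntConc works internally; the function name, the simple tokenisation pattern and the sample sentence are all assumptions made for the sake of the example.

```python
import re
from collections import Counter

def reverse_sorted_word_list(text):
    """Return (word, frequency) pairs sorted by word ending."""
    # Simplified tokenisation (an assumption): lower-case letter
    # sequences, optionally joined by internal hyphens.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    freq = Counter(words)
    # Sorting on the reversed spelling puts words that share
    # an ending next to each other.
    return sorted(freq.items(), key=lambda item: item[0][::-1])

# Hypothetical sample text.
sample = "Walking and talking are nice; kindness and happiness help walking."
word_list = reverse_sorted_word_list(sample)
for word, count in word_list:
    print(f"{word:>12}  {count}")
```

In the output, ‘talking’ and ‘walking’ end up on adjacent lines, as do ‘kindness’ and ‘happiness’, so that recurring endings like -ing and -ness become easy to spot.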

Morphosyntactic Analysis (Tagging)

A further highly useful feature that can be exploited in concordances is the search for particular part-of-speech (PoS) tags. Searching for these obviously assumes that we have PoS tags associated with each word in our corpus data. We have already seen some examples of how to exploit this feature when we ran some lemma queries in BNCweb, where we were able to select a PoS category from the dropdown list. If we want to use this feature in standard queries, we simply need to type in the word, followed by an equals sign (=) and the name of the category, e.g. ‘take=NN1’, in order to find only those occurrences where the word form take is tagged as a singular common noun.

If you want to work with other pre-compiled corpora, you can try to obtain a PoS tagged version of the appropriate corpus. If, on the other hand, you want to use your own data, you’ll have to find a way to have it tagged. One way of doing this is to use the Stuttgart TreeTagger, which you can download and use free of charge and which is also installed in the computer lab. Normally, TreeTagger only produces vertical output, i.e. each word of the text appears on an individual line and is separated from its tag by a tab, but I have modified our version to produce a more suitable horizontal format, where each word is joined to its tag by an underscore and line breaks from the original file are preserved. In order to use it, open a command prompt and change to the TreeTagger folder. To tag a text, run the batch file ‘tag-english(.bat)’, providing the name of an input and an output file, e.g. ‘tag-english test.txt test_tagged.txt’.
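If you only have access to the standard vertical output, the conversion to the horizontal word_tag format can be approximated with a short Python script. This is a rough sketch and not the modification used in the lab version: the function name is hypothetical, the sample tags are merely illustrative, and since the vertical format does not record the line breaks of the original file, everything is joined into a single line here.

```python
def vertical_to_horizontal(vertical):
    """Convert tab-separated word/tag lines into word_tag tokens."""
    pairs = []
    for line in vertical.splitlines():
        fields = line.split("\t")
        # Skip empty or malformed lines that lack a tag column.
        if len(fields) < 2:
            continue
        # Use only the first two columns; any further column
        # (e.g. a lemma) is ignored in this sketch.
        word, tag = fields[0], fields[1]
        pairs.append(f"{word}_{tag}")
    return " ".join(pairs)

# Illustrative vertical input, one word and its tag per line.
sample = "The\tDT\ncats\tNNS\nsleep\tVBP\n.\tSENT\n"
horizontal = vertical_to_horizontal(sample)
print(horizontal)
# The_DT cats_NNS sleep_VBP ._SENT
```

A file in this horizontal format can then be loaded into a concordancer and searched with patterns that include the underscore and the tag.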