1.8. Using a segmentation to filter anotherΒΆ

In some cases, the number of forms to be selectively included in or excluded from a segmentation is too large for using the Select widget. A typical example is the removal of “stopwords” from a text: in English for instance, although the list of such words is finite, it is too long to try to encode it by means of a regex (cf. an example of such a list).

The purpose of widget Intersect is precisely to solve that kind of problem. It takes two segmentations in input and lets the user include in or exclude from the first (source) segmentation those segments whose content is the same as that of a segment in the second (filter) segmentation. The widget’s basic interface is shown on figure 1 below).

Interface of widget Intersect configured for stopword removal

Figure 1: Interface of widget Intersect configured for stopword removal.

Similarly to widget Select, user must choose between modes Include and Exclude. The next step is to specify which incoming segmentation plays the role of the Source segmentation and the Filter segmentation. (Here again, we will ignore the Annotation key option for the time being.)

In order to try out the widget, set up a scheme similar to the one shown on figure 2 below). The first instance of Text Field contains the text to process (for instance the Universal Declaration of Human Rights), while the second instance, Text Field (1), contains the list of English stopwords mentioned above. Both instances of Segment produce a word segmentation with regex \w+; the only difference in their configuration is the output segmentation label , i.e. words for Segment and stopwords for Segment (1). Finally, the instance of Intersect is configured as shown on figure 1 above.

Scheme illustrating the use of the Intersect widget for stopword removal

Figure 2: Example scheme for removing stopword using widget Intersect .

The content of the first segments of the resulting segmentation is:

PREAMBLE Whereas recognition inherent dignity equal inalienable rights members human family foundation freedom justice peace world ...

Previous topic

1.7. Partitioning segmentations

Next topic

2. Tables

This Page