«Text processing»

The «Text processing» module offers several methods for processing text. Some tasks require knowing how much a specified text differs from a reference text. Such tasks arise in computational linguistics and artificial intelligence.

## Module interface

The module window consists of a command-building area, command control buttons, and a table listing the commands. The command-building area has 4 main fields: «Action», a drop-down list of text processing methods; the two fields «Line 1» and «Line 2», for entering two lines of text or the variables that contain them; and «Variable», for the name of the variable that will receive the module's result.

![Screenshot](img/TA_1.png)

Clicking the «Add» button places the created command in the «Command list» section. To edit a command, select it in the list, make the desired changes, and click the «Edit» button. To delete a command, select it in the list and click the «Delete» button. You can change the position of commands by clicking the arrows, as in the Excel module (part 2).

## Text processing methods

The following text processing methods are offered in this module:

1. Levenshtein distance – calculates the difference between two lines as the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. For example, «Lexema RPA» and «Lexema SR» differ by 3 characters: the word «Lexema» is the same in both, and the remaining characters differ, so the result put into the variable is 3.

2. 3-grams – a method based on n-grams, in our case with n = 3: similarity is estimated over every sequence of 3 consecutive characters. The larger the number (up to 1), the greater the similarity of the lines. For the «Lexema RPA» and «Lexema SR» example the result is 0.52.

3. Jaro-Winkler similarity – a measure of how similar two sequences of characters are. The smaller the Jaro-Winkler distance between two lines, the more similar the lines are. For the «Lexema RPA» and «Lexema SR» example the result is 0.5.
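
To make the first two methods more concrete, here is a minimal Python sketch of a Levenshtein distance and a trigram similarity. It is an illustration only, not the module's implementation: the function names `levenshtein`, `trigrams`, and `trigram_similarity` are hypothetical, and the trigram score is assumed to be the Jaccard ratio of shared to total distinct 3-character substrings, since the exact formula the module uses is not stated here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def trigrams(text: str) -> set:
    """Set of all 3-character substrings (no padding, case-sensitive)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """ASSUMPTION: Jaccard ratio of shared to total distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0  # two strings without trigrams are treated as identical
    return len(ta & tb) / len(ta | tb)


print(levenshtein("Lexema RPA", "Lexema SR"))         # 3
print(trigram_similarity("Lexema RPA", "Lexema SR"))  # 0.5
```

The Levenshtein result matches the example in the list. The trigram score under this Jaccard assumption is 0.5 rather than the value quoted for the module, which suggests the module uses a different trigram definition or normalization (for example padding, case folding, or another ratio).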