Information Engineering

Using PERL to generate test data

PERL is a programming language optimised for manipulating data. The name is commonly expanded as Practical Extraction and Report Language, which provides a clue to its strengths. This can be very useful for performance testers, because we all know that extracting valid test data, in the correct format, is one of the biggest hurdles in the race to test completion.

At the heart of the language’s power is the “regex” (Regular Expression). Using regexes we can define a data format template to be applied to a list of values for pattern matching and extraction of sub-strings. PERL syntax implements regexes between slashes using tokens and quantifiers, of which the following are examples:

  • /.*/ matches zero or more of any character except a newline
  • /\d+-\d+/ matches two blocks of numeric digits separated by a dash

To unpack this, the . in the first example is a token meaning “any character except a newline” and \d in the second is a token meaning “any decimal digit” (careful: \D means “any non-digit”, which is not the same as “any alphabetic character”, since it also matches spaces and punctuation!). The * modifies the item before it to mean “zero or more” and the + modifies it to mean “one or more”.
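
To see these tokens in isolation, here is a minimal sketch. The helper name which_match and the sample values are made up for illustration and are not part of any library:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the article): report which of the
# patterns discussed above match a given string.
sub which_match {
    my ($s) = @_;
    my @hits;
    push @hits, '.*'  if $s =~ /.*/;    # zero or more characters: matches anything, even ''
    push @hits, '\d+' if $s =~ /\d+/;   # one or more digits somewhere in the string
    push @hits, '\D'  if $s =~ /\D/;    # at least one character that is not a digit
    return join ', ', @hits;
}

# Made-up sample values, chosen only for illustration.
for my $s ('1324-255', 'abc', '2024', '') {
    print "'$s' matches: ", which_match($s), "\n";
}
```

Note that 1324-255 matches \D because the dash is a non-digit, and that the empty string still matches .* because “zero or more” includes zero.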

Here’s a brief example program (the unenclosed # character means that the remainder of the line is a comment):

use strict;             # strict syntax checking
use warnings;           # report common mistakes
while (<>) {            # for every line in the input stream, one at a time
   chomp;               # remove the trailing newline
   if (/(\d+-\d+)/) {   # match two blocks of one or more digits separated by a dash "-", saving everything between "()" as $1
      print "$1\n";     # output the saved item ($1) followed by a newline
   }
}

Using the following input stream:

  1. garbage1324-255more garbage
  2. garbage 1324-255 more garbage
  3. garbage 1324- 255 more garbage
  4. garbage 1324-a255 more garbage

The first and second lines would match and both print 1324-255 (the surrounding text is ignored), but neither of the others would, because another character sits between the dash and the second block of digits: a space on line 3 and the letter “a” on line 4.
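
To check this without preparing an input file, the sketch below holds the four sample lines in a list and applies the same regex to each. The helper name extract_block is an assumption made for this illustration:

```perl
use strict;
use warnings;

# The four sample lines from the text, held in a list instead of read from input.
my @lines = (
    'garbage1324-255more garbage',
    'garbage 1324-255 more garbage',
    'garbage 1324- 255 more garbage',
    'garbage 1324-a255 more garbage',
);

# Hypothetical helper: return the captured "digits-digits" block,
# or undef when the line does not match.
sub extract_block {
    my ($line) = @_;
    return $line =~ /(\d+-\d+)/ ? $1 : undef;
}

my $n = 0;
for my $line (@lines) {
    $n++;
    my $found = extract_block($line);
    print "line $n: ", (defined $found ? $found : 'no match'), "\n";
}
```

Run as written, this should report 1324-255 for lines 1 and 2 and "no match" for lines 3 and 4.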

Have a play. Download and install PERL and the optional Komodo Edit (you can use any text editor) from http://activestate.com.