Thread: Regular Expression Coding [Advanced]


03-11-2005, 07:48 PM

Regular Expression Coding [Advanced]

Regular expressions are available in many different programming languages, and can sometimes be a hassle to deal with.

Firstly, just what are regular expressions good for?
Well, regular expressions can be used for many different things, namely:
  1. searching for a string in a text
  2. validating a string
  3. pulling out text from a string
  4. more..

Regular expressions can be used to validate email addresses, phone numbers and website URLs; or they can be used to search for $STRING inside of $TEXT; or they can be used for pulling $STRING out of $TEXT.

In PHP, there are 6 functions which use regular expressions:
  1. ereg : performs regular expression operation
  2. eregi : performs case-insensitive regular expression operation
  3. preg_match : performs perl-compatible regular expression operation
  4. preg_replace : performs perl-compatible regular expression REPLACE operation
  5. ereg_replace : performs regular expression REPLACE operation
  6. eregi_replace : same as above; case-insensitive
One of these functions is required to perform any regular expression operation.
The functions ereg, eregi and preg_match follow the same format:
Code:
function($stringNeedle, $stringStack, $stringArray)
$stringNeedle is the regular expression (what you are looking for).
$stringStack is the overall text being searched.
$stringArray is the array register, which receives the matched text (will be explained later).
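
As a rough sketch of that three-argument call: the ereg family was removed in PHP 7, so this example assumes preg_match instead, which takes its arguments in the same order but needs delimiters (here /) around the pattern. The sample text and pattern are made up for illustration.

```php
<?php
// Hypothetical sample data; the variable names mirror the ones used above.
$stringNeedle = '/([0-9]+)/';          // the regular expression, wrapped in "/" delimiters
$stringStack  = 'Order 1234 shipped';  // the text being searched
$stringArray  = [];                    // filled in by preg_match()

// preg_match() returns 1 on a match, 0 on no match, false on error.
$found = preg_match($stringNeedle, $stringStack, $stringArray);

echo $found, "\n";           // 1
echo $stringArray[0], "\n";  // 1234 (the whole match)
echo $stringArray[1], "\n";  // 1234 (the first parenthesised clause)
```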

The functions ereg_replace, eregi_replace and preg_replace use the following format:
Code:
function($stringFormat, $stringReplace, $stringText)
In which:
$stringFormat is the regular expression (what you are looking for).
$stringReplace is the replacement text.
$stringText is the body of text in which the replacement is made.

In the _replace functions, you look for a certain text ($stringFormat), within a body of text ($stringText), and replace what you find with a text ($stringReplace).
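
A minimal sketch of the _replace format, assuming preg_replace (the ereg_replace functions no longer exist in PHP 7 and later). The sample text is invented; the variable names follow the format above.

```php
<?php
$stringFormat  = '/[0-9]+/';           // what to look for: one or more digits
$stringReplace = '#';                  // what to put in its place
$stringText    = 'Room 101, floor 3';  // hypothetical body of text

// Every match of $stringFormat inside $stringText is replaced.
$result = preg_replace($stringFormat, $stringReplace, $stringText);

echo $result, "\n";  // Room #, floor #
```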

Now to the good stuff:
There are different types of regular expressions to use:
First, good coding format dictates that a CLAUSE be inside parentheses ( ). Sets of characters to search for (letters, digits, or certain special characters) are placed in brackets [ ]. Regular expressions are case-sensitive unless you use a case-insensitive function such as eregi, so good form dictates that you always spell out the case you want to match.

Learn by example:
1. numeric - string: ([0-9])
2. lowercase alpha - string: ([a-z])
3. uppercase alpha - string: ([A-Z])
4. special characters: ([/,._!@#$%^&*-])

These four types can also be combined into one clause.

1. lowercase alpha & uppercase alpha strings: ([a-zA-Z])
2. alpha(lowercase) & numeric strings : ([a-z0-9])
3. alpha(uppercase) & numeric strings : ([A-Z0-9])
4. alpha (both) & numeric strings : ([a-zA-Z0-9])
5. numeric & special strings : ([0-9/,._!@#$%^&*-])
6. Everything above: ([a-zA-Z0-9/,._!@#$%^&*-])
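
To see what each of these classes picks up, here is a small sketch using preg_match. firstMatch() is a hypothetical helper written for this example, not a PHP built-in, and the sample text is made up. Note that a hyphen between two characters inside [ ] is read as a range, so a literal hyphen is safest as the last character of the class.

```php
<?php
// Hypothetical helper: returns the first match of $pattern in $text, or null.
function firstMatch($pattern, $text)
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = 'abc XYZ 42 #!';

echo firstMatch('/[0-9]/', $text), "\n";        // 4
echo firstMatch('/[A-Z]/', $text), "\n";        // X
echo firstMatch('/[a-zA-Z0-9]/', $text), "\n";  // a

// "~" is used as the delimiter here so the "/" inside the class needs no escaping.
// The hyphen is last in the class, so it is a literal hyphen and not a range.
echo firstMatch('~[/,._!@#$%^&*-]~', $text), "\n";  // #
```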

If you know the specific number of characters in a text, you can place that number in { } brackets at the end of the [ ] brackets.

Case 1:
A 7-character string (chicago).
For the full string try the following: ([a-zA-Z]{7})
Using that, the following variations of chicago will all be found:

ChIcAgO
CHicAGo
CHICAGO
chicago
ChICagO
...or any other run of 7 letters

Case 2:
Selecting a string up to a certain length is done like so:
Code:
([REGEX]{a,b})
REGEX being the regular expression used, and a,b being the minimum and maximum number of times it may repeat. To select the first 3 letters of the string CHICAGO, we use the range 0,3; hence:
Code:
([a-zA-Z]{0,3})
Would render: CHI.
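
A short sketch of how the { } counts behave, assuming preg_match. The test strings are invented for illustration. One thing worth seeing in action: a minimum of 0 lets the pattern succeed without matching any characters at all.

```php
<?php
// Exactly 7 letters.
preg_match('/[a-zA-Z]{7}/', 'CHICAGO', $exact);
echo $exact[0], "\n";   // CHICAGO

// Between 0 and 3 letters: takes as many as it can, up to 3.
preg_match('/[a-zA-Z]{0,3}/', 'CHICAGO', $upTo3);
echo $upTo3[0], "\n";   // CHI

// Between 2 and 4 letters: the lone "a" is too short, so the match starts at "b".
preg_match('/[a-zA-Z]{2,4}/', 'a1bcdefg', $range);
echo $range[0], "\n";   // bcde

// A minimum of 0 matches the empty string when the text starts with a non-letter.
preg_match('/[a-zA-Z]{0,3}/', '123abc', $empty);
var_dump($empty[0]);    // string(0) ""
```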

Case 3:
If the number of characters is not known, but you want to select a whole word, line or text, use the indicator + to repeat the regular expression one or more times.

So, using PHP, let's try this:
PHP Code:
#First, define each $string as a sentence
#Then, try to pull text out of it
#using regular expressions
$string1 = "Chicago is known as the windy city";
$string2 = "Chicago";
$string3 = "ChIcAgO";
$string4 = "ChIcAgO iS kNoWn As ThE wInDy CiTy";

#try to pull out Chicago in $string2:
ereg("([a-zA-Z])", $string2); #matches: C

#try to pull out Chicago in $string2 using "+"
ereg("([a-zA-Z]+)", $string2); #matches: Chicago

#try to pull out Chicago in $string2 using the known character count (7)
ereg("([a-zA-Z]{7})", $string2); #matches: Chicago

#try to pull out Chi in $string2:
ereg("([a-zA-Z]{0,3})", $string2); #matches: Chi

#try to pull out text from $string1 using "+"
ereg("([a-zA-Z]+)", $string1); #matches: Chicago
#this is because the "+" indicator keeps consuming characters only
#as long as they meet the requirements of the regular expression;
#the space after "Chicago" is not in [a-zA-Z], so the match stops there.
#To grab the full line, add a space to the class: ([a-zA-Z ]+)

#now, try to pull out Chicago with the knowledge that it is at the very
#beginning of the string, and is up to 7 characters long:
ereg("([a-zA-Z]{0,7})", $string1); #matches: Chicago
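
A quick check of where + stops, sketched with preg_match (an assumption on my part, since ereg is not available in PHP 7 and later). The + keeps going only while each character belongs to the class, so whether the space is in the class decides how far the match runs.

```php
<?php
$string1 = 'Chicago is known as the windy city';

// Letters only: the match ends at the first space.
preg_match('/[a-zA-Z]+/', $string1, $word);
echo $word[0], "\n";  // Chicago

// Letters and spaces: the match runs to the end of the sentence.
preg_match('/[a-zA-Z ]+/', $string1, $line);
echo $line[0], "\n";  // Chicago is known as the windy city
```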
Additionally, actual TEXT can be used inside a regular expression; however, it is static, meaning it only ever matches exactly that text. This makes the regular expression less flexible, so it will match fewer strings.

Instead of using the regular expression ([a-zA-Z0-9/,._!@#$%^&*-]+) to find a WHOLE string of any type of character, the regex (.+) may be used. The . (period) matches any single character, and the + repeats it one or more times.

So in our example:
PHP Code:
$string1 = "Chicago is known as the windy city";

#try to pull out the whole line in $string1 using ".+"
ereg("(.+)", $string1); #matches: Chicago is known as the windy city
Or, as you've just learned, you can use static text. This can be useful if you know what the text around your target is going to look like. Hence, you can search for a specific text within the string like so:
PHP Code:
$string1 = "Chicago is known as the windy city";

#try to pull out the word between "is" and "as" in $string1
ereg("is (.+) as", $string1); #the clause matches: known
#this is because it searches the whole string for "is $x as", where $x
#can be absolutely anything, and it turns out that the $x
#is equal to "known"
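
For anyone on PHP 7 or later, the same idea can be sketched with preg_match. The array $m is a name chosen for this example; it shows the difference between the whole match and the part captured by the parentheses.

```php
<?php
$string1 = 'Chicago is known as the windy city';

// Static text "is " and " as" surrounds a clause that can be anything.
preg_match('/is (.+) as/', $string1, $m);

echo $m[0], "\n";  // is known as   (everything the pattern matched)
echo $m[1], "\n";  // known         (only the parenthesised clause)
```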
Why REGEX is useful:
Because ereg() returns BOOL FALSE when the regex does not match, you can use it in IF - ELSE clauses. That makes it useful in validation functions:
PHP Code:
function validateEmail($strEmail)
{
  #ereg() returns FALSE when the pattern does not match;
  #the \. matches a literal dot rather than any character
  if (ereg("(.+)@([a-zA-Z]+)\.([a-zA-Z0-9]{3})", $strEmail))
  {
    return true;
  }
  else
  {
    return false;
  }
}
#####
validateEmail("emailtest.com");  #=false
validateEmail("email@test.f");   #=false
validateEmail("email.com@test"); #=false
validateEmail("email@test.com"); #=true
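
Here is a sketch of the same check for PHP 7 and later, assuming preg_match. validateEmailPcre() is a hypothetical name picked for this example. The ^ and $ anchors are my addition: they force the whole string to fit the pattern, not just some part of it. This is a toy validator, not a complete one (it rejects domain types that are not exactly 3 characters, for instance).

```php
<?php
// Hypothetical PCRE version of the validation function above.
function validateEmailPcre($strEmail)
{
    // ^ and $ anchor the pattern to the start and end of the string.
    // preg_match() returns 1 on a match, so compare against 1 for a clean boolean.
    return preg_match('/^.+@[a-zA-Z]+\.[a-zA-Z0-9]{3}$/', $strEmail) === 1;
}

var_dump(validateEmailPcre('emailtest.com'));   // bool(false)
var_dump(validateEmailPcre('email@test.f'));    // bool(false)
var_dump(validateEmailPcre('email.com@test'));  // bool(false)
var_dump(validateEmailPcre('email@test.com'));  // bool(true)
```

For real projects, PHP's built-in filter_var($email, FILTER_VALIDATE_EMAIL) is usually a safer choice than a hand-rolled pattern.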
Using Array Registers: Pulling $var from $text
Well, so far we've discussed the usefulness of REGEX, and the proper coding technique. Now comes the fun stuff.

In the ereg() calls displayed above, we've always used 2 parameters: $stringNeedle and $stringStack. But we know that there are 3, the third being $stringArray. If the regex contains one or more parenthesised clauses, the text matched by each clause is stored in $stringArray[$i], where $i starts at 1 for the first clause and increases by 1 for each following clause, counted from the beginning of the regex. $stringArray[0] holds the whole matched text.

Wow.

Well, this simply means:
PHP Code:
#in this kind of regular expression,
#we will try to get the DOMAIN type
#from the email address (.com, .net, .org, etc)
$stringEmail = "testemail@test.com";
ereg("(.+)@(.+)\.([a-zA-Z0-9]{3})", $stringEmail, $Emails);
#now, you can see there are 3 clauses in the regex.
#1 is any text before the @
#2 is the text after @ and before the .
#3 is the domain type (without the dot)
#so we can set up some simple variables:
$emailname   = $Emails[1]; #first regex clause
$domainfirst = $Emails[2]; #second regex clause
$domainlast  = $Emails[3]; #third regex clause
/*
So, if $stringEmail matches and ereg() does not return BOOL FALSE, the array
register ($Emails) is filled in as follows: for $i running from 1 to
$NUMBER_OF_REGEX_CLAUSES (here 1, 2, 3), $Emails[$i] is equal to the text
matched by the $i'th clause. $Emails[0] holds the whole match.
That is exactly what the three assignments above rely on.
*/

echo "Email Name: " . $emailname . " domain first name: " . $domainfirst . " domain type: ." . $domainlast . " domain name: " . $domainfirst . "." . $domainlast;

#outputs:
/*Email Name: testemail domain first name: test domain type: .com domain name: test.com
*/
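
The array register works the same way in preg_match, which is what you would reach for on PHP 7 and later. A minimal sketch under that assumption, reusing the variable names from the example above:

```php
<?php
$stringEmail = 'testemail@test.com';

// Three clauses: text before the @, text between @ and the dot, 3-character domain type.
if (preg_match('/(.+)@(.+)\.([a-zA-Z0-9]{3})/', $stringEmail, $Emails) === 1) {
    // $Emails[0] is the whole match; the clauses start at index 1.
    list(, $emailname, $domainfirst, $domainlast) = $Emails;

    echo 'Email Name: ', $emailname, "\n";                       // testemail
    echo 'Domain first name: ', $domainfirst, "\n";              // test
    echo 'Domain type: .', $domainlast, "\n";                    // .com
    echo 'Domain name: ', $domainfirst, '.', $domainlast, "\n";  // test.com
}
```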
As time progresses, I will elaborate further on the _replace functions, but bear in mind that they function nearly the same as str_replace().
borednerd

Join Date: Feb 2005

Posts: 31