Regular Expression Coding [Advanced]
Regular expressions appear in many different programming languages, and they can be a hassle to deal with.
First, just what are regular expressions good for?
Well, they can be used for many different things, namely:
- searching for a string in a text
- validating a string
- pulling text out of a string
- more..
For example, regular expressions can validate email addresses, phone numbers and website URLs; they can search for $STRING inside of $TEXT; or they can pull $STRING out of $TEXT.
In PHP, there are 6 functions which use regular expressions:
- ereg : performs regular expression operation
- eregi : performs case-insensitive regular expression operation
- preg_match : performs perl-compatible regular expression operation
- preg_replace : performs perl-compatible regular expression REPLACE operation
- ereg_replace : performs regular expression REPLACE operation
- eregi_replace : same as above; case-insensitive
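Functions: ereg, eregi, preg_match follow the same format:
Code:
function($stringNeedle, $stringStack, $stringArray)
In which:
$stringNeedle is the regular expression (what you are looking for).
$stringStack is the overall text being searched.
$stringArray is the array registers (will be explained later).
Note that preg_match also expects delimiters around the regular expression, for example "/([0-9])/".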
Functions: ereg_replace, eregi_replace, preg_replace use the following format:
Code:
function($stringFormat, $stringReplace, $stringText)
In which:
$stringFormat is the regular expression (what you are looking for).
$stringReplace is the text to put in its place.
$stringText is the body of text in which the replacement is made.
In the _replace functions, you look for a certain text ($stringFormat) within a body of text ($stringText), and replace what you find with another text ($stringReplace).
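As a rough illustration of the two formats, here is a minimal sketch using preg_match and preg_replace (the ereg family was deprecated in PHP 5.3 and removed in PHP 7.0, so only the preg_ functions run on current PHP). The sample sentence and the variable names are made up for this example.

```php
<?php
// Hypothetical sample text, used only for this illustration.
$text = "Chicago is known as the windy city";

// Match format: regular expression, text to search, array registers.
// The "/" characters are the delimiters that the preg_ functions require.
$found = preg_match('/([a-z]+) city/', $text, $registers);
// $found is 1 (a match was found)
// $registers[0] is "windy city" (the whole match)
// $registers[1] is "windy" (the clause in parentheses)

// Replace format: regular expression, replacement text, body of text.
// The new string is returned; $text itself is left unchanged.
$replaced = preg_replace('/windy/', 'breezy', $text);
// $replaced is "Chicago is known as the breezy city"
```

The match function reports whether it found something and fills the registers, while the replace function hands back the altered text.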
Now to the good stuff:
A regular expression is built from a few kinds of pieces:
First, good coding format dictates that a CLAUSE be inside parentheses ( ); whatever a clause matches can later be read back through the array registers. Specific searches for text or numerics, or certain types of characters, are to be placed in brackets [ ]. Regular expressions are case-sensitive unless you use a case-insensitive function such as eregi, so good form dictates that you always spell out the case you mean!
Learn by example:
1. numeric - string: ([0-9])
2. lowercase alpha - string: ([a-z])
3. uppercase alpha - string: ([A-Z])
4. special characters: ([/,._!@#$%^&*-])
Note that the - goes last inside the brackets: between two characters it marks a range (as in a-z), so writing ._-! would be read as the invalid range "_ to !".
It is also accepted to combine these four types into one clause:
1. lowercase alpha & uppercase alpha strings: ([a-zA-Z])
2. alpha (lowercase) & numeric strings: ([a-z0-9])
3. alpha (uppercase) & numeric strings: ([A-Z0-9])
4. alpha (both) & numeric strings: ([a-zA-Z0-9])
5. numeric & special strings: ([0-9/,._!@#$%^&*-])
6. Everything above: ([a-zA-Z0-9/,._!@#$%^&*-])
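To see what each bracket type picks up, here is a small sketch with preg_match. The helper first_of() and the sample text are invented for this example; the special-character class keeps the hyphen at the end so that it is read as a literal character rather than a range.

```php
<?php
// Hypothetical helper for this illustration: returns the first character
// captured by the single clause in $pattern, or null when nothing matches.
function first_of(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $regs) === 1 ? $regs[1] : null;
}

// "~" is used as the delimiter so that "/" needs no escaping inside the brackets.
$classes = [
    'numeric' => '~([0-9])~',
    'lower'   => '~([a-z])~',
    'upper'   => '~([A-Z])~',
    'special' => '~([/,._!@#$%^&*-])~', // hyphen last, so it is a literal "-"
    'alnum'   => '~([a-zA-Z0-9])~',
];

$sample = "Route 66, Chicago!"; // hypothetical sample text

$found = [];
foreach ($classes as $name => $pattern) {
    $found[$name] = first_of($pattern, $sample);
}
// $found['numeric'] is "6", $found['lower'] is "o", $found['upper'] is "R",
// $found['special'] is ",", $found['alnum'] is "R"
```

Each class on its own matches exactly one character, which is why only the first qualifying character comes back.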
If you know the specific number of characters in a text, you can place that number in { } braces right after the [ ] brackets.
Case 1:
7-character string (chicago).
For the full string try the following: ([a-zA-Z]{7})
In using that, the following variations of chicago will all be found:
ChIcAgO
CHicAGo
CHICAGO
chicago
ChICagO
...Anything else that is seven letters long, since the regex only checks the type and the number of characters.
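Case 2:
Selecting a string up to a certain length is done like so:
Code:
([REGEX]{a,b})
REGEX being the character class used, a being the minimum and b the maximum number of characters to take. To select the first 3 letters of the string CHICAGO, we use 0,3; hence:
Code:
([a-zA-Z]{0,3})
Would render: CHI.
Case 3: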
If the number of characters is not known, but you want to select a whole word, a whole line, or a whole text, use the indicator + to repeat the bracketed class one or more times.
So, using PHP, let's try this:
PHP Code:
#First, define some strings
#Then, try to pull text out of them
#using regular expressions; the result lands in $regs[1]
$string1 = "Chicago is known as the windy city";
$string2 = "Chicago";
$string3 = "ChIcAgO";
$string4 = "ChIcAgO iS kNoWn As ThE wInDy CiTy";
#try to pull out Chicago in $string2:
ereg("([a-zA-Z])", $string2, $regs); #$regs[1] is: C
#try to pull out Chicago in $string2 using "+"
ereg("([a-zA-Z]+)", $string2, $regs); #$regs[1] is: Chicago
#try to pull out Chicago in $string2 using the known character amount (7)
ereg("([a-zA-Z]{7})", $string2, $regs); #$regs[1] is: Chicago
#try to pull out Chi in $string2:
ereg("([a-zA-Z]{0,3})", $string2, $regs); #$regs[1] is: Chi
#try to pull out the whole line in $string1 using "+"
ereg("([a-zA-Z ]+)", $string1, $regs); #$regs[1] is: Chicago is known as the windy city
#this is because the "+" indicator keeps going for as long as the next
#character meets the requirements of the brackets (note the space added
#inside them). Hence, it grabs the full line.
#Without the space, ([a-zA-Z]+) stops at the first space and gives: Chicago
#now, try to pull out Chicago with the knowledge that it is at the very
#beginning of the string, and goes up to 7 characters:
ereg("([a-zA-Z]{0,7})", $string1, $regs); #$regs[1] is: Chicago
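The same three cases can be tried on current PHP with preg_match. This is only a sketch: pull() is a made-up helper that returns whatever the first clause captured, and the sentence is the one used above.

```php
<?php
// Hypothetical helper for this illustration: returns the text captured by
// the first clause of $pattern, or null when the pattern is not found.
function pull(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $regs) === 1 ? $regs[1] : null;
}

$line = "Chicago is known as the windy city";

$one       = pull('/([a-zA-Z])/', $line);      // a single letter: "C"
$seven     = pull('/([a-zA-Z]{7})/', $line);   // exactly 7 letters: "Chicago"
$upToThree = pull('/([a-zA-Z]{0,3})/', $line); // at most 3 letters: "Chi"
$word      = pull('/([a-zA-Z]+)/', $line);     // letters only, stops at the space: "Chicago"
$whole     = pull('/([a-zA-Z ]+)/', $line);    // letters and spaces: the whole line
```

The + and {a,b} indicators take as many characters as they are allowed to, which is why {0,3} gives three letters rather than none.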
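Additionally, actual TEXT can be used inside a regular expression. Such text is static, meaning it never changes: it has to appear in the string exactly as written, so the regex fails whenever the text is worded differently.
Instead of using the regular expression ([a-zA-Z0-9/,._!@#$%^&*-]+) to find a WHOLE string of any type of character, the regex (.+) may be used. The . (period) is any kind of character, and the + repeats it.
So in our example:
PHP Code:
$string1 = "Chicago is known as the windy city";
#try to pull out the whole line in $string1 using ".+"
ereg("(.+)", $string1, $regs); #$regs[1] is: Chicago is known as the windy city
Or, as you've just learned, you can use static text. This can be useful if you know what the text is going to look like. Hence, you can search for a specific text within the string like so:
PHP Code:
$string1 = "Chicago is known as the windy city";
#try to pull out the word between "is" and "as" in $string1
ereg("is (.+) as", $string1, $regs); #$regs[1] is: known
#this is because it searches the whole string for "is $x as", where $x
#can be absolutely anything, and it turns out that the $x
#is equal to "known"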
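Here is a short sketch of the same idea with preg_match, including what happens when the static text is not there. The third pattern ("was ... as") is invented for this example to show a failed search.

```php
<?php
$line = "Chicago is known as the windy city";

// (.+) on its own: any character, repeated, so the whole line is captured.
preg_match('/(.+)/', $line, $all);
// $all[1] is "Chicago is known as the windy city"

// Static text on both sides pins down where the clause starts and ends.
preg_match('/is (.+) as/', $line, $between);
// $between[1] is "known"

// Static text that does not occur in the line: nothing is found.
$missing = preg_match('/was (.+) as/', $line, $none);
// $missing is 0 and $none is an empty array
```

The more static text a regex contains, the more exact the search becomes, and the more easily it misses text that is worded a little differently.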
Why REGEX is useful:
Because a regex function returns BOOL FALSE when the regex is not found, you can use it in IF - ELSE clauses. Therefore, it can be useful in validation functions:
PHP Code:
function validateEmail( $strEmail )
{
    #the dot is escaped (\.) so that it means a real dot and not "any character";
    #^ and $ make the regex cover the whole string
    if( ereg('^(.+)@([a-zA-Z]+)\.([a-zA-Z0-9]{3})$', $strEmail) )
    {
        return true;
    }
    else {
        return false;
    }
}
#####
validateEmail( "emailtest.com" ); #=false
validateEmail( "email@test.f" ); #=false
validateEmail( "email.com@test" ); #=false
validateEmail( "email@test.com" ); #=true
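For current PHP, the same check could be sketched with preg_match, which returns 1 when the regex is found and 0 when it is not. The function name validate_email_pcre is made up for this example, and the rule is the same simplified one as above (letters only in the domain name, exactly 3 characters after the dot), not a complete email check.

```php
<?php
// Hypothetical function name; same simplified rule as the ereg version above.
function validate_email_pcre(string $email): bool
{
    // ^ and $ tie the regex to the whole string; \. is a literal dot.
    $pattern = '/^(.+)@([a-zA-Z]+)\.([a-zA-Z0-9]{3})$/';
    return preg_match($pattern, $email) === 1;
}

$results = [
    'emailtest.com'  => validate_email_pcre('emailtest.com'),  // false: no @
    'email@test.f'   => validate_email_pcre('email@test.f'),   // false: only 1 character after the dot
    'email.com@test' => validate_email_pcre('email.com@test'), // false: no dot after the @
    'email@test.com' => validate_email_pcre('email@test.com'), // true
];
```

For real-world validation, PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL is usually a safer choice than a hand-written regex.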
Using Array Registers: Pulling $var from $text
Well, so far we've discussed the usefulness of REGEX, and the proper coding technique. Now comes the fun stuff.
In the ereg() functions displayed above, we've mostly talked about 2 parameters: $stringNeedle and $stringStack. But we know that there are 3, the third being $stringArray. If there are multiple clauses (or even just 1) in the same ereg() function, the text that each clause matched is stored in $stringArray[$i], with $i starting at 1 for the first clause and incrementing by 1 for each clause from the beginning of the regex. $stringArray[0] holds the whole match.
Wow.
Well, this simply means:
PHP Code:
#in this kind of regular expression,
#we will try to get the DOMAIN type
#from the email address (com, net, org, etc)
$stringEmail = "testemail@test.com";
ereg('(.+)@(.+)\.([a-zA-Z0-9]{3})', $stringEmail, $Emails);
#now, you can see there are 3 clauses in the regex.
#1 is any text before the @
#2 is the text after @ and before the .
#3 is the domain type
#each clause is a single pair of parentheses: doubling them, as in ((.+)),
#would create an extra clause and shift the numbering
#so we can set up some kind of simple variables:
$emailname = $Emails[1]; #first regex clause
$domainfirst = $Emails[2]; #second regex clause
$domainlast = $Emails[3]; #third regex clause
/*
So, if the $stringEmail is valid and ereg() does not return BOOL FALSE, the array register ($Emails) is filled in like this:
$i starts at 1 and goes up by 1 to $NUMBER_OF_REGEX_CLAUSES, so here $i is:
1, 2, 3
and $arrayregister[$i] is equal to the text matched by the $i'th clause,
which is exactly what we assigned to the three variables above.
*/
echo "Email Name: " . $emailname . " domain first name: " . $domainfirst . " domain type: ." . $domainlast . " domain name: " . $domainfirst . "." . $domainlast;
#outputs:
/*Email Name: testemail domain first name: test domain type: .com domain name: test.com
*/
As time progresses, I will elaborate further on the _replace functions, but bear in mind that they take their parameters in the same order as str_replace(): what to look for, what to put in its place, and the text to work on.
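The array registers work the same way with preg_match, and the _replace functions can use them too: in preg_replace the replacement text may refer to a clause as $1, $2 and so on. The following is a minimal sketch under those assumptions, reusing the email from the example; the variable names are only illustrative.

```php
<?php
$stringEmail = "testemail@test.com";

// Three clauses, so three registers; $emails[0] holds the whole match.
$pattern = '/^(.+)@(.+)\.([a-zA-Z0-9]{3})$/';
$matched = preg_match($pattern, $stringEmail, $emails);

$emailname   = $emails[1]; // "testemail" (first clause)
$domainfirst = $emails[2]; // "test"      (second clause)
$domainlast  = $emails[3]; // "com"       (third clause)

// In preg_replace, $2 and $3 in the replacement stand for the second and
// third clause, so this keeps only the domain part of the address.
$domain = preg_replace($pattern, '$2.$3', $stringEmail);
// $domain is "test.com"
```

Single quotes are used around '$2.$3' so that PHP does not try to read $2 as a variable before preg_replace sees it.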





