 Regular Expressions in Scripting
Taruna
« Posted: January 04, 2007, 01:18:32 AM »




No matter how beautiful and advanced the GUIs and window managers are, the real power of Linux remains in automation through the command line. Automation means having a program perform a task without any manual intervention. Such a program doesn't necessarily have to be written in a compiled language such as C. In fact, automation programs are best written in interpreted languages, because interpreted programs are faster to modify and re-run. The other important requirements for such automated programs are:

* Simple and easy file operations (redirections).
* Simple and easy pattern operations (regular expressions).
* Ability to achieve the maximum functionality with minimum coding by having powerful functions/commands and dynamic variable typing.
* Ability to mix and match many such languages (invocations).

Languages that meet the above-mentioned requirements are generally referred to as scripting languages. Tcl/Tk, Perl, Python and Shell are a few of the popular scripting languages. All of these are almost equally powerful in computation. The difference lies in the power of expression of logic in each one of them, and that is the deciding factor when selecting a scripting language. The best part of these scripting languages is the ability to mix and match many such languages to leverage the best of each.

What do we mean by script or scripting? A script is a program written in any of the scripting languages. Initially, these programs used to be simple and repetitive in nature, but as things evolved, scripts started to be used for sophisticated tasks too. Scripting is simply the colloquial term for writing such a program.

In this article, we will talk about one of the most important components of scripting: regular expressions.

The concept of a regular expression comes from formal automata theory. By definition, a regular expression is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are usually used to give a concise description of a set, without having to list all of its elements. In layman's terms, a regular expression is a shorthand notation for a particular set of patterns.

Listed below are the basic regular operators in Formal Automata Theory (FAT):

* OR denoted by `+'
* AND (concatenation) denoted by `.'
* POWER denoted by superscript
* Special cases of power are superscripted * and +, denoting 0 or more and 1 or more powers, respectively

And their equivalents in scripting languages are:

* OR denoted by [ ] for a choice among single characters (many tools also offer | for a choice among longer patterns)
* AND denoted by juxtaposing (as in algebra)
* POWER denoted by { }
* * and + denoted by * and + but not superscripted

The regular operands are the symbols from a pre-defined symbol set S. For the simplest case, let S = {0, 1}. Here are a few regular expression examples based on what we have discussed so far:

Description                                      FAT Notation    Script Notation
All binary strings/numbers                       (0 + 1)^+       [01]+
All even binary numbers                          (0 + 1)^* 0     [01]*0
All 5-digit binary numbers                       (0 + 1)^5       [01]{5}
All 5-digit binary numbers not starting with 0   1 (0 + 1)^4     1[01]{4}

(In the FAT column, ^ marks a superscript.)
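
The script-notation patterns in the table can be tried out directly. Here is a minimal sketch in Python (chosen only for illustration; the article itself is not tied to any one language). The dictionary keys and the helper name `matches` are made up for this example. `re.fullmatch` is used so that a pattern has to describe the whole string, not just a part of it.

```python
import re

# Script-notation patterns taken from the table above.
# The dictionary keys are just labels invented for this sketch.
PATTERNS = {
    "any binary string": r"[01]+",
    "even binary number": r"[01]*0",
    "5-digit binary number": r"[01]{5}",
    "5-digit, no leading 0": r"1[01]{4}",
}

def matches(name, text):
    """Return True if the whole of `text` matches the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None

# Print a small truth table for a few sample strings.
for name in PATTERNS:
    for sample in ["0", "10110", "01101", "102"]:
        print(f"{name:24} {sample:6} {matches(name, sample)}")
```

Note that "102" is rejected by every pattern, since 2 is not a symbol of S = {0, 1}.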
(Some food for thought: Try and write a regular expression for the following:)

* All binary numbers containing 1, only in pairs
* All binary palindromes
* All binary numbers divisible by 3

Now, in real life, scripting for just binary patterns is not that useful. We need the complete character set as the set S. With that, we have too many symbols to use, apart from more complicated real-life patterns. So, additional notations were added. Some of the most often used ones are:

* RANGE denoted by `-', e.g., [ABC...Z] may be represented as [A-Z]; [abc...z] as [a-z] and [01...9] as [0-9]
* `.' denotes `Any single character'
* `^' denotes Start of line
* `$' denotes End of line
* {m,n} is shorthand for OR of POWERS {m}, {m+1}, ..., {n}; n may be left blank to indicate no upper limit
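
To see these additional notations at work, here is a small sketch using Python's `re` module. The sample line and the helper name `first_match` are invented for this example. Note that in Python, `.` does not match a newline unless asked to, a detail that may differ in other tools.

```python
import re

def first_match(pattern, text):
    """Return the first substring of `text` that matches `pattern`, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# A made-up sample line, used only for illustration.
line = "id=42 code=7F3 name=linux"

print(first_match(r"[0-9]+", line))         # RANGE: first run of digits
print(first_match(r"=.", line))             # '.': '=' followed by any one character
print(first_match(r"^[a-z]+", line))        # '^': lowercase run at the start of the line
print(first_match(r"[a-z]+$", line))        # '$': lowercase run at the end of the line
print(first_match(r"[0-9A-F]{2,3}", line))  # {m,n}: 2 to 3 hex digits
print(first_match(r"[a-z]{3,}", line))      # {m,}: 3 or more lowercase letters
```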

In the scripting world, this extended set S and all such associated notations are the ones that form the regular expressions.

Examples from scripting:

* An identifier: [_A-Za-z][_A-Za-z0-9]*
* All hex values: [0-9A-Fa-f]+
* A complete line: ^.*$
* All decimal values with at most 10 digits: [0-9]{1,10}
* {0,} is identical to *
* {1,} is identical to +
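
The examples above can be checked with a short Python sketch. The constant and function names are made up for illustration. The `same_language` helper only spot-checks two patterns against a handful of samples, so it hints at the {0,} / * and {1,} / + equivalences rather than proving them.

```python
import re

# Patterns copied from the list above; the constant names are hypothetical.
IDENTIFIER = r"[_A-Za-z][_A-Za-z0-9]*"
HEX_VALUE = r"[0-9A-Fa-f]+"
WHOLE_LINE = r"^.*$"
DECIMAL_UP_TO_10 = r"[0-9]{1,10}"

def is_full_match(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

def same_language(p1, p2, samples):
    """Spot check (not a proof): do both patterns accept and reject the same samples?"""
    return all(is_full_match(p1, s) == is_full_match(p2, s) for s in samples)

samples = ["", "0", "0110", "2"]
print(is_full_match(IDENTIFIER, "_tmp1"), is_full_match(IDENTIFIER, "1abc"))
print(is_full_match(HEX_VALUE, "DEADbeef"), is_full_match(HEX_VALUE, "xyz"))
print(is_full_match(DECIMAL_UP_TO_10, "12345678901"))
print(same_language(r"[01]{0,}", r"[01]*", samples))
print(same_language(r"[01]{1,}", r"[01]+", samples))
```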

As an exercise, try to write a regular expression for an e-mail ID.

One might wonder what these will be used for. One of the most common and powerful uses is search and replace. Another much-needed use is extracting patterns and operating on them, e.g., extracting e-mail IDs from a file to send e-mails in groups.
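
As a quick taste of both uses, here is a minimal Python sketch. It deliberately avoids the e-mail pattern, which is left as the exercise, and works with hex values and identifiers instead. The function names, the sample strings and the `0x` prefix convention are assumptions made for this example only.

```python
import re

# Assumption for this sketch: hex values are written with a 0x prefix.
HEX_LITERAL = r"0x[0-9A-Fa-f]+"
IDENTIFIER = r"[_A-Za-z][_A-Za-z0-9]*"

def hex_to_decimal(text):
    """Search and replace: rewrite every 0x... value as its decimal equivalent."""
    return re.sub(HEX_LITERAL, lambda m: str(int(m.group(0), 16)), text)

def extract_identifiers(text):
    """Extraction: list every identifier-like token found in the text."""
    return re.findall(IDENTIFIER, text)

print(hex_to_decimal("mask = 0xFF + 0x10"))
print(extract_identifiers("total_count = base + offset2"))
```

`re.sub` handles the search and replace, while `re.findall` collects the matches so that a script can go on to operate on each one.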
In our next article, we will explore these abilities and will look at how to use these regular expressions in various shell commands.
