Analyzing C# Documents with Regular Expressions (Updated)

Jack H Hansen [2004-07-28]

Keywords: C#, regular expression, syntax highlighting

Many readers have presumably written code that performs syntax coloring. Until fairly recently this was a difficult task: you had to write a large amount of parsing code, and that was often the hardest part. With the arrival of regular expressions, we can be relieved of much of this heavy work. Regular expressions provide a set of methods (standard patterns) that let us efficiently create, modify, and compare strings, and rapidly parse large amounts of text and data to search for, remove, and replace text patterns [1]. The .NET Framework provides this functionality in the System.Text.RegularExpressions namespace.

1. Regular expressions [2]

First, a brief word about regular expressions themselves.

Regular expressions were first proposed by the mathematician Stephen Kleene in 1956, building on earlier incremental research on natural language. Regular expressions with a complete syntax describe the form of the character patterns to match, and they were later applied to the field of information technology. Since then, regular expressions have gone through several periods of development and are now standardized: the standard is approved by ISO (the International Organization for Standardization) and specified by The Open Group.

A regular expression is not a language; it is a standard for finding and replacing text in a document or string. There are two standards: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). ERE includes everything in BRE plus additional features and concepts.

Programs on UNIX platforms such as xsh, egrep, sed, and vi were the first to adopt regular expressions. Many languages, such as HTML and XML, can adopt them as well, although they usually adopt only a subset of the standard. As regular expressions were ported to cross-platform programming languages, their functionality became more and more complete and their use more and more widespread.

2. The relevant expressions

That is all I can say here about regular expressions in general: they are no small body of knowledge and cannot be explained in a few words. Below I introduce only the patterns that relate to analyzing C# syntax. For the details, please refer to the Regular Expression Specification [The Open Group] in this blog's bookmarks. If you already understand regular expressions well, you can skip the item-by-item explanations below and finish the article sooner.

i> String literal: "(\\?.)*?"

In a regular expression, every character other than . $ ^ { [ ( | ) * + ? \ matches itself. In the expression above, the quotation marks at both ends match the quotation marks at both ends of the string literal. "\\" denotes a single "\" character. The "?" that follows it means zero or one occurrence of that character. "." matches any character except \n.

"()" captures the matched substring. Captures made with () are numbered automatically, starting from 1, in the order of their opening parentheses. Capture number zero is the text matched by the entire regular expression pattern. The "*" after the parentheses means zero or more occurrences of such a substring. In other words, the "*" applies to "(\\?.)".

The final "?" makes the "*" lazy: it matches as few repetitions as possible, so the match ends at the first closing quotation mark that is not consumed as part of an escape. Because "*" allows zero repetitions, an empty string ("") is matched as well.

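The behavior described above can be checked with a small program. This is only a sketch: the class name StringLiteralDemo, the method FindLiterals, and the sample line are illustrative and not part of the original article.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StringLiteralDemo
{
    // Pattern i from the text: a quote, lazily repeated
    // (optional backslash followed by any character), then a quote.
    private static readonly Regex DoubleQuoted = new Regex("\"(\\\\?.)*?\"");

    // Returns every substring of 'line' that the pattern treats as a string literal.
    public static List<string> FindLiterals(string line)
    {
        var found = new List<string>();
        foreach (Match m in DoubleQuoted.Matches(line))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // The sample line is: s = "a\"b" + "";
        // It contains an escaped quote and an empty string literal.
        string line = "s = \"a\\\"b\" + \"\";";
        foreach (string literal in FindLiterals(line))
        {
            Console.WriteLine(literal);
        }
    }
}
```

For the sample line, the pattern finds two literals: the one containing the escaped quote and the empty one.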
ii> Verbatim string literal: @"(""|.)*?"

Matches strings such as @"Hello ""World""!".

The | (vertical bar) separates alternatives, any one of which may match, for example cat|dog|tiger. The leftmost alternative that succeeds is the one used.
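One thing worth checking with the verbatim-string pattern is how the lazy quantifier interacts with doubled quotation marks. The sketch below compares the pattern as given (prefixed with @) against a stricter variant that uses [^"] instead of "." and a greedy "*". The stricter variant, the class name VerbatimStringDemo, and both method names are my assumptions and do not come from the article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStringDemo
{
    // Pattern ii as given in the text, with the @ prefix: @"(""|.)*?"
    private static readonly Regex Lazy = new Regex("@\"(\"\"|.)*?\"");

    // Assumed stricter variant: a doubled quote or any non-quote character, greedy.
    private static readonly Regex Strict = new Regex("@\"(\"\"|[^\"])*\"");

    public static string MatchLazy(string line)
    {
        return Lazy.Match(line).Value;
    }

    public static string MatchStrict(string line)
    {
        return Strict.Match(line).Value;
    }

    public static void Main()
    {
        // The sample line is: s = @"Hello ""World""!";
        string line = "s = @\"Hello \"\"World\"\"!\";";
        Console.WriteLine(MatchLazy(line));   // stops at the first quotation mark after Hello
        Console.WriteLine(MatchStrict(line)); // covers the whole verbatim literal
    }
}
```

Because the lazy form is satisfied by the first quotation mark it meets, it ends inside the doubled quote. If whole verbatim literals must be colored as one unit, a variant along the lines of the strict pattern may be needed.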

iii> XML element in a C# documentation comment: ///\s*<.*>

Matches the XML in C# auto-generated documentation comments. "\s" matches any whitespace character. Note that you must not change its case: regular expressions are case sensitive, and for character escapes the uppercase form often means exactly the opposite of the lowercase one. For example, "\S" matches any non-whitespace character. (The same care applies to "\Z" below.)
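A short sketch of the documentation-comment pattern and of the \s versus \S distinction. The class name DocCommentDemo and the method IsDocXmlLine are illustrative names, not from the article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DocCommentDemo
{
    // Pattern iii from the text: "///", optional whitespace, then an XML tag.
    private static readonly Regex DocXml = new Regex(@"///\s*<.*>");

    // True when the line contains a documentation comment that starts with an XML tag.
    public static bool IsDocXmlLine(string line)
    {
        return DocXml.IsMatch(line);
    }

    public static void Main()
    {
        Console.WriteLine(IsDocXmlLine("/// <summary>"));   // True
        Console.WriteLine(IsDocXmlLine("/// plain text"));  // False

        // \s and \S are opposites: whitespace versus non-whitespace.
        Console.WriteLine(Regex.IsMatch(" ", @"\s"));       // True
        Console.WriteLine(Regex.IsMatch(" ", @"\S"));       // False
    }
}
```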

iv> Content of a C# documentation comment: ///\s?.*

v> Blank line: ^\s*\Z

"^" specifies that the match must occur at the beginning of the string or (with RegexOptions.Multiline) at the beginning of a line. "\Z" specifies that the match must occur at the end of the string or before a \n at the end of the string.

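The blank-line pattern can be tried on a few inputs as follows. This is a minimal sketch that assumes each line is tested on its own; BlankLineDemo and IsBlank are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlankLineDemo
{
    // Pattern v from the text: only whitespace between the start and the end of the string.
    private static readonly Regex Blank = new Regex(@"^\s*\Z");

    public static bool IsBlank(string line)
    {
        return Blank.IsMatch(line);
    }

    public static void Main()
    {
        Console.WriteLine(IsBlank(""));          // True
        Console.WriteLine(IsBlank("   \t"));     // True
        Console.WriteLine(IsBlank("  int x;"));  // False
    }
}
```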
vi> C# comment: //.*

vii> C# keyword: (abstract|where|while|yield){1}(\.|(\s)+|;|,|\(|\[){1}

For reasons of space, only a few keywords are listed here (C# has at least 80 keywords ^_^). Note that the parser takes the leftmost alternative that succeeds. Where one keyword contains another, pay attention to the order: the containing word must come before the contained one. For example, (in|int) on its own matches only the "in" of "int" and never "int" as a whole; it should be written (int|in).
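The ordering rule is easy to verify. The sketch below uses the bare alternations without the trailing delimiter group; KeywordOrderDemo and FirstMatch are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordOrderDemo
{
    // Returns the text of the first match of 'pattern' in 'input'.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // The shorter alternative comes first, so it wins.
        Console.WriteLine(FirstMatch("int", "(in|int)"));  // in
        // The longer alternative comes first, so the whole keyword is matched.
        Console.WriteLine(FirstMatch("int", "(int|in)"));  // int
        // The shorter keyword is still found when the longer one does not apply.
        Console.WriteLine(FirstMatch("in", "(int|in)"));   // in
    }
}
```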

In addition, all brackets: (\(|\[|\{|\)|\]|\})

3. The relevant classes and members [3]

    [Serializable]
    public class Regex : ISerializable

    // Represents an immutable regular expression.

The Regex class includes several static methods that let you use a regular expression without explicitly creating a Regex object. Using a static method is equivalent to constructing a Regex object, using it once, and then destroying it.

The Regex class is immutable (read-only) and is inherently thread safe. Regex objects can be created on any thread and shared between threads.

The above is excerpted from the Microsoft developer documentation. We also need several of the class's members:

    // Searches the specified input string for an occurrence of the
    // regular expression specified in the Regex constructor.
    public Match Match(
        string input
    );

The Match class

    [Serializable]
    public class Match : Group

    // Represents the results from a single regular expression match.

For details about Group, please refer to the Microsoft developer documentation.

We will use the following members:

    // The zero-based position in the original string where the captured substring starts.
    public int Index { get; }

    // The length of the captured substring.
    public int Length { get; }

    // The actual substring captured by the match.
    public string Value { get; }

    // Gets a value indicating whether the match succeeded.
    public bool Success { get; }

    // Gets the collection of groups matched by the regular expression.
    public virtual GroupCollection Groups { get; }

    // Returns a new Match with the results for the next match, starting where
    // the last match ended (at the character after the last matched character).
    public Match NextMatch();

The Group class has the corresponding members (of the Match members listed above, the first four properties are inherited from Group, so they are not listed again).
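The members listed above can be exercised together in one loop. The sketch below is illustrative only: MatchMembersDemo, Describe, and the two-keyword pattern are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchMembersDemo
{
    // Returns "index:length:value" for every match of 'pattern' in 'input'.
    public static List<string> Describe(string input, string pattern)
    {
        var result = new List<string>();
        Regex regex = new Regex(pattern);
        // NextMatch returns a new Match, so its result must be assigned back to m.
        for (Match m = regex.Match(input); m.Success; m = m.NextMatch())
        {
            result.Add(m.Index + ":" + m.Length + ":" + m.Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string s in Describe("while (x) yield", "(while|yield)"))
        {
            Console.WriteLine(s);
        }
        // Group 0 is the whole match; group 1 is the first pair of parentheses.
        Match first = Regex.Match("while (x) yield", "(while|yield)");
        Console.WriteLine(first.Groups.Count);     // 2
        Console.WriteLine(first.Groups[1].Value);  // while
    }
}
```

For the sample input the loop reports 0:5:while and 10:5:yield.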

The pattern must be specified when a Regex instance is initialized. You can create an instance with the constructor, use it, and then destroy it. Alternatively, you can call a static method directly, which is equivalent to creating an instance. In my tests, however, I found the static methods to be slightly slower than a compiled Regex object. See the test data below:

4. Writing the code

We now need to analyze the C# language elements listed in section 2. I analyze the text line by line (if you want to analyze several lines at a time, the relevant expressions need to be modified [4]).

    using System.Text.RegularExpressions;

    // ... some other code ...

    // First create a Regex instance (string-literal example).
    Regex DoubleQuotedString = new Regex("\"(\\\\?.)*?\"");

    // Then match it against the text.
    Match m;
    for (m = DoubleQuotedString.Match(strSomeCodes); m.Success; m = m.NextMatch()) {
        foreach (Group g in m.Groups) {
            // Do some drawing
        }
    }

What remains is writing the coloring code itself.
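As one possible starting point for that coloring code, the sketch below combines the comment and string-literal patterns from section 2 into a single expression with named groups and reports each token's kind, start index, and length; a drawing routine could then color those spans. Combining the patterns this way, the group names, and the names TokenSpanDemo and Scan are my assumptions, not the article's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenSpanDemo
{
    // Either a // comment or a double-quoted string literal. The regex engine scans
    // left to right, so a "//" inside a string is consumed as part of the string.
    private static readonly Regex Token =
        new Regex(@"(?<comment>//.*)|(?<str>""(\\?.)*?"")");

    // Returns "kind:index:length" for each token found in a single line of code.
    public static List<string> Scan(string line)
    {
        var spans = new List<string>();
        for (Match m = Token.Match(line); m.Success; m = m.NextMatch())
        {
            string kind = m.Groups["comment"].Success ? "comment" : "string";
            spans.Add(kind + ":" + m.Index + ":" + m.Length);
        }
        return spans;
    }

    public static void Main()
    {
        // The sample line is: x = "a//b"; // c
        foreach (string span in Scan("x = \"a//b\"; // c"))
        {
            Console.WriteLine(span);
        }
    }
}
```

For the sample line this yields string:4:6 followed by comment:12:4, so the slashes inside the string literal are not mistaken for a comment.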

  5. Source code 

Notes:

[1] The sentence marked [1] is quoted from "Regular Expression Language Elements" in the .NET Framework General Reference.

[2] The introduction to regular expressions given here is a brief summary that draws on related content from the development technology section of ZDNet China.

[3] The function signatures and comments in this section are taken from the Microsoft documentation.

[4] For more details, see "Regular Expression Language Elements" in the .NET Framework General Reference.
