Published 2015-07-11 02:53:07 | Source: compiled from the web
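A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.
A regular expression literal is a pattern written between slashes, or between arbitrary delimiters following %r, as follows: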
/pattern/
/pattern/im # option can be specified
%r!/usr/local! # general delimited regular expression
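#!/usr/bin/ruby
line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"
# =~ returns the index of the first match, or nil when nothing matches.
# The pattern is unanchored, so it would also match "Cats" later in a line.
if line1 =~ /Cats(.*)/
  puts "Line1 starts with Cats"
end
if line2 =~ /Cats(.*)/
  puts "Line2 starts with Cats"
end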
This produces the following result:
Line1 starts with Cats
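The `(.*)` group in the pattern above captures the rest of the line, although the example never reads it. A minimal sketch of how that captured text could be retrieved, assuming a hypothetical helper named `rest_after_cats` that is not part of the original example:

```ruby
# Hypothetical helper (an assumption for illustration): returns the text
# captured by (.*) after "Cats", or nil when the pattern does not match.
def rest_after_cats(line)
  match = line.match(/Cats(.*)/)  # String#match returns a MatchData or nil
  match && match[1]               # match[1] is the first capture group
end

p rest_after_cats("Cats are smarter than dogs")  # " are smarter than dogs"
p rest_after_cats("Dogs also like meat")         # nil
```

Returning `nil` for a failed match mirrors what `=~` and `String#match` do themselves, so the caller can test the result directly in an `if`.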
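A regular expression literal may include an optional modifier that controls various aspects of matching. The modifier is written after the second slash character, as shown above, and can be any of the following characters:
| Modifier | Description |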
|---|---|
| i | Ignore case when matching text. |
| o | Perform #{} interpolations only once, the first time the regexp literal is evaluated. |
| x | Ignores whitespace and allows comments in regular expressions |
| m | Multiline mode: treats newlines as normal characters, so . matches a newline as well. |
| u,e,s,n | Interpret the regexp as Unicode (UTF-8), EUC, SJIS, or ASCII. If none of these modifiers is specified, the regular expression is assumed to use the source encoding. |
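To make the i, x, and m modifiers concrete, here is a small sketch. The constant names and sample strings are assumptions chosen for illustration, and `match?` requires Ruby 2.4 or later:

```ruby
# i: ignore case when matching
CASE_INSENSITIVE = /ruby/i

# x: whitespace and comments inside the pattern are ignored
EXTENDED = /
  \d{3}   # three digits
  -       # a literal hyphen
  \d{4}   # four digits
/x

# m: lets "." match a newline as well
DOT_MATCHES_NEWLINE = /a.b/m

p CASE_INSENSITIVE.match?("RUBY")        # true
p EXTENDED.match?("555-1234")            # true
p(/a.b/.match?("a\nb"))                  # false: "." stops at the newline
p DOT_MATCHES_NEWLINE.match?("a\nb")     # true
```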
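Just as string literals can be delimited with %Q, Ruby lets you begin a regular expression with %r followed by a delimiter of your choice. This is useful when the pattern contains forward slash characters that you do not want to escape: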
# Following matches a single slash character, no escape required
%r|/|
# Flag characters are allowed with this syntax, too
%r[</(.*)>]i
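Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), every character matches itself. To match a control character literally, escape it by preceding it with a backslash.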
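The escaping rule can be sketched as follows. The constant names and sample strings are assumptions for illustration; `Regexp.escape` is the standard method for escaping text that comes from data rather than from a literal:

```ruby
# A literal dot must be escaped; an unescaped "." matches any character.
LITERAL_DOT = /3\.14/
ANY_CHAR    = /3.14/

p LITERAL_DOT.match?("3.14")   # true
p LITERAL_DOT.match?("3x14")   # false
p ANY_CHAR.match?("3x14")      # true

# Regexp.escape inserts the backslashes for you.
escaped = Regexp.escape("a+b")
puts escaped                          # a\+b
p Regexp.new(escaped).match?("a+b")   # true
```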
The following table lists the regular expression syntax available in Ruby.
| Pattern | Description |
|---|---|
| ^ | Matches beginning of line. |
| $ | Matches end of line. |
| . | Matches any single character except newline. Using m option allows it to match newline as well. |
| [...] | Matches any single character in brackets. |
| [^...] | Matches any single character not in brackets |
| re* | Matches 0 or more occurrences of preceding expression. |
| re+ | Matches 1 or more occurrences of preceding expression. |
| re? | Matches 0 or 1 occurrence of preceding expression. |
| re{n} | Matches exactly n occurrences of preceding expression. |
| re{n,} | Matches n or more occurrences of preceding expression. |
| re{n,m} | Matches at least n and at most m occurrences of preceding expression. |
| a|b | Matches either a or b. |
| (re) | Groups regular expressions and remembers matched text. |
| (?imx) | Temporarily toggles on i, m, or x options within a regular expression. If in parentheses, only that area is affected. |
| (?-imx) | Temporarily toggles off i, m, or x options within a regular expression. If in parentheses, only that area is affected. |
| (?:re) | Groups regular expressions without remembering matched text. |
| (?imx:re) | Temporarily toggles on i, m, or x options within parentheses. |
| (?-imx:re) | Temporarily toggles off i, m, or x options within parentheses. |
| (?#...) | Comment. |
| (?=re) | Positive lookahead: asserts that re matches at this position without consuming any characters. |
| (?!re) | Negative lookahead: asserts that re does not match at this position, without consuming any characters. |
| (?>re) | Matches independent pattern without backtracking. |
| \w | Matches word characters. |
| \W | Matches nonword characters. |
| \s | Matches whitespace. Equivalent to [ \t\n\r\f]. |
| \S | Matches nonwhitespace. |
| \d | Matches digits. Equivalent to [0-9]. |
| \D | Matches nondigits. |
| \A | Matches beginning of string. |
| \Z | Matches end of string. If a newline exists, it matches just before newline. |
| \z | Matches end of string. |
| \G | Matches point where last match finished. |
| \b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
| \B | Matches nonword boundaries. |
| \n, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
| \1...\9 | Matches nth grouped subexpression. |
| \10 | Matches nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code. |
Literal characters:
| Example | Description |
|---|---|
| /ruby/ | Match "ruby". |
| /¥/ | Matches Yen sign. Multibyte characters are supported in Ruby 1.9 and Ruby 1.8. |
Character classes:
| Example | Description |
|---|---|
| /[Rr]uby/ | Match "Ruby" or "ruby" |
| /rub[ye]/ | Match "ruby" or "rube" |
| /[aeiou]/ | Match any one lowercase vowel |
| /[0-9]/ | Match any digit; same as /[0123456789]/ |
| /[a-z]/ | Match any lowercase ASCII letter |
| /[A-Z]/ | Match any uppercase ASCII letter |
| /[a-zA-Z0-9]/ | Match any of the above |
| /[^aeiou]/ | Match anything other than a lowercase vowel |
| /[^0-9]/ | Match anything other than a digit |
Special character classes:
| Example | Description |
|---|---|
| /./ | Match any character except newline |
| /./m | In multiline mode . matches newline, too |
| /\d/ | Match a digit: /[0-9]/ |
| /\D/ | Match a nondigit: /[^0-9]/ |
| /\s/ | Match a whitespace character: /[ \t\r\n\f]/ |
| /\S/ | Match nonwhitespace: /[^ \t\r\n\f]/ |
| /\w/ | Match a single word character: /[A-Za-z0-9_]/ |
| /\W/ | Match a nonword character: /[^A-Za-z0-9_]/ |
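The shorthand classes are written with a leading backslash (`\d`, `\w`, `\s`, and their uppercase negations). A short sketch of how they might be used with `String#scan` and `String#split`; the sample string is an assumption:

```ruby
# Single quotes keep the "$" in the sample text literal.
SAMPLE = 'Order 66: 3 items, total $42'

p SAMPLE.scan(/\d+/)    # ["66", "3", "42"]
p SAMPLE.scan(/\w+/)    # ["Order", "66", "3", "items", "total", "42"]
p SAMPLE.split(/\s+/)   # ["Order", "66:", "3", "items,", "total", "$42"]
```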
Repetition:
| Example | Description |
|---|---|
| /ruby?/ | Match "rub" or "ruby": the y is optional |
| /ruby*/ | Match "rub" plus 0 or more ys |
| /ruby+/ | Match "rub" plus 1 or more ys |
| /\d{3}/ | Match exactly 3 digits |
| /\d{3,}/ | Match 3 or more digits |
| /\d{3,5}/ | Match 3, 4, or 5 digits |
Nongreedy repetition matches the smallest number of repetitions:
| Example | Description |
|---|---|
| /<.*>/ | Greedy repetition: matches "<ruby>perl>" |
| /<.*?>/ | Nongreedy: matches "<ruby>" in "<ruby>perl>" |
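The difference between the two rows can be checked directly. This sketch uses `String#[]` with a regular expression, which returns the matched substring; the constant name is an assumption:

```ruby
TEXT = "<ruby>perl>"

p TEXT[/<.*>/]    # "<ruby>perl>"  greedy: runs to the last ">"
p TEXT[/<.*?>/]   # "<ruby>"       nongreedy: stops at the first ">"
```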
Grouping with parentheses:
| Example | Description |
|---|---|
| /\D\d+/ | No group: + repeats \d |
| /(\D\d)+/ | Grouped: + repeats \D\d pair |
| /([Rr]uby(, )?)+/ | Match "Ruby", "Ruby, ruby, ruby", etc. |
Backreferences match a previously matched group again:
| Example | Description |
|---|---|
| /([Rr])uby&\1ails/ | Match ruby&rails or Ruby&Rails |
| /(['"])(?:(?!\1).)*\1/ | Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc. |
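Backreferences are written with a backslash followed by the group number (`\1`, `\2`, and so on). A minimal sketch exercising both patterns from the table; the constant names and sample strings are assumptions:

```ruby
# \1 must repeat exactly what group 1 matched (the same case of "r").
SAME_CASE = /([Rr])uby&\1ails/
# \1 must repeat the opening quote character.
QUOTED = /(['"])(?:(?!\1).)*\1/

p SAME_CASE.match?("Ruby&Rails")   # true
p SAME_CASE.match?("ruby&Rails")   # false: group 1 matched "r", not "R"

p 'say "hello" now'[QUOTED]        # "\"hello\""
p "say 'hi' now"[QUOTED]           # "'hi'"
p %q("oops')[QUOTED]               # nil: the quotes do not pair up
```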
Alternatives:
| Example | Description |
|---|---|
| /ruby|rube/ | Match "ruby" or "rube" |
| /rub(y|le)/ | Match "ruby" or "ruble" |
| /ruby(!+|\?)/ | "ruby" followed by one or more ! or one ? |
Anchors specify the match position:
| Example | Description |
|---|---|
| /^Ruby/ | Match "Ruby" at the start of a string or internal line |
| /Ruby$/ | Match "Ruby" at the end of a string or line |
| /\ARuby/ | Match "Ruby" at the start of a string |
| /Ruby\Z/ | Match "Ruby" at the end of a string |
| /\bRuby\b/ | Match "Ruby" at a word boundary |
| /\brub\B/ | \B is nonword boundary: match "rub" in "rube" and "ruby" but not alone |
| /Ruby(?=!)/ | Match "Ruby", if followed by an exclamation point |
| /Ruby(?!!)/ | Match "Ruby", if not followed by an exclamation point |
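The line anchors (`^`, `$`) and the string anchors (`\A`, `\Z`, `\z`) only differ on multi-line input, which a short sketch can show. The sample strings are assumptions, and `match?` requires Ruby 2.4 or later:

```ruby
LINES = "Perl\nRuby\n"

p LINES.match?(/^Ruby/)     # true:  ^ matches at the start of any line
p LINES.match?(/\ARuby/)    # false: \A matches only at the start of the string
p LINES.match?(/Ruby\Z/)    # true:  \Z tolerates one trailing newline
p LINES.match?(/Ruby\z/)    # false: \z is the very end of the string

p "Ruby!"[/Ruby(?=!)/]          # "Ruby": the "!" is checked but not consumed
p "Ruby!".match?(/Ruby(?!!)/)   # false: "Ruby" is followed by "!"
p "rube"[/\brub\B/]             # "rub"
p "rub".match?(/\brub\B/)       # false: "rub" alone ends at a word boundary
```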
Special syntax with parentheses:
| Example | Description |
|---|---|
| /R(?#comment)/ | Matches "R". All the rest is a comment |
| /R(?i)uby/ | Case-insensitive while matching "uby" |
| /R(?i:uby)/ | Same as above |
| /rub(?:y|le)/ | Group only without creating \1 backreference |
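As a rough illustration of the difference between capturing and non-capturing groups, and of a scoped option such as `(?i:...)`, consider the sketch below. The sample strings are assumptions:

```ruby
capturing     = "ruble".match(/rub(y|le)/)
non_capturing = "ruble".match(/rub(?:y|le)/)

p capturing.captures       # ["le"]: the group is remembered
p non_capturing.captures   # []:     (?:...) groups without remembering

# (?i:...) applies case-insensitivity only inside the parentheses.
p "RUBY".match?(/R(?i:uby)/)   # true
p "rUBY".match?(/R(?i:uby)/)   # false: the leading R is still case-sensitive
```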
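The most important String methods that use regular expressions are sub and gsub, together with their in-place variants sub! and gsub!.
All of these methods perform a search-and-replace operation using a regular expression pattern. sub and sub! replace the first occurrence of the pattern; gsub and gsub! replace all occurrences.
sub and gsub return a new string and leave the original unmodified, whereas sub! and gsub! modify the string on which they are called.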
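A short sketch of the difference between the plain and the in-place variants. The variable names and sample string are assumptions; note that the bang methods return nil when no substitution was made, so their return value should not be assigned blindly:

```ruby
original = "hello world".dup   # dup guarantees a mutable string

copy = original.sub(/o/, "0")  # sub returns a new string
p copy                         # "hell0 world"
p original                     # "hello world" (unchanged)

original.gsub!(/o/, "0")       # gsub! modifies the receiver in place
p original                     # "hell0 w0rld"

p original.sub!(/xyz/, "")     # nil: nothing matched, nothing changed
```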
Consider the following example:
#!/usr/bin/ruby
phone = "2004-959-559 #This is Phone Number"
# Delete Ruby-style comments
# (sub rather than sub!: the bang methods return nil when nothing is replaced)
phone = phone.sub(/#.*$/, "")
puts "Phone Num : #{phone}"
# Remove anything other than digits
phone = phone.gsub(/\D/, "")
puts "Phone Num : #{phone}"
This produces the following result:
Phone Num : 2004-959-559
Phone Num : 2004959559
Here is another example:
#!/usr/bin/ruby
text = "rails are rails, really good Ruby on Rails"
# Change "rails" to "Rails" throughout
text.gsub!("rails", "Rails")
# Capitalize the whole word "rails" throughout
# (a no-op here, because the call above already replaced every "rails")
text.gsub!(/\brails\b/, "Rails")
puts text
This produces the following result:
Rails are Rails, really good Ruby on Rails