├── zh ├── README.md ├── Makefile ├── extract_examples.pl ├── 01-chapter0.markdown ├── 01-chapter4.markdown ├── 01-chapter3.markdown ├── 01-chapter2.markdown └── 01-chapter1.markdown ├── Makefile ├── README.md ├── extract_examples.pl ├── 01-chapter0.markdown ├── 01-chapter3.markdown ├── 01-chapter4.markdown ├── 01-chapter2.markdown └── 01-chapter1.markdown /zh/README.md: -------------------------------------------------------------------------------- 1 | Golang 正则表达式教程 2 | ===================== 3 | 4 | 这是一个针对 Go 语言的正则表达式的教程。 5 | 6 | 更多 Go 的知识可到 Go 语言官网:http://www.golang.org. 7 | 8 | 9 | 本著作基于 Creative Commons Attribution-NonCommercial-ShareAlike 3.0 许可证。 10 | 11 | 中文翻译:[B1nj0y](https://github.com/gingerhot) 12 | 13 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | no: 2 | echo This is not the Makefile you were looking for. 3 | 4 | all: 5 | perl extract_examples.pl 01-chapter1.markdown > r1.go 6 | perl extract_examples.pl 01-chapter2.markdown > r2.go 7 | perl extract_examples.pl 01-chapter3.markdown > r3.go 8 | perl extract_examples.pl 01-chapter4.markdown > r4.go 9 | go build r1.go 10 | go build r2.go 11 | go build r4.go 12 | 13 | html: 14 | perl Markdown.pl 01-chapter1.markdown > chapter1.html 15 | 16 | 17 | -------------------------------------------------------------------------------- /zh/Makefile: -------------------------------------------------------------------------------- 1 | no: 2 | echo This is not the Makefile you were looking for. 
3 | 4 | all: 5 | perl extract_examples.pl 01-chapter1.markdown > r1.go 6 | perl extract_examples.pl 01-chapter2.markdown > r2.go 7 | perl extract_examples.pl 01-chapter3.markdown > r3.go 8 | perl extract_examples.pl 01-chapter4.markdown > r4.go 9 | go build r1.go 10 | go build r2.go 11 | go build r4.go 12 | 13 | html: 14 | perl Markdown.pl 01-chapter1.markdown > chapter1.html 15 | 16 | 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![License: CC BY-NC-SA 4.0](https://licensebuttons.net/l/by-nc-sa/4.0/80x15.png)](https://creativecommons.org/licenses/by-nc-sa/4.0/) 2 | 3 | Golang-Regex-Tutorial 4 | ===================== 5 | 6 | Golang - Regular Expression Tutorial 7 | 8 | This is a regular expression tutorial for Go, the language. 9 | 10 | - [Introduction](01-chapter0.markdown) 11 | - [Part 1: The basics](01-chapter1.markdown) 12 | - [Part 2: Advanced](01-chapter2.markdown) 13 | - [Part 3: Cookbook](01-chapter3.markdown) 14 | - [Part 4: Alternatives](01-chapter4.markdown) 15 | 16 | Go to http://www.golang.org for more information about Go. 17 | 18 | 19 | This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License. 20 | 21 | I am planning to set up gh-pages: http://stefanschroeder.github.io/Golang-Regex-Tutorial/ 22 | 23 | A [Chinese translation](zh/) is available, translated by [B1nj0y](https://github.com/gingerhot). 24 | -------------------------------------------------------------------------------- /zh/extract_examples.pl: -------------------------------------------------------------------------------- 1 | # 2 | # Extract golang code from Markdown tutorial. 
3 | # 4 | # Author: Stefan Schroeder, 2012-09-08 5 | use strict; 6 | 7 | my $i = 0; 8 | my $e = ""; 9 | my $section = ""; 10 | my $insection = 0; 11 | my $marker = "\t \t"; 12 | 13 | my $code = <) 25 | { 26 | if(m/$marker/) 27 | { 28 | $insection = ($insection + 1) % 2; 29 | if ($insection) 30 | { 31 | $i++; 32 | } 33 | else 34 | { 35 | $e .= "{ // $i \n" . $section . "} // $i \n"; 36 | $section = ''; 37 | } 38 | } 39 | elsif($insection) 40 | { 41 | $section .= $_; 42 | } 43 | } 44 | 45 | $code =~ s/EXAMPLES/$e/; 46 | 47 | if ($code =~ /strings\./) 48 | { 49 | $code =~ s/package main\n/package main\nimport "strings"\n/; 50 | } 51 | 52 | print $code; 53 | 54 | -------------------------------------------------------------------------------- /extract_examples.pl: -------------------------------------------------------------------------------- 1 | # 2 | # Extract golang code from Markdown tutorial. 3 | # 4 | # Author: Stefan Schroeder, 2012-09-08 5 | use strict; 6 | 7 | my $i = 0; 8 | my $e = ""; 9 | my $section = ""; 10 | my $insection = 0; 11 | my $marker = "\t \t"; 12 | 13 | my $code = <) 27 | { 28 | if(m/$marker/) 29 | { 30 | $insection = ($insection + 1) % 2; 31 | if ($insection) 32 | { 33 | $i++; 34 | } 35 | else 36 | { 37 | $e .= "{ // $i \n" . $section . 
"} // $i \n"; 38 | $section = ''; 39 | } 40 | } 41 | elsif($insection) 42 | { 43 | $section .= $_; 44 | } 45 | } 46 | 47 | $code =~ s/EXAMPLES/$e/; 48 | 49 | if ($code =~ /strings\./) 50 | { 51 | $code =~ s/package main\n/package main\nimport "strings"\n/; 52 | } 53 | 54 | print $code; 55 | 56 | -------------------------------------------------------------------------------- /zh/01-chapter0.markdown: -------------------------------------------------------------------------------- 1 | ## 简介 ## 2 | 3 | 这篇短小的教程是想教你一些使用 Go 语言正则表达式的基础知识。但它既不是 Go 语言更不是正则的入门教程。本教程所针对的是那些已经对两者都有所了解且希望对这两者的结合使用进行实践的读者。它使用 cookbook 的方式,每一个例子尽可完整,完全可以复制粘贴。 4 | 5 | 文章使用 Markdown 的排版。 6 | 7 | 在这个第一版里我打算仅仅讲一下 regexp 包里处理字符串的函数,因为我觉得这是最常见的用法。regexp 包大约有四十个函数,所以你最好读一下该包的文档。 8 | 9 | [第一部分:基础知识](01-chapter1.markdown): 正则表达式使用的基础。 10 | 11 | [第二部分:高级](01-chapter2.markdown): 相对复杂些的正则。 12 | 13 | [第三部分:Cookbook](01-chapter3.markdown): 一些示例程序。 14 | 15 | [第四部分:换个思路](01-chapter4.markdown): 有时正则并非最佳方案。 16 | 17 | *参考文档:* 18 | 19 | [regexp 包官方文档](http://golang.org/pkg/regexp/) (译注:翻墙吧,骚年) 20 | 21 | [re2 正则库](https://code.google.com/p/re2/) 22 | 23 | [Russ Cox 收集的有关正则表达式的入口页](http://swtch.com/~rsc/regexp/) 24 | 25 | Mark McGranaghan 创建的一个很棒的 Go 语言程序例子的网站。这里也有一页 26 | [关于正则的页面](https://gobyexample.com/regular-expressions) 27 | 28 | Rob Pike 有话说:关于 [用正则进行词法分析和解析(lexing and parsing)](http://commandcenter.blogspot.ch/2011/08/regular-expressions-in-lexing-and.html). 29 | 30 | 如果碰到有关 Go 的问题,你自己解决不了了,你可以去 31 | [Golang-Nuts 邮件列表](https://groups.google.com/group/golang-nuts)。 32 | 当然你也许已经知道这个了。 33 | 34 | [Perl 正则教程](http://perldoc.perl.org/perlretut.html) 去寻找点灵感吧。(译注:作为一个 Perler,我很欣慰 ^_^) 35 | 36 | > Version 0.1 Initial. 37 | 38 | > This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. 
39 | 40 | 作者:Stefan Schroeder 41 | 42 | -------------------------------------------------------------------------------- /zh/01-chapter4.markdown: -------------------------------------------------------------------------------- 1 | # 第四部分:换个思路 # 2 | 3 | ## 把一句话分词 ## 4 | 5 | 如果输入部分字面量是字符串,你则不必使用正则。 6 | 7 | s := "abc,def,ghi" 8 | r, err := regexp.Compile(`[^,]+`) // everything that is not a comma 9 | res := r.FindAllString(s, -1) 10 | // Prints [abc def ghi] 11 | fmt.Printf("%v", res) 12 | 13 | 14 | *strings* 包里面的 *Split* 函数就是用来做这个的,而且语法更可读。 15 | 16 | s := "abc,def,ghi" 17 | res:= strings.Split(s, ",") 18 | // Prints [abc def ghi] 19 | fmt.Printf("%v", res) 20 | 21 | 22 | ## 验证在一个字符串里是否存在一个指定的子字符串 ## 23 | 24 | 使用 *MatchString* 函数可以在一个字符串里查找另一个字面量的字符串。 25 | 26 | 27 | s := "OttoFritzHermanWaldoKarlSiegfried" 28 | r, err := regexp.Compile(`Waldo`) 29 | res := r.MatchString(s) 30 | // Prints true 31 | fmt.Printf("%v", res) 32 | 33 | 34 | 但是使用 *strings.Index* 函数可以在字串中获取匹配到子串的索引。当不匹配时则返回的索引为-1。 35 | 36 | 37 | s := "OttoFritzHermanWaldoKarlSiegfried" 38 | res:= strings.Index(s, "Waldo") 39 | // Prints true 40 | fmt.Printf("%v", res != -1) 41 | 42 | 43 | ## 删除空格 44 | 45 | 每当你读一些来自文件或是用户的文本时,你可能都想忽略那些句子开头和末尾的空格。 46 | 47 | 你可以用正则来搞定: 48 | 49 | s := " Institute of Experimental Computer Science " 50 | r, err := regexp.Compile(`\s*(.*)\s*`) 51 | res:= r.FindStringSubmatch(s) 52 | // 53 | fmt.Printf("<%v>", res[1]) 54 | 55 | 首次移除空格大作战以失败告终。只有字符串开头前面的空格被删除了,接下来的 .* 这个片段是贪婪匹配,所以它会捕获余下的全部内容。但是对于这样的任务我不想继续折腾正则了,因为我知道还有 *strings.TrimSpace* 这个东东。 56 | 57 | s := " Institute of Experimental Computer Science " 58 | // 59 | fmt.Printf("<%v>", strings.TrimSpace(s)) 60 | 61 | 62 | TrimSpace 删除了开头和结尾的空格。翻阅 *strings* 包的文档会发现 Trim 家族还有其它一些函数。 63 | -------------------------------------------------------------------------------- /01-chapter0.markdown: -------------------------------------------------------------------------------- 1 | ## Introduction ## 2 | 3 | This small tutorial tries to teach 
you the basics of using regular expressions with Go. It is neither an introduction to Go nor an introduction to regular expressions. The intended audience is developers who already have a grasp of both and want to see how they go together in practice. It takes a cookbook approach: every example is supposed to be complete and ready for copy-and-paste. 4 | 5 | The text was written in Markdown. 6 | 7 | In this first version I cover only the functions of the regexp package that deal with strings, because I feel that this is the most common use. The regexp package has some forty functions, so make sure you read the package documentation. 8 | 9 | [Part 1: The Basics](01-chapter1.markdown): The basics of using regular expressions. 10 | 11 | [Part 2: Advanced](01-chapter2.markdown): More sophisticated regular expressions. 12 | 13 | [Part 3: Cookbook](01-chapter3.markdown): A few example programs. 14 | 15 | [Part 4: Alternatives](01-chapter4.markdown): When regexps are not the right solution. 16 | 17 | *References:* 18 | 19 | [Official documentation of the regexp package](http://golang.org/pkg/regexp/) 20 | 21 | [re2 regular expression library](https://code.google.com/p/re2/) 22 | 23 | [Russ Cox's entry page for all things regular expressions](http://swtch.com/~rsc/regexp/) 24 | 25 | Mark McGranaghan set up a nice website with Go examples. It also has a 26 | [page on regular expressions](https://gobyexample.com/regular-expressions). 27 | 28 | Rob Pike has more to say about [Regular expressions in lexing and parsing](http://commandcenter.blogspot.ch/2011/08/regular-expressions-in-lexing-and.html). 29 | 30 | If you have a Go-related problem that you cannot solve alone, go to the 31 | [Golang-Nuts mailing list](https://groups.google.com/group/golang-nuts). 32 | But you probably already knew that. 33 | 34 | [Perl regexp tutorial](http://perldoc.perl.org/perlretut.html), for inspiration. 35 | 36 | > Version 0.1 Initial. 
37 | 38 | > This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. 39 | 40 | Written by Stefan Schröder. 41 | 42 | -------------------------------------------------------------------------------- /zh/01-chapter3.markdown: -------------------------------------------------------------------------------- 1 | # 第三部分:示例 Cookbook # 2 | 3 | ## grep ## 4 | 5 | 这个 grep 工具用来在文本文件中搜索匹配一个正则表达式。读取到的每行文本都会和命令行中给定的正则进行匹配,匹配到的行会被打印出来。 6 | 7 | package main 8 | 9 | import ( 10 | "flag" 11 | "regexp" 12 | "bufio" 13 | "fmt" 14 | "os" 15 | ) 16 | 17 | func grep(re, filename string) { 18 | regex, err := regexp.Compile(re) 19 | if err != nil { 20 | return // there was a problem with the regular expression. 21 | } 22 | 23 | fh, err := os.Open(filename) 24 | f := bufio.NewReader(fh) 25 | 26 | if err != nil { 27 | return // there was a problem opening the file. 28 | } 29 | defer fh.Close() 30 | 31 | buf := make([]byte, 1024) 32 | for { 33 | buf, _ , err = f.ReadLine() 34 | if err != nil { 35 | return 36 | } 37 | 38 | s := string(buf) 39 | if regex.MatchString(s) { 40 | fmt.Printf("%s\n", string(buf)) 41 | } 42 | } 43 | } 44 | 45 | func main() { 46 | flag.Parse() 47 | if flag.NArg() == 2 { 48 | grep(flag.Arg(0), flag.Arg(1)) 49 | } else { 50 | fmt.Printf("Wrong number of arguments.\n") 51 | } 52 | } 53 | 54 | 如果你不知道 grep 为何物,可以在命令行里运行 'man grep' 一下。 55 | 56 | ## 搜索替换 ## 57 | 58 | 这个工具是上面 grep 工具的升级版。它在搜索匹配一个模式的同时会用其它内容替换掉匹配到的内容。显然我们是在对上面已有的 grep 版本基础上进行一些二次加工。 59 | 60 | 用法: ./replacer old new filename 61 | 62 | 63 | package main 64 | 65 | import ( 66 | "flag" 67 | "regexp" 68 | "bufio" 69 | "fmt" 70 | "os" 71 | ) 72 | 73 | func replace(re, repl, filename string) { 74 | regex, err := regexp.Compile(re) 75 | if err != nil { 76 | return // there was a problem with the regular expression. 77 | } 78 | 79 | fh, err := os.Open(filename) 80 | f := bufio.NewReader(fh) 81 | 82 | if err != nil { 83 | return // there was a problem opening the file. 
84 | } 85 | defer fh.Close() 86 | 87 | buf := make([]byte, 1024) 88 | for { 89 | buf, _ , err = f.ReadLine() 90 | if err != nil { 91 | return 92 | } 93 | 94 | s := string(buf) 95 | result := regex.ReplaceAllString(s, repl) 96 | fmt.Print(result + "\n") 97 | } 98 | } 99 | 100 | func main() { 101 | flag.Parse() 102 | if flag.NArg() == 3 { 103 | replace(flag.Arg(0), flag.Arg(1), flag.Arg(2)) 104 | } else { 105 | fmt.Printf("Wrong number of arguments.\n") 106 | } 107 | } 108 | 109 | ## 验证电子邮件地址 ## 110 | 111 | RFC2822 对于电子邮件的格式定义的过于宽松,以至于很难用简单的正则表达式验证一个邮件地址是否合规。很有趣啊。大多数情况下尽管你的程序会对邮件地址做一些预设,但是我发现下面这条正则对所有的情况都是实地有效的: 112 | 113 | (\w[-._\w]*\w@\w[-._\w]*\w\.\w{2,3}) 114 | 115 | 邮件地址必须以一个字符 \w 开头,接下来是任何数量的包含了破折号、英文句点以及下划线在内的字符。同时,在 @ 之前的最后一个字符必须又是一个“正常的”字符才行。对于域名部分我们也是同样的规则,但域名的后缀部分必须只由2到3个字符组成。这个规则基本可以覆盖大多数的情况。如果你碰到一个和这个正则不匹配的邮件地址,那很可能是故意拼凑起来逗你玩儿的,忽略即可。 116 | 117 | -------------------------------------------------------------------------------- /01-chapter3.markdown: -------------------------------------------------------------------------------- 1 | # Part 3: Cookbook # 2 | 3 | ## grep ## 4 | 5 | The grep tool searches for (regular) expressions in text files. Every line is read, and if it matches the pattern provided on the command line, that line is printed. 6 | 7 | ```go 8 | package main 9 | 10 | import ( 11 | "flag" 12 | "regexp" 13 | "bufio" 14 | "fmt" 15 | "os" 16 | ) 17 | 18 | func grep(re, filename string) { 19 | regex, err := regexp.Compile(re) 20 | if err != nil { 21 | return // there was a problem with the regular expression. 22 | } 23 | 24 | fh, err := os.Open(filename) 25 | f := bufio.NewReader(fh) 26 | 27 | if err != nil { 28 | return // there was a problem opening the file.
29 | } 30 | defer fh.Close() 31 | 32 | buf := make([]byte, 1024) 33 | for { 34 | buf, _ , err = f.ReadLine() 35 | if err != nil { 36 | return 37 | } 38 | 39 | s := string(buf) 40 | if regex.MatchString(s) { 41 | fmt.Printf("%s\n", string(buf)) 42 | } 43 | } 44 | } 45 | 46 | func main() { 47 | flag.Parse() 48 | if flag.NArg() == 2 { 49 | grep(flag.Arg(0), flag.Arg(1)) 50 | } else { 51 | fmt.Printf("Wrong number of arguments.\n") 52 | } 53 | } 54 | ``` 55 | 56 | If you don't know what grep does, run 'man grep'. 57 | 58 | 59 | ## Search and Replace ## 60 | 61 | This tool is an improved version of grep. It not only searches for a pattern but also replaces each match with something else. We obviously want to build on the existing grep solution. 62 | 63 | Usage: ./replacer old new filename 64 | 65 | ```go 66 | package main 67 | 68 | import ( 69 | "flag" 70 | "regexp" 71 | "bufio" 72 | "fmt" 73 | "os" 74 | ) 75 | 76 | func replace(re, repl, filename string) { 77 | regex, err := regexp.Compile(re) 78 | if err != nil { 79 | return // there was a problem with the regular expression. 80 | } 81 | 82 | fh, err := os.Open(filename) 83 | f := bufio.NewReader(fh) 84 | 85 | if err != nil { 86 | return // there was a problem opening the file. 87 | } 88 | defer fh.Close() 89 | 90 | buf := make([]byte, 1024) 91 | for { 92 | buf, _ , err = f.ReadLine() 93 | if err != nil { 94 | return 95 | } 96 | 97 | s := string(buf) 98 | result := regex.ReplaceAllString(s, repl) 99 | fmt.Print(result + "\n") 100 | } 101 | } 102 | 103 | func main() { 104 | flag.Parse() 105 | if flag.NArg() == 3 { 106 | replace(flag.Arg(0), flag.Arg(1), flag.Arg(2)) 107 | } else { 108 | fmt.Printf("Wrong number of arguments.\n") 109 | } 110 | } 111 | ``` 112 | 113 | ## Verifying an email-address ## 114 | 115 | Interestingly, RFC 2822, which defines the format of email addresses, is pretty permissive. 
116 | That makes it hard to come up with a simple regular expression that matches a valid 117 | email address. In most cases, though, your 118 | application can make some assumptions about addresses, and I found this one sufficient for 119 | all practical purposes: 120 | 121 | (\w[-._\w]*\w@\w[-._\w]*\w\.\w{2,3}) 122 | 123 | It must start with a character of the \w class. Then we can have any number of characters 124 | including the hyphen, the '.' and the underscore. We want the last character before the @ to 125 | be a 'regular' character again. We repeat the same pattern for the domain, except that the 126 | suffix (the part behind the last dot) can be only 2 or 3 characters. This will cover most cases. 127 | If you come across an email address that does not match this regexp, it has probably deliberately 128 | been set up to annoy you and you can therefore ignore it. Note that the pattern is not anchored, so it finds an address inside a longer string; to validate a whole string you would wrap it in ^ and $, and anchored that way it rejects suffixes longer than three characters such as .info. 129 | 130 | -------------------------------------------------------------------------------- /01-chapter4.markdown: -------------------------------------------------------------------------------- 1 | # Part 4: Alternatives # 2 | 3 | ## Splitting up a line into tokens ## 4 | 5 | If literal strings separate the fields in your input, you don't need to use regexps. 
6 | 7 | ```go 8 | s := "abc,def,ghi" 9 | r := regexp.MustCompile(`[^,]+`) // everything that is not a comma 10 | res := r.FindAllString(s, -1) 11 | // Prints [abc def ghi] 12 | fmt.Printf("%v", res) 13 | ``` 14 | 15 | The *Split* function in the *strings* package serves the same purpose, and the syntax is more readable. 16 | 17 | ```go 18 | s := "abc,def,ghi" 19 | res := strings.Split(s, ",") 20 | // Prints [abc def ghi] 21 | fmt.Printf("%v", res) 22 | ``` 23 | 24 | As a convenience the standard library also provides the *Fields* function in the strings package, 25 | which splits a string at white space: 26 | 27 | ```go 28 | fmt.Printf("Fields are: %q", strings.Fields(" Frodo Thorin Dwalin ")) 29 | ``` 30 | 31 | yields: 32 | 33 | Fields are: ["Frodo" "Thorin" "Dwalin"] 34 | 35 | You can even provide a more sophisticated splitting rule with the variant *FieldsFunc*. It takes 36 | your string and a function as parameters. The function must accept a rune and return a bool that says whether to split at that rune. 37 | 38 | ## FieldsFunc-Example 39 | 40 | Suppose you want to process comma-separated values (good ol' CSV). The naive implementation 41 | with *Split* would work in most cases, but sometimes you have commas embedded in a single field. 42 | Typically the user then uses quotes to protect that field (and thus the comma inside) from being split. 43 | 44 | This example uses a global boolean (boo!) to keep track of quoting (obviously there is more than one 45 | way to break this), but it works for simple cases. 
46 | 47 | ```go 48 | package main 49 | import ( 50 | "fmt" 51 | "strings" 52 | ) 53 | var inQuotes = false // true while we are inside a quoted field 54 | func main() { 55 | s := " 1 , 4, \" xx,yy \", 5 " 56 | f := func(c rune) bool { 57 | if c == '"' { 58 | inQuotes = !inQuotes // every quote toggles the state 59 | } 60 | if !inQuotes && c == ',' { 61 | return true // split only at commas outside quotes 62 | } 63 | return false 64 | } 65 | for k, v := range strings.FieldsFunc(s, f) { 66 | fmt.Printf("%v: %v\n", k, v) 67 | } 68 | } 69 | ``` 70 | 71 | Prints: 72 | 73 | 0: 1 74 | 1: 4 75 | 2: " xx,yy " 76 | 3: 5 77 | 78 | As an exercise you might want to delete the quotes (Hint: *Trim* is your friend.) 79 | 80 | 81 | ## Testing if a specific substring exists in your string ## 82 | 83 | The *MatchString* function allows you to find a literal string in another string. 84 | 85 | ```go 86 | s := "OttoFritzHermanWaldoKarlSiegfried" 87 | r := regexp.MustCompile(`Waldo`) 88 | res := r.MatchString(s) 89 | // Prints true 90 | fmt.Printf("%v", res) 91 | ``` 92 | 93 | But you can avoid the regexp if you use the *strings.Index* function to retrieve the index of your substring in the string. Index returns -1 if the substring is not present. 94 | 95 | ```go 96 | s := "OttoFritzHermanWaldoKarlSiegfried" 97 | res := strings.Index(s, "Waldo") 98 | // Prints true 99 | fmt.Printf("%v", res != -1) 100 | ``` 101 | 102 | ## Removing Spaces 103 | 104 | Whenever you are reading text from a file or from the user you probably want to discard spaces at the beginning and at the end of the line. 
105 | 106 | You could use regexps to accomplish that: 107 | 108 | ```go 109 | s := " Institute of Experimental Computer Science " 110 | r := regexp.MustCompile(`\s*(.*)\s*`) 111 | res := r.FindStringSubmatch(s) 112 | // Prints <Institute of Experimental Computer Science > (trailing whitespace is kept) 113 | fmt.Printf("<%v>", res[1]) 114 | ``` 115 | 116 | The first attempt to remove the spaces failed. Only the spaces at the head of the string have been removed; the next piece, .*, was greedy and captured the rest, including the trailing spaces. I won't bother to figure out the correct regexp for this task, because I know *strings.TrimSpace*. 117 | 118 | ```go 119 | s := " Institute of Experimental Computer Science " 120 | // Prints <Institute of Experimental Computer Science> 121 | fmt.Printf("<%v>", strings.TrimSpace(s)) 122 | ``` 123 | 124 | TrimSpace removes the white space at the beginning and the end. Look up the documentation of the strings package; there are a few more functions in the Trim family. 125 | -------------------------------------------------------------------------------- /zh/01-chapter2.markdown: -------------------------------------------------------------------------------- 1 | # 第二部分:高级用法 # 2 | 3 | ## 捕获分组 ## 4 | 5 | 有时你用正则匹配一个字符串,但其实只是想留意之中的某一小段内容。而在前一章我们一直都停留在匹配到的*整个的*字符串上。 6 | 7 | //[[cat] [sat] [mat]] 8 | re, err := regexp.Compile(`.at`) 9 | res := re.FindAllStringSubmatch("The cat sat on the mat.", -1) 10 | fmt.Printf("%v", res) 11 | 12 | 你可以使用括号来捕捉你真正需要的那部分,而不是整个正则匹配到的全部内容。 13 | 14 | //[[cat c] [sat s] [mat m]] 15 | re, err := regexp.Compile(`(.)at`) // want to know what is in front of 'at' 16 | res := re.FindAllStringSubmatch("The cat sat on the mat.", -1) 17 | fmt.Printf("%v", res) 18 | 19 | 你可以有多个捕获分组。 20 | 21 | // 结果是 [[ex e x] [ec e c] [e e ]] 22 | s := "Nobody expects the Spanish inquisition." 
23 | re1, err := regexp.Compile(`(e)(.)`) // Prepare our regex 24 | result_slice := re1.FindAllStringSubmatch(s, -1) 25 | fmt.Printf("%v", result_slice) 26 | 27 | *FindAllStringSubmatch* 这个方法对每一个捕获都返回一个数组,其中第一个元素是整个的匹配结果,接下来的元素是每个匹配到的分组的结果。最后每一个这样的数组再全部包进一个外层的数组里。 28 | 29 | 如果你有一个可选的捕获分组在一个字符串中没有出现,结果数组里会包含一个空的字符串的壳儿。换句话说,结果数组的元素数量总是分组数量再加上一。 30 | 31 | s := "Mr. Leonard Spock" 32 | re1, err := regexp.Compile(`(Mr)(s)?\. (\w+) (\w+)`) 33 | result:= re1.FindStringSubmatch(s) 34 | 35 | for k, v := range result { 36 | fmt.Printf("%d. %s\n", k, v) 37 | } 38 | // Prints 39 | // 0. Mr. Leonard Spock 40 | // 1. Mr 41 | // 2. 42 | // 3. Leonard 43 | // 4. Spock 44 | 45 | 你不能把捕获分组进行部分叠加。比如下面的例子我们想让第一个正则匹配 'expects the',另外一个匹配 'the Spanish',这里括号要分开用才行。 46 | 助记法:最后开始的,最先闭合。这里的在 'the' 之前开启的括号要在其之后是闭合的。 47 | 48 | s := "Nobody expects the Spanish inquisition." 49 | re1, err := regexp.Compile(`(expects (...) Spanish)`) 50 | // Wanted regex1 -------------- 51 | // Wanted regex2 -------------- 52 | result:= re1.FindStringSubmatch(s) 53 | 54 | for k, v := range result { 55 | fmt.Printf("%d. %s\n", k, v) 56 | } 57 | // 0. expects the Spanish 58 | // 1. expects the Spanish 59 | // 2. the 60 | 61 | *FindStringSubmatchIndex* 函数... 62 | 63 | ## 命名捕获 ## 64 | 65 | 仅仅把匹配到的内容存入数组中的序列里会略显不便,会出现两个问题。 66 | 67 | 首先,当你在正则的某处插入一个新的分组时,在其后的分组在结果数组中的索引值肯定会增加。这是件麻烦事儿。 68 | 69 | 其次,正则本身也许是在运行时拼成的,这可能会包含很多超出我们控制的括号。也就是说我们不知道我们精心拼成的括号匹配到的内容的索引是多少。 70 | 71 | 为了解决这个问题,_named matches_ 应运而生。允许给匹配的内容取一个符号化的名称用来到匹配的结果中进行查询。 72 | 73 | re := regexp.MustCompile("(?P<first_char>.)(?P<middle_part>.*)(?P<last_char>.)") 74 | n1 := re.SubexpNames() 75 | r2 := re.FindAllStringSubmatch("Super", -1)[0] 76 | 77 | md := map[string]string{} 78 | for i, n := range r2 { 79 | fmt.Printf("%d. 
match='%s'\tname='%s'\n", i, n, n1[i]) 80 | md[n1[i]] = n 81 | } 82 | fmt.Printf("The names are : %v\n", n1) 83 | fmt.Printf("The matches are: %v\n", r2) 84 | fmt.Printf("The first character is %s\n", md["first_char"]) 85 | fmt.Printf("The last character is %s\n", md["last_char"]) 86 | 87 | 在该例中字符串 'Super' 使用一个由三部分组成的正则进行匹配: 88 | 89 | 一个单字符(.),命名为 'first_char'。 90 | 91 | 一个中间由一串字符组成的部分,命名为 'middle_part'。 92 | 93 | 一个结尾的字符(.),因此命名为 'last_char'。 94 | 95 | 为了简化匹配结果的使用,我们把所有的捕获命名都存在 n1 中,然后和匹配的结果 r2 一一对应后存储到一个新的叫 _md_ 的 map,其中匹配结果是作为捕获命名的值。 96 | 97 | 注意整个字符串 'Super' 这个值用的是空字符这样一个伪键。 98 | 99 | 该例子会打印出: 100 | 101 | 0. match='Super' name='' 102 | 1. match='S' name='first_char' 103 | 2. match='upe' name='middle_part' 104 | 3. match='r' name='last_char' 105 | The names are : [ first_char middle_part last_char] 106 | The matches are: [Super S upe r] 107 | The first character is S 108 | The last character is r 109 | 110 | # 重复:高级篇 # 111 | 112 | ## 非匹配捕获/分组重复 ## 113 | 114 | 如果一个复杂的正则表达式有多个分组,你可能会碰到使用括号进行分组但是对捕获到的内容并不需要关心的情况。这时你可以使用 (?:regex) 这样一个“非捕获分组”的方式丢弃一组匹配到的内容。问号加上冒号会告诉编译器用这个模式匹配但是不要作保存。 115 | 116 | 不包括非捕获分组: 117 | 118 | s := "Mrs. Leonora Spock" 119 | re1, err := regexp.Compile(`Mr(s)?\. (\w+) (\w+)`) 120 | result:= re1.FindStringSubmatch(s) 121 | for k, v := range result { 122 | fmt.Printf("%d. %s\n", k, v) 123 | } 124 | // 0. Mrs. Leonora Spock 125 | // 1. s 126 | // 2. Leonora 127 | // 3. Spock 128 | 129 | 带有一个非捕获分组: 130 | 131 | s := "Mrs. Leonora Spock" 132 | re1, err := regexp.Compile(`Mr(?:s)?\. (\w+) (\w+)`) 133 | result:= re1.FindStringSubmatch(s) 134 | for k, v := range result { 135 | fmt.Printf("%d. %s\n", k, v) 136 | } 137 | // 0. Mrs. Leonora Spock 138 | // 1. Leonora 139 | // 2. Spock 140 | 141 | ## 到底是多少? 
## 142 | 143 | 你可能非常清楚需要重复的具体次数。当你知道一个正则中你需要的部分有具体多少个实例的时候我们就需要 {}。 144 | 145 | s := "11110010101111100101001001110101" 146 | re1, err := regexp.Compile(`1{4}`) 147 | res := re1.FindAllStringSubmatch(s,-1) 148 | fmt.Printf("<%v>", res) 149 | // <[[1111] [1111]]> 150 | 151 | res2 := re1.FindAllStringIndex(s,-1) 152 | fmt.Printf("<%v>", res2) 153 | // <[[0 4] [10 14]]> 154 | 155 | {} 的语法并不是很常用。其中一个原因是很多或是也许所有的情形下你都会通过简单地重复写出这些重复的部分来修改正则表达式。[但是假如重复的数量是120次的话,我觉得你应该就不愿意这么写了吧] 仅当你有非常明确的需求(比如 {123,130})时你才会想使用 {}。 156 | 157 | (ab){3} == (ababab) 158 | (ab){3,4} == (ababab(ab)??) 159 | 160 | > 注:?? 表示 “零个或是一个,更倾向零个”。 161 | 162 | {} 的通用模式是 x{n,m}。这里 'n' 是 x 出现的最小数量,'m' 是出现的最大数量。 163 | 164 | Go-regexp 包支持 {} 家族中略多一些的模式。 165 | 166 | # 标志项 # 167 | 168 | regexp 包有如下的标志项可用 [引自文档]: 169 | 170 | * i 不区分大小写 (默认区分) 171 | * m 多行模式: ^ 和 $ 匹配整个文本的开头/结尾的同时也匹配每行的开头和结尾(默认不匹配) 172 | * s 让 . 匹配 \n (默认不匹配) 173 | * U 非贪婪:对 x* 和 x*?, x+ 和 x+? 等模式进行切换(默认是关闭的) 174 | 175 | 标志项的语法是 xyz(设置)或 -xyz(清除)或是 xy-z(设置 xy,清除 z)。 176 | 177 | ## 区分大小写 ## 178 | 179 | 也许你已经知道有些字符存在两种形式:大写和小写。[ 你也许会说:“我当然知道这个,大家都知道!” 好吧,如果你觉得这问题有点儿吹毛求疵那你看下这些特例的大小写问题:a, $, 本, ß, Ω。好了,我们别把问题复杂化了,还是先只考虑英语的情况吧。] 180 | 181 | 如果你明确地想忽略大小写的情况,或者说你想在一个正则或是其中的一部分允许大小写,那就使用 'i' 标志符。 182 | 183 | s := "Never say never." 184 | r, err := regexp.Compile(`(?i)^n`) // 是否是以 'N' 或 'n' 开头? 185 | fmt.Printf("%v", r.MatchString(s)) // true, 不区分大小写 186 | 187 | 在现实世界中我们很少会去匹配一个不区分大小写的正则。通常我们都倾向于先把整个字符串转换成大写或者小写,然后再去只匹配这一种情形: 188 | 189 | sMixed := "Never say never." 190 | sLower := strings.ToLower(sMixed) // 不要忘记 import "strings" 包 191 | r, err := regexp.Compile(`^n`) 192 | fmt.Printf("%v ", r.MatchString(sMixed)) // false, N != n 193 | fmt.Printf("%v ", r.MatchString(sLower)) // true, n == n 194 | 195 | ## 贪婪匹配 vs 非贪婪匹配 ## 196 | 197 | 如前所见,正则表达式可能包含重复的部分。在大多情况下,对于给定的字符串会有不止一种可行方案的正则。 198 | 199 | 比如,使用正则 '.*' (包括单引号部分),对下面的字符串匹配的结果是怎样的? 
200 | 201 | 'abc','def','ghi' 202 | 203 | 你可能只是想取到 *'abc'* 部分,但是却非如此。正则表达式默认情况下是 _贪婪的_。它们在能匹配的情况下会尽可能多的去取字符。所以这里答案是 *'abc','def','ghi'*,因为中间部分的引号也是和 "." 匹配的!如下: 204 | 205 | r, err := regexp.Compile(`'.*'`) 206 | res := r.FindString(" 'abc','def','ghi' ") 207 | fmt.Printf("<%v>", res) 208 | // Will print: <'abc','def','ghi'> 209 | 210 | 如果想确认进行最短可能匹配(即非贪婪),你要在正则表达式后面加上特殊符合 '?'。 211 | 212 | r, err := regexp.Compile(`'.*?'`) 213 | res := r.FindString(" 'abc','def','ghi' ") 214 | fmt.Printf("<%v>", res) 215 | // Will print: <'abc'> 216 | 217 | 没有捷径可以让你写一条匹配 'abc','def' 的这样的正则。 218 | 219 | 你可以使用 U 这个标志项把正则表达式的行为恢复到默认非贪婪的模式。 220 | 221 | r, err := regexp.Compile(`(?U)'.*'`) 222 | res := r.FindString(" 'abc','def','ghi' ") 223 | fmt.Printf("<%v>", res) 224 | // Will print: <'abc'> 225 | 226 | 在你的正则里你可以前后相继地在这两个行为之间进行切换。 227 | 228 | ## 点号是否匹配换行符? ## 229 | 230 | 当我们有一个多行字符串(也就是包含换行符 '\n' 的字符串)你可以使用 (?s) 标志符控制是否让 '.' 匹配 231 | 换行符。默认是不匹配。哪位能贡献一个更合理的用例吗? 232 | 233 | r, err := regexp.Compile(`a.`) 234 | s := "atlanta\narkansas\nalabama\narachnophobia" 235 | res := r.FindAllString(s, -1) 236 | fmt.Printf("<%v>", res) 237 | // <[at an ar an as al ab am ar ac]> 238 | 239 | 这时如果使用 (?s) 标志符,换行符就会在结果中保留。 240 | 241 | r, err := regexp.Compile(`(?s)a.`) 242 | s := "atlanta\narkansas\nalabama\narachnophobia" 243 | res := r.FindAllString(s, -1) 244 | fmt.Printf("<%v>", res) 245 | // Prints 246 | // <[at an a 247 | // ar an as al ab am a 248 | // ar ac]> 249 | 250 | ## 要不要 ^/$ 匹配换行符? 
## 251 | 252 | 对于多行文本,你可以通过'(?m)' 这个标志符来控制 '^' 或者 '$' 是否匹配换行符。默认是不匹配。('^' 表示行的起始符 BOL=Begin-of-line, '$' 表示行的结尾符 EOL=End-of-line) 253 | 254 | r, err1 := regexp.Compile(`a$`) // without flag 255 | s := "atlanta\narkansas\nalabama\narachnophobia" 256 | // 01234567 890123456 78901234 5678901234567 257 | // - 258 | res := r.FindAllStringIndex(s,-1) 259 | fmt.Printf("<%v>\n", res) 260 | // 1 match 261 | // <[[37 38]]> 262 | 263 | t, err2 := regexp.Compile(`(?m)a$`) // with flag 264 | u := "atlanta\narkansas\nalabama\narachnophobia" 265 | // 01234567 890123456 78901234 5678901234567 266 | // -- -- - 267 | res2 := t.FindAllStringIndex(u,-1) 268 | fmt.Printf("<%v>", res2) 269 | // 3 matches 270 | // <[[6 7] [23 24] [37 38]]> 271 | 272 | -------------------------------------------------------------------------------- /01-chapter2.markdown: -------------------------------------------------------------------------------- 1 | # Part 2: Advanced # 2 | 3 | ## Groups ## 4 | 5 | Sometimes you want to match against a string but are only interested in a particular piece of it. In the previous chapter we always looked at the *entire* matching string. 6 | 7 | ```go 8 | //[[cat] [sat] [mat]] 9 | re := regexp.MustCompile(`.at`) 10 | res := re.FindAllStringSubmatch("The cat sat on the mat.", -1) 11 | fmt.Printf("%v", res) 12 | ``` 13 | 14 | Parentheses allow you to capture the piece of the string that you are actually interested in, instead of the entire match. 15 | 16 | ```go 17 | //[[cat c] [sat s] [mat m]] 18 | re := regexp.MustCompile(`(.)at`) // want to know what is in front of 'at' 19 | res := re.FindAllStringSubmatch("The cat sat on the mat.", -1) 20 | fmt.Printf("%v", res) 21 | ``` 22 | 23 | You can have more than one group. 24 | 25 | ```go 26 | // Prints [[ex e x] [ec e c] [e e ]] 27 | s := "Nobody expects the Spanish inquisition." 
28 | re1 := regexp.MustCompile(`(e)(.)`) // Prepare our regex 29 | result_slice := re1.FindAllStringSubmatch(s, -1) 30 | fmt.Printf("%v", result_slice) 31 | ``` 32 | 33 | The *FindAllStringSubmatch* function returns, for each match, a slice with the entire match in the first field and the contents of the groups in the remaining fields. The slices for all the matches are then collected in an outer slice. 34 | 35 | If you have an optional group that does not appear in the string, the resulting slice has an empty string in that position. In other words, the number of fields in the result always equals the number of groups plus one. 36 | 37 | ```go 38 | s := "Mr. Leonard Spock" 39 | re1 := regexp.MustCompile(`(Mr)(s)?\. (\w+) (\w+)`) 40 | result := re1.FindStringSubmatch(s) 41 | 42 | for k, v := range result { 43 | fmt.Printf("%d. %s\n", k, v) 44 | } 45 | // Prints 46 | // 0. Mr. Leonard Spock 47 | // 1. Mr 48 | // 2. 49 | // 3. Leonard 50 | // 4. Spock 51 | ``` 52 | 53 | You cannot have partially overlapping groups. If we wanted one group to match 'expects the' and another to match 'the Spanish', the parentheses would be interpreted differently. The motto is: last opened, first closed. The parenthesis that is opened before 'the' is closed right after 'the'. 54 | 55 | ```go 56 | s := "Nobody expects the Spanish inquisition." 57 | re1 := regexp.MustCompile(`(expects (...) Spanish)`) 58 | // Wanted regex1 -------------- 59 | // Wanted regex2 -------------- 60 | result := re1.FindStringSubmatch(s) 61 | 62 | for k, v := range result { 63 | fmt.Printf("%d. %s\n", k, v) 64 | } 65 | // 0. expects the Spanish 66 | // 1. expects the Spanish 67 | // 2. the 68 | ``` 69 | 70 | The *FindStringSubmatchIndex*-function ... 71 | 72 | ## Named matches ## 73 | 74 | It is somewhat awkward that the matches are simply stored in sequence in slices. Two different kinds of problems arise. 
75 | 76 | First, when you insert a new group somewhere in your regular expression, all the indexes of the groups that follow must be incremented. That's a nuisance. 77 | 78 | Second, the regexp might be constructed at runtime and may contain a number of parentheses that is beyond our control. That means that we don't know at which index our nicely constructed parentheses match. 79 | 80 | To resolve this issue _named matches_ were introduced. They let you give a symbolic name to a group, which can then be used to look up the result. 81 | 82 | ```go 83 | re := regexp.MustCompile("(?P<first_char>.)(?P<middle_part>.*)(?P<last_char>.)") 84 | n1 := re.SubexpNames() 85 | r2 := re.FindAllStringSubmatch("Super", -1)[0] 86 | 87 | md := map[string]string{} 88 | for i, n := range r2 { 89 | fmt.Printf("%d. match='%s'\tname='%s'\n", i, n, n1[i]) 90 | md[n1[i]] = n 91 | } 92 | fmt.Printf("The names are : %v\n", n1) 93 | fmt.Printf("The matches are: %v\n", r2) 94 | fmt.Printf("The first character is %s\n", md["first_char"]) 95 | fmt.Printf("The last character is %s\n", md["last_char"]) 96 | ``` 97 | 98 | In this example the string 'Super' is matched against a regexp that has three parts: 99 | 100 | A single character (.) which is named 'first_char'. 101 | 102 | A middle part composed of a sequence of characters, named 'middle_part'. 103 | 104 | A last character (.), consequently named 'last_char'. 105 | 106 | To simplify the usage of the results, we store all the names in n1 and zip them together with the match result r2 into a map named _md_, in which each group name is the key and the matched text is the value. 107 | 108 | Note that the entire match 'Super' is stored under the empty string as a pseudo-key. 109 | 110 | The sample prints 111 | 112 | 0. match='Super' name='' 113 | 1. match='S' name='first_char' 114 | 2. match='upe' name='middle_part' 115 | 3.
match='r' name='last_char' 116 | The names are : [ first_char middle_part last_char] 117 | The matches are: [Super S upe r] 118 | The first character is S 119 | The last character is r 120 | 121 | 122 | # Advanced Repetition # 123 | 124 | ## Non-capturing groups ## 125 | 126 | If a complex regular expression has several groups, you might arrive at a situation where you use parentheses for grouping but are not the least interested in the captured string. To discard the match of a group you can make it a 'non-capturing group' with (?:regex). The question mark and colon tell the compiler to use the pattern for matching but not to store it. 127 | 128 | Without a non-capturing group: 129 | 130 | ```go 131 | s := "Mrs. Leonora Spock" 132 | re1 := regexp.MustCompile(`Mr(s)?\. (\w+) (\w+)`) 133 | result:= re1.FindStringSubmatch(s) 134 | for k, v := range result { 135 | fmt.Printf("%d. %s\n", k, v) 136 | } 137 | // 0. Mrs. Leonora Spock 138 | // 1. s 139 | // 2. Leonora 140 | // 3. Spock 141 | ``` 142 | 143 | With a non-capturing group: 144 | 145 | ```go 146 | s := "Mrs. Leonora Spock" 147 | re1 := regexp.MustCompile(`Mr(?:s)?\. (\w+) (\w+)`) 148 | result:= re1.FindStringSubmatch(s) 149 | for k, v := range result { 150 | fmt.Printf("%d. %s\n", k, v) 151 | } 152 | // 0. Mrs. Leonora Spock 153 | // 1. Leonora 154 | // 2. Spock 155 | ``` 156 | 157 | ## How many exactly? ## 158 | 159 | The number of required repetitions might be well known. If you know exactly how many instances of a part of your regexp you need, use {}. 160 | 161 | ```go 162 | s := "11110010101111100101001001110101" 163 | re1 := regexp.MustCompile(`1{4}`) 164 | res := re1.FindAllStringSubmatch(s,-1) 165 | fmt.Printf("<%v>", res) 166 | // <[[1111] [1111]]> 167 | 168 | res2 := re1.FindAllStringIndex(s,-1) 169 | fmt.Printf("<%v>", res2) 170 | // <[[0 4] [10 14]]> 171 | ``` 172 | 173 | The {} syntax is rarely used. One reason is that in many (all?)
cases you can rewrite the regexp by simply writing out the number of repetitions literally. [I can see that you might not want to do that for, say, 120.] Only when you have very specific requirements (like {123,130}) will you want to use {}. 174 | 175 | ```go 176 | (ab){3} == (ababab) 177 | (ab){3,4} == (ababab(ab)?) 178 | ``` 179 | 180 | > Side note: ? stands for "zero or one x, prefer one"; the non-greedy variant ?? stands for "zero or one x, prefer zero". 181 | 182 | The general pattern for {} is x{n,m}. 'n' is the minimum number of occurrences and 'm' is the maximum number of occurrences. 183 | 184 | The Go-regexp package supports a few more patterns in the {} family. 185 | 186 | # Flags # 187 | 188 | The regexp package knows the following flags [quote from documentation]: 189 | 190 | * i case-insensitive (default false) 191 | * m multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false) 192 | * s let . match \n (default false) 193 | * U ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false) 194 | 195 | Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). 196 | 197 | ## Case sensitivity ## 198 | 199 | You might already know that some characters exist in two cases: upper and lower. [Now you might say: "Of course I know that, everybody knows that!" Well, if you think that this is trivial then consider the upper/lowercase question in these few cases: a, $, 本, ß, Ω. Ok, let's not complicate things and only consider English.] 200 | 201 | If you explicitly want to ignore the case, in other words, if you want to permit both cases for a regexp or a part of it, you use the 'i' flag. 202 | 203 | ```go 204 | s := "Never say never." 205 | r := regexp.MustCompile(`(?i)^n`) // Do we have an 'N' or 'n' at the beginning? 206 | fmt.Printf("%v", r.MatchString(s)) // true, case insensitive 207 | ``` 208 | 209 | Matching against a case-insensitive regexp is rarely done in the real world.
Usually we prefer to convert the entire string to either upper or lower case in the first place and then match only against that case: 210 | 211 | ```go 212 | sMixed := "Never say never." 213 | sLower := strings.ToLower(sMixed) // don't forget to import "strings" 214 | r := regexp.MustCompile(`^n`) 215 | fmt.Printf("%v ", r.MatchString(sMixed)) // false, N != n 216 | fmt.Printf("%v ", r.MatchString(sLower)) // true, n == n 217 | ``` 218 | 219 | ## Greedy vs. Non-Greedy ## 220 | 221 | As we saw before, regular expressions may contain repetition symbols. In some cases, there is actually more than one solution possible for a regexp to match a given string. 222 | 223 | E.g. given the regexp '.*' (including the quotes), how would this match against: 224 | 225 | 'abc','def','ghi' 226 | 227 | You are probably expecting to retrieve *'abc'*. Not so. By default, regular expressions are _greedy_. They will take as many characters as possible to match the regexp. Thus the answer is *'abc','def','ghi'*, because the quotes in between also match the dot "."! Like here: 228 | 229 | ```go 230 | r := regexp.MustCompile(`'.*'`) 231 | res := r.FindString(" 'abc','def','ghi' ") 232 | fmt.Printf("<%v>", res) 233 | // Will print: <'abc','def','ghi'> 234 | ``` 235 | 236 | To identify the shortest possible match (=non-greedy) you add the special character '?' to your regular expression. 237 | 238 | ```go 239 | r := regexp.MustCompile(`'.*?'`) 240 | res := r.FindString(" 'abc','def','ghi' ") 241 | fmt.Printf("<%v>", res) 242 | // Will print: <'abc'> 243 | ``` 244 | 245 | There is no easy way that would allow you to specify a regexp that would match 'abc','def'. 
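
The greedy and non-greedy quantifiers on their own cannot select exactly the first two items. If the shape of the input is known, one possible workaround is to spell out each item with a negated character class. The sketch below compares the three patterns; the helper name `find` is made up for this illustration, and the third pattern assumes that items never contain an embedded quote.

```go
package main

import (
	"fmt"
	"regexp"
)

// find returns the first match of pattern in s, or "" if there is none.
// (Hypothetical helper, only used for this comparison.)
func find(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	s := " 'abc','def','ghi' "

	// Greedy: runs from the first quote to the last quote.
	fmt.Println(find(`'.*'`, s)) // 'abc','def','ghi'

	// Non-greedy: stops at the earliest closing quote.
	fmt.Println(find(`'.*?'`, s)) // 'abc'

	// Explicit structure: two quoted items separated by a comma.
	// Assumes no quote character appears inside an item.
	fmt.Println(find(`'[^']*','[^']*'`, s)) // 'abc','def'
}
```

The third pattern works because `[^']*` can never run past a closing quote, so the engine does not have to choose between a short and a long match.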
246 | 247 | You can invert the behavior of the regular expression and make non-greedy matching the default with the flag U: 248 | 249 | ```go 250 | r := regexp.MustCompile(`(?U)'.*'`) 251 | res := r.FindString(" 'abc','def','ghi' ") 252 | fmt.Printf("<%v>", res) 253 | // Will print: <'abc'> 254 | ``` 255 | 256 | It is possible to switch back and forth between the two behaviors inside your regexp. (FIXME Example) 257 | 258 | ## Shall Dot Match Newline? ## 259 | 260 | When we have a multiline string (=a string that contains newlines '\n') you can control 261 | whether the '.' matches the newline character using the (?s) flag. Default is false. Could someone please provide a sensible use-case? 262 | 263 | ```go 264 | r := regexp.MustCompile(`a.`) 265 | s := "atlanta\narkansas\nalabama\narachnophobia" 266 | res := r.FindAllString(s, -1) 267 | fmt.Printf("<%v>", res) 268 | // <[at an ar an as al ab am ar ac]> 269 | ``` 270 | 271 | Now, using the (?s) flag, the newline is kept in the result.
272 | 273 | ```go 274 | r := regexp.MustCompile(`(?s)a.`) 275 | s := "atlanta\narkansas\nalabama\narachnophobia" 276 | res := r.FindAllString(s, -1) 277 | fmt.Printf("<%v>", res) 278 | // Prints 279 | // <[at an a 280 | // ar an as al ab am a 281 | // ar ac]> 282 | ``` 283 | 284 | Clear out multi-line comments in css file 285 | 286 | ```go 287 | s := `/* multi line 288 | comment with 289 | url("http://commented1.test.com/img.jpg") */ 290 | body { 291 | background: #ffffff url("actual1.png") no-repeat right top; 292 | } 293 | /* single line commented out url("http://commented2.test.com/img.jpg") *//* back to back comment */ 294 | .test-img { 295 | background-image: url("http://test.com/actual2.png"); 296 | }` 297 | 298 | re := regexp.MustCompile(`(?s)(?:/\*.*?\*/)?((?:[^/]|/[^*])*)`) 299 | results := re.FindAllStringSubmatch(s, -1) 300 | for _, v := range results { 301 | if v[1] != "" { 302 | fmt.Printf("%s", v[1]) 303 | } 304 | } 305 | // Prints 306 | // body { 307 | // background: #ffffff url("actual1.png") no-repeat right top; 308 | // } 309 | // 310 | // .test-img { 311 | // background-image: url("http://test.com/actual2.png"); 312 | // } 313 | ``` 314 | 315 | ## Shall ^/$ Match at a Newline? ## 316 | 317 | When we have a multiline string you can control 318 | if the '^' (BOL=Begin-of-line) or '$' (EOL=End-of-line) matches *at* the newline character with the flag '(?m)'. Default is false. 
319 | 320 | ```go 321 | r := regexp.MustCompile(`a$`) // without flag 322 | s := "atlanta\narkansas\nalabama\narachnophobia" 323 | // 01234567 890123456 78901234 5678901234567 324 | // - 325 | res := r.FindAllStringIndex(s,-1) 326 | fmt.Printf("<%v>\n", res) 327 | // 1 match 328 | // <[[37 38]]> 329 | 330 | t := regexp.MustCompile(`(?m)a$`) // with flag 331 | u := "atlanta\narkansas\nalabama\narachnophobia" 332 | // 01234567 890123456 78901234 5678901234567 333 | // -- -- - 334 | res2 := t.FindAllStringIndex(u,-1) 335 | fmt.Printf("<%v>", res2) 336 | // 3 matches 337 | // <[[6 7] [23 24] [37 38]]> 338 | ``` 339 | 340 | 341 | -------------------------------------------------------------------------------- /zh/01-chapter1.markdown: -------------------------------------------------------------------------------- 1 | # 第一部分:基础知识 # 2 | 3 | ## 简单匹配 ## 4 | 5 | 你想知道一个字符串和一个正则表达式是否匹配。如果字符串参数与用 *Compile* 函数编译好的正则匹配的话,*MatchString* 函数就会返回 'true'. 6 | 7 | package main 8 | 9 | import ( 10 | "fmt" 11 | "regexp" 12 | ) 13 | 14 | func main() { 15 | r, err := regexp.Compile(`Hello`) 16 | 17 | if err != nil { 18 | fmt.Printf("There is a problem with your regexp.\n") 19 | return 20 | } 21 | 22 | // Will print 'Match' 23 | if r.MatchString("Hello Regular Expression.") == true { 24 | fmt.Printf("Match ") 25 | } else { 26 | fmt.Printf("No match ") 27 | } 28 | } 29 | 30 | *Compile* 函数是 regexp 包的核心所在。 每一个正则必由 *Compile* 或其姊妹函数 *MustCompile* 编译后方可使用。*MustCompile* 除了正则在不能正确被编译时会抛出异常外,使用方法和 *Compile* 几乎相同。因为 *MustCompile* 的任何错误都会导致一个异常,所以它无需返回表示错误码的第二个返回值。这就使得把 *MustCompile* 和匹配函数链在一起调用更加容易。像下面这样: 31 | (但考虑性能因素,要避免在一个循环里重复编译正则表达式的用法) 32 | 33 | package main 34 | 35 | import ( 36 | "fmt" 37 | "regexp" 38 | ) 39 | 40 | func main() { 41 | if regexp.MustCompile(`Hello`).MatchString("Hello Regular Expression.") == true { 42 | fmt.Printf("Match ") // 会再次打印 'Match' 43 | } else { 44 | fmt.Printf("No match ") 45 | } 46 | } 47 | 48 | 49 | 这句不合法的正则 50 | 51 | var myre = regexp.MustCompile(`\d(+`) 52 | 53 
| 会导致错误: 54 | 55 | panic: regexp: Compile(`\d(+`): error parsing regexp: missing argument to repetition operator: `+` 56 | 57 | goroutine 1 [running]: 58 | regexp.MustCompile(0x4de620, 0x4, 0x4148e8) 59 | go/src/pkg/regexp/regexp.go:207 +0x13f 60 | 61 | 62 | *Compile* 函数的第二个参数会返回一个错误值。 在本教程中我通常都忽略这第二个参数,因为我写的所有正则都棒棒哒 ;-)。如果你写的正则也是字面量当然也可能没有问题,但是如果是在运行时从输入获取的值作为正则表达式,那你最好还是检查一下返回的这个错误值。 63 | 64 | 本教程接下来为了简洁会略过所有错误返回值的检查。 65 | 66 | 下面这个正则会匹配失败: 67 | 68 | r, err := regexp.Compile(`Hxllo`) 69 | // Will print 'false' 70 | fmt.Printf("%v", r.MatchString("Hello Regular Expression.")) 71 | 72 | ## CompilePOSIX/MustCompilePOSIX ## 73 | 74 | *CompilePOSIX* 和 *MustCompilePOSIX* 方法运行着的是一个略为不同的引擎。这两个里面采用的是 POSIX ERE (extended regular expression) 引擎。从 Go 语言的视角看它们采用了严格的规则集合,也就是 *egrep* 所支持的标准。因此 Go 的标准 re2 引擎支持的某些细节在 POSIX 版本中是没有的,比如 *\A*. 75 | 76 | s := "ABCDEEEEE" 77 | rr := regexp.MustCompile(`\AABCDE{2}|ABCDE{4}`) 78 | rp := regexp.MustCompilePOSIX(`\AABCDE{2}|ABCDE{4}`) 79 | fmt.Println(rr.FindAllString(s, 2)) 80 | fmt.Println(rp.FindAllString(s, 2)) 81 | 82 | 这里只有 *MustCompilePOSIX* 函数会解析失败,因为 POSIX ERE 中不支持 *\A*。 83 | 84 | 还有,POSIX 引擎更趋向最左最长(_leftmost-longest_)的匹配。在初次匹配到时并不会返回,而是会检查匹配到的是不是最长的匹配。 比如: 85 | 86 | s := "ABCDEEEEE" 87 | rr := regexp.MustCompile(`ABCDE{2}|ABCDE{4}`) 88 | rp := regexp.MustCompilePOSIX(`ABCDE{2}|ABCDE{4}`) 89 | fmt.Println(rr.FindAllString(s, 2)) 90 | fmt.Println(rp.FindAllString(s, 2)) 91 | 92 | 将打印: 93 | 94 | [ABCDEE] <- 第一个可接受的匹配 95 | [ABCDEEEE] <- 但是 POSIX 想要更长的匹配 96 | 97 | 只有当你有一些特殊需求时,POSIX 函数也许才会是你的不二之选。 98 | 99 | ## 字符分类 ## 100 | 101 | 字符类别 '\w' 代表所有 [A-Za-z0-9_] 包含在内的字符。 助记法:'word'。 102 | 103 | r, err := regexp.Compile(`H\wllo`) 104 | // Will print 'true'. 
105 | fmt.Printf("%v", r.MatchString("Hello Regular Expression.")) 106 | 107 | 字符类别 '\d' 代表所有数字字符。 108 | 109 | r, err := regexp.Compile(`\d`) 110 | // Will print 'true': 111 | fmt.Printf("%v", r.MatchString("Seven times seven is 49.")) 112 | // Will print 'false': 113 | fmt.Printf("%v", r.MatchString("Seven times seven is forty-nine.")) 114 | 115 | 字符类别 '\s' 代表以下任何空白:TAB, SPACE, CR, LF。或者更确切的说是 [\t\n\f\r ]。 116 | 117 | r, err := regexp.Compile(`\s`) 118 | // Will print 'true': 119 | fmt.Printf("%v", r.MatchString("/home/bill/My Documents")) 120 | 121 | 使用字符类别表示方法的大写形式表示相反的类别。所以 '\D' 代表任何不属于 '\d' 类别的字符。 122 | 123 | r, err := regexp.Compile(`\S`) // Not a whitespace 124 | // Will print 'true', obviously there are non-whitespaces here: 125 | fmt.Printf("%v", r.MatchString("/home/bill/My Documents")) 126 | 127 | 检查一个字符串是不是包含单词字符以外的字符: 128 | 129 | r, err := regexp.Compile(`\W`) // Not a \w character. 130 | 131 | fmt.Printf("%v", r.MatchString("555-shoe")) // true: has a non-word char: The hyphen 132 | fmt.Printf("%v", r.MatchString("555shoe")) // false: has no non-word char. 133 | 134 | ## 匹配的内容中有什么? ## 135 | 136 | *FindString* 函数会查找一个字符串。当你使用一个字面量的字符串作为正则时,结果自然就是该字符串本身。只有当你使用模式以及分类时,结果才会更加有趣。 137 | 138 | r, err := regexp.Compile(`Hello`) 139 | // 会打印 'Hello' 140 | fmt.Printf(r.FindString("Hello Regular Expression. Hullo again.")) 141 | 142 | 当 FindString 找不到和正则表达式匹配的字符串时,它会返回空白字符串。要知道空白字符串也算是一次有效匹配的结果。 143 | 144 | r, err := regexp.Compile(`Hxllo`) 145 | // 什么都不打印 (也就是空字符串) 146 | fmt.Printf(r.FindString("Hello Regular Expression.")) 147 | 148 | FindString 会在首次匹配后即返回。如果你想尽可能多地匹配你就需要 *FindAllString()* 函数,这个后面会讲到。 149 | 150 | ### 特殊字符 ### 151 | 152 | 句点 '.' 匹配任意字符。 153 | 154 | // 会打印出 'cat' 155 | r, err := regexp.Compile(`.at`) 156 | fmt.Printf(r.FindString("The cat sat on the mat.")) 157 | 158 | 'cat' 是第一个匹配。 159 | 160 | // 更多的点号 161 | s:= "Nobody expects the Spanish inquisition." 
162 | // -- -- -- 163 | r, err := regexp.Compile(`e.`) 164 | res := r.FindAllString(s, -1) // negative: all matches 165 | // 打印 [ex ec e ]。最后一个元素是 'e' 和一个空白字符 166 | fmt.Printf("%v", res) 167 | res = r.FindAllString(s, 2) // find 2 or less matches 168 | // 打印 [ex ec] 169 | fmt.Printf("%v", res) 170 | 171 | ## 特殊字符的字面量 ## 172 | 173 | 查找 '\\':在字符串里 '\\' 需要转义一次,而在正则里就要转义两次。 174 | 175 | r, err := regexp.Compile(`C:\\\\`) 176 | if r.MatchString("Working on drive C:\\") == true { 177 | fmt.Printf("Matches.") // <--- 178 | } else { 179 | fmt.Printf("No match.") 180 | } 181 | 182 | 查找一个字面量的句点: 183 | 184 | r, err := regexp.Compile(`\.`) 185 | if r.MatchString("Short.") == true { 186 | fmt.Printf("Has a dot.") // <--- 187 | } else { 188 | fmt.Printf("Has no dot.") 189 | } 190 | 191 | 其它用来组成正则表达式的特殊字符也基本这样用: .+*?()|[]{}^$ 192 | 193 | 如查找一个字面量的美元符号: 194 | 195 | r, err := regexp.Compile(`\$`) 196 | if len(r.FindString("He paid $150 for that software.")) != 0 { 197 | fmt.Printf("Found $-symbol.") // <--- 198 | } else { 199 | fmt.Printf("No $$$.") 200 | } 201 | 202 | ## 简单的重复模式 ## 203 | 204 | *FindAllString* 函数返回匹配到的所有字符串的一个数组。FindAllString 需要两个参数,一个字符串正则以及需要返回的匹配内容的最大数量,如果你确定需要所有的匹配内容时就传 '-1' 给它。 205 | 206 | 查找字词。一个词就是字符类型 \w 的一个序列。加号 '+' 可以表示重复: 207 | 208 | s := "Eenie meenie miny moe." 
209 | r, err := regexp.Compile(`\w+`) 210 | res := r.FindAllString(s, -1) 211 | // 打印 [Eenie meenie miny moe] 212 | fmt.Printf("%v", res) 213 | 214 | 和在命令行下作为文件名字通配符不同,'\*' 并不表示“任意字符”,而是表示它前面的一个字符(或分组)的重复次数。'+' 需要它前面的字符至少出现一次,'*' 在零次时也是满足的。这个可能会导致匪夷所思的结果。 215 | 216 | s := "Firstname Lastname" 217 | r, err := regexp.Compile(`\w+\s\w+`) 218 | res := r.FindString(s) 219 | // Prints Firstname Lastname 220 | fmt.Printf("%v", res) 221 | 222 | 但是如果是有些用户输入的内容可能会有两个空格: 223 | 224 | s := "Firstname Lastname" 225 | r, err := regexp.Compile(`\w+\s\w+`) 226 | res := r.FindString(s) 227 | // 打印为空 (空字符串说明没有匹配到) 228 | fmt.Printf("%v", res) 229 | 230 | 使用 '\s+' 我们可以允许任意数量(但至少一个)的空白字符: 231 | 232 | s := "Firstname Lastname" 233 | r, err := regexp.Compile(`\w+\s+\w+`) 234 | res := r.FindString(s) 235 | // Prints Firstname Lastname 236 | fmt.Printf("%v", res) 237 | 238 | 如果你读取一个 INI 配置格式的文本文件,你也许会宽松地对待等号两侧的空白字符。 239 | 240 | s := "Key=Value" 241 | r, err := regexp.Compile(`\w+=\w+`) 242 | res := r.FindAllString(s, -1) 243 | // OK, prints Key=Value 244 | fmt.Printf("%v", res) 245 | 246 | 现在让我们在等号两边加上空格。 247 | 248 | s := "Key = Value" 249 | r, err := regexp.Compile(`\w+=\w+`) 250 | res := r.FindAllString(s, -1) 251 | // 失败了,什么都没有打印出来,因为 \w 不匹配空格 252 | fmt.Printf("%v", res) 253 | 254 | 于是我们用 '\s*' 来允许一些空格(包括没有空格的情况): 255 | 256 | s := "Key = Value" 257 | r, err := regexp.Compile(`\w+\s*=\s*\w+`) 258 | res := r.FindAllString(s, -1) 259 | fmt.Printf("%v", res) 260 | 261 | Go 的正则模式支持更多的和 '?' 结合使用的模式。 262 | 263 | ## 锚点和边界 ## 264 | 265 | 插入符号 ^ 标记“行的开始”。 266 | 267 | s := "Never say never." 268 | r, err1 := regexp.Compile(`^N`) // Do we have an 'N' at the beginning? 269 | fmt.Printf("%v ", r.MatchString(s)) // true 270 | t, err2 := regexp.Compile(`^n`) // Do we have an 'n' at the beginning? 
271 | fmt.Printf("%v ", t.MatchString(s)) // false 272 | 273 | 美元符号 $ 标记“行的结束”。 274 | 275 | s := "All is well that ends well" 276 | r, err := regexp.Compile(`well$`) 277 | fmt.Printf("%v ", r.MatchString(s)) // true 278 | 279 | r, err = regexp.Compile(`well`) 280 | fmt.Printf("%v ", r.MatchString(s)) // true, but matches with first 281 | // occurrence of 'well' 282 | r, err = regexp.Compile(`ends$`) 283 | fmt.Printf("%v ", r.MatchString(s)) // false, not at end of line. 284 | 285 | 我们看到 'well' 匹配到了。为了找到正则确切匹配到的位置,我们可以看一下索引。*FindStringIndex* 函数返回带有两个元素。第一个元素是正则表达式开始匹配到的位置的索引(当然是从0开始的)。第二个元素是正则匹配结束的下一个位置的索引。 286 | 287 | s := "All is well that ends well" 288 | // 012345678901234567890123456 289 | // 1 2 290 | r, err := regexp.Compile(`well$`) 291 | fmt.Printf("%v", r.FindStringIndex(s)) // 打印 [22 26] 292 | 293 | r, err = regexp.Compile(`well`) 294 | fmt.Printf("%v ", r.MatchString(s)) // true, 但是这回匹配第一次出现的 'well' 295 | fmt.Printf("%v", r.FindStringIndex(s)) // Prints [7 11], the match starts at 7 and end before 11. 296 | 297 | r, err = regexp.Compile(`ends$`) 298 | fmt.Printf("%v ", r.MatchString(s)) // false, 'ends' 并不是在结尾处 299 | 300 | 你可以使用 '\b' 查找一个单词的边界。*FindAllStringIndex* 函数会捕获一个正则中所有命中的位置,以一个数组容器的形式返回。 301 | 302 | s := "How much wood would a woodchuck chuck in Hollywood?" 
303 | // 012345678901234567890123456789012345678901234567890 304 | // 10 20 30 40 50 305 | // -1-- -2-- -3-- 306 | // 查找以 wood 开头的词 307 | r, err := regexp.Compile(`\bwood`) // 1 2 308 | fmt.Printf("%v", r.FindAllStringIndex(s, -1)) // [[9 13] [22 26]] 309 | 310 | // 查找以 wood 结尾的词 311 | r, err = regexp.Compile(`wood\b`) // 1 3 312 | fmt.Printf("%v", r.FindAllStringIndex(s, -1)) // [[9 13] [46 50]] 313 | 314 | // 查找以 wood 开头并以其结尾的词 315 | r, err = regexp.Compile(`\bwood\b`) // 1 316 | fmt.Printf("%v", r.FindAllStringIndex(s, -1)) // [[9 13]] 317 | 318 | ## 字符分类 ## 319 | 320 | 你可以在任何位置获取一组(或类)字符串,而不是一个单个的字面量字符。在本例中[uio] 就是一个“字符串分类”。在方括号中的任意字符都满足该正则表达式。所以,这个正则会匹配到 'Hullo','Hillo',以及 'Hollo'。 321 | 322 | r, err := regexp.Compile(`H[uio]llo`) 323 | // Will print 'Hullo'. 324 | fmt.Printf(r.FindString("Hello Regular Expression. Hullo again.")) 325 | 326 | 一个排除在外的字符分类会对分类的匹配取反。这时该正则就会匹配所有 'H.llo' 中的点号 *不* 是 'o', 'i' 或者 'u'的字符串。它不会匹配 "Hullo", "Hillo", "Hollo",但是会匹配 "Hallo" 甚至是 "H9llo"。 327 | 328 | r, err := regexp.Compile(`H[^uio]llo`) 329 | fmt.Printf("%v ", r.MatchString("Hillo")) // false 330 | fmt.Printf("%v ", r.MatchString("Hallo")) // true 331 | fmt.Printf("%v ", r.MatchString("H9llo")) // true 332 | 333 | ## POSIX 字符分类 ## 334 | 335 | Golang regexp 库实现了 POSIX 字符分类。这不过就是给常用的类别取个可读性更好的别名。这些分类有: 336 | (https://github.com/google/re2/blob/master/doc/syntax.txt) 337 | 338 | [:alnum:] 字母和数字(alphanumeric) (≡ [0-9A-Za-z]) 339 | [:alpha:] 字母(alphabetic) (≡ [A-Za-z]) 340 | [:ascii:] ASCII (≡ [\x00-\x7F]) 341 | [:blank:] 空字符(blank) (≡ [\t ]) 342 | [:cntrl:] 控制字符(control) (≡ [\x00-\x1F\x7F]) 343 | [:digit:] 数字字符(digits) (≡ [0-9]) 344 | [:graph:] 图形符号(graphical) (≡ [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]) 345 | [:lower:] 小写字母(lower case) (≡ [a-z]) 346 | [:print:] 可打印字符(printable) (≡ [ -~] == [ [:graph:]]) 347 | [:punct:] 标点符号(punctuation) (≡ [!-/:-@[-`{-~]) 348 | [:space:] 空格字符(whitespace) (≡ [\t\n\v\f\r ]) 349 | [:upper:] 大写字母(upper case) (≡ [A-Z]) 350 | [:word:] 
文字字符(word characters) (≡ [0-9A-Za-z_]) 351 | [:xdigit:] 十六进制(hex digit) (≡ [0-9A-Fa-f]) 352 | 353 | 注意你必须把一个 ASCII 字符用 [] 包起来。而且还要注意无论何时我们说到字母的时候我们仅仅是在指 ASCII 从65-90范围内的26个字母, 354 | 不包括那些带有变音符的字母。 355 | 356 | 例子:查找一个包含一个小写字母、一个标点符号、一个空格(空白字符)以及一个数字的序列: 357 | 358 | r, err := regexp.Compile(`[[:lower:]][[:punct:]][[:blank:]][[:digit:]]`) 359 | if r.MatchString("Fred: 12345769") == true { 360 | ---- 361 | fmt.Printf("Match ") // 362 | } else { 363 | fmt.Printf("No match ") 364 | } 365 | 366 | 我从来不用这些,因为它们需要打更多的字。但是在一些很多程序员一起工作的项目中,而且并不是每个人都像你一样 367 | 对正则表达式游刃有余的话,使用 POSIX 的写法也许也不失是一个好主意。 368 | 369 | ## Unicode 字符分类 ## 370 | 371 | Unicode 是以区块(block)来组织的,典型地以主题或者语言进行分组。在本章我给出一些例子,因为完全覆盖到全部那是 372 | 不可能的(况且也无甚用处)。参见 [re2 引擎完整 unicode 字符列表](https://code.google.com/p/re2/wiki/Syntax "unicode blocks of re2"). 373 | 374 | ### 示例:希腊语 ### 375 | 376 | 我们以一个希腊语代码块的简单例子开始。 377 | 378 | r, err := regexp.Compile(`\p{Greek}`) 379 | 380 | if r.MatchString("This is all Γςεεκ to me.") == true { 381 | fmt.Printf("Match ") // 会打印出 'Match' 382 | } else { 383 | fmt.Printf("No match ") 384 | } 385 | 386 | 在 Windows-1252 代码页有个 mu,但是没有被认定为希腊语。因为 \p{Greek} 仅仅覆盖 U+0370 到 U+03FF 的部分 http://en.wikipedia.org/wiki/Greek_and_Coptic 。 387 | 388 | if r.MatchString("the µ is right before ¶") == true { 389 | fmt.Printf("Match ") 390 | } else { 391 | fmt.Printf("No match ") // 会打印出 'No match' 392 | } 393 | 394 | 有些来自希腊语和科普特语(Coptic)代码页的特别酷的字母被认定为希腊语,而实际上 395 | 可能是科普特语,要注意。 396 | 397 | if r.MatchString("ϵ϶ϓϔϕϖϗϘϙϚϛϜ") == true { 398 | fmt.Printf("Match ") // Will print 'Match' 399 | } else { 400 | fmt.Printf("No match ") 401 | } 402 | 403 | ### 示例:布莱叶盲文(Braille)### 404 | 405 | 你必须使用一种支持布莱叶盲文的字体。 [布莱叶盲文](http://en.wikipedia.org/wiki/Braille "布莱叶盲文") 406 | 407 | 我怀疑这得配合一个支持布莱叶盲文的打印机才会有用,但这个就随你了。 408 | 409 | r2, err := regexp.Compile(`\p{Braille}`) 410 | if r2.MatchString("This is all ⢓⢔⢕⢖⢗⢘⢙⢚⢛ to me.") == true { 411 | fmt.Printf("Match ") // 会打印出 'Match' 412 | } else { 413 | fmt.Printf("No match ") 414 | } 
415 | 416 | ### 示例:彻罗基语(Cherokee)### 417 | 418 | 你必须使用一种支持彻罗基语的字体(比如 Code2000)。 419 | 彻罗基语言的故事绝对值得一读。[去读](http://en.wikipedia.org/wiki/Cherokee#Language_and_writing_system "彻罗基语"). 420 | 421 | r3, err := regexp.Compile(`\p{Cherokee}`) 422 | if r3.MatchString("This is all ᏯᏰᏱᏲᏳᏴ to me.") == true { 423 | fmt.Printf("Match ") // 会打印出 'Match' 424 | } else { 425 | fmt.Printf("No match ") 426 | } 427 | 428 | ## 择一匹配 ## 429 | 430 | 你可以使用管道符号 '|' 允许两个或多个不同的可能来提供可选择性的匹配。如果你只是想对正则表达式中的某些部分进行可选择性的匹配,你可以使用括号来进行分组。 431 | 432 | r, err1 := regexp.Compile(`Jim|Tim`) 433 | fmt.Printf("%v", r.MatchString("Dickie, Tom and Tim")) // true 434 | fmt.Printf("%v", r.MatchString("Jimmy, John and Jim")) // true 435 | 436 | t, err2 := regexp.Compile(`Santa Clara|Santa Barbara`) 437 | s := "Clara was from Santa Barbara and Barbara was from Santa Clara" 438 | // ------------- ----------- 439 | fmt.Printf("%v", t.FindAllStringIndex(s, -1)) 440 | // [[15 28] [50 61]] 441 | 442 | u, err3 := regexp.Compile(`Santa (Clara|Barbara)`) // Equivalent 443 | v := "Clara was from Santa Barbara and Barbara was from Santa Clara" 444 | // ------------- ----------- 445 | fmt.Printf("%v", u.FindAllStringIndex(v, -1)) 446 | // [[15 28] [50 61]] 447 | 448 | 449 | -------------------------------------------------------------------------------- /01-chapter1.markdown: -------------------------------------------------------------------------------- 1 | # Part 1: The basics # 2 | 3 | ## Simple Matching ## 4 | 5 | You want to know if a string matches a regular expression. The *MatchString*-function returns 'true' if the string-argument matches the regular expression that you prepared with *Compile*. 
6 | 7 | ```go 8 | package main 9 | 10 | import ( 11 | "fmt" 12 | "regexp" 13 | ) 14 | 15 | func main() { 16 | r, err := regexp.Compile(`Hello`) 17 | 18 | if err != nil { 19 | fmt.Printf("There is a problem with your regexp.\n") 20 | return 21 | } 22 | 23 | // Will print 'Match' 24 | if r.MatchString("Hello Regular Expression.") == true { 25 | fmt.Printf("Match ") 26 | } else { 27 | fmt.Printf("No match ") 28 | } 29 | } 30 | ``` 31 | 32 | *Compile* is the heart of the regexp-package. Every regular expression must be prepared with *Compile* or 33 | its sister-function *MustCompile*. The *MustCompile*-function behaves 34 | almost like *Compile*, but panics if the regular expression cannot be compiled. Because any 35 | error in *MustCompile* leads to a panic, there is no need to return an error as a second return value. 36 | This makes it easier to chain the *MustCompile* call with the match-function of your choice, as shown here: 37 | (But you should avoid the repeated compilation of a regular expression in a loop for performance reasons.) 38 | 39 | ```go 40 | package main 41 | 42 | import ( 43 | "fmt" 44 | "regexp" 45 | ) 46 | 47 | func main() { 48 | if regexp.MustCompile(`Hello`).MatchString("Hello Regular Expression.") == true { 49 | fmt.Printf("Match ") // Will print 'Match' again 50 | } else { 51 | fmt.Printf("No match ") 52 | } 53 | } 54 | ``` 55 | 56 | The following illegal regexp 57 | 58 | ```go 59 | var myre = regexp.MustCompile(`\d(+`) 60 | ``` 61 | 62 | will yield 63 | 64 | panic: regexp: Compile(`\d(+`): error parsing regexp: missing argument to repetition operator: `+` 65 | 66 | goroutine 1 [running]: 67 | regexp.MustCompile(0x4de620, 0x4, 0x4148e8) 68 | go/src/pkg/regexp/regexp.go:207 +0x13f 69 | 70 | 71 | The *Compile*-function returns an error value as its second return value. In this tutorial I will usually discard it, because of course all my regexes are perfect ;-).
You might get away with that if your regexps are literals, but if the regexp is derived from input at runtime you definitely want to check the error value. 72 | 73 | For the rest of this tutorial the evaluation of the error value is skipped for brevity. 74 | 75 | This regular expression will not match: 76 | 77 | ```go 78 | r, err := regexp.Compile(`Hxllo`) 79 | // Will print 'false' 80 | fmt.Printf("%v", r.MatchString("Hello Regular Expression.")) 81 | ``` 82 | 83 | ## CompilePOSIX/MustCompilePOSIX ## 84 | 85 | *CompilePOSIX* and *MustCompilePOSIX* use a slightly different engine. The rules are implemented 86 | following the POSIX ERE (extended regular expression) syntax; from the viewpoint of Go this implies a 87 | _restricted_ set of rules, namely those supported by *egrep*. Thus, a couple of niceties that Go's standard 88 | re2-engine supports are not found in the POSIX version, e.g. *\A*. 89 | 90 | ```go 91 | s := "ABCDEEEEE" 92 | rr := regexp.MustCompile(`\AABCDE{2}|ABCDE{4}`) 93 | rp := regexp.MustCompilePOSIX(`\AABCDE{2}|ABCDE{4}`) 94 | fmt.Println(rr.FindAllString(s, 2)) 95 | fmt.Println(rp.FindAllString(s, 2)) 96 | ``` 97 | 98 | This panics when the regexp is compiled, but only in the *MustCompilePOSIX* call, because *\A* is not part of POSIX ERE. 99 | 100 | Furthermore the POSIX engine will prefer the _leftmost-longest_ match. It will not return after finding the first 101 | match, but will check that the found match is indeed the longest one. Thus, 102 | 103 | ```go 104 | s := "ABCDEEEEE" 105 | rr := regexp.MustCompile(`ABCDE{2}|ABCDE{4}`) 106 | rp := regexp.MustCompilePOSIX(`ABCDE{2}|ABCDE{4}`) 107 | fmt.Println(rr.FindAllString(s, 2)) 108 | fmt.Println(rp.FindAllString(s, 2)) 109 | ``` 110 | 111 | will print 112 | 113 | [ABCDEE] <- first acceptable match 114 | [ABCDEEEE] <- But POSIX wants the longer match 115 | 116 | The two POSIX functions are probably the right choice only if you have very specific requirements.
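
If all you need from the POSIX variant is the leftmost-longest preference, the standard engine can be switched to that behavior with the `Longest` method of `*regexp.Regexp`, while keeping the full syntax. The following is a minimal sketch; the helper `firstMatch` and its `longest` parameter are names invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch compiles pattern with the standard engine and returns the
// first match in s. If longest is true, Longest() makes all later
// searches on this regexp prefer the leftmost-longest match.
// (Hypothetical helper for illustration only.)
func firstMatch(pattern, s string, longest bool) string {
	re := regexp.MustCompile(pattern)
	if longest {
		re.Longest()
	}
	return re.FindString(s)
}

func main() {
	s := "ABCDEEEEE"
	p := `ABCDE{2}|ABCDE{4}`

	fmt.Println(firstMatch(p, s, false)) // ABCDEE   (first alternative wins)
	fmt.Println(firstMatch(p, s, true))  // ABCDEEEE (longest alternative wins)
}
```

Note that `Longest` modifies the compiled regexp in place, so it is best called right after compilation and before the regexp is shared between goroutines.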
117 | 118 | ## Character classes ## 119 | 120 | Character class '\w' represents any character from the class [A-Za-z0-9_], mnemonic: 'word'. 121 | 122 | ```go 123 | r, err := regexp.Compile(`H\wllo`) 124 | // Will print 'true'. 125 | fmt.Printf("%v", r.MatchString("Hello Regular Expression.")) 126 | ``` 127 | Character class '\d' represents any numeric digit. 128 | 129 | ```go 130 | r, err := regexp.Compile(`\d`) 131 | // Will print 'true': 132 | fmt.Printf("%v", r.MatchString("Seven times seven is 49.")) 133 | // Will print 'false': 134 | fmt.Printf("%v", r.MatchString("Seven times seven is forty-nine.")) 135 | ``` 136 | 137 | Character class '\s' represents any of the following whitespaces: TAB, SPACE, CR, LF. Or more precisely [\t\n\f\r ]. 138 | 139 | ```go 140 | r, err := regexp.Compile(`\s`) 141 | // Will print 'true': 142 | fmt.Printf("%v", r.MatchString("/home/bill/My Documents")) 143 | ``` 144 | 145 | Character classes can be negated by using the uppercase '\D', '\S', '\W'. Thus, '\D' is any character that is *not* a '\d'. 146 | 147 | ```go 148 | r, err := regexp.Compile(`\S`) // Not a whitespace 149 | // Will print 'true', obviously there are non-whitespaces here: 150 | fmt.Printf("%v", r.MatchString("/home/bill/My Documents")) 151 | ``` 152 | 153 | Check if a string has anything that is not a word-char. 154 | 155 | ```go 156 | r, err := regexp.Compile(`\W`) // Not a \w character. 157 | 158 | fmt.Printf("%v", r.MatchString("555-shoe")) // true: has a non-word char: The hyphen 159 | fmt.Printf("%v", r.MatchString("555shoe")) // false: has no non-word char. 160 | ``` 161 | 162 | ## What's in a Match? ## 163 | 164 | The *FindString*-function finds a string. When you use a literal string, the result will obviously be the string itself. Only when you start using patterns and classes the result will be more interesting. 165 | 166 | ```go 167 | r, err := regexp.Compile(`Hello`) 168 | // Will print 'Hello' 169 | fmt.Printf(r.FindString("Hello Regular Expression. 
Hullo again.")) 170 | ``` 171 | 172 | When FindString does not find a string that matches the regular expression, it will return the empty string. Be aware that the empty string might also be the result of a valid match. 173 | 174 | ```go 175 | r, err := regexp.Compile(`Hxllo`) 176 | // Will print nothing (=the empty string) 177 | fmt.Printf(r.FindString("Hello Regular Expression.")) 178 | ``` 179 | 180 | FindString returns after the first match. If you are interested in more possible matches you would use *FindAllString()*, see below. 181 | 182 | ### Special Characters ### 183 | 184 | The dot '.' matches any character. 185 | 186 | ```go 187 | // Will print 'cat'. 188 | r, err := regexp.Compile(`.at`) 189 | fmt.Printf(r.FindString("The cat sat on the mat.")) 190 | ``` 191 | 192 | 'cat' was the first match. 193 | 194 | ```go 195 | // more dot. 196 | s:= "Nobody expects the Spanish inquisition." 197 | // -- -- -- 198 | r, err := regexp.Compile(`e.`) 199 | res := r.FindAllString(s, -1) // negative: all matches 200 | // Prints [ex ec e ]. The last item is 'e' and a space. 201 | fmt.Printf("%v", res) 202 | res = r.FindAllString(s, 2) // find 2 or less matches 203 | // Prints [ex ec]. 204 | fmt.Printf("%v", res) 205 | ``` 206 | 207 | ## Literal Special Characters ## 208 | 209 | Finding one backslash '\\': It must be escaped twice in the regex and once in the string. 
210 | 211 | ```go 212 | r, err := regexp.Compile("C:\\\\") 213 | if r.MatchString("Working on drive C:\\") { 214 | fmt.Printf("Matches.") // <--- 215 | } else { 216 | fmt.Printf("No match.") 217 | } 218 | ``` 219 | 220 | Finding a literal dot: 221 | 222 | ```go 223 | r, err := regexp.Compile(`\.`) 224 | if r.MatchString("Short.") { 225 | fmt.Printf("Has a dot.") // <--- 226 | } else { 227 | fmt.Printf("Has no dot.") 228 | } 229 | ``` 230 | 231 | The other special characters that are relevant for constructing regular expressions work in a similar fashion: .+*?()|[]{}^$ 232 | 233 | Finding a literal dollar symbol: 234 | 235 | ```go 236 | r, err := regexp.Compile(`\$`) 237 | if len(r.FindString("He paid $150 for that software.")) != 0 { 238 | fmt.Printf("Found $-symbol.") // <--- 239 | } else { 240 | fmt.Printf("No $$$.") 241 | } 242 | ``` 243 | 244 | ## Simple Repetition ## 245 | 246 | The *FindAllString*-function returns a slice with all the strings that matched. FindAllString takes two arguments, a string and the maximum number of matches that shall be returned. If you definitely want all matches, use '-1'. 247 | 248 | Finding words. A word is a sequence of characters of type \w. The plus symbol '+' signifies one or more repetitions: 249 | 250 | ```go 251 | s := "Eenie meenie miny moe." 252 | r, err := regexp.Compile(`\w+`) 253 | res := r.FindAllString(s, -1) 254 | // Prints [Eenie meenie miny moe] 255 | fmt.Printf("%v", res) 256 | ``` 257 | 258 | In contrast to wildcards used on the command line for filename matching, the '\*' does not symbolize 'any character', but the repetition of the previous character (or group). While the '+' requires at least a single occurrence of its preceding symbol, the '*' is also satisfied with 0 occurrences. This can lead to strange results. 
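
The difference between '+' and '*' can be sketched as follows. The helper `firstMatch` and the sample strings are made up for this illustration; `FindStringIndex` is used instead of `FindString` so that an empty match can be told apart from no match at all.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it returns the first match of
// pattern in s and whether there was a match at all.
func firstMatch(pattern, s string) (string, bool) {
	loc := regexp.MustCompile(pattern).FindStringIndex(s)
	if loc == nil {
		return "", false
	}
	return s[loc[0]:loc[1]], true
}

func main() {
	s := "xyz 123"

	// '+' needs at least one digit, so the match is "123".
	m, ok := firstMatch(`\d+`, s)
	fmt.Printf("%q %v\n", m, ok) // "123" true

	// '*' is satisfied with zero digits, so it already succeeds at
	// position 0 with an empty match, before ever reaching "123".
	m, ok = firstMatch(`\d*`, s)
	fmt.Printf("%q %v\n", m, ok) // "" true

	// Without any digit, '+' fails while '*' still matches (emptily).
	_, ok = firstMatch(`\d+`, "xyz")
	fmt.Println(ok) // false
	_, ok = firstMatch(`\d*`, "xyz")
	fmt.Println(ok) // true
}
```

An empty but successful match is usually what makes a pattern with '*' look as if it "found nothing".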
259 | 260 | ```go 261 | s := "Firstname Lastname" 262 | r, err := regexp.Compile(`\w+\s\w+`) 263 | res := r.FindString(s) 264 | // Prints Firstname Lastname 265 | fmt.Printf("%v", res) 266 | ``` 267 | 268 | But if this is some user-supplied input, there might be two spaces: 269 | 270 | ```go 271 | s := "Firstname  Lastname" 272 | r, err := regexp.Compile(`\w+\s\w+`) 273 | res := r.FindString(s) 274 | // Prints nothing (the empty string=no match) 275 | fmt.Printf("%v", res) 276 | ``` 277 | 278 | We allow any number (but at least one) of spaces with '\s+': 279 | 280 | ```go 281 | s := "Firstname  Lastname" 282 | r, err := regexp.Compile(`\w+\s+\w+`) 283 | res := r.FindString(s) 284 | // Prints Firstname  Lastname 285 | fmt.Printf("%v", res) 286 | ``` 287 | 288 | If you read a text file in INI-style, you might want to be permissive regarding spaces around the equal sign. 289 | 290 | ```go 291 | s := "Key=Value" 292 | r, err := regexp.Compile(`\w+=\w+`) 293 | res := r.FindAllString(s, -1) 294 | // OK, prints [Key=Value] 295 | fmt.Printf("%v", res) 296 | ``` 297 | 298 | Now let's add some spaces around the equal sign. 299 | 300 | ```go 301 | s := "Key = Value" 302 | r, err := regexp.Compile(`\w+=\w+`) 303 | res := r.FindAllString(s, -1) 304 | // FAIL, prints [] (no match): the \w does not match the space. 305 | fmt.Printf("%v", res) 306 | ``` 307 | 308 | Therefore we allow a number of spaces (including possibly 0) with '\s*': 309 | 310 | ```go 311 | s := "Key = Value" 312 | r, err := regexp.Compile(`\w+\s*=\s*\w+`) 313 | res := r.FindAllString(s, -1) 314 | fmt.Printf("%v", res) // Prints [Key = Value] 315 | ``` 316 | 317 | The Go regexp syntax supports a few more constructs built with '?'. 318 | 319 | ## Anchor and Boundaries ## 320 | 321 | The caret symbol ^ denotes a 'begin-of-line' (without the 'm' flag, the beginning of the whole text). 322 | 323 | ```go 324 | s := "Never say never." 325 | r, err1 := regexp.Compile(`^N`) // Do we have an 'N' at the beginning? 
326 | fmt.Printf("%v ", r.MatchString(s)) // true 327 | t, err2 := regexp.Compile(`^n`) // Do we have an 'n' at the beginning? 328 | fmt.Printf("%v ", t.MatchString(s)) // false 329 | ``` 330 | 331 | The dollar symbol $ denotes an 'end-of-line' (without the 'm' flag, the end of the whole text). 332 | 333 | ```go 334 | s := "All is well that ends well" 335 | r, err := regexp.Compile(`well$`) 336 | fmt.Printf("%v ", r.MatchString(s)) // true 337 | 338 | r, err = regexp.Compile(`well`) 339 | fmt.Printf("%v ", r.MatchString(s)) // true, but matches with first 340 | // occurrence of 'well' 341 | r, err = regexp.Compile(`ends$`) 342 | fmt.Printf("%v ", r.MatchString(s)) // false, not at end of line. 343 | ``` 344 | 345 | We saw that 'well' matched. To figure out where exactly the regexp matched, let's have a look at the indexes. The *FindStringIndex*-function returns a slice with two entries. The first entry is the index (starting from 0, of course) where the match begins. The second is the index _in front of which_ the match ends. 346 | 347 | ```go 348 | s := "All is well that ends well" 349 | // 012345678901234567890123456 350 | // 1 2 351 | r, err := regexp.Compile(`well$`) 352 | fmt.Printf("%v", r.FindStringIndex(s)) // Prints [22 26] 353 | 354 | r, err = regexp.Compile(`well`) 355 | fmt.Printf("%v ", r.MatchString(s)) // true, but matches with first 356 | // occurrence of 'well' 357 | fmt.Printf("%v", r.FindStringIndex(s)) // Prints [7 11], the match starts at 7 and ends before 11. 358 | 359 | r, err = regexp.Compile(`ends$`) 360 | fmt.Printf("%v ", r.MatchString(s)) // false, not at end of line. 361 | ``` 362 | 363 | You can find a word boundary with '\b'. The *FindAllStringIndex*-function returns all the hits for a regexp as a slice of index pairs. 364 | 365 | ```go 366 | s := "How much wood would a woodchuck chuck in Hollywood?" 
367 | // 012345678901234567890123456789012345678901234567890 368 | // 10 20 30 40 50 369 | // -1-- -2-- -3-- 370 | // Find words that *start* with wood 371 | r, err := regexp.Compile(`\bwood`) // 1 2 372 | fmt.Printf("%v", r.FindAllStringIndex(s, -1)) // [[9 13] [22 26]] 373 | 374 | // Find words that *end* with wood 375 | r, err = regexp.Compile(`wood\b`) // 1 3 376 | fmt.Printf("%v", r.FindAllStringIndex(s, -1)) // [[9 13] [46 50]] 377 | 378 | // Find words that *start* and *end* with wood 379 | r, err = regexp.Compile(`\bwood\b`) // 1 380 | fmt.Printf("%v", r.FindAllStringIndex(s, -1)) // [[9 13]] 381 | ``` 382 | 383 | ## Character Classes ## 384 | 385 | Instead of a literal character you can require a set (or class) of characters at any location. In this example [uio] is a "character class". Any of the characters in the square brackets will satisfy the regexp. Thus, this regexp will match 'Hullo', 'Hillo', and 'Hollo'. 386 | 387 | ```go 388 | r, err := regexp.Compile(`H[uio]llo`) 389 | // Will print 'Hullo'. 390 | fmt.Printf(r.FindString("Hello Regular Expression. Hullo again.")) 391 | ``` 392 | 393 | A negated character class, written with a caret directly after the opening bracket, reverses the match of the class. In this case it will match all strings 'H.llo', where the dot is *not* 'o', 'i' or 'u'. It will not match "Hullo", "Hillo", "Hollo", but it will match "Hallo" and even "H9llo". 394 | 395 | ```go 396 | r, err := regexp.Compile(`H[^uio]llo`) 397 | fmt.Printf("%v ", r.MatchString("Hillo")) // false 398 | fmt.Printf("%v ", r.MatchString("Hallo")) // true 399 | fmt.Printf("%v ", r.MatchString("H9llo")) // true 400 | ``` 401 | 402 | ## POSIX Character Classes ## 403 | 404 | The Golang regexp library implements the POSIX character classes. These are simply 405 | aliases for frequently used classes that are given a more readable name. The classes are: 
The classes are: 406 | (https://github.com/google/re2/blob/master/doc/syntax.txt) 407 | 408 | [:alnum:] alphanumeric (≡ [0-9A-Za-z]) 409 | [:alpha:] alphabetic (≡ [A-Za-z]) 410 | [:ascii:] ASCII (≡ [\x00-\x7F]) 411 | [:blank:] blank (≡ [\t ]) 412 | [:cntrl:] control (≡ [\x00-\x1F\x7F]) 413 | [:digit:] digits (≡ [0-9]) 414 | [:graph:] graphical (≡ [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]) 415 | [:lower:] lower case (≡ [a-z]) 416 | [:print:] printable (≡ [ -~] == [ [:graph:]]) 417 | [:punct:] punctuation (≡ [!-/:-@[-`{-~]) 418 | [:space:] whitespace (≡ [\t\n\v\f\r ]) 419 | [:upper:] upper case (≡ [A-Z]) 420 | [:word:] word characters (≡ [0-9A-Za-z_]) 421 | [:xdigit:] hex digit (≡ [0-9A-Fa-f]) 422 | 423 | Note that you have to wrap an ASCII character class in []. Furthmore note that whenever we speak about alphabet we are only 424 | talking about the 26 letters in ASCII range 65-90, not including letters with diacritical marks. 425 | 426 | Example: Find a sequence of a lower case letter, a punctuation character, a space (blank) and a digit: 427 | 428 | ```go 429 | r, err := regexp.Compile(`[[:lower:]][[:punct:]][[:blank:]][[:digit:]]`) 430 | if r.MatchString("Fred: 12345769") == true { 431 | ---- 432 | fmt.Printf("Match ") // 433 | } else { 434 | fmt.Printf("No match ") 435 | } 436 | ``` 437 | 438 | I never use those, because they require more typing, but they might actually be a good idea in 439 | projects with many developers where not everybody is as well versed in regular expressions as you are. 440 | 441 | ## Unicode Classes ## 442 | 443 | Unicode is organized in blocks, typically grouped by topic or language. In this chapter 444 | I give some examples, because it's next to impossible to cover all of them (and it doesn't 445 | really help). Refer to [complete unicode list of the 446 | re2 engine](https://code.google.com/p/re2/wiki/Syntax "unicode blocks of re2"). 
447 | 448 | ### Example: Greek ### 449 | 450 | We start with a simple example from the Greek code block. 451 | 452 | ```go 453 | r, err := regexp.Compile(`\p{Greek}`) 454 | 455 | if r.MatchString("This is all Γςεεκ to me.") { 456 | fmt.Printf("Match ") // Will print 'Match' 457 | } else { 458 | fmt.Printf("No match ") 459 | } 460 | ``` 461 | 462 | On the Windows-1252 codepage there is a mu (the micro sign, U+00B5), but it 463 | doesn't qualify, because it lies outside the 464 | [Greek and Coptic](http://en.wikipedia.org/wiki/Greek_and_Coptic) block, 465 | the range U+0370..U+03FF. 466 | 467 | ```go 468 | if r.MatchString("the µ is right before ¶") { 469 | fmt.Printf("Match ") 470 | } else { 471 | fmt.Printf("No match ") // Will print 'No match' 472 | } 473 | ``` 474 | 475 | Some extra cool letters from the Greek and Coptic 476 | block that qualify as 'Greek' although they are 477 | probably Coptic, so be careful. 478 | 479 | ```go 480 | if r.MatchString("ϵ϶ϓϔϕϖϗϘϙϚϛϜ") { 481 | fmt.Printf("Match ") // Will print 'Match' 482 | } else { 483 | fmt.Printf("No match ") 484 | } 485 | ``` 486 | 487 | ### Example: Braille ### 488 | 489 | You have to use a font that supports [Braille](http://en.wikipedia.org/wiki/Braille "Braille"). 490 | I have my doubts that this is useful unless combined with a Braille-capable printer, but there you go. 491 | 492 | ```go 493 | r2, err := regexp.Compile(`\p{Braille}`) 494 | if r2.MatchString("This is all ⢓⢔⢕⢖⢗⢘⢙⢚⢛ to me.") { 495 | fmt.Printf("Match ") // Will print 'Match' 496 | } else { 497 | fmt.Printf("No match ") 498 | } 499 | ``` 500 | 501 | ### Example: Cherokee ### 502 | 503 | You have to use a font that supports Cherokee (e.g. Code2000). 504 | The story of the Cherokee script is definitely worth [reading about](http://en.wikipedia.org/wiki/Cherokee#Language_and_writing_system "Cherokee"). 
505 | 506 | ```go 507 | r3, err := regexp.Compile(`\p{Cherokee}`) 508 | if r3.MatchString("This is all ᏯᏰᏱᏲᏳᏴ to me.") == true { 509 | fmt.Printf("Match ") // Will print 'Match' 510 | } else { 511 | fmt.Printf("No match ") 512 | } 513 | ``` 514 | 515 | ## Alternatives ## 516 | 517 | You can provide alternatives using the pipe-symbol '|' to allow two (or more) different possible matches. If you want to allow alternatives only in parts of the regular expression, you can use parentheses for grouping. 518 | 519 | ```go 520 | r, err1 := regexp.Compile(`Jim|Tim`) 521 | fmt.Printf("%v", r.MatchString("Dickie, Tom and Tim")) // true 522 | fmt.Printf("%v", r.MatchString("Jimmy, John and Jim")) // true 523 | 524 | t, err2 := regexp.Compile(`Santa Clara|Santa Barbara`) 525 | s := "Clara was from Santa Barbara and Barbara was from Santa Clara" 526 | // ------------- ----------- 527 | fmt.Printf("%v", t.FindAllStringIndex(s, -1)) 528 | // [[15 28] [50 61]] 529 | 530 | u, err3 := regexp.Compile(`Santa (Clara|Barbara)`) // Equivalent 531 | v := "Clara was from Santa Barbara and Barbara was from Santa Clara" 532 | // ------------- ----------- 533 | fmt.Printf("%v", u.FindAllStringIndex(v, -1)) 534 | // [[15 28] [50 61]] 535 | ``` 536 | 537 | --------------------------------------------------------------------------------