Backreferences in Regular Expressions

Reference: http://java.dzone.com/articles/backreferences-java-regular

I had not used this feature before, so I am noting it down here.

Introduction

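Backreferences are built on groups. A group treats several characters as a single unit. A group is created by placing regex characters inside a pair of parentheses, ( and ), and each pair of parentheses corresponds to one group.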
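To make the idea of a group concrete, here is a minimal sketch. The class name GroupDemo, the helper captureGroups, and the sample pattern and input are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns the text captured by each group,
    // or an empty list if the whole input does not match.
    static List<String> captureGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            // Group 0 is the whole match; capturing groups start at 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two pairs of parentheses, so two groups.
        System.out.println(captureGroups("(\\d\\d)-(\\w+)", "42-answer")); // [42, answer]
    }
}
```

Each pair of parentheses in the pattern yields one entry in the returned list, in the order the opening parentheses appear.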

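Backreferences are convenient: they let a pattern refer back to what a group matched instead of spelling that part out again. A previously defined group is referenced with \#, where # is the group's number, counting from 1.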

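While matching, the regex engine requires the backreference to match exactly the same text that the referenced group matched. That is, (\d\d\d)\1 matches 123123 but does not match 123456.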
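The difference between repeating the matched text and repeating the sub-pattern can be checked with a small sketch. The class name BackrefDemo and its two helper methods are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // Three digits, then the SAME three digits again (backreference to group 1).
    static final Pattern REPEATED = Pattern.compile("(\\d\\d\\d)\\1");
    // Three digits, then any three digits (the sub-pattern is written out twice).
    static final Pattern SIX_DIGITS = Pattern.compile("(\\d\\d\\d)\\d\\d\\d");

    static boolean isRepeated(String s) {
        return REPEATED.matcher(s).matches();
    }

    static boolean isSixDigits(String s) {
        return SIX_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRepeated("123123"));  // true
        System.out.println(isRepeated("123456"));  // false
        System.out.println(isSixDigits("123456")); // true
    }
}
```

So \1 is not shorthand for writing (\d\d\d) a second time: it constrains the second half to equal whatever the first half actually captured.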

Example


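import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The statements below belong inside a method such as main.
String str = "123123789789";
// Group 1 captures three digits; \1 requires the same three digits again.
Pattern p = Pattern.compile("(\\d\\d\\d)\\1");
Matcher m = p.matcher(str);
// groupCount() reports the number of capturing groups in the pattern.
System.out.println("group count:" + m.groupCount());
while (m.find()) {
    String word = m.group();
    // Print the match with its start (inclusive) and end (exclusive) offsets.
    System.out.println(word + " " + m.start() + " " + m.end());
}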

System.out.println("\n\n");

// A whole word (group 1), then any characters, then the same whole word again.
String pattern = "\\b(\\w+)\\b[\\w\\W]*\\b\\1\\b";
// CASE_INSENSITIVE lets "Duplicate" pair up with "duplicate".
Pattern p2 = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
Matcher m2 = p2.matcher(phrase);
while (m2.find()) {
    String val = m2.group();
    System.out.println("Matching subsequence is \"" + val + "\"");
    // group(1) is the word as it was first captured.
    System.out.println("Duplicate word: " + m2.group(1) + "\n");
}

Output:

group count:1
123123 0 6
789789 6 12



Matching subsequence is "unique is not duplicate but unique"
Duplicate word: unique

Matching subsequence is "Duplicate is duplicate"
Duplicate word: Duplicate
