Most commonly used Perl regular expressions

| Character class | Matches |
|---|---|
| [0-9] | any digit |
| \d | any digit |
| \D | any character that is not a digit |
| [a-m] | any lowercase letter from a to m |
| [a-z] | any lowercase letter from a to z |
| [A-Z] | any uppercase letter from A to Z |
| [^az] | any character except a or z |
| \w | any word character, like [0-9a-zA-Z_] |
| \W | any character that is not a word character |
| \s | a whitespace character: space, tab, newline or carriage return |
| \S | any character that is not whitespace (the opposite of \s) |

Ex: [0-9a-zA-Z] will match any single digit or letter from a to z, uppercase or lowercase.
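
As a quick check of the classes above, here is a minimal sketch that pipes a sample string through grep -oP, which prints each match on its own line. The sample text and the variable names are invented for illustration, and it assumes a grep built with Perl regex support (GNU grep, for instance).

```shell
# Sample input, invented for this illustration.
sample='Room 42, Floor_B'

# \d matches each digit individually.
digits=$(printf '%s\n' "$sample" | grep -oP '\d')

# [A-Z] matches each uppercase letter individually.
uppers=$(printf '%s\n' "$sample" | grep -oP '[A-Z]')

# \W matches every character that is not a word character
# (here the two spaces and the comma; the underscore counts as a word character).
nonword_count=$(printf '%s\n' "$sample" | grep -oP '\W' | wc -l | tr -d ' ')

printf 'digits:\n%s\n' "$digits"
printf 'uppercase letters:\n%s\n' "$uppers"
printf 'non-word characters: %s\n' "$nonword_count"
```

Each class matches exactly one character at a time, which is why the digits and letters come out one per line.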

| Quantifier | Repeats the preceding element |
|---|---|
| * | 0 or more times |
| + | 1 or more times |
| ? | 0 or 1 time |
| {n} | exactly n times |
| {n,} | at least n times |
| {,n} | at most n times (older Perl and PCRE versions need {0,n}) |
| {n,m} | between n and m times |

Ex: [0-9a-zA-Z]+ will match a run of one or more letters or digits.
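
The quantifiers can be compared side by side with a small sketch like the one below. The sample string and variable names are made up for illustration, and a grep with -P support is assumed.

```shell
# Sample input, invented for this illustration.
sample='id7 id42 id2020'

# \d+ : each run of one or more digits.
one_or_more=$(printf '%s\n' "$sample" | grep -oP '\d+')

# \d{2} : exactly two digits per match, so 2020 comes out as 20 and 20.
exactly_two=$(printf '%s\n' "$sample" | grep -oP '\d{2}')

# \d{2,} : runs of at least two digits, so the lone 7 is skipped.
at_least_two=$(printf '%s\n' "$sample" | grep -oP '\d{2,}')

# id\d? : "id" followed by zero or one digit.
zero_or_one=$(printf '%s\n' "$sample" | grep -oP 'id\d?')

printf '%s\n' "$one_or_more" "$exactly_two" "$at_least_two" "$zero_or_one"
```

Matches never overlap: once grep has consumed two digits for \d{2}, it resumes the search right after them.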
Tip: “\” escapes a special character so that it is matched literally. For example, in “\.txt” the backslash makes the “.” match a literal dot instead of any character.

Example of usage with grep:
grep -oP '/documents/[\w\s]+\.txt'
-o prints only the part of each line that matches the regular expression
-P interprets the pattern as a Perl-compatible regular expression
To sum up, this command asks grep to print only the parts of the input that consist of /documents/ followed by one or more word characters or whitespace characters, and then .txt
echo "<li class='pure-tree_link'><a href='/documents/Invoice_15_11_2020.pdf' target='_blank'>Invoice</a></li><li class='pure-tree_link'><a href='/documents/Report_15_01_2020.pdf' target='_blank'>Report</a></li><li class='pure-tree_link'><a href='/documents/flag_11dfa168ac8eb2958e38425728623c98.txt' target='_blank'>flag</a></li> </ul>" | grep -oP '/documents/[\w\s]+\.txt'
/documents/flag_11dfa168ac8eb2958e38425728623c98.txt
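
To see why the dot in “\.txt” is escaped, the sketch below runs the same kind of pattern with and without the backslash. The file names and variable names are hypothetical, chosen only to make the difference visible.

```shell
# Hypothetical names, invented for this illustration.
names='notes.txt notes_txt'

# Unescaped: "." matches any character, so notes_txt is matched as well.
unescaped=$(printf '%s\n' "$names" | grep -oP 'notes.txt')

# Escaped: "\." matches only a literal dot, so only notes.txt is matched.
escaped=$(printf '%s\n' "$names" | grep -oP 'notes\.txt')

printf 'unescaped:\n%s\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"
```

The unescaped pattern returns both names, while the escaped one returns only notes.txt.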