Commit 5f9597d

fix
1 parent 7f5008e commit 5f9597d

1 file changed: +11 -7 lines changed

9-regular-expressions/03-regexp-character-classes/article.md

Lines changed: 11 additions & 7 deletions
@@ -43,7 +43,7 @@ Most used are:
 : A space symbol: that includes spaces, tabs, newlines.

 `\w` ("w" is from "word")
-: A "wordly" character: either a letter of English alphabet or a digit or an underscore. Non-english letters (like cyrillic or hindi) do not belong to `\w`.
+: A "wordly" character: either a letter of English alphabet or a digit or an underscore. Non-Latin letters (like cyrillic or hindi) do not belong to `\w`.

 For instance, `pattern:\d\s\w` means a "digit" followed by a "space character" followed by a "wordly character", like `"1 a"`.

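A minimal sketch of what the reworded `\w` line describes, assuming a plain regexp without the `u` or `i` flags. The variable names are illustrative only and do not come from the article.

```js
// Sketch only: the variable names are illustrative, not taken from the article.
const pattern = /\d\s\w/;

// "1 a": a digit, a space, then a Latin letter, so the pattern matches.
const latinResult = "1 a".match(pattern);

// "1 я": the Cyrillic letter is not part of \w, so there is no match.
const cyrillicResult = "1 я".match(pattern);

console.log(latinResult && latinResult[0]); // 1 a
console.log(cyrillicResult);                // null
```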
@@ -115,7 +115,7 @@ alert( "Hello, Java!".match(/\bJava!\b/) ); // null (no match)

 Once again let's note that `pattern:\b` makes the searching engine to test for the boundary, so that `pattern:Java\b` finds `match:Java` only when followed by a word boundary, but it does not add a letter to the result.

-Usually we use `\b` to find standalone English words. So that if we want `"Java"` language then `pattern:\bJava\b` finds exactly a standalone word and ignores it when it's a part of `"JavaScript"`.
+Usually we use `\b` to find standalone English words. So that if we want `"Java"` language then `pattern:\bJava\b` finds exactly a standalone word and ignores it when it's a part of another word, e.g. it won't match `match:Java` in `subject:JavaScript`.

 Another example: a regexp `pattern:\b\d\d\b` looks for standalone two-digit numbers. In other words, it requires that before and after `pattern:\d\d` must be a symbol different from `\w` (or beginning/end of the string).

@@ -125,6 +125,8 @@ alert( "1 23 456 78".match(/\b\d\d\b/g) ); // 23,78

 ```warn header="Word boundary doesn't work for non-Latin alphabets"
 The word boundary check `\b` tests for a boundary between `\w` and something else. But `\w` means an English letter (or a digit or an underscore), so the test won't work for other characters (like cyrillic or hieroglyphs).
+
+Later we'll come by Unicode character classes that allow to solve the similar task for different languages.
 ```


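The behaviour described in the two hunks above can be checked with a short sketch. The Cyrillic sample string and all variable names are made up for illustration. The last pattern is only one possible workaround, assuming an engine that supports lookbehind and the `u` flag; the commit itself does not prescribe it.

```js
// \bJava\b finds the standalone word...
const standalone = "Hello, Java!".match(/\bJava\b/);

// ...but not "Java" inside "JavaScript": there is no boundary between "a" and "S".
const insideWord = "JavaScript".match(/\bJava\b/);

// With Cyrillic letters \b fails: neither the space nor "Я" belongs to \w,
// so the engine sees no word boundary between them.
const cyrillicWord = "Привет, Ява!".match(/\bЯва\b/);

// Hypothetical workaround (an assumption, not part of the commit):
// require "no letter before" and "no letter after" via Unicode-aware lookarounds.
const unicodeAware = "Привет, Ява!".match(/(?<!\p{L})Ява(?!\p{L})/u);

console.log(standalone && standalone[0]);     // Java
console.log(insideWord);                      // null
console.log(cyrillicWord);                    // null
console.log(unicodeAware && unicodeAware[0]); // Ява
```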
@@ -223,13 +225,14 @@ alert( "CS4".match(/CS.4/) ); // null, no match because there's no character for

 Usually a dot doesn't match a newline character.

-For instance, this doesn't match:
+For instance, `pattern:A.B` matches `match:A`, and then `match:B` with any character between them, except a newline.
+
+This doesn't match:

 ```js run
 alert( "A\nB".match(/A.B/) ); // null (no match)

-// a space character would match
-// or a letter, but not \n
+// a space character would match, or a letter, but not \n
 ```

 Sometimes it's inconvenient, we really want "any character", newline included.
@@ -240,7 +243,6 @@ That's what `s` flag does. If a regexp has it, then the dot `"."` match literall
 alert( "A\nB".match(/A.B/s) ); // A\nB (match!)
 ```

-
 ## Summary

 There exist following character classes:
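For reference, the dot behaviour touched by the two hunks above can be put side by side in one sketch. The variable names are illustrative. The `[\s\S]` variant is a common fallback for engines without the `s` flag and is an addition here, not something this commit changes.

```js
const text = "A\nB";

// Without the "s" flag the dot does not match a newline.
const withoutS = text.match(/A.B/);

// A space (or a letter) between A and B does match.
const withSpace = "A B".match(/A.B/);

// With the "s" flag the dot matches any character, newline included.
const withS = text.match(/A.B/s);

// Fallback sketch: "a whitespace or a non-whitespace character" also means "anything".
const fallback = text.match(/A[\s\S]B/);

console.log(withoutS);                    // null
console.log(withSpace && withSpace[0]);   // A B
console.log(withS !== null);              // true
console.log(fallback !== null);           // true
```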
@@ -255,7 +257,9 @@ There exist following character classes:

 ...But that's not all!

-Modern JavaScript also allows to look for characters by their Unicode properties, for instance:
+The Unicode encoding, used by JavaScript for strings, provides many properties for characters, like: which language the letter belongs to (if a letter) it is it a punctuation sign, etc.
+
+Modern JavaScript allows to use these properties in regexps to look for characters, for instance:

 - A cyrillic letter is: `pattern:\p{Script=Cyrillic}` or `pattern:\p{sc=Cyrillic}`.
 - A dash (be it a small hyphen `-` or a long dash `—`): `pattern:\p{Dash_Punctuation}` or `pattern:\p{pd}`.

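A hedged sketch of the two property escapes listed in the last hunk. It assumes an engine with Unicode property escape support and uses the `u` flag, which `\p{...}` requires. The sample strings and variable names are invented for illustration. The short alias is written `Pd` in the sketch because property names and values are matched case-sensitively, so the lowercase `pd` spelling from the diff may be rejected as an invalid property name.

```js
// Sample text is made up for this sketch.
const sample = "Привет, world";

// Long and short form of the Script property; both should find the Cyrillic run.
const cyrillicLong = sample.match(/\p{Script=Cyrillic}+/u);
const cyrillicShort = sample.match(/\p{sc=Cyrillic}+/u);

// "\u2014" is the long dash; the plain hyphen "-" is dash punctuation as well.
const dashSample = "a-b\u2014c";
const dashesLong = dashSample.match(/\p{Dash_Punctuation}/gu);
const dashesShort = dashSample.match(/\p{Pd}/gu);

console.log(cyrillicLong && cyrillicLong[0]);   // Привет
console.log(cyrillicShort && cyrillicShort[0]); // Привет
console.log(dashesLong.length);                 // 2
console.log(dashesShort.length);                // 2
```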