Quiz question in the ESCAPE HATCHES tutorial is wrong. Question: If I want to match the character Ö, which escape sequence should I use?

Question

The Ö character is not a Unicode character, and therefore a regex would match it using \W and NOT \w as your answer says.

Here is what your answer says when answering it CORRECTLY.

Bummer! \W matches anything that isn't an Unicode word character, so it would actually ignore the above character. \w will match the code, though.

Answer 1 · 2015-07-09T20:31:43Z

The statement from the teacher is technically correct. The confusion is that the simplification (a-z, A-Z, 0-9, _) only applies to ASCII matching, which covers what most English speakers use. In Python 3, string patterns are matched as Unicode by default, so \w matches any character that could be used in a word in a dictionary, in any language, and that includes Ö. As a side note, digits and the underscore are included because they get used in programming identifiers.
What you are seeing on Regexpal is an environment problem. The site calls out that it uses JavaScript regexes, where \w only covers the ASCII set. Unfortunately there is only a general consensus, not a single standard, for designing mini-languages like this. This is one of the cases where the language used to write the program (Python, JavaScript, etc.) changes how the regex works, even though the documentation uses the same name (regex). The only way to avoid this is to use a Python environment for testing regexes meant for use in Python.
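
Here is a minimal sketch of that difference using Python's standard `re` module. The `re.ASCII` flag is used only as a stand-in for the ASCII-only `\w` that JavaScript-based testers apply; that mapping is an assumption made for illustration, not something the quiz or the site states.

```python
import re

char = "\u00d6"  # Ö, written as an escape so the file encoding cannot alter it

# Python 3 str patterns are Unicode-aware by default.
unicode_word = re.fullmatch(r"\w", char) is not None
unicode_non_word = re.fullmatch(r"\W", char) is not None

# re.ASCII restricts \w to [a-zA-Z0-9_], approximating JavaScript's \w.
ascii_word = re.fullmatch(r"\w", char, flags=re.ASCII) is not None
ascii_non_word = re.fullmatch(r"\W", char, flags=re.ASCII) is not None

print(unicode_word, unicode_non_word)  # True False
print(ascii_word, ascii_non_word)      # False True
```

The first line of output is what the quiz answer describes, and the second is what an ASCII-only tester reports.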

Hope that clarifies things a little.

Answer 2 · 2015-07-09T16:32:46Z

The quiz has the correct answer.

For word characters (\w), whitespace (\s), and decimal digits (\d), the lowercase escape matches the category, while the uppercase escape (\W, \S, \D) matches any character that is not in that Unicode category.

https://docs.python.org/3/library/re.html has a useful cheat sheet in the first section
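
To see the lowercase/uppercase pairing in practice, here is a small sketch with Python's `re` module. The sample string and the variable names are made up for illustration.

```python
import re

sample = "\u00d6k 7_"  # "Ök 7_": hypothetical sample text

pairs = {r"\w": r"\W", r"\s": r"\S", r"\d": r"\D"}
results = {}
for lower, upper in pairs.items():
    matched = re.findall(lower, sample)     # characters in the category
    complement = re.findall(upper, sample)  # characters outside it
    results[lower] = (matched, complement)
    # Each character lands on exactly one side of the pair.
    assert sorted(matched + complement) == sorted(sample)
    print(lower, matched, upper, complement)
```

With this sample, `\w` picks up Ö, k, 7 and _, which leaves only the space for `\W`.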

Answer 3 · 2015-08-27T03:53:28Z

Ö is a Unicode character, and UTF-8 can encode it. The character code is U+00D6 (HTML entity &Ouml;).

You can view more details here: http://www.periodni.com/unicode_utf-8_encoding.html#german_special_characters
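
If you would rather check this locally than rely on a lookup table, a short sketch using only the Python standard library can report the code point, official name, general category, and UTF-8 bytes. The variable names are chosen for illustration.

```python
import unicodedata

char = "\u00d6"  # Ö

code_point = ord(char)                 # integer code point
name = unicodedata.name(char)          # official Unicode character name
category = unicodedata.category(char)  # general category ("Lu" = uppercase letter)
utf8_bytes = char.encode("utf-8")      # the bytes UTF-8 uses for this character

print(f"U+{code_point:04X}", name, category, utf8_bytes)
```

A general category starting with "L" marks the character as a letter, which is why a Unicode-aware \w accepts it.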

Answer 4 · 2015-07-09T17:50:17Z

Ö with the two dots above the "O" is not a Unicode character; therefore the NON-Unicode character "Ö" would be matched with \W (uppercase W).

Cheers, John

Answer 5 · 2015-07-09T20:15:17Z

Hi Adam,

This is why I'm so confused.

In one of the previous lessons the teacher said: "\w matches any Unicode word character, which is any letter, digit and underscore. ( a-Z,1-9,_ )"
When I go to http://regexpal.com/ and type in \W (uppercase), it matches Ö. Lowercase \w does not match. Here's a screenshot from Regexpal.com showing the \W (uppercase) match: https://www.dropbox.com/s/jwfyc1flrxtediq/regex-uppercase-w.png?dl=0

Thanks, John

Answer 6 · 2015-07-09T20:35:13Z

Well said, Adam.

I'll put that in my regex notes. Thanks for taking the time to help me figure that out.

Cheers, John

Welcome to the Treehouse Community

John Koch

6 Answers

Adam Grandquist

Anthony Attard
