John Koch
1,781 Points
Quiz question in the ESCAPE HATCHES tutorial is wrong. Question: If I want to match the character Ö, which escape sequence should I use?
The Ö character is not a Unicode character, and therefore a regex would match it with \W and NOT \w as your answer says.
Here is what the quiz says when I answer it CORRECTLY:
Bummer! \W matches anything that isn't an Unicode word character, so it would actually ignore the above character. \w will match the code, though.
6 Answers
Adam Grandquist
5,855 Points
The statement from the teacher is technically correct. The confusion is that the simplification (a-z, A-Z, 0-9, _) only applies to the ASCII encoding that most English speakers use. When Unicode matching is the default, as it is in Python 3, the escape matches any character that could be used in a word in a dictionary. As a side note, digits and the underscore are included because they are used in programming identifiers.
This is an environment problem. The site calls out that it uses JavaScript regexes. Unfortunately, there is only a general consensus on how to design mini-languages like this. This is one of the cases where the language used to write the program (Python, JavaScript, etc.) changes how the regex works, even though the documentation uses the same name (regex). The only way to avoid this is to test regexes meant for Python in a Python environment.
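To make the environment difference concrete, here is a small Python sketch. It assumes that the JavaScript engine behind the testing site treats \w as ASCII-only, and it uses Python's re.ASCII flag as a stand-in for that behavior; the variable names are just for illustration.

```python
import re

text = "Ö"

# Python 3 default for str patterns: \w is Unicode-aware.
unicode_word = re.fullmatch(r"\w", text)

# re.ASCII narrows \w to [a-zA-Z0-9_], which is assumed here to
# approximate what an ASCII-only JavaScript regex would do.
ascii_word = re.fullmatch(r"\w", text, flags=re.ASCII)
ascii_non_word = re.fullmatch(r"\W", text, flags=re.ASCII)

# The explicit "simplified" class from the lesson behaves like the ASCII case.
simplified_class = re.fullmatch(r"[A-Za-z0-9_]", text)

print(unicode_word is not None)      # True
print(ascii_word is not None)        # False
print(ascii_non_word is not None)    # True
print(simplified_class is not None)  # False
```

So the same pattern gives opposite answers for Ö depending on whether the engine uses Unicode or ASCII rules for \w.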
Hope that clarifies things a little.
Adam Grandquist
5,855 Points
The quiz has the correct answer.
In the cases of word characters (\w), whitespace (\s), and decimal digits (\d), the lowercase escape matches the category, while the uppercase escape matches any character that is not in that Unicode category.
https://docs.python.org/3/library/re.html has a useful cheat sheet in the first section
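A quick way to check the lowercase/uppercase relationship in Python 3 is to confirm that, for any single character, exactly one escape of each pair matches. This is only a sketch: the sample characters and the helper name exactly_one_matches are made up for illustration.

```python
import re

# A few sample characters: letters, a digit, underscore, whitespace, punctuation.
samples = ["Ö", "a", "7", "_", " ", "\t", "!"]

# Each lowercase escape paired with its uppercase counterpart.
pairs = [(r"\w", r"\W"), (r"\s", r"\S"), (r"\d", r"\D")]

def exactly_one_matches(lower, upper, ch):
    """Return True if ch matches exactly one of the two escapes."""
    lower_hit = re.fullmatch(lower, ch) is not None
    upper_hit = re.fullmatch(upper, ch) is not None
    return lower_hit != upper_hit

all_complementary = all(
    exactly_one_matches(lower, upper, ch)
    for lower, upper in pairs
    for ch in samples
)
print(all_complementary)  # True
```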
Anthony Attard
43,915 Points
Ö is a Unicode character that can be encoded in UTF-8. Its code point is U+00D6 (HTML character code &#214;).
You can view more details here: http://www.periodni.com/unicode_utf-8_encoding.html#german_special_characters
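For anyone who wants to verify this locally, the following Python sketch prints the code point and the UTF-8 bytes of Ö. It assumes the precomposed form of the character (a single code point, not O followed by a combining diaeresis).

```python
import html

ch = "Ö"  # assumed to be the precomposed character

code_point = ord(ch)
utf8_bytes = ch.encode("utf-8")

print(code_point)       # 214
print(hex(code_point))  # 0xd6
print(utf8_bytes)       # b'\xc3\x96'

# Both the numeric and the named HTML entity decode to the same character.
print(html.unescape("&#214;") == ch)  # True
print(html.unescape("&Ouml;") == ch)  # True
```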
John Koch
1,781 Points
Ö, with the two dots above the "O", is not a Unicode character; therefore the NON-Unicode character "Ö" would be matched with \W (uppercase W).
Cheers, John
Adam Grandquist
5,855 Points
My local install of Python matches Ö as a Unicode word character.
Why would you expect it not to have the Unicode character property?
Python 3.4.3 (default, May 1 2015, 19:14:18)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> test = "Ö"
>>> import re
>>> re.match(r'\W', test) # returns None so not printed
>>> match = re.match(r'\w', test)
>>> match
<_sre.SRE_Match object; span=(0, 1), match='Ö'>
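As a follow-up sketch (not part of the original session), the standard unicodedata module can show why \w accepts Ö in Python 3: the character is classified as an uppercase letter, and Python's \w covers alphanumeric characters plus the underscore. The precomposed form of Ö is assumed.

```python
import re
import unicodedata

ch = "Ö"  # assumed to be the precomposed character U+00D6

name = unicodedata.name(ch)
category = unicodedata.category(ch)

print(name)          # LATIN CAPITAL LETTER O WITH DIAERESIS
print(category)      # Lu (Letter, uppercase)
print(ch.isalnum())  # True

# Consistent with the session above: \w matches, \W does not.
print(re.match(r"\w", ch) is not None)  # True
print(re.match(r"\W", ch) is None)      # True
```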
John Koch
1,781 Points
Hi Adam,
This is why I'm so confused.
In one of the previous lessons the teacher said: "\w matches any Unicode word character, which is any letter, digit and underscore (a-Z, 1-9, _)."
When I go to http://regexpal.com/ and type in \W (uppercase), it matches Ö. Lowercase \w does not match. Here's a screenshot from Regexpal.com showing the \W (uppercase) match: https://www.dropbox.com/s/jwfyc1flrxtediq/regex-uppercase-w.png?dl=0
Thanks, John
John Koch
1,781 Points
Well said, Adam.
I'll put that in my regex notes. Thanks for taking the time to help me figure that out.
Cheers, John