Welcome to the Treehouse Community

Posted March 5, 2015 10:56pm by Shawndee Boyd

Create a function named find_words that takes a count and a string

I have been stuck on this for a while and maybe I'm not understanding what they are asking. Please help!

word_length.py

import re

# EXAMPLE:
# >>> find_words(4, "dog, cat, baby, balloon, me")
# ['baby', 'balloon']

def find_words(count, string):
    count = str(6)
    return re.findall(r'/w{'+ count + ', }', string)

11 Answers

March 5, 2015 11:22pm

Hey Shawndee, you are very close. The challenge asks:

Create a function named find_words that takes a count and a string. Return a list of all of the words in the string that are count word characters long or longer.

Your regular expression needed two fixes: change /w to \w, and remove the space in the later part of the string, changing ', }' to ',}'.

The updated line is now:

    return re.findall(r'\w{'+ str(count) + ',}', string) #<-- cast count as string

As an alternative to the range specifier "{}", you can repeat the \w token count times, then match any additional characters with \w*, as shown below:

    return re.findall(r'\w' * count + r'\w*', string)

Finally you could also use the string format method:

def find_words(cnt,s):
    return re.findall(r'\w{{{:d},}}'.format(cnt), s)

Note the need to escape the braces to differentiate between re braces and format braces.
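As a quick check, the three variants above can be compared side by side. This is only a sketch: the function names find_words_concat, find_words_repeat, and find_words_format are made up here for illustration and are not part of the challenge.

```python
import re

def find_words_concat(count, string):
    # Build the quantifier by concatenating str(count) into the pattern.
    return re.findall(r'\w{' + str(count) + ',}', string)

def find_words_repeat(count, string):
    # Repeat \w count times, then allow any number of extra word characters.
    return re.findall(r'\w' * count + r'\w*', string)

def find_words_format(count, string):
    # Doubled braces survive format() as literal braces for the regex.
    return re.findall(r'\w{{{:d},}}'.format(count), string)

sample = "dog, cat, baby, balloon, me"
for func in (find_words_concat, find_words_repeat, find_words_format):
    print(func.__name__, func(4, sample))
```

All three should return the same list for the example string from the challenge.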

June 20, 2015 7:31pm

Hey Chris,

when I tried your suggestion:

return re.findall(r'\w{'+ count + ',}', string)

I got an error: TypeError: Can't convert 'int' object to str implicitly.

After searching around, I landed on this solution:

return re.findall(r'\w{'+ str(count) + ',}', string)

It works, but I am wondering if there is a better way to do it?

Thanks!

June 20, 2015 8:15pm

Thanks for the correction. I've updated my first solution to cast count as string. The second answer works correctly as is.

The funny part is that when I tested my original first solution, I had included the count = str(6) line from the original code, which evidently gives the correct answer.

I'm not sure what you mean by "a better way". Do you mean an alternative regular expression, or some other string parsing approach?

I am curious what you had in mind.

There are certainly other ways, such as splitting the input string on spaces, then looping over each word and checking its length. Using regular expressions may seem complex, but re is heavily optimized library code built explicitly for parsing strings.
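For comparison, here is a rough sketch of the split-and-loop approach mentioned above. The name find_words_split and the choice to strip punctuation with string.punctuation are assumptions made for this example. It will not behave exactly like \w in every case (for instance, a hyphenated word stays in one piece here).

```python
import string

def find_words_split(count, text):
    words = []
    for token in text.split():
        # Drop leading/trailing punctuation such as the commas in the example.
        word = token.strip(string.punctuation)
        if len(word) >= count:
            words.append(word)
    return words

print(find_words_split(4, "dog, cat, baby, balloon, me"))
```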

July 9, 2015 3:21am

return re.findall(r'\w{{{:d},}}'.format(cnt), s)

Why are there two pairs of {} around {:d},? Why isn't one pair of {} enough?

July 9, 2015 3:35am

A single {} is interpreted by the format method as a field marker to be replaced. Doubled braces ({{ and }}) "escape" this substitution: while parsing, format replaces each doubled brace with a single brace without doing a field substitution. The regex engine can then interpret the formatted string, which contains only the remaining single braces. Hope this helps.
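A small sketch of the escaping, assuming a count of 4 and the example string from the challenge. It also shows what happens with only one extra pair of braces.

```python
import re

# Doubled braces become literal braces; the inner {:d} is the replacement field.
pattern = r'\w{{{:d},}}'.format(4)
print(pattern)  # \w{4,}

# With only one extra pair, format() reads "{{" as a literal "{" and then
# runs into an unmatched "}", so it raises ValueError instead of building
# a pattern.
try:
    r'\w{{:d},}'.format(4)
    single_pair_error = None
except ValueError as exc:
    single_pair_error = exc
print(type(single_pair_error).__name__)

print(re.findall(pattern, "dog, cat, baby, balloon, me"))
```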

February 16, 2016 7:29am

What is :d for?

February 16, 2016 8:15am

The :d tells the format() method to format the value as a decimal (base-10) integer, not as a float.
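A short illustration of :d on its own; the values used are arbitrary examples.

```python
# :d formats an int in base 10.
as_decimal = '{:d}'.format(4)
print(as_decimal)  # 4

# :d is only defined for integers, so passing a float raises ValueError.
try:
    '{:d}'.format(4.0)
    float_error = None
except ValueError as exc:
    float_error = exc
print(type(float_error).__name__)
```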

April 1, 2020 2:15am

Hi Chris,

If we didn't need to add the comma after count, would it just be '\w{{:d}}'?

April 1, 2020 2:23am

If the comma isn’t needed to set a range of values, it would be triple curly brackets:

    return re.findall(r'\w{{{:d}}}'.format(cnt), s)

Each outer doubled curly bracket ({{ or }}) results in one printed curly bracket. The innermost curly brackets mark the replacement field, which is consumed when the value is substituted.

Post back if you need more help. Good luck!!!

April 1, 2020 3:56am

So format always goes from outside to inside when escaping {}s, correct?

So format sees the outside {{}} and replaces it with just {}, so we're left with {{:d},}

and then format replaces {:d} with count, so we're left with {count,}, and then the regex can evaluate {count,}.

I guess what's still really hindering my understanding of this is: why doesn't format escape again when it sees {{:d},}? Shouldn't this trigger format to "simplify" the brackets again and only be left with {:d},? Or does format only escape once, meaning if I have {{{{}}}}, format will remove only the outermost brackets and keep the rest?

I'm sorry if my rambling is hard to understand. My mind is jumbled from spending way too much time trying to understand this.

April 1, 2020 6:10am

It’s not so much outside-in as it is that {{ becomes a literal { that format otherwise ignores. Likewise, }} is seen as a literal }.

When parsing, format does a single left-to-right pass to look for fields to replace. If it sees an opening {, it thinks, “Ooh, here’s a field”. But if it immediately sees another {, it says “dang, the pair is just the printable char {, keep looking for the start of a field”. Once an actual start of field is detected, it begins looking for the end of the field, ignoring double }} in a similar manner.

{ and } tagged as part of a field will be replaced with the field contents.
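One way to watch that single pass is with string.Formatter().parse from the standard library, which splits a format string into literal text and replacement fields. This is just a sketch for inspecting the template; it is not something the challenge requires.

```python
from string import Formatter

template = r'\w{{{:d},}}'

literal_parts = []
fields = []
for literal, field_name, format_spec, conversion in Formatter().parse(template):
    # literal is the text copied through as-is (escaped braces already collapsed).
    literal_parts.append(literal)
    # field_name is None when the chunk has no replacement field.
    if field_name is not None:
        fields.append((field_name, format_spec))

print(''.join(literal_parts))  # the literal text with the field removed
print(fields)                  # the single auto-numbered field and its spec
```

The literal text keeps one { and one }, and exactly one field with the spec d is found, which is the piece that gets replaced by count.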

September 11, 2015 7:30pm

This is what I came up with and it passed:

def find_words(count, string):
  re_string = re.findall(r'\w+' * count, string)
  return re_string

However, I have a question here. We've been told that count is a number, which means it has to be an int. Adding this:

count = int(count)

won't break anything. What is wrong with this though?

def(int(count), string)

I guess it is a very basic or kind of stupid question. I know it won't be useful to add this, or anything, but I'm just curious.

[edited format --cf]

September 12, 2015 4:07am

It is allowable to pass a function as an argument. In fact, some functions, such as map, require that the first argument is a function.

In your example, however, it is simply illegal syntax: the parameter list of a def statement must contain names, not expressions. If it were legal, the issue with using int(count) as a parameter is that it would be evaluated at compile time, and Python would then try to use the immutable return value (an integer) as the parameter name:

def find_words(int(count), string): #<-- illegal syntax
    pass

# given the argument of count=5, turns into 

def find_words(5, string):
    pass

You could, of course, process the arguments on the call:

find_words(int(count), string)

# which is the same as:
find_words(5, string)
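If the conversion should live with the function rather than at each call site, one option is to do it in the body. The sketch below assumes the %d formatting style used elsewhere in this thread, and uses compile() only to show that the def with int(count) in its parameter list is rejected as a SyntaxError.

```python
import re

def find_words(count, string):
    # Convert inside the body; the parameter list itself can only hold names.
    count = int(count)
    return re.findall(r'\w{%d,}' % count, string)

# Compiling the version with int(count) in the parameter list fails
# before any code runs.
try:
    compile("def find_words(int(count), string): pass", "<example>", "exec")
    definition_error = None
except SyntaxError as exc:
    definition_error = exc

print(type(definition_error).__name__)
print(find_words("4", "dog, cat, baby, balloon, me"))
```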

September 12, 2015 8:50am

Thank you, sir. It makes sense now.

May 22, 2016 12:13am

Very interesting and straightforward approach compared to everyone else's in the blog!

June 25, 2015 7:01pm

This worked for me:

def find_words(cnt, string):
    return re.findall(r'\w+' * cnt, string)

[edit formatting -cf]

June 30, 2015 2:56am

Interesting solution. It works because the first \w+ initially catches all the characters, then backtracks so that each of the remaining "cnt - 1" \w+ tokens absorbs a single character, ensuring the minimum of cnt characters is reached.

This solution is slightly less efficient because the regex engine has to backtrack. But that's technical minutiae.
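To see that the repeated \w+ pattern and the range quantifier find the same words, here is a small comparison on the challenge's example string. The variable names are made up for this sketch.

```python
import re

sample = "dog, cat, baby, balloon, me"
cnt = 4

repeated_plus = r'\w+' * cnt          # \w+\w+\w+\w+
range_quantifier = r'\w{%d,}' % cnt   # \w{4,}

print(re.findall(repeated_plus, sample))
print(re.findall(range_quantifier, sample))
```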

June 24, 2020 2:08pm

What is the difference with the following? In my eyes, we do the same but mine doesn't work.

re.findall(r'\w{count,}\s', string)

June 24, 2020 5:09pm

Katherina Kallis, the characters “count” are seen as the literal individual ASCII letters “c”, “o”, “u”, “n”, “t” in the pattern and not as the variable named count.
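A sketch of the difference, assuming Python 3.6 or later so that an f-string can be used (f-strings are not mentioned elsewhere in this thread). Note that the braces still have to be doubled, just as with format().

```python
import re

sample = "dog, cat, baby, balloon, me"
count = 4

# Here "count" is just five letters inside the pattern, so the braces are
# matched literally and nothing in the sample fits.
literal_pattern = r'\w{count,}'
print(re.findall(literal_pattern, sample))        # []
print(re.findall(literal_pattern, "a{count,}"))   # ['a{count,}']

# An rf-string substitutes the variable; {{ and }} give the regex its braces.
fstring_pattern = rf'\w{{{count},}}'
print(fstring_pattern)                            # \w{4,}
print(re.findall(fstring_pattern, sample))
```

Note also that a trailing \s, as in the pattern from the question, requires whitespace right after the word and would include that whitespace in each match.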

June 21, 2015 4:28pm

This also works:

def find_words(cnt,s):
    p=r'\w{%i,}'%cnt
    return re.findall(p,s)

[edited format --cf]

June 21, 2015 6:48pm

Correct, but I would use the string format method:

def find_words(cnt,s):
    return re.findall(r'\w{{{:d},}}'.format(cnt), s)

Note the need to escape the braces to differentiate between re braces and format braces.

June 30, 2015 12:59am

Can you explain why when you use the percent sign method of adding the count to p it works but when you just put count in itself it gives an error? I've got mine working but I'm puzzled as to why it works like that.

June 30, 2015 3:04am

Mason Preusser: The percent method formats the integer into the string. Using count directly could give you an error from trying to concatenate ints and strings, depending on your code. Can you post your failing version?

June 30, 2015 3:10am

Sure thing Chris,

import re
def find_words(count, stringy):
    p = r'\w{count,}'
    s = stringy
    return re.findall(p, s)

Now that I look at it and remember the way to modify strings it makes more sense.

[edit formatting -cf]

June 30, 2015 3:20am

Mason, in your case, the characters "count" are being interpreted as five literal characters and not as the variable name. In this context, a "c" following a left brace is not valid quantifier syntax, so the pattern never picks up the value of count.

To include the value of count in the regular expression in an in-line fashion, look at my first solution in the accepted answer above.

February 16, 2016 7:21am

What does %i mean?

February 16, 2016 8:35am

In the string "r'\w{%i,}' % cnt", the %i is part of the older string formatting syntax.

%i is a field placeholder for an integer.
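A tiny example of the older %-style formatting, assuming cnt is 4. For integers, %i and %d produce the same result.

```python
cnt = 4

pattern_i = r'\w{%i,}' % cnt
pattern_d = r'\w{%d,}' % cnt

print(pattern_i)               # \w{4,}
print(pattern_i == pattern_d)  # True
```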

July 11, 2015 1:44am

Why doesn't this work? :

def find_words(count, input_str):
  return re.findall(r'\w{count,}', input_str)

[edit formatting --cf]

July 11, 2015 2:54am

"count" in your code is interpreted as literal characters instead of as the variable count. See my response to Mason above.

September 28, 2015 9:54pm

I'm attempting to solve this too, and I am confused by some behavior:

def find_words(count, arg):
  regex = "r{}{}{}".format("'\w{",count,"}\w*'")
  return re.findall(regex,arg)

However, when I put this:

regex = "r{}{}{}".format("'\w{",count,"}\w*'")

Note: I set count = 3 for testing.

The result is regex = "r'\\w{3}\\w*'"

Why the extra '\' before the 'w'?

I'm attempting to get

regex to equal r'\w{3}\w*'

Can anyone help?

[edit: added backticks to show \\ -cf]

September 28, 2015 10:20pm

The "r" is used to declare the string to be a raw string. It's not part of the regex and should be outside the quotes.

Moving the "r" outside, then removing the unnecessary single quotes in the format arguments, gives the following passing code:

def find_words(count, arg):
  # raw prefixes keep the backslashes in each piece literal
  regex = r"{}{}{}".format(r"\w{", count, r"}\w*")
  return re.findall(regex, arg)

Note that you will always see the doubled \\ when inspecting string variables. It's the Python shell's way of showing a literal backslash. The second backslash is not really there:

In [74]: count = 3

In [75]: regex = r"{}{}{}".format(r"\w{", count, r"}\w*")

In [76]: regex
Out[76]: '\\w{3}\\w*'

In [77]: print(regex)
\w{3}\w*

In [78]: len(regex)
Out[78]: 8
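The same check can be run as a plain script. What the shell echoes is the repr of the string, while print shows the actual characters. This sketch assumes count is 3, as in the session above.

```python
regex = r"{}{}{}".format(r"\w{", 3, r"}\w*")

print(repr(regex))  # what the shell echoes: backslashes shown escaped
print(regex)        # the actual characters in the string
print(len(regex))   # each backslash counts once
```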

December 22, 2015 3:15am

This worked for me:

def find_words(count,string):
  pattern = (r'\w{%d,}') % (count)
  return re.findall(pattern,string)

August 8, 2016 11:37am

I know I'm late to the party, but can someone explain one thing? I'm asking about the following answer because it seems to be the one the instructor is looking for. Before the string variable you have this: ',}', and I do not understand what the comma before the closing brace does. If you took out the count variable, which would include + str(count) + ', and replaced it with a whole number like 6, you'd have {6,}. Here is the full line I'm referring to. Thank you in advance for any help.

return re.findall(r'\w{'+ str(count) + ',}', string)

August 9, 2016 1:27am

When using curly braces "{ }" to specify a count range for the preceding regex token, the first number is the minimum and the second is the maximum allowed to match. If the second number is omitted it means "or more". So {6,} means 6 or more.
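A few quantifier forms side by side, using the example string from the challenge. The \b word boundaries in the second and third patterns are an addition for this sketch, so that a bounded range does not match just part of a longer word.

```python
import re

sample = "dog, cat, baby, balloon, me"

# {6,}  : six or more word characters
at_least_six = re.findall(r'\w{6,}', sample)

# {3,4} : between three and four word characters (whole words only)
three_to_four = re.findall(r'\b\w{3,4}\b', sample)

# {4}   : exactly four word characters (whole words only)
exactly_four = re.findall(r'\b\w{4}\b', sample)

print(at_least_six)
print(three_to_four)
print(exactly_four)
```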

August 9, 2016 12:12pm

That was driving me crazy, Chris. Thank you for the response!

November 27, 2016 7:55am

This is what I came up with. It passes, but I am not sure if this is the best way after reading some of the other posts. Code:

def find_words(count, string):
    return re.findall(r"\w+" * count, string)

Also, am I getting this right? \w matches any single Unicode word character in a string such as "dog, cat, baby, balloon, me". \w+ says there are one or more such characters. Multiplying \w+ by, say, 4 gives \w+\w+\w+\w+, which returns baby (exactly four characters) and balloon (four characters plus more).

November 27, 2016 7:11pm

Your regex works, but it loses performance because each unnecessary extra "+" causes extra backtracking during matching.

July 21, 2018 7:21pm

What if the count test went in as 7? Then it would be incorrect. You shouldn't have to explicitly set the count number.

Shawndee Boyd
Chris Freeman
Zachary Vacek
Chris Freeman
Julie Pham
Chris Freeman
Abdillah Hasny
Chris Freeman
Timothy Tseng
Chris Freeman
Timothy Tseng
Chris Freeman
Grzegorz Gancarczyk
Chris Freeman
Grzegorz Gancarczyk
hector villasano
Philippe Mitton
Chris Freeman
Katherina Kallis
Chris Freeman
Robert Erick
Chris Freeman
Mason Preusser
Chris Freeman
Mason Preusser
Chris Freeman
Abdillah Hasny
Chris Freeman
Mario Ibanez
Chris Freeman
luke hammer
Chris Freeman
Bikram Mann
Dana Kennedy
Chris Freeman
Dana Kennedy
Yuda Leh
Chris Freeman