Need help with the regular expression.

Question

When I run the script below, it complains about the string pattern:

import urllib.request
import re
from re import findall
page = urllib.request.urlopen('https://sonar.com/prod/drilldown/measures/1274530?metric=coverage')
data = page.read()
print(page.read())
html = page.read()
htmlStr = html.decode()
print(data)
print(re.findall(r'[coverag]{8}', data))

Traceback (most recent call last):
  File "C:\Users\a530614\Documents\pythons\script.py", line 10, in <module>
    print(re.findall(r'\b[coverag]\b', data))
  File "C:\Users\a530614\python\lib\re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object

But when I use a bytes pattern (the b prefix), it works for me:

import urllib.request
import re
from re import findall
page = urllib.request.urlopen('https://sonar.com/prod/drilldown/measures/1274530?metric=coverage')
data = page.read()
print(page.read())
html = page.read()
htmlStr = html.decode()
print(data)
print(re.findall(b'[coverag]{8}', data))

What am I missing here?

[MOD: added ``` markdown formatting -cf]

Answer 1 · 2016-01-13T19:27:59Z


urllib.request.urlopen returns an HTTPResponse object. Its .read() method returns a bytes object. Looking at it interactively:

$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request

# Get page
>>> page = urllib.request.urlopen('https://teamtreehouse.com/')
>>> type(page)
<class 'http.client.HTTPResponse'>

# Get data
>>> data = page.read()
>>> type(data)
<class 'bytes'>

A bytes object is different from a str object. Using re on a bytes object requires a bytes pattern. The "b" prefix makes the literal a bytes object instead of a str. You can also combine it with "r" (for example, rb'\d+') to write a raw bytes pattern.

# Using data from above
>>> import re
>>> re.findall(b'Treehouse', data)
[b'Treehouse', b'Treehouse', b'Treehouse', b'Treehouse', b'Treehouse', b'Treehouse']

# Trying again with a regular string pattern fails
>>> re.findall('Treehouse', data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/re.py", line 210, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
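
If you would rather work with ordinary strings, another option is to decode the bytes first and then use a str pattern. Here is a minimal sketch of both approaches. The sample data is a made-up stand-in for what page.read() returns, and UTF-8 is an assumption about the page's encoding:

```python
import re

# Hypothetical stand-in for the bytes returned by page.read()
data = b'<td>coverage</td><td>87.5%</td><td>coverage</td>'

# Option 1: keep the data as bytes and use a raw bytes pattern (rb prefix)
bytes_matches = re.findall(rb'\bcoverage\b', data)

# Option 2: decode to str first (assumes UTF-8), then use a str pattern
text = data.decode('utf-8')
str_matches = re.findall(r'\bcoverage\b', text)

print(bytes_matches)  # [b'coverage', b'coverage']
print(str_matches)    # ['coverage', 'coverage']
```

Two things to watch for in the original script: [coverag]{8} matches any run of eight characters drawn from that set, not specifically the word "coverage", and calling page.read() a second time returns empty bytes because the response body has already been consumed, so read once and reuse the result.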

Post back if you need more help.


Rajesh Tupakula


Chris Freeman
