Python Coding Help - Finding BeautifulSoup

Posted on
Tue Jul 02, 2019 8:03 am
stanleykrasnow offline
Posts: 28
Joined: Dec 10, 2018

Python Coding Help - Finding BeautifulSoup

I am trying to use Indigo to schedule a Python script that uses BeautifulSoup to scrape a website. I write and maintain the script in Anaconda / Spyder (Python 2.7). The script works in the Spyder environment, but fails when run in Indigo.

Under Indigo, the script cannot find BeautifulSoup. It fails with "embedded script, line 57, at top level ImportError: No module named bs4", even though I add the Anaconda package directory before the import with: sys.path.append ("/anaconda2/lib/python2.7/site-packages"). Can someone please help me get to BeautifulSoup? Line 57 is: from bs4 import BeautifulSoup # to parse the HTML on a web page
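One thing worth ruling out is whether the appended directory exists at all and actually contains a bs4 package. The sketch below is only a diagnostic under stated assumptions: find_package_dir is a hypothetical helper name, and the directory is the one quoted in the post, which may or may not be where Anaconda lives on a given Mac.

```python
from __future__ import print_function
import os
import sys


def find_package_dir(package_name, search_dirs):
    """Return the first directory in search_dirs that holds package_name, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, package_name)
        # A regular package is a folder with an __init__.py inside it.
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return directory
    return None


# Path copied from the post; its existence on any given machine is an assumption.
extra_dir = "/anaconda2/lib/python2.7/site-packages"
print("directory exists:", os.path.isdir(extra_dir))
print("bs4 found in:", find_package_dir("bs4", [extra_dir] + sys.path))
```

If the first line prints False, the sys.path.append call is adding a path that is not there, which would explain why the import still fails.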

Thank you,

Stan Krasnow

My Python code is:

Code:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 29 14:46:27 2019

@author: XXX
"""

#  Scrape selected data from CeleriusVI_URL
#
#  Two key websites:
#  https://teamtreehouse.com/community/help-with-pythons-beautiful-soup-html-question
#  https://martechwithme.com/introduction-to-web-scraping-with-python-extracting-data-from-a-page/
#
import sys
# sys.path.append ("/Library/Python/2.7/site-packages")
from indigo_attachments import log_or_print as lop
lop ("Starting Celerius VI Scraping")
#
from random import randint
USER_AGENTS = [
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/57.0.2987.110 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/61.0.3163.79 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) '
     'Gecko/20100101 '
     'Firefox/55.0'),  # firefox
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/61.0.3163.91 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/62.0.3202.89 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/63.0.3239.108 '
     'Safari/537.36'),  # chrome
]

random_agent_count = randint(0, (len(USER_AGENTS)-1))
import requests                 #  to fetch a web page
from datetime import datetime   #  to get today's date for email
#  ###########################################################################
#  anaconda2 ▸ pkgs ▸ beautifulsoup4-4.7.1-py27_1 ▸ lib ▸ python2.7 ▸ site-packages  done below
#  sys.path.append ("/anaconda2/pkgs/beautifulsoup4-4.7.1-py27_1/lib/python2.7/site-packages")
#  anaconda2 ▸ lib ▸ python2.7 ▸ site-packages   done below
#  sys.path.append ("/anaconda2/lib/python2.7/site-packages")
#  ###########################################################################
sys.path.append ("/anaconda2/lib/python2.7/site-packages")
from bs4 import BeautifulSoup   #  to parse the HTML on a web page
import smtplib                  #  to send an email
#
headers = {'User-Agent': USER_AGENTS[random_agent_count]}  #  try to avoid scraping defences
CeleriusVI_URL = "https://www.bloomberg.com/quote/ACELVIP:LX"
request = requests.get(CeleriusVI_URL, headers=headers)    # First "fetch" the URL using requests
content = request.content                                  # Then retrieve the content

# Give the web content to BeautifulSoup
soup = BeautifulSoup(content, 'lxml')
#  Get the as of date
#         <span class="fieldLabel__9f45bef7"><span>Total Assets (M EUR) (On 06/28/2019)</span></span>
#         <span>Total Assets (M EUR) (On 06/28/2019)</span>
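#  Note: the original attrs="Total Assets (M EUR)" argument is dropped here. bs4 treats a
#  non-dict attrs as a class filter and class_ overrides it, so it never affected the result.
find_CeleriusVI_as_of_date = soup.find_all("span", class_="fieldLabel__9f45bef7")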
CeleriusVI_HTML_index = 6         #  trial and error index of SPANs
find_text = find_CeleriusVI_as_of_date[CeleriusVI_HTML_index].text
#  lop (str(find_CeleriusVI_as_of_date) + "XXX")
CeleriusVI_as_of_date = "o" + find_text[6:]                #  clean-up return from BS; add lower case o in the word On
CeleriusVI_as_of_date = CeleriusVI_as_of_date[:-1]         #  clean-up return from BS;
#  Get the Celerius Current Asset Value
#  <span class="fieldValue__2d582aa7">82.348</span>
find_CeleriusVI_Value = soup.find_all("span", class_ = "fieldValue__2d582aa7")
CeleriusVI_HTML_index = 7          #  trial and error index of SPANs
CeleriusVI_Value = find_CeleriusVI_Value[CeleriusVI_HTML_index].text

euro = u"€"                                                 #  unicode character for printing
lop (CeleriusVI_as_of_date + " the value of Celerius VI is " + euro + CeleriusVI_Value + "m")
lop ("End of Celerius VI Scraping")
#
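The date clean-up in the script relies on fixed slice offsets (find_text[6:] and [:-1]), which break as soon as the label wording shifts. A regular expression is one possible alternative. This is a minimal sketch, assuming the label keeps the "(On MM/DD/YYYY)" form shown in the script's own comments; AS_OF_PATTERN and extract_as_of_date are hypothetical names, not part of the original script.

```python
import re

# Matches "(On MM/DD/YYYY)" as it appears in the label quoted in the script's comments.
AS_OF_PATTERN = re.compile(r"\(On (\d{2}/\d{2}/\d{4})\)")


def extract_as_of_date(label_text):
    """Return the MM/DD/YYYY string found in a label, or None if there is none."""
    match = AS_OF_PATTERN.search(label_text)
    return match.group(1) if match else None


# Sample label copied from the comment in the script above.
label = "Total Assets (M EUR) (On 06/28/2019)"
print(extract_as_of_date(label))  # prints 06/28/2019
```

Returning None when nothing matches lets the caller log a clear message instead of emailing a mangled date.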

Posted on
Tue Jul 02, 2019 9:17 am
FlyingDiver offline
Posts: 7189
Joined: Jun 07, 2014
Location: Southwest Florida, USA

Re: Python Coding Help - Finding BeautifulSoup

Do "sudo pip install bs4" in Terminal so that the library gets installed where Python expects it to be.
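With more than one Python on the machine, a bare "pip" may belong to a different interpreter than the one that runs the script. One way to remove the ambiguity is to invoke pip as a module of a specific interpreter. The sketch below only builds and prints such a command; pip_install_command is a hypothetical helper, and it assumes pip is installed for the interpreter in question.

```python
from __future__ import print_function
import sys


def pip_install_command(package, interpreter=None):
    """Build a pip command bound to one specific interpreter."""
    # Default to the interpreter running this code (an assumption about intent).
    interpreter = interpreter or sys.executable
    return [interpreter, "-m", "pip", "install", package]


# Show the command for the current interpreter without running it.
print(" ".join(pip_install_command("bs4")))

# To actually install, pass the list to subprocess.check_call (may need sudo).
```

Running the printed command from Terminal installs the package into that interpreter's own site-packages rather than into whichever Python happens to own the pip on the PATH.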

Also, please use CODE tags when posting Python code.

joe (aka FlyingDiver)
my plugins: http://forums.indigodomo.com/viewforum.php?f=177

Posted on
Tue Jul 02, 2019 9:41 am
jay (support) offline
Site Admin
Posts: 18200
Joined: Mar 19, 2008
Location: Austin, Texas

Re: Python Coding Help - Finding BeautifulSoup

You seem to have multiple copies of Python installed, which is known to cause path issues with Indigo. Specifically, you're attempting to point to /anaconda2/lib/python2.7/site-packages which is not a standard python path for the built-in python installation on macOS (and python2 is not the standard name of the python 2.7 executable in macOS).
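To see which installation is actually in play, it can help to run the same few lines once in Spyder and once in the environment where the import fails, then compare the output. This is a rough diagnostic sketch using only the standard library; describe_interpreter is a hypothetical name.

```python
from __future__ import print_function
import sys


def describe_interpreter():
    """Collect the facts that distinguish one Python install from another."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "prefix": sys.prefix,
        "path": list(sys.path),
    }


info = describe_interpreter()
print("executable:", info["executable"])
print("version:   ", info["version"])
print("prefix:    ", info["prefix"])
for entry in info["path"]:
    print("  sys.path entry:", entry)
```

If the prefix differs between the two runs, the script is being executed by two different installs, each with its own site-packages.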

Jay (Indigo Support)

Posted on
Tue Jul 02, 2019 9:42 am
FlyingDiver offline
Posts: 7189
Joined: Jun 07, 2014
Location: Southwest Florida, USA

Re: Python Coding Help - Finding BeautifulSoup

If all you need to do is run the script on a schedule, why not use cron? Much simpler than trying to make this work in Indigo.

joe (aka FlyingDiver)
my plugins: http://forums.indigodomo.com/viewforum.php?f=177

Posted on
Tue Jul 02, 2019 9:47 am
jay (support) offline
Site Admin
Posts: 18200
Joined: Mar 19, 2008
Location: Austin, Texas

Re: Python Coding Help - Finding BeautifulSoup

FlyingDiver wrote:
If all you need to do is run the script on a schedule, why not use cron? Much simpler than trying to make this work in Indigo.


That's definitely a matter of opinion! :lol: :D

Jay (Indigo Support)

Posted on
Tue Jul 02, 2019 9:48 am
FlyingDiver offline
Posts: 7189
Joined: Jun 07, 2014
Location: Southwest Florida, USA

Re: Python Coding Help - Finding BeautifulSoup

jay (support) wrote:
FlyingDiver wrote:
If all you need to do is run the script on a schedule, why not use cron? Much simpler than trying to make this work in Indigo.


That's definitely a matter of opinion! :lol: :D


Well, the scheduling part is harder, but whether it can be made to work in the standard Python environment remains to be seen...

joe (aka FlyingDiver)
my plugins: http://forums.indigodomo.com/viewforum.php?f=177

Posted on
Tue Jul 02, 2019 10:14 am
jay (support) offline
Site Admin
Posts: 18200
Joined: Mar 19, 2008
Location: Austin, Texas

Re: Python Coding Help - Finding BeautifulSoup

Actually, it just occurred to me that if he made the script executable (chmod +x name_of_script.py), he could run it as-is from the Run Shell Script action, where it would run under his own python2 install. He would have to put the full path to his python2 executable in the shebang line rather than use the env command, since his python2 install is unlikely to be on the default path:

Code:
#!/full/path/to/custom/python2


I still believe, however, that sorting out the multiple Python installs is the way to avoid this and other issues in the future.
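For the shebang route, the full interpreter path can be read from the interpreter itself instead of being guessed. The sketch below is run with the Python that can already import bs4 (for example from Spyder) and prints a ready-made shebang line; it also shows a Python equivalent of adding execute permission. shebang_line and make_executable are hypothetical helper names.

```python
from __future__ import print_function
import os
import stat
import sys


def shebang_line(interpreter_path):
    """Return a shebang line that names the interpreter by its full path."""
    return "#!" + interpreter_path


def make_executable(script_path):
    """Add execute permission for user, group and others (similar to chmod +x)."""
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Run this with the interpreter that works to learn the line to paste at the top.
print(shebang_line(sys.executable))
```

The printed line replaces the "#!/usr/bin/env python2" line at the top of the script, and make_executable("name_of_script.py") is an alternative to running chmod by hand.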

Jay (Indigo Support)
