• alley
    2020-01-04
    I followed your steps and installed bs4 and lxml separately with pip3, but I keep getting this error:
    File "D:\python\lib\site-packages\bs4\__init__.py", line 228, in __init__
        % ",".join(features))
    bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

    Author's reply: Try installing lxml with pip first.
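
    A minimal sketch of one way to sidestep this error while debugging the install. It assumes the script can fall back to Python's built-in "html.parser" when lxml is not importable from the interpreter that runs the script (one possible cause is that pip3 installed lxml for a different Python). The helper name pick_parser and the sample markup are hypothetical.

```python
from importlib.util import find_spec

from bs4 import BeautifulSoup


def pick_parser():
    """Return 'lxml' when the lxml package is importable, else the built-in 'html.parser'."""
    return "lxml" if find_spec("lxml") is not None else "html.parser"


# Hypothetical sample markup, only here to exercise the parser.
html = "<p class='title'><b>Hello</b></p>"

soup = BeautifulSoup(html, pick_parser())
print("parser in use:", pick_parser())
print(soup.p.b.string)
```

    If this prints "html.parser" even though lxml was installed, running the install through the same interpreter (python -m pip install lxml) may be worth trying.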

    
    
  • Metamorphosis
    2019-09-09
    Could you tell me how you got that html_doc into the script?

    Author's reply: html_doc was obtained by right-clicking on the web page and choosing "View page source".
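
    As a rough illustration of that reply: the copied page source can be pasted into a triple-quoted string and handed to BeautifulSoup. The markup below is a made-up stand-in, not the page used in the lesson, and "html.parser" is chosen only so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for source copied from the browser's "View page source".
html_doc = """
<html><head><title>Example page</title></head>
<body>
<p class="title"><b>Example heading</b></p>
<p class="story">Some text with a
<a href="http://example.com/a" id="link1">link</a>.</p>
</body></html>
"""

# Parse the pasted string exactly as if it had been downloaded.
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title.string)  # text inside <title>
print(soup.a["href"])     # href attribute of the first <a> tag
```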

    
    
  • 程序员人生
    2019-08-06
    from bs4 import BeautifulSoup
    import requests

    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8",
        "Connection": "close",
        "Cookie": "_gauges_unique_hour=1; _gauges_unique_day=1; _gauges_unique_month=1; _gauges_unique_year=1; _gauges_unique=1",
        "Referer": "http://www.infoq.com",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER"
    }

    #url = 'http://www.infoq.com/cn/news'
    url = 'http://news.baidu.com/'

    def craw(url):
        response = requests.get(url, headers=headers)

        soup = BeautifulSoup(response.text, 'lxml')

        for hotnews in soup.find_all('div', class_='hotnews'):
            for news in hotnews.find_all('a'):
                print(news.text,end=' ')
                print(news.get('href'))


    craw(url)

    I tried crawling a different web page.

    Author's reply: Nice. Any static page can be scraped this way.

    
    
  • 夜尽天明
    2019-07-25
    print(soup.p['class'])
    Why does this match the first one, title, rather than story?

    Author's reply: soup.p['class'] takes only the first p tag by default; soup.find_all('p') returns all the p tags.
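
    The difference can be sketched with a small, made-up document (the markup is an assumption, not the lesson's html_doc). Note that BeautifulSoup returns class as a list, because an element may carry several classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one "title" paragraph followed by two "story" paragraphs.
html_doc = (
    '<p class="title">The title</p>'
    '<p class="story">First story</p>'
    '<p class="story">Second story</p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# soup.p is shorthand for the first <p> in the document.
first_class = soup.p["class"]

# find_all returns every <p>, so each tag's class can be read in turn.
all_classes = [p["class"] for p in soup.find_all("p")]

# To reach a "story" paragraph directly, filter by class instead.
first_story = soup.find("p", class_="story")

print(first_class)
print(all_classes)
print(first_story.get_text())
```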

    
    
  • 100执行%
    2018-10-18
    Installing bs4 with pip3 fails, although installing requests worked fine earlier. The error output is below:
    Exception:
    Traceback (most recent call last):
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\urllib3\response.py", line 331, in _error_catcher
        yield
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\urllib3\response.py", line 413, in read
        data = self._fp.read(amt)
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\cachecontrol\filewrapper.py", line 62, in read
        data = self.__fp.read(amt)
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\http\client.py", line 449, in read
        n = self.readinto(b)
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\http\client.py", line 493, in readinto
        n = self.fp.readinto(b)
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\socket.py", line 586, in readinto
        return self._sock.recv_into(b)
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\ssl.py", line 1009, in recv_into
        return self.read(nbytes, buffer)
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\ssl.py", line 871, in read
        return self._sslobj.read(len, buffer)
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\ssl.py", line 631, in read
        v = self._sslobj.read(len, buffer)
    socket.timeout: The read operation timed out

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\site-packages\pip\_internal\cli\base_command.py", line 143, in main
        status = self.run(options, args)
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\site-packages\pip\_internal\commands\install.py", line 318, in run
        resolver.resolve(requirement_set)
      File "c:\users\acadsoc\appdata\local\programs\python\python36\lib\site-package


    Author's reply: Try downloading the offline package and installing from that. The error output above does not state a clear cause.

    
    
  • ol #丁
    2018-07-27
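    Where can I download the bs4 library? Is there a URL you could share? Thanks 🙏. Where do people usually download the libraries used for crawlers?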

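    Author's reply: They are all installed with the pip tool. pip was upgraded recently, so upgrade pip itself before installing any packages.
    Run pip install --upgrade pip to update to the latest version, then run pip install bs4 to download and install the bs4 library automatically over the network.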
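
    After running those pip commands, a quick check like the following sketch can confirm that the modules are visible to the interpreter running the script. The helper name missing_modules and the list of names are illustrative assumptions; only the standard library is used.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Import names mentioned in this thread; the list itself is just an illustration.
wanted = ["bs4", "lxml", "requests"]

missing = missing_modules(wanted)
if missing:
    print("Not installed for this interpreter:", ", ".join(missing))
else:
    print("All modules are importable.")
```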

    
    