• Zhaohong
    2023-04-25, from Beijing
    How should the max_workers parameter of a multithreaded function be set? Is bigger always better?

    Author's reply: There is no recommended value for max_workers, because threads differ in how much work they do, how long they run, and, most importantly, in the performance of the CPU they run on; all of these change the right setting, so it has to be chosen for the actual workload. For example, in a previous job I used nginx as the web server, which is a typical multi-process, multi-threaded server. The number of processes usually matches the number of logical CPUs. When the backend served PHP pages that were mostly simple pages and images, each core could run 10 threads; the threads took turns handling requests and users felt no lag. When the short-video business took off and the backend became a short-video server, we reduced the thread count, because each request and response took longer and the work per request was "heavier"; fewer threads kept users from noticing lag and kept the server's CPU usage below 80%. That is the basic method of sizing the thread count for different business logic. Put simply: a spider has one mouth and eight legs. If every leg holds a spoon, you have to weigh the size of the mouth, the size of the spoons, and how much food is in each spoon, so the spider stays full but never idle, the mouth can always hold what arrives, and the spoons are never empty.
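    A minimal Python sketch of that sizing rule (the worker counts, the handle_request placeholder, and the task list are illustrative assumptions, not from the reply): I/O-bound work such as serving small pages can tolerate several threads per logical CPU, while CPU-bound work gains little from having more threads than cores.

    import os
    from concurrent.futures import ThreadPoolExecutor

    logical_cpus = os.cpu_count() or 1

    # "Light", I/O-bound requests: several threads per core keep the CPU busy
    # while most threads are waiting on the network or disk.
    io_bound_workers = logical_cpus * 10
    # "Heavy", CPU-bound requests: more threads than cores mostly adds overhead.
    cpu_bound_workers = logical_cpus

    def handle_request(task):
        pass  # placeholder for the real per-request work

    tasks = range(100)  # placeholder task list
    with ThreadPoolExecutor(max_workers=io_bound_workers) as executor:
        executor.map(handle_request, tasks)

    For reference, ThreadPoolExecutor's own default since Python 3.8 is min(32, os.cpu_count() + 4), a compromise aimed at I/O-bound work.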

    
    1
  • 有点怀旧
    2023-04-12, from Henan
    With this library, how do I use a lock when a shared variable needs to be protected?

    Author's reply: First import the threading library, then instantiate a lock object; acquire the lock before touching the shared variable, and release it once you are past the code where lock conflicts must be avoided. Concretely:

    import threading

    lock = threading.Lock()  # create a lock object
    sum = 0                  # shared variable (initialize it before the loop)
    for i in range(1, 100):
        lock.acquire()   # acquire the lock
        sum += 1
        lock.release()   # release the lock
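    A slightly fuller, runnable sketch of the same idea, where the lock protects a counter that several pool threads increment concurrently (the counter name, thread count, and iteration count are illustrative):

    import threading
    from concurrent.futures import ThreadPoolExecutor

    lock = threading.Lock()   # one lock shared by all threads
    counter = 0               # shared variable the lock protects

    def increment(times):
        global counter
        for _ in range(times):
            with lock:        # acquired on entry, released on exit, even on error
                counter += 1

    with ThreadPoolExecutor(max_workers=4) as executor:
        for _ in range(4):
            executor.submit(increment, 10_000)

    print(counter)  # 40000 every run; without the lock the updates can race and come up short

    The with lock: form is equivalent to calling acquire() and release() by hand, but it cannot forget the release if the body raises.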

    
    
  • Matthew
    2023-01-17, from Jiangsu
    import concurrent.futures
    import urllib.request
    import ssl

    # Dictionary of URLs, entries: (site name: url)
    url_records = {'极客时间': 'https://time.geekbang.org/',
                   '百度': 'https://www.baidu.com',
                   '京东': 'https://www.jd.com',
                   '淘宝': 'https://www.taobao.com',
                   '天猫': 'https://www.tmall.com'}

    # Fetch a URL and return the contents of its home page
    def load_url(url, timeout):
        context = ssl._create_unverified_context()  # the URLs use https
        with urllib.request.urlopen(url, timeout=timeout, context=context) as conn:
            return conn.read()

    # Run the requests in parallel through a thread pool and write each page to "<site name>.txt"
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_url = {executor.submit(load_url, url_record[1], 60): url_record
                         for url_record in list(url_records.items())}
        for future in concurrent.futures.as_completed(future_to_url):
            url_name = future_to_url[future][0]
            url_path = future_to_url[future][1]
            try:
                data = future.result().decode('utf-8')
            except Exception as exc:
                print('%r generated an exception: %s' % (url_path, exc))
            else:
                print('%r page is %d bytes' % (url_path, len(data)))
                with open(f'{url_name}.txt', 'w') as f:
                    f.write(data)
    
    1
  • Geek_Mike
    2023-08-29, from Yunnan
    # Use the concurrent-task model to fetch 5 websites at the same time and save each page to its own file
    import requests
    import concurrent.futures as cfs

    urls = ['http://www.baidu.com', 'https://www.taobao.com', 'https://www.jd.com',
            'https://www.tmall.com', 'https://cn.bing.com']

    def get_pages(url, timeout):
        response = requests.get(url=url, timeout=timeout)
        if response.status_code == 200:
            return response.text
        else:
            return None

    with cfs.ThreadPoolExecutor(max_workers=5) as exe:
        futures_dict = {exe.submit(get_pages, url, 10): url for url in urls}
        for future in cfs.as_completed(futures_dict):
            url = futures_dict[future]
            try:
                data = future.result()
            except Exception as exc:
                print(f'{url} raised an exception: {exc}')
            else:
                if data is None:
                    print(f'{url} returned a non-200 status, nothing saved')
                    continue
                file_path = './' + url.split('/')[-1] + '.html'
                with open(file_path, 'w') as f:
                    f.write(data)
                print(f'{url} saved, {len(data)} characters')
    
    
  • Cy23
    2023-01-29, from Liaoning
    Learned from the other students' solutions and wrote my own:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    import urllib.request

    URLS = [
        'https://time.geekbang.org',
        'https://www.taobao.com/',
        'https://www.jd.com',
        'https://leetcode.cn',
        'https://www.zhihu.com'
    ]

    def load_url(url, timeout):
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return conn.read()

    with ThreadPoolExecutor(max_workers=5) as executor:
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for i, future in enumerate(as_completed(future_to_url)):
            url = future_to_url[future]
            try:
                data = future.result().decode('utf-8')
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                # assumes the ./84/ directory already exists
                with open(f'./84/{i+1}.txt', 'a', encoding='utf-8') as f:
                    f.write(data)
                print('%r page is saved' % (url))
    
    
  • PatrickL
    2023-01-11, from Shanghai
    import urllib.request
    import concurrent.futures
    from lxml import etree
    import os

    URLS = ['https://time.geekbang.org', 'https://www.csdn.net/', 'https://www.jd.com',
            'https://leetcode.cn', 'https://www.zhihu.com']

    def load_url(url, timeout):
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return conn.read()

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for i, future in enumerate(concurrent.futures.as_completed(future_to_url)):
            url = future_to_url[future]
            try:
                data = future.result().decode('utf-8')
            except Exception as exc:
                print(f'{url} generated an exception: {exc}')
            else:
                # extract the page title and save it to a numbered file
                rst = etree.HTML(data).xpath('//head/title/text()')[0]
                print(f'The title of {url} page is "{rst}".')
                with open(f'{i+1}.txt', 'w') as f:
                    f.write(rst)
                # os.remove(f'{i+1}.txt')
    
    