一、不同浏览器使用

1. chrome

需要安装chrome对应版本的chromedriver

from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--mute-audio")            # 不开声音
# 设置代理
options.add_argument('--proxy-server=http://127.0.0.1:8118')
options.add_argument('--ignore-certificate-errors')   # 忽略证书错误请求
# options.add_argument('--headless')
# options.add_argument('--disable-gpu')
# options.add_argument('--no-sandbox')
# options.add_argument('window-size=800x600')

driver = webdriver.Chrome(options=options)      # 初始化一个driver
...
driver.quit()                                   # 退出一个driver

二、界面实例操作

1. 窗口操作

driver.set_window_size(760, 730)        # 设置窗口大小
print(driver.get_window_size())         # 获取当前窗口大小
driver.maximize_window()                # 最大化窗口

driver.get("https://www.baidu.com")     # 开启窗口访问网页

print(driver.title)                     # 当前窗口的标题

2. 标签页操作

# 打印所有的标签页
>>> print(driver.window_handles)
['CDwindow-E0BA79545D54D4F0C2397673A8BCE22F']
# 切换到特定的标签页
>>> driver.switch_to.window(driver.window_handles[1])

3. 查找界面元素

主要使用find_element()和find_elements()
其中find_element()返回第一个
find_elements()返回列表

from selenium import webdriver
from selenium.webdriver.common.by import By

item = driver.find_element(By.CLASS_NAME,"item-list")
item_list = driver.find_elements(By.CLASS_NAME,"item-list")

几种类型见下

3.1. XPATH

(1) 标签内内容

1	<button type="button">Geeks For Geeks</button>

可以只输入一半进行匹配

1	button = driver.find_element(By.XPATH, "//button[contains(text(), 'Geeks for Geeks')]")

(2) 标签特定属性匹配

1	<button type="button" class="test">Geeks For Geeks</button>

# 只选一个匹配
button = driver.find_element(By.XPATH, "//button[@type='button']")
# 与关系匹配
button = driver.find_element(By.XPATH, "//button[@type='button' and @class='test']")

(3) 层级匹配

<html>
 <body>
  <div>
   <button type="button" class="test">Geeks For Geeks</button>
   <button type="button1" class="test">Geeks For Geeks</button>
  </div>
 </body>
</html>

1	button = driver.find_element(By.XPATH, "/html/body/div/button[@type='button']")

(4) 获取父级

<html>
 <body>
  <div>
   <button type="button" class="test">Geeks For Geeks</button>
   <button type="button1" class="test">Geeks For Geeks</button>
  </div>
 </body>
</html>

1 2	button = driver.find_element(By.XPATH, "//button[@type='button']") div_par = button.find_element(By.XPATH, "..")

3.2. CLASS_NAME

1	<xxx class='abc def'></xxx>

1
2
3

# 可以使用abc也可以使用def
xxx = driver.find_element(By.CLASS_NAME, 'abc')
xxx = driver.find_element(By.CLASS_NAME, 'def')

4. 等待页面加载完成

4.1. 根据特定标签元素进行判断

引入WebDriverWait类，超出timeout后抛出异常
默认探测步长0.5s

class WebDriverWait(object):
    def __init__(self, driver, timeout, poll_frequency=POLL_FREQUENCY, ignored_exceptions=None):
        """Constructor, takes a WebDriver instance and timeout in seconds.

           :Args:
            - driver - Instance of WebDriver (Ie, Firefox, Chrome or Remote)
            - timeout - Number of seconds before timing out
            - poll_frequency - sleep interval between calls
              By default, it is 0.5 second.
            - ignored_exceptions - iterable structure of exception classes ignored during calls.
              By default, it contains NoSuchElementException only.

           Example::

            from selenium.webdriver.support.wait import WebDriverWait \n
            element = WebDriverWait(driver, 10).until(lambda x: x.find_element(By.ID, "someId")) \n
            is_disappeared = WebDriverWait(driver, 30, 1, (ElementNotVisibleException)).\\ \n
                        until_not(lambda x: x.find_element(By.ID, "someId").is_displayed())
        """
        ...

使用until()和until_not()两个方法进行判断

# 10s内method成功就返回，否则抛出异常
WebDriverWait(driver, 10).until(method，message="")
# 10s内method不成功就返回，否则抛出异常
WebDriverWait(driver, 10).until_not(method，message="")

结合expected_conditions来进行判断

expected_conditions类提供的预期条件判断的方法

方法	说明
title_is	判断当前页面的 title 是否完全等于（==）预期字符串，返回布尔值
title_contains	判断当前页面的 title 是否包含预期字符串，返回布尔值
presence_of_element_located	判断某个元素是否被加到了 dom 树里，并不代表该元素一定可见
visibility_of_element_located	判断元素是否可见（可见代表元素非隐藏，并且元素宽和高都不等于 0）
visibility_of	同上一方法，只是上一方法参数为locator，这个方法参数是定位后的元素
presence_of_all_elements_located	判断是否至少有 1 个元素存在于 dom 树中。举例：如果页面上有 n 个元素的 class 都是’wp’，那么只要有 1 个元素存在，这个方法就返回 True
text_to_be_present_in_element	判断某个元素中的 text 是否包含了预期的字符串
text_to_be_present_in_element_value	判断某个元素中的 value 属性是否包含了预期的字符串
frame_to_be_available_and_switch_to_it	判断该 frame 是否可以 switch进去，如果可以的话，返回 True 并且 switch 进去，否则返回 False
invisibility_of_element_located	判断某个元素中是否不存在于dom树或不可见
element_to_be_clickable	判断某个元素中是否可见并且可点击
staleness_of	等某个元素从 dom 树中移除，注意，这个方法也是返回 True或 False
element_to_be_selected	判断某个元素是否被选中了,一般用在下拉列表
element_selection_state_to_be	判断某个元素的选中状态是否符合预期
element_located_selection_state_to_be	跟上面的方法作用一样，只是上面的方法传入定位到的 element，而这个方法传入 locator
alert_is_present	判断页面上是否存在 alert

实例

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By

# 20s内等待<xxx class='item-list'>加载到dom树后获取元素#设置等待
wait = WebDriverWait(driver, 20)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "item-list")))

# 自定义判断
wait.until(lambda diver: driver.find_element(By.CLASS_NAME, "item-list"))

5. 鼠标移动点击

from selenium.webdriver.common.action_chains import ActionChains

ActionChains(driver).move_by_offset(400, 300).perform()     # 鼠标移动到 (400, 300)
ActionChains(driver).click().perform()                      # 鼠标左键点击

# 鼠标移动到元素上面，针对部分悬浮显示的情况
ActionChains(driver).move_to_element(element).perform()

6. 截图

1	driver.get_screenshot_as_file('test.png')

7. 获取标签的属性

1	<img src='/a/b/c.jpg' class='abc'>

1	>>> driver.find_element(By.CLASS_NAME, 'abc').

8. iframe处理

碰到iframe的时候，无法进行iframe内部的元素查找操作，需要进行切换

1	<iframe id='dpa' ...>...</iframe>

1	driver.switch_to.frame('dpa')

9. 页面缩放

1 2	# 大于1就是放大，小于1缩小 >>> driver.execute_script("document.body.style.zoom='1.5'")

10. 表单提交

遇到使用<form>标签的
对<input type="submit">元素无法点击，可以使用

1 2	>>> tmp = driver.find_element(By.CLASS_NAME, value='submit-button') >>> tmp.submit()

11. 执行js脚本

11.1. 语法

使用arguments[n]可以取后面的web_element使用

1	driver.execute_script('arguments[0].parentNode.removeChild(arguments[0])', tmp)

11.2. 示例1: 删除一个标签

1 2	tmp = driver.find_element(By.CLASS_NAME, 'submit-button') driver.execute_script('arguments[0].parentNode.removeChild(arguments[0])', tmp)

三、请求相关

1. seleniumwire获取请求头部信息

selenium不支持获取请求头部信息了，需要使用seleniumwire包

1	pip install selenium-wire

获取请求头和响应头部信息

from seleniumwire import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--no-sandbox')

driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 60)
driver.maximize_window()
driver.get("https://www.baidu.com")
for re in driver.requests:
    req = dict(re.headers)
    res = dict(re.response.headers)
    print(req)
    print(res)

技术的路上奔跑

python selenium使用记录

一、不同浏览器使用

1. chrome

二、界面实例操作

1. 窗口操作

2. 标签页操作

3. 查找界面元素

3.1. XPATH

(1) 标签内内容

(2) 标签特定属性匹配

(3) 层级匹配

(4) 获取父级

3.2. CLASS_NAME

4. 等待页面加载完成

4.1. 根据特定标签元素进行判断

实例

5. 鼠标移动点击

6. 截图

7. 获取标签的属性

8. iframe处理

9. 页面缩放

10. 表单提交

11. 执行js脚本

11.1. 语法

11.2. 示例1: 删除一个标签

三、请求相关

1. seleniumwire获取请求头部信息