Beautiful Soup → makes it easy to find elements in an HTML document's DOM
1. Installing BeautifulSoup
① Website
Beautiful Soup Documentation (Beautiful Soup 4.4.0): beautiful-soup-4.readthedocs.io
② Installation
In a terminal → pip install beautifulsoup4
③ Result
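One way to confirm that the installation worked is to import the package and print its version. The exact version string depends on what pip installed.

```python
import bs4

# bs4 exposes its installed version as a module attribute
print(bs4.__version__)
```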
2. Examples
- Example 1: sample source, then finding a tags (href) and p tags
from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Parse html_doc with Python's built-in HTML parser
soup = BeautifulSoup(html_doc, 'html.parser')
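To pull the a tags (with their href values) and the p tags out of this document, find_all() can be combined with Tag.get(). A minimal sketch, reusing the same html_doc; the whitespace normalization on the paragraph text is just one reasonable choice.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Collect the href attribute of every <a> tag (get() returns None if it is missing)
links = [a.get("href") for a in soup.find_all("a")]
for href in links:
    print(href)

# Collect the text of every <p> tag, collapsing runs of whitespace into single spaces
paragraphs = [" ".join(p.get_text().split()) for p in soup.find_all("p")]
for text in paragraphs:
    print(text)
```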
- Example 2: finding elements
① Finding by tag name
m_body = soup.body
# print(m_body) - finds the <body> tag
m_h1 = soup.h1
# print(m_h1) - finds the first <h1> tag (None for html_doc, which has no <h1>)
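Dot access like soup.body always returns the first tag with that name, or None when there is none. A short sketch on the same sample document:

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title)         # the whole <title> tag
print(soup.title.string)  # just its text: The Dormouse's story
print(soup.p["class"])    # first <p>; class is multi-valued, so this is the list ['title']
print(soup.h1)            # no <h1> in html_doc, so None
```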
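② Finding by attribute
m_link = soup.find(href="http://example.com/elsie")
# print(m_link) - finds the first tag whose href attribute equals "http://example.com/elsie"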
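Attribute filters are not limited to exact strings. A sketch of three other forms that find()/find_all() accept: a compiled regular expression, True (the attribute just has to exist), and an attrs dictionary.

```python
import re
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# A regular expression is matched with search() against the attribute value
lacie_or_tillie = soup.find_all(href=re.compile(r"lacie|tillie"))

# True matches every tag that has the attribute, whatever its value
tags_with_id = soup.find_all(id=True)

# attrs={...} works for any attribute name, including ones that are not valid Python identifiers
story_paragraphs = soup.find_all(attrs={"class": "story"})

print([a["id"] for a in lacie_or_tillie])
print(len(tags_with_id), len(story_paragraphs))
```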
③ Finding by class
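Because class is a reserved word in Python, Beautiful Soup takes the keyword argument class_ instead. A minimal sketch on the sample document, with the equivalent CSS-selector search for comparison:

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# All <a> tags whose class list contains "sister"
sisters = soup.find_all("a", class_="sister")
names = [a.get_text() for a in sisters]
print(names)

# The same search written as a CSS selector
same_sisters = soup.select("a.sister")
```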
④ Finding by id
# Finds the first element with id="hello"; html_doc has none, so this prints None
m_hello = soup.find(id="hello")
print(m_hello)
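Against the sample document, an id that does exist (link2) returns the tag itself. Since find() gives None when nothing matches, it is worth checking before reading attributes. A short sketch:

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# id values are normally unique, so find() (first match) is the natural call
lacie = soup.find(id="link2")
if lacie is not None:
    print(lacie.get_text(), lacie["href"])

# No element has id="hello", so find() returns None instead of raising
missing = soup.find(id="hello")
if missing is None:
    print('no element with id="hello"')
```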
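⑤ CSS selectors
# Select the first <img> inside a .panel_inner element inside the element with id="focusPanelCenter"
target = soup.select_one("#focusPanelCenter .panel_inner img")
#print(target)
# select_one() returns None when nothing matches (html_doc above has no such element), so check first
if target is not None:
    title = target["alt"]  # the image's alt text
    print(title)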
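The selector above is written for markup that is not in html_doc. The same two calls on the sample document: select_one() returns the first match or None, and select() returns a list of every match.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# ">" means direct child: the first <a> directly inside a <p class="story">
first_link = soup.select_one("p.story > a")

# "#link3" selects by id
third_link = soup.select_one("#link3")

# Every <a class="sister"> anywhere inside a <p class="story">
all_story_links = soup.select("p.story a.sister")

if third_link is not None:
    print(third_link["href"])
```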