
Python Lesson 7 - HTML Parsing

상맹 2021. 9. 22. 23:46

Beautiful Soup parses HTML and makes it easy to find elements in the resulting DOM tree.


1. Installing BeautifulSoup

① Site

 

Beautiful Soup Documentation — Beautiful Soup 4.4.0 documentation


beautiful-soup-4.readthedocs.io

② Installation

In a terminal: pip install beautifulsoup4

③ Result
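One way to confirm that the install worked is to import the package and parse a tiny document. This is a minimal sketch, not the check used in the original post; the one-line HTML string is made up for illustration.

```python
import bs4
from bs4 import BeautifulSoup

# Importing without an ImportError already confirms the package is installed.
print(bs4.__version__)

# Parse a tiny, made-up document with the parser bundled with Python's standard library.
soup = BeautifulSoup("<p>hello</p>", "html.parser")
text = soup.p.get_text()
print(text)
```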

2. Examples

- Example 1: source, plus finding a tags ("href") and p tags

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

Output
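The lookups named in the Example 1 heading (a tags with their href, and p tags) could be sketched as follows. The html_doc is repeated here only so that the block runs on its own; the variable names hrefs, first_p, and p_count are illustrative and do not come from the original post.

```python
from bs4 import BeautifulSoup

# Copy of the html_doc from Example 1, so this block runs on its own.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns every matching tag; tag["href"] reads that attribute.
hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)

# find returns only the first match, here the <p class="title"> paragraph.
first_p = soup.find("p")
print(first_p.get_text())

# Count all <p> tags in the document.
p_count = len(soup.find_all("p"))
print(p_count)
```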

 

- Example 2: Finding elements

① Find by tag name

m_body = soup.body  # the <body> tag
# print(m_body)

m_h1 = soup.h1  # the first <h1> tag, or None if the document has no <h1>
# print(m_h1)

m_body
m_h1
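Accessing a tag name as an attribute of the soup returns the first tag with that name, or None when there is none. The html_doc from Example 1 contains no h1, so the sketch below uses a small hypothetical document to show both cases; the variable names m_p and m_h2 are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical document for illustration: it has an <h1> and two <p> tags, but no <h2>.
html = "<html><body><h1>Heading</h1><p>first</p><p>second</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

m_h1 = soup.h1  # first <h1> tag
m_p = soup.p    # first <p> only, even though there are two
m_h2 = soup.h2  # no <h2> in the document, so this is None

print(m_h1.get_text())
print(m_p.get_text())
print(m_h2)
```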

② Find by attribute

m_h1 = soup.h1  # the first <h1> tag
# print(m_h1)

m_h1
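A lookup that filters on an attribute might look like the sketch below. It is an illustration rather than the post's own code: the markup is a hypothetical fragment modelled on Example 1, and the names lacie, with_href, and missing are made up.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two links with href and one anchor without it.
html = """
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a name="anchor-only">no href</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact attribute value, passed through the attrs dictionary.
lacie = soup.find("a", attrs={"href": "http://example.com/lacie"})
print(lacie.get_text())

# href=True matches any tag that has an href attribute, whatever its value.
with_href = soup.find_all(href=True)
print(len(with_href))

# .get() returns None instead of raising KeyError when the attribute is missing.
missing = soup.find("a", attrs={"name": "anchor-only"}).get("href")
print(missing)
```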

③ Find by class
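Searching by CSS class could be sketched as below, assuming markup like the sister links in Example 1. The fragment and the names sisters, names, and title_p are illustrative, not taken from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the html_doc from Example 1.
html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time...</p>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
"""
soup = BeautifulSoup(html, "html.parser")

# class is a reserved word in Python, so the keyword argument is spelled class_.
sisters = soup.find_all("a", class_="sister")
names = [a.get_text() for a in sisters]
print(names)

# find with class_ returns only the first match.
title_p = soup.find("p", class_="title")
print(title_p.get_text())
```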

④ Find by id

m_hello = soup.find(id="hello")
print(m_hello)
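The page that contains id="hello" is not shown in the post, so the sketch below assumes a small hypothetical document to make the lookup runnable. It also shows that find returns None when no element matches; m_missing is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page with id="hello" is not shown in the post.
html = '<div id="hello">Hello, world</div><div id="other">Other</div>'
soup = BeautifulSoup(html, "html.parser")

m_hello = soup.find(id="hello")
print(m_hello.get_text())

# find returns None when nothing matches, so check before using the result.
m_missing = soup.find(id="missing")
if m_missing is None:
    print("no element with id='missing'")
```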

⑤ CSS selectors

target = soup.select_one("#focusPanelCenter .panel_inner img")
#print(target)
title = target["alt"]
print(title)
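The selector above targets a page that is not included in the post. A self-contained sketch, assuming hypothetical markup shaped to match that selector (the image file name and alt text are made up), might look like this:

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped to match the selector; the real page is not shown in the post.
html = """
<div id="focusPanelCenter">
  <div class="panel_inner">
    <a href="#"><img src="poster.jpg" alt="Sample title"></a>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "#id .class tag" uses descendant combinators: the img may be nested at any depth.
target = soup.select_one("#focusPanelCenter .panel_inner img")

# select_one returns None on no match, so guard before reading the attribute.
title = target["alt"] if target is not None else None
print(title)

# select returns a list of every match instead of just the first.
all_imgs = soup.select("#focusPanelCenter img")
print(len(all_imgs))
```

Guarding against None matters on live pages, where a layout change can make the selector stop matching and turn target["alt"] into a TypeError.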

 
