Python pandas Lesson 3 - Removing NaN

상맹 2021. 10. 26. 17:25

Dropping rows with NaN in pandas

This tutorial explains how to drop all rows that contain NaN values using the DataFrame.notna() and DataFrame.dropna() methods.

www.delftstack.com

import pandas as pd
import numpy as np

from google.colab import drive
drive.mount("/content/drive")

df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/train.csv")

# Practice
# Selecting data by condition
list_test = [
    {"id":1,"password":1234,"age":20},
    {"id":2,"password":1234,"age":25},
    {"id":3,"password":1234,"age":30}
]
df_test = pd.DataFrame(list_test)
print(df_test)
print("="*50)

# Cells that do not satisfy the condition become NaN.
print(df_test[df_test>20])
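
Masking with a comparison on the whole DataFrame keeps the frame's shape and turns every non-matching cell into NaN. The sketch below illustrates this on a small made-up frame (`demo` is a hypothetical name, not part of the lesson's data) and compares it with `DataFrame.where`, which expresses the same operation explicitly.

```python
import pandas as pd

# Hypothetical sample data for illustration only
demo = pd.DataFrame({"id": [1, 2, 3], "age": [20, 25, 30]})

# A whole-frame comparison used as a mask keeps the shape;
# cells that fail the condition become NaN
masked = demo[demo > 20]

# DataFrame.where performs the same masking explicitly
masked_where = demo.where(demo > 20)

print(masked)
print(masked.equals(masked_where))  # True
```

Because no `id` value is greater than 20, the whole `id` column becomes NaN, which is why a column-level condition is usually the better choice for filtering rows.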

print(df_test.age)
print("="*50)
print(df_test["age"])
print("="*50)
print(df_test.loc[:,["age"]])
print("="*50)
print(df_test.iloc[:2])

# select * from emp where age > 13;          (O)
# select * from emp where all_columns > 13;  (X)
# select * from emp where row_3 > 13;        (X)

# Filtering rows by a condition corresponds to WHERE in SQL
print(df_test[df_test.age > 25])
print("="*50)
print(df_test[df_test.id==2])
print("="*50)

# isin() method
print(df_test[df_test.id.isin([1,2])])
print("="*50)

# Look up the record with id 2 (options: loc, iloc, or [])
# iloc is position-based: row position 1, column position 0
print(df_test.iloc[[1],[0]])
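
The row filters above use a single condition each. As a minimal sketch, conditions can also be combined with `&` (and), `|` (or), and `~` (not), as long as each comparison is wrapped in parentheses. The frame `users` is a hypothetical stand-in that mirrors `df_test`.

```python
import pandas as pd

# Hypothetical frame mirroring the practice data
users = pd.DataFrame({
    "id": [1, 2, 3],
    "password": [1234, 1234, 1234],
    "age": [20, 25, 30],
})

# AND: both conditions must hold (parentheses are required)
both = users[(users["age"] >= 25) & (users["id"] != 3)]

# OR: at least one condition must hold
either = users[(users["age"] < 25) | (users["id"] == 3)]

# NOT: invert a mask with ~
not_in = users[~users["id"].isin([1, 2])]

# loc accepts a boolean mask for rows plus a list of column labels
age_of_id2 = users.loc[users["id"] == 2, ["age"]]

print(both)
print(either)
print(not_in)
print(age_of_id2)
```

Unlike `iloc`, the `loc` lookup here finds the record by its `id` value, so it keeps working if the row order changes.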

# print(df.head())

# Survived (target = dependent variable)
# What explains it? Features (columns) of interest: Sex, Age, Pclass, Survived

df1 = df.loc[:,["Sex", "Age", "Pclass", "Survived", "Cabin"]]
print(df1.head())
print("="*50)

# NaN handling options: replace NaN values, drop rows that contain NaN, or fill NaN with 0
print(df1.count())
print("="*50)
print(df1.isnull().sum())
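
`count()` reports non-null values per column and `isnull().sum()` reports missing values per column, so the two always add up to the number of rows. A small sketch on made-up data (the frame `sample` and its values are assumptions for illustration) that also shows the missing ratio per column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
})

non_null = sample.count()        # non-null values per column
missing = sample.isnull().sum()  # NaN values per column
ratio = sample.isnull().mean()   # share of NaN per column

print(non_null)
print(missing)
print(ratio)
```

The ratio can help decide between dropping rows (few values missing) and filling values (many values missing).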

# Handle NaN: drop the rows where Age is NaN, then replace NaN in Cabin with 0
df2 = df1.dropna(subset=["Age"])

# Replace the remaining NaN values with 0
df3 = df2.fillna(0)

print(df2)
print("="*50)
print(df2.isnull().sum())

print(df3)
print("="*50)
print(df3.isnull().sum())
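
`fillna(0)` fills every column with the same value, which leaves the number 0 mixed in with text values in a column such as Cabin. As a hedged alternative, `fillna` also accepts a dict that maps column names to fill values. The frame `sample` and the placeholder `"Unknown"` below are assumptions for illustration, not values from the lesson.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing Age, one missing Cabin
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0],
    "Cabin": [np.nan, "C85", "E46"],
})

# Drop rows where Age is missing, then fill Cabin with a text placeholder
cleaned = (
    sample
    .dropna(subset=["Age"])
    .fillna({"Cabin": "Unknown"})
)

print(cleaned)
print(cleaned.isnull().sum())
```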

# When preparing data, split it into training data (80%) and validation data (20%)
# The split must not introduce sampling bias.
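
One possible way to do an 80/20 split that keeps the class ratio is to sample within each target class. This is a sketch on made-up data, not the lesson's method: the frame `data`, the `random_state` value, and the names `train` and `valid` are assumptions.

```python
import pandas as pd

# Hypothetical data: 10 rows per Survived class
data = pd.DataFrame({
    "Age": list(range(20, 40)),
    "Survived": [0] * 10 + [1] * 10,
})

# Sample 80% within each Survived class so the class ratio is preserved
train = data.groupby("Survived").sample(frac=0.8, random_state=42)

# The remaining 20% becomes the validation set
valid = data.drop(train.index)

print(len(train), len(valid))  # 16 4
print(train["Survived"].value_counts())
print(valid["Survived"].value_counts())
```

`DataFrameGroupBy.sample` requires pandas 1.1 or later. A plain `data.sample(frac=0.8)` also gives an 80/20 split but does not guarantee the class ratio.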

# Set Age to 10 for teenagers (10 <= Age < 20)
df3.loc[(df3.Age < 20) & (df3.Age >= 10), ["Age"]] = 10
# Note: df4 is another name for df3, not a copy
df4 = df3
print(df4)
print("="*50)
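
Plain assignment between DataFrame variables does not copy any data, so a change made through one name is visible through the other. A minimal sketch of the difference between an alias and `copy()`, with hypothetical names `original`, `alias`, and `independent`:

```python
import pandas as pd

# Hypothetical ages for illustration
original = pd.DataFrame({"Age": [15.0, 22.0, 18.0]})

alias = original               # same object, no data is copied
independent = original.copy()  # separate data

# Overwrite teen ages with 10 through the original name
teen_mask = (original["Age"] >= 10) & (original["Age"] < 20)
original.loc[teen_mask, "Age"] = 10

print(alias is original)            # True
print(alias["Age"].tolist())        # [10.0, 22.0, 10.0]
print(independent["Age"].tolist())  # [15.0, 22.0, 18.0]
```

Calling `copy()` before modifying is the safer choice when the earlier state still needs to be inspected afterwards.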

# Remove the teenagers' rows (their Age was set to 10 above)
# To remove people in their 20s instead: df5 = df4.loc[~((df4.Age >= 20) & (df4.Age < 30)), :]
df5 = df4.loc[df4.Age != 10, :]
print(df5)
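
Overwriting Age with 10 discards the exact ages. As a hedged alternative sketch, an age group can be stored in a separate column and used for filtering, leaving Age intact. The frame `people` and the column name `AgeGroup` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical ages for illustration
people = pd.DataFrame({"Age": [8.0, 15.0, 19.0, 24.0, 31.0]})

# Floor each age to its decade: 15 -> 10, 24 -> 20
people["AgeGroup"] = (people["Age"] // 10 * 10).astype(int)

# Remove teenagers by keeping every group except 10
no_teens = people[people["AgeGroup"] != 10]

# Remove people in their 20s by negating a range condition with ~
no_twenties = people[~((people["Age"] >= 20) & (people["Age"] < 30))]

print(no_teens)
print(no_twenties)
```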

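# Adding a column (Series, list, or assign)

# Sequential ids, one per row of df5
s = np.arange(1, len(df5) + 1)

# df5["id"] = s

# df5.loc[:, "id"] = s

# assign returns a new DataFrame, so store the result
df5 = df5.assign(id = s)

df5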
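
`DataFrame.assign` does not modify the frame it is called on. It returns a new DataFrame, so the result has to be stored to keep the new column. A minimal sketch with hypothetical names `frame`, `ids`, and `with_id`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in its index, as after row removal
frame = pd.DataFrame({"Age": [22.0, 35.0, 41.0]}, index=[0, 3, 7])

# Size the id sequence from the frame instead of hard-coding a length
ids = np.arange(1, len(frame) + 1)

frame.assign(id=ids)          # result is discarded, frame is unchanged
print("id" in frame.columns)  # False

with_id = frame.assign(id=ids)  # keep the returned DataFrame
print(with_id)
```

A NumPy array is assigned by position, so the gaps in the index do not matter here. A Series would instead be aligned on index labels.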

df5.groupby(["Sex", "Survived"])["Survived"].count()
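
The grouped count above returns a Series indexed by (Sex, Survived). As a sketch on made-up data, the same counts can be laid out as a table with `unstack`, and because Survived is 0 or 1, its mean per group gives the survival rate. The frame `passengers` and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical passengers for illustration
passengers = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Survived": [0, 1, 1, 1, 0],
})

# Rows per (Sex, Survived) pair, pivoted so Survived values become columns
counts = passengers.groupby(["Sex", "Survived"]).size().unstack(fill_value=0)

# Mean of a 0/1 column is the share of 1s, i.e. the survival rate
rate = passengers.groupby("Sex")["Survived"].mean()

print(counts)
print(rate)
```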

License: Attribution, NonCommercial, NoDerivatives
