Intermediate R - Summary of the apply family of functions (apply, lapply, sapply, tapply, mapply)

2019. 4. 1. 16:48

Summary of the apply family of functions

Overview

Function	Description
`apply`	Apply functions over array margins
`by`	Apply a function to a data frame split by factors
`eapply`	Apply a function over values in an environment
`lapply`	Apply a function over a list or vector
`mapply`	Apply a function to multiple list or vector arguments
`rapply`	Recursively apply a function to a list
`tapply`	Apply a function over a ragged array

Source: https://csgillespie.github.io/efficientR/programming.html

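When working with data, you often need to apply a function element by element or group by group.
The apply family of functions takes an R object that holds a data structure as input and applies a function to each element or each group.
The functions differ in the shape of their input and output.
- apply (input: array or matrix, output: vector, array, or list)
- lapply (input: list or vector, output: list)
- sapply (input: list or vector, output: vector or array when the results can be simplified, otherwise list)
- vapply (input: list or vector, output: vector or array whose type and length are checked against FUN.VALUE)
- tapply (input: vector plus one or more grouping factors, output: vector or array)
- mapply (input: several lists or vectors, output: vector or array when the results can be simplified, otherwise list)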
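
The input/output summary above can be checked directly. The following is a minimal sketch on small made-up inputs (`m`, `v`, and the grouping factor `g` are illustrative and not part of the original post):

```r
# Small illustrative inputs (assumed for this sketch only).
m <- matrix(1:6, nrow = 2)       # 2 x 3 matrix
v <- 1:3
g <- factor(c("a", "b", "a"))    # hypothetical grouping for v

r_apply  <- apply(m, 2, sum)                          # one sum per column
r_lapply <- lapply(v, function(x) x * 2)              # always a list
r_sapply <- sapply(v, function(x) x * 2)              # simplified to a vector
r_vapply <- vapply(v, function(x) x * 2, numeric(1))  # vector, type-checked
r_tapply <- tapply(v, g, sum)                         # one value per group
r_mapply <- mapply(function(x, y) x + y, v, v)        # walks both inputs in parallel

class(r_lapply)
#> [1] "list"
class(r_sapply)
#> [1] "numeric"
```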

apply

x <- matrix(1:9, nrow = 3, ncol = 3)  # fill a 3 x 3 matrix column by column
x 
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9

The apply function applies a given function along the rows or the columns of a matrix.
apply(array, margin, function)
Margin 1 means rows and margin 2 means columns.

apply(x, 1, function(x) {2*x}) 
#>      [,1] [,2] [,3]
#> [1,]    2    4    6
#> [2,]    8   10   12
#> [3,]   14   16   18

# apply cannot be used on a plain vector,
# because the input must have a dim attribute.
# apply(c(1,2,3), 1, function(x) {2*x}) 
# raises an error: dim(X) must have a positive length
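
As a complementary sketch, the two margins are easiest to tell apart with a function that returns a single value. It also shows why the doubled matrix above comes out transposed: when the function returns a vector, each result is stored as a column. The names `row_sums`, `col_sums`, and `doubled` are illustrative.

```r
x <- matrix(1:9, nrow = 3, ncol = 3)

row_sums <- apply(x, 1, sum)  # margin 1: one value per row
col_sums <- apply(x, 2, sum)  # margin 2: one value per column
row_sums
#> [1] 12 15 18
col_sums
#> [1]  6 15 24

# Row-wise results are laid out as columns, so transpose to get 2 * x back.
doubled <- t(apply(x, 1, function(row) 2 * row))
all(doubled == 2 * x)
#> [1] TRUE
```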

lapply

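A limitation of apply is that its input must have a dim attribute, that is, it must be an array or a matrix (a data frame is coerced with as.matrix).
In practice the input is often a plain vector, and lapply covers that case.
It takes a vector or a list as input and returns a list.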

result <- lapply(1:3, function(x) x*2)
result
#> [[1]]
#> [1] 2
#> 
#> [[2]]
#> [1] 4
#> 
#> [[3]]
#> [1] 6
unlist(result) # use unlist to convert the list to a vector
#> [1] 2 4 6
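
A list result is most useful when the elements do not share a common length. The sketch below uses a made-up named list (`scores` is hypothetical) to show that lapply keeps the input names and can return elements of any shape:

```r
# Hypothetical list whose elements have different lengths.
scores <- list(a = c(90, 80), b = c(70, 60, 50), c = 100)

means  <- lapply(scores, mean)   # one number per element, names preserved
ranges <- lapply(scores, range)  # each element is a length-2 vector

names(means)
#> [1] "a" "b" "c"
means$b
#> [1] 60
ranges$a
#> [1] 80 90
```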

lapply also works on data frames,
because a data frame is an S3 object built on top of a list.

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
typeof(iris)
#> [1] "list"
lapply(iris[, 1:4], mean)
#> $Sepal.Length
#> [1] 5.843333
#> 
#> $Sepal.Width
#> [1] 3.057333
#> 
#> $Petal.Length
#> [1] 3.758
#> 
#> $Petal.Width
#> [1] 1.199333
y <- lapply(iris[, 1:4], function(x) {x > 3})
head(lapply(y, function(x) x[1:5]))
#> $Sepal.Length
#> [1] TRUE TRUE TRUE TRUE TRUE
#> 
#> $Sepal.Width
#> [1]  TRUE FALSE  TRUE  TRUE  TRUE
#> 
#> $Petal.Length
#> [1] FALSE FALSE FALSE FALSE FALSE
#> 
#> $Petal.Width
#> [1] FALSE FALSE FALSE FALSE FALSE

sapply

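lapply always returns a list, and calling unlist on the result every time is cumbersome.
sapply is a wrapper around lapply that returns a vector or a matrix instead of a list whenever the results can be simplified.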

y <- sapply(iris[,1:4], function(x) {x > 3})
typeof(y) # returned as a logical matrix
#> [1] "logical"
class(y) # R 4.0.0 and later print "matrix" "array"
#> [1] "matrix"
head(y)
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,]         TRUE        TRUE        FALSE       FALSE
#> [2,]         TRUE       FALSE        FALSE       FALSE
#> [3,]         TRUE        TRUE        FALSE       FALSE
#> [4,]         TRUE        TRUE        FALSE       FALSE
#> [5,]         TRUE        TRUE        FALSE       FALSE
#> [6,]         TRUE        TRUE        FALSE       FALSE

z <- sapply(iris[, 1:4], mean)
z
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>     5.843333     3.057333     3.758000     1.199333
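
The simplification only happens when every result has the same length. A short sketch of both cases, with illustrative names `same_len` and `diff_len`:

```r
same_len <- sapply(1:3, function(i) c(i, i^2))   # every result has length 2
diff_len <- sapply(1:3, function(i) seq_len(i))  # results have lengths 1, 2, 3

dim(same_len)    # results are bound together as columns
#> [1] 2 3
class(diff_len)  # no simplification was possible
#> [1] "list"
```

Because the return type depends on the data, code that relies on sapply returning a vector can break silently on unusual input.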

vapply

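vapply is safer than sapply because the FUN.VALUE argument explicitly declares the type and length of each result, and vapply raises an error when a result does not match.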

y <- vapply(iris[, 1:4], function(x) {x > 3}, numeric(nrow(iris))) # returned as numeric: the logical results are promoted to double
typeof(y)
#> [1] "double"
class(y)
#> [1] "matrix"
head(y)
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,]            1           1            0           0
#> [2,]            1           0            0           0
#> [3,]            1           1            0           0
#> [4,]            1           1            0           0
#> [5,]            1           1            0           0
#> [6,]            1           1            0           0

z <- vapply(iris[, 1:4], function(x) {x > 3}, logical(nrow(iris))) # returned as logical
typeof(z)
#> [1] "logical"
class(z)
#> [1] "matrix"
head(z)
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,]         TRUE        TRUE        FALSE       FALSE
#> [2,]         TRUE       FALSE        FALSE       FALSE
#> [3,]         TRUE        TRUE        FALSE       FALSE
#> [4,]         TRUE        TRUE        FALSE       FALSE
#> [5,]         TRUE        TRUE        FALSE       FALSE
#> [6,]         TRUE        TRUE        FALSE       FALSE
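
To illustrate the safety claim, here is a minimal sketch of what happens when a result does not match FUN.VALUE, and of how vapply and sapply differ on empty input. The replacement message stored in `msg` is written for this example and is not R's own error text.

```r
# Every result is a single number, so the check passes.
ok <- vapply(1:3, function(i) i / 2, numeric(1))

# seq_len(i) has length 1, 2, 3, so the length check fails at i = 2.
msg <- tryCatch(
  vapply(1:3, function(i) seq_len(i), numeric(1)),
  error = function(e) "vapply stopped: result did not match FUN.VALUE"
)

# On empty input vapply still returns the declared type; sapply returns list().
empty_v <- vapply(integer(0), function(i) i / 2, numeric(1))
empty_s <- sapply(integer(0), function(i) i / 2)

ok
#> [1] 0.5 1.0 1.5
```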

tapply

tapply is a function for group-wise processing.
You pass the grouping as an argument (a factor), and the function is applied to each group rather than to each element.

tapply(1:10, rep(c(1,2), each=5), sum) # the grouping argument is converted to a factor automatically
#>  1  2 
#> 15 40

Computing the mean of Sepal.Length for each Species in the iris data:

tapply(iris$Sepal.Length, iris$Species, mean)
#>     setosa versicolor  virginica 
#>      5.006      5.936      6.588
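
One way to read the call above is as "split, then summarize". The sketch below spells that out with split and sapply, and also passes two grouping factors at once. The `wide` flag is a made-up grouping added only for illustration.

```r
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# The same computation in two steps: split by group, then summarize each piece.
groups   <- split(iris$Sepal.Length, iris$Species)
by_split <- sapply(groups, mean)
all.equal(as.vector(by_species), as.vector(by_split))
#> [1] TRUE

# With a list of factors, tapply returns one cell per combination of levels.
wide    <- iris$Sepal.Width > 3   # hypothetical second grouping
two_way <- tapply(iris$Sepal.Length, list(iris$Species, wide), mean)
dim(two_way)
#> [1] 3 2
```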

mapply

mapply is similar to sapply, but it passes several arguments to the function in parallel.

mapply(function(i, s) {
       sprintf(" %d%s ", i, s)
}, 1:3, c("a", "b", "c"))
#> [1] " 1a " " 2b " " 3c "

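Creating a new variable that adds Sepal.Length and Sepal.Width in the iris data.
The earlier functions take the data as the first argument, but mapply takes the function first because the number of data arguments that follow is not known in advance.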

y <- mapply(`+`, iris$Sepal.Length, iris$Sepal.Width)
# The code above is equivalent to mapply(function(a, b) a+b, iris$Sepal.Length, iris$Sepal.Width).
head(iris[,c("Sepal.Length", "Sepal.Width")])
#>   Sepal.Length Sepal.Width
#> 1          5.1         3.5
#> 2          4.9         3.0
#> 3          4.7         3.2
#> 4          4.6         3.1
#> 5          5.0         3.6
#> 6          5.4         3.9
y[1:5]
#> [1] 8.6 7.9 7.9 7.7 8.6
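
A few further behaviours of mapply are worth a quick sketch: results of unequal length come back as a list, MoreArgs holds arguments that stay fixed across calls, and an already vectorized operator such as `+` gives the same sum without mapply. The names `reps`, `labels`, and `y_vec` are illustrative.

```r
# rep(1, 3), rep(2, 2), rep(3, 1): lengths differ, so the result is a list.
reps <- mapply(rep, 1:3, 3:1)
class(reps)
#> [1] "list"

# MoreArgs supplies arguments that are the same for every call.
labels <- mapply(function(i, s, sep) paste(i, s, sep = sep),
                 1:3, c("a", "b", "c"),
                 MoreArgs = list(sep = "-"))
labels
#> [1] "1-a" "2-b" "3-c"

# `+` is vectorized, so the element-wise sum needs no mapply.
y_vec <- iris$Sepal.Length + iris$Sepal.Width
```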
