Apply

vcfR documentation

by Brian J. Knaus and Niklaus J. Grünwald

One of the weaknesses of R is that explicit loops can be relatively slow to execute. The apply family of functions offers a compact alternative to writing these loops by hand. See ?apply and ?lapply for examples, and for other members of the apply family such as sapply() and vapply().
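A minimal sketch of this idea, using an illustrative matrix `m` and variable names that are not part of vcfR: the same row sums are computed once with an explicit `for` loop and once with a single apply() call. Note that apply() still iterates internally, so its main benefit here is shorter, clearer code.

```r
# An illustrative 3 x 3 integer matrix
m <- matrix(rep(1:3, times=3), ncol=3)

# Row sums with an explicit for loop
loop_sums <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  loop_sums[i] <- sum(m[i, ])
}

# The same row sums with one apply() call
apply_sums <- apply(m, MARGIN=1, sum)

# Compare values (all.equal() ignores the double vs. integer difference)
all.equal(loop_sums, apply_sums)
```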

Use of apply()

Create a test matrix.

tmp <- matrix(rep(1:3, times=3), ncol=3)
tmp

##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    3    3    3

‘Apply’ the function ‘sum’ over rows.

apply(tmp, MARGIN=1, sum)

## [1] 3 6 9

‘Apply’ the function ‘sum’ over columns.

apply(tmp, MARGIN=2, sum)

## [1] 6 6 6
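When the matrix has row and column names, apply() uses the names of the chosen margin to label its result. The sketch below adds illustrative names (`r1`, `c1`, and so on) to the same test matrix; the toy data set later in this document relies on this behavior to label results by variant.

```r
# The test matrix again, with illustrative row and column names
tmp <- matrix(rep(1:3, times=3), ncol=3,
              dimnames=list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

# Results are named after the rows (MARGIN=1) or the columns (MARGIN=2)
row_tot <- apply(tmp, MARGIN=1, sum)
col_tot <- apply(tmp, MARGIN=2, sum)
row_tot
col_tot
```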

Use of custom functions

If the operation we wish to apply to a data structure already exists as an R function, we can pass it directly to apply(), as we did with sum(). We can also define our own functions to apply over a data structure.
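Two other common patterns are shown in this short sketch: a function can be written inline (anonymously) inside the apply() call, and any extra named arguments given after the function are passed on to it. The matrix `tmp_na` and the missing value placed in it are illustrative only.

```r
tmp <- matrix(rep(1:3, times=3), ncol=3)

# An anonymous function written inline: the range (max - min) of each column
col_range <- apply(tmp, MARGIN=2, function(x) max(x) - min(x))
col_range

# Arguments after the function are handed to it; here sum() receives
# na.rm=TRUE so the missing value in row 2 is skipped
tmp_na <- tmp
tmp_na[2, 3] <- NA
row_tot <- apply(tmp_na, MARGIN=1, sum, na.rm=TRUE)
row_tot
```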

In practice, to compute averages over a matrix we should use existing functions such as mean(), rowMeans(), or colMeans(). Here we’ll create our own as an example.

myMean <- function(x){
  sum(x)/length(x)
}

apply(tmp, MARGIN=1, myMean)

## [1] 1 2 3
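As a quick check, the built-in rowMeans() and colMeans() give the same values as our custom function. According to the R documentation, these built-ins are equivalent to apply() with mean or sum over the matching margin but are much faster, which is why they are preferred in practice.

```r
tmp <- matrix(rep(1:3, times=3), ncol=3)

myMean <- function(x){
  sum(x)/length(x)
}

# Built-in row and column means
rowMeans(tmp)
colMeans(tmp)

# Compare with the custom function applied over rows
all.equal(rowMeans(tmp), apply(tmp, MARGIN=1, myMean))
```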

By defining our own functions, we can extract many kinds of summaries from our data with very little code.
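A custom function can also return more than one value. In that case apply() returns a matrix with one row per returned value and one column per element of the chosen margin, so the result often needs to be transposed with t(). The function name `minMax` below is hypothetical.

```r
tmp <- matrix(rep(1:3, times=3), ncol=3)

# A custom summary returning two named values for each row
minMax <- function(x){
  c(min=min(x), max=max(x))
}

# Result: a 2 x 3 matrix with rows "min" and "max", one column per row of tmp
res <- apply(tmp, MARGIN=1, minMax)
res

# Transpose to get one row per row of tmp
t(res)
```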

Homework

A large part of quality control of data sets is finding and mitigating missing data. Here I’ll create a larger toy data set and add some missing data. As homework, create a custom function that will help you identify missing data.

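# Create an empty 12 x 10 matrix and fill it with random normal values
toy <- matrix(ncol=10, nrow=12)
set.seed(999)
toy[] <- rnorm(length(toy))

# Label columns as samples and rows as variants
colnames(toy) <- paste("sample", 1:ncol(toy), sep="_")
rownames(toy) <- paste("variant", 1:nrow(toy), sep="_")

# Draw 30 random cell indices and set those cells to NA
# (repeated indices mean fewer than 30 distinct cells may be affected)
set.seed(999)
is.na(toy[round(runif(n=30, min=1, max=length(toy)))]) <- TRUE
toy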

##              sample_1   sample_2   sample_3     sample_4    sample_5   sample_6
## variant_1  -0.2817402  0.9387494 -1.1252685 -0.370527471  0.58226648 -0.9233114
## variant_2  -1.3125596         NA  0.6422657  0.522867793 -0.03472639  1.1649540
## variant_3          NA  0.9576504 -1.1067376  0.517805536 -0.11666415  1.0420687
## variant_4   0.2700705         NA         NA -1.402510873 -0.64498209         NA
## variant_5  -0.2773064         NA -1.5540951 -0.485636726          NA         NA
## variant_6  -0.5660237  0.1006576         NA  0.008498139  0.36609447 -1.1469577
## variant_7  -1.8786583  0.9013448  2.3826642 -1.282113287          NA -1.4081795
## variant_8          NA -2.0743571  0.6012761           NA  0.28261247 -0.2823287
## variant_9  -0.9677497 -1.2285633  0.1793613  0.300665411          NA -0.4177700
## variant_10 -1.1210094  0.6430443  1.0805315  0.276478845 -1.27921590         NA
## variant_11         NA         NA         NA -2.050877659  0.43536881 -0.1062858
## variant_12  0.1339774  0.2940356 -2.1137370  0.014190211 -0.56550098         NA
##               sample_7    sample_8   sample_9  sample_10
## variant_1   0.94970110 -1.20409383         NA -1.5100543
## variant_2           NA -0.37684776         NA -0.6772986
## variant_3   0.97400041  1.36364858 -0.2417764 -0.2979716
## variant_4   0.06229143 -0.25288275         NA -1.5191194
## variant_5   0.53842205          NA -1.6509552 -0.9118353
## variant_6  -2.06482325  0.43714914  0.4782007 -0.8358807
## variant_7           NA          NA -0.8052824 -0.2171495
## variant_8  -0.16022669  0.02768521         NA -1.0710323
## variant_9  -0.64292273          NA         NA  0.9450480
## variant_10  0.98529855  1.28372914 -2.5954909  1.1279968
## variant_11 -1.22857333 -1.12974161  0.2901482 -1.2786429
## variant_12  0.08522467  1.04665773  1.3836599  0.4576313
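Because round(runif(...)) can return the same index more than once, the number of missing cells may be smaller than 30. The sketch below repeats the draw to compare the number of draws with the number of distinct cells. It also shows a hypothetical alternative using sample(), which draws without replacement and so always marks exactly 30 distinct cells; note that it would produce a different missing-data pattern than the one printed above. The names `idx` and `idx2` are illustrative.

```r
# Repeat the index draw used above (length(toy) is 12 * 10 = 120)
set.seed(999)
idx <- round(runif(n=30, min=1, max=120))
length(idx)          # number of draws
length(unique(idx))  # number of distinct cells, possibly fewer than 30

# Hypothetical alternative: 30 distinct cells drawn without replacement
set.seed(999)
idx2 <- sample(120, size=30)
length(unique(idx2))
```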

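Now we create a custom function to process our matrix. As a starting point, it simply returns the number of elements it receives.

my_fun <- function(x){
  # Count all elements of x, including any NA values
  length(x)
}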

Lastly, we use apply() to iterate the function over the rows (variants) of the matrix.

apply(toy, MARGIN=1, my_fun)

##  variant_1  variant_2  variant_3  variant_4  variant_5  variant_6  variant_7 
##         10         10         10         10         10         10         10 
##  variant_8  variant_9 variant_10 variant_11 variant_12 
##         10         10         10         10         10
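The same function can be applied over samples by switching MARGIN to 2. The sketch below rebuilds the toy matrix so it runs on its own. Because my_fun() counts every element whether or not it is missing, each sample returns 12, the number of variants; making it count only missing values is left to the homework.

```r
# Rebuild the toy matrix exactly as above
toy <- matrix(ncol=10, nrow=12)
set.seed(999)
toy[] <- rnorm(length(toy))
colnames(toy) <- paste("sample", 1:ncol(toy), sep="_")
rownames(toy) <- paste("variant", 1:nrow(toy), sep="_")
set.seed(999)
is.na(toy[round(runif(n=30, min=1, max=length(toy)))]) <- TRUE

my_fun <- function(x){
  length(x)
}

# MARGIN=2 iterates over columns (samples) instead of rows (variants)
per_sample <- apply(toy, MARGIN=2, my_fun)
per_sample
```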

Can you modify the function my_fun() so that it counts missing data in each sample and variant?

USDA Agricultural Research Service, Horticultural Crops Research Lab.