vcfR documentation

by
Brian J. Knaus and Niklaus J. Grünwald

The R packages adegenet and poppr are popular tools for population genetic analysis. The genlight object from adegenet and the snpclone object from poppr were designed to handle high-throughput sequencing datasets. Here we describe how to convert a vcfR object to genlight and snpclone objects.

Data import

We will use the example dataset provided by vcfR.

library(vcfR)
data(vcfR_example)

Creating genlight objects

The genlight object is used by adegenet and poppr and was designed specifically to handle high-throughput genotype data. At present it appears to support only two alleles per locus, although it allows varying levels of ploidy. Variant callers such as FreeBayes and GATK's HaplotypeCaller can report more than two alleles per locus. To address this incompatibility, vcfR2genlight omits loci that have more than two alleles. The benefit of the genlight object is efficiency: because it was designed with high-throughput sequencing in mind, it is much more efficient to use than the genind object. When verbose is set to TRUE, vcfR2genlight throws a warning reporting how many loci it omitted. When verbose is set to FALSE, these loci are omitted silently.
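To illustrate what "more than two alleles" means at the level of the VCF file: the ALT column lists alternate alleles separated by commas, so a variant's total allele count is one REF allele plus the number of ALT alleles. The base R sketch below uses hypothetical ALT values. It only illustrates the idea and is not vcfR's internal code.

```r
# Minimal sketch, assuming the VCF convention that ALT alleles are
# comma-separated. The ALT values below are hypothetical.
alt <- c("T", "G,C", "A", "TA,T,TAA")

# Total alleles per variant = 1 REF allele + number of ALT alleles
n_alleles <- 1L + lengths(strsplit(alt, ",", fixed = TRUE))

# genlight only holds variants with exactly two alleles
keep <- n_alleles == 2L

n_alleles   # 2 3 2 4
sum(!keep)  # 2 variants would be omitted
alt[keep]   # "T" "A"
```

This sketch does not treat monomorphic sites (ALT written as ".") specially. It is only meant to show why multi-allelic rows cannot be stored in a two-allele structure.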

vcf_file <- system.file("extdata", "pinf_sc50.vcf.gz", package = "pinfsc50")
vcf <- read.vcfR(vcf_file, verbose = FALSE)
x <- vcfR2genlight(vcf)
## Warning in vcfR2genlight(vcf): Found 312 loci with more than two alleles.
## Objects of class genlight only support loci with two alleles.
## 312 loci will be omitted from the genlight object.
x
##  /// GENLIGHT OBJECT /////////
## 
##  // 18 genotypes,  21,719 binary SNPs, size: 2.2 Mb
##  31239 (7.99 %) missing data
## 
##  // Basic content
##    @gen: list of 18 SNPbin
## 
##  // Optional content
##    @ind.names:  18 individual labels
##    @loc.names:  21719 locus labels
##    @chromosome: factor storing chromosomes of the SNPs
##    @position: integer storing positions of the SNPs
##    @other: a list containing: elements without names

A warning was thrown because our vcfR object included variants with more than two alleles. Because the genlight object was designed to handle only variants with two alleles, this warning informs the user that the data were subset to variants with two alleles, so they can proceed to analysis in adegenet or poppr.
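If you would rather remove multi-allelic variants explicitly than rely on the warning, the following sketch subsets the vcfR object with vcfR's is.biallelic() before conversion. It assumes that the pinfsc50 package is installed and that adegenet is available for nLoc(). The object names vcf_bi and x_bi are illustrative only.

```r
# Sketch: drop multi-allelic variants before converting to genlight,
# so the omission is explicit rather than reported by a warning.
library(vcfR)
library(adegenet)  # provides nLoc()

vcf_file <- system.file("extdata", "pinf_sc50.vcf.gz", package = "pinfsc50")
vcf <- read.vcfR(vcf_file, verbose = FALSE)

# One logical per variant; FALSE marks variants with more than two alleles
bi <- is.biallelic(vcf)
table(bi)

# Keep only the biallelic variants (rows); all samples are retained
vcf_bi <- vcf[bi, ]

# No multi-allelic warning is expected for the pre-filtered object
x_bi <- vcfR2genlight(vcf_bi)
nLoc(x_bi)
```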

Creating snpclone objects

The snpclone object extends the genlight object for analysis of clonal and partially clonal populations in poppr. A genlight object can be converted to a snpclone object with the as.snpclone() function from the poppr package.

library(poppr)
## Loading required package: adegenet
## Loading required package: ade4
## 
##    /// adegenet 2.1.9 is loaded ////////////
## 
##    > overview: '?adegenet'
##    > tutorials/doc/questions: 'adegenetWeb()' 
##    > bug reports/feature requests: adegenetIssues()
## This is poppr version 2.9.3. To get started, type package?poppr
## OMP parallel support: unavailable
x <- as.snpclone(x)
x
##  ||| SNPCLONE OBJECT |||||||||
## 
##  || 18 genotypes,  21,719 binary SNPs, size: 2.2 Mb
##  31239 (7.99 %) missing data
## 
##  || Basic content
##    @gen: list of 18 SNPbin
##    @mlg: 16 original multilocus genotypes
##    @ploidy: ploidy of each individual  (range: 2-2)
## 
##  || Optional content
##    @ind.names:  18 individual labels
##    @loc.names:  21719 locus labels
##    @chromosome: factor storing chromosomes of the SNPs
##    @position: integer storing positions of the SNPs
##    @other: a list containing: elements without names 
## 
## NULL

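Note that we now have an mlg slot to hold multilocus genotype indicators. Here the 18 samples collapse into 16 original multilocus genotypes, so some samples share identical genotypes.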
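The idea behind a multilocus genotype (MLG) is that individuals whose genotypes match at every locus share one MLG. The base R sketch below applies that idea to a small hypothetical 0/1/2 genotype matrix. It is a simplification for illustration, not poppr's implementation. For example, it treats a missing value (NA) as just another value.

```r
# Hypothetical genotypes: rows are individuals, columns are loci,
# values are counts of the second allele (0, 1, or 2)
geno <- rbind(
  ind1 = c(0, 1, 2, 1),
  ind2 = c(0, 1, 2, 1),
  ind3 = c(1, 1, 0, 2),
  ind4 = c(0, 1, 2, 1)
)

# Collapse each individual's genotypes into one string key
keys <- apply(geno, 1, paste, collapse = "/")

# Number the MLGs in order of first appearance
mlg_id <- match(keys, unique(keys))
names(mlg_id) <- rownames(geno)

mlg_id                  # ind1 = 1, ind2 = 1, ind3 = 2, ind4 = 1
length(unique(mlg_id))  # 2 MLGs among 4 individuals
```

For a snpclone object, poppr provides accessors such as mll() for the per-sample assignments and nmll() for their number.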


Copyright © 2017, 2018 Brian J. Knaus. All rights reserved.

USDA Agricultural Research Service, Horticultural Crops Research Lab.