vcfR documentation

by
Brian J. Knaus and Niklaus J. Grünwald

If you experience some sort of unexpected behaviour it is possible that you have discovered a bug. Ways to report a bug include contacting me directly. If you are a GitHub user it would be better if you created an issue. The success, and efficiency, of any attempt to help you will typically depend on your ability to create a minimal reproducible example. A minimal reproducible example is a small amount of code that allows me, or others, to reproduce the behaviour you are observing. Through providing a minimal reproducible example you accomplish two things. First, it provides me with a quick way to validate that the behaviour is repeatable on my system, confirming that this is not something specific to your system. Second, it provides me with an example of the reported behaviour that I can ‘poke and prod’ so that I can (hopefully) find a solution. To create a minimal reproducible example you will need some VCF data. Data for an example can be provided by using example data included in vcfR or by using your own data.

vcfR data

The R package vcfR comes with some example data. This is used in the examples and vignettes. With a little work you can modify this example data to provide a minimal reproducible example.

library(vcfR)
data(vcfR_test)
vcfR_test
## ***** Object of Class vcfR *****
## 3 samples
## 1 CHROMs
## 5 variants
## Object size: 0 Mb
## 0 percent missing data
## *****        *****         *****

Any problem is likely to occur in the ‘gt’ or ‘fix’ slots. These are matrices that can be manipulated with the square brackets ([]).

vcfR_test@gt
##      FORMAT        NA00001          NA00002          NA00003       
## [1,] "GT:GQ:DP:HQ" "0|0:48:1:51,51" "1|0:48:8:51,51" "1/1:43:5:.,."
## [2,] "GT:GQ:DP:HQ" "0|0:49:3:58,50" "0|1:3:5:65,3"   "0/0:41:3"    
## [3,] "GT:GQ:DP:HQ" "1|2:21:6:23,27" "2|1:2:0:18,2"   "2/2:35:4"    
## [4,] "GT:GQ:DP:HQ" "0|0:54:7:56,60" "0|0:48:4:51,51" "0/0:61:2"    
## [5,] "GT:GQ:DP"    "0/1:35:4"       "0/2:17:2"       "1/1:40:3"
vcfR_test@gt[1,]
##           FORMAT          NA00001          NA00002          NA00003 
##    "GT:GQ:DP:HQ" "0|0:48:1:51,51" "1|0:48:8:51,51"   "1/1:43:5:.,."
vcfR_test@gt[1,2]
##          NA00001 
## "0|0:48:1:51,51"
# Change the genotype
vcfR_test@gt[1,2] <- "3|4:48:1:51,51"
vcfR_test@gt
##      FORMAT        NA00001          NA00002          NA00003       
## [1,] "GT:GQ:DP:HQ" "3|4:48:1:51,51" "1|0:48:8:51,51" "1/1:43:5:.,."
## [2,] "GT:GQ:DP:HQ" "0|0:49:3:58,50" "0|1:3:5:65,3"   "0/0:41:3"    
## [3,] "GT:GQ:DP:HQ" "1|2:21:6:23,27" "2|1:2:0:18,2"   "2/2:35:4"    
## [4,] "GT:GQ:DP:HQ" "0|0:54:7:56,60" "0|0:48:4:51,51" "0/0:61:2"    
## [5,] "GT:GQ:DP"    "0/1:35:4"       "0/2:17:2"       "1/1:40:3"

If you can find the part of your data that reproduces the problem you should be able to engineer an example by using this example data.

Your data

Of course, it might just be simpler if you send me your data. Some of you may not be comfortable with this, particularly if your data is unpublished. But many people have been comfortable sending me their data to troubleshoot issues. Keep in mind that VCF data ‘self documents’ in that it includes information about what software created it and information about the reference file used to call variants. I think this is interesting because I get to learn a bit about all the projects people are using vcfR with. But if you’re uncomfortable with this you may either want to use the above option, or come up with a way of obfuscating any information you are uncomfortable sharing.

Reporting the issue

Once you have some data, you should provide a few lines of code that reproduces the issue. If you can forward this data, code, and any error or warning messages to me (or post it as an issue) it should help me troubleshoot this matter. Better yet, if you use RMarkdown, sharing your *.Rmd and its compiled output from your system illustrating the matter (usually *.html), that would be great.

Lastly, it never hurts to report your session information to validate this is not a version issue. Mine is as follows.

sessionInfo()
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-2 memuse_4.0-0       poppr_2.7.1       
##  [4] adegenet_2.1.1     ade4_1.7-11        pinfsc50_1.1.0    
##  [7] cowplot_0.9.2      ggplot2_2.2.1      reshape2_1.4.3    
## [10] vcfR_1.8.0.9000    rmarkdown_1.9     
## 
## loaded via a namespace (and not attached):
##  [1] viridisLite_0.3.0 splines_3.4.4     gtools_3.5.0     
##  [4] shiny_1.0.5       assertthat_0.2.0  expm_0.999-2     
##  [7] sp_1.2-7          highr_0.6         yaml_2.1.19      
## [10] LearnBayes_2.15.1 pegas_0.10        pillar_1.2.2     
## [13] backports_1.1.2   lattice_0.20-35   quadprog_1.5-5   
## [16] glue_1.2.0        phangorn_2.4.0    digest_0.6.15    
## [19] promises_1.0.1    colorspace_1.3-2  htmltools_0.3.6  
## [22] httpuv_1.4.2      Matrix_1.2-14     plyr_1.8.4       
## [25] pkgconfig_2.0.1   gmodels_2.16.2    purrr_0.2.5      
## [28] xtable_1.8-2      scales_0.5.0      gdata_2.18.0     
## [31] later_0.7.2       tibble_1.4.2      mgcv_1.8-23      
## [34] lazyeval_0.2.1    magrittr_1.5      mime_0.5         
## [37] deldir_0.1-15     evaluate_0.10.1   nlme_3.1-137     
## [40] MASS_7.3-50       vegan_2.5-1       tools_3.4.4      
## [43] formatR_1.5       stringr_1.3.1     munsell_0.4.3    
## [46] cluster_2.0.7-1   bindrcpp_0.2.2    compiler_3.4.4   
## [49] rlang_0.2.1       grid_3.4.4        igraph_1.2.1     
## [52] labeling_0.3      boot_1.3-20       gtable_0.2.0     
## [55] R6_2.2.2          knitr_1.20        dplyr_0.7.5      
## [58] seqinr_3.4-5      fastmatch_1.1-0   bindr_0.1.1      
## [61] rprojroot_1.3-2   spdep_0.7-7       permute_0.9-4    
## [64] ape_5.1           stringi_1.2.2     polysat_1.7-2    
## [67] parallel_3.4.4    Rcpp_0.12.17      spData_0.2.8.3   
## [70] tidyselect_0.2.4  coda_0.19-1

The benefits of reporting

If you have identified an issue with the vcfR code, then reporting this issue will have at least two positive outcomes. First, I’ll (hopefully) resolve this matter and you can continue with your research. Second, I will probably use your example to engineer a unit test. A unit test is a bit of code that is executed every time vcfR is built and tests that vcfR is operating correctly. My unit tests can be seen as a collection of code that has created issues in the past. By testing these during builds it ensures that these problems are fixed and haven’t inadvertantly been unfixed while trying to address a different issue. So by reporting issues, you can actually help make vcfR a better software. Thank you!


Copyright © 2017, 2018 Brian J. Knaus. All rights reserved.

USDA Agricultural Research Service, Horticultural Crops Research Lab.