Posts

Mind the Gap: Mixed Models in R

There are over 80 packages in R for mixed models, providing a patchwork of overlapping functionality. This complicates the teaching and implementation of mixed models for R users of all abilities. What exactly are the problems and what should we do about it?

Adventures in Babysitting: Web Scraping for the Python and HTML Novice

This is adapted from a talk I gave at PyCascades in Portland, Oregon on 2020-02-09. All the resources from this talk are available at my GitHub repo. Introduction: Here is a Great Way to Learn Python! But in truth, who actually reads these A to Z? (Spoiler: not me.) Many people find these instructional books useful, but as a new Python user, I do not typically read these books front to back.

Bad Data Management: A Star Wars Story

Most of us think of Star Wars as the rebellion versus the empire – good versus evil with key figures: the Jedi, the Sith, Darth Vader, Luke Skywalker, the Emperor, Rey, Kylo Ren. But the downfall of the governing bodies was not Darth Vader, Luke Skywalker, Kylo Ren or Palpatine. Actually, Star Wars is a story of really bad data management practices. The Republic: The Jedi maintained decent, well-organised archives in a large library and they even had paid librarians on staff.

Faster R Scripts through Code Profiling

Sources: Thomas Lumley, GitHub repo useRfasteR; Hadley Wickham, Profiling, Advanced R; Dirk Eddelbuettel, Rcpp. The Process for Improving Code (quote from Advanced R): Find the biggest bottleneck (the slowest part of your code). Try to eliminate it (you may not succeed but that’s ok). Repeat until your code is “fast enough.” Easy peasy, right??? Some general guidelines for speeding up R code: Use data frames less - they are expensive to create, often copied in whole when modified, and their rownames attribute can really slow things down.

Best Practices for Data Workflows in Ag Science

I was struggling with the data set. The first 50 rows were summary stats (…I think), and then the actual data started. The file was over 250 columns wide, composed largely of phenotypic traits. Perhaps one third of the columns were unique variables gathered over several years, but the variable names were inconsistent across years. Actually, there was no information on the variables, what they were measuring, and what scale they were on.

What I learned at the UseR! conference

Top lessons from the R users conference held July 10-13 in Brisbane, Australia. With 900 registrants and dozens of talks, there is much to report (videos of most talks are provided on the R Consortium YouTube channel). I’ll skip loads of it and just focus on the top 10 coolest things. The hex wall was just straight up cool. Here’s the code for that. I participated in a half-day workshop on Rcpp (here and here).