R

We couldn’t deliver any final product, but we could provide the means for other people to develop a final product

Ross Ihaka & Robert Gentleman, creators of R, on why they never commercialized the software

What is R

R is a “language and environment for statistical computing and graphics,” which means you can write code to run a variety of statistical analyses such as linear regression, produce publication-quality plots such as histograms, and connect to other data analysis tools such as a SQL database.
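As a rough illustration of the kind of analysis described above, here is a minimal sketch using only base R. The choice of the built-in `mtcars` dataset and of the `mpg` and `wt` columns is an assumption made for the example, not something the original post prescribes.

```r
# Built-in example data: fuel economy and weight for 32 cars
data(mtcars)

# Fit a simple linear regression of miles per gallon on car weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))  # intercept and slope

# Draw a histogram of miles per gallon
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
```

Running `summary(fit)` afterwards prints standard errors, p-values, and R-squared for the same model.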

One of the main advantages of R is the number of ‘packages’, or feature extensions, available. For example, if you want to go beyond basic data visualization, the ggplot2 package lets you control colors, shapes, and scales flexibly. These extensions are what make data analysis in R more streamlined and up to date.
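The following sketch shows what controlling colors, shapes, and scales with ggplot2 can look like. It assumes the package is already installed (for instance via `install.packages("ggplot2")`); the dataset, palette, and labels are illustrative choices only.

```r
library(ggplot2)

# Scatter plot of weight vs. fuel economy from the built-in mtcars data,
# with colour and point shape both mapped to the number of cylinders
p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        colour = factor(cyl), shape = factor(cyl))) +
  geom_point(size = 3) +
  scale_x_log10() +                        # put the x axis on a log scale
  scale_colour_brewer(palette = "Dark2") + # swap in a different colour palette
  labs(x = "Weight (1000 lbs, log scale)", y = "Miles per gallon",
       colour = "Cylinders", shape = "Cylinders")

print(p)
```

Each `+` adds one layer or setting, so changing a scale or palette means editing a single line rather than rebuilding the plot.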

Why Should I Learn R

For actuaries, R can replicate spreadsheet work typically done in Excel, apply advanced techniques such as Generalized Linear Models, and scale well to large datasets. The Society of Actuaries saw these capabilities and created the Predictive Analytics certification exam to test candidates on their “ability to employ selected analytic techniques to solve business problems and effectively communicate the solution.” If you’d like to get a leg up on a future actuarial exam, learning R is the way to go.
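To give a flavor of a Generalized Linear Model in R, here is a minimal sketch of a Poisson claim-frequency model. All of the data are simulated and the names (`policies`, `age_group`, `exposure`, `claims`) and rates are hypothetical; this is not taken from any exam syllabus or real portfolio.

```r
set.seed(42)  # make the simulated data reproducible

# Hypothetical portfolio: 500 policies with an age group and exposure in years
n <- 500
policies <- data.frame(
  age_group = factor(sample(c("young", "middle", "senior"), n, replace = TRUE)),
  exposure  = runif(n, min = 0.5, max = 1)
)

# Assumed annual claim rates per age group, used only to simulate claim counts
true_rate <- c(young = 0.30, middle = 0.15, senior = 0.20)
policies$claims <- rpois(
  n,
  lambda = true_rate[as.character(policies$age_group)] * policies$exposure
)

# Poisson GLM with a log link; the offset accounts for differing exposure
freq_model <- glm(claims ~ age_group + offset(log(exposure)),
                  family = poisson(link = "log"), data = policies)

print(summary(freq_model))
print(exp(coef(freq_model)))  # baseline rate and rate ratios vs. the baseline group
```

Because factor levels are ordered alphabetically, `middle` is the baseline group here, and the exponentiated coefficients for the other groups read as multiplicative rate ratios.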

For data analysts, writing code to conduct a thorough statistical analysis is what you’re being paid to do. Lou Bajuk has highlighted a few specific advantages of R:

  • Many (if not most) introductory courses in statistics and data science now teach R.
  • R has become the world’s largest repository of statistical knowledge with reference implementations for thousands, if not tens of thousands, of algorithms that have been vetted by experts.
  • R has a great community of supportive data scientists from diverse backgrounds. For example, R-Ladies is a global organization dedicated to promoting gender diversity in the R Community.

How Can I Use R

R is available as a free download here. You’ll quickly see that the default interface leaves much to be desired, which is where RStudio® comes in: it provides a console, a syntax-highlighting editor, and robust tools for viewing data. Install RStudio® after installing R. The RStudio® download can be found here.

How Can I Learn R

MATH 354, Data Analysis 1, is Colgate University’s comprehensive course on using R for data analysis. The first couple of weeks of the course are dedicated to learning the base R language and coding conventions, so no experience in R or coding is necessary. A flowchart of the data analysis process covered throughout the course is provided.

RStudio’s Thomas Mock has a one-hour webinar that introduces R to users coming from traditional point-and-click software like SAS. He goes through reading data into R, exploratory data analysis, statistical tests, and creating a publication-ready plot. The webinar can be found here.

Click for video link!
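For a sense of what the first steps of such a workflow look like, here is a minimal base R sketch of reading data, exploring it, and running a statistical test. It is not taken from the webinar: the temporary CSV file, the `cars` object, and the choice of a t-test are assumptions made so the example is self-contained.

```r
# Write the built-in mtcars data to a temporary CSV so there is a file to read
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read the data into R
cars <- read.csv(csv_path)

# Exploratory data analysis: structure and summary statistics
str(cars)
print(summary(cars$mpg))

# Statistical test: Welch two-sample t-test comparing miles per gallon
# for automatic (am = 0) and manual (am = 1) transmissions
test <- t.test(mpg ~ am, data = cars)
print(test)
```

With your own data, `csv_path` would simply be the path to your file.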

One advantage of R is the variety of graphics you can make with it. From simple histograms to treemaps to animations, you have a lot of control over what is shown and how it’s shown. Visualization is central to telling a data science story, so it helps to see worked examples. The R Graph Gallery is a collection of charts made in R, each with the code needed to reproduce it yourself. Additionally, Nathan Yau provides a broader list of tutorials, guides, and examples of chart types used for data.

R Graph Gallery

Nathan Yau’s Chart Types

Another comprehensive and highly recommended resource is Hadley Wickham and Garrett Grolemund’s book, R for Data Science, which teaches you how to clean data, draw plots, model data, and communicate your findings. Additionally, the Society of Actuaries recommends James et al.’s An Introduction to Statistical Learning with Applications in R. It provides an accessible overview of statistical learning and popular modeling and prediction techniques, with end-of-chapter labs and problems to work through in R.
