What is R?

This article was created using RStudio and R Markdown to document the following analysis. R is a programming language and environment commonly used for statistical computing, data analysis, and data visualization. RStudio is an integrated development environment (IDE) for R that provides a user-friendly interface, debugging tools, and other features that make it easier to write and execute R code. R Markdown is a file format for making dynamic documents with R: it lets users embed R code within a Markdown document to create reports that are easy to read and that include live calculations, graphs, and other data visualizations. Together, these tools cover the main data science tasks, from data manipulation to analysis to reporting.

R packages

Packages are a key part of working with R. They contain functions that allow you to perform a wide range of tasks in R. Some of them also contain datasets to practice on.
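As a rough illustration of what a package contains, the sketch below lists some of the functions and datasets that ship with two packages included in every R installation, stats and datasets. These two packages are used here only as convenient examples; they are not part of the analysis that follows.

```r
# Functions exported by the built-in stats package
# (stats is attached by default in a normal R session)
stats_functions <- ls("package:stats")
head(stats_functions)

# Datasets bundled with the built-in datasets package
dataset_names <- data(package = "datasets")$results[, "Item"]
head(dataset_names)
```

The same two calls work for any installed package, provided the package has been attached with library() before calling ls("package:...").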

Here we will be using a package called tidyverse. The tidyverse package is actually a collection of individual packages that together support a wide variety of analysis tasks. We load the tidyverse package as follows:

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
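
The conflict messages above mean that two packages define a function with the same name, and that the dplyr version is the one used by default once the tidyverse is loaded. One way to stay explicit is to prefix the function with its package name. The sketch below assumes dplyr is installed and uses the built-in mtcars dataset purely as an example.

```r
# dplyr::filter() keeps the rows of a data frame that match a condition
four_cyl <- dplyr::filter(mtcars, cyl == 4)
nrow(four_cyl)

# stats::filter() is unrelated: it applies a linear filter to a series.
# Here it computes a centered 3-point moving average of 1:5.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
smoothed
```

The package::function() form works whether or not the package has been attached with library().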

Viewing the data

The diamonds dataset in the ggplot2 package is a good example for trying out R functions. Let's preview the data using the head() function, which displays the column names and the first six rows of data:

head(diamonds)
## # A tibble: 6 × 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
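
head() is only one way to inspect a dataset. As a small sketch, the functions below give a few other views of diamonds; which ones are most useful depends on the question being asked.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides glimpse()

# Number of rows and columns
dim(diamonds)

# One line per column, showing its type and first few values
glimpse(diamonds)

# Minimum, quartiles, mean, and maximum of a single column
summary(diamonds$price)

# The ordered categories of the cut column
levels(diamonds$cut)
```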

Visualizing the data

We can build a simple visualization with ggplot2 as follows:

ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()

The code above takes the diamonds data, maps the carat column to the x-axis and the price column to the y-axis, and draws the data as a scatter plot using the geom_point() function.
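
To make those pieces easier to see, the same plot can be built step by step and stored in variables. This is only a sketch: the variable names base_plot and scatter are made up for illustration, and the alpha value and axis labels are optional additions, chosen here because the dataset has tens of thousands of overlapping points.

```r
library(ggplot2)

# Step 1: the data and the mapping of columns to axes (draws an empty panel)
base_plot <- ggplot(data = diamonds, mapping = aes(x = carat, y = price))

# Step 2: add a layer of points; alpha makes them semi-transparent so that
# dense regions of the plot appear darker
scatter <- base_plot +
  geom_point(alpha = 0.2) +
  labs(x = "Carat", y = "Price (US dollars)")

# Printing the object draws the plot
scatter
```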

It can help to separate out some of the components. Let us separate the data by cut with the facet_wrap() function:

ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  facet_wrap(~cut)
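
Because each panel already shows a single cut, the color legend in the plot above repeats information that the panel titles carry. A possible variation, offered as a sketch rather than a recommendation, hides the legend and places all five panels in one row so their price axes line up for comparison. The variable name faceted is hypothetical.

```r
library(ggplot2)

faceted <- ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(show.legend = FALSE) +  # panel titles already identify the cut
  facet_wrap(~cut, nrow = 1)         # one panel per cut, all in a single row

faceted
```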