R
This article was created using RStudio and R Markdown to document the following analysis. R is a programming language and environment commonly used for statistical computing, data analysis, and data visualization. RStudio is an integrated development environment (IDE) for R that provides a user-friendly interface, debugging tools, and other features that make it easier to write and execute R code. R Markdown is a file format for making dynamic documents with R, allowing users to embed R code within a Markdown document to create reports that are not only easy to read but also include live calculations, graphs, and other data visualizations. Together, these tools offer a comprehensive ecosystem for data science tasks, from data manipulation to analysis to reporting.
R packages
Packages are a key part of working with R. They contain functions that allow you to perform a wide range of tasks in R, and some of them even contain datasets to practice on. Here we will be using a package called tidyverse. The tidyverse package is actually a collection of individual packages that can help perform a wide variety of analysis tasks. We load the tidyverse package as follows:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
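The conflicts message means that, once tidyverse is loaded, a bare call to filter() or lag() runs the dplyr version instead of the one from stats. As a minimal sketch (not part of the original analysis), either version can be called explicitly with the :: operator. The moving-average weights below are an arbitrary illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# dplyr::filter() keeps the rows that satisfy a condition
ideal <- dplyr::filter(diamonds, cut == "Ideal")

# stats::filter() applies a linear filter to a numeric series;
# here, a centered 3-point moving average (illustrative weights)
smoothed <- stats::filter(1:10, rep(1 / 3, 3))
```

Writing the package name in front of the function makes it clear which version is intended, whatever order the packages were loaded in.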
The diamonds dataset in the ggplot2 package is a great example for previewing R functions. Let's preview the data using the head() function, which displays the column names and, by default, the first six rows of data:
head(diamonds)
## # A tibble: 6 × 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
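A few other functions give a similarly quick overview of a dataset. The sketch below is an aside rather than part of the original analysis. It uses base R's dim() and names(), plus glimpse(), which is available once dplyr (and therefore tidyverse) is loaded.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides glimpse()

dim(diamonds)      # number of rows and number of columns
names(diamonds)    # column names only
glimpse(diamonds)  # one line per column: its type and first few values

# head() also accepts the number of rows to show
first_three <- head(diamonds, 3)
first_three
```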
We can build a simple visualization with ggplot2:
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point()
The code above takes the diamonds data, plots the carat column on the x-axis and the price column on the y-axis, and represents the data as a scatter plot using the geom_point() function.
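Because ggplot2 builds a plot from layers joined with +, the same scatter plot can be adjusted by adding arguments or further layers. The following is a sketch under assumptions: the transparency value and the label text are illustrative choices, not part of the original analysis.

```r
library(ggplot2)

p <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +  # semi-transparent points reduce overplotting
  labs(
    x = "Carat",
    y = "Price (US dollars)",
    title = "Diamond price by carat"
  )

p  # printing the object draws the plot
```

Storing the plot in an object such as p makes it easy to add more layers later without repeating the earlier code.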
It can help to split a plot into separate panels. Let us draw one panel per cut with the facet_wrap() function, coloring the points by cut as well:
ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  facet_wrap(~cut)
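Since each panel is already labeled with its cut, the color legend repeats the same information. One possible variation, sketched here as an optional tweak rather than the article's own method, hides the legend and places all panels in a single row using the nrow argument of facet_wrap().

```r
library(ggplot2)

p_facets <- ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(show.legend = FALSE) +  # panel titles already name each cut
  facet_wrap(~cut, nrow = 1)         # put all cut levels side by side

p_facets
```

A single row makes the price ranges directly comparable across panels, because facet_wrap() uses shared axis scales by default.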