```{r}
1+1
```
[1] 2
Starting with the very basics
Josef Fruehwald
September 4, 2024
To run R code in a Quarto notebook, you need to insert a “code chunk”. In visual editor mode, you can do that by typing the forward slash (/) and then starting to type “R Code Chunk”. In the source editor mode, you need a line with ```{r} (three “backticks” followed by “r” in curly braces), then a few blank lines, followed by a line with another ```.
To actually run the code, you can either click on the green play button or press the appropriate hotkey for your system.
Honestly, instead of gambling on how R may or may not interpret PEMDAS, just add parentheses ( ) around every operation in the order you want it to happen.
The formula to convert Celsius to Fahrenheit is
\[ \frac{9}{5}\text{C} + 32 \]
Somewhere around 20℃, the website Tops Aff declares it tops aff weather. What temperature is that in ℉?
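As a quick check of the formula, here is a minimal sketch that plugs 20℃ into it. The parentheses follow the advice above about making the order of operations explicit.

```r
# Convert 20 degrees Celsius to Fahrenheit: (9/5) * C + 32
((9 / 5) * 20) + 32
# [1] 68
```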
To assign values to a variable in R, you can use either <- or ->. Most style guides shun ->, but I actually wind up using it a lot.
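Here is a small sketch of both assignment arrows. The variable names `fahrenheit` and `also_fahrenheit` are made up for illustration.

```r
# Leftward assignment: the value on the right goes into the name on the left
C <- 20
fahrenheit <- ((9 / 5) * C) + 32

# Rightward assignment: the value on the left goes into the name on the right
((9 / 5) * C) + 32 -> also_fahrenheit

print(fahrenheit)
print(also_fahrenheit)
```

Both variables should end up holding the same value, since the only difference is which direction the arrow points.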
Assign different values to C to see their conversion to Fahrenheit.
When using a number in R, we can only use digits and dots (.). If we try to enter “one hundred thousand” with a comma separator, we’ll get an error.
We also can’t use any percent signs (%) or currency symbols ($, £, €).
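To see this without actually crashing a code chunk, one option is to ask R to read a piece of text as code with `parse()` and catch the failure with `try()`. This is just a sketch, and the variable names are made up for illustration.

```r
# Digits and dots only: these are fine
one_hundred_thousand <- 100000
one_and_a_half <- 1.5

# parse() reads text as R code without running it, and
# try() catches the error if R can't make sense of the text
comma_attempt <- try(parse(text = "100,000"), silent = TRUE)
percent_attempt <- try(parse(text = "50%"), silent = TRUE)
dollar_attempt <- try(parse(text = "$100"), silent = TRUE)

# Each of these is TRUE, meaning R refused to read the text as code
inherits(comma_attempt, "try-error")
inherits(percent_attempt, "try-error")
inherits(dollar_attempt, "try-error")
```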
When we type in text without any quotes, R will assume it’s a variable or function that’s already been defined and go looking for it.
If the variable hasn’t been created already, we’ll get an error.
If we enter text inside of quotation marks, either single quotes ' or double quotes ", R will instead treat the text as a value that we could, for example, assign to a variable, or just print out.
You will often get confused about this and get the Error: object '' not found message. Even if you do this for 15 years, you will still sometimes enter plain text when you meant to put it in quotes, and put text in quotes you meant to enter without. It’s always annoying, but doesn’t mean you’re bad at doing this.
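A small sketch of the difference, using a made-up variable called `animal`:

```r
# Create a variable called animal that holds the text "dog"
animal <- "dog"

# No quotes: R looks up the variable and prints its value
print(animal)
# [1] "dog"

# Quotes: R treats this as literal text and prints it as-is
print("animal")
# [1] "animal"

# A name that has never been assigned can't be found;
# exists() checks for it without triggering the error
exists("some_undefined_name")
```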
What value is going to get printed below?
What value is going to be printed below? Change the code so that "fruit" gets printed.
There are two specialized values, TRUE and FALSE, that you could call “True/False” or “Logical” or “Boolean” values.
These are often created using logical comparisons.
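A few example comparisons, as a sketch. The numbers and variable names here are arbitrary.

```r
# Comparisons return TRUE or FALSE
three_is_smaller <- 3 < 5       # TRUE
ten_is_eleven <- 10 == 11       # FALSE (note the double equals sign)
same_text <- "chat" == "chat"   # TRUE

print(three_is_smaller)
print(ten_is_eleven)
print(same_text)

# The result of a comparison is a logical value
class(three_is_smaller)
# [1] "logical"
```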
When you have a missing value, that’s given a special NA value.
Distinguishing between missing data and 0 data is super important for data analysis, but isn’t always done well. For example, if we asked 3 people what their names were, and only remembered to ask 2 of them what their age was, we’d get a really different estimate of their average age if we entered 0 for the missing person!
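To make that concrete, here is a sketch with invented ages (30 and 40 for the two people we asked). It uses `c()` to bundle values together, which is covered in the next part of these notes.

```r
# Two known ages and one missing age (the ages are made up)
ages_with_na <- c(30, 40, NA)

# The same data, but with 0 wrongly entered for the missing person
ages_with_zero <- c(30, 40, 0)

# With NA, R won't guess: the mean is NA unless we ask it to skip missing values
mean(ages_with_na)
# [1] NA
mean(ages_with_na, na.rm = TRUE)
# [1] 35

# With 0, we silently get a much lower average
mean(ages_with_zero)
# [1] 23.33333
```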
Vectors are basically 1 dimensional lists of values.¹ You can have numeric, character or logical vectors in R, but you can’t mix types. One way to create vectors is with the c() (for concatenate) function. There needs to be a comma , between every value that you add to a vector.
digital_words <- c(
  "enshittification",
  "chat",
  "gamers",
  "ice cream so good",
  "millennial pause",
  "skibidi"
)
print(digital_words)
[1] "enshittification" "chat" "gamers"
[4] "ice cream so good" "millennial pause" "skibidi"
You can also create vectors of sequential numbers with the : operator.
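For example, as a quick sketch (the variable names are made up):

```r
# Every whole number from 1 up to 10
one_to_ten <- 1:10
print(one_to_ten)
# [1]  1  2  3  4  5  6  7  8  9 10

# It also works counting down
ten_to_one <- 10:1
print(ten_to_one)
```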
Create a vector containing the names of three cities.
There are a lot of functions for creating vectors.
[1] 1.000000 1.444444 1.888889 2.333333 2.777778 3.222222 3.666667 4.111111
[9] 4.555556 5.000000
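The ten evenly spaced values printed above are what you would get from a call like `seq(1, 5, length.out = 10)`. The original call isn't shown here, so treat this as an assumption about how the output was made. `rep()` is another vector-making function, included just as a second example.

```r
# Ten evenly spaced numbers from 1 to 5
evenly_spaced <- seq(1, 5, length.out = 10)
print(evenly_spaced)

# Repeat a value a set number of times
three_chats <- rep("chat", times = 3)
print(three_chats)
```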
You can do arithmetic on a whole vector of numbers. digital_word_votes is a vector of how many votes each word got. We can get the sum like so:
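A sketch of that sum, with the vote counts copied from the votes column of the data frame printed later in these notes:

```r
# Votes for each word, in the same order as digital_words
digital_word_votes <- c(111, 59, 11, 11, 45, 46)

# Add up all of the votes
sum(digital_word_votes)
# [1] 283
```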
Any single value we add, subtract, multiply, or divide will apply to each value in the vector.
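For instance, as a minimal sketch (the vote counts are the ones from the data frame printed later in these notes, and the new variable names are made up):

```r
digital_word_votes <- c(111, 59, 11, 11, 45, 46)

# Adding 1 adds 1 to every vote count
votes_plus_one <- digital_word_votes + 1
print(votes_plus_one)
# [1] 112  60  12  12  46  47

# Multiplying by 2 doubles every vote count
votes_doubled <- digital_word_votes * 2
print(votes_doubled)
# [1] 222 118  22  22  90  92
```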
Convert the raw counts of votes in digital_word_votes to proportional votes.
Proportions are calculated by dividing each single amount by the total amount.
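One way this could look, as a sketch rather than the only possible answer. The name `vote_proportions` is made up for illustration.

```r
digital_word_votes <- c(111, 59, 11, 11, 45, 46)

# Divide each single amount by the total amount
vote_proportions <- digital_word_votes / sum(digital_word_votes)

# Rounded to 3 decimal places to make them easier to read
round(vote_proportions, 3)

# Proportions should add up to 1 (give or take floating point rounding)
sum(vote_proportions)
```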
If you’ve never programmed before, this part will make sense, and if you have programmed before, this part will be confusing.
If you have a vector, and you want to get the first value from it, you put square brackets [] after the variable name, and put 1 inside.
[1] "enshittification" "chat" "gamers"
[4] "ice cream so good" "millennial pause" "skibidi"
[1] "enshittification"
If you want a range of values from a vector, you can give it a vector of numeric indices.
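Here is a sketch of both kinds of indexing, using the digital_words vector from above. The names `first_word`, `middle_words` and `first_and_last` are made up.

```r
digital_words <- c(
  "enshittification",
  "chat",
  "gamers",
  "ice cream so good",
  "millennial pause",
  "skibidi"
)

# The first value
first_word <- digital_words[1]
print(first_word)
# [1] "enshittification"

# The 2nd through 4th values: "chat", "gamers", "ice cream so good"
middle_words <- digital_words[2:4]
print(middle_words)

# The indices don't need to be sequential: the 1st and 6th values
first_and_last <- digital_words[c(1, 6)]
print(first_and_last)
```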
Also really useful is the ability to do logical indexing. For example, if we wanted to see which digital words got twenty or fewer votes, we can do
We can use this sequence of TRUE and FALSE values to get the actual words from the digital_words vector.
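Putting the two steps together, as a sketch (the vote counts come from the data frame printed later in these notes, and `few_votes` is a made-up name):

```r
digital_words <- c(
  "enshittification", "chat", "gamers",
  "ice cream so good", "millennial pause", "skibidi"
)
digital_word_votes <- c(111, 59, 11, 11, 45, 46)

# Step 1: the logical comparison gives one TRUE or FALSE per word
few_votes <- digital_word_votes <= 20
print(few_votes)
# [1] FALSE FALSE  TRUE  TRUE FALSE FALSE

# Step 2: use those values as an index; only the TRUE positions are kept,
# which here are "gamers" and "ice cream so good"
digital_words[few_votes]
```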
Using logical indexing, get the word that got the most votes out of digital_words.
We can use that logical comparison as an index vector.
To write more readable code, it might be nice to create intermediate variables, or use more newlines.
[1] "enshittification"
or
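As one possible sketch of those two styles, here is the most-voted word found first with intermediate variables and then with extra newlines. The names `most_votes` and `is_winner` are made up, and the vote counts are the ones from the data frame printed below.

```r
digital_words <- c(
  "enshittification", "chat", "gamers",
  "ice cream so good", "millennial pause", "skibidi"
)
digital_word_votes <- c(111, 59, 11, 11, 45, 46)

# Style 1: intermediate variables
most_votes <- max(digital_word_votes)
is_winner <- digital_word_votes == most_votes
winner_a <- digital_words[is_winner]

# Style 2: one expression, spread over several lines
winner_b <- digital_words[
  digital_word_votes == max(digital_word_votes)
]

print(winner_a)
print(winner_b)
```

Both should print "enshittification", matching the output shown above.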
The most common kind of data structure we’re going to be working with is the data frame. Data frames are two dimensional structures with rows and columns. The data types within each column all need to be the same.
library(tibble)
word_df <- tibble(
  type = "digital",
  word = digital_words,
  votes = digital_word_votes
)
print(word_df)
# A tibble: 6 × 3
  type    word              votes
  <chr>   <chr>             <dbl>
1 digital enshittification    111
2 digital chat                 59
3 digital gamers               11
4 digital ice cream so good    11
5 digital millennial pause     45
6 digital skibidi              46
To navigate data frames, there are a few handy functions. First, in RStudio you can launch a viewer with View().
Keeping things inside the Quarto notebook, other useful functions are summary(), nrow(), ncol() and colnames().
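Here is a sketch of those functions in action. To keep it self-contained, it rebuilds the same data with base R's `data.frame()` instead of `tibble()`. That swap is an assumption made for this example, and the printed output will look a little different from a tibble.

```r
# Same data as word_df above, built with base R's data.frame()
word_df <- data.frame(
  type = "digital",
  word = c(
    "enshittification", "chat", "gamers",
    "ice cream so good", "millennial pause", "skibidi"
  ),
  votes = c(111, 59, 11, 11, 45, 46)
)

# A quick summary of every column
summary(word_df)

# How many rows and columns?
nrow(word_df)
# [1] 6
ncol(word_df)
# [1] 3

# What are the columns called?
colnames(word_df)
# [1] "type"  "word"  "votes"
```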
To get all of the data from a single column of a data frame, we can put $ after the data frame variable name, then the name of the column.
[1] "enshittification" "chat" "gamers"
[4] "ice cream so good" "millennial pause" "skibidi"
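The output above is what pulling out the word column looks like. As a sketch, using a base R `data.frame()` stand-in for the tibble (an assumption made to keep the example self-contained):

```r
word_df <- data.frame(
  type = "digital",
  word = c(
    "enshittification", "chat", "gamers",
    "ice cream so good", "millennial pause", "skibidi"
  ),
  votes = c(111, 59, 11, 11, 45, 46)
)

# Everything in the word column, as a character vector
word_df$word

# Everything in the votes column, as a numeric vector
word_df$votes

# Since a column is just a vector, the vector functions work on it
sum(word_df$votes)
# [1] 283
```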
We’re going to have more interesting ways to get specific rows from a data frame later on in the course, but for now if you want to subset just the rows that have 20 or fewer votes, we can use subset().
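A sketch of that subset, again with a base R `data.frame()` standing in for the tibble (an assumption made to keep the example self-contained). The name `low_vote_df` is made up.

```r
word_df <- data.frame(
  type = "digital",
  word = c(
    "enshittification", "chat", "gamers",
    "ice cream so good", "millennial pause", "skibidi"
  ),
  votes = c(111, 59, 11, 11, 45, 46)
)

# Keep only the rows where votes is 20 or fewer.
# Inside subset(), we can use the column name directly.
low_vote_df <- subset(word_df, votes <= 20)
print(low_vote_df)
```

This should leave just the two rows for "gamers" and "ice cream so good".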
The “pipe” (|>) is going to play a big role in our R workflow. What it does is take whatever is on its left hand side and insert it as the first argument to the function on its right hand side. Here’s a preview.
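As a sketch of what that means, the piped and un-piped versions below do the same thing. The native pipe needs R 4.1 or newer, and the data frame is a base R `data.frame()` stand-in for the tibble (an assumption made to keep the example self-contained). The variable names are made up.

```r
word_df <- data.frame(
  type = "digital",
  word = c(
    "enshittification", "chat", "gamers",
    "ice cream so good", "millennial pause", "skibidi"
  ),
  votes = c(111, 59, 11, 11, 45, 46)
)

# Without the pipe: word_df is written as the first argument
unpiped <- subset(word_df, votes <= 20)

# With the pipe: word_df gets inserted as the first argument for us
piped <- word_df |> subset(votes <= 20)

# Pipes can be chained, so each step reads left to right
n_low_vote_words <- word_df |>
  subset(votes <= 20) |>
  nrow()

print(n_low_vote_words)
# [1] 2
```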
Packages get installed once with install.packages().
But they need to be loaded every time with library().
If you try to load a package that you haven’t installed yet, you’ll get this error:
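The original error output isn't reproduced here, but the sketch below triggers the same kind of error on purpose and captures its message. The package name `notARealPackage` is invented so that the load is guaranteed to fail, and the install line is commented out because it needs an internet connection.

```r
# Installing happens once per computer (commented out here):
# install.packages("tibble")

# Loading happens in every new session:
# library(tibble)

# Try to load a package that isn't installed, and keep the error message
# instead of stopping. The package name is invented for this example.
load_error <- tryCatch(
  library("notARealPackage", character.only = TRUE),
  error = function(e) conditionMessage(e)
)

print(load_error)
```

In an English-language R session, the message should say that there is no package called notARealPackage.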
¹ The reason they aren’t called “lists” is that there’s another kind of data object called a list that has different properties.
@online{fruehwald2024,
author = {Fruehwald, Josef},
title = {Starting with {R}},
date = {2024-09-04},
url = {https://lin611-2024.github.io/notes/meetings/2024-09-04_starting-r.html},
langid = {en}
}