Introduction to the Toolkit

Dr. Mine Dogucu

Getting to Know Each Other

Merhaba

Hello
Private Sub Form_Load() MsgBox "Hello, World!" End Sub
Hallo
مرحبا
print('Hello world')
नमस्ते & السلام عليكم
print("Hello world")
<html> Hello world</html>
¡Hola!
سلام

Meet and Greet Each Other

In groups of three or four, meet and greet each other.

Include

Your name
Your year
I live …
My favorite thing about UCI is …
I am awesome because …

Find something in common between all of you by expanding the conversation.
Find a difference.

Getting to Know the Course

The most important thing about this course

The course website

Poll Everywhere

What is Data Science?

Think 💭 - Pair 👫🏽 - Share 💬

What do you think data science is about and what will we learn in this course? There is no right or wrong answer.

What is Data Science?

Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured, and unstructured data. (Source: Wikipedia)

Data science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, and medicine). Data science is multifaceted and can be described as a science, a research paradigm, a research method, a discipline, a workflow, and a profession. (Source: Wikipedia)

What types of data will we use in this course?

We will use a variety of datasets, from biological studies to business, to answer questions that serve different purposes in life. Data will come in different sizes, shapes, and forms and will include numbers, categories, text, etc.

Is this a statistics course or a computing course?

A little bit of both.

Do I need prior programming/statistics experience?

No

Example work by students

How to be successful in this course

  • Be punctual
  • Be organized
  • Do the work

How to make your professor happy

  • Be kind
  • Be honest

Getting to Know the Toolbox

hello woRld

Object assignment operator

my_apples <- 4
            Windows     Mac
Shortcut    Alt + -     Option + -

R is case-sensitive

My_apples
Error in eval(expr, envir, enclos): object 'My_apples' not found
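
A minimal sketch of the same idea, assuming my_apples is defined as on the previous slide. The names lower_exists, upper_exists, and error_text are only illustrative.

```r
my_apples <- 4

# exists() takes a name as text and reports whether an object
# with exactly that spelling is defined.
lower_exists <- exists("my_apples")   # TRUE
upper_exists <- exists("My_apples")   # FALSE: capital M is a different name

# tryCatch() captures the error message instead of stopping the script.
error_text <- tryCatch(
  My_apples,
  error = function(e) conditionMessage(e)
)
error_text
```

Changing a single letter from lower case to upper case is enough for R to treat the name as a different, undefined object.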

If something comes in quotes, R treats it as literal text, not as an object defined in R.

n_apples <- c(7, my_apples, 3)

names <- c("Menglin", "Gloria", "Robert")

data.frame(person = names, apple_count = n_apples)
   person apple_count
1 Menglin           7
2  Gloria           4
3  Robert           3
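
The difference between quoted text and object names can be sketched as follows. This is only an illustration: unquoted, quoted, and apple_data are hypothetical names that do not appear on the slides.

```r
my_apples <- 4

# Without quotes, R looks up the object and uses its value.
unquoted <- my_apples        # the number 4

# With quotes, R keeps the text exactly as typed.
quoted <- "my_apples"        # the text "my_apples"

class(unquoted)              # "numeric"
class(quoted)                # "character"

n_apples <- c(7, my_apples, 3)
names <- c("Menglin", "Gloria", "Robert")

# Store the data frame in an object so it can be reused later.
apple_data <- data.frame(person = names, apple_count = n_apples)

# $ pulls out a single column as a vector.
apple_data$apple_count
```

Note that my_apples inside c() is replaced by its value, 4, which is why the second row of the data frame shows 4.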

Vocabulary

do(something)

do() is a function;
something is the argument of the function.

do(something, colorful)

do() is a function;
something is the first argument of the function;
colorful is the second argument of the function.
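
As a concrete illustration of this vocabulary, here is a small sketch with the base R function round(). The choice of round() and the object names are only for illustration; do() above is a placeholder, not a real function being demonstrated.

```r
# round() is the function; 3.14159 is its first (and only) argument.
one_argument <- round(3.14159)

# Here 2 is the second argument: the number of decimal places to keep.
two_arguments <- round(3.14159, 2)

# Arguments can also be written by name, which makes the call easier to read.
named_arguments <- round(x = 3.14159, digits = 2)

one_argument
two_arguments
named_arguments
```

The last two calls do the same thing; naming the arguments only changes how the call reads.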

Getting Help

To get help, we can use ? followed by the name of a function (or object).

?c

Tip

You should not copy and paste code from my slides or from the internet. Part of learning to code is building up your muscle memory.

tidyverse_style_guide

canyoureadthissentence?

n_apples <- c(7, my_apples, 3)

names <- c("Menglin", "Gloria", "Robert")

data.frame(person = names, 
           apple_count = n_apples)
  • After function names do not leave any spaces.

  • Before and after operators (e.g. <-, =) leave spaces.

  • Put a space after a comma, not before.

  • Object names are all lower case, with words separated by an underscore.
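
To see why these rules matter, compare two lines that do the same thing. This is a sketch for illustration only; nApples is a deliberately badly styled, hypothetical name.

```r
my_apples <- 4

# Harder to read: no spaces around <- or after commas, mixed-case name.
nApples<-c(7,my_apples,3)

# Easier to read: the same values, written following the rules above.
n_apples <- c(7, my_apples, 3)

# R treats both lines the same way; only the readability differs.
identical(nApples, n_apples)
```

R does not enforce the style guide, so following it is a habit to build for the people who read the code, including your future self.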

Tip

You can let RStudio do the indentation for your code.

RStudio Setup

Quarto

markdown


_Hello world_ renders as italic text.

__Hello world__ renders as bold text.

~~Hello world~~ renders as strikethrough text.

Quarto parts

Slides for this course

The slides that you are currently looking at are also written in Quarto. You can take a look at them on our course’s GitHub organization in the slides repo.

Does this look familiar?

  • hw1

  • hw1_final

  • hw1_final2

  • hw1_final3

  • hw1_finalwithfinalimages

  • hw1_finalestfinal

What if we kept only one file, hw1, and tracked each version with a better description?

  • hw1 added questions 1 through 5

  • hw1 changed question 1 image

  • hw1 fixed typos

We will call these descriptions (the text after hw1) commit messages.

git vs. GitHub

  • git allows us to keep track of different versions of one or more files.

  • GitHub is a website where we can store (and share) different versions of the files.

Demo

We have actually done something similar to this demo before by cloning the test repo, committing, and pushing.

Tip

Always use the .Rproj file to open projects. Then open the appropriate .qmd / .R file from the Files pane. If you don’t open the .Rproj file, you will not be able to see the Git pane.

Cloning a repo

repo is a short form of repository. Repositories contain all of your project’s files as well as each file’s revision history.

For this course, our daily repos (lecture code, activities, etc.) are hosted on GitHub.

To clone a GitHub repo to our computer, we first copy the cloning link as shown in the screencast, then start an RStudio project using that link.

Cloning a repo pulls (downloads) all the elements of a repo available at that specific time.

Commits

Once you make changes to your repo (e.g. take notes during lecture, answer an activity question) you can take a snapshot of your changes with a commit.

This way if you ever have to go back in version history you have your older commits to get back to.

This is especially useful, for instance, if you want to go back to an earlier solution you have committed.

Push

All the commits you make will initially be local (i.e. on your own computer).

In order for us to see your commits and your final submission on any file, you have to push your commits. In other words, you upload your files as they are at that specific time.

(An incomplete) Git/GitHub glossary

Git: Software for tracking changes in any set of files.

GitHub: An internet host for Git projects.

repo: Short for repository. Repositories contain all of your project’s files as well as each file’s revision history.

clone: Cloning a repo pulls (downloads) all the elements of a repo available at that specific time.

commit: A snapshot of your repo at a specific point in time. We distinguish each commit with a commit message.

push: Uploads the latest “committed” state of your repo to GitHub.

Do you git it?