
Commit 53dc56f

first
1 parent 71c3ff0 commit 53dc56f


53 files changed: +7841 −0 lines changed

.DS_Store

14 KB
Binary file not shown.

.gitignore

+4
@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata

01-importing.Rmd

+235
@@ -0,0 +1,235 @@

---
title: 'Week 2: Examples for Data Importing'
---

# Load packages
```{r}
library(readr)
library(readxl)
```

# Let's focus on the readr package first
# Check the current directory

```{r}
getwd()
```

# Visit the Environmental Performance Index at https://epi.yale.edu/
# Go to "Downloads > EPI2020 Raw data"
# https://epi.yale.edu/downloads/epi2020rawdata20200604v02.zip
# Download this file to your Desktop and unzip it.
# Focus on the CTH_raw_na file and move it to the data folder under your working directory

```{r}
# move the file to the working directory
#file.copy("/Users/gulinan/Desktop/epi2020rawdata20200604v02/CTH_raw_na.csv", "data/CTH_raw_na.csv")
# When your working directory is the path returned by getwd()
# and you want to access the data folder and list all the
# files in it:
list.files(file.path(getwd(), "data"))
```

# CTH_raw_na is a csv file
# Have a look at the first three lines of the CTH data
```{r}
# read_lines() is a function from the readr package
# ?read_lines
# read_lines() reads up to n_max lines from a file.
read_lines("data/CTH_raw_na.csv", n_max = 3)
```

# The first line is the header.
# read_* functions all assume col_names = TRUE by default.
# read_* functions always skip empty rows (skip_empty_rows = TRUE by default).

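# If a file has no header row, you can say so explicitly.
# A minimal sketch, reusing the same CTH file just for illustration:
```{r}
# With col_names = FALSE the first line is treated as data and
# readr autogenerates the column names X1, X2, ...
cth_nohdr <- read_csv("data/CTH_raw_na.csv", col_names = FALSE)
head(cth_nohdr, 2)
```
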
```{r}
# more info
?read_csv
```

```{r}
# Read the data into the R session
cth_data <- read_csv("data/CTH_raw_na.csv")
```
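
# readr guesses the column types from the data; if a guess is off, you
# can state the types yourself via col_types. A sketch only: the column
# names below are hypothetical placeholders, not the real CTH columns.
```{r}
# cth_data <- read_csv("data/CTH_raw_na.csv",
#                      col_types = cols(country = col_character(),
#                                       .default = col_double()))
```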

# When everything is OK, you will see your data under the Environment pane
```{r}
# Investigate the data carefully
View(cth_data)
```

```{r}
# Check the types of the variables: character, numeric, integer
str(cth_data)
```

# Note: NA stands for missing values in the data

```{r}
# Check the class of the data
class(cth_data)
```

```{r}
# Check the dimensions
dim(cth_data)
```

```{r}
# Check the column names
colnames(cth_data)
```

```{r}
# Not comfortable with the column names?
# Keep the first three columns and rename the rest as the years 1950-2014
colnames(cth_data)[-(1:3)] <- as.character(1950:2014)
```

```{r}
# Check the column names one more time!
colnames(cth_data)
```

# Since the file argument of the readr functions is defined as
# "Either a path to a file, a connection",
# we can also import csv files directly from the internet.
# Let's import a csv file from the internet now.

# Visit Google's COVID-19 Community Mobility Reports
# at https://www.google.com/covid19/mobility/
# Global data is available for download as a csv file
# at the following url:
# https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv?cachebust=c050b74b9ee831a7

```{r}
# First specify the url address of the data
url <- "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv?cachebust=c050b74b9ee831a7"
# Then read it into R (it takes a bit of time)
google_mobility <- read_csv(url)
```

```{r}
# Investigate it as you wish
View(google_mobility)
```
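
# The mobility file is large. A hedged tip while experimenting:
# the n_max argument limits how many rows read_csv() pulls in.
```{r}
# Read only the first 1000 rows for a quick look
google_mobility_head <- read_csv(url, n_max = 1000)
```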

```{r}
# or download this file to your local computer;
# download.file() comes from the utils package, which is loaded by default
download.file(url, "data/google_mobility.csv")
```

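# Once downloaded, you can re-read the local copy without
# hitting the network again. A minimal sketch:
```{r}
# google_mobility <- read_csv("data/google_mobility.csv")
```
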
# Let's focus on the readxl package.
# readxl is installed with the tidyverse, but it is not a core tidyverse
# package, so it is not attached by library(tidyverse); load it explicitly.
```{r}
# install.packages("readxl")
library(readxl)
```

# Visit the web site of Monitoring the situation of
# children and women in Europe and Central Asia
# at http://transmonee.org/
# Go to http://transmonee.org/database/ and
# download the Excel file named
# "Population at the beginning of the year by sex and selected age groups"
# into your working directory and rename it Population-1989-2015,
# since the original file name is very long.

```{r}
# List the sheet names
excel_sheets("data/Population-1989-2015.xlsx")
```
# read_excel() reads both xls and xlsx files and detects the format from the extension.
```{r}
pop <- read_excel("data/Population-1989-2015.xlsx")
View(pop)
```
# The excel file is very crowded: it consists of many tables and notes in text format.
# Let's focus on "Total population on January 1", which is located between the 5th and
# 39th rows. Note that the 5th row is a heading.
```{r}
pop_red <- read_excel("data/Population-1989-2015.xlsx", range = cell_rows(5:39))
colnames(pop_red)[1:2] <- c("Country", "Number")
View(pop_red)
```
# Now it looks a bit better! But the new data set still contains
# a few empty rows. We can get rid of these rows
# once we are familiar with the dplyr package; a sketch follows below.

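# A peek ahead at that dplyr step. A sketch only, assuming the empty
# spreadsheet rows come in as all-NA rows (if_all() needs dplyr >= 1.0.4):
```{r}
# Keep only the rows that are not entirely NA
# library(dplyr)
# pop_red <- pop_red %>% filter(!if_all(everything(), is.na))
```
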
# However, what if you have a very big data set?
# Then it is better to get more help from
# https://readxl.tidyverse.org/articles/sheet-geometry.html
# https://readxl.tidyverse.org/articles/cell-and-column-types.html


# Sometimes an Excel file may involve multiple sheets.
# Go to http://transmonee.org/database/download/ and
# download the Excel file of the TransMonEE full database for 2019 and
# save the file into your working directory. Do this only once.
```{r}
url <- "http://transmonee.org/wp-content/uploads/2016/05/TM-2019-EN-June.xlsx"
download.file(url, "data/TM-2019-EN-June.xlsx")
```
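
# A platform note: on Windows, download.file() writes in text mode by
# default, which can corrupt a binary file such as xlsx; mode = "wb" avoids that.
```{r}
# Windows only: write the file in binary mode
# download.file(url, "data/TM-2019-EN-June.xlsx", mode = "wb")
```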
# Check the number of sheets
```{r}
excel_sheets("data/TM-2019-EN-June.xlsx")
```
# It consists of 6 sheets
```{r}
full_data <- read_excel("data/TM-2019-EN-June.xlsx")
View(full_data)
```

# By default, read_excel() reads the first sheet
```{r}
juvenile_data <- read_excel("data/TM-2019-EN-June.xlsx", sheet = "5. Juvenile Justice & Crime")
View(juvenile_data)
```

# More work has to be done to tidy this data!
# Any ideas and suggestions may contribute to the R community!

################## Study at home ###########################################################################
# Iterating over multiple files or multiple
# sheets can be done via the purrr package; see the sketch below.
# https://readxl.tidyverse.org/articles/articles/readxl-workflows.html
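
# A minimal sketch of that workflow for the TransMonEE file downloaded
# above (map() and set_names() come from purrr):
```{r}
library(purrr)
path <- "data/TM-2019-EN-June.xlsx"
# Read every sheet into a named list of tibbles;
# set_names() keeps the sheet names as the list names
all_sheets <- map(set_names(excel_sheets(path)), ~ read_excel(path, sheet = .x))
```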

# If you have any solutions, share them on Twitter with the #mat381e, #rstats and #unicef hashtags!

# https://rviews.rstudio.com/2019/10/09/building-interactive-world-maps-in-shiny/

# Sometimes your data may be in a Google Spreadsheet,
# such as the gapminder data at
# https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077
# More info can be found at
# https://moderndive.com/2-viz.html
# Then you can use the googlesheets4 package to read
# this data into your R session.

```{r}
#install.packages("googlesheets4")
library(googlesheets4)
url <- "https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077"
gapmind <- read_sheet(url)
```
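
# A hedged note: for a public sheet like this one, gs4_deauth() tells
# googlesheets4 to read without asking you to sign in to Google.
```{r}
# Skip the Google sign-in prompt for public sheets
# gs4_deauth()
# gapmind <- read_sheet(url)
```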

# Before using this package, it is better to read
# https://www.tidyverse.org/google_privacy_policy/

# Or you can just download the Google Spreadsheet as a csv or xls file and then
# read it via the read_csv() or read_excel() function.

# You can write out your files as well; see the sketch below.
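# A minimal sketch, assuming the cth_data object created earlier
# (write_csv() is readr's counterpart to read_csv(); the file name
# here is just an example):
```{r}
# Write the renamed CTH data back out under the data folder
write_csv(cth_data, "data/cth_data_clean.csv")
```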
