class: center, middle, inverse, title-slide

# rbtl - Research Beyond the Lab
## Reference management with Zotero
### Lars Schöbitz
### 2022-04-28

---

# Today

1. Homework Assignment 2
1. Reference Management - Zotero
1. Open Source - Licenses
1. Reproducible Research
1. Homework Assignment 3

---
# Homework Assignment 2

```r
dat_in_sum_day <- dat_in %>% 
  filter(value <= 1000) %>% 
  mutate(date = as_date(date_time)) %>% 
  group_by(date, location, indicator) %>% 
  summarise(min = min(value),
            median = median(value),
            mean = mean(value),
            sd = sd(value),
            max = max(value)) 
```

---
# Homework Assignment 2

```r
*dat_in_sum_day <- dat_in %>%
  filter(value <= 1000) %>% 
  mutate(date = as_date(date_time)) %>% 
  group_by(date, location, indicator) %>% 
  summarise(min = min(value),
            median = median(value),
            mean = mean(value),
            sd = sd(value),
            max = max(value)) 
```

- Objects that store dataframes: `dat_in` and `dat_in_sum_day`

---
# Homework Assignment 2

```r
dat_in_sum_day <- dat_in %>% 
* filter(value <= 1000) %>%
* mutate(date = as_date(date_time)) %>%
* group_by(date, location, indicator) %>%
* summarise(min = min(value),
*           median = median(value),
*           mean = mean(value),
*           sd = sd(value),
*           max = max(value))
```

- Objects that store dataframes: `dat_in` and `dat_in_sum_day`
- Functions: `filter()`, `mutate()`, `as_date()`, `group_by()`, `summarise()`, etc.

---
# Homework Assignment 2

```r
*dat_in_sum_day <- dat_in %>%
  filter(value <= 1000) %>% 
  mutate(date = as_date(date_time)) %>% 
  group_by(date, location, indicator) %>% 
  summarise(min = min(value), 
            median = median(value), 
            mean = mean(value), 
            sd = sd(value), 
            max = max(value))  
```

- Objects that store dataframes: `dat_in` and `dat_in_sum_day`
- Functions: `filter()`, `mutate()`, `as_date()`, `group_by()`, `summarise()`, etc.
- Assignment operator: `<-`

---
# Homework Assignment 2

```r
*dat_in_sum_day <- dat_in %>%
* filter(value <= 1000) %>%
* mutate(date = as_date(date_time)) %>%
* group_by(date, location, indicator) %>%
  summarise(min = min(value), 
            median = median(value), 
            mean = mean(value), 
            sd = sd(value),  
            max = max(value))  
```

- Objects that store dataframes: `dat_in` and `dat_in_sum_day`
- Functions: `filter()`, `mutate()`, `as_date()`, `group_by()`, `summarise()`, etc.
- Assignment operator: `<-`
- Pipe operator: `%>%`

---
# Homework Assignment 2 - Imported raw data

```r
library(readr) # read_csv() comes from the readr package

dat_link <- "https://raw.githubusercontent.com/Global-Health-Engineering/manuscript-hospital-air-quality/main/data/intermediate/malawi-hospitals-air-quality.csv"

dat_in <- read_csv(dat_link)

dat_in
```

```
# A tibble: 203,806 × 6
  date_time           id    location indicator value unit 
  <dttm>              <chr> <chr>    <chr>     <dbl> <chr>
1 2019-10-08 13:59:01 hos1  guardian pm2.5      19.4 uq_m3
2 2019-10-08 13:59:01 hos1  guardian pm10       27   uq_m3
3 2019-10-08 14:04:41 hos1  guardian pm2.5      44.9 uq_m3
4 2019-10-08 14:04:41 hos1  guardian pm10       56.7 uq_m3
5 2019-10-08 14:10:21 hos1  guardian pm2.5     202.  uq_m3
6 2019-10-08 14:10:21 hos1  guardian pm10      240.  uq_m3
# … with 203,800 more rows
```

---
# Summarised derived data

.pull-left[

]

.pull-right[

```
# A tibble: 890 × 8
# Groups:   date, location [445]
  date       location indicator   min median   mean     sd   max
  <date>     <chr>    <chr>     <dbl>  <dbl>  <dbl>  <dbl> <dbl>
1 2019-10-01 Lhouse   pm10       22.8  121.  143.   107.   447. 
2 2019-10-01 Lhouse   pm2.5      12.2   46.6  54.8   41.1  194  
3 2019-10-02 Lhouse   pm10       24.9  205.  231.   153.   772. 
4 2019-10-02 Lhouse   pm2.5      12.8   72.6  88.0   65.2  375. 
5 2019-10-02 Lions    pm10        7.5   15.2  16.1    5.23  30.2
6 2019-10-02 Lions    pm2.5       4.6    6     6.62   1.85  12.4
# … with 884 more rows
```

]
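
---
# Summarised derived data - try it on toy data

A minimal sketch for running the Assignment 2 pipeline without downloading the data: `dat_toy` is a hypothetical four-row data frame in the same long format as `dat_in`, and the steps mirror the code on the Homework Assignment 2 slides. The only addition is `.groups = "drop_last"`, which spells out the default: after `summarise()` the last grouping variable (`indicator`) is dropped, so the result stays grouped by `date` and `location`, matching the `# Groups: date, location` line in the printed summary.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data in the same long format as dat_in
dat_toy <- tibble(
  date_time = ymd_hms(c("2019-10-08 13:59:01", "2019-10-08 14:04:41",
                        "2019-10-08 14:10:21", "2019-10-09 09:00:00")),
  location  = "guardian",
  indicator = "pm2.5",
  value     = c(19.4, 44.9, 1500, 12.0)
)

dat_toy_sum_day <- dat_toy %>%
  filter(value <= 1000) %>%                 # drops the 1500 reading (above the threshold)
  mutate(date = as_date(date_time)) %>%     # date-time -> calendar date
  group_by(date, location, indicator) %>%   # one group per day/location/indicator
  summarise(min = min(value),
            median = median(value),
            mean = mean(value),
            sd = sd(value),                 # NA for a group with a single value
            max = max(value),
            .groups = "drop_last")          # result stays grouped by date, location

dat_toy_sum_day
```

The second day has only one reading, so its `sd` is `NA`.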

---
class: inverse, middle

.big[Reference Management]

---
# Why?

- You will read a lot
- You want to stay organized
- You don't want to waste your time on formatting

---
# Which tool?

- Mendeley 
- EndNote
- Zotero
- many, many more

???

**Mendeley**

1. Mendeley is owned by Elsevier. 
2. It encrypts your database and makes money with your data.
3. You can only collaborate with 3 people on one project.

**EndNote**

1. EndNote isn't free; you need to buy a license.
2. It also used a proprietary citation style file format that could only be opened by EndNote.

---
# Which tool?

- Mendeley 
- EndNote
- **Zotero**
- many, many more

???

**Mendeley**

1. Mendeley is owned by Elsevier. 
2. It encrypts your database and makes money with your data.
3. You can only collaborate with 3 people on one project.

**EndNote**

1. EndNote isn't free; you need to buy a license.
2. It also used a proprietary citation style file format that could only be opened by EndNote.

---
# Why Zotero?

.footnote[[Screenshot taken from zotero.org on 2022-03-03](https://www.zotero.org/)]

---
# Zotero is Open Source - Why is that good?

- Free Software
- Transparent about access to your own data
- Zotero's source code is public
- Commitment to support open software and open standards
- Zotero developers helped create the open [Citation Style Language (CSL)](https://citationstyles.org/)

---
# Open Source - Licenses

- [Open Source isn't just code on the internet](https://open-source-for-researchers.github.io/open-source-workshop/01-what-is-open-source)

- Use permissive licenses to allow others to reuse, remix and build upon your work (also for commercial purposes)
- Recommended licenses 
    - **Text, slides, images**: [Creative Commons](https://creativecommons.org/about/cclicenses/) (CC0, CC-BY, CC-BY-SA) 
    - **Software**: [MIT License](https://en.wikipedia.org/wiki/MIT_License), [Hippocratic License](https://firstdonoharm.dev/), [Unlicense](https://unlicense.org/)

- https://tldrlegal.com/ - plain-English explanations of licenses in bullet form
- https://kbroman.org/steps2rr/pages/licenses.html - Karl Broman's page on licenses

???

- CC0: a public dedication tool, which allows creators to give up their copyright and put their works into the worldwide public domain
- CC-BY: attribution to the creator. The license allows for commercial use.
- CC-BY-SA: attribution to the creator. If you modify the material, you must license the modified material under identical terms.

- MIT License: As a permissive license, it puts only very limited restrictions on reuse
- Hippocratic License: an Ethical Source license that specifically prohibits the use of software to violate universal standards of human rights, and that embodies the [Ethical Source Principles](https://ethicalsource.dev/principles/).
- Unlicense: A template for dedicating your software to the public domain

---
class: inverse, middle

.big[Open Source != Open Access]

???

- Papers and journal articles that are not behind a paywall
- The paywall is removed by either publishers or research institutions/libraries

From: https://open-source-for-researchers.github.io/open-source-workshop/01-what-is-open-source

---
class: inverse, middle

.big[Open Source != Open Data]

???

- Refers to the practice of sharing the data that produces your results
- This is especially important if you write code that depends on the data: the code usually can't be re-run without it.

From: https://open-source-for-researchers.github.io/open-source-workshop/01-what-is-open-source

---
class: inverse, middle

.large[Open Source (Code) + Open Data =   
Reproducible Research]

---
class: left
background-image: url(https://the-turing-way.netlify.app/_images/reproducibility.jpg)
background-position: right
background-size: contain

# Reproducible Research

.footnote[The Turing Way Community, & Scriberia. (2021). Illustrations from the Turing Way book dashes. Zenodo. https://doi.org/10.5281/zenodo.5706310]

---
class: inverse, middle

.big[Homework Assignment]

---
# Homework Assignment 3 - Learning Objectives

These learning objectives are related to the assignment for this week.

- Learners are able to import references to a Zotero group library
- Learners can use a library exported from Zotero in Better BibTeX format to generate an automated reference list in an R Markdown file
- Learners can edit a file in the Citation File Format (.cff) to add their name to the author list
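
---
# Homework Assignment 3 - What a .bib entry looks like

A Better BibTeX export is a plain-text `.bib` file with one entry per reference. As a sketch of what such an entry contains, base R's `bibentry()` and `toBibtex()` can build and print one. The example uses the Turing Way illustrations cited in these slides; the citation key `turingway2021` is a hypothetical choice (in the assignment, the keys come from Better BibTeX, not from R).

```r
# Base R only: build one reference and print it in BibTeX format
ref <- bibentry(
  bibtype   = "Misc",
  key       = "turingway2021",            # hypothetical citation key
  title     = "Illustrations from the Turing Way book dashes",
  author    = c(person("The Turing Way Community"), person("Scriberia")),
  year      = "2021",
  publisher = "Zenodo",
  doi       = "10.5281/zenodo.5706310"
)

bib <- toBibtex(ref)  # character lines of one BibTeX entry
bib
```

In R Markdown, the exported file is listed in the YAML header (e.g. `bibliography: references.bib`) and a reference is cited in the text by its key, e.g. `[@turingway2021]`; the reference list is generated when the document is knitted.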

---
# Homework Assignment 3 - Due Date

- Complete Assignment 2 before you complete Assignment 3
- Assignment 3 is due on 15th March
- Readings on Reproducible Research

---
class: center, middle

# Thanks! 🌻

Slides created via the R packages:

[**xaringan**](https://github.com/yihui/xaringan)<br>
[gadenbuie/xaringanthemer](https://github.com/gadenbuie/xaringanthemer)

The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com).

Access slides as [PDF on GitHub](https://rbtl-fs22.github.io/website/slides/pt2-d03-reference-management/pt2-d03-reference-management.pdf)

All material is licensed under [Creative Commons Attribution Share Alike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/).