Introduction to Quarto

November 5, 2024

Thank You

Special thanks to Nick Bearman for sharing an earlier version of these slides!

Outline & Learning objectives:

  • What are reproducibility and replicability?
  • Why do it & how do we do it?
  • Understand what reproducibility and replicability are
  • How Quarto can be used to create documents and web pages that increase replicability and reproducibility
  • Create your own presentation

What is Reproducibility?

  • Ability for other people with a similar level of skill to reproduce your work.
  • Other people:
    • colleagues in your company
    • group members in a project
    • yourself in a year, when you want to use your project work for something else
  • Fundamental part of research
  • It is also best practice, because it allows others to reproduce your work.

Sometimes the terminology gets confusing…

“[…] when the same analysis steps performed on the same dataset […] produce the same answer.” (Turing Way)
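The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `run_analysis`, the numbers in `data`, and the seed value are all made up for the example. The point is that fixing the data, the steps, and the random seed gives the same answer on every run.

```python
import random
import statistics


def run_analysis(data, seed):
    """Bootstrap-style estimate of the mean (hypothetical analysis step).

    The seed fixes the random resampling, so repeated runs are identical.
    """
    rng = random.Random(seed)
    resampled_means = []
    for _ in range(200):
        # Resample the data with replacement, then record the sample mean.
        sample = [rng.choice(data) for _ in data]
        resampled_means.append(statistics.mean(sample))
    return statistics.mean(resampled_means)


# Made-up measurements, used only for illustration.
data = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0]

first = run_analysis(data, seed=42)
second = run_analysis(data, seed=42)

# Same dataset + same analysis steps + same seed -> same answer.
print(first == second)
```

Without the fixed seed, two runs would usually differ slightly, which is one of the small details that a reproducible workflow needs to record.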

Why do we need Reproducibility?

  • We need to have confidence that our research is good quality and we are doing good science

  • Peter Fisher (1993) compared seven different pieces of GIS software doing a viewshed analysis

  • and got seven (slightly) different results!
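How can the "same" analysis give different results? The toy sketch below is not Fisher's experiment and is not taken from his paper. It assumes a hypothetical 1D line-of-sight function, `visible_cells`, and a made-up elevation profile, and shows how one small implementation choice (how ties are handled) changes the answer.

```python
def visible_cells(profile, observer_height, strict):
    """Count cells visible from cell 0 along a 1D elevation profile.

    A cell counts as visible when its slope from the observer's eye is
    strictly greater than (strict=True) or at least equal to (strict=False)
    the steepest slope seen so far.
    """
    eye = profile[0] + observer_height
    max_slope = float("-inf")
    count = 0
    for distance, elevation in enumerate(profile[1:], start=1):
        slope = (elevation - eye) / distance
        if slope > max_slope or (not strict and slope == max_slope):
            count += 1
        max_slope = max(max_slope, slope)
    return count


# Made-up profile: a perfectly even downhill slope away from the observer.
profile = [10, 8, 6, 4, 2, 0]

strict_count = visible_cells(profile, observer_height=0, strict=True)
lenient_count = visible_cells(profile, observer_height=0, strict=False)
print(strict_count, lenient_count)  # the two rules disagree on this profile
```

Neither rule is "wrong" on its own, but unless the rule is documented, nobody can tell which one a given piece of software uses.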

Why do it?

  • Fisher also discovered a major error in one piece of software which gave completely incorrect results.

  • Highlights the need for:

    • Standards & testing to make sure this doesn’t happen
    • The algorithms used need to be published, so people can see what is happening
    • Issues when only binary files are available, and not the source code

Fisher, P. F. (1993). Algorithm and implementation uncertainty in viewshed analysis. International Journal of Geographical Information Systems, 7(4), 331–347. https://doi.org/10.1080/02693799308901965

Why do it?

  • Standards & testing to make sure this doesn’t happen

  • The algorithms used need to be published, so people can see what is happening

    • Publish algorithms in journals
    • Even more important with machine learning - transparency is important
  • Issues when only binary files are available, and not the source code

    • Growth in open source software - so you can see (and unpick) what is happening

Research

  • Some journals & conferences ask you to submit code along with your paper

  • AGILE - https://reproducible-agile.github.io/

  • Anyone (with a similar level of skills) should be able to reproduce your research and benefit from it.

  • One reason for open source tools.

  • If you do analysis in ArcGIS Pro, you need ArcGIS Pro to recreate that analysis.

  • If you don’t have ArcGIS Pro, what do you do?

How do we do this?

  • Documenting what you did is standard practice: the Methods section

  • If you can do your analysis in a script, then you can also share that script

  • ArcGIS Pro / QGIS

    • graphical interface, click buttons, etc
  • R / Python

    • write out the script

Setup - “environments”

  • To replicate a piece of work, you need to know what software was used

  • What version?

  • Which libraries / packages?

  • What version of libraries or packages?
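The questions above can be answered from inside a script. A minimal sketch, assuming a hypothetical helper `describe_environment` and using `numpy` and `matplotlib` purely as example package names (neither needs to be installed for the code to run):

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return the Python version, the platform, and the installed version
    of each named package (None when a package is not installed)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Example package names only; list whatever your own analysis imports.
env = describe_environment(["numpy", "matplotlib"])
print(json.dumps(env, indent=2))
```

Saving this output next to your results gives a reader a starting point for rebuilding the environment. Dedicated tools (for example `pip freeze`, conda environment files, or `renv` in R) do the same job more thoroughly.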

Writing, Presentations

Reproducibility also works for writing and presentations.

  • Markdown allows you to write plain text with tags - stars, hashes, etc.

  • You can also run analysis inside these documents

  • LaTeX is a more powerful but more complex markup system (Markdown can be seen as a much simpler alternative)

  • RMarkdown allows you to run R code

  • Quarto allows you to run other code (Python, R, etc.)

  • This presentation is written in Quarto 😄

Let’s dive into Quarto

  • Quarto is a new, open-source, scientific and technical publishing system
  • Combine text and code to produce formatted documents
  • Publish reproducible and dynamic presentations, dashboards, websites, blogs, and books in HTML, PDF, MS Word, etc.
  • Multi-language support for R, Python, Julia, and more
  • Quarto extends RMarkdown and shares similarities with Jupyter Notebooks.

Artwork from “Hello, Quarto” keynote by Julia Lowndes and Mine Çetinkaya-Rundel, presented at RStudio Conference 2022. Illustrated by Allison Horst.

Formats

  • Documents: HTML, PDF, MS Word, Open Office, ePub
  • Presentations: Revealjs, PowerPoint
  • Wikis: MediaWiki, JiraWiki, …
  • Many templates exist for academic documents: quarto-journals
  • And much more: Jupyter, RTF, InDesign, …
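One source file can be rendered to several of these formats with the Quarto command line (`quarto render <file> --to <format>`). A minimal sketch that only builds the commands, assuming a hypothetical file name `slides.qmd` and a hypothetical helper `render_commands`:

```python
def render_commands(source, formats):
    """Build one `quarto render` command (as an argument list) per format."""
    return [["quarto", "render", source, "--to", fmt] for fmt in formats]


# `slides.qmd` is a placeholder; html, docx and revealjs are Quarto format names.
commands = render_commands("slides.qmd", ["html", "docx", "revealjs"])

for command in commands:
    print(" ".join(command))

# To actually render, pass each list to subprocess.run(command, check=True)
# on a machine where the Quarto CLI is installed and the file exists.
```

Building the commands in a loop like this is one simple way to keep every output format in sync with the same source.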

How does Quarto work?

Figure: how Quarto renders a .qmd file and a .ipynb (Jupyter notebook) file. Taken from What is Quarto - A Quick Intro FAQ.

How does Quarto handle code chunks?

  • Example with the iris dataset (flower measurements):
data(iris)

plot(iris$Sepal.Length, iris$Sepal.Width, 
     main = "Scatter Plot of Sepal Length vs Sepal Width",
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)",
     pch = 16, col = iris$Species)

Plots

```{r}
#| label: "iris-plot"
#| echo: TRUE
#| fig-format: svg
#| cache: TRUE

data(iris)

plot(iris$Sepal.Length, iris$Sepal.Width, 
     main = "Scatter Plot of Sepal Length vs Sepal Width",
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)",
     pch = 16, col = iris$Species)

```

Documents with {r} chunks default to the knitr engine (you can override this by setting engine: jupyter in the YAML header).

```{python}
#| label: fig-polar
#| fig-cap: "A line plot on a polar axis"

import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(
  subplot_kw = {'projection': 'polar'} 
)
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()
```
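Chunks like these live inside a .qmd file, below a YAML header. The sketch below writes a minimal example file from Python so the whole structure is visible in one place. The file name `example.qmd`, the title, and the chunk contents are all made up for illustration. The code fence is built from single backticks only so that this example nests cleanly inside its own code block.

```python
import tempfile
from pathlib import Path

FENCE = "`" * 3  # three backticks, assembled here to avoid closing this block

qmd_text = "\n".join([
    "---",                       # YAML header: document-level options
    'title: "Hello, Quarto"',
    "format: html",
    "engine: jupyter",
    "---",
    "",
    "## A first chunk",          # ordinary Markdown text
    "",
    FENCE + "{python}",          # executable code chunk
    "#| label: fig-line",
    '#| fig-cap: "A simple line plot"',
    "",
    "import matplotlib.pyplot as plt",
    "plt.plot([1, 2, 3], [2, 4, 8])",
    "plt.show()",
    FENCE,
    "",
])

# Write to a temporary folder so nothing in the working directory is touched.
path = Path(tempfile.mkdtemp()) / "example.qmd"
path.write_text(qmd_text, encoding="utf-8")
print(path.read_text(encoding="utf-8"))
```

In practice you would type this file by hand in an editor; generating it from Python is only a convenient way to show the three parts (header, text, chunk) together.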

Air Quality

Figure 1 further explores the impact of temperature on ozone level.

Figure 1: Temperature and ozone level.

Population density map in PA, 2010

  • The code was run in a Markdown document, and the image was saved

Population density map in PA, 2010 (interactive)

When to use Quarto?

Strengths & Weaknesses of Quarto for slides

Strengths 💪

  • Consistency in Output
    • Focus on content
  • Support for (Explicit) Version Control (e.g. git)
  • Great for Code (in Slides)
  • Automation / Generated Contents
  • Interactivity

Weaknesses 😢

  • Harder to fine-tune the layout
    • No WYSIWYG
  • New Syntax to learn
  • Software Maturity

Key Benefit: (Explicit) Version Control

  • Going back through time
  • Great for collaboration
  • Allows sharing and adaptation
  • Allows automation

Practice what you preach!

By setting up your teaching materials in a reproducible manner, you demonstrate the value of reproducibility directly

  • Useful for others
  • Useful for future you when you teach this course again

Additional Resources

Thank you! 🙏

References

Cook, Joshua J. 2999. “An Introduction to Quarto: A Versatile Open-Source Tool for Data Reporting and Visualization.”
Paciorek, Christopher. 2023. “An Example Quarto Markdown File.”
Rey, Sergio J. 2009. “Show Me the Code: Spatial Analysis and Open Source.” Journal of Geographical Systems 11: 191–207.
Singleton, Alex David, Seth Spielman, and Chris Brunsdon. 2016. “Establishing a Framework for Open Geographic Information Science.” International Journal of Geographical Information Science 30 (8): 1507–21.