Material

Fall semester


Slides Quizzes Data
Part 1: Introduction to R programming
Lecture 1 Introduction to R html - pdf Quiz 1 data
Lecture 2 Descriptive statistics html - pdf Quiz 2
Lecture 3 Basic data manipulation html - pdf Quiz 3 data
Lecture 4 Data visualization html - pdf data
Lecture 5 Quarto & \(\LaTeX\) html - pdf data
Lecture 6 Text data & sentiment analysis html - pdf data
Lecture 7 Homework correction html - pdf data

Part 2: Introduction to Econometrics
Lecture 8 Univariate regressions html - pdf Quiz 4 data
Lecture 9 Multivariate regressions html - pdf Quiz 5 data
Lecture 10 Inference html - pdf Quiz 6 data
Lecture 11 Causality html - pdf
Lecture 12 Interpretation html - pdf
Lecture 13 Applications in academic research html - pdf data
Lecture 14 Maps and geolocalized data html - pdf data

Guidelines Source Data
Exams
Homework html - pdf qmd data






Spring semester


Slides Data
Guidelines and refreshers
Lecture 16 How to conduct a research project html - pdf
Lecture 17 Refresher: R Programming html - pdf data
Lecture 18 Refresher: Econometrics html - pdf


Guidelines Grading Data
Exams
Research project html - pdf html - pdf data











Cheatsheets















Contents

Fall semester



I. Introduction to R
1. Getting started
2. Anatomy of a data.frame

II. Descriptive statistics
1. Distributions
2. Central tendency
3. Spread
4. Inference

III. Basic data manipulation
1. The dplyr package
2. Merge and reshape
3. A few words on learning R

IV. Data visualization
1. The ggplot() function
2. Adding dimensions
3. Types of geometry
4. How (not) to lie with graphics

V. Quarto & LaTeX
1. Basic principles
2. Useful features
3. LaTeX for equations

VI. Text data & sentiment analysis
1. Cleaning text data
2. Sentiment analysis



VIII. Univariate regressions
1. Joint distributions
2. Univariate regressions
3. Binary variables

IX. Multivariate regressions
1. Adding variables
2. Control variables
3. Interactions

X. Inference
1. Asymptotic inference
2. Exact inference
3. Hypothesis testing

XI. Causality
1. Main sources of bias
2. Randomized control trials

XII. Interpretation
1. Point estimates
3. Regression tables

XIII. Applications in academic research
1. Causal approach (Behaghel et al., 2015)
2. Correlational approach (Chetty et al., 2014)
3. Structural approach (Nerlove, 1963)

XIV. Maps and geolocalized data
1. Geolocalized data
2. Geographic variables













Syllabus

Course description


The objective of this course is to provide you with the necessary statistical and data visualization tools to perform well-structured and meaningful data analyses for your own research projects. By the end of this course you should be able to produce relevant statistics and compelling graphics to include in your future reports, presentations, or research projects. To this end, this course combines notions of R programming, basic Statistics, and introductory Econometrics.

During the first semester, the emphasis will be placed first on R Programming and then on Econometrics, with connections drawn between these two dimensions of data analysis throughout the course. The second semester will be dedicated to a research project: you will carry out an original data analysis by applying the tools covered in class during the first semester to your own research question. So that your newly acquired programming and data analysis skills can easily be showcased on an online CV or as a writing sample, you will learn how to make your final project look like this document.

Practical matters

The course will be taught in French, but the material is written in English. It is meant to be accessible without prior knowledge of R programming or Statistics. Lectures will take place at Campus Jourdan - Room 3-38, every Monday from 13:30 to 15:30. Please reach me at louis.sirugue@psemail with any question or comment about the course.

Please make sure to download and install R, RStudio, and Quarto before the first lecture, and to bring your laptop with you to each class.

Don’t hesitate to send me an email if you’re already facing issues at that stage.




Grading system
Fall semester


The grading of the first semester will be based on:

  • Six weekly online quizzes - 25%
  • One homework - 30%
  • One final exam - 45%
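
As a rough illustration of how these weights combine, here is a sketch that assumes all three components are graded on the same scale. The symbols \(Q\) (overall quiz grade), \(H\) (homework grade), and \(E\) (final exam grade) are notation introduced here for the example, not official course notation:

\[
G_{\text{fall}} = 0.25\,Q + 0.30\,H + 0.45\,E
\]

For instance, with hypothetical grades \(Q = 16\), \(H = 14\), and \(E = 12\) on an assumed 20-point scale, this gives \(0.25 \times 16 + 0.30 \times 14 + 0.45 \times 12 = 4 + 4.2 + 5.4 = 13.6\).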

Online quizzes consist of sets of 3 to 5 short questions related to the content of the previous class. Links to the quizzes will be available in the Material section of the course webpage. You will be able to log in to each quiz using the verification code received by email, and you can retry the quiz as many times as you want before submitting. When you’re done with a quiz, you must click on “download results” and send the downloaded file to me by email, so that there is a record of your answers in case of disagreement about the grading. You have until the beginning of the following lecture to submit your results and email them to me.

The homework is a set of exercises related to the first five lectures of the course. It can be done alone or in pairs, and must be submitted by email by the end of the sixth week (15th of October 18:00). Late submissions will be penalized by half a point for each 30 minutes beyond the deadline. The homework will soon be available in the Material section of the course webpage, where its precise grading system is already available. You are very welcome to help each other, but you must write your answers yourselves. Copy-pasting is not permitted.

The final exam will be a written exam held in the classroom. You are allowed to bring a cheatsheet with you to the exam as long as it is handwritten and fits on one side of a single A4 (21cm x 29.7cm) page, i.e., recto only. Cheatsheets that do not comply with these rules will be confiscated. Beyond that, standard examination rules apply.




Spring semester


The grading of the second semester will be entirely based on the research project that you will carry out throughout the semester. Research projects must be done in groups of two, and both members of a pair will receive the same grade. You will be evaluated three times, at three different stages of the research process:

  • Presentation of research idea - 25%
  • Midterm report - 30%
  • Final project/presentation - 45%

As the first three lectures of the second semester will be refreshers on the notions seen during the first semester, you will have some time to come up with your own research question and to find relevant data to test your hypotheses empirically (you will be guided on where and how to look for data). Then, in each session you will have to apply a given step of the research process to your research project: data cleaning, descriptive statistics, data visualization, hypothesis testing, etc. until your final presentation. Here is an example of what the final document should look like, and the guidelines and grading system are available in the Material section of the homepage. You will have the possibility to write it in French or in English depending on what you think would be the most useful for you.





Schedule
Fall semester


Part 1: Introduction to R programming
04/09/23 Lecture 1 * Introduction to R
11/09/23 Lecture 2 * Descriptive statistics
18/09/23 Lecture 3 * Basic data manipulation
25/09/23 Lecture 4 Data visualization
02/10/23 Lecture 5 Quarto & \(\LaTeX\)
09/10/23 Lecture 6 Text data & sentiment analysis
16/10/23 Lecture 7 Homework correction
Part 2: Introduction to Econometrics
23/10/23 Lecture 8 * Univariate regressions
06/11/23 Lecture 9 * Multivariate regressions
13/11/23 Lecture 10 * Inference
20/11/23 Lecture 11 Causality
04/12/23 Lecture 12 Interpretation
11/12/23 Lecture 13 Applications in academic research
18/12/23 Lecture 14 Maps and geolocalized data
15/01/24 Lecture 15 Final exam

Standard lectures:

During these lectures we will cover the core material of the course.

Buffer lectures:

These lectures are slightly more advanced, but their content will not be needed for the exams. There are two buffer lectures, each placed before an exam to give you time to prepare for the exam.

Exams:

The homework is due by the 15th of October 18:00, and will be corrected in class during the 7th lecture.
The final exam will take place during the very last lecture.
The six online quizzes will take place during the first three weeks of each part of the course. The corresponding lectures are marked with a * symbol.

Horizontal dashed lines represent break periods. They all last 1 week except the 3-week-long Christmas break.





Spring semester


Group 1 Group 2
Part 1: Guidelines and refreshers
Lecture 1 TBA TBA How to conduct a research project
Lecture 2 TBA TBA Refresher: R Programming
Lecture 3 TBA TBA Refresher: Econometrics
Part 2: Research project
Lecture 4 TBA TBA Presentation of your research question and dataset
Lecture 5 TBA TBA Follow-up: Data cleaning I
Lecture 6 TBA TBA Follow-up: Data cleaning II
Lecture 7 TBA TBA Follow-up: Descriptive statistics
Lecture 8 TBA TBA Follow-up: Visualizing the data
Lecture 9 TBA TBA Follow-up: Regression analysis
Lecture 10 TBA TBA Follow-up: Midterm report feedback
Lecture 11 TBA TBA Follow-up: Causality assessment
Lecture 12 TBA TBA Follow-up: Robustness
Lecture 13 TBA TBA Follow-up: Heterogeneity
Lecture 14 TBA TBA Follow-up: Last tips
Lecture 15 TBA TBA Final presentation

Guidelines and refreshers:

The first lecture will cover in detail how to conduct your research project, and the next two lectures will go back over the most important notions seen during the first semester. These are meant to give you time to form groups of two and to find your research question and your dataset. You will be guided on where and how to look for data, and both the data and the research question can be taken from an existing academic article.

Follow-ups:

Follow-up sessions are meetings during which you will be given feedback on what you’ve done and guidance on what to do next. Between sessions you will have to apply a given step of the research process to your research project: data cleaning, descriptive statistics, data visualization, hypothesis testing, etc., until your final presentation. Here is an example of what the final document should look like, and the guidelines and grading system are available at the bottom of the Material section of the homepage. You will have the possibility to write it in French or in English depending on what you think would be the most useful for you.

Exams:

The 4th lecture will be dedicated to the presentation of your research questions and datasets.
A midterm report must be submitted by email after the 9th lecture. Lecture 10 will be devoted to giving you feedback about it.
The last lecture will be dedicated to your final presentations.
More detailed guidelines are provided above and in the first set of slides “How to conduct a research project”.

Horizontal dashed lines represent break periods.





References



This course is inspired by:

  • Introduction to R, by H. Bull and P. Charousset. Paris School of Economics (2020)
  • Advanced Microeconometrics, by D. Margolis and F. Libois. Paris School of Economics (2020)
  • How to Lie with Graphics, by C. Bontemps. Toulouse School of Economics (2020)
  • Introduction to Econometrics with R, by F. Oswald, V. Viers, P. Villedieu, and G. Kenedi. SciencesPo Dep. of Economics (2020)
  • Introduction to Causal Inference, by L. Zabrocki. Paris Sciences et Lettres - CPES (2020)
  • Geolocalized Datasets and Applications for Economics, by F. Libois and E. Madinier. Paris School of Economics (2020)
  • Countless posts on Stack Overflow and R-bloggers