
Introduction to Python and Data Carpentry for Social Science

A workshop over two days, 26th and 27th of February 2019, 12.30-15.30 both days.


Data processing (and analysis) is crucial in many sciences, including the social sciences. But how do I get started? This course requires no programming experience, and we will build a working knowledge of simple data processing using the programming language Python. We will first cover basic programming and how to work with the Jupyter Notebook tool. This basic part will then be extended with data processing and visualization using the Pandas tool.


The participants are encouraged to follow along and run the same examples as shown during the course. The workshop will contain several short exercises of 5-10 minutes each, as well as breaks.

  1. Why? How? (1/2 h)
     1. Data Carpentry?
     2. Why Python
     3. Working with Python - Jupyter Notebook - Literate Programming?
     4. Download and installation (Anaconda)
     5. Sharing - reproducible - visibility and visualization
  2. Python basics with Jupyter Notebook (2 h)
     1. Running cells
     2. Variables and data types, print, arithmetic
     3. Help
     4. Functions (why, how) + import
     5. Strings
     6. Lists
     7. Control structures (if, while, for)
  3. Processing data using Pandas (2 h)
     1. Intro and key data structures (Series and DataFrame)
     2. Manipulation and arithmetic
     3. Comparison
     4. Index functionalities
     5. First-order statistics (mean, median, midrange)
     6. Not available / Not-a-Number (NaN)
     7. Multi-level indexing
     8. Reading (CSV)
     9. Saving (xlsx, html, stata, hdf)
     10. Data preparation: assembling (columns and rows via merge and concatenation)
     11. If time allows: split-apply-combine methodologies
  4. Visualization using Pandas and plotnine (1 1/2 h)
     1. Time series plots
     2. Experimenting with data from the European Social Survey - inspecting and pre-processing
     3. Histograms
     4. Box plots
     5. Manipulations - title, axis, legends
     6. Saving a plot/graph
  5. Wrap-up
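
To give a flavour of the Pandas part of the program, here is a minimal sketch touching on several of the topics listed above: Series and DataFrame, arithmetic and comparison, simple statistics, NaN, concatenation and merge, split-apply-combine, and reading and writing CSV. The data is made up for illustration and is not taken from the European Social Survey files in this repository; all column names and values are assumptions.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data; the columns only mimic survey-style variables.
respondents = pd.DataFrame(
    {
        "country": ["DK", "DK", "SE", "SE", "NO"],
        "age": [34, 51, 29, np.nan, 45],  # NaN marks a missing answer
        "trust": [7, 5, 8, 6, 9],
    }
)

# A single column is a Series; arithmetic and comparison work element-wise.
age = respondents["age"]
age_in_months = age * 12
over_40 = respondents[age > 40]  # rows with a missing age are not selected

# Simple statistics skip NaN by default.
mean_age = age.mean()
median_age = age.median()
midrange_age = (age.min() + age.max()) / 2

# Missing values can be counted and dropped.
n_missing = age.isna().sum()
complete = respondents.dropna(subset=["age"])

# Assembling: add rows with concat, add columns with merge.
extra = pd.DataFrame({"country": ["FI"], "age": [38.0], "trust": [8]})
all_rows = pd.concat([respondents, extra], ignore_index=True)
names = pd.DataFrame(
    {
        "country": ["DK", "SE", "NO", "FI"],
        "name": ["Denmark", "Sweden", "Norway", "Finland"],
    }
)
merged = all_rows.merge(names, on="country", how="left")

# Split-apply-combine: split by country name, apply the mean, combine.
trust_by_country = merged.groupby("name")["trust"].mean()

# Save to CSV and read it back (an in-memory buffer stands in for a file).
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(trust_by_country)
```

In a Jupyter Notebook, each commented step would typically go in its own cell so that intermediate results such as `over_40` or `merged` can be inspected along the way.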


Wes McKinney & PyData Development Team (2018) “pandas: powerful Python data analysis toolkit” doc

Fabio Nelli (2015) “Python Data Analytics”, Apress, ISBN-13 (electronic): 978-1-4842-0958-5

Data Visualization with ggplot2 - Cheat Sheet, 11/16, link

Plotnine - A Grammar of Graphics for Python, homepage, docs

Kristian Larsen, PyDST - A python script for accessing the API of Statistics Denmark, link

Danmarks Statistik, Statistikbank API - Beta, link


Interested in more? Have a look at a similar data carpentry course here.