Course Description – Spring 2022

Python – Spring 2022 has been cancelled.

Please stay tuned for future P4H course offerings on this website or through our CoDHR listserv (email to request to join).

Python for the Digital Humanities

Over the past decade, Python has become the lingua franca for computational work in the sciences and the humanities. Compared to other procedural programming languages, Python’s learning curve is gentle, and thanks to its popularity, an ever-increasing set of tools exists for producing, exploring, analyzing, and presenting data derived from cultural artifacts.

This semester-length course seeks to drastically lower the barrier to entry for humanities researchers interested in unlocking the power of Python. The first six weeks (12 hours of instruction) will be geared toward those with little to no programming experience. During that time, participants will become acculturated to the “Pythonic” way of thinking and will become proficient in using the following fundamental building blocks of procedural programming:

  • Variables, operators, lists, and dictionaries
  • If-statements, for-loops, and functions
  • Classes, properties, and methods
  • Reading, writing, and manipulating files
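
As a rough illustration of how these building blocks fit together, here is a minimal sketch. It is not taken from the course materials: the `Text` class, the `save_counts` and `load_counts` helpers, and the tab-separated file layout are all hypothetical names and choices made for this example.

```python
class Text:
    """A tiny container for a title and a body of text (hypothetical example class)."""

    def __init__(self, title, body):
        # Variables stored on the object
        self.title = title
        self.body = body

    @property
    def words(self):
        # A property: computed on access, used like an attribute
        return self.body.lower().split()

    def count_words(self):
        # A method combining a dictionary, a for-loop, and an if-statement
        counts = {}
        for word in self.words:
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
        return counts


def save_counts(counts, path):
    """Write one 'word<TAB>count' line per word (assumed file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for word, n in sorted(counts.items()):
            f.write(f"{word}\t{n}\n")


def load_counts(path):
    """Read the file written by save_counts back into a dictionary."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, n = line.rstrip("\n").split("\t")
            counts[word] = int(n)
    return counts
```

For instance, `Text("demo", "the cat saw the dog").count_words()` returns a dictionary in which `"the"` maps to 2, and that dictionary can be saved with `save_counts` and read back with `load_counts`.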

The remainder of the course will be dedicated to praxis, reinforcing and building upon Python skills through typical Digital Humanities tasks. Example data will be provided, though participants are encouraged to BYOD (bring your own data). Before this praxis-oriented portion of the course begins, the instructor will gather input on the nature of these exercises (what would be useful for you to learn how to do?), allowing the course to be tailored to the needs of participants. As such, while the specific number and type of exercises may change, past exercises have included:

  • Performing OCR on page images
  • Preparing novel-length texts for textual analysis (clean-up, chapter segmenting, tokenization, lemmatization, etc.)
  • Performing and visualizing word frequency analysis
  • Authorship attribution (clustering similar texts together)
  • Topic modeling
  • Natural Language Processing (named entity recognition, part of speech tagging, grammatical dependency parsing)
  • Querying REST APIs (like Twitter)
  • Parsing XML documents (like TEI)
  • Querying and updating relational databases (like MySQL)
  • Creating simple but dynamic web applications for displaying data
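
To give a flavor of what two of these exercises might involve, the following sketch combines XML parsing with word frequency analysis using only Python's standard library. It is an illustration under stated assumptions, not the course's actual exercise: the sample document is invented, the helper names are hypothetical, and the regular-expression tokenizer is a deliberately crude stand-in for the tokenization tools a real project would likely use.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by TEI documents, in ElementTree's {uri}tag notation
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Invented sample text, for illustration only
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>The whale surfaced. The crew watched the whale.</p>
    <p>The sea was calm.</p>
  </body></text>
</TEI>"""


def paragraphs(tei_xml):
    """Return the plain text of every <p> element in the document."""
    root = ET.fromstring(tei_xml)
    return ["".join(p.itertext()) for p in root.iter(TEI_NS + "p")]


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tei_xml):
    """Count how often each token occurs across all paragraphs."""
    counts = Counter()
    for para in paragraphs(tei_xml):
        counts.update(tokenize(para))
    return counts


if __name__ == "__main__":
    # Print the three most frequent words and their counts
    for word, n in word_frequencies(SAMPLE_TEI).most_common(3):
        print(word, n)
```

Because this tokenizer only matches unaccented Latin letters, it would need adjusting for most non-English texts.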

The only requirements for participants will be access to a modern web browser (preferably Chrome or Firefox, but the latest versions of Safari or Edge should also work), a Google account (you likely already have a personal Gmail address or an account through your institution; if not, you can sign up for one), and a high-speed internet connection. Python code will be written and executed within Google Colab, and data will be accessed via Google Drive.

For more information, visit Meeting Details, Syllabus – Spring 2022, and Registration.

