This pilot course focuses on the feel, ask, do, think workflow to guide practitioners of data thinking from data to decisions.

Premise:

You don't trust numbers.

But you need numbers to make decisions.

Numbers represented on a computer must be finite: computers do not have infinite memory.

Further, the universe is expanding, and no one can out-compete the second law of thermodynamics, which states that the entropy (read: "randomness") of an isolated system never decreases.

These constraints make it necessary to find fast algorithms that process data and surface the patterns in it.

These patterns are the basis of data thinking.
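The premise that numbers on a computer must be finite is easy to see in practice. The minimal Python sketch below assumes CPython on a platform with IEEE 754 double-precision floats (the common case) and shows three places where a fixed 64-bit representation of real numbers leaks into everyday arithmetic: rounding, lost integers, and overflow.

```python
import math
import sys

# On common platforms, Python floats are IEEE 754 doubles: 64 bits each,
# so only finitely many real numbers can be stored exactly.
print("bits of precision:", sys.float_info.mant_dig)

# 1. Rounding: 0.1 and 0.2 have no exact binary representation.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead

# 2. Lost integers: above 2**53, not every integer fits in a double.
big = float(2**53)
print(big + 1 == big)            # True: the + 1 is rounded away

# 3. Overflow: doubling the largest finite double gives infinity.
overflowed = sys.float_info.max * 2
print(overflowed)                # inf
```

Comparing with a tolerance (`math.isclose`) rather than `==` is one small habit that follows directly from finiteness.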

Crucially, data thinking requires understanding the stories of the humans who collected, measured, curated, and disseminated the data used to inform decisions. These stories help us understand what was included, what was excluded, and what could reasonably be expected of a particular dataset.

Further, data thinking requires visualization.

Finally, asking for help is hard for all of us.

The first basic principle of data thinking is to ask:

  • to ask for help
  • to ask a research question
  • to ponder
  • to ask why, how, and what for.

The course is oriented around the feel, ask, do, think workflow.

Concepts from psychology make an appearance, such as self-monitoring, or noticing emotional responses to verbal stimuli (such as thoughts or words associated with an experience). These basic tools can help accelerate learning and expose learners to uncertainty in an environment of psychological safety.

Such exposure to difficult, confusing contexts is vital for helping a learner traverse and navigate the tricky process of asking, answering, or refuting questions with data. Numbers are needed to make decisions, but numbers are not enough.

Based on open-ended elicitation of learner feedback, Claude (a large language model built by Anthropic, whose Series B round was led by Sam Bankman-Fried) summarized the following operational definition:

> Based on these learner definitions, here is an operational definition of "data thinking" for our class:
> 
> Data thinking is the mental process of analyzing, interpreting, and deriving meaning from data in order to understand and make decisions about the world. It involves:
> * Converting data into information and knowledge through logical and sometimes creative thinking. This includes categorizing, computing, reasoning, rationalizing, as well as making correlations and intuiting causation.
> * Asking questions about how the data was collected and making assumptions in order to comprehend its meaning. This includes considering what is included or excluded and why.
> * Connecting datasets and individual data points together through grouping, categorizing, and storytelling in order to build a coherent understanding.
> * Translating data insights into subjective experiences and using them to explain, predict, gain perspective, or make judgements about real-world phenomena.
> * Recognizing that data and thinking are interconnected and co-constructed. Data both shapes and is shaped by how we think about and make sense of the world.
> * Continually optimizing and reconsidering one's mental models in light of new data. Data thinking is an iterative process of reframing one's understanding.
> * In summary, data thinking is the active, inquisitive, and meaning-making cognitive process of understanding the world through data. It is a skill that can be developed and honed over time with practice.

Throughout the course, learners create, collect, analyze, visualize, and interpret their experiences and data, and learn state-of-the-art artificial intelligence methods to automate these processes. For example, ChatGPT, GPT-4, Claude, Bing Chat, and Bard are all used to help learners get unstuck faster than a single instructor's limited time and resources allow.

Testimonials from the Data Thinking course I teach to cover my rent:

"And again, this has to be one of the best courses I took. Not only because of the subject, but the whole vibe is quite different for some reason, great work! :)" - continuing education learner in Estonia

"This is the most interesting class I have ever taken in 4 years. Thanks a lot, Jaan, for making it engaging." - learner in Estonia, Master's in Finance

"I had to take some time off. During that time, I also thought about the "it" factor of this course for me. The main reason is probably that it resembles more of the real world:

  • Things break.
  • You get frustrated after some time.
  • At the next moment, you feel complete joy because you solved the problem and found such a simple solution that it feels like it shouldn't even be possible.
  • The things are not cleaned and prepped. Other courses give you the tidy data and the assignment with clear instructions on how to make it. In some cases, after completing the assignment, it feels like "What was the point of it all?" It almost feels like training the human to write the code like a computer but not taking into account all the errors that may come up on the way to the result or questioning the ways to improve the result or the certainty of it."

🌐 Course Webpage

https://courses.cs.ut.ee/2023/chatGPT/spring

🎓 Registration at University of Tartu

https://ois2.ut.ee/#/courses/LTAT.02.027/version/b9298b84-78e3-9ae5-bedd-1248b2c7f402/details

📅 Course Calendar

https://calendar.google.com/calendar/u/0?cid=Y18zMjg1NTNjMWI5NTI4ZGUwZGNlMDZjZmZiODNhNTJkZmEwYWU0ODY0YmFkYmNmN2Q1MjdmODAyZDhkZmEzZmRlQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20

🗣️ Course Discussion Board

https://datathinking.zulipchat.com/

📓 All Course Notebooks

https://github.com/onefact/datathinking.org-codespace/tree/main/notebooks

📝 All Course Collaborative Whiteboards

https://www.figma.com/files/project/82743133/datathinking.org?fuid=1144328219103538368

🎥 Course Recordings

University of Tartu credentials required:

https://panopto.ut.ee/Panopto/Pages/Sessions/List.aspx?folderID=43bb180c-79a6-4324-b055-afa400ecd1a0

🎭 February 20, 2023 Lecture

🧘 Introduction, Learning to Suck, Emotions, Acceptance and Commitment Therapy, Exposure Therapy, Choosing a Research Topic, Experience Reports

🧐💰 Data Journey: Severe Acute Respiratory Syndrome Coronavirus-2; Data Journey: J&J, Asbestos, & Talcum Powder; Data Journey: ChatGPT & Incentives

📈 Life Expectancy and Carbon Emissions

📓 Notebook

📝 Whiteboard

🎭 February 23, 2023 Lecture

🧘 Structure, Homework, Tools

💻 What Happens When I Press This Button?

📈

📓 Notebook

📝 Whiteboard

🎭 February 27, 2023 Lecture

🧘 What to Do if New Words Are Confusing

💻 Programming

📐 Linear Algebra, Calculus, Statistics, Probability

💥 ChatGPT, Transformers

📈

📝 Whiteboard

🎭 March 2, 2023 Lecture

🧘 Feel, Ask, Do, Think; Math Before Code

💻 Version Control, Typesetting Mathematics and Code

💥 Logistic Regression

📈

📓 Notebook

📝 Whiteboard

🎭 March 6, 2023 Lecture

🧘 What Happens When I Click This Button? Lesson Planning

💥 Embeddings, Transformers

📈

📓 Notebook

📝 Whiteboard

🎭 March 13, 2023 Lecture

🧘 Informed Consent, Jailbreaking, Mourning the Loss of AI, Prompt: Cognitive Behavioral Therapy for Impostor Syndrome, 🦺 Jailbreak Prompt: Amplifying Negative Thinking

💬 Prompts for Generating Markdown Tables

💻 Embeddings

📈

📝 Notebook

https://github.com/onefact/datathinking.org-codespace/blob/main/notebooks/230313-embeddings-volume-2.ipynb

📝 Whiteboard

🎭 March 16, 2023 Lecture

🧘 Active Listening, Motivational Interviewing, Informed Consent, Data Ethics, Distress Tolerance

💬 Debugging Prompts, Prompt Debugging

💻 Text Data, Zulip Chat Data, Networks, Adding a Comma-Separated Values (CSV) Data File to a GitHub Repository

💥 Embeddings

📈

📝 Notebook

https://github.com/onefact/datathinking.org-codespace/blob/main/notebooks/230316-embeddings-text-data-like-chat-logs-and-networks.ipynb

📝 Whiteboard

🎭 March 20, 2023 Guest Lecture

Guest lecture: Ismael Ghalimi, 🐦 Twitter, 🌐 Website, 🔗 LinkedIn

🧘 Data Types, Storytelling, Visual Storytelling

📈 Bar Charts, Line Charts, Histograms

🎭 March 23, 2023 Guest Lecture

Guest lecture: Pascal Heus, 🐦 Twitter, 🌐 Website, 🔗 LinkedIn

🧘 Storytelling, Closure, Low- and Middle-Income Countries

💻 Metadata, Survey Data, World Bank Data

Pascal writes:

Here are a couple of things I would recommend:

  1. Take a high-level approach and look at the FAIRification process: https://www.go-fair.org/fair-principles/fairification-process/. Then take a dataset through that process.
  2. When you hit step 6 around capturing metadata, maybe dive a bit deeper and start with version 2 of the Data Documentation Initiative, which is essentially a rich data dictionary with extended information about the study. The IHSN website has tools and documentation on that topic like https://www.ihsn.org/projects/DDI-standard and http://ihsn.org/archiving
  3. To put this in practice, I would use this Guide for Data Archivists: https://guide-for-data-archivists.readthedocs.io/en/latest/. Use DDI 2 as your template.
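Pascal's second point describes DDI 2 as essentially a rich data dictionary. Before adopting the DDI 2 template itself, it can help to draft a plain variable-level dictionary for a dataset. The Python sketch below does that for a CSV file; the record fields (`inferred_type`, `n_missing`, `n_distinct`, `label`), the helper names, and the toy CSV are hypothetical choices for illustration, not part of the DDI specification or the IHSN tools.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class VariableRecord:
    """One entry in a hypothetical, DDI-inspired data dictionary (not the DDI 2 schema)."""
    name: str
    inferred_type: str
    n_missing: int
    n_distinct: int
    label: str = ""  # human-written description, left for the archivist to fill in


def infer_type(values):
    """Guess a coarse type for a column of strings: 'integer', 'numeric', or 'text'."""
    present = [v for v in values if v != ""]
    if not present:
        return "text"
    # Try the strictest type first; fall back to looser ones on the first failure.
    for caster, type_name in ((int, "integer"), (float, "numeric")):
        try:
            for v in present:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "text"


def build_data_dictionary(csv_text):
    """Return one VariableRecord per column of a CSV passed in as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    records = []
    for name in reader.fieldnames or []:
        # Short rows yield None for missing fields; treat those as empty strings.
        values = [(row.get(name) or "").strip() for row in rows]
        present = [v for v in values if v != ""]
        records.append(
            VariableRecord(
                name=name,
                inferred_type=infer_type(values),
                n_missing=len(values) - len(present),
                n_distinct=len(set(present)),
            )
        )
    return records


# Toy CSV for illustration only; the column names and values are made up.
sample = "region,year,value\nA,2020,1.5\nB,2020,\nC,2021,2.0\n"
print(json.dumps([asdict(r) for r in build_data_dictionary(sample)], indent=2))
```

Each record's `label` is deliberately blank: a script can count and type-check columns, but what a variable means has to come from the people who collected the data.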

🎭 March 27, 2023 Lecture

🧘 Noticing Inertia in Tackling a Question, Opportunity in Chaos

📈 Visualizing Timeseries

📓 Notebook

📝 Whiteboard

🎭 March 30, 2023 Lecture

🧘 If You Never Get Stuck, You Never Learn; Self-Monitoring; Irreverence; Feel, Ask, Do, Think; Fear, Worry, Difficulty and Struggle in Learning

💬 Copy and Pasting Homework Instructions

💻 Debugging Tools and Heuristics, Pair Programming, Homework Solutions

💥 Linear Regression, Logistic Regression, t-distributed Stochastic Neighbor Embedding (t-SNE), Embeddings

📈 Dimensionality Reduction, Scatter Plots

📓 Notebook

📝 Whiteboard

🎭 April 3, 2023 Lecture

🧘 Homework Retrospective, Self-Gaslighting, Verbal Events are Relational Operants, Conditioning Behavior Using Stimulus-Response-Stimulus Tuples, Vulnerability, Static Analysis of Implicit Meanings of Words

💬 Distilling Open-Ended Responses to Qualitative Questions

💽 + 🤔 Defining Data Thinking

📓 Notebook

📝 Whiteboard

🎭 April 6, 2023 Lecture

💽 + 🤔 Research Questions, Recursive Research Workflow

🧘 Curiosity, Inquiry

💻 DuckDB, SQL, Data Build Tool, Data Modeling, Reference Management with Zotero

📖 Literature Review

🧐💰 Citation is the Currency of Knowledge and Information

📈

📓 Notebook

📝 Whiteboard

🎭 April 10, 2023 Lecture

💽 + 🤔 Collaborative Creation; Feel, Ask, Think, Do!

🧘 Active Listening to Learners: Deciding to Get GPT-4 Access

📖 PitchBook and CB Insights for Venture Capital Deal Flow Data

🧐💰 Venture Capital Landscape, Incentives for Hyping Large Language Models, Conflicts of Interest, Effective Altruism, Artificial Intelligence Safety

📈

📓 Notebook

📝 Whiteboard

🎭 April 13, 2023 Lecture

💽 + 🤔 Systems Thinking

📖 PitchBook and CB Insights for Venture Capital Deal Flow Data

🧐💰 Venture Capital Landscape, Incentives for Hyping Large Language Models, Conflicts of Interest, Effective Altruism, Artificial Intelligence Safety

📓 Notebook

📝 Whiteboard

🎭 April 17, 2023 Lecture

📓 Notebook

📝 Whiteboard

🎭 April 20, 2023 Lecture

💽 + 🤔 Separate Form From Content, Separate Message From Messenger, Red-Teaming

📖 PitchBook and CB Insights for Venture Capital Deal Flow Data

🧐💰 Venture Capital Landscape, Incentives for Hyping Large Language Models, Conflicts of Interest, Effective Altruism, Artificial Intelligence Safety

📓 Notebook

📝 Whiteboard

🎭 April 24, 2023 Lecture

📓 Notebook

📝 Whiteboard

🎭 April 27, 2023 Lecture

📓 Notebook

📝 Whiteboard

🎭 May 1, 2023 Lecture

Holiday in Estonia :)

🎭 May 4, 2023 Lecture

📝 Whiteboard

🎭 May 8, 2023 Lecture

📝 Whiteboard

🎭 May 11, 2023 Lecture

Guest lecture: Stephan Zheng (www.stephanzheng.com) is an AI researcher who has been developing the AI Economist, a multi-agent reinforcement learning (RL) framework for economics. Most recently, he led a research team at Salesforce Research. Previously, he also worked on imitation learning, modeling cooperation, and robustness in deep learning and multi-agent games. His research has been published in leading machine learning conferences and scientific journals, including Science Advances, and has led to multiple patents. It has also been widely covered in US and international media, e.g., the Financial Times, MIT Tech Review, Forbes, Volkskrant, podcasts, and Dutch radio. He has served as an area chair for NeurIPS and ICML. Before machine learning, Stephan studied math and theoretical physics at Utrecht University, Harvard University, and the University of Cambridge, receiving the Lorenz graduation prize for his thesis on exotic dualities in topological string theory. He then switched to machine learning research during his PhD at Caltech. He is originally from the Netherlands.

🎭 May 15, 2023 Lecture

📝 Whiteboard