Over the past two years, I have frequently been asked about developing and delivering data science curriculum at the undergraduate level. The path forward is quite murky. I am hopeful that this blog post provides a specific, relevant and actionable blueprint. I welcome your thoughts and insights.
Let's start with the questions: How do we (Institution X) “add” data science? Which courses would be suitable to include? How does infusing data science into existing courses work? In most, if not all, of my conversations with faculty about data science education, the bulk of the inquiries come down to one core concern: how to assess the feasibility and utility of a proposed data science major, minor and/or certificate given the structure of existing degree programs. I believe I can provide a plausible path forward.
The plan of study seems to be central to forming a reasonable response. A student must satisfy specific institution-wide categories (sometimes referred to as general education coursework) in addition to the requirements of their selected major, including core, cognate and elective coursework, all within four years or less for the Bachelor’s degree. Pivoting to integrate data science is complicated, both for the students and for us as faculty. Faculty, including me, want to identify suitable paths to increase data learning while also being conscious of our students’ course load, financial and time constraints. I also think it is imperative that our students receive their data learning in the context of their major or discipline.
A Path Forward
I’d suggest an approach that contextualizes data science instruction, similar to the Nexus of Data Science at Mount Holyoke. In theory, each department could identify courses that build their students’ data science competencies and skills, specifically crafted to boost their discipline-specific data acumen. A sequence of three or four data-centric courses could provide depth in data science without affecting a student’s ability to complete their degree program. This proposed course sequence could be managed by a department with the following criteria:
each course must be data-centric, i.e., have at least one element of the data science workflow as its main theme
2 of these data-centric courses must be regularly offered in that department
1 data-centric course must be regularly offered by another department, emphasizing the interdisciplinary nature of data science
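For departments that keep their course lists in a spreadsheet or script, the criteria above could be checked with a few lines of Python. This is only a minimal sketch: the `Course` record, the `meets_criteria` function, and the data-centric and regularly-offered flags in the sample are hypothetical and chosen for illustration, and I am reading "2" and "1" as minimums.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    code: str
    department: str
    data_centric: bool       # a data science workflow element is the main theme
    regularly_offered: bool  # offered on a predictable schedule


def meets_criteria(sequence, home_department):
    """Check a proposed sequence against the three criteria listed above.

    Assumption: the counts in the criteria are treated as minimums.
    """
    all_data_centric = all(c.data_centric for c in sequence)
    in_house = sum(
        1 for c in sequence
        if c.department == home_department and c.regularly_offered
    )
    outside = sum(
        1 for c in sequence
        if c.department != home_department and c.regularly_offered
    )
    return all_data_centric and in_house >= 2 and outside >= 1


# Illustrative sample only; the flags are assumptions, not official catalog data.
sample = [
    Course("SCIS 328", "CS", data_centric=True, regularly_offered=True),
    Course("SCIS 445", "CS", data_centric=True, regularly_offered=True),
    Course("SMTH 214", "Mathematics", data_centric=True, regularly_offered=True),
]
print(meets_criteria(sample, "CS"))
```

A department could extend the record with terms offered or credit hours, but the three checks are the heart of the proposal.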
As a sample, let me share the Computer Science (CS) Bachelor’s degree program at Spelman College (since I work there). The current CS major consists of twelve 4-credit courses in CS, 4 seminars (zero or one credit each, for first-year and senior students), 4 mathematics cognate courses, and 2 science cognate courses.
The mathematics courses are SMTH 231: Calculus I, SMTH 232: Calculus II, SMTH 234: Discrete Mathematics, and a choice of SMTH 214: Linear Algebra, SMTH 205: Statistics, or SMTH 233: Foundations of Mathematics. The science cognates for Computer Science are usually Physics I and II (SPHY 151 and SPHY 241, respectively); other science courses intended for majors may be substituted for Physics II.
The required CS courses and seminars are:
SCIS 113: Discovering Computer Science: Python
SCIS 123: Computer Science I: Python
SCIS 215: Foundations of Computer Science with Data Structures
SCIS 216: Computer Organization and Design
SCIS 313: Algorithms
SCIS 328: Database Systems
SCIS 343: Operating Systems
SCIS 346: Programming Languages
SCIS 472: Software Engineering
3 CS electives (300-level or above)
SCIS 181 and SCIS 182: First Year Seminar (zero credit)
SCIS 481: Senior Seminar I (zero credit) and SCIS 482: Senior Seminar II (1 credit, includes the senior paper)
Given this structure, certain courses in the data-centric course sequence for CS majors may satisfy more than one degree requirement. Data-centric courses include SCIS 328: Database Systems, SCIS 445: Information Retrieval and SCIS 475: Special Topics in Data Science. However, SCIS 328 is a core major requirement with the following prerequisites (in order): SCIS 113, SCIS 123, SCIS 215 and SCIS 313. SCIS 445 and SCIS 475 are considered CS electives; their prerequisites are SCIS 113, SCIS 123 and SCIS 215.
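To see how early each data-centric CS course could fit into a plan of study, the prerequisites listed above can be treated as a small dependency graph. The sketch below uses Python's standard-library `graphlib` (Python 3.9+). The `PREREQS` table and the `earliest_term` helper are names I made up for illustration, and I am assuming that "in order" means each course in the chain builds on the one before it and that a student takes one level of the chain per term.

```python
from graphlib import TopologicalSorter

# Course -> set of prerequisites, taken from the paragraph above.
PREREQS = {
    "SCIS 328": {"SCIS 113", "SCIS 123", "SCIS 215", "SCIS 313"},
    "SCIS 445": {"SCIS 113", "SCIS 123", "SCIS 215"},
    "SCIS 475": {"SCIS 113", "SCIS 123", "SCIS 215"},
    # Assumption: "in order" means each course requires the previous one.
    "SCIS 123": {"SCIS 113"},
    "SCIS 215": {"SCIS 123"},
    "SCIS 313": {"SCIS 215"},
}


def earliest_term(prereqs):
    """Map each course to the earliest term (1-based) in which it could be taken.

    Assumption: prerequisites must be finished in an earlier term; actual
    course offerings and co-requisites are ignored.
    """
    term = {}
    # static_order() yields every course after all of its prerequisites.
    for course in TopologicalSorter(prereqs).static_order():
        needed = prereqs.get(course, ())
        term[course] = 1 + max((term[p] for p in needed), default=0)
    return term


terms = earliest_term(PREREQS)
for course in ("SCIS 328", "SCIS 445", "SCIS 475"):
    print(course, "-> earliest term", terms[course])
```

Under these assumptions, SCIS 328 lands no earlier than the fifth term and the two electives no earlier than the fourth, which is why the timing of the data-centric sequence needs to be planned from the first year.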
Students majoring in computer science already take 4 mathematics courses. I’d suggest that students take SMTH 205 to satisfy one of the mathematics cognates for the CS degree and SMTH 214 to satisfy this sequence. Note that SMTH 214 has a prerequisite of SMTH 231. The student would then take one more mathematics course than the Computer Science degree program requires.
Data science is an interdisciplinary field, and I firmly believe that each data-centric course sequence should reflect the interdisciplinary principles of a liberal arts education. For the sciences, SPSC 270: Data Science in the Social Sciences could be an option, and it has no prerequisites. Courses focused on basic research methodology, data storytelling and data ethics could be good alternatives. Science-based options may include SMTH 205, SMTH 214 or SCIS 111, provided the course is not already part of your department’s degree requirements and the student has satisfied any prerequisites.
Below is a sample data-centric course sequence for CS majors, showing when each course could be taken based on recent course offerings. Note: a prerequisite is shown as a laptop icon, a course associated with the data-centric course sequence is shown as a basketball icon and the database course is shown as a database icon. The course listing with descriptions is available here.