CSC 480/680: Introduction to Data Mining

Date	Topic	Module / Book Chapter	Deadlines
Week 1
Jan 14	Introduction to Data Mining	1	Assignment 1 Release
Jan 17	Conceptual Overview	2+3
Week 2
Jan 21	Data Manipulation	2+3
Jan 24	Linear Regression	4	Assignment 1 Deadline; Assignment 2 Release
Week 3
Jan 28	Logistic Regression	4
Jan 31	Naïve Bayes	/	Assignment 2 Deadline
Week 4
Feb 04	Instance Based Learning	/
Feb 07	Support Vector Machines	5
Week 5
Feb 11	Evaluation Techniques	3
Feb 14	Decision Trees	6	Assignment 3 Release
Week 6
Feb 18	Ensemble Learning: Soft/Hard Voting, Bagging; Tree Ensembles: Random Forest	7
Feb 21	Ensemble Learning: Boosting, Stacking, ECOC; Tree Ensembles: Gradient Boosting	7	Assignment 3 Deadline
Week 7
Feb 25	Dimensionality Reduction (Online)	8	Pool of Papers Release
Feb 28	Neural Networks (Online)	4+10	Project Proposal Submission
Week 8
Mar 04	Midterm
Mar 07	Neural Networks	4+10
Week 9
Mar 11	Spring Break
Mar 14	Spring Break
Week 10
Mar 18	Deep Learning	11-14	Assignment 4 Release
Mar 21	Deep Learning	11-14
Week 11
Mar 25	NLP	16
Mar 28	Time Series	15	Assignment 4 Deadline
Week 12
Apr 01	Clustering	9
Apr 04	Feature Selection		Paper Critiques Deadline
Week 13
Apr 08	Class Imbalance
Apr 11	One Class Learning	17
Week 14
Apr 15	Paper Presentations	/
Apr 18	Paper Presentations	/
Week 15
Apr 22	Final Project Presentation	/
Apr 25	Final Project Presentation	/

Grading

CSC-480

Component	Weight
Homework Assignments	25%
Midterm Exam	30%
Critiques of 5 research papers	10% = 5 x 2%
Presentation of 1 research paper	5%
Final Project + Presentation	30% (25% + 5%)

CSC-680

Component	Weight
Homework Assignments	20%
Midterm Exam	30%
Critiques of 10 research papers	10% = 10 x 1%
Presentation of 2 research papers	5% = 2 x 2.5%
Final Project + Presentation	35% (30% + 5%)
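
The weights in the two tables above can be combined into a single course score. The following is only an illustrative sketch, not an official grade calculator: it assumes every component is scored on a 0-100 scale, and the names `WEIGHTS` and `course_score` are hypothetical.

```python
# Component weights in percent, copied from the CSC-480 and CSC-680 tables.
WEIGHTS = {
    "CSC-480": {
        "homework": 25,
        "midterm": 30,
        "critiques": 10,
        "paper_presentation": 5,
        "final_project": 30,
    },
    "CSC-680": {
        "homework": 20,
        "midterm": 30,
        "critiques": 10,
        "paper_presentation": 5,
        "final_project": 35,
    },
}


def course_score(section, scores):
    """Return the weighted course score (0-100) for one section.

    Assumption: each entry in `scores` is already a 0-100 percentage.
    """
    weights = WEIGHTS[section]
    return sum(weights[name] * scores[name] for name in weights) / 100


example = {
    "homework": 90,
    "midterm": 80,
    "critiques": 100,
    "paper_presentation": 100,
    "final_project": 85,
}
print(course_score("CSC-480", example))  # 87.0
```

Both sets of weights sum to 100, so a student who earns full marks on every component receives a course score of 100.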

Attendance

Students are encouraged to attend all lectures. Prolonged absences must be discussed with the instructor. If you cannot attend lectures regularly because of work or other obligations during remote learning, please let the instructor know.

Exams

Exams cover the material from the lectures, projects, and reading. While not necessarily cumulative, each exam will require understanding many of the concepts covered in the preceding exams. Exams consist of multiple choice, short answer, and long answer questions.

For the Final Project, students will propose their own topic in consultation with the instructor. Project proposals are due mid-semester (see "Project Proposal Submission" in the course schedule).

Late Submissions

A penalty of 5% per day will be levied on late submissions. Extensions on homework, lab, or project deadlines are not granted unless you have an extremely compelling reason, such as observance of a religious holiday (in which case you need to let me know in advance).
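
As a rough illustration of the late penalty, the sketch below applies one possible reading of the rule. The syllabus does not say whether the 5% is taken from the maximum score or from the earned score, nor how partial days are counted; this example assumes 5 percentage points of the maximum score per whole day late, with a floor of zero. The function name `late_adjusted` is hypothetical.

```python
def late_adjusted(score, days_late, penalty_per_day=5):
    """Return a 0-100 score after the late penalty.

    Assumptions (not stated in the syllabus): the penalty is a flat
    number of percentage points per whole day, and scores never go
    below zero.
    """
    deduction = penalty_per_day * max(0, days_late)
    return max(0, score - deduction)


print(late_adjusted(90, 2))  # 80
```

Under this reading, a submission that is two days late and would otherwise earn 90 is recorded as 80. Confirm the exact interpretation with the instructor.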

Letter Grades

Range	Letter
>=93	A
>=90	A-
>=87	B+
>=83	B
>=80	B-
>=77	C+
>=73	C
>=70	C-
>=60	D
<60	F
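
The table above maps directly onto a threshold lookup. This is a minimal sketch that assumes scores are compared against the cutoffs without rounding (the syllabus does not state a rounding policy); `CUTOFFS` and `letter_grade` are hypothetical names.

```python
# (minimum score, letter) pairs from the table, highest cutoff first.
CUTOFFS = [
    (93, "A"),
    (90, "A-"),
    (87, "B+"),
    (83, "B"),
    (80, "B-"),
    (77, "C+"),
    (73, "C"),
    (70, "C-"),
    (60, "D"),
]


def letter_grade(score):
    """Return the letter for a 0-100 course score (no rounding applied)."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


print(letter_grade(87))    # B+
print(letter_grade(92.9))  # A-
```

Note that the table has no D+ or D- bands: every score from 60 up to, but not including, 70 maps to D.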

Textbook

This course adopts the textbook "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow", 2nd Edition by Aurélien Géron.
The online version of the book may be accessible for free from AU’s online Library. After selecting "O’Reilly Online Learning" from the list and logging in with your AU account, you should be able to search for the book by name, or try accessing it from this link.

Academic Integrity

Even though we encourage collaboration with a partner, sharing code between groups is strictly forbidden, as is showing your work to other students, even just for a second: both are forms of plagiarism. There is rarely one single correct way to write code that solves a problem. While we want you to feel free to discuss your approach with a partner, you should know that there are often many solutions for a given problem and it's typically obvious when one student shares code with another. If you directly copy and paste code from the Internet (or even the text), cite your source in your comments (but also ensure that you understand what the code is doing - not all code on the web is good!). Assignments will be checked using plagiarism detection software and by hand to ensure the originality of the work.

Do not share your code with anyone other than a partner. Do not let someone look at your screen. You may get behind, or your friend may ask for help, but the consequences for plagiarism are far worse than those for an incomplete submission, which will still likely earn some points. If I suspect that you have purposely shared code with another student or presented someone else's work as your own, the matter will be referred to the Academic Integrity Code Administrator for adjudication. If you are found responsible for an academic integrity violation, sanctions can include a failing grade for the course, suspension for one or more academic terms, dismissal from the university, or other measures as deemed appropriate by the Dean.

All students are expected to adhere to the American University Honor Code. If you have a question about whether or not something is permissible, ask the instructor or the TA first.

Generative AI Policy

With regard to Generative AI models such as ChatGPT, you may use them only for homework assignments, to assist you in coding the outline of your pipeline or troubleshooting specific parts.

The use of AI models should be reasonable and responsible. For example, AI-generated code may contain technical or conceptual issues that must be fixed manually. Another issue is the adoption of programming concepts and libraries not seen in class. Both scenarios will be considered a deviation from the prompt and will be subject to grade penalties.

If you use such models, acknowledge which parts were facilitated by AI in the report accompanying your code submission. You should not use them to generate complete solutions to the homework problems in this course that you submit as your own work.

It is reasonable to expect that tools like this will eventually be integrated into the workflows of many businesses. However, while you are still learning the fundamentals of computer science, the process of designing machine learning pipelines and writing code is just as important as the final outcome. Complete solutions to coding exercises generated by AI models are considered academic plagiarism and will be referred to the Academic Integrity Office of American University, just as if you had copied the work of a friend, website, or online tutor.

Acknowledgments

Course design by Roberto Corizzo at American University.

Thanks to Leah Ding and Nathalie Japkowicz at American University for discussions and contributions that inspired the design and the materials of this course. Thanks to Alex Godwin at American University for designing this syllabus template.

Introduction to Data Mining
[CSC 480/680 - Spring 2025]