About
This is the website for ORIE 5270: Big Data Technologies and ORIE 6125: Computational Methods in Operations Research.
Summary
This course offers a broad overview of computational techniques and mathematical
skills useful for data scientists. Topics include: UNIX shell, regular expressions,
version control (git), data structures and algorithms, working with databases,
data analysis using Python and related libraries (pandas, numpy / scipy, sklearn),
parallel computing (Map-Reduce, Spark, Hadoop), an overview of standard machine
learning and optimization algorithms, and, time permitting, a guided tour of
functional programming.
Admin
Instructor: Vasilis Charisopoulos (vc333[at]cornell.edu) - OH: Wed 9-10pm, Fri 11am-12pm.
TAs:
- Sabhya Chhabria - OH: Tue 12-1pm
- Aahil Awatramani - OH: Tue 9-10pm
Lectures:
- ORIE 5270: Monday and Wednesday, 10:10 - 11:00am US EST, on Zoom.
- ORIE 6125: Monday, Wednesday, and Friday, 10:10 - 11:00am US EST, on Zoom.
Zoom links for lectures and office hours are available in Canvas. Lecture recordings are also available under Canvas > Panopto Recordings.
Campuswire
Except for live lectures (Zoom) and lecture recordings (Canvas), we will be using Campuswire (instead of Piazza) for course announcements, lecture slides, and all other communication. Details for joining are available on the course Canvas (look under Modules > Campuswire). If you haven't been able to enroll yet but would like to access Campuswire, please send me an email (important: use your Cornell email when emailing me).
Grade
ORIE 5270
Most of your grade (90%) will come from homework assignments. All assignments are weighted equally.
The remaining 10% will be based on participation (being active on the class forum, completing the course & TA evals, etc.). You can fulfill this requirement in multiple ways: for example, answering 3-4 questions on Campuswire over the course of the semester and completing the course evals will be enough to get full participation credit. Since you might be following the class asynchronously, attending lectures is not a requirement to obtain full participation points.
ORIE 6125
Your grade will be broken down as follows:
- Homeworks (45%)
- Final project (45%)
- Participation (10%)
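As a rough illustration of how the breakdown above combines into a single number, here is a minimal Python sketch. The function name `weighted_grade`, the 0-100 scale for each component, and the example scores are assumptions made for illustration only; they are not official course policy.

```python
def weighted_grade(scores, weights):
    """Combine component scores (assumed 0-100 each) using fractional weights."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must name the same components")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weights[name] for name in weights)


# Weights taken from the ORIE 6125 breakdown.
ORIE_6125_WEIGHTS = {"homework": 0.45, "project": 0.45, "participation": 0.10}

# Hypothetical component scores, made up for this example.
example_scores = {"homework": 88.0, "project": 92.0, "participation": 100.0}

example_grade = weighted_grade(example_scores, ORIE_6125_WEIGHTS)
print(f"{example_grade:.1f}")  # prints 91.0
```

The same function would cover ORIE 5270 by passing weights of 0.90 for homework and 0.10 for participation.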
Homework
There will be a total of 7 homework assignments, released roughly every 2 weeks. Tentatively, these will be released and due on Fridays at 12pm (noon) US EST. We will use Gradescope for homework submissions.
If you submit the course evaluation at the end of the semester, you will be allowed to drop your lowest homework grade. You also get a total of 7 slip days throughout the semester that you can use to turn in assignments late. These are meant to help you in case of personal emergencies, travel, job interviews, or light sickness. You are responsible for keeping track of the number of slip days you have used up.
Note: if you fall ill due to COVID, you can reach out to SDS to request COVID-related accommodations. Extensions warranted by a documented COVID case (or any form of serious illness) will not "use up" any of your slip days.
Regrade requests
For each assignment, you will be able to submit regrade requests (via Gradescope's interface) for up to a week after its grades are released.
Final project (ORIE 6125)
A major component of 6125 is a final project, where you will have a chance to combine tools and techniques we will go over throughout the semester.
Goals
The goal is for you to create (or extend) a project utilizing several concepts, tools and techniques covered in class. For example, your project could involve:
- Version control (preferably git)
- Unit testing for new features
- Documentation for any public APIs, if applicable
- HTML doc & visualization of the project
- Attention to performance, where relevant (you don't need to extensively tune your project, but you should e.g. make informed decisions about the kind of data structures you use in core routines)
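To make a couple of the items above concrete, here is a minimal Python sketch of a small documented routine with unit tests. The function `find_duplicates` and its test class are hypothetical examples, not part of any course material; they only illustrate one possible way to pair a deliberate data structure choice (sets for membership checks) with tests written using the standard `unittest` module.

```python
import unittest


def find_duplicates(items):
    """Return the set of values that occur more than once in `items`.

    Membership checks on a set take constant time on average, so a single
    pass suffices. Checking against a list instead would make the routine
    quadratic in the worst case.
    """
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates


class FindDuplicatesTest(unittest.TestCase):
    def test_reports_repeated_values(self):
        self.assertEqual(find_duplicates([1, 2, 2, 3, 3, 3]), {2, 3})

    def test_empty_input(self):
        self.assertEqual(find_duplicates([]), set())


# Run the tests explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FindDuplicatesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project, the tests would typically live in their own file and be run with `python -m unittest`.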
Your project can be related to your research, and need not be started from scratch. If there is a problem you have been working on and it needs a computational study, now might be the time to do it! Likewise, if you previously created a library or program to solve a certain problem, you can extend that project with new functionality, improved performance, updated documentation, and more complete unit tests.
Dates & Deliverables
There are three deliverables: a project proposal, a short writeup / report of what you did, and the source code and other components of the project itself.
Project proposal: by Friday, April 16th, you should have a project proposal
hosted on a repository for your project (preferably at Cornell's GitHub) describing:
- List of team members
- Short motivation for the project
- Description of expected deliverables (library / API / website or whatever else is applicable to your project)
- Description of tools you expect to use
Please email me (vc333[at]cornell.edu) a link to your repository using the subject line "ORIE 6125 Project" by
the aforementioned due date.
Final report: By a date TBA (most likely the end of exams), you should provide the following in your project's repository:
- A document explaining the project (no more than 3-4 pages), including:
  - Motivation
  - Implementation details and challenges
  - Computational results, if any - timing tests, problems solved, etc.
  - Potential future directions
- Documentation
  This will depend on the project and the language you chose to implement it in, but there should be a set of documentation that can be accessed separately from the source code. Ideally, this documentation should be automatically generated from the source code using some documentation tool (e.g. pydoc or Sphinx).
- Source code
  You should include source files, test files, and a README file explaining how to use your software / get started.
- (Optional) A brief description with links to the documentation (in HTML). This will be hosted on the web page for future courses to reference.
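For the documentation deliverable above, here is a minimal sketch of what "generated from the source code" can look like in Python. The function `moving_average` is a made-up example; the sketch assumes a Python project and uses the standard-library `pydoc` module to render the docstring as plain text. Other languages and tools will look different.

```python
import pydoc


def moving_average(values, window):
    """Return the simple moving averages of `values`.

    Args:
        values: A sequence of numbers.
        window: Number of consecutive entries per average. Must be
            positive and no larger than len(values).

    Returns:
        A list of floats of length len(values) - window + 1.
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Render the documentation from the docstring, without terminal formatting.
doc_text = pydoc.render_doc(moving_average, renderer=pydoc.plaintext)
print(doc_text)
```

Because the documentation lives next to the code, it can be regenerated whenever the source changes. Running `python -m pydoc -w <module name>` writes an HTML page for a module in the same way.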
Past projects
Here are some example projects from past iterations of the course.
- RandomStreams.jl - by Patrick Steele, Stephen Pallone, James Dong.
A Julia implementation of the MRG32k3a random number generator proposed by Pierre L'Ecuyer. Allows creation of multiple statistically independent random number streams and substreams.
- Fast Fourier Transform - by Emily Fischer, Dave Lingenbrink, David Eckman, Venus Lo, Wei Qian.
Implementation of inverse Laplace transforms, Fast Fourier transforms and their inverses.
- Support Vector Machines with Rejection - by Matthew Zalesak, Andrew Daw, Samuel Gutekunst.
Implementation of a support vector machine that allows rejection of outliers.