Project Week

Introduction

This week is an ‘off week’ in terms of content. If this were an in-person class, it would be time for you to work on your project and get help with the specific issues you are running into.

Key Questions

  • What topics are you interested in learning more about?
  • What issues are you currently struggling with, or feel are slowing down your progress?
  • How is your project going?

Assignment Overview

Continue work on the week 5 project. You are also strongly encouraged to submit a post to Piazza that gives an overview of what you are doing for your project, along with a status report on how it is going. This will help everyone in the class get an idea of how their own project stacks up against the others, and give some inspiration to those who are not quite sure what to do.

Additional Resources

If you are all good to go and ahead on your project, here are some tools you can take a look at. We are not going to cover them in this class, but they are great, useful tools to learn.

Pandas
This is a good data analysis library. Nothing about it is specific to big data, so we are not going to be spending much time with it, but if you are going to be doing a lot of data work in Python you will probably end up working with it at some point.
REST APIs
If web-connected stuff is something you are interested in, you will want to learn about REST APIs. These are basically web servers you can query to get raw data back. For example, you might query one to get a list of users of a website, or you might send it JSON to load into a database. It is a common way of interacting with servers and is handy to know. I do a lot of work with them, so if you want to learn more, look at the linked overview and feel free to ask me more questions, but again, it is beyond the scope of this class.
Amazon AWS
This is the link to the education portion of the AWS website, where you can get some free credit to use their services as a student. I had a hard time picking between AWS and Google Cloud; AWS is more commonly used but has a steeper learning curve. If all of the remote/cloud stuff is something you are interested in, you might want to spend some time here. I have experience with some of the services, but there are so many that it is hard to know them all.
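
To make the REST API idea above a little more concrete, here is a minimal sketch using only the Python standard library. Everything in it is made up for illustration: the `/users` endpoint, the `USERS` list standing in for a database, and the helper names `start_server`, `get_users`, and `add_user` are all assumptions, not part of any real service. It starts a tiny local server, then queries it for a list of users and sends it JSON to store, which are the two interactions described in the REST APIs entry.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "database" standing in for a real site's user table.
USERS = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]


class UserHandler(BaseHTTPRequestHandler):
    """Toy REST-style handler exposing a single made-up /users resource."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET /users returns the whole list as raw JSON.
        if self.path == "/users":
            self._send_json(200, USERS)
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        # POST /users reads a JSON body and "loads it into the database".
        if self.path != "/users":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        new_user = json.loads(self.rfile.read(length))
        new_user["id"] = len(USERS) + 1
        USERS.append(new_user)
        self._send_json(201, new_user)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def start_server():
    """Start the toy server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), UserHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Opener that ignores any system proxy settings, since we only talk to localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def get_users(base_url):
    """Query the API and return the decoded list of users."""
    with _opener.open(base_url + "/users") as resp:
        return json.loads(resp.read())


def add_user(base_url, name):
    """Send JSON to the API and return the record it created."""
    req = urllib.request.Request(
        base_url + "/users",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = start_server()
    base_url = "http://127.0.0.1:%d" % server.server_address[1]
    print(get_users(base_url))
    print(add_user(base_url, "Linus"))
    server.shutdown()
```

Against a real API you would skip the server half and only write the client functions, pointing them at the URLs in that service's documentation. The list of dictionaries that `get_users` returns is also the kind of data you could hand to Pandas (for example via `pandas.DataFrame`) if you wanted to analyze it.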

Review

At this point you have all the tools you need to work on conventional, smaller data sets. This project should let you demonstrate and practice those skills. If there is anything you are unsure about or need additional practice with, this is the time to figure that out and get the rough edges smoothed out. From here on we are going to start working on much larger data sets and introducing some new tools and workflows to scale and distribute loads among many computers in the cloud.