This week you will start working on your final project. At the same time we will look at the SparkSQL interface (SQL just pops up everywhere, huh?). This is a library that lets you use Spark in a fashion similar to SQL: it automatically converts SQL-like queries into the appropriate maps, reduces, collects, and so forth in Spark. For tasks that SQL is well suited to, this is a great option. For tasks that are too complex for SQL, you may need to fall back on writing your own map and fold functions.
None. I mean, it's Spark, with SQL. You already did the Spark stuff, and you did SQL a while ago. Go work on your project instead. :)
This wraps up our coverage of Spark. This week was lighter in order to give you time to get started on your final project. You are welcome to use DataFrames in the project, but you don't need to; they are just one more tool in the tool belt. They make some tasks a lot easier, namely the kinds of tasks you would do in SQL. They don't help much when you need a complex sequence of operations or when you need to change the number or kinds of attributes in your data.