Hello Spark

Introduction

This section introduces the technical details of using Spark. Once we have played with it a bit, we can start talking about the theory behind it.

This isn’t going to be the kindest of weeks on your confidence, but stick with it. There are likely two hurdles to overcome. One is getting map, reduce and the concept of lambda functions to click. With enough practice, hopefully they will. The other is getting anything to run on Dataproc as a submitted job. If any one thing is wrong, the whole job fails, so debugging is a painful process. That is why I strongly suggest making smaller tables to practice on.
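
To give a feel for the first hurdle, here is a minimal sketch of map, reduce and lambda in plain Python. It uses only Python's built-ins rather than Spark itself, and the word list is made up for illustration. Spark's RDD map and reduce methods take functions in much the same way, so the idea should carry over.

```python
from functools import reduce

# Made-up sample data, purely for illustration.
words = ["spark", "map", "reduce", "lambda"]

# map applies a function to every element.
# The lambda is just a small unnamed function: it takes w and returns len(w).
lengths = list(map(lambda w: len(w), words))

# reduce combines the elements two at a time until one value is left.
total = reduce(lambda a, b: a + b, lengths)

print(lengths)  # [5, 3, 6, 6]
print(total)    # 20
```

Working on a tiny list like this is the same habit as practicing on smaller tables: when something goes wrong, there is very little to check.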

The Process

This is really a process that you need to work through, and it is best demonstrated by seeing it done. So this section is a collection of videos that walk you through it, starting with some data and eventually running jobs on that data on Dataproc.