Web Interfaces

Introduction

There are two main ways to access the web interfaces on Google Cloud Dataproc. The first is to set up an SSH tunnel, which is the more secure and recommended approach. Follow the instructions provided by Google to do this. At the step that covers configuring a web browser, you can create a browser shortcut instead of typing the command each time.
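For reference, the tunnel in Google's instructions comes down to a single `gcloud compute ssh` call with dynamic port forwarding. The following is a minimal Python sketch that only assembles and prints that command, it does not run it. The zone `us-central1-a` is a placeholder assumption, and port 1080 is chosen to match the proxy port used in the Chrome shortcut.

```python
import shlex


def tunnel_command(master, zone, port=1080):
    """Build the gcloud command that opens a SOCKS proxy on localhost:port."""
    return [
        "gcloud", "compute", "ssh", master,
        f"--zone={zone}",
        "--",             # everything after this is passed to ssh itself
        "-D", str(port),  # dynamic (SOCKS) port forwarding
        "-N",             # no remote shell, just hold the tunnel open
    ]


if __name__ == "__main__":
    # "us-central1-a" is a placeholder; use the zone your cluster runs in.
    print(shlex.join(tunnel_command("cs512-spark-m", "us-central1-a")))
```

Paste the printed command into a terminal and leave it running for as long as you need the tunnel.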

Chrome Shortcut

The Target field of my shortcut is:

"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --proxy-server="socks5://localhost:1080" --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" --user-data-dir=/tmp/cs512-demo-spark-m    

With the tunnel running, I can then connect to addresses like http://cs512-spark-m:18080 in that browser window. Here cs512-spark-m is the name of the master node in Google Compute Engine.

Firewall Option

The other option, which Google does not document, is a little less secure but easier to set up: add a firewall exception for the ports you want to access.

Firewall Interface

You need to add a firewall rule that filters on your computer's IP address. If you search Google for the exact phrase ‘my ip’, it will show you your IP address. Then add an exception for TCP on port 18080. If you use a computer on a different network, or your network's IP address changes, you will need to update the filter.
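The same rule can also be created from the command line instead of the web interface. The sketch below assembles a `gcloud compute firewall-rules create` command for a single source address. The rule name `allow-spark-history` and the address `203.0.113.7` are hypothetical placeholders, and the sketch assumes the cluster sits on the network that gcloud uses when no `--network` flag is given.

```python
import ipaddress
import shlex


def firewall_command(rule_name, my_ip, port=18080):
    """Build a gcloud command that allows TCP `port` from one address only."""
    # ip_address() raises ValueError if my_ip is not a valid address.
    addr = ipaddress.ip_address(my_ip)
    # A full-length prefix restricts the rule to exactly this address.
    prefix = 32 if addr.version == 4 else 128
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--allow=tcp:{port}",
        f"--source-ranges={addr}/{prefix}",
    ]


if __name__ == "__main__":
    # Placeholder rule name and address; substitute your own values.
    print(shlex.join(firewall_command("allow-spark-history", "203.0.113.7")))
```

Validating the address first catches typos before they end up in a rule that silently matches nothing.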

You can check that this works by connecting to your master node's IP address on port 18080 in your browser. The address will look like http://123.456.789.123:18080, with the numeric portion replaced by your node's actual IP address, as seen in the interface here:

Master IP address

Keep in mind that this only works while your cluster is running, and the IP address is only visible while the cluster is running.
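If the page does not load, it helps to know whether the port is reachable at all before suspecting the browser. A minimal sketch of such a check, using only the Python standard library; the helper name `port_is_open` is my own, and you would pass in your master node's actual IP address.

```python
import socket


def port_is_open(host, port=18080, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A result of False means the cluster is stopped, the firewall rule does not match your current IP address, or the address is wrong. A result of True means the problem is somewhere other than the network path.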