There are two main ways to access the various web interfaces on Google Cloud Dataproc. The first is to set up an SSH tunnel. This is the more secure option and the recommended one. Follow the instructions provided by Google to do this. Where those instructions have you launch a web browser, you can instead create a shortcut that starts the browser with the right options.
The target of my shortcut is:
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --proxy-server="socks5://localhost:1080" --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" --user-data-dir=/tmp/cs512-demo-spark-m
I can then connect to addresses like http://cs512-spark-m:18080 using that browser, where cs512-spark-m is the name of the master node in Google Compute Engine.
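The two pieces of the tunnel setup can be sketched in Python as follows. This is only an illustration, not Google's tooling: the function names are hypothetical, the zone us-central1-a and the Linux-style Chrome path in the usage lines are assumed placeholders, and the gcloud arguments follow the general form of Google's SOCKS proxy instructions (ssh's -D opens the proxy, -N skips running a remote command). The functions only build the argument lists; nothing is executed.

```python
import shlex


def tunnel_command(master: str, zone: str, socks_port: int = 1080) -> list[str]:
    """Build the gcloud command that opens a SOCKS proxy through the master node."""
    return [
        "gcloud", "compute", "ssh", master,
        f"--zone={zone}",
        "--",                    # arguments after this are passed to ssh itself
        "-D", str(socks_port),   # dynamic (SOCKS) port forwarding on localhost
        "-N",                    # open the tunnel only, run no remote command
    ]


def chrome_command(chrome_path: str, profile_dir: str, socks_port: int = 1080) -> list[str]:
    """Build the browser command line that sends all traffic through the proxy."""
    return [
        chrome_path,
        f"--proxy-server=socks5://localhost:{socks_port}",
        "--host-resolver-rules=MAP * 0.0.0.0 , EXCLUDE localhost",
        f"--user-data-dir={profile_dir}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own node name, zone, and paths.
    # shlex.join quotes for a POSIX shell, so this output is for display only.
    print(shlex.join(tunnel_command("cs512-spark-m", "us-central1-a")))
    print(shlex.join(chrome_command("/usr/bin/google-chrome", "/tmp/cs512-demo-spark-m")))
```

Either list can be passed directly to subprocess.Popen if you would rather start the tunnel and the browser from a script than from a shortcut.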
The other option, which Google does not document, is a little less secure but easier to set up: make a firewall exception for the ports you want to access.
You need to add a firewall rule that filters on your computer's IP address. If you search Google for the exact phrase 'my ip', it will show you your IP address. Then add an exception for TCP on port 18080. If you use a computer on a different network, or your network's IP address changes, you will need to update the filter.
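If you prefer the command line to the web console, the same rule can be sketched as a gcloud call. The snippet below only builds the argument list, assuming the default network; the rule name allow-spark-ui and the address 203.0.113.7 are made-up placeholders. It also validates the address first, so a mistyped IP is caught before it reaches the firewall.

```python
import ipaddress


def firewall_rule_command(rule_name: str, my_ip: str, port: int = 18080,
                          network: str = "default") -> list[str]:
    """Build a gcloud command allowing TCP traffic to `port` from one address only."""
    ip = ipaddress.ip_address(my_ip)          # raises ValueError if my_ip is invalid
    prefix = 32 if ip.version == 4 else 128   # single-host CIDR range
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--network={network}",
        f"--allow=tcp:{port}",
        f"--source-ranges={ip}/{prefix}",
    ]


if __name__ == "__main__":
    # Placeholder rule name and IP address; use the address that 'my ip' reports.
    print(" ".join(firewall_rule_command("allow-spark-ui", "203.0.113.7")))
```

Restricting the source range to a single address is what keeps this approach reasonably safe; leaving it open to every address would expose the interface to anyone.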
You can check whether this works by connecting to your master node's IP address on port 18080 in your browser. The address would look like http://123.456.789.123:18080 with the middle portion replaced by your node's actual IP address, as shown for the node in the Google Cloud interface.
Keep in mind that this only works while your cluster is running, and the node's IP address is only visible while the cluster is running.
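The browser check can also be done from a script. Here is a minimal sketch that tries to open a TCP connection to the port; the function name is_port_open is hypothetical and 203.0.113.7 is a placeholder for your master node's IP address. A False result means either that the firewall rule does not match your current address or that the cluster is not running.

```python
import socket


def is_port_open(host: str, port: int = 18080, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host names that do not resolve.
        return False


if __name__ == "__main__":
    host = "203.0.113.7"  # placeholder; use your master node's IP address
    state = "reachable" if is_port_open(host) else "not reachable"
    print(f"{host}:18080 is {state}")
```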