Starting your first Spark job

To speed up processing, it is often desirable to distribute the processing work over a number of machines. The easiest way to do this is with the Apache Spark processing framework, and the fastest way to get started is to read the Spark Quick Start. The Spark version installed on VITO's Terrascope cluster is 2.3.2, so it is recommended to stick to this version; a newer version is available on request if really required. Spark is also installed on your Virtual Machine, so you can run spark-submit directly from the command line.
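As an illustration, a Python job could be submitted from the Virtual Machine along the following lines. This is only a sketch: the script name and the resource settings are placeholders, not Terrascope defaults, and should be tuned to your workload.

```shell
# Sketch of a spark-submit invocation for the Terrascope cluster.
# my_job.py and the resource figures below are hypothetical placeholders.
spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 4 \
    --executor-memory 2G \
    my_job.py
```

`--master yarn --deploy-mode cluster` is the modern spelling of yarn-cluster mode; Spark 2.3.2 accepts it as well as the older `--master yarn-cluster` form.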

To run jobs on the Hadoop cluster, you have to use yarn-cluster mode, and you need to authenticate with Kerberos. To authenticate, run kinit on the command line; you will be asked for your password. Two other useful commands are klist, which shows whether you are authenticated, and kdestroy, which clears all authentication information. After some time your Kerberos ticket will expire, so you will need to run kinit again.
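A typical command-line session combining these steps looks like the following; the exact principal and realm shown by klist depend on your Terrascope account.

```shell
kinit       # prompts for your password and obtains a Kerberos ticket
klist       # lists your cached ticket(s) and their expiry times
# ... run your spark-submit jobs here ...
kdestroy    # clears all cached authentication information when done
```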

If you want to use Python 3.8 or above, you will have to use Spark 3, as Spark 2 does not support Python 3.8. A Spark 3 variant is included in the Python Spark example below.

A Python Spark example is available, which should help you get started.
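If you are unsure which Python and Spark versions your environment provides, both can be checked from the command line; the output will vary with your Virtual Machine image.

```shell
python3 --version        # the Python interpreter your PySpark driver will use
spark-submit --version   # prints the installed Spark version (2.3.2 on the cluster)
```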

Copyright 2018 - 2024 VITO NV All Rights reserved
