Spark graph

Last updated on 16th July 2024

This is an experimental feature!

Experimental features are early versions for users to test before final release. We work hard to ensure that every available Simudyne SDK feature is thoroughly tested, but these experimental features may have minor bugs we still need to work on.

If you have any comments or find any bugs, please share them with support@simudyne.com.

This page explains how to set up and use the Simudyne SDK to distribute a single simulation on Spark.

Spark graph backend

The Spark graph backend allows you to run a single large graph on a Spark cluster. Running a distributed graph simulation depends on the package core-graph-spark, which must be imported into your project:

pom.xml

<dependency>
    <groupId>simudyne</groupId>
    <artifactId>simudyne-core-graph-spark_2.11</artifactId>
    <version>${simudyne.version}</version>
</dependency>
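
If your project uses Gradle instead of Maven, the equivalent declaration would look something like this (a sketch assuming the same artifact coordinates; simudyneVersion is a placeholder for your SDK version):

build.gradle

dependencies {
    implementation "simudyne:simudyne-core-graph-spark_2.11:${simudyneVersion}"
}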

To enable Spark as the backend implementation of the Simudyne SDK, uncomment the following line in your properties file:

simudyneSDK.properties

### CORE-ABM-SPARK ###
core-abm.backend-implementation=simudyne.core.graph.spark.SparkGraphBackend

Then you need to configure the properties related to core-abm on Spark. There are two ways to configure them:

  • modify the core-abm-spark properties in the simudyneSDK.properties file
  • set configuration parameters on the command line when invoking spark-submit (see the example below)
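
For example, Spark-level settings can be supplied on the command line through spark-submit's standard --conf option (a sketch; the class and jar names are placeholders matching the submit example further down):

spark-submit --class Main --master <sparkMasterURL> --deploy-mode client --conf spark.executor.memory=2g --conf spark.sql.shuffle.partitions=24 --files simudyneSDK.properties,licenseKey name-of-the-fat-jar.jar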

Some properties are already listed with default values in simudyneSDK.properties:

### CORE-ABM-SPARK ###
core-abm-spark.master-url = local[*]
core-abm-spark.checkpoint-directory = /var/tmp
core-abm-spark.log-level = WARN
# core-abm-spark.spark.executor.memory = 2g
# core-abm-spark.spark.sql.shuffle.partitions = 24

Be aware that a property set in the simudyneSDK.properties file overrides the same property passed to spark-submit.
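
To illustrate the precedence (a sketch; the figures are illustrative, and it is assumed that the core-abm-spark.spark.* keys are forwarded to Spark, as the commented defaults above suggest), with the combination below the executors run with 2g, because the value from the properties file wins:

# In simudyneSDK.properties:
core-abm-spark.spark.executor.memory = 2g

# On the command line (this 4g value is overridden by the properties file):
spark-submit --conf spark.executor.memory=4g --class Main --master <sparkMasterURL> --files simudyneSDK.properties,licenseKey name-of-the-fat-jar.jar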

You can then submit your job using spark-submit. Here is an example with some configuration options:

spark-submit --class Main --master <sparkMasterURL> --deploy-mode client --files simudyneSDK.properties,licenseKey name-of-the-fat-jar.jar

This command runs the main method of the class Main and distributes the simulation on Spark.
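
As a rough sketch, such an entry point might look like the following, assuming the SDK's usual server bootstrap; MyModel is a hypothetical model class standing in for your own (check the SDK reference for the exact API):

Main.java

import simudyne.nexus.Server;

public class Main {
    public static void main(String[] args) {
        // Register the model under a display name, then start the server.
        // MyModel is a placeholder for your own model class.
        Server.register("My Model", MyModel.class);
        Server.run();
    }
}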

You can then access the console through the config parameters nexus-server.hostname and nexus-server.port, which default to localhost and 8080. You can also interact with the server through the REST API.
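
For example, these defaults can be set explicitly in your properties file (the section header comment below is illustrative):

simudyneSDK.properties

### NEXUS-SERVER ###
nexus-server.hostname = localhost
nexus-server.port = 8080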

spark-submit allows you to configure Spark; choose a configuration that best suits your cluster. To learn more, refer to the official Spark configuration documentation.
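
For example, on a YARN cluster you might size the executors explicitly with standard spark-submit flags (a sketch; the numbers are illustrative and depend on your cluster):

spark-submit --class Main --master yarn --deploy-mode client --num-executors 8 --executor-cores 4 --executor-memory 8g --files simudyneSDK.properties,licenseKey name-of-the-fat-jar.jar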

Some useful resources can be found on Cloudera's website.