Experimental features are early versions for users to test before final release. We work hard to ensure that every available Simudyne SDK feature is thoroughly tested, but these experimental features may have minor bugs we still need to work on.
If you have any comments or find any bugs, please share with support@simudyne.com.
This page explains how to set up and use the Simudyne SDK to distribute a single simulation using Spark and Akka.
Spark is used as a cluster management tool while Akka manages all the communication logic.
Distributed Graph Backend
The distributed graph backend allows you to run large graphs on a cluster of machines.
Running a distributed graph simulation depends on the core-graph-distributed package which needs to be imported in your project:
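The exact dependency coordinates depend on your build tool and SDK version. As a sketch, a Maven dependency might look like the following (the groupId and version shown are placeholders — check your Simudyne SDK distribution for the actual values):

```xml
<!-- Placeholder coordinates: confirm the groupId and version against your SDK distribution -->
<dependency>
    <groupId>simudyne</groupId>
    <artifactId>core-graph-distributed</artifactId>
    <version>${simudyne.version}</version>
</dependency>
```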
Simudyne has two implementations of the distributed graph. One runs locally using subprocesses and can be used for testing purposes. This implementation does NOT require Spark.
The other implementation distributes the load using Spark and thus needs to be provided with a working Spark cluster.
Testing a model in a distributed environment
Aimed at model developers, we have included a way of running the distributed graph using subprocesses,
without the need for Spark, in order to establish a model's compatibility and distributed-graph readiness.
Properties
Both implementations share the same main properties.
You need to configure the properties related to core-graph-distributed.
There are two ways to configure them:
modify the properties in the simudyneSDK.properties file (the only option for the local implementation)
set configuration parameters as command parameters when using spark-submit command (only available using the Spark implementation)
Some properties are already listed with default values in simudyneSDK.properties:
core.graph.experimental.timeouts.base: because of the distributed nature of the environment, the distributed graph needs a setting to establish when
communication between nodes should be considered slow or failing. If steps take roughly as long as this value, that is a likely indicator of potential failure.
core.graph.experimental.clusterSize: controls how many nodes are required for the graph to be considered ready for work.
When using the local implementation, this controls the number of spawned subprocesses.
When using the Spark implementation, this controls the number of partitions. Remember to set --executor-cores and --num-executors during spark-submit.
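As an illustration, the corresponding entries in simudyneSDK.properties might look like this (the values shown are placeholders, not recommendations — consult the defaults shipped with your SDK for the expected units and magnitudes):

```properties
# Base timeout for inter-node communication (placeholder value; check your SDK defaults for the unit)
core.graph.experimental.timeouts.base=10
# Number of nodes (local: subprocesses; Spark: partitions) — placeholder value
core.graph.experimental.clusterSize=4
```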
Limitations
The distributed graph comes with limitations, the most notable being:
no support for Immutable Schema, please set feature.immutable-schema=false
no support for dynamic agent creation/stopping
no support for SystemMessages
You may also want to disable the health check for models by setting nexus-server.health-check to false, in order to avoid distributing them.
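Taken together, the settings mentioned in the limitations above amount to a properties fragment like the following:

```properties
# Immutable Schema is not supported on the distributed graph
feature.immutable-schema=false
# Optional: disable the model health check so it is not distributed
nexus-server.health-check=false
```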
Local Implementation
To test your model running on Distributed Graph, add the following lines to your properties file:
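The exact lines depend on your SDK version; how the local, subprocess-based backend is selected may differ, so check your simudyneSDK.properties template. A minimal sketch using only the properties documented on this page might be:

```properties
# Run the distributed graph with 4 nodes — locally, 4 spawned subprocesses (placeholder value)
core.graph.experimental.clusterSize=4
# Immutable Schema is not supported on the distributed graph
feature.immutable-schema=false
```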
Please be aware that properties set in the simudyneSDK.properties file take precedence over options passed to spark-submit.
You can then submit your job using spark-submit. Here is an example:
```shell
spark-submit --class Main \
  --master yarn \
  --deploy-mode client \
  --files simudyneSDK.properties,licenseKey \
  hdfs://path/name-of-the-fat-jar.jar
```
This command runs the main function of the class Main and distributes it on Spark. You can then access the console through the config parameters nexus-server.hostname and nexus-server.port,
which default to localhost and 8080. You can also interact with the server through the REST API.
spark-submit allows you to configure Spark. You need to choose a configuration that best suits your cluster.
To learn more about Spark configuration, refer to the official documentation.