Experimental features are early versions for users to test before final release. We work hard to ensure that every available Simudyne SDK feature is thoroughly tested, but these experimental features may have minor bugs we still need to work on.
If you have any comments or find any bugs, please share with support@simudyne.com.
This page explains how to set up and use the Simudyne SDK to distribute a single simulation using Spark and Akka.
Spark is used as a cluster management tool while Akka manages all the communication logic.
Distributed Graph Backend
The distributed graph backend allows you to run large graphs on a cluster of machines.
Running a distributed graph simulation depends on the core-graph-distributed package which needs to be imported in your project:
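The exact dependency coordinates depend on your build tool and SDK version. As a sketch, a Maven dependency might look like the following (the groupId and version shown are placeholders — check your Simudyne SDK distribution for the actual values):

```xml
<!-- Placeholder coordinates: confirm the groupId and version against your SDK distribution -->
<dependency>
    <groupId>simudyne</groupId>
    <artifactId>core-graph-distributed</artifactId>
    <version>${simudyne.version}</version>
</dependency>
```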
Simudyne has two implementations of the distributed graph. One runs locally using subprocesses and can be used for testing purposes. This implementation does NOT require Spark.
The other implementation distributes the load using Spark and thus needs to be provided with a working Spark cluster.
Testing a model in a distributed environment
Aimed at model developers, we have included a way of running the distributed graph using subprocesses,
without the need for Spark, in order to establish a model's compatibility and distributed-graph readiness.
Properties
Both implementations share the same main properties.
You need to configure the properties related to core-graph-distributed.
There are two ways to configure them:
modify the properties in the simudyneSDK.properties file (the only option for the local implementation)
set configuration parameters as command parameters when using spark-submit command (only available using the Spark implementation)
Some properties are already listed with default values in simudyneSDK.properties:
core.graph.experimental.timeouts.base: because of the distributed nature of the environment, the distributed graph needs a setting to establish when
communication between nodes should be considered slow or failing. If steps take roughly as long as this value, that is a likely indicator of potential failure.
core.graph.experimental.clusterSize: controls how many nodes are required for the graph to be considered ready for work.
When using the local implementation, this controls the number of spawned subprocesses.
When using the Spark implementation, this controls the number of partitions. Remember to set --executor-cores and --num-executors during spark-submit.
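As an illustration, the corresponding entries in simudyneSDK.properties might look like this (the values shown are placeholders, not recommendations — consult the defaults shipped with your SDK for the expected units and magnitudes):

```properties
# Base timeout for inter-node communication (placeholder value; check your SDK defaults for the unit)
core.graph.experimental.timeouts.base=10
# Number of nodes (local: subprocesses; Spark: partitions) — placeholder value
core.graph.experimental.clusterSize=4
```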
Limitations
The distributed graph comes with limitations, the most notable being:
no support for Immutable Schema, please set feature.immutable-schema=false
no support for dynamic agent creation/stopping
no support for SystemMessages
You may also want to disable the health check for models by setting nexus-server.health-check to false, in order to avoid distributing them.
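Taken together, the settings mentioned in the limitations above amount to a properties fragment like the following:

```properties
# Immutable Schema is not supported on the distributed graph
feature.immutable-schema=false
# Optional: disable the model health check so it is not distributed
nexus-server.health-check=false
```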
Local Implementation
To test your model running on Distributed Graph, add the following lines to your properties file:
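The exact lines depend on your SDK version; how the local, subprocess-based backend is selected may differ, so check your simudyneSDK.properties template. A minimal sketch using only the properties documented on this page might be:

```properties
# Run the distributed graph with 4 nodes — locally, 4 spawned subprocesses (placeholder value)
core.graph.experimental.clusterSize=4
# Immutable Schema is not supported on the distributed graph
feature.immutable-schema=false
```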
Please be aware that properties set in the simudyneSDK.properties file take precedence over options passed to spark-submit.
You can then submit your job using spark-submit. Here is an example:
```shell
spark-submit --class Main \
  --master yarn \
  --deploy-mode client \
  --files simudyneSDK.properties,licenseKey \
  hdfs://path/name-of-the-fat-jar.jar
```
This command runs the main function of the class Main and distributes it on Spark. You can then access the console through the config parameters nexus-server.hostname and nexus-server.port,
which default to localhost and 8080. You can also interact with the server through the REST API.
spark-submit allows you to configure Spark. You need to choose a configuration that best suits your cluster.
To learn more about Spark configuration, refer to the official documentation.