{"data":{"markdownRemark":{"html":"<p>This page explains how to set up and use the Simudyne SDK to run on top of Spark, for distributing your models.</p>\n<p>The first requirement is to install Spark, running standalone or on top of Hadoop YARN. <strong>The required version is Spark 2.2.</strong></p>\n<p>We recommend using the version of Spark running on Cloudera products: <a href=\"https://www.cloudera.com/products/open-source/apache-hadoop/apache-spark.html\">https://www.cloudera.com/products/open-source/apache-hadoop/apache-spark.html</a></p>\n<p>Once Spark is installed, you can check that it is running correctly by launching the Spark shell in a terminal:</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\">./bin/spark-shell</code></pre></div>\n<p>\n  <a\n    class=\"gatsby-resp-image-link\"\n    href=\"/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-68bde.png\"\n    style=\"display: block\"\n    target=\"_blank\"\n    rel=\"noopener\"\n  >\n  \n  <span\n    class=\"gatsby-resp-image-wrapper\"\n    style=\"position: relative; display: block; padding: 20px; max-width: 642px; margin-left: auto; margin-right: auto;\"\n  >\n    <span\n      class=\"gatsby-resp-image-background-image\"\n      style=\"padding-bottom: 47.28971962616823%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAACXBIWXMAABJ0AAASdAHeZh94AAABQUlEQVQoz5VS2W6DQAxcQjkDhDOcAYWiqj+VKFKi/P8fTD1uidKqL3kYeXdtj2cMZvn4xDzPOB6PP3HGsizo+h6HYUAvkRjk3DStxqIokKYpkiRBGIZ6Nsbgfr/DtP0BbdOgbVslyPMcu90OcRwr2LDdbrU5iiJ945k1fLdtW2tIeL1eYbphRN912O/3mKZJlVFBVVUPsFkhSpjLsgxlWSLLM30PgkAJb7ebKBy+FTaCcRwVdV0rVLXcmSMB1VcS10FlUSq5H/hKeD6fYab5/UHQidJUCgppWu1RAS36vq9KaNN1XSVYQduMp9MJJowTKfRhWZYm1uQrWHsulwuM9eYgkOme56kCTnccR+Nms3mdMElzsRSrRX4tRlrkr8A7B9EuBxAc+nfQL8KiqpHKnkjAfXH5JCIJC7mKZ/yn+pnwCwBa1gEi5E8DAAAAAElFTkSuQmCC'); background-size: cover; display: block;\"\n    >\n     
 <picture>\n        <source\n          srcset=\"/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-38156.webp 173w,\n/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-85678.webp 345w,\n/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-5c2c3.webp 690w,\n/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-e2dab.webp 1035w,\n/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-efaec.webp 1070w\"\n          sizes=\"(max-width: 642px) 100vw, 642px\"\n          type=\"image/webp\"\n        />\n        <source\n          srcset=\"/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-c006b.png 173w,\n/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-484fe.png 345w,\n/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-51909.png 690w,\n/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-7c4f0.png 1035w,\n/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-68bde.png 1070w\"\n          sizes=\"(max-width: 642px) 100vw, 642px\"\n          type=\"image/png\"\n        />\n        <img\n          class=\"gatsby-resp-image-image\"\n          style=\"width: 100%; height: 100%; margin: 0; vertical-align: middle; position: absolute; top: 0; left: 0; box-shadow: inset 0px 0px 0px 400px white;\"\n          src=\"/static/bash_spark-shell-cb318999964e5c815243c27abc78d4e5-51909.png\"\n          alt=\"bash spark shell\"\n          title=\"\"\n        />\n      </picture>\n      </span>\n  </span>\n  \n  </a>\n    </p>\n<p>You need to identify your Spark master URL, which points to the master node of your cluster. Above, the master URL indicates that Spark is running locally (master = local[*]). 
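</p>\n<p>To double-check which master the shell connected to, you can read it back from the Spark context inside the shell (the value shown here assumes the local-mode run above):</p>\n<div class=\"gatsby-highlight\" data-language=\"scala\"><pre class=\"language-scala\"><code class=\"language-scala\">scala&gt; sc.master\nres0: String = local[*]</code></pre></div>\n<p>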
The master URL is generally of the form <code class=\"language-text\">spark://host:port</code> on a standalone cluster, or <code class=\"language-text\">yarn</code> if you use Hadoop YARN.</p>\n<p>You can then start your project from one of the quickstart projects, preconfigured for Spark:</p>\n<ul>\n<li>Maven: <a href=\"https://github.com/simudyne/simudyne-maven-java-spark\">https://github.com/simudyne/simudyne-maven-java-spark</a></li>\n<li>SBT: <a href=\"https://github.com/simudyne/simudyne-sbt-java-spark\">https://github.com/simudyne/simudyne-sbt-java-spark</a></li>\n</ul>\n<p>Clone or download the repository and set up your credentials as for a standard Simudyne project.</p>\n<p>Uncomment the following line in your properties file to make the Simudyne SDK use Spark as its backend implementation:</p>\n<p class=\"code-header\">simudyneSDK.properties</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">### CORE-ABM-SPARK ###\ncore-abm.backend-implementation=simudyne.core.graph.spark.SparkGraphBackend</code></pre></div>\n<p>You then have two ways to configure Spark properties:</p>\n<ul>\n<li>modify the <code class=\"language-text\">core-abm-spark</code> properties in the <code class=\"language-text\">simudyneSDK.properties</code> file</li>\n<li>pass configuration parameters on the command line with <code class=\"language-text\">spark-submit</code></li>\n</ul>\n<p>Be aware that a property set in the <code class=\"language-text\">simudyneSDK.properties</code> 
file will override the one passed to <code class=\"language-text\">spark-submit</code>.</p>\n<p>To run your model, you will need to build a fat JAR containing your model, the Simudyne SDK and all the necessary dependencies. You will then need to upload it to the master node of your cluster, from which you can submit your Spark jobs.</p>\n<p>Here is the command to build the fat JAR:</p>\n<p class=\"code-header\">Maven</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\">mvn -s settings.xml compile package</code></pre></div>\n<p class=\"code-header\">SBT</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\">sbt assembly</code></pre></div>\n<p>You can then upload this JAR to your master node via SSH and submit your job with:</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\">spark2-submit --class Main --master <span class=\"token operator\">&lt;</span>sparkMasterURL<span class=\"token operator\">></span> --deploy-mode client --num-executors 30 --executor-cores 5 --executor-memory 30G --conf <span class=\"token string\">\"spark.executor.extraJavaOptions=-XX:+UseG1GC\"</span> --files simudyneSDK.properties name-of-the-fat-jar.jar</code></pre></div>\n<p>You should set the <code class=\"language-text\">--num-executors</code>, <code class=\"language-text\">--executor-cores</code> and <code class=\"language-text\">--executor-memory</code> parameters according to your own cluster resources. 
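</p>\n<p>As an illustration of how these flags can be derived (the cluster here is hypothetical: 6 worker nodes, each with 16 cores and 64 GB of RAM): reserving 1 core and 1 GB per node for the OS and Hadoop daemons leaves 15 usable cores per node, i.e. 3 executors of 5 cores each; 6 nodes × 3 executors = 18, minus 1 executor for the YARN application master, gives <code class=\"language-text\">--num-executors 17</code>; and 63 GB split across 3 executors, less roughly 7% for off-heap memory overhead, gives about <code class=\"language-text\">--executor-memory 19G</code>:</p>\n<div class=\"gatsby-highlight\" data-language=\"bash\"><pre class=\"language-bash\"><code class=\"language-bash\">spark2-submit --class Main --master yarn --deploy-mode client --num-executors 17 --executor-cores 5 --executor-memory 19G --conf \"spark.executor.extraJavaOptions=-XX:+UseG1GC\" --files simudyneSDK.properties name-of-the-fat-jar.jar</code></pre></div>\n<p>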
A useful resource on tuning these parameters: <a href=\"http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/\">http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/</a></p>","headings":[],"frontmatter":{"title":"Spark setup","toc":null,"experimental":null}},"site":{"siteMetadata":{"title":"Simudyne Docs","latestVersion":"2.6"}}},"pageContext":{"absolutePath":"/home/vsts/work/1/s/content/2.0/reference/distributed_computation/spark_setup.md","versioned":true,"version":"2.0","kind":"reference","pagePath":"/reference/distributed_computation/spark_setup","chronology":{"prev":{"name":"Distributed Computation","path":"/reference/distributed_computation"},"next":{"name":"Alternative Setups","path":"/reference/alternative_setups"}},"lastUpdated":"2026-04-21T13:56:54.827Z"}}