{"data":{"markdownRemark":{"html":"<p>Data about the current state of the simulation can be retrieved as JSON via the <a href=\":version/reference/rest_api/api_simulations_interactive_id_ticks\">REST API</a>. The Simudyne SDK can also export all simulation data to <a href=\"https://parquet.apache.org/\">Parquet</a> files for further analysis. </p>\n<h5>Note for Windows users</h5>\n<p>You will need a file named <code class=\"language-text\">winutils.exe</code> to be able to use Parquet on Windows.</p>\n<p>You can find it in the <code class=\"language-text\">hadoop-winutils</code> directory <a href=\"http://content.simudyne.com/$web/hadoop-winutils-master.zip\">here</a>,\nor you can copy-paste the following URL into your browser: <code class=\"language-text\">http://content.simudyne.com/$web/hadoop-winutils-master.zip</code>.</p>\n<p>Once you have downloaded <code class=\"language-text\">hadoop-winutils</code>, run the <code class=\"language-text\">Winutils_setup.bat</code> batch file to set your environment variable accordingly.</p>\n<p>If you already have an installed version of Hadoop and just lack <code class=\"language-text\">winutils.exe</code>, you can add it to your <code class=\"language-text\">C:\\hadoop-x.x.x\\bin</code> directory manually.</p>\n<p>When using Parquet on Windows, the system will try to access <code class=\"language-text\">...\\hadoop-winutils\\bin</code> (or <code class=\"language-text\">...\\hadoop-x.x.x\\bin</code> if you already had Hadoop installed) to find the file <code class=\"language-text\">winutils.exe</code>.\nIf you are getting error messages like <code class=\"language-text\">Shell Failed to locate the winutils binary in the hadoop binary path</code>,\ncheck that your <code class=\"language-text\">HADOOP_HOME</code> environment variable is set and that your <code class=\"language-text\">winutils.exe</code> is located in the <code class=\"language-text\">bin</code> directory inside the directory of the <code class=\"language-text\">HADOOP_HOME</code> 
destination.\nFor instance, if the location of your <code class=\"language-text\">hadoop-winutils</code> directory is <code class=\"language-text\">C:\\hadoop-winutils</code>, then <code class=\"language-text\">HADOOP_HOME</code> must be <code class=\"language-text\">C:\\hadoop-winutils</code>.</p>\n<h2 id=\"batch-run-parquet-export\"><a href=\"#batch-run-parquet-export\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Batch run parquet export</h2>\n<p>The Simudyne SDK will not export files to Parquet by default. To enable it, set the value of the config field <code class=\"language-text\">core.parquet-export</code> in the <code class=\"language-text\">SimudyneSDK.properties</code> file to true. (<a href=\":version/reference/models/model-configuration\">More about Model Config</a>.)</p>\n<p>The path in which to create the Parquet files should be provided in the config field <code class=\"language-text\">core.parquet-export-path</code>. This can be an HDFS path or a local file system path. If no value is specified for <code class=\"language-text\">core.parquet-export-path</code>, the Parquet files will be written to a tmp directory, or to the HDFS home directory if running with Spark.</p>\n<p>By default, when running a batch run, Agent and Link data is not serialised, and so is not output to parquet. This is to reduce the amount of data being held in memory when sending the batch results to the console. 
If the data is being output to parquet and does not need to be viewed on the console, the in-memory data storage can be turned off, allowing the Simudyne SDK to export Agent and Link data to parquet as well as the general Model data. This is done by setting the config field <code class=\"language-text\">core.return-data</code> to <code class=\"language-text\">false</code>. </p>\n<p>For large model runs that produce a lot of data, setting this config field to false will also reduce the amount of memory being held by the simulation, which can help avoid potential OutOfMemory exceptions and improve the efficiency of the model.</p>\n<p>If the data does not need to be displayed on the console, and Agent and Link data is not needed either, the config fields <code class=\"language-text\">core-abm.serialize.agents</code> and <code class=\"language-text\">core-abm.serialize.links</code> should be set to false, to avoid generating unnecessary data.</p>\n<h2 id=\"scenario-run-parquet-export\"><a href=\"#scenario-run-parquet-export\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Scenario run parquet export</h2>\n<p>Scenario runs do not hold the data in memory because they are not managed by the console, and the data cannot be viewed on the console. This means that Agent and Link data is serialised by default, and so should be explicitly turned off if not needed. 
(Use the config fields <code class=\"language-text\">core-abm.serialize.agents</code> and <code class=\"language-text\">core-abm.serialize.links</code> to control this.)</p>\n<p>Data export format for scenario runs is controlled via the POST request sent to start the scenario run. (See the scenario REST specification for more details on the POST request <a href=\":version/rest_api/scenario\">here</a>.)</p>\n<p>By default the scenario will output data as json files. To specify the output format as parquet, set the 'format' field in the 'output' section of the POST request.</p>\n<div class=\"gatsby-highlight\" data-language=\"json\"><pre class=\"language-json\"><code class=\"language-json\"><span class=\"token punctuation\">{</span>\n  //Other scenario json fields\n  <span class=\"token property\">\"output\"</span><span class=\"token operator\">:</span> <span class=\"token punctuation\">{</span><span class=\"token property\">\"uri\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"/path/to/export/to\"</span> <span class=\"token punctuation\">,</span> <span class=\"token property\">\"format\"</span><span class=\"token operator\">:</span> <span class=\"token string\">\"parquet\"</span><span class=\"token punctuation\">}</span>\n<span class=\"token punctuation\">}</span></code></pre></div>\n<h2 id=\"model-sampler-parquet-export\"><a href=\"#model-sampler-parquet-export\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Model sampler parquet export</h2>\n<p>The model sampler will always 
output data to parquet. As with scenarios, the data is not held in memory, so Agent and Link data is serialised by default and should be explicitly turned off if not needed using the config fields <code class=\"language-text\">core-abm.serialize.agents</code> and <code class=\"language-text\">core-abm.serialize.links</code>.</p>\n<h2 id=\"interactive-run-parquet-export\"><a href=\"#interactive-run-parquet-export\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Interactive run parquet export</h2>\n<p>In most cases, it will be unnecessary to output parquet data when running interactive runs. Therefore, by default, parquet data will not be exported when running interactive runs, even if the config field <code class=\"language-text\">core.parquet-export</code> is true. If parquet output is required for interactive runs, the config field <code class=\"language-text\">feature.interactive-parquet-output</code> should be set to true, in addition to the config fields <code class=\"language-text\">core.parquet-export</code> and <code class=\"language-text\">core.parquet-export-path</code>.</p>\n<p>When running an interactive run, the parquet files will be closed (and ready for reading) when the interactive run is deleted or restarted. 
</p>\n<h2 id=\"files-created\"><a href=\"#files-created\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Files created</h2>\n<p>Multiple Parquet files may be created for each simulation run. The root Parquet file will contain all output fields related to the model. This includes the values of global fields and accumulators. All complex objects that are nested in the JSON output of the simulation are flattened. For example, suppose a model's JSON contained nested fields as follows:</p>\n <p class=\"code-header\">Model JSON output</p>\n<div class=\"gatsby-highlight\" data-language=\"json\"><pre class=\"language-json\"><code class=\"language-json\"><span class=\"token punctuation\">{</span>\n  <span class=\"token property\">\"someValue\"</span><span class=\"token operator\">:</span> <span class=\"token number\">23</span><span class=\"token punctuation\">,</span>\n  <span class=\"token property\">\"system\"</span><span class=\"token operator\">:</span><span class=\"token punctuation\">{</span>\n    <span class=\"token property\">\"aglobal\"</span> <span class=\"token operator\">:</span> <span class=\"token number\">24</span><span class=\"token punctuation\">,</span>\n    <span class=\"token property\">\"anAccumulator\"</span><span class=\"token operator\">:</span> <span class=\"token punctuation\">{</span>\n      <span class=\"token property\">\"count\"</span><span class=\"token operator\">:</span> <span class=\"token number\">25</span><span class=\"token punctuation\">,</span>\n      <span class=\"token 
property\">\"value\"</span><span class=\"token operator\">:</span> <span class=\"token number\">26</span>\n    <span class=\"token punctuation\">}</span>\n  <span class=\"token punctuation\">}</span>\n<span class=\"token punctuation\">}</span>  </code></pre></div>\n<p>The Parquet root file created for this would contain the following fields.</p>\n<p>| <code class=\"language-text\">someValue</code> | <code class=\"language-text\">system__aglobal</code> | <code class=\"language-text\">system__anAccumulator__count</code> | <code class=\"language-text\">system__anAccumulator__value</code> |</p>\n<p>Each field name is made up of the path to the JSON field, with every element separated by a double underscore <code class=\"language-text\">__</code>.</p>\n<p>If a model's JSON output contains arrays of objects, such as <code class=\"language-text\">Agents</code> or <code class=\"language-text\">Links</code>, these are exported to separate Parquet files. (One file per Agent or Link type.) The name of the Parquet file will be the path to the Agent or Link type.</p>\n <p class=\"code-header\">Model JSON output with agents and links</p>\n<div class=\"gatsby-highlight\" data-language=\"json\"><pre class=\"language-json\"><code class=\"language-json\"><span class=\"token punctuation\">{</span>\n  <span class=\"token property\">\"someValue\"</span><span class=\"token operator\">:</span> <span class=\"token number\">23</span><span class=\"token punctuation\">,</span>\n  <span class=\"token property\">\"system\"</span><span class=\"token operator\">:</span><span class=\"token punctuation\">{</span>\n    <span class=\"token property\">\"Agents\"</span> <span class=\"token operator\">:</span> <span class=\"token punctuation\">{</span>\n      <span class=\"token property\">\"Cell\"</span><span class=\"token operator\">:</span> <span class=\"token punctuation\">[</span>\n        <span class=\"token punctuation\">{</span>\n          <span class=\"token property\">\"alive\"</span><span class=\"token 
operator\">:</span> <span class=\"token boolean\">false</span><span class=\"token punctuation\">,</span>\n          <span class=\"token property\">\"_id\"</span><span class=\"token operator\">:</span> <span class=\"token number\">0</span>\n        <span class=\"token punctuation\">}</span><span class=\"token punctuation\">,</span>\n        <span class=\"token punctuation\">{</span>\n          <span class=\"token property\">\"alive\"</span><span class=\"token operator\">:</span> <span class=\"token boolean\">true</span><span class=\"token punctuation\">,</span>\n          <span class=\"token property\">\"_id\"</span><span class=\"token operator\">:</span> <span class=\"token number\">1</span>\n        <span class=\"token punctuation\">}</span>\n      <span class=\"token punctuation\">]</span>\n    <span class=\"token punctuation\">}</span><span class=\"token punctuation\">,</span>\n    <span class=\"token property\">\"Links\"</span><span class=\"token operator\">:</span> <span class=\"token punctuation\">{</span>\n      <span class=\"token property\">\"Neighbour\"</span> <span class=\"token operator\">:</span> <span class=\"token punctuation\">[</span>\n        <span class=\"token punctuation\">{</span>\n          <span class=\"token property\">\"_to\"</span> <span class=\"token operator\">:</span><span class=\"token number\">123</span><span class=\"token punctuation\">,</span>\n          <span class=\"token property\">\"_from\"</span><span class=\"token operator\">:</span> <span class=\"token number\">256</span>\n        <span class=\"token punctuation\">}</span><span class=\"token punctuation\">,</span>\n        <span class=\"token punctuation\">{</span>\n          <span class=\"token property\">\"_to\"</span><span class=\"token operator\">:</span> <span class=\"token number\">256</span><span class=\"token punctuation\">,</span>\n          <span class=\"token property\">\"_from\"</span><span class=\"token operator\">:</span> <span class=\"token 
number\">123</span>\n        <span class=\"token punctuation\">}</span>\n      <span class=\"token punctuation\">]</span>\n    <span class=\"token punctuation\">}</span>\n  <span class=\"token punctuation\">}</span>\n<span class=\"token punctuation\">}</span>  </code></pre></div>\n<p>Parquet files:</p>\n<p><em>root</em></p>\n<p>| <code class=\"language-text\">someValue</code> |</p>\n<p><em>root__system__Agents__Cell</em></p>\n<p>| <code class=\"language-text\">alive</code> | <code class=\"language-text\">_id</code> |</p>\n<p><em>root__system__Links__Neighbour</em></p>\n<p>| <code class=\"language-text\">_to</code> | <code class=\"language-text\">_from</code> |</p>\n<h2 id=\"fields-added\"><a href=\"#fields-added\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Fields added</h2>\n<p>Every Parquet table will also include a field <code class=\"language-text\">tick</code>, which tells you which tick the data was produced for, and a field <code class=\"language-text\">seed</code>, which tells you the random number generator seed used for this run of the simulation.</p>\n<h2 id=\"output-directory-structure\"><a href=\"#output-directory-structure\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 
12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Output directory structure</h2>\n<p>When exporting data to parquet, the folder layout can be specified in the config using the config field <code class=\"language-text\">core.parquet-export.folder-structure</code>. There are two options supported for this field, <code class=\"language-text\">group-by-type</code> and <code class=\"language-text\">group-by-run</code>. If no value is specified, it will default to <code class=\"language-text\">group-by-type</code>.</p>\n<h3 id=\"group-by-type-structure\"><a href=\"#group-by-type-structure\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Group by type structure</h3>\n<p>When the parquet folder structure is group by type, folders are created for each parquet table type, and a parquet file for each run is created inside these folders.</p>\n<p>For this example, the root export directory passed through the config field <code class=\"language-text\">core.parquet-export-path</code> is /exportFolder.</p>\n<p class=\"code-header\">Group by type batch output folders</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">/exportFolder/\n    {simulation_id}/\n        runs/\n            root/\n              run000.parquet\n              run001.parquet\n              run002.parquet\n            root__system__Agents__Cell\n              run000.parquet\n              
run001.parquet\n              run002.parquet\n            metadata.json\n            finished.json</code></pre></div>\n<ul>\n<li>exportFolder -> This is the root export directory</li>\n<li>{simulation_id} -> This is the UUID created for every run of the simulation (This is the ID used with the REST API)</li>\n<li>runs -> The root folder for all Parquet run data</li>\n<li>root -> The data for each parquet table type will be in its own folder</li>\n<li>run000.parquet, run001.parquet -> The Parquet files created for each run.</li>\n<li>metadata.json -> A file containing some metadata about the data produced.</li>\n<li>finished.json -> An empty file created to signal that no new data will be added to this directory.</li>\n</ul>\n<p class=\"code-header\">Group by type scenario output folders</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">/exportFolder/\n    {simulation_id}/\n        runs/\n            root/\n              scenario0run0001.parquet\n              scenario0run0002.parquet\n              scenario0run0003.parquet\n            root__system__Agents__Cell\n              scenario0run0001.parquet\n              scenario0run0002.parquet\n              scenario0run0003.parquet\n            metadata.json\n            finished.json</code></pre></div>\n<ul>\n<li>exportFolder -> This is the root export directory</li>\n<li>{simulation_id} -> This is the UUID created for every run of the simulation (This is the ID used with the REST API)</li>\n<li>runs -> The root folder for all Parquet run data</li>\n<li>root -> The data for each parquet table type will be in its own folder</li>\n<li>scenario0run0001.parquet, scenario0run0002.parquet -> The Parquet files created for each run.
</li>\n<li>metadata.json -> A file containing some metadata about the data produced.</li>\n<li>finished.json -> An empty file created to signal that no new data will be added to this directory.</li>\n</ul>\n<p>The model sampler output folders will match the scenario output folders. </p>\n<h3 id=\"group-by-run-structure\"><a href=\"#group-by-run-structure\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Group by run structure</h3>\n<p>When the parquet folder structure is group by runs, folders are created for each simulation run, and a parquet file for each table type is created inside these folders.</p>\n<p>For this example, the root export directory passed through the config field <code class=\"language-text\">core.parquet-export-path</code> is /exportFolder.</p>\n<p class=\"code-header\">Group by run batch output folders</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">/exportFolder/\n    {simulation_id}/\n        runs/\n            run000/\n              root001.parquet\n              root__system__Agents__Cell001.parquet\n            run001/\n              root001.parquet\n              root__system__Agents__Cell001.parquet\n            run002/\n              root001.parquet\n              root__system__Agents__Cell001.parquet\n            metadata.json\n            finished.json</code></pre></div>\n<ul>\n<li>exportFolder -> This is the root export directory</li>\n<li>{simulation_id} -> This is the UUID created for every run of the 
simulation (This is the ID used with the REST API)</li>\n<li>runs -> The root folder for all Parquet run data</li>\n<li>run000 -> The data for each run of the simulation will be in its own folder</li>\n<li>root001.parquet, root__system__Agents__Cell001.parquet -> The Parquet files created.</li>\n<li>metadata.json -> A file containing some metadata about the data produced.</li>\n<li>finished.json -> An empty file created to signal that no new data will be added to this directory.</li>\n</ul>\n<p class=\"code-header\">Group by run scenario output folders</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">/exportFolder/\n    {simulation_id}/\n        runs/\n            scenario0.run0/\n              root001.parquet\n              root__system__Agents__Cell001.parquet\n            metadata.json\n            finished.json</code></pre></div>\n<ul>\n<li>exportFolder -> This is the root export directory</li>\n<li>{simulation_id} -> This is the UUID created for every run of the simulation (This is the ID used with the REST API)</li>\n<li>runs -> The root folder for all Parquet run data</li>\n<li>scenario0.run0 -> The data for each scenario and run will be in its own folder</li>\n<li>root001.parquet, root__system__Agents__Cell001.parquet -> The Parquet files created.</li>\n<li>metadata.json -> A file containing some metadata about the data produced.</li>\n<li>finished.json -> An empty file created to signal that no new data will be added to this directory.</li>\n</ul>\n<p>The model sampler output folders will match the scenario output folders. 
</p>\n<h2 id=\"metadatajson\"><a href=\"#metadatajson\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>metadata.json</h2>\n<p>A metadata file is added to the data export giving details about the data. The metadata contains</p>\n<ul>\n<li>model_name -> The name of the model that we can use to query the API</li>\n<li>source -> Simudyne</li>\n<li>source_version -> The version of The Simudyne SDK that produced this data</li>\n<li>format -> Parquet</li>\n<li>creation_date -> The date this data was produced</li>\n<li>schema -> The nested schema that matches this data output</li>\n<li>custom -> Custom data that can be passed through in the create simulation request</li>\n</ul>\n<h2 id=\"finishedjson\"><a href=\"#finishedjson\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>finished.json</h2>\n<p>This is an empty file created at the end of a run to let you know that no new parquet files will be created in this directory.</p>\n<h2 id=\"outputting-to-hdfs\"><a href=\"#outputting-to-hdfs\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" 
height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Outputting to HDFS</h2>\n<p>When exporting to a cluster, there are a few things to make sure you have set up ahead of time.</p>\n<ul>\n<li><code class=\"language-text\">core.parquet-export-path</code> cannot use a relative path. A typical value would be <code class=\"language-text\">hdfs://localhost:9000/user/simudyne/output</code></li>\n<li>Writing of the parquet output is handled by the executors, with the driver node handling the metadata and finished files.</li>\n<li>\n<p>You will want to make sure you have sufficient permissions to be able to write to HDFS from Spark. Below are a few methods.</p>\n<ul>\n<li><code class=\"language-text\">hdfs dfs -chmod -R 755 /user/simudyne/output</code></li>\n<li><code class=\"language-text\">hadoop fs -chown -R user:simudyne /user/simudyne/output</code></li>\n<li>or simply by running the command as the hdfs user: <code class=\"language-text\">sudo -u hdfs spark-submit</code></li>\n</ul>\n</li>\n<li>In order to access the parquet output, you will need to access the files as you normally would via HDFS</li>\n<li>You will want to confirm which output structure you are using. For more information, <a href=\":version/reference/data_output/parquet_output#output-directory-structure\">check here</a></li>\n<li>The Simudyne SDK does not set a default replication factor. If you wish to modify this, please follow the normal procedures, such as updating your <code class=\"language-text\">hdfs-site.xml</code> and adding or modifying <code class=\"language-text\">dfs.replication</code>, setting the value as desired.</li>\n</ul>\n<p>For more detailed information on working with Spark/HDFS 
in your model(s) please refer to <a href=\":version/reference/distributed_computation/spark_setup\">Spark Setup</a></p>","headings":[{"value":"Batch run parquet export","depth":2},{"value":"Scenario run parquet export","depth":2},{"value":"Model sampler parquet export","depth":2},{"value":"Interactive run parquet export","depth":2},{"value":"Files created","depth":2},{"value":"Fields added","depth":2},{"value":"Output directory structure","depth":2},{"value":"Group by type structure","depth":3},{"value":"Group by run structure","depth":3},{"value":"metadata.json","depth":2},{"value":"finished.json","depth":2},{"value":"Outputting to HDFS","depth":2}],"frontmatter":{"title":"Parquet data export","toc":null,"experimental":null}},"site":{"siteMetadata":{"title":"Simudyne Docs","latestVersion":"2.6"}}},"pageContext":{"absolutePath":"/home/vsts/work/1/s/content/2.2/reference/data_output/parquet_output.md","versioned":true,"version":"2.2","kind":"reference","pagePath":"/reference/data_output/parquet_output","chronology":{"prev":{"name":"Data Output","path":"/reference/data_output"},"next":{"name":"Avro schema","path":"/reference/data_output/avro_schema"}},"lastUpdated":"2026-04-21T13:56:54.836Z"}}