Data about the current state of the simulation can be retrieved as JSON via the REST API. The Simudyne SDK can also export all simulation data to Parquet files for further analysis.
The Simudyne SDK will not export files to Parquet by default. To enable export, set the config field `core.parquet-export` in the `SimudyneSDK.properties` file to `true`. (More about Model Config.)
The path in which to create the Parquet files should be provided in the config field `core.parquet-export-path`. This can be an HDFS path or a local file system path. If no value is specified for `core.parquet-export-path`, the Parquet files will be written to a temporary directory, or to the HDFS home directory if running with Spark.
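For example, a minimal `SimudyneSDK.properties` entry enabling export might look like the following sketch (the export path shown is illustrative):

```properties
# Enable Parquet export (disabled by default)
core.parquet-export=true
# Local or HDFS directory in which to create the Parquet files (example path)
core.parquet-export-path=/data/simudyne/parquet-exports
```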
To use Parquet export on Windows, you will also need the `winutils.exe` binary. You can find it in the `hadoop-winutils` directory here, or you can copy-paste the following URL into your browser: http://content.simudyne.com/$web/hadoop-winutils-master.zip.
Once you have downloaded the `hadoop-winutils` bundle, run the `Winutils_setup.bat` batch file to set your `HADOOP_HOME` environment variable accordingly.
If you already have a version of Hadoop installed and just lack `winutils.exe`, you can add it to your `C:\hadoop-x.x.x\bin` directory manually.
When using Parquet on Windows, the system will try to access `...\hadoop-winutils\bin` (or `...\hadoop-x.x.x\bin` if you already had Hadoop installed) to find the file `winutils.exe`.
If you are getting error messages like `Shell Failed to locate the winutils binary in the hadoop binary path`, check that your `HADOOP_HOME` environment variable is set and that `winutils.exe` is located in the `bin` directory inside the `HADOOP_HOME` directory.
For instance, if your `hadoop-winutils` directory is located at `C:\hadoop-winutils`, then `HADOOP_HOME` must be `C:\hadoop-winutils`.
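If you prefer to set the variable by hand instead of running the batch file, a command like the following does the same from a Windows command prompt (shown for the example location above; adjust the path to your setup):

```bat
:: Persist HADOOP_HOME for the current user; winutils.exe must sit in %HADOOP_HOME%\bin
setx HADOOP_HOME "C:\hadoop-winutils"
```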
Multiple Parquet files may be created for each simulation run. The root Parquet file, named `root.parquet`, will contain all output fields related to the model. This includes the values of global fields and accumulators. All complex objects that are nested in the JSON output of the simulation are flattened. For example, suppose a model's JSON output contained nested fields as follows:
Model JSON output
```json
{
  "someValue": 23,
  "system": {
    "aglobal": 24,
    "anAccumulator": {
      "count": 25,
      "value": 26
    }
  }
}
```
The root Parquet file created for this would contain the following fields:
| `someValue` | `system__aglobal` | `system__anAccumulator__count` | `system__anAccumulator__value` |
| --- | --- | --- | --- |
The field name is made up of the path to the JSON field, where every element is separated by a double underscore `__`.
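The flattened columns can then be read with any Parquet-aware tool. As a sketch, using pandas and an assumed export path:

```python
import pandas as pd

# Read the root Parquet file for a single run (path is illustrative).
df = pd.read_parquet("exportFolder/my_simulation/runs/run0/root.parquet")

# Flattened column names follow the JSON path, joined by double underscores.
print(df[["someValue", "system__aglobal", "system__anAccumulator__value"]].head())
```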
If a model's JSON output contains arrays of objects, such as `Agents` or `Links`, these are exported to separate Parquet files (one file per agent or link type). The name of the Parquet file will be the path to the agent.
Model JSON output with agents and links
```json
{
  "someValue": 23,
  "system": {
    "Agents": {
      "Cell": [
        {
          "alive": false,
          "_id": 0
        },
        {
          "alive": true,
          "_id": 1
        }
      ]
    },
    "Links": {
      "Neighbour": [
        {
          "_to": 123,
          "_from": 256
        },
        {
          "_to": 256,
          "_from": 123
        }
      ]
    }
  }
}
```
Parquet files:

`root`

| `someValue` |
| --- |

`root__system__Agents__Cell`

| `alive` | `_id` |
| --- | --- |

`root__system__Links__Neighbour`

| `_to` | `_from` |
| --- | --- |
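Agent and link tables can be loaded in the same way; for instance (again with illustrative paths, and assuming the table names above carry a `.parquet` extension on disk):

```python
import pandas as pd

run_dir = "exportFolder/my_simulation/runs/run0"  # illustrative run folder

# One file per agent or link type, named after its JSON path.
cells = pd.read_parquet(f"{run_dir}/root__system__Agents__Cell.parquet")
links = pd.read_parquet(f"{run_dir}/root__system__Links__Neighbour.parquet")

print(cells[["_id", "alive"]].head())
print(links[["_from", "_to"]].head())
```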
Every Parquet table will also include a field `tick`, which tells you the tick for which this data was produced, and a field `seed`, which tells you the random number generator seed used for this run of the simulation.
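These two fields make it easy to slice the exported data over time. A sketch, reusing the hypothetical `Cell` table from above:

```python
import pandas as pd

cells = pd.read_parquet("exportFolder/my_simulation/runs/run0/root__system__Agents__Cell.parquet")

# Count how many Cell agents are alive at each tick of the run.
alive_per_tick = cells.groupby("tick")["alive"].sum()
print(alive_per_tick)
```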
Folders are created in the root export directory passed through the config field `core.parquet-export-path`, to separate and identify the data for different runs.
Scenario run folders

```
exportFolder/
  {simulation_id}/
    runs/
      scenario0.run0/
        root.parquet
        root__system.parquet
        metadata.json
```
Batch/Interactive run folders

```
exportFolder/
  {simulation_id}/
    runs/
      run0/
        root.parquet
        root__system.parquet
        metadata.json
```
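Because every run follows the same layout, aggregating results across runs reduces to a directory walk. A sketch, with illustrative folder names:

```python
from pathlib import Path
import pandas as pd

runs_dir = Path("exportFolder/my_simulation/runs")  # illustrative path

# Concatenate the root table of every run, tagging rows with the run folder name.
frames = [
    pd.read_parquet(run / "root.parquet").assign(run=run.name)
    for run in sorted(runs_dir.iterdir())
    if run.is_dir()
]
all_runs = pd.concat(frames, ignore_index=True)
print(all_runs.groupby("run")["someValue"].mean())
```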
A metadata file, `metadata.json`, is added to the data export giving details about the data. The metadata contains: