{"data":{"markdownRemark":{"html":"<p>Apache Hive provides a SQL-like interface for querying data stored on Apache Hadoop, allowing Hadoop users to extract that data for further analysis with ease and at scale. The Simudyne SDK (as of version 2.4) allows users to store their output data in Hive tables in <a href=\"https://parquet.apache.org/\">Parquet</a> format.</p>\n<h2 id=\"hive-parquet-export\"><a href=\"#hive-parquet-export\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Hive Parquet Export</h2>\n<p>The Simudyne SDK does not export to Hive by default. To enable it, set the config field <code class=\"language-text\">core.hive-export.enabled</code> in the <code class=\"language-text\">simudyneSDK.properties</code> file to <code class=\"language-text\">true</code>.</p>\n<p>You must also set <code class=\"language-text\">core.hive-export-path=hive2://localhost:10000/default</code>, changing the host and port as required. The <code class=\"language-text\">default</code> at the end refers to the table the data will be written to.</p>\n<p>The fields <code class=\"language-text\">core.export.username</code> and <code class=\"language-text\">core.export.password</code> are also required so that authentication to the Hive server can complete.</p>\n<p>Furthermore, there are two additional settings, for both local Parquet and Hive output, that a user may wish to change. These are <code class=\"language-text\">core.data-export.generic-flush</code> and <code class=\"language-text\">core.data-export.values-flush</code>. 
These would typically be set to the same value (the option to change them separately is left to the user, for altering the default export or custom channels) and determine how many records are written to a single file or, in the case of Hive, how many entries are sent in a single query.</p>\n<p>(<a href=\":version/reference/modelling/model-configuration\">More about Model Config</a>.)</p>\n<div class=\"ui segment info message\">\n  <h4>Required Config</h4>\n  As in the <code class=\"language-text\">simudyneSDK.properties</code> files provided with the tutorials and defaults, the Hive parameters must be present, and Hive export cannot function without the export path, username, and password also being set. If you are using 2.4, please make sure these parameters exist and, if Hive export is not in use, that it is set to false; the parameters are required lookups either way.\n  <br />\n  Also, because of how the username and password fields are used, you will not be able to output to both SQL and Hive by default.\n</div>\n<h2 id=\"hive-output-structure\"><a href=\"#hive-output-structure\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Hive Output Structure</h2>\n<p>As the underlying format of the Hive output is the same as that of local Parquet output, please refer to <a href=\":version/reference/data_export\">Parquet Data Export</a> for more information on the difference between Batch and Scenario runs, and on how to group by different structures. Note that Parquet output is written to your Hive table in the same manner.</p>\n<p>By default, Agent and Link data is not serialised, and so is not output to Parquet. 
This is to reduce the amount of data held in memory when sending the batch results to the console. If the data is being output to Parquet and does not need to be viewed on the console, the in-memory data storage can be turned off, allowing the Simudyne SDK to export Agent and Link data to Parquet as well as the general Model data. This is done by setting the config field <code class=\"language-text\">core.return-data</code> to <code class=\"language-text\">false</code>.</p>\n<p>For large model runs that produce a lot of data, setting this config field to false will also reduce the amount of memory held by the simulation, which can help avoid potential OutOfMemory exceptions and improve the efficiency of the model.</p>\n<p>If the data does not need to be displayed on the console and Agent and Link data is also not needed, the config fields <code class=\"language-text\">core-abm.serialize.agents</code> and <code class=\"language-text\">core-abm.serialize.links</code> should be set to false to avoid generating unnecessary data.</p>\n<h2 id=\"controlling-flush-size\"><a href=\"#controlling-flush-size\" aria-hidden=\"true\" class=\"anchor\"><svg aria-hidden=\"true\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Controlling Flush Size</h2>\n<p>While default values exist for <code class=\"language-text\">core.data-export.generic-flush</code> and <code class=\"language-text\">core.data-export.values-flush</code>, control of these values is handed to users, as model output and machine performance can differ vastly. 
Effectively, the flush size is the number of records held in memory before being written to a file or sent in a Hive query.</p>\n<p>Increasing this value results in fewer files (subsequent files are created with the same name/run structure but with an <code class=\"language-text\">_n</code> value appended, allowing further commands or scripts to parse or combine these files), but it directly increases memory usage.</p>\n<p>As memory tends to be a bottleneck for larger-scale simulations, you should lower this value if batch runs are failing or if you are hitting GC overhead limit errors.</p>","headings":[{"value":"Hive Parquet Export","depth":2},{"value":"Hive Output Structure","depth":2},{"value":"Controlling Flush Size","depth":2}],"frontmatter":{"title":"Hive via Parquet","toc":null,"experimental":null}},"site":{"siteMetadata":{"title":"Simudyne Docs","latestVersion":"2.6"}}},"pageContext":{"absolutePath":"/home/vsts/work/1/s/content/2.6/reference/data_export/hive.md","versioned":false,"version":"2.6","kind":"reference","pagePath":"/reference/data_export/hive","chronology":{"prev":{"name":"MySQL","path":"/reference/data_export/sql"},"next":{"name":"H2 and JDBC","path":"/reference/data_export/h2"}},"lastUpdated":"2026-04-21T13:56:54.868Z"}}