Last updated on 16th July 2024
Working with a distributed graph comes with some caveats that may affect your model. In addition, the options and configuration you pass to spark-submit can be tuned to improve performance by making efficient use of your cluster. You do not need the sections below just to move from local to distributed execution, but if you start encountering errors, these are, depending on your model setup, the most likely causes.
For details on what Spark tuning is and how to use it, see the Apache Spark site.
Because messages are passed across the network, they need to be serializable. This imposes some restrictions: for example, the Java Optional classes do not implement Serializable, so they cannot be used in messages as they are.
If you need them, you will have to copy their source so that your copy can implement Serializable. Here is an example based on Java's OptionalDouble:
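import java.io.Serializable;
import java.util.NoSuchElementException;

// Serializable copy of java.util.OptionalDouble
public class OptDouble implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final OptDouble EMPTY = new OptDouble();

    private final boolean isPresent;
    private final double value;

    private OptDouble() {
        this.isPresent = false;
        this.value = Double.NaN;
    }

    private OptDouble(double value) {
        this.isPresent = true;
        this.value = value;
    }

    public static OptDouble of(double input) {
        return new OptDouble(input);
    }

    public static OptDouble empty() {
        return EMPTY;
    }

    public double getAsDouble() {
        if (!isPresent) {
            throw new NoSuchElementException("No value present");
        }
        return value;
    }

    public boolean isPresent() {
        return isPresent;
    }

    // Compare by value: a deserialized copy is a different instance
    // from the original, including from EMPTY.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof OptDouble)) {
            return false;
        }
        OptDouble other = (OptDouble) obj;
        return isPresent == other.isPresent && Double.compare(value, other.value) == 0;
    }

    @Override
    public int hashCode() {
        return isPresent ? Double.hashCode(value) : 0;
    }

    @Override
    public String toString() {
        return isPresent ? String.format("OptDouble[%s]", value) : "OptDouble.empty";
    }
}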
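As a rough check of the point above, the sketch below pushes a value through standard Java serialization and back. It assumes plain java.io serialization (the framework's actual transport may differ), and it uses a condensed, hypothetical stand-in for OptDouble so that the example compiles on its own; roundTrip is an illustrative helper, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.OptionalDouble;

public class OptDoubleRoundTrip {

    // Condensed stand-in for the OptDouble class (assumption: same fields, fewer methods).
    static final class OptDouble implements Serializable {
        private static final long serialVersionUID = 1L;
        private final boolean isPresent;
        private final double value;

        private OptDouble(boolean isPresent, double value) {
            this.isPresent = isPresent;
            this.value = value;
        }

        static OptDouble of(double value) { return new OptDouble(true, value); }
        boolean isPresent() { return isPresent; }
        double getAsDouble() { return value; }
    }

    // Write the object to a byte array and read it back again.
    static Object roundTrip(Object message) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(message);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // The serializable copy survives the round trip.
        OptDouble copy = (OptDouble) roundTrip(OptDouble.of(4.2));
        System.out.println(copy.isPresent() + " " + copy.getAsDouble());

        // java.util.OptionalDouble does not implement Serializable.
        try {
            roundTrip(OptionalDouble.of(4.2));
        } catch (NotSerializableException e) {
            System.out.println("OptionalDouble is not serializable");
        }
    }
}
```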
When a model runs in a distributed fashion, any agent can be located anywhere in the environment. Consequently, passing agents around will cause errors.
If your program shows many java.io.NotSerializableException errors, they are probably caused by improper use of the Java keyword this.
The correct way to handle any sharing of data is to send messages.
For example, you need to avoid passing a reference to the model when initializing agents. Here is an example of incorrect code, followed by its correction:
// will throw a serialization error, Model reference is passed
class Model {
    int field = 1;

    public void setup() {
        Group<SomeAgent> someAgentGroup =
            generateGroup(SomeAgent.class, count,
                agent -> { agent.field = field; });
    }
}

// a single value is passed, not the model reference
class Model {
    int field = 1;

    public void setup() {
        int localField = field;
        Group<SomeAgent> someAgentGroup =
            generateGroup(SomeAgent.class, count,
                agent -> { agent.field = localField; });
    }
}
The difference in the example above is that the field variable in the first Model is in fact this.field, this being the Model. Referencing this passes the entire model to the function, and the system then tries to serialize it.
The second Model avoids the problem by copying the needed value to a local variable in the Model#setup function.
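The capture behaviour described above can be reproduced with plain Java, independently of the framework. The following is a minimal sketch, assuming standard Java serialization; SerializableConsumer, SomeAgent, Model and canSerialize are hypothetical stand-ins written for this illustration, not the library's own types.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Consumer;

public class CaptureDemo {

    // Hypothetical stand-in for an initializer that has to be shipped to other nodes.
    interface SerializableConsumer<T> extends Consumer<T>, Serializable {}

    // Hypothetical agent with a single field.
    static class SomeAgent {
        int field;
    }

    // Deliberately NOT Serializable.
    static class Model {
        int field = 1;

        SerializableConsumer<SomeAgent> capturingThis() {
            // "field" means this.field, so the lambda captures the whole Model.
            return agent -> agent.field = field;
        }

        SerializableConsumer<SomeAgent> capturingLocal() {
            int localField = field;
            // Only the int value is captured.
            return agent -> agent.field = localField;
        }
    }

    // Returns true if standard Java serialization accepts the object.
    static boolean canSerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Model model = new Model();
        System.out.println(canSerialize(model.capturingThis()));  // prints false
        System.out.println(canSerialize(model.capturingLocal())); // prints true
    }
}
```

Making Model implement Serializable would also silence the error, but the whole model would then be shipped with every lambda, which is why copying the value to a local variable is the preferable fix.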
This can also happen when calling a method rather than reading a variable:
Group<RandomAgent> randomAgentGroup =
    generateGroup(RandomAgent.class, count,
        agent -> {
            SeededRandom prng = agent.getPrng();
            agent.random = getRandomSample(prng, 0, 100);
        });

// will throw an error
double getRandomSample(SeededRandom prng, double min, double max) {
    return prng.uniform(min, max).sample();
}

// will behave as expected
static double getRandomSample(SeededRandom prng, double min, double max) {
    return prng.uniform(min, max).sample();
}
Because the non-static method is part of the class, it will force the system to serialize the entire Model in order to be able to access this function. If the method is static, it does not depend on an instance of the Model and can be called safely from a distributed environment.
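The same effect can be observed outside the framework. This is a small sketch under the assumption of standard Java serialization; SerializableSupplier, Model, half, halfStatic and canSerialize are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

public class MethodCaptureDemo {

    // Hypothetical serializable function type, standing in for code sent to other nodes.
    interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

    // Deliberately NOT Serializable.
    static class Model {
        double half(double x) {
            return x / 2;
        }

        static double halfStatic(double x) {
            return x / 2;
        }

        SerializableSupplier<Double> viaInstanceMethod() {
            // Implicitly this.half(...), so the lambda captures the Model instance.
            return () -> half(10.0);
        }

        SerializableSupplier<Double> viaStaticMethod() {
            // A static call needs no instance, so nothing is captured.
            return () -> halfStatic(10.0);
        }
    }

    // Returns true if standard Java serialization accepts the object.
    static boolean canSerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Model model = new Model();
        System.out.println(canSerialize(model.viaInstanceMethod())); // prints false
        System.out.println(canSerialize(model.viaStaticMethod()));   // prints true
    }
}
```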
An agent-based model contains a GlobalState that makes global variables available to all agents. When agents are created, usually in Model#setup, they are distributed and each given a copy of the GlobalState. After this distribution, the global state should not be updated: there is currently no consistency mechanism to ensure that updates to the GlobalState are carried over the network. Consequently, all GlobalState updates should be made at the beginning of (or before) the Model#setup call, to keep the model consistent across all nodes.
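To see why late updates are lost, it can help to think of each node as holding a serialized snapshot of the globals. The sketch below is only an analogy built on standard Java serialization, not the framework's actual mechanism; Globals, its rate field and copyViaSerialization are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class GlobalsSnapshotDemo {

    // Hypothetical global state with a single tunable value.
    static class Globals implements Serializable {
        private static final long serialVersionUID = 1L;
        double rate = 0.1;
    }

    // Simulates handing a node its own copy: serialize, then deserialize.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copyViaSerialization(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Globals original = new Globals();
        original.rate = 0.5;                              // updated before distribution

        Globals nodeCopy = copyViaSerialization(original); // the "distribution" step

        original.rate = 0.9;                              // updated after distribution

        System.out.println(nodeCopy.rate); // prints 0.5: the later update never reaches the copy
    }
}
```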