Quick start#
Let us assume we want to perform scalability experiments for a very simple model. This allows us to focus on the experiment itself rather than the model. We will skip over details for now, but provide links to reference pages with more information.
Performing scalability experiments for a LUE computation operating on rasters involves the following steps:

1. Determine a good array partition shape
2. Determine the strong or weak scalability efficiencies
Performing scalability experiments involves a lot of bookkeeping. A model must be run several times, and the time it takes to perform all computations has to be saved for post-processing. Information to save includes the amount of hardware used, the shape of the arrays, the shape of the partitions, and the time it took. To help with this administration we will use the lue_scalability.py command.
Note
This quick start illustrates performing scalability experiments using a LUE model implemented using the Python language bindings, but everything works similarly when using a model implemented using the C++ API.
For simplicity, we will be performing scalability experiments over the CPU cores within a single desktop computer, but everything works similarly when performing experiments over the cluster nodes of a multi-node computer cluster.
Model#
The model we will be using here is Conway’s Game of Life. A version of the model that is capable of saving results as rasters is available in the LUE tutorial repository. For this quick start, we simplified that model to be (only) useful in scalability experiments. Here is the complete working model:
#!/usr/bin/env python
import sys
from pathlib import Path

import docopt
import numpy as np

import lue.framework as lfr
import lue.qa.scalability.instrument as lqi


def initialize_generation(array_shape, partition_shape):
    generation = lfr.create_array(
        array_shape=array_shape,
        partition_shape=partition_shape,
        dtype=np.dtype(np.float32),
        fill_value=0,
    )
    fraction_alive_cells = 0.25
    generation = lfr.uniform(generation, np.float32, 0, 1) <= fraction_alive_cells

    return generation


def next_generation(generation):
    kernel = np.array(
        [
            [1, 1, 1],
            [1, 0, 1],
            [1, 1, 1],
        ],
        dtype=np.uint8,
    )
    nr_alive_cells = lfr.focal_sum(generation, kernel)

    # Next state of currently alive cells
    underpopulated = nr_alive_cells < 2
    overpopulated = nr_alive_cells > 3

    # Next state of currently dead cells
    reproducing = nr_alive_cells == 3

    generation = lfr.where(
        generation,
        # True if alive and not under/overpopulated
        ~(underpopulated | overpopulated),
        # True if dead with three neighbours
        reproducing,
    )

    return generation


@lfr.runtime_scope
def game_of_life_scalability(
    count: int,
    nr_workers: int,
    array_shape: tuple[int, int],
    partition_shape: tuple[int, int],
    nr_generations: int,
    result_pathname: str,
) -> None:

    generation = initialize_generation(array_shape, partition_shape)
    lfr.wait(generation)

    experiment = lqi.ArrayExperiment(nr_workers, array_shape, partition_shape)
    experiment.start()

    for _ in range(count):
        run = lqi.Run()

        run.start()

        for _ in range(1, nr_generations + 1):
            generation = next_generation(generation)

        lfr.wait(generation)
        run.stop()

        experiment.add(run)

    experiment.stop()
    lqi.save_results(experiment, result_pathname)


def parse_shape(string):
    string = string.strip("[]")
    return tuple(int(extent) for extent in string.split(","))


def main():
    usage = """\
Calculate the generations of alive cells according to the Game of Life
cellular automaton

Usage:
    {command}
        --lue:count=<count> --lue:nr_workers=<nr_workers>
        --lue:array_shape=<array_shape> --lue:partition_shape=<partition_shape>
        --lue:result=<result>
        <nr_generations>

Options:
    <nr_generations>  Number of Game of Life generations to calculate
""".format(
        command=Path(sys.argv[0]).name
    )

    # Filter out arguments meant for the HPX runtime
    argv = [arg for arg in sys.argv[1:] if not arg.startswith("--hpx")]
    arguments = docopt.docopt(usage, argv)
    nr_generations = int(arguments["<nr_generations>"])
    count = int(arguments["--lue:count"])
    nr_workers = int(arguments["--lue:nr_workers"])
    array_shape = parse_shape(arguments["--lue:array_shape"])
    partition_shape = parse_shape(arguments["--lue:partition_shape"])
    result_pathname = arguments["--lue:result"]

    assert nr_generations >= 0, nr_generations
    assert count > 0, count
    assert nr_workers > 0, nr_workers

    game_of_life_scalability(
        count,
        nr_workers,
        array_shape,
        partition_shape,
        nr_generations,
        result_pathname,
    )


if __name__ == "__main__":
    main()
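Before involving lue_scalability.py, it can be useful to run the model once by hand. The command line below follows directly from the usage string in the model; the argument values and the result file name are just illustrative (during the actual experiments, lue_scalability.py generates these invocations for us):

python game_of_life.py \
    --lue:count=1 --lue:nr_workers=12 \
    --lue:array_shape="[10000,10000]" --lue:partition_shape="[500,500]" \
    --lue:result=result.json \
    50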
A few changes were made to the original model. First, an import statement was added for the scalability.instrument sub-package. The code in this sub-package simplifies adhering to the requirements lue_scalability.py poses on a model.
Next, the part of the model containing the code whose scalability we want to measure was identified. This is the code between the run.start() and run.stop() calls. The time it takes to execute the statements between those calls ends up in the result file, and is used for further processing. It is important that code that should not be part of the measurements has finished executing before calling run.start(), and that code that should be part of the measurements has finished executing before calling run.stop(). This is what the lfr.wait() calls in the model are for.
An experiment involves measuring the time it takes to perform one or more model runs. All measurements end up in the result file. More information about the scalability.instrument sub-package can be found in the associated reference pages.
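Stripped of the Game of Life specifics, the measurement pattern used above boils down to the following skeleton. This is a condensed excerpt of the model, not a separate API; initialize and simulate are hypothetical placeholders for the model-specific parts:

import lue.framework as lfr
import lue.qa.scalability.instrument as lqi


def measure(
    initialize, simulate, count, nr_workers, array_shape, partition_shape, result_pathname
):
    # Work we do not want to measure must have finished before starting
    state = initialize(array_shape, partition_shape)
    lfr.wait(state)

    experiment = lqi.ArrayExperiment(nr_workers, array_shape, partition_shape)
    experiment.start()

    for _ in range(count):
        run = lqi.Run()
        run.start()

        # The code whose scalability we want to measure
        state = simulate(state)

        # Measured work must have finished before stopping the timer
        lfr.wait(state)
        run.stop()
        experiment.add(run)

    experiment.stop()
    lqi.save_results(experiment, result_pathname)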
Configuration#
Platform#
The lue_scalability.py command requires information about the platform used to perform the experiments on. In this example we will be using a platform called orkney, which is a desktop computer. Its configuration can be found in the lue_qa repository, in configuration/orkney/platform.json.
{
    "name": "orkney",
    "scheduler": {
        "kind": "shell"
    },
    "nr_cluster_nodes": 1,
    "cluster_node": {
        "nr_packages": 1,
        "package": {
            "nr_numa_nodes": 1,
            "numa_node": {
                "memory": 128,
                "nr_cores": 12,
                "core": {
                    "nr_threads": 2
                }
            }
        }
    }
}
The terminology used in describing a cluster node is inspired by the Portable Hardware Locality (hwloc) software package.
In the case of multi-node computer clusters, the scheduler kind can be "slurm", and several additional settings can be provided, like the name of the partition to use. The lue_qa repository contains examples for those as well.
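For illustration only, the scheduler block of such a platform configuration might look roughly like the fragment below. The "kind" value follows the prose above, but the exact name of the partition setting is an assumption on our part; consult the actual examples in the lue_qa repository:

"scheduler": {
    "kind": "slurm",
    "partition": "..."
}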
Worker#
Since we are using the orkney desktop computer as our platform, the worker we are interested in is the CPU core. The worker configuration files can be found in the lue_qa repository, in configuration/orkney/core_numa_node/{partition_shape,strong_scalability,weak_scalability}.json. For each kind of experiment (partition shape, strong scalability, weak scalability), a worker configuration can be provided. In the case of the strong and weak scalability experiments it is not unusual for the configurations to be the same, in which case a symbolic link can be used to refer to a single file containing this information.
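For example, using the file names from the listing above (linking in this direction is just one way to do it):

cd configuration/orkney/core_numa_node
ln -s strong_scalability.json weak_scalability.json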
Example configuration for the partition shape experiment:
{
    "scenario": "core_numa_node",
    "count": 3,
    "locality_per": "numa_node",
    "worker": {
        "type": "thread",
        "pool": {
            "size": 12
        }
    }
}
Example configuration for the strong and weak scalability experiments:
{
    "scenario": "core_numa_node",
    "count": 3,
    "locality_per": "numa_node",
    "worker": {
        "type": "thread",
        "pool": {
            "min_size": 1,
            "max_size": 12,
            "incrementor": 1
        }
    }
}
The count is the number of times each experimental run should be repeated. Doing it more than once allows for the inspection of the spread in the resulting scalability metrics. Note the difference between the two pool settings: in the partition shape experiment the pool size is fixed at 12 cores, while in the scalability experiments the pool grows from 1 to 12 cores in steps of 1.
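Assuming min_size, max_size, and incrementor translate to a simple arithmetic progression, which is how we read the configuration above, the scalability experiments use the following worker pool sizes, each measured count times:

min_size, max_size, incrementor, count = 1, 12, 1, 3

pool_sizes = list(range(min_size, max_size + 1, incrementor))
print(pool_sizes)              # [1, 2, 3, ..., 12]
print(len(pool_sizes) * count) # 36 measured runs in total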
Experiment#
For each kind of experiment we must create a configuration file containing the settings lue_scalability.py must use. For the command to be able to find the correct configuration file, these files must be stored in a directory named after the platform, kind of worker, command, and kind of experiment. For the Game of Life model, example configuration files can be found in the lue_qa repository, in experiment/configuration/orkney/game_of_life.py/core_numa_node.
Example configuration for the partition shape experiment:
{
    "array": {
        "shape": [10000, 10000]
    },
    "partition": {
        "shape": [500, 500],
        "range": {
            "max_nr_elements": 1250000,
            "multiplier": 1.5,
            "method": "linear"
        }
    },
    "arguments": {
        "positionals": [],
        "options": {}
    }
}
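A quick sanity check on these numbers (our own arithmetic, not output of lue_scalability.py): the 10000×10000 array contains 10⁸ cells, the 500×500 base partition contains 250,000 cells, and max_nr_elements caps candidate partitions at 1,250,000 cells, which corresponds to a square partition of roughly 1118×1118:

import math

array_nr_elements = 10_000 * 10_000  # 100,000,000 cells
base_partition_nr_elements = 500 * 500  # 250,000 cells
max_nr_elements = 1_250_000

# Largest square partition extent that still fits the cap
print(math.isqrt(max_nr_elements))  # 1118

# Number of partitions at the base shape
print(array_nr_elements // base_partition_nr_elements)  # 400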
Experiment#
Partition shape#
Given the above configuration files we can start the scalability experiments. The first thing to do is to determine a good partition shape to use in the strong and weak scalability experiments. For this we can call lue_scalability.py with the prefixes of the paths containing the platform, worker, and experiment configuration files. Additionally, we need to tell it that we want to perform a partition shape experiment. This results in a command like this:
# 50 is a model-specific argument: the number of generations
# that we want the model to simulate
lue_scalability.py \
    partition_shape script game_of_life.py "50" orkney core_numa_node \
    ./lue_qa/configuration ./lue_qa/experiment/configuration /tmp/scalability
This command will create a Bash script for us, containing all the commands that need to be executed to end up with enough information to be able to select a good partition shape. The name of the script will be printed. Its name is based on the information passed in: <result_prefix>/<platform_name>/<command_name>/<worker_name>/<experiment_name>.sh, so in this case /tmp/scalability/orkney/game_of_life.py/core_numa_node/partition_shape.sh.
Once the commands from this Bash script have finished executing, all results must be aggregated into a single LUE dataset called scalability.lue, using the same command as before, but with the import stage name instead of script:
lue_scalability.py \
    partition_shape import game_of_life.py "50" orkney core_numa_node \
    ./lue_qa/configuration ./lue_qa/experiment/configuration /tmp/scalability
The last step of this experiment involves postprocessing the results. This adds various statistics to the LUE dataset created in the previous step, and produces a PDF containing a graph showing the latencies of the experimental runs for the different partition shapes. For this we again use the same command, but with the postprocess stage name instead of import:
lue_scalability.py \
    partition_shape postprocess game_of_life.py "50" orkney core_numa_node \
    ./lue_qa/configuration ./lue_qa/experiment/configuration /tmp/scalability
The experiment is now finished and we can visually inspect the results in plot.pdf. We are interested in a partition shape for which the latency is lowest. In general there will be multiple options with good (low) latencies.
Strong scalability#
Performing a strong scalability experiment involves calling the same commands as in the partition shape experiment, but now using the strong_scalability experiment name instead of partition_shape:
lue_scalability.py \
    strong_scalability script game_of_life.py "50" orkney core_numa_node \
    ./lue_qa/configuration ./lue_qa/experiment/configuration /tmp/scalability
lue_scalability.py \
    strong_scalability import game_of_life.py "50" orkney core_numa_node \
    ./lue_qa/configuration ./lue_qa/experiment/configuration /tmp/scalability
lue_scalability.py \
    strong_scalability postprocess game_of_life.py "50" orkney core_numa_node \
    ./lue_qa/configuration ./lue_qa/experiment/configuration /tmp/scalability
Both model users and LUE developers can be quite happy with these results. The model can effectively use additional hardware to speed up the computations, but there is still a challenge left for the developers to look into.
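For reference, these are the standard definitions behind strong scalability, expressed as a small sketch. This is our own illustration; the exact names of the metrics lue_scalability.py reports may differ. With n workers and measured latency T(n) on a fixed-size problem, the relative speedup is T(1)/T(n) and the relative efficiency is the speedup divided by n:

def strong_scalability_metrics(latencies: dict[int, float]) -> dict[int, dict[str, float]]:
    # latencies: nr_workers -> measured latency in seconds, e.g. the
    # median over the repeated runs (count = 3 above)
    t1 = latencies[1]
    return {
        n: {
            "relative_speedup": t1 / t,
            "relative_efficiency": 100 * t1 / (n * t),  # percent
        }
        for n, t in latencies.items()
    }

# Illustrative latencies only
print(strong_scalability_metrics({1: 100.0, 2: 52.0, 4: 27.0}))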
Weak scalability#
Performing a weak scalability experiment involves calling the same commands as in the partition shape and strong scalability experiment, but now using the weak_scalability experiment name:
lue_scalability.py \
    weak_scalability script game_of_life.py "50" orkney core_numa_node \
    ./lue_qa/configuration ./lue_qa/experiment/configuration /tmp/scalability
lue_scalability.py \
    weak_scalability import game_of_life.py "50" orkney core_numa_node \
    ./lue_qa/configuration ./lue_qa/experiment/configuration /tmp/scalability
lue_scalability.py \
    weak_scalability postprocess game_of_life.py "50" orkney core_numa_node \
    ./lue_qa/configuration ./lue_qa/experiment/configuration /tmp/scalability
Given these results, model users can make good use of additional hardware to process larger datasets. As we learned already, there is still some room left for improvement.
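Weak scalability uses a similarly standard definition (again our own sketch, not lue_scalability.py's output): the problem size grows proportionally with the number of workers, so ideally the latency stays constant and the efficiency T(1)/T(n) stays at 100%:

def weak_scalability_efficiency(latencies: dict[int, float]) -> dict[int, float]:
    # latencies: nr_workers -> measured latency in seconds, where the
    # total problem size grows proportionally with the number of workers
    t1 = latencies[1]
    return {n: 100 * t1 / t for n, t in latencies.items()}  # percent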
Bash script#
To make it easy to start a scalability experiment for other models and on different platforms, below is the Bash script used to obtain the results presented above. To be able to run it, the pathname of the directory containing the command used in the experiment (game_of_life.py in this case) has to be added to $PATH (and LUE has to be installed, obviously).
#!/usr/bin/env bash
set -euo pipefail

platform_config_prefix="$HOME/development/project/github/computational_geography/lue_qa/configuration"
experiment_config_prefix="$HOME/development/project/github/computational_geography/lue_qa/experiment/configuration"
result_prefix="/tmp/scalability"

command_name="game_of_life.py"
nr_generations=50
command_arguments="$nr_generations"

platform_name="orkney"
worker_name="core_numa_node"

experiment_names="
    partition_shape
    strong_scalability
    weak_scalability
"

for experiment_name in $experiment_names; do
    echo "$experiment_name"

    # Create the script containing the experimental model runs
    lue_scalability.py \
        $experiment_name script $command_name "$command_arguments" \
        $platform_name $worker_name \
        $platform_config_prefix $experiment_config_prefix $result_prefix

    # Execute the model runs
    script="$result_prefix/$platform_name/$command_name/$worker_name/$experiment_name.sh"
    bash "$script"

    # Aggregate the results into a single LUE dataset
    lue_scalability.py \
        $experiment_name import $command_name "$command_arguments" \
        $platform_name $worker_name \
        $platform_config_prefix $experiment_config_prefix $result_prefix

    # Postprocess the results: statistics and plots
    lue_scalability.py \
        $experiment_name postprocess $command_name "$command_arguments" \
        $platform_name $worker_name \
        $platform_config_prefix $experiment_config_prefix $result_prefix
done