Configuration
The configuration tells Iguana how to execute your benchmark. It is divided into five categories:
- Connections
- Datasets
- Tasks
- Storages
- Metrics
Additionally, a pre- and post-task script hook can be set.
The configuration has to be written in either YAML or JSON. Each section is explained in detail below together with configuration examples, and the full configuration is shown at the end. We will stick to the YAML format; however, the equivalent JSON is also valid and can be parsed by Iguana.
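To illustrate the equivalence, a small YAML fragment such as

connections:
  - name: "System1"
    endpoint: "http://localhost:8800/query"

could also be written as the following JSON (a sketch; the keys simply mirror the YAML structure):

{
  "connections": [
    {
      "name": "System1",
      "endpoint": "http://localhost:8800/query"
    }
  ]
}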
Connections
Every benchmark suite can be executed against several connections (e.g. an HTTP endpoint or a CLI application). A connection has the following items:
- name - the name you want to give the connection, which will be saved in the results.
- endpoint - the HTTP endpoint or CLI call.
- updateEndpoint - the HTTP POST endpoint to which updates are sent; set this if you want to benchmark updates as well. (optional)
- user - for authentication purposes (optional)
- password - for authentication purposes (optional)
- version - the version of the tested triple store; if set, the resource URI in the results will be ires:name-version (optional)
Setting up an endpoint as well as an updateEndpoint might be confusing at first, but if you want to test read and write performance simultaneously, and see how updates might impact read performance, you can set up both.
For more detail on how to set up the CLI call, have a look at Implemented Workers. There, all CLI workers are explained, including how to set the endpoint so that the application will be run correctly.
Let's look at an example:
connections:
  - name: "System1"
    endpoint: "http://localhost:8800/query"
    version: 1.0-SNAP
  - name: "System2"
    endpoint: "http://localhost:8802/query"
    updateEndpoint: "http://localhost:8802/update"
    user: "testuser"
    password: "secret"
Here we have two connections: System1 and System2. System1 is only set up to use an HTTP GET endpoint at http://localhost:8800/query; since a version is set, its results will use the resource URI ires:System1-1.0-SNAP. System2, however, uses authentication and has an update endpoint as well, and thus will correctly be tested with updates (POSTs) too.
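A connection does not have to be an HTTP endpoint. For a CLI application, the endpoint holds the CLI call instead; a hypothetical sketch (the exact command format depends on the CLI worker you use, see Implemented Workers) could look like this:

connections:
  - name: "CLIApp"
    endpoint: "/path/to/your-cli-application"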
Datasets
This is pretty straightforward. You might want to test your system (e.g. a database, a triple store etc.) with different datasets. If your system does not work on different datasets, just add one dataset name like
datasets:
  - name: "DoesNotMatter"
Otherwise, you can set up a dataset name as well as a file for each dataset you want to benchmark. The dataset name will be added to the results, whereas both name and file can be used in the task script hooks to automate loading the dataset into your system.
Let's look at an example:
datasets:
  - name: "DatasetName"
    file: "your-data-base.nt"
  - name: "Dataset2"
Tasks
A task is one benchmark task which will be executed against all connections for all datasets. A task might be a stresstest, which we will be using in this example; have a look at the full configuration of the Stresstest for all its options.
The configuration of one Task consists of the following:
- className - The className or Shorthand
- configuration - The parameters of the task
tasks:
  - className: "YourTask"
    configuration:
      parameter1: value1
      parameter2: "value2"
Let's look at an example:
tasks:
  - className: "Stresstest"
    configuration:
      # timeLimit is in ms
      timeLimit: 3600000
      queryHandler:
        className: "InstancesQueryHandler"
      workers:
        - threads: 2
          className: "SPARQLWorker"
          queriesFile: "queries.txt"
          timeOut: 180000
  - className: "Stresstest"
    configuration:
      noOfQueryMixes: 1
      queryHandler:
        className: "InstancesQueryHandler"
      workers:
        - threads: 2
          className: "SPARQLWorker"
          queriesFile: "queries.txt"
          timeOut: 180000
We configured two tasks, both stresstests. The first one will be executed for one hour (a timeLimit of 3600000 ms) and uses simple text queries which can be executed right away; the second one executes exactly one query mix.
Furthermore, both use 2 simulated SPARQLWorkers with the same configuration.
At this point it is recommended to check out the Stresstest configuration in detail for further options.
Storages
This section tells Iguana how to save your results. Currently Iguana supports the following solutions:
- NTFileStorage - will save your results into one N-Triples file.
- RDFFileStorage - will save your results into an RDF file (default is TURTLE).
- TriplestoreStorage - will upload the results into a specified triple store.
This is optional; the default storage is NTFileStorage.
NTFileStorage can be set up by simply stating that you want to use it:
storages:
  - className: "NTFileStorage"
However, it can be configured to use a different result file name. The default is results_{DD}-{MM}-{YYYY}_{HH}-{mm}.nt. See the example below.
storages:
  - className: "NTFileStorage"
    # optional
    configuration:
      fileName: "results-of-my-benchmark.nt"
The RDFFileStorage is similar to the NTFileStorage, but it determines the RDF format from the file extension. To use RDF/XML, for example, end the file name with .rdf; for TURTLE, end it with .ttl.
storages:
  - className: "RDFFileStorage"
    # optional
    configuration:
      fileName: "results-of-my-benchmark.rdf"
The TriplestoreStorage can be configured as follows:
storages:
  - className: "TriplestoreStorage"
    configuration:
      endpoint: "http://localhost:9999/sparql"
      updateEndpoint: "http://localhost:9999/update"
If your triple store uses authentication, you can set that up as follows:
storages:
  - className: "TriplestoreStorage"
    configuration:
      endpoint: "http://localhost:9999/sparql"
      updateEndpoint: "http://localhost:9999/update"
      user: "UserName"
      password: "secret"
For further detail on how to read the results, have a look here.
Metrics
This section lets Iguana know which metrics you want to include in the results.
Iguana supports the following metrics:
- Queries Per Second (QPS)
- Average Queries Per Second (AvgQPS)
- Query Mixes Per Hour (QMPH)
- Number of Queries successfully executed (NoQ)
- Number of Queries per Hour (NoQPH)
- Each query execution (EachQuery) - experimental
For more detail on each of the metrics, have a look at Metrics.
Let's look at an example:
metrics:
  - className: "QPS"
  - className: "AvgQPS"
  - className: "QMPH"
  - className: "NoQ"
  - className: "NoQPH"
In this case we use all the default metrics, which would also be included if you did not specify any metrics in the configuration at all.
However, you can also use just a subset of these, like the following:
metrics:
  - className: "NoQ"
  - className: "AvgQPS"
For more detail on how the results will include these metrics, have a look at Results.
Task script hooks
To automate the whole benchmark workflow, you can set up a script which will be executed before each task, as well as a script which will be executed after each task.
To make this easier, the script can receive the following values:
- dataset.name - The current dataset name
- dataset.file - The current dataset file name, if there is one
- connection - The current connection name
- connection.version - The current connection version; if no version is set, the placeholder {{connection.version}} is left as is
- taskID - The current taskID
You can pass each one of them as an argument using double curly brackets, like {{connection}}.
Thus, you can set up scripts which start your system and load it with the correct dataset file beforehand, and stop the system after every task.
However, these script hooks are completely optional.
Let's look at an example:
preScriptHook: "/full/path/{{connection}}-{{connection.version}}/load-and-start.sh {{dataset.file}}"
postScriptHook: "/full/path/{{connection}}/stop.sh"
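Assuming, for example, the connection System1 with version 1.0-SNAP and the dataset file your-data-base.nt from the examples above, the preScriptHook above would resolve to the following call (the scripts themselves are something you provide):

/full/path/System1-1.0-SNAP/load-and-start.sh your-data-base.nt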
Full Example
connections:
  - name: "System1"
    endpoint: "http://localhost:8800/query"
  - name: "System2"
    endpoint: "http://localhost:8802/query"
    updateEndpoint: "http://localhost:8802/update"
    user: "testuser"
    password: "secret"
datasets:
  - name: "DatasetName"
    file: "your-data-base.nt"
  - name: "Dataset2"
tasks:
  - className: "Stresstest"
    configuration:
      # timeLimit is in ms
      timeLimit: 3600000
      queryHandler:
        className: "InstancesQueryHandler"
      workers:
        - threads: 2
          className: "SPARQLWorker"
          queriesFile: "queries.txt"
          timeOut: 180000
  - className: "Stresstest"
    configuration:
      noOfQueryMixes: 1
      queryHandler:
        className: "InstancesQueryHandler"
      workers:
        - threads: 2
          className: "SPARQLWorker"
          queriesFile: "queries.txt"
          timeOut: 180000
preScriptHook: "/full/path/{{connection}}/load-and-start.sh {{dataset.file}}"
postScriptHook: "/full/path/{{connection}}/stop.sh"
metrics:
  - className: "QMPH"
  - className: "QPS"
  - className: "NoQPH"
  - className: "NoQ"
  - className: "AvgQPS"
storages:
  - className: "NTFileStorage"
    # optional
    configuration:
      fileName: "results-of-my-benchmark.nt"
Shorthand
A shorthand is a short name for a class in Iguana which can be used in the configuration instead of the complete class name. For example, instead of
storages:
  - className: "org.aksw.iguana.rp.storage.impl.NTFileStorage"
you can use the shorthand NTFileStorage:
storages:
  - className: "NTFileStorage"
For a full map of the shorthands, have a look at the Shorthand-Mapping.