# Configuration
The configuration file for a benchmark suite can either be a `.yaml` file or a `.json` file.
YAML is recommended, and all examples will be presented in YAML.
## Example
The following example shows a basic configuration for a benchmark suite as an introduction.
```yaml
datasets:
  - name: "sp2b" # for documentation purposes

connections:
  - name: "fuseki"
    endpoint: "http://localhost:3030/sparql"
    dataset: "sp2b"

tasks:
  - type: "stresstest" # stresstest the endpoint
    workers:
      - type: "SPARQLProtocolWorker" # this worker type sends SPARQL queries over HTTP with the SPARQL protocol
        number: 2 # generate 2 workers with the same configuration
        connection: "fuseki" # the endpoint to which the workers send their queries
        queries:
          path: "./example/suite/queries.txt" # the file containing the queries
          format: "one-per-line" # the format of the query file
        completionTarget:
          number: 1 # each worker stops after executing all queries once
        timeout: "3 min" # a query times out after 3 minutes
        acceptHeader: "application/sparql-results+json" # the expected content type of the HTTP response (HTTP Accept header)
        parseResults: false

# calculate the queries per second only for successful queries
# and the queries per second with a penalty for failed queries
metrics:
  - type: "PQPS"
    penalty: 180000 # in milliseconds (3 minutes)
  - type: "QPS"

# store the results in an N-Triples file and in CSV files
storages:
  - type: "rdf file"
    path: "./results/result.nt"
  - type: "csv file"
    directory: "./results/"
```
This configuration defines a benchmark suite that stresstests a triplestore with two workers.
The triplestore is named `fuseki` and is located at `http://localhost:3030/sparql`.
The dataset that is used for the benchmark is named `sp2b`.
During the stresstest, the workers send SPARQL queries from the file `./example/suite/queries.txt` to the triplestore.
They stop after they have executed all queries once, as defined by the `completionTarget` property.
After the queries have been executed, two metrics are calculated based on the results.
The first metric is the `PQPS` metric, which calculates the queries per second with a penalty for failed queries.
The second metric is the `QPS` metric, which calculates the queries per second only for successful queries.
The results are stored in an RDF file at `./results/result.nt` and in CSV files in the directory `./results/`.
## Structure
The configuration file consists of the following six sections:

- Datasets
- Connections
- Tasks
- Response-Body-Processors
- Metrics
- Storages
Each section holds an array of its respective items. Each item type will be defined further in this documentation. The order of the sections is not important. The general structure of a suite configuration may look like this:
```yaml
tasks:
  - # item 1
  - # item 2
  - # ...
storages:
  - # item 1
  - # item 2
  - # ...
datasets:
  - # item 1
  - # item 2
  - # ...
connections:
  - # item 1
  - # item 2
  - # ...
responseBodyProcessors:
  - # item 1
  - # item 2
  - # ...
metrics:
  - # item 1
  - # item 2
  - # ...
```
## Durations
Durations are used to define time spans in the configuration.
They can be used for the `timeout` property of the workers or the response body processors, or for the `completionTarget` property of the tasks.
Duration values can be defined as an XSD duration string or as a string with a number and a unit.
The following units are supported:

- `s`, `sec`, or `secs` for seconds
- `m`, `min`, or `mins` for minutes
- `h`, `hr`, or `hrs` for hours
- `d`, `day`, or `days` for days
Some examples of duration values:

```yaml
timeout: "2S"    # 2 seconds
timeout: "10s"   # 10 seconds
timeout: "PT10S" # 10 seconds, as an XSD duration string
```
## Tasks
The tasks are the core of the benchmark suite.
They define the actual process of the benchmarking suite and are executed from top to bottom in the order they are defined in the configuration.
At the moment, the `stresstest` is the only implemented task.
The `stresstest` task queries specified endpoints with the given queries and evaluates the performance of the endpoint by measuring the time each query execution takes.
After the execution of the queries, the task calculates the required metrics based on the measurements.
The tasks are explained in more detail in the Tasks documentation.
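To illustrate the ordering, two stresstest tasks can be defined back to back; the second one only starts after the first has finished. A sketch that reuses the connection and query file from the introductory example:

```yaml
tasks:
  - type: "stresstest" # executed first
    workers:
      - type: "SPARQLProtocolWorker"
        number: 2
        connection: "fuseki"
        queries:
          path: "./example/suite/queries.txt"
          format: "one-per-line"
        completionTarget:
          number: 1
        timeout: "3 min"
  - type: "stresstest" # executed second, after the first task has finished
    workers:
      - type: "SPARQLProtocolWorker"
        number: 1
        connection: "fuseki"
        queries:
          path: "./example/suite/queries.txt"
          format: "one-per-line"
        completionTarget:
          number: 1
        timeout: "3 min"
```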
## Storages
The storages define where and how the results of the benchmarking suite are stored.
There are three types of storages that are supported at the moment:

- `rdf file`
- `csv file`
- `triplestore`

Each storage type will be explained in more detail in the Storages documentation.
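All three storage types can be combined in a single suite. The `path` and `directory` properties below are taken from the examples in this document; the `endpoint` property of the `triplestore` storage is an assumption here, so consult the Storages documentation for its actual schema:

```yaml
storages:
  - type: "rdf file"
    path: "./results/result.nt" # the results are written as RDF to this file
  - type: "csv file"
    directory: "./results/" # CSV files are created inside this directory
  - type: "triplestore"
    endpoint: "http://localhost:3030/results/update" # assumption: an update endpoint that receives the result triples
```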
## Datasets
The datasets that have been used for the benchmark can be defined here. Right now, this is only used for documentation purposes. For example, you might want to know which dataset was loaded into a triplestore at the time a stresstest was executed.
The datasets are therefore later on referenced in the `connections` property to document which dataset has been loaded into which endpoint.
### Properties
Each dataset entry has the following properties:
| property | required | description | example |
|----------|----------|-------------|---------|
| `name` | yes | This is a descriptive name for the dataset. | `"sp2b"` |
| `file` | no | File path of the dataset. (not used for anything at the moment) | `"./datasets/sp2b.nt"` |
### Example
```yaml
datasets:
  - name: "sp2b"
    file: "./datasets/sp2b.nt"

connections:
  - name: "fuseki"
    endpoint: "https://localhost:3030/query"
    dataset: "sp2b"
```
As already mentioned, the `datasets` property is only used for documentation.
The information about the datasets will be stored in the results.
For the CSV storage, the above configuration might result in the following `task-configuration.csv` file:
| taskID | connection | version | dataset |
|--------|------------|---------|---------|
| http://iguana-benchmark.eu/resource/1699354119-3273189568/0 | fuseki | v2 | sp2b |
The resulting triples for the `rdf file` storage might look like this:

```turtle
ires:fuseki a iont:Connection ;
    rdfs:label "fuseki" ;
    iprop:dataset ires:sp2b .

ires:sp2b a iont:Dataset ;
    rdfs:label "sp2b" .
```
## Connections
The connections are used to define the endpoints for the triplestores.
The defined connections can later be used in the `tasks` configuration to specify the endpoints for the benchmarking process.
### Properties
| property | required | description | example |
|----------|----------|-------------|---------|
| `name` | yes | This is a descriptive name for the connection. (needs to be unique) | `"fuseki"` |
| `version` | no | This serves to document the version of the connection. It has no functional effect. | `"v1.0.1"` |
| `dataset` | no | This serves to document the dataset that has been loaded into the specified connection. It has no functional effect. (needs to reference an already defined dataset in `datasets`) | `"sp2b"` |
| `endpoint` | yes | A URI at which the endpoint is located. | `"http://localhost:3030/query"` |
| `authentication` | no | Basic authentication data for the connection. | see below |
| `updateEndpoint` | no | A URI at which an additional update endpoint might be located. This is useful for triplestores that have separate endpoints for update queries. | `"http://localhost:3030/update"` |
| `updateAuthentication` | no | Basic authentication data for the `updateEndpoint`. | see below |
Iguana only supports HTTP basic authentication for now. The authentication properties are objects that are defined as follows:
| property | required | description | example |
|----------|----------|-------------|---------|
| `user` | yes | The user name. | `"admin"` |
| `password` | yes | The password of the user. | `"password"` |
### Example
```yaml
datasets:
  - name: "wikidata"

connections:
  - name: "fuseki"
    endpoint: "https://localhost:3030/query"
  - name: "tentris"
    version: "v0.4.0"
    dataset: "wikidata" # needs to reference an existing definition in datasets
    endpoint: "https://localhost:9080/query"
    authentication:
      user: "admin"
      password: "password"
    updateEndpoint: "https://localhost:8080/update"
    updateAuthentication:
      user: "updateUser"
      password: "123"
```
## Response-Body-Processor
The response body processors are used to process the response bodies that are received for each query from the benchmarked endpoints.
The processors extract relevant information from the response bodies and store it in the results.
Processors are defined by the content type of the response body they process.
At the moment, only the `application/sparql-results+json` content type is supported.
The response body processors are explained in more detail in the Response-Body-Processor documentation.
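A minimal sketch of a processor definition; `contentType` and `threads` mirror the Basic Example below, while the optional `timeout` uses a duration value as described in the Durations section and is an assumption here:

```yaml
responseBodyProcessors:
  - contentType: "application/sparql-results+json" # currently the only supported content type
    threads: 2       # number of threads used to process response bodies
    timeout: "1 min" # assumption: processing of a single response body is aborted after this duration
```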
## Metrics
Metrics are used to compare the performance of the benchmarked endpoints. The metrics are calculated from the results of the benchmarking tasks. Depending on the type of the metric, they are calculated for each query, for each worker, or for the whole task.
Each metric will be explained in more detail in the Metrics documentation.
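For example, to report both penalized and unpenalized queries per second, the two metrics from the introductory example can be combined; the penalty for `PQPS` is given in milliseconds:

```yaml
metrics:
  - type: "QPS"     # queries per second, counting only successful executions
  - type: "PQPS"    # penalized queries per second
    penalty: 180000 # each failed query is counted with this time in milliseconds
```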
## Basic Example
```yaml
datasets:
  - name: "sp2b"

connections:
  - name: "fuseki"
    dataset: "sp2b"
    endpoint: "http://localhost:3030/sp2b"

tasks:
  - type: "stresstest"
    workers:
      - number: 2
        type: "SPARQLProtocolWorker"
        parseResults: true
        acceptHeader: "application/sparql-results+json"
        queries:
          path: "./example/suite/queries/"
          format: "folder"
        completionTarget:
          number: 1
        connection: "fuseki"
        timeout: "2S"

responseBodyProcessors:
  - contentType: "application/sparql-results+json"
    threads: 1

metrics:
  - type: "PQPS"
    penalty: 100
  - type: "QPS"

storages:
  - type: "rdf file"
    path: "./results/result.nt"
  - type: "csv file"
    directory: "./results/"
```