Queries

Benchmarks often involve running a series of queries against a database and measuring their performance. The query handler in Iguana is responsible for loading and selecting queries for the benchmarking process.

Inside the stresstest task, the query handler is configured with the queries property. Every worker instance of the same worker configuration will use the same query handler. The queries property is an object that contains the following properties:

property	required	default	description	example
path	yes		The path to the queries. It can be a file or a folder.	`./example/suite/queries/`
format	no	`one-per-line`	The format of the queries.	`folder` or `separator` or `one-per-line`
separator	no	`""`	The separator that should be used if the format is set to `separator`.	`\n###\n`
caching	no	`true`	If set to `true`, the queries will be cached into memory. If set to `false`, the queries will be read from the file system every time they are needed.	`false`
order	no	`linear`	The order in which the queries are executed. If set to `linear` the queries will be executed in their order inside the file. If `format` is set to `folder`, queries will be sorted by their file name first.	`random` or `linear`
seed	no	`0`	The seed for the random number generator that selects the queries. If multiple workers use the same query handler, their seed will be the sum of the given seed and their worker id.	`12345`
lang	no	`SPARQL`	Not used for anything at the moment.
template	no		If set, queries from `path` will be treated as query templates. See Query Templates for more information.
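The interaction between `order`, `seed`, and the worker id can be illustrated with a short Python sketch. This is not Iguana's implementation (Iguana is written in Java); the function names `worker_rng` and `query_order` are hypothetical, and two behaviours are assumptions made only for illustration: `linear` wraps around to the first query after the last one, and `random` draws uniformly with replacement. The only part taken from the table above is that each worker's effective seed is the configured seed plus its worker id.

```python
import random


def worker_rng(seed: int, worker_id: int) -> random.Random:
    # Effective seed = configured seed + worker id, as described in the table.
    return random.Random(seed + worker_id)


def query_order(num_queries: int, order: str, seed: int, worker_id: int, steps: int) -> list:
    """Return the indices of the first `steps` queries a worker would select."""
    if order == "linear":
        # Assumption: start over at the first query after reaching the last one.
        return [i % num_queries for i in range(steps)]
    if order == "random":
        # Assumption: uniform selection with replacement.
        rng = worker_rng(seed, worker_id)
        return [rng.randrange(num_queries) for _ in range(steps)]
    raise ValueError(f"unknown order: {order!r}")
```

Under this scheme, two workers that share a query handler select different random sequences, while a rerun with the same seed reproduces each worker's sequence.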

Format

One-per-line

The one-per-line format is the default format. In this format, every query is written in a single line inside one file.

In this example, the queries are written in a single file, each query in a single line:

SELECT DISTINCT * WHERE { ?s ?p ?o }
SELECT DISTINCT ?s ?p ?o WHERE { ?s ?p ?o }
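As a rough illustration of this format, the following Python sketch turns the content of such a file into a list of queries. The function name `parse_one_per_line` is hypothetical, and skipping blank lines is an assumption of this sketch, not documented Iguana behaviour.

```python
def parse_one_per_line(text: str) -> list:
    """Treat every non-empty line as one query."""
    # Assumption: blank lines are ignored and surrounding whitespace is trimmed.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Applied to the two-line example above, the function returns a list with two query strings in file order.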

Folder

It is possible to write every query in a separate file and put them all in a folder. Queries will be sorted by their file name before they are read.

In this example, the queries are written in separate files inside the folder ./example/suite/queries/:

./example/suite/queries/
├── query1.txt
└── query2.txt

The file query1.txt contains the following query:

SELECT DISTINCT * 
WHERE { 
    ?s ?p ?o 
}

The file query2.txt contains the following query:

SELECT DISTINCT ?s ?p ?o 
WHERE { 
    ?s ?p ?o 
}
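The folder format can be sketched in Python as follows. The function name `load_folder_queries` is hypothetical, and the sketch assumes UTF-8 files and a plain lexicographic sort of file names; the exact comparison Iguana uses is not stated here.

```python
from pathlib import Path


def load_folder_queries(folder) -> list:
    """Read one query per file, ordered by file name."""
    # Assumption: plain lexicographic ordering of file names.
    files = sorted(
        (p for p in Path(folder).iterdir() if p.is_file()),
        key=lambda p: p.name,
    )
    # Assumption: files are UTF-8 encoded; surrounding whitespace is trimmed.
    return [p.read_text(encoding="utf-8").strip() for p in files]
```

With a lexicographic sort, a file named query10.txt would be read before query2.txt, so zero-padded names such as query02.txt are a safe choice when the order matters.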

Separator

It is possible to write all queries in a single file and separate them with a separator. The separator is set with the separator property, and Iguana splits the file into queries at every occurrence of it. If the separator property is set to an empty string "" (the default), the queries will be separated by an empty line. The separator string can also contain escape sequences like \n or \t.

In this example, the queries inside this file are separated by a line consisting of the string ###:

SELECT DISTINCT * 
WHERE { 
    ?s ?p ?o 
}
###
SELECT DISTINCT ?s ?p ?o 
WHERE { 
    ?s ?p ?o 
}

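For this file, the separator property should be set to "\n###\n". Be aware that line endings differ between operating systems (for example, \r\n on Windows), so the separator must match the line endings actually used in the file.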
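The splitting behaviour can be sketched in Python. The function name `split_queries` is hypothetical, and this is not Iguana's implementation. Normalizing Windows line endings before splitting is a design choice of this sketch, made so that the separator "\n###\n" also matches files saved with \r\n; it should not be read as something Iguana does.

```python
import re


def split_queries(text: str, separator: str = "") -> list:
    """Split the content of a query file into individual queries."""
    # Assumption of this sketch: convert \r\n to \n before splitting.
    text = text.replace("\r\n", "\n")
    if separator == "":
        # Empty separator: queries are separated by an empty line.
        parts = re.split(r"\n\s*\n", text)
    else:
        parts = text.split(separator)
    # Drop empty fragments and trim surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]
```

For the example file above, `split_queries(content, "\n###\n")` yields two queries, one starting with `SELECT DISTINCT *` and one starting with `SELECT DISTINCT ?s ?p ?o`.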

Huge Query Strings

When working with huge queries (larger than 2³¹ bytes, about 2 GB), note that only the request types post query and update query support them.

Example

tasks:
  - type: "stresstest"
    workers:
    - type: "SPARQLProtocolWorker"
      queries:
        path: "./example/suite/queries.txt"
        format: "separator"
        separator: "\n###\n"
        caching: false
        order: "random"
        seed: 12345
        lang: "SPARQL"
      # ... additional worker properties

Query Templates

Query templates are queries containing placeholders for some terms. Replacement candidates are identified by querying a given endpoint. This is done in a way that the resulting queries will yield results against endpoints with the same data.

The placeholders are written in the form of %%[a-zA-Z0-9_]+%%, which means that any character sequence consisting of letters, numbers, and underscores, enclosed by %%, will be interpreted as a placeholder. The query templates originated from WatDiv, where the placeholders are of similar form. If a placeholder name is equal to a variable name already used in the query, the placeholder will be assigned a different variable name during candidate generation.

Query templates and normal queries can be mixed in the same file or folder.

An exemplary template: SELECT * WHERE {?s %%var1%% ?o . ?o <http://exa.com> %%var2%%}

This template will then be converted to: SELECT ?var1 ?var2 WHERE {?s ?var1 ?o . ?o <http://exa.com> ?var2}

The SELECT query will then be sent to the given SPARQL endpoint (e.g., DBpedia). The solutions for this query are used to instantiate the template. The results may look like the following:

- SELECT * WHERE {?s <http://prop/1> ?o . ?o <http://exa.com> "123"}
- SELECT * WHERE {?s <http://prop/1> ?o . ?o <http://exa.com> "12"}
- SELECT * WHERE {?s <http://prop/2> ?o . ?o <http://exa.com> "1234"}
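The two text transformations involved, turning a template into a SELECT query and filling in one solution, can be sketched in Python. The names `to_select_query` and `instantiate` are hypothetical, and this is not Iguana's implementation. The sketch assumes the template starts with `SELECT *`, does not handle placeholder names that clash with existing variables, and leaves out the request to the endpoint entirely; a solution is represented as a plain dictionary from placeholder name to an already serialized term.

```python
import re

# Placeholder syntax as described above: %%name%%
PLACEHOLDER = re.compile(r"%%([a-zA-Z0-9_]+)%%")


def to_select_query(template: str) -> str:
    """Replace placeholders with variables and project exactly those variables."""
    # Unique placeholder names in order of first appearance.
    names = list(dict.fromkeys(PLACEHOLDER.findall(template)))
    body = PLACEHOLDER.sub(lambda m: "?" + m.group(1), template)
    projection = " ".join("?" + name for name in names)
    # Assumption: the template uses "SELECT *" as its projection.
    return body.replace("SELECT *", "SELECT " + projection, 1)


def instantiate(template: str, solution: dict) -> str:
    """Fill every placeholder with the term bound to it in one solution."""
    return PLACEHOLDER.sub(lambda m: solution[m.group(1)], template)
```

For the exemplary template, `to_select_query` produces the converted SELECT query shown above, and `instantiate(template, {"var1": "<http://prop/1>", "var2": '"123"'})` produces the first instance in the list.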

Configuration

The template attribute has the following properties:

property	required	default	description	example
endpoint	yes		The endpoint to query.	`http://dbpedia.org/sparql`
limit	no	`2000`	The maximum number of instances per query template.	`100`
save	no	`true`	If set to `true`, query instances will be saved in a separate file.	`false`

If the save attribute is set to true, the instances will be saved in a separate file in the same directory as the query templates. If the query templates are stored in a folder, the instances will be saved in the parent directory.

Example of query configuration with query templates:

queries:
  path: "./example/suite/queries/"
  format: "folder" 
  template:
    endpoint: "http://dbpedia.org/sparql"
    limit: 100
    save: true