Matteo Lissandrini 0b46689b47

ARTEMIS Py

Another RDF Testing Environment Made Incredibly Simple with python

Installation

With the package manager pip from PyPI

Installable via pip:

pip install artemis-py

Manual

  1. Clone git repo

    git clone git@bitbucket.org:aaurdfexq/artemis-py.git
    
  2. Enter directory of cloned repo

    cd artemis-py
    
  3. Create a Virtual env

    python -m venv .qenv
    source .qenv/bin/activate
    
  4. Now install package

    python setup.py install
    

For additional tips, see the Additional tips section below.

Configuration and Supported Systems

Currently this has been tested with:

  • AllegroGraph
  • AnzoGraph
  • GraphDB
  • Jena
  • Stardog
  • Virtuoso

This tool can be extended to other systems; it should suffice to add the appropriate entry to the config.json file.

The configuration file looks like the following:

{
   "server": "localhost",
   "port": "5820",
   "systems": {
      "graphdb": {
         "endpoint": {
            "url": "http://{server}:{port}/repositories/{name}",
            "auth:": {
               "user": "",
               "pwd": "",
               "method": ""
            }
         },
         "graph": {
            "bonsai": "",
            "dbpedia": ""
         }
      }
   },
   "datasets": {
      "bonsai": true
   }
}

Note that the config file uses Python format strings: placeholders such as {server} are filled in at runtime to build the correct URLs.
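As a minimal sketch of how such a template can be expanded with str.format: the dictionary layout, the function name build_endpoint and the repository name are assumptions for illustration, not taken from the tool's source.

```python
# Illustrative values; in practice these would be read from config.json.
config = {
    "server": "localhost",
    "port": "5820",
    "url": "http://{server}:{port}/repositories/{name}",
}


def build_endpoint(cfg, name):
    """Fill the {server}, {port} and {name} placeholders of the URL template."""
    return cfg["url"].format(server=cfg["server"], port=cfg["port"], name=name)


endpoint = build_endpoint(config, "bonsai")
print(endpoint)  # http://localhost:5820/repositories/bonsai
```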

Instructions for setting up the various systems are provided in the Systems section below.

Usage

The full usage is shown below; it is also available with artemis-run -h


Commands:
    artemis-run

Usage:

artemis-run [-h] -s SYSTEM -d DATASET [-r REPEAT] [-c CONFIG]
            [--skip-warmup] [--skip-queries] [--dry-run]

Options:

  -h, --help            show this help message and exit
  -s SYSTEM, --system SYSTEM
                        name of system to test
  -d DATASET, --dataset DATASET
                        name of dataset to test
  -r REPEAT, --repeat REPEAT
                        number of times to repeat a query
  -c CONFIG, --config CONFIG
                        config file
  --skip-warmup         skips parsing and executing warmup queries
  --skip-queries        skips parsing and executing test queries
  --dry-run             parse queries but does not execute them
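For readers who want to script around this interface, the options above could be reproduced with argparse roughly as follows. This is a sketch, not the tool's actual parser: the default values (repeat=1, config="config.json") and the integer type of --repeat are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented artemis-run options."""
    parser = argparse.ArgumentParser(prog="artemis-run")
    parser.add_argument("-s", "--system", required=True,
                        help="name of system to test")
    parser.add_argument("-d", "--dataset", required=True,
                        help="name of dataset to test")
    # Assumed defaults: the README does not state them.
    parser.add_argument("-r", "--repeat", type=int, default=1,
                        help="number of times to repeat a query")
    parser.add_argument("-c", "--config", default="config.json",
                        help="config file")
    parser.add_argument("--skip-warmup", action="store_true",
                        help="skips parsing and executing warmup queries")
    parser.add_argument("--skip-queries", action="store_true",
                        help="skips parsing and executing test queries")
    parser.add_argument("--dry-run", action="store_true",
                        help="parse queries but does not execute them")
    return parser


args = build_parser().parse_args(["-s", "anzograph", "-d", "bonsai", "--repeat", "4"])
print(args.system, args.dataset, args.repeat, args.dry_run)
```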

Example usage

  1. Run the test with AnzoGraph on the bonsai dataset, repeating each query 4 times

    artemis-run -s anzograph -d bonsai --repeat 4
    
  2. Do a dry run, which only checks that all the config and query files parse correctly

    artemis-run -s anzograph -d bonsai --dry-run
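To illustrate what repeating a query means for a benchmark, here is a hypothetical timing loop. The names time_query and fake_query are invented for this sketch and do not come from the tool's source; the stand-in query does no network access.

```python
import time


def time_query(run_query, repeat):
    """Call run_query() `repeat` times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def fake_query():
    """Stand-in for a real SPARQL request; replace with an actual endpoint call."""
    return sum(range(1000))


timings = time_query(fake_query, repeat=4)
print(len(timings))  # 4
```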
    

Systems

Stardog

docker pull stardog/stardog
mkdir -p stardog/data/db_meta/

Note: Stardog requires a licence, which it usually asks for when you run it the first time with -it.

cp system_files/stardog.system.properties stardog/data/db_meta/system.properties
cp system_files/stardog-license-key.bin stardog/
docker run -d --name=stardog \
 -v ${PWD}/stardog:/var/opt/stardog \
 -v ${PWD}/data:/data \
 -p 5820:5820 \
 -e STARDOG_SERVER_JAVA_ARGS="-Xmx64g -Xms12g -XX:MaxDirectMemorySize=2g" \
 stardog/stardog

Authentication: Stardog user is admin and password is admin

ADD Data

Bonsai

docker exec -it stardog /bin/bash

/opt/stardog/bin/stardog-admin db create \
   -v -n bonsai  @http://rdf.bonsai.uno /data/bonsai/*.ttl /data/bonsai/*.gz

Yago4

docker exec -it stardog /bin/bash

/opt/stardog/bin/stardog-admin db create \
   -v -n yago  @http://yago-knowledge.org /data/yago/import/*.nt

Configure query timeout

docker stop stardog
sudo cp system_files/stardog.system.properties ./stardog/data/db_meta/stardog.properties
docker start stardog
docker exec -it stardog /bin/bash -c '/opt/stardog/bin/stardog-admin metadata set -o query.timeout=1h -- bonsai'
docker exec -it stardog /bin/bash -c '/opt/stardog/bin/stardog-admin metadata set -o query.timeout=1h -- yago'

GraphDB

git clone https://github.com/Ontotext-AD/graphdb-docker.git
cd graphdb-docker
wget  http://download.ontotext.com/owlim/e6b943ee-b176-11ea-a3a3-42843b1b6b38/graphdb-free-9.3.1-dist.zip -P free-edition/
make free VERSION=9.3.1
cd ..
mkdir -p graphdb/conf/
cp system_files/graphdb.repository-config.ttl system_files/graphdb.properties graphdb/conf/

docker run -t -d --name=graphdb \
        -v ${PWD}/graphdb:/opt/graphdb/home  \
        -v ${PWD}/data:/data \
        -p 5820:7200  \
        --entrypoint "/bin/sh" \
        ontotext/graphdb:9.3.1-free

NOTE: always restart GraphDB manually after docker stop/start

Change config based on the database name

cp system_files/graphdb.repository-config.ttl graphdb/conf/

ADD Data

docker exec -it graphdb /bin/bash
rm /opt/graphdb/dist/conf/graphdb.properties
ln -s /opt/graphdb/home/conf/graphdb.properties /opt/graphdb/dist/conf/graphdb.properties

Bonsai

preload -f -c /opt/graphdb/home/conf/graphdb.repository-config.ttl \
        /data/bonsai/*.ttl /data/bonsai/*.gz

Yago

preload -f -c /opt/graphdb/home/conf/graphdb.repository-config.ttl \
        /data/yago/import/*.nt
exit

Then restart the DB

docker exec -d graphdb \
       /opt/graphdb/dist/bin/graphdb \
       -Dgraphdb.home=/opt/graphdb/home \
       -Dgraphdb.global.page.cache=true -Xmx64g -Xms12g

sudo chmod 777 -R graphdb

Virtuoso

docker pull openlink/virtuoso-opensource-7:latest
mkdir -p virtuoso/database
cp  system_files/virtuoso.ini virtuoso/database/virtuoso.ini

mkdir -p virtuoso/settings

docker run -t -d --name virtuoso \
        -v `pwd`/virtuoso/database:/database \
        -v `pwd`/virtuoso/settings:/settings \
        --env DBA_PASSWORD=admin \
        -v `pwd`/data:/import \
        -p 1111:1111 -p 5820:8890 -i \
        openlink/virtuoso-opensource-7:latest

ADD Data

docker exec -it virtuoso /bin/bash

echo "delete from DB.DBA.load_list;" > /settings/load.isql

Bonsai

cd /import/bonsai

for i in `ls -1 --color=never`
do
echo "ld_dir ('/import/bonsai', '"${i}"', 'http://rdf.bonsai.uno');" >> /settings/load.isql
done

Yago

cd /import/yago/import

for i in `ls -1 --color=never`
do
echo "ld_dir ('/import/yago/import', '"${i}"', 'http://yago-knowledge.org');" >> /settings/load.isql
done
echo "rdf_loader_run ();" >> /settings/load.isql
echo "checkpoint;" >> /settings/load.isql

/opt/virtuoso-opensource/bin/isql exec="LOAD /settings/load.isql"
exit
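The shell loops above build a load script for the Virtuoso bulk loader. The same script could be generated in Python along these lines; the function name build_load_script is hypothetical, and unlike the `ls` loop this sketch skips subdirectories.

```python
from pathlib import Path


def build_load_script(directory, graph_uri):
    """Return isql statements that register every file in `directory` for bulk loading."""
    lines = ["delete from DB.DBA.load_list;"]
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():  # assumption: only plain files should be registered
            lines.append(f"ld_dir ('{directory}', '{path.name}', '{graph_uri}');")
    lines.append("rdf_loader_run ();")
    lines.append("checkpoint;")
    return "\n".join(lines) + "\n"
```

Inside the container it could be used as `Path("/settings/load.isql").write_text(build_load_script("/import/bonsai", "http://rdf.bonsai.uno"))`, followed by the isql command shown above.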

Jena

docker pull --disable-content-trust stain/jena-fuseki

mkdir jena

docker run -d --name jena \
       -p 5820:3030 -e ADMIN_PASSWORD=admin -e JVM_ARGS='-Xmx64g -Xms12g -XX:MaxDirectMemorySize=2g' \
       -v`pwd`/jena:/fuseki \
       -v`pwd`/data:/staging \
       stain/jena-fuseki \
       ./fuseki-server --port=3030

Authentication: Jena requires the DIGEST HTTP method for authentication, user is admin and password is admin
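As a rough illustration of HTTP digest authentication from Python, the standard library can build an opener as sketched below. The function name make_digest_opener and the endpoint path /bonsai/sparql are assumptions for illustration; adjust them to your dataset.

```python
import urllib.request


def make_digest_opener(endpoint, user="admin", password="admin"):
    """Return a urllib opener that answers HTTP digest challenges for `endpoint`."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # Realm None means: use these credentials for any realm at this URL.
    password_mgr.add_password(None, endpoint, user, password)
    handler = urllib.request.HTTPDigestAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


# Assumed endpoint path; nothing is sent until opener.open(...) is called.
opener = make_digest_opener("http://localhost:5820/bonsai/sparql")
```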

ADD Data

The following is to instantiate a new database

First load the data

ARGS1='JVM_ARGS="-Xmx64g -Xms12g -XX:MaxDirectMemorySize=2g";'

Bonsai

ARGS2='java $JVM_ARGS -cp $FUSEKI_HOME/fuseki-server.jar tdb2.tdbloader --loc $FUSEKI_BASE/databases/bonsai'
ARGS3='--loader=phased --graph="http://rdf.bonsai.uno" /staging/bonsai/*{ttl,gz}'

Yago

ARGS2='java $JVM_ARGS -cp $FUSEKI_HOME/fuseki-server.jar tdb2.tdbloader --loc $FUSEKI_BASE/databases/yago'
ARGS3='--loader=phased --graph="http://yago-knowledge.org" /staging/yago/import/*nt'

LOAD COMMAND

docker exec -it jena  /bin/bash \
    -c "${ARGS1} ${ARGS2} ${ARGS3}" 
DBNAME=bonsai   # or DBNAME=yago, matching the dataset loaded above
curl 'http://localhost:5820/$/datasets' -H "Authorization: Basic $(echo -n admin:admin | base64)" \
    -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' --data "dbName=${DBNAME}&dbType=tdb2"
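The curl call above can also be issued from Python. The sketch below only builds the equivalent request with the standard library; the function name make_create_dataset_request is hypothetical, and the defaults mirror the values used in this README.

```python
import base64
import urllib.parse
import urllib.request


def make_create_dataset_request(db_name, user="admin", password="admin",
                                base_url="http://localhost:5820"):
    """Build the POST request that registers a TDB2 dataset with Fuseki."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = urllib.parse.urlencode({"dbName": db_name, "dbType": "tdb2"}).encode()
    return urllib.request.Request(
        f"{base_url}/$/datasets",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        },
    )


request = make_create_dataset_request("bonsai")
# To actually send it (requires a running Fuseki server):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```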

Do not pass -e FUSEKI_DATASET_1=bonsai to the docker command: that would use TDB instead of TDB2 and make it impossible to load the data from the command line.

Then stop fuseki, remove the locks and restart

docker stop jena
sudo rm jena/databases/${DBNAME}/tdb.lock  jena/databases/${DBNAME}/Data-0001/tdb.lock  jena/system/tdb.lock
docker start jena

AnzoGraph

AnzoGraph DB is available for commercial and non-commercial use for databases up to 16 GB, or about 100 million triples in typical use cases. AnzoGraph keeps all data in memory and appears to require about 4x the size of the graph in memory. With insufficient memory, loading fails with an error such as:

System error. Contact Cambridge Semantics Support. Reference: Load failed: Out of memory - Insufficient memory: 65331119

docker pull cambridgesemantics/anzograph:latest
mkdir -p anzograph/persistence

docker run -t -d --name anzograph \
    -p 5820:8080 -p 443:8443  \
    -v${PWD}/anzograph/persistence:/opt/anzograph/persistence \
    -v${PWD}/data:/import  \
    -v${PWD}/system_files/anzograph.load.ttl.rq:/load.ttl.rq  \
    -v${PWD}/system_files/anzograph.load.nt.gz.rq:/load.nt.gz.rq  \
    -v${PWD}/system_files/anzograph.load.nt.rq:/load.nt.rq  \
    -v${PWD}/system_files/anzograph.license:/license  \
    cambridgesemantics/anzograph:latest

sudo chmod 777 -R anzograph

Authentication: user is admin and password is Passw0rd1

Upgrade licence

  1. Open https://customercenter.cambridgesemantics.com/products/anzograph/license.html

  2. Select licence

  3. Get local server ID:

    docker exec -it  anzograph  /bin/bash
    sleep 360
    azgctl -getlicenseid | grep 'property_license' | cut -f 2 -d' '
    
  4. Fill in the form

  5. Get link to account via email...

  6. Copy the licence and use it to overwrite system_files/anzograph.license

  7. Register licence (inside the previous docker exec)

     azgctl -license "$(cat /license)"
    

ADD Data

Again inside the previous docker exec

Bonsai

mkdir /data.nt.gz
mkdir /data.ttl
cp -v /import/bonsai/*.ttl /data.ttl/
cp -v /import/bonsai/*.nt.gz /data.nt.gz/

sed s@GRAPH_URI@http://rdf.bonsai.uno@ /load.ttl.rq >  /load.bonsai.ttl.rq
sed s@GRAPH_URI@http://rdf.bonsai.uno@ /load.nt.gz.rq > /load.bonsai.nt.gz.rq

azgi -f /load.bonsai.ttl.rq
azgi -f /load.bonsai.nt.gz.rq

Yago

mkdir /data.nt
cp -v /import/yago/import/*.nt /data.nt/
sed s@GRAPH_URI@http://yago-knowledge.org@ /load.nt.rq > /load.yago.nt.rq

azgi -f /load.yago.nt.rq

rm -v /data.nt/*

LOAD WITH 'global' dir:/data.ttl INTO GRAPH http://rdf.bonsai.uno


Notes

Anzo uses OWL information for query optimization. For instance, if an ontology declares a predicate to be a FunctionalProperty (a single value per subject) and the data contains multiple values for it, an error is thrown.

If you cannot fix the ontology, set

enable_mergejoin_unique=false

in settings.conf. Additionally, set

enable_owlstats=false

to disable all reasoning based on OWL.



AllegroGraph

docker pull franzinc/agraph

mkdir -p ./allegrograph/data/
mkdir -p ./allegrograph/files/

Activate Licence


cat ./system_files/allegrograph.agraph.cfg > ./allegrograph/files/agraph.cfg
cat ./system_files/allegro.licence >> ./allegrograph/files/agraph.cfg

docker run -d --name agraph \
     -v ${PWD}/allegrograph/data:/agraph/data \
     -v ${PWD}/allegrograph/files/agraph.cfg:/agraph/etc/agraph.cfg \
     -v ${PWD}/data:/import \
     -e AGRAPH_SUPER_USER=admin \
     -e AGRAPH_SUPER_PASSWORD=admin \
     -p 5785-5820:10000-10035 \
     --shm-size 1g   \
     franzinc/agraph

Add a user:

docker exec agraph  /bin/bash -c  'agtool user add admin admin'
docker exec agraph  /bin/bash -c  'agtool user permissions admin super'

ADD Data

docker stop agraph 

Bonsai

read -r -d '' CONFIG <<'EOF'
<Catalog bonsai>
  Main /agraph/data/bonsai
</Catalog>
EOF

echo "" >>  ./allegrograph/files/agraph.cfg
echo "${CONFIG}" >>  ./allegrograph/files/agraph.cfg

docker start agraph

docker exec agraph  /bin/bash -c  'agtool load bonsai /import/bonsai/*{ttl,gz}'

Yago

read -r -d '' CONFIG <<'EOF'
<Catalog yago>
  Main /agraph/data/yago
</Catalog>
EOF

echo "" >>  ./allegrograph/files/agraph.cfg
echo "${CONFIG}" >>  ./allegrograph/files/agraph.cfg

docker start agraph

docker exec agraph  /bin/bash -c  'agtool load yago /import/yago/import/*nt'

Note: if the dataset exceeds the licence's triple limit, loading fails with an error such as:

License triple limit (5000000) exceeded (5003555), by attempting to add 23457 triples.


Additional tips

If you are doing a manual install and encounter problems, see the following

On Ubuntu, make sure to use python3 and the latest version of pip

sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
pip install --upgrade pip

Make sure python v3 and pip v3 are used

python --version
pip --version

Install dependencies manually

pip install rdflib
pip install SPARQLWrapper