# ARTEMIS Py

Another RDF Testing Environment Made Incredibly Simple, with Python.
## Installation

### With package manager pip from PyPI

Installable via pip:

pip install artemis-py

### Manual

- Clone the git repo:

  git clone git@bitbucket.org:aaurdfexq/artemis-py.git

- Enter the directory of the cloned repo:

  cd artemis-py

- Create a virtual env and activate it:

  python -m venv .qenv
  source .qenv/bin/activate

- Install the package:

  python setup.py install

For some additional tips see the section below.
## Configuration and Supported Systems
Currently this has been tested with:
- AllegroGraph
- AnzoGraph
- GraphDB
- Jena
- Stardog
- Virtuoso
This tool can be extended to other systems: it should suffice to add the appropriate entry in the config.json file.
The configuration file looks like the following:
{
  "server": "localhost",
  "port": "5820",
  "systems": {
    "graphdb": {
      "endpoint": {
        "url": "http://{server}:{port}/repositories/{name}",
        "auth": {
          "user": "",
          "pwd": "",
          "method": ""
        }
      },
      "graph": {
        "bonsai": "",
        "dbpedia": ""
      }
    }
  },
  "datasets": {
    "bonsai": true
  }
}
Note that the config file uses Python format strings, e.g., `{server}`, which are filled in at runtime to correctly format the URLs.
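To illustrate how these placeholders are expanded (a minimal sketch; the `endpoint_url` helper below is hypothetical and not part of artemis-py):

```python
# Minimal sketch of expanding a templated endpoint URL from config.json
# with Python format strings. Illustrative only; not part of artemis-py.
import json

config = json.loads("""
{
  "server": "localhost",
  "port": "5820",
  "systems": {
    "graphdb": {"endpoint": {"url": "http://{server}:{port}/repositories/{name}"}}
  }
}
""")

def endpoint_url(cfg, system, name):
    """Fill the {server}, {port} and {name} placeholders of a system's endpoint URL."""
    template = cfg["systems"][system]["endpoint"]["url"]
    return template.format(server=cfg["server"], port=cfg["port"], name=name)

print(endpoint_url(config, "graphdb", "bonsai"))
# -> http://localhost:5820/repositories/bonsai
```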
For instructions on how to set up the various systems, see the Systems section below.
## Usage

Full usage, also available with artemis-run -h:
Commands:
artemis-run
Usage:
artemis-run [-h] -s SYSTEM -d DATASET [-r REPEAT] [-c CONFIG]
[--skip-warmup] [--skip-queries] [--dry-run]
Options:
-h, --help show this help message and exit
-s SYSTEM, --system SYSTEM
name of system to test
-d DATASET, --dataset DATASET
name of dataset to test
-r REPEAT, --repeat REPEAT
number of times to repeat a query
-c CONFIG, --config CONFIG
config file
--skip-warmup skips parsing and executing warmup queries
--skip-queries skips parsing and executing test queries
--dry-run parse queries but does not execute them
### Example usage

- Run a test with AnzoGraph on the bonsai dataset, repeating each query 4 times:

  artemis-run -s anzograph -d bonsai --repeat 4

- Run a dry run; this just checks that all config and query files parse correctly:

  artemis-run -s anzograph -d bonsai --dry-run
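To script several benchmark runs in a row (e.g., one per system), a small wrapper can shell out to artemis-run. This is only a sketch; the system and dataset names are taken from the examples above, adjust them to whatever you have configured:

```python
# Hypothetical batch wrapper around artemis-run; not part of artemis-py.
import subprocess

SYSTEMS = ["anzograph", "graphdb", "stardog"]  # any of the supported systems
DATASET = "bonsai"

for system in SYSTEMS:
    cmd = ["artemis-run", "-s", system, "-d", DATASET, "--repeat", "4"]
    print("Running:", " ".join(cmd))
    # check=False so a failing system does not abort the whole batch
    subprocess.run(cmd, check=False)
```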
## Systems

### Stardog
docker pull stardog/stardog
mkdir -p stardog/data/db_meta/
**Stardog requires a licence, usually obtained when you run it the first time with `-it`.**
cp system_files/stardog.system.properties stardog/data/db_meta/system.properties
cp system_files/stardog-license-key.bin stardog/
docker run -d --name=stardog \
-v ${PWD}/stardog:/var/opt/stardog \
-v ${PWD}/data:/data \
-p 5820:5820 \
-e STARDOG_SERVER_JAVA_ARGS="-Xmx64g -Xms12g -XX:MaxDirectMemorySize=2g" \
stardog/stardog
Authentication: the Stardog user is admin and the password is admin.
#### ADD Data

**Bonsai**
docker exec -it stardog /bin/bash
/opt/stardog/bin/stardog-admin db create \
-v -n bonsai @http://rdf.bonsai.uno /data/bonsai/*.ttl /data/bonsai/*.gz
**Yago4**
docker exec -it stardog /bin/bash
/opt/stardog/bin/stardog-admin db create \
-v -n yago @http://yago-knowledge.org /data/yago/import/*.nt
#### Configure query timeout
docker stop stardog
sudo cp system_files/stardog.system.properties ./stardog/data/db_meta/stardog.properties
docker start stardog
docker exec -it stardog /bin/bash -c '/opt/stardog/bin/stardog-admin metadata set -o query.timeout=1h -- bonsai'
docker exec -it stardog /bin/bash -c '/opt/stardog/bin/stardog-admin metadata set -o query.timeout=1h -- yago'
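Before running the benchmark you may want to check that the Stardog endpoint answers SPARQL queries. A minimal sketch using SPARQLWrapper (the same library listed in the dependencies), assuming the default admin/admin credentials and the bonsai database created above:

```python
# Sanity-check the Stardog SPARQL endpoint; illustrative only, not part of artemis-py.
from SPARQLWrapper import SPARQLWrapper, JSON, BASIC

sparql = SPARQLWrapper("http://localhost:5820/bonsai/query")  # Stardog query endpoint
sparql.setHTTPAuth(BASIC)
sparql.setCredentials("admin", "admin")
sparql.setReturnFormat(JSON)
sparql.setQuery("SELECT (COUNT(*) AS ?n) WHERE { GRAPH ?g { ?s ?p ?o } }")

result = sparql.query().convert()
print("Triples in named graphs:", result["results"]["bindings"][0]["n"]["value"])
```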
### GraphDB
git clone https://github.com/Ontotext-AD/graphdb-docker.git
cd graphdb-docker
wget http://download.ontotext.com/owlim/e6b943ee-b176-11ea-a3a3-42843b1b6b38/graphdb-free-9.3.1-dist.zip -P free-edition/
make free VERSION=9.3.1
cd ..
mkdir -p graphdb/conf/
cp system_files/graphdb.repository-config.ttl system_files/graphdb.properties graphdb/conf/
docker run -t -d --name=graphdb \
-v ${PWD}/graphdb:/opt/graphdb/home \
-v ${PWD}/data:/data \
-p 5820:7200 \
--entrypoint "/bin/sh" \
ontotext/graphdb:9.3.1-free
NOTE: always restart GraphDB manually after docker stop/start
Change the config based on the database name:
cp system_files/graphdb.repository-config.ttl graphdb/conf/
#### ADD Data
docker exec -it graphdb /bin/bash
rm /opt/graphdb/dist/conf/graphdb.properties
ln -s /opt/graphdb/home/conf/graphdb.properties /opt/graphdb/dist/conf/graphdb.properties
**Bonsai**
preload -f -c /opt/graphdb/home/conf/graphdb.repository-config.ttl \
/data/bonsai/*.ttl /data/bonsai/*.gz
**Yago**
preload -f -c /opt/graphdb/home/conf/graphdb.repository-config.ttl \
/data/yago/import/*.nt
exit
Then restart the DB
docker exec -d graphdb \
/opt/graphdb/dist/bin/graphdb \
-Dgraphdb.home=/opt/graphdb/home \
-Dgraphdb.global.page.cache=true -Xmx64g -Xms12g
sudo chmod 777 -R graphdb
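To verify that GraphDB is back up and the repository exists, you can query the RDF4J REST interface behind the endpoint URL used in config.json. A minimal sketch using only the Python standard library, assuming the port mapping above and no authentication (the default in GraphDB Free):

```python
# List the repositories exposed by GraphDB; illustrative only, not part of artemis-py.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:5820/repositories",
    headers={"Accept": "application/sparql-results+json"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
    data = json.load(resp)

# The RDF4J REST API returns the repository list in SPARQL JSON results format.
for binding in data["results"]["bindings"]:
    print(binding["id"]["value"])
```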
### Virtuoso
docker pull openlink/virtuoso-opensource-7:latest
mkdir -p virtuoso/database
cp system_files/virtuoso.ini virtuoso/database/virtuoso.ini
mkdir -p virtuoso/settings
docker run -t -d --name virtuoso \
-v `pwd`/virtuoso/database:/database \
-v `pwd`/virtuoso/settings:/settings \
--env DBA_PASSWORD=admin \
-v `pwd`/data:/import \
-p 1111:1111 -p 5820:8890 -i \
openlink/virtuoso-opensource-7:latest
#### ADD Data
docker exec -it virtuoso /bin/bash
echo "delete from DB.DBA.load_list;" > /settings/load.isql
**Bonsai**
cd /import/bonsai
for i in `ls -1 --color=never`
do
echo "ld_dir ('/import/bonsai', '"${i}"', 'http://rdf.bonsai.uno');" >> /settings/load.isql
done
**Yago**
cd /import/yago/import
for i in `ls -1 --color=never`
do
echo "ld_dir ('/import/yago/import', '"${i}"', 'http://yago-knowledge.org');" >> /settings/load.isql
done
echo "rdf_loader_run ();" >> /settings/load.isql
echo "checkpoint;" >> /settings/load.isql
/opt/virtuoso-opensource/bin/isql exec="LOAD /settings/load.isql"
exit
### Jena
docker pull --disable-content-trust stain/jena-fuseki
mkdir jena
docker run -d --name jena \
-p 5820:3030 -e ADMIN_PASSWORD=admin -e JVM_ARGS='-Xmx64g -Xms12g -XX:MaxDirectMemorySize=2g' \
-v`pwd`/jena:/fuseki \
-v`pwd`/data:/staging \
stain/jena-fuseki \
./fuseki-server --port=3030
Authentication: Jena requires the DIGEST HTTP method for authentication; the user is admin and the password is admin.
#### ADD Data

The following instantiates a new database. First load the data:
ARGS1='JVM_ARGS="-Xmx64g -Xms12g -XX:MaxDirectMemorySize=2g";'
**Bonsai**
ARGS2='java $JVM_ARGS -cp $FUSEKI_HOME/fuseki-server.jar tdb2.tdbloader --loc $FUSEKI_BASE/databases/bonsai'
ARGS3='--loader=phased --graph="http://rdf.bonsai.uno" /staging/bonsai/*{ttl,gz}'
**Yago**
ARGS2='java $JVM_ARGS -cp $FUSEKI_HOME/fuseki-server.jar tdb2.tdbloader --loc $FUSEKI_BASE/databases/yago'
ARGS3='--loader=phased --graph="http://yago-knowledge.org" /staging/yago/import/*nt'
LOAD COMMAND
docker exec -it jena /bin/bash \
-c "${ARGS1} ${ARGS2} ${ARGS3}"
DBNAME=bonsai   # for the bonsai database
DBNAME=yago     # for the yago database (run the curl below once per database)
curl 'http://localhost:5820/$/datasets' -H "Authorization: Basic $(echo -n admin:admin | base64)" \
-H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' --data "dbName=${DBNAME}&dbType=tdb2"
Do not pass -e FUSEKI_DATASET_1=bonsai to the docker command: that would use TDB instead of TDB2 and make it impossible to load the data from the command line.
Then stop Fuseki, remove the locks and restart:
docker stop jena
sudo rm jena/databases/${DBNAME}/tdb.lock jena/databases/${DBNAME}/Data-0001/tdb.lock jena/system/tdb.lock
docker start jena
### AnzoGraph

AnzoGraph DB is free for commercial and non-commercial use for databases up to 16 GB, or about 100 million triples in typical use cases. AnzoGraph runs entirely in memory and needs roughly 4x as much memory as the size of the graph; with insufficient memory, loads fail with an error like: `System error. Contact Cambridge Semantics Support. Reference: Load failed: Out of memory - Insufficient memory: 65331119`
docker pull cambridgesemantics/anzograph:latest
mkdir -p anzograph/persistence
docker run -t -d --name anzograph \
-p 5820:8080 -p 443:8443 \
-v${PWD}/anzograph/persistence:/opt/anzograph/persistence \
-v${PWD}/data:/import \
-v${PWD}/system_files/anzograph.load.ttl.rq:/load.ttl.rq \
-v${PWD}/system_files/anzograph.load.nt.gz.rq:/load.nt.gz.rq \
-v${PWD}/system_files/anzograph.load.nt.rq:/load.nt.rq \
-v${PWD}/system_files/anzograph.license:/license \
cambridgesemantics/anzograph:latest
sudo chmod 777 -R anzograph
Authentication: the user is admin and the password is Passw0rd1.
#### Upgrade licence

- Open https://customercenter.cambridgesemantics.com/products/anzograph/license.html
- Select a licence
- Get the local server ID:

  docker exec -it anzograph /bin/bash
  sleep 360
  azgctl -getlicenseid | grep 'property_license' | cut -f 2 -d' '

- Fill in the form
- Get the link to your account via email...
- Copy the license and overwrite system_files/anzograph.license with it
- Register the licence (inside the previous docker exec):

  azgctl -license "$(cat /license)"
#### ADD Data

Again inside the previous docker exec:

**Bonsai**
mkdir /data.nt.gz
mkdir /data.ttl
cp -v /import/bonsai/*.ttl /data.ttl/
cp -v /import/bonsai/*.nt.gz /data.nt.gz/
sed s@GRAPH_URI@http://rdf.bonsai.uno@ /load.ttl.rq > /load.bonsai.ttl.rq
sed s@GRAPH_URI@http://rdf.bonsai.uno@ /load.nt.gz.rq > /load.bonsai.nt.gz.rq
azgi -f /load.bonsai.ttl.rq
azgi -f /load.bonsai.nt.gz.rq
**Yago**
mkdir /data.nt
cp -v /import/yago/import/*.nt /data.nt/
sed s@GRAPH_URI@http://yago-knowledge.org@ /load.nt.rq > /load.yago.nt.rq
azgi -f /load.yago.nt.rq
rm -v /data.nt/*
For reference, after the GRAPH_URI substitution the load query files contain a statement of the form:

LOAD WITH 'global' dir:/data.ttl INTO GRAPH http://rdf.bonsai.uno
#### Notes

AnzoGraph uses OWL information for query optimization.
For instance, if an ontology declares a predicate to be a FunctionalProperty (a single value per subject) and the data contains multiple values for it, an error is thrown.
If you cannot fix the ontology, set `enable_mergejoin_unique=false` in settings.conf; additionally, set `enable_owlstats=false` to disable all reasoning based on OWL.
### AllegroGraph
docker pull franzinc/agraph
mkdir -p ./allegrograph/data/
mkdir -p ./allegrograph/files/
Activate the licence:
cat ./system_files/allegrograph.agraph.cfg > ./allegrograph/files/agraph.cfg
cat ./system_files/allegro.licence >> ./allegrograph/files/agraph.cfg
docker run -d --name agraph \
-v ${PWD}/allegrograph/data:/agraph/data \
-v ${PWD}/allegrograph/files/agraph.cfg:/agraph/etc/agraph.cfg \
-v ${PWD}/data:/import \
-e AGRAPH_SUPER_USER=admin \
-e AGRAPH_SUPER_PASSWORD=admin \
-p 5785-5820:10000-10035 \
--shm-size 1g \
franzinc/agraph
Add a user:
docker exec agraph /bin/bash -c 'agtool user add admin admin'
docker exec agraph /bin/bash -c 'agtool user permissions admin super'
#### ADD Data
docker stop agraph
**Bonsai**
read -r -d '' CONFIG <<'EOF'
<Catalog bonsai>
Main /agraph/data/bonsai
</Catalog>
EOF
echo "" >> ./allegrograph/files/agraph.cfg
echo "${CONFIG}" >> ./allegrograph/files/agraph.cfg
docker start agraph
docker exec agraph /bin/bash -c 'agtool load bonsai /import/bonsai/*{ttl,gz}'
**Yago**
read -r -d '' CONFIG <<'EOF'
<Catalog yago>
Main /agraph/data/yago
</Catalog>
EOF
echo "" >> ./allegrograph/files/agraph.cfg
echo "${CONFIG}" >> ./allegrograph/files/agraph.cfg
docker start agraph
docker exec agraph /bin/bash -c 'agtool load yago /import/yago/import/*nt'
Note: with the free licence, loading may fail with an error like: `License triple limit (5000000) exceeded (5003555), by attempting to add 23457 triples.`
## Additional tips

If you are doing a manual install and encounter problems, see the following.

On Ubuntu, make sure to use python3 and the latest version of pip:
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
pip install --upgrade pip
Make sure python v3 and pip v3 are used
python --version
pip --version
Install dependencies manually
pip install rdflib
pip install SPARQLWrapper
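To quickly confirm the manually installed dependencies are importable (illustrative check only):

```python
# Verify that the manually installed dependencies can be imported.
import rdflib
import SPARQLWrapper

print("rdflib", rdflib.__version__)
print("SPARQLWrapper", SPARQLWrapper.__version__)
```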