Source code for the Knowledge Layer concerned with data from Nordjyske Medier.
Zaph 9874438f7d
Update generate-cov.ps1
1 year ago
environment Final commit 1 year ago
extractor Fixed an issue with word freq 1 year ago
knox_util Fixed small bug in overriden print() function, now actually prints the content of the message variable instead of printing "message", when parser arguments are undefined. 1 year ago
labels Merge branch 'master' into develop 1 year ago
loader Removed imports and added better description 1 year ago
preproc Removed imports and added better description 1 year ago
rdf Removed imports and added better description 1 year ago
rest Refactored the 2 functions "send_triples_to_db" and "send_word_count_to_db" into a single function as they were identical to the point that the difference were in the parameter names and the documentation. 1 year ago
tests Fixed a test 1 year ago
turtleParser prefix -> @prefix 1 year ago
.gitattributes Added separate vector paths 1 year ago
.gitignore Removed imports and added better description 1 year ago
.ontology.ttl Fixed the ontology file so its correct and parsable 1 year ago
.travis.yml Updated Travis configuration to the new repo with knox modules 1 year ago
LICENSE Initial commit 1 year ago
Makefile created initial makefile 1 year ago
README.md Updated readme 1 year ago
app.py Fixed an issue with word freq 1 year ago
generate-cov.ps1 Update generate-cov.ps1 1 year ago
kgDisplay.py Displaying RDF tuples in a KG. 1 year ago
logging.conf Added commit version 1 year ago
pytest.ini Added more information to pytest.ini 1 year ago
requirements_dev.txt - Updated the requirements_dev.txt to include the word frequency from group D 1 year ago
requirements_prod.txt Updated the requirements for installing the prod environment 1 year ago
sample.env Final commit 1 year ago
test.txt Added group 1 json example 1 year ago

README.md

P5-Project

Repository covering the knowledge layer for group Knox18

Installing dependencies

Requires Python 3.8.x (64-Bit)

  1. Install the requirements:
    • python3 -m pip install -r requirements_dev.txt (or requirements_prod.txt for a production environment)
  2. Install a spaCy model:
    • Small (16MB) - python3 -m spacy download da_core_news_sm
      • Does not contain word vectors
    • Medium (46MB) - python3 -m spacy download da_core_news_md
    • REQUIRED - Large (546MB) - python3 -m spacy download da_core_news_lg
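The prerequisites above can be checked from Python before running the project. The following is a minimal sketch and not part of the repository; the helper names `is_supported_python` and `model_installed` are hypothetical. It relies on spaCy models being installed as regular Python packages, so the standard library can detect them without importing spaCy.

```python
import importlib.util
import struct
import sys

REQUIRED_MODEL = "da_core_news_lg"  # the model marked REQUIRED above


def is_supported_python(version_info=sys.version_info, pointer_bits=None):
    """Return True for a 64-bit Python 3.8.x interpreter."""
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit interpreter.
        pointer_bits = struct.calcsize("P") * 8
    return tuple(version_info[:2]) == (3, 8) and pointer_bits == 64


def model_installed(name=REQUIRED_MODEL):
    """Return True if a package with the given name is importable."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python 3.8.x 64-bit:", is_supported_python())
    print(REQUIRED_MODEL, "installed:", model_installed())
```

If the second check reports False, run the download command for the large model again.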

Adding Knox-specific packages

To install Knox-specific packages in your projects, first add the Knox package repository to your pip indexes. The index can either be specified on every pip install or be set once in a pip configuration file.

To specify the extra index for a single pip install, run the following command:

pip install --extra-index-url https://repos.knox.cs.aau.dk/ <your packages here>

To add the repository to your pip indexes permanently, paste the following into your pip configuration file:

[global]
extra-index-url = https://repos.knox.cs.aau.dk/

Configuration files (Windows)

On Windows, one of the following configuration files should exist. If neither exists, create one.

  • ~/pip/pip.ini
  • ./venv/pip.ini

Configuration files (Linux)

On Linux, one of the following configuration files should exist. If neither exists, create one.

  • ~/.pip/pip.conf
  • ./venv/pip.conf
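If you prefer to generate the configuration instead of pasting it, the text can be produced with the standard library. This is only a sketch: the function name `render_pip_config` is hypothetical, and the default URL is the one from the [global] snippet above. It returns the text rather than writing a file, so you decide which of the listed paths to save it to.

```python
import configparser
import io

# Index URL taken from the [global] snippet above.
KNOX_INDEX_URL = "https://repos.knox.cs.aau.dk/"


def render_pip_config(index_url=KNOX_INDEX_URL):
    """Return pip configuration text that adds an extra package index."""
    config = configparser.ConfigParser()
    config["global"] = {"extra-index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_pip_config())
```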

Env variables

Create a .env file in the root of the project. It contains all the environment and configuration values for the Python program. In the .env file, variables are defined like this:
variable="value"
For example: RDF_OUTPUT_FOLDER="./rdf_output/"
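To make the file format concrete, here is a minimal standard-library sketch of how lines of this form can be parsed. It is an illustration only: the function name `parse_env_text` is hypothetical, skipping `#` comment lines is an assumption, and the project itself may load the file differently (for example through a dedicated library).

```python
def parse_env_text(text):
    """Parse lines of the form variable="value" into a dict."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (comment support is an assumption).
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition("=")
        if not separator:
            continue  # not a variable definition
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[name.strip()] = value
    return values


if __name__ == "__main__":
    print(parse_env_text('RDF_OUTPUT_FOLDER="./rdf_output/"'))
```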

Variable Name | Description
INPUT_DIRECTORY | Relative (from the project root) or absolute path to the directory containing the publication files. Must end with a /
OUTPUT_DIRECTORY | Relative (from the project root) or absolute path to the directory the processed publications are moved to. Must end with a /
ERROR_DIRECTORY | Relative (from the project root) or absolute path to the directory that files raising an exception are moved to. Must end with a /
RDF_OUTPUT_FOLDER | Relative (from the project root) or absolute path to the directory where the RDF output file is created. Must end with a /
OUTPUT_FORMAT | Format of the generated RDF file, for example "turtle". Possible formats are listed at https://rdflib.readthedocs.io/en/stable/plugin_parsers.html
OUTPUT_FILE_NAME | File name of the RDF output
ONTOLOGY_FILEPATH | Path to the ontology file
TRIPLE_DATA_ENDPOINT | Triple data endpoint of the Data Layer REST API, given as a full URL string, for example "http://127.0.0.1:8080/update"
WORD_COUNT_DATA_ENDPOINT | Word count endpoint of the Data Layer REST API, given as a full URL string, for example "http://127.0.0.1:8080/wordCountData"
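The two constraints stated above, a trailing / on directory values and full URLs for the endpoints, are easy to get wrong, so a quick check before starting the program can help. The following is a hedged sketch, not code from the repository; `find_config_problems` is a hypothetical helper that takes any mapping of variable names to values.

```python
from urllib.parse import urlparse

# Variables that, per the descriptions above, must end with a slash.
DIRECTORY_VARIABLES = (
    "INPUT_DIRECTORY",
    "OUTPUT_DIRECTORY",
    "ERROR_DIRECTORY",
    "RDF_OUTPUT_FOLDER",
)
ENDPOINT_VARIABLES = ("TRIPLE_DATA_ENDPOINT", "WORD_COUNT_DATA_ENDPOINT")


def find_config_problems(env):
    """Return a list of human-readable problems found in the mapping env."""
    problems = []
    for name in DIRECTORY_VARIABLES:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not value.endswith("/"):
            problems.append(f"{name} must end with a /")
    for name in ENDPOINT_VARIABLES:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        parsed = urlparse(value)
        # A full URL needs both a scheme (such as http) and a host part.
        if not parsed.scheme or not parsed.netloc:
            problems.append(f"{name} must be a full URL")
    return problems
```

Calling `find_config_problems(os.environ)` after the .env file has been loaded returns an empty list when both constraints hold.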