Source code for the Knowledge Layer concerned with data from Nordjyske Medier.
environment Final commit 2 years ago
extractor Fixed an issue with word freq 2 years ago
knox_util Fixed small bug in overriden print() function, now actually prints the content of the message variable instead of printing "message", when parser arguments are undefined. 2 years ago
labels Merge branch 'master' into develop 2 years ago
loader Removed imports and added better description 2 years ago
preproc Removed imports and added better description 2 years ago
rdf Removed imports and added better description 2 years ago
rest Refactored the 2 functions "send_triples_to_db" and "send_word_count_to_db" into a single function as they were identical to the point that the difference were in the parameter names and the documentation. 2 years ago
tests Fixed a test 2 years ago
turtleParser prefix -> @prefix 2 years ago
.gitattributes Added separate vector paths 2 years ago
.gitignore Removed imports and added better description 2 years ago
.ontology.ttl Fixed the ontology file so its correct and parsable 2 years ago
.travis.yml Updated Travis configuration to the new repo with knox modules 2 years ago
LICENSE Initial commit 3 years ago
Makefile created initial makefile 2 years ago
README.md Updated readme 2 years ago
app.py Fixed an issue with word freq 2 years ago
generate-cov.ps1 Update generate-cov.ps1 2 years ago
kgDisplay.py Displaying RDF tuples in a KG. 3 years ago
logging.conf Added commit version 2 years ago
pytest.ini Added more information to pytest.ini 3 years ago
requirements_dev.txt - Updated the requirements_dev.txt to include the word frequency from group D 2 years ago
requirements_prod.txt Updated the requirements for installing the prod environment 2 years ago
sample.env Final commit 2 years ago
test.txt Added group 1 json example 2 years ago

README.md

P5-Project

Repository covering the knowledge layer for group Knox18

Installing dependencies

Requires Python 3.8.x (64-Bit)

  1. Install the requirements:
    • python3 -m pip install -r requirements_prod.txt (or requirements_dev.txt for development)
  2. Install a spaCy model:
    • Small (16MB) - python3 -m spacy download da_core_news_sm
      • Does not contain word vectors
    • Medium (46MB) - python3 -m spacy download da_core_news_md
    • REQUIRED - Large (546MB) - python3 -m spacy download da_core_news_lg
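
As a quick sanity check after step 2, the following sketch reports which of the spaCy models named above can be imported in the current environment. It uses only the standard library. The helper name `installed_models` is hypothetical and not part of this repository.

```python
from importlib.util import find_spec

# Model package names taken from the install steps above.
SPACY_MODELS = ("da_core_news_sm", "da_core_news_md", "da_core_news_lg")


def installed_models(names=SPACY_MODELS):
    """Return the subset of `names` that can be found as importable packages."""
    return [name for name in names if find_spec(name) is not None]


if __name__ == "__main__":
    found = installed_models()
    print("Installed spaCy models:", found or "none")
    if "da_core_news_lg" not in found:
        # The large model is the one marked REQUIRED in the list above.
        print("Missing required model, run: python3 -m spacy download da_core_news_lg")
```

The check only confirms that the package is importable; it does not load the model or verify its version.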

Adding Knox-specific packages

To install Knox-specific packages in your projects, you must first add the Knox package repository to your pip indexes. The index can either be specified on every pip install or be set once in a pip configuration file.

To specify the extra index for a single pip install, run the following command:

pip install --extra-index-url https://repos.knox.cs.aau.dk/ <your packages here>

To add the repository to your pip indexes permanently, paste the following into your pip configuration file:

[global]
extra-index-url = https://repos.knox.cs.aau.dk/

Configuration files (Windows)

On Windows, the configuration can be placed in either of the following files. If a file does not exist, you can create it.

  • ~/pip/pip.ini
  • ./venv/pip.ini

Configuration files (Linux)

On Linux, the configuration can be placed in either of the following files. If a file does not exist, you can create it.

  • ~/.pip/pip.conf
  • ./venv/pip.conf
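
If you prefer to script this step, the sketch below creates or updates one of the configuration files listed above so that its [global] section contains the extra index. It is an illustration only: the function name `add_extra_index` is hypothetical, the URL is copied from the configuration snippet above, and rewriting a file with `configparser` drops any comments the file already contained.

```python
import configparser
from pathlib import Path

# Index URL as given in the [global] snippet above.
KNOX_INDEX = "https://repos.knox.cs.aau.dk/"


def add_extra_index(config_path, url=KNOX_INDEX):
    """Create or update a pip config file so that [global] sets extra-index-url."""
    path = Path(config_path).expanduser()
    # Disable interpolation so '%' characters in an existing file are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is skipped silently
    if not parser.has_section("global"):
        parser.add_section("global")
    # Note: this overwrites any extra-index-url that was already set.
    parser.set("global", "extra-index-url", url)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        parser.write(handle)
    return path


if __name__ == "__main__":
    # Example: the per-project file on Linux; use ./venv/pip.ini on Windows.
    print("Wrote", add_extra_index("./venv/pip.conf"))
```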

Env variables

Create a .env file in the root of the project. It contains all the environment/configuration values for the Python program. Variables in the .env file are defined in the form:
variable="value"
For example: RDF_OUTPUT_FOLDER="./rdf_output/"
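
To make the file format concrete, here is a minimal standard-library sketch that reads variable="value" lines into a dictionary. It is not the loader this project uses (which may well rely on a dotenv library); the function names `parse_env_text` and `load_env_file` are hypothetical, and treating lines that start with # as comments is an assumption.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse lines of the form variable="value" into a dict of strings."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and (by assumption) '#' comment lines.
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition("=")
        if not separator:
            continue  # not an assignment, ignore it
        # Remove surrounding whitespace and the double quotes around the value.
        values[name.strip()] = value.strip().strip('"')
    return values


def load_env_file(path=".env"):
    """Read a .env file from `path` and return its variables as a dict."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(parse_env_text('RDF_OUTPUT_FOLDER="./rdf_output/"'))
```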

| Variable Name | Description |
| --- | --- |
| INPUT_DIRECTORY | Relative (from the project root) or absolute path to the directory containing the publication files. Must end with a /. |
| OUTPUT_DIRECTORY | Relative (from the project root) or absolute path to the directory the processed publications are moved to. Must end with a /. |
| ERROR_DIRECTORY | Relative (from the project root) or absolute path to the directory that files raising an exception are moved to. Must end with a /. |
| RDF_OUTPUT_FOLDER | Relative (from the project root) or absolute path to the directory where the RDF output file is created. Must end with a /. |
| OUTPUT_FORMAT | The format of the generated RDF file, for example "turtle". Possible formats are listed at https://rdflib.readthedocs.io/en/stable/plugin_parsers.html |
| OUTPUT_FILE_NAME | The file name of the RDF output. |
| ONTOLOGY_FILEPATH | The path to the ontology file. |
| TRIPLE_DATA_ENDPOINT | The triple data endpoint of the Data Layer REST API, given as a complete URL string, for example "http://127.0.0.1:8080/update". |
| WORD_COUNT_DATA_ENDPOINT | The word count endpoint of the Data Layer REST API, given as a complete URL string, for example "http://127.0.0.1:8080/wordCountData". |
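
The constraints described for these variables can be checked before the program runs. The sketch below takes a dictionary of settings and returns a list of problems. It assumes that every variable listed here is required, which the README does not state explicitly, and the name `check_settings` is hypothetical.

```python
# Variables whose values must end with a '/'.
DIRECTORY_VARIABLES = (
    "INPUT_DIRECTORY",
    "OUTPUT_DIRECTORY",
    "ERROR_DIRECTORY",
    "RDF_OUTPUT_FOLDER",
)
# Variables that must be complete URLs.
ENDPOINT_VARIABLES = ("TRIPLE_DATA_ENDPOINT", "WORD_COUNT_DATA_ENDPOINT")
OTHER_VARIABLES = ("OUTPUT_FORMAT", "OUTPUT_FILE_NAME", "ONTOLOGY_FILEPATH")


def check_settings(settings):
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    # Assumption: all variables are required.
    for name in DIRECTORY_VARIABLES + OTHER_VARIABLES + ENDPOINT_VARIABLES:
        if not settings.get(name):
            problems.append(f"{name} is missing or empty")
    for name in DIRECTORY_VARIABLES:
        value = settings.get(name)
        if value and not value.endswith("/"):
            problems.append(f"{name} must end with '/'")
    for name in ENDPOINT_VARIABLES:
        value = settings.get(name)
        if value and not value.startswith(("http://", "https://")):
            problems.append(f"{name} should be a complete URL")
    return problems


if __name__ == "__main__":
    import os

    for problem in check_settings(dict(os.environ)):
        print("Configuration problem:", problem)
```

Returning a list instead of raising on the first error lets all misconfigured values be reported in one pass.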