lodstorage package

Submodules

lodstorage.csv module

class lodstorage.csv.CSV(name)[source]

Bases: LOD

helper for converting data in csv format to list of dicts (LoD) and vice versa

Constructor

static fixTypes(lod: list)[source]

fixes the types of the given LoD.

static fromCSV(csvString: str, fields: Optional[list] = None, delimiter=',', quoting=2, **kwargs)[source]

convert given csv string to list of dicts (LOD)

Parameters
  • csvString (str) – csv string that should be converted to LOD

  • fields (list) – Names of the headers that should be used. If None, the header is assumed to be part of the csv string.

Returns

list of dicts (LoD) containing the content of the given csv string
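The conversion can be illustrated with a minimal sketch that uses only the standard library `csv` module. It is not the package's implementation, and the helper name `csv_to_lod` is hypothetical. The default `quoting=2` corresponds to `csv.QUOTE_NONNUMERIC`, which makes the standard reader convert every unquoted field to `float`, so the header and all string values must be quoted in the input:

```python
import csv
import io


def csv_to_lod(csv_string, fields=None, delimiter=",", quoting=csv.QUOTE_NONNUMERIC):
    """Hypothetical helper: parse a csv string into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(csv_string),
        fieldnames=fields,  # None: take the field names from the first line
        delimiter=delimiter,
        quoting=quoting,
    )
    return [dict(row) for row in reader]


csv_text = '"name","population"\n"Berlin",3644826\n"Paris",2148000\n'
lod = csv_to_lod(csv_text)
```

With this quoting mode the unquoted population values arrive as floats, which is one reason a type-fixing step such as `fixTypes` can be useful afterwards.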

static readFile(filename: str) str[source]

Reads the given filename and returns it as string

Parameters

filename (str) – Name of the file that should be returned as string

Returns

Content of the file as string

static restoreFromCSVFile(filePath: str, headerNames: Optional[list] = None, withPostfix: bool = False)[source]

restore LOD from given csv file

Parameters
  • filePath (str) – file name

  • headerNames (list) – Names of the headers that should be used. If None it is assumed that the header is given.

  • withPostfix (bool) – If False the file type is appended to given filePath. Otherwise file type MUST be given with filePath.

Returns

list of dicts (LoD) containing the content of the given csv file

static storeToCSVFile(lod: list, filePath: str, withPostfix: bool = False)[source]

converts the given lod to CSV file.

Parameters
  • lod (list) – lod that should be converted to csv file

  • filePath (str) – file name the csv should be stored to

  • withPostfix (bool) – If False the file type is appended to given filePath. Otherwise file type MUST be given with filePath.

Returns

csv string of the given lod

static toCSV(lod: list, includeFields: Optional[list] = None, excludeFields: Optional[list] = None, delimiter=',', quoting=2, **kwargs)[source]

converts the given lod to CSV string. For details about the csv dialect parameters see https://docs.python.org/3/library/csv.html#csv-fmt-params

Parameters
  • lod (list) – lod that should be converted to csv string

  • includeFields (list) – list of fields that should be included in the csv (positive list)

  • excludeFields (list) – list of fields that should be excluded from the csv (negative list)

  • kwargs – csv dialect parameters

Returns

csv string of the given lod
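A minimal sketch of the reverse direction, again using only the standard library and not the package's own code. The helper name `lod_to_csv` is hypothetical, and taking the column order from the first record is an assumption:

```python
import csv
import io


def lod_to_csv(lod, include_fields=None, exclude_fields=None,
               delimiter=",", quoting=csv.QUOTE_NONNUMERIC, **kwargs):
    """Hypothetical helper: serialize a list of dicts to a csv string."""
    if not lod:
        return ""
    # Assumption: the first record defines the column order.
    fields = list(lod[0].keys())
    if include_fields is not None:
        fields = [f for f in fields if f in include_fields]
    if exclude_fields is not None:
        fields = [f for f in fields if f not in exclude_fields]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        delimiter=delimiter,
        quoting=quoting,
        extrasaction="ignore",  # silently skip fields that were filtered out
        **kwargs,
    )
    writer.writeheader()
    writer.writerows(lod)
    return buffer.getvalue()


cities = [{"name": "Berlin", "population": 3644826, "country": "DE"}]
csv_text = lod_to_csv(cities, exclude_fields=["country"])
```

Extra keyword arguments are passed through to the writer, which mirrors the role of the csv dialect parameters described above.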

static writeFile(content: str, filename: str) str[source]

Write the given str to the given filename

Parameters
  • content (str) – string that should be written into the file

  • filename (str) – Name of the file the given str should be written to

Returns

Nothing

lodstorage.entity module

Created on 2020-08-19

@author: wf

class lodstorage.entity.EntityManager(name, entityName, entityPluralName: str, listName: Optional[str] = None, clazz=None, tableName: Optional[str] = None, primaryKey: Optional[str] = None, config=None, handleInvalidListTypes=False, filterInvalidListTypes=False, listSeparator='⇹', debug=False)[source]

Bases: YamlAbleMixin, JsonPickleMixin, JSONAbleList

generic entity manager

Constructor

Parameters
  • name (string) – name of this entity manager

  • entityName (string) – entityType to be managed e.g. Country

  • entityPluralName (string) – plural of the entityType e.g. Countries

  • config (StorageConfig) – the configuration to be used if None a default configuration will be used

  • handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered

  • filterInvalidListTypes (bool) – True if invalidListTypes should be deleted

  • listSeparator (str) – the symbol to use as a list separator

  • debug (boolean) – override debug setting when default of config is used via config=None

fromCache(force: bool = False, getListOfDicts=None, append=False, sampleRecordCount=-1)[source]

get my entries from the cache or from the callback provided

Parameters
  • force (bool) – force ignoring the cache

  • getListOfDicts (callable) – a function to call for getting the data

  • append (bool) – True if records should be appended

  • sampleRecordCount (int) – the number of records to analyze for type information

Returns

the list of Dicts and as a side effect setting self.cacheFile
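The cache-or-callback pattern described here can be sketched as follows. This is a simplified illustration that caches to a JSON file, not the entity manager's actual logic; the function name `from_cache` and the JSON cache format are assumptions:

```python
import json
import os
import tempfile


def from_cache(cache_file, get_list_of_dicts, force=False):
    """Return records from cache_file, or call the callback and fill the cache."""
    if os.path.isfile(cache_file) and not force:
        with open(cache_file, encoding="utf-8") as handle:
            return json.load(handle)
    # Cache miss (or force=True): fetch fresh data and store it.
    list_of_dicts = get_list_of_dicts()
    with open(cache_file, "w", encoding="utf-8") as handle:
        json.dump(list_of_dicts, handle)
    return list_of_dicts


calls = []


def fetch_cities():
    """Stand-in for an expensive query; records how often it runs."""
    calls.append(1)
    return [{"name": "Berlin"}]


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "cities.json")
    first = from_cache(cache_path, fetch_cities)               # callback runs
    second = from_cache(cache_path, fetch_cities)              # served from cache
    forced = from_cache(cache_path, fetch_cities, force=True)  # callback runs again
```

The callback is only invoked when no cache file exists or when `force` is set.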

fromStore(cacheFile=None, setList: bool = True) list[source]

restore me from the store

Parameters
  • cacheFile (String) – the cacheFile to use; if None the preconfigured cache file is used

  • setList (bool) – if True set my list with the data from the cache file

Returns

list of dicts or JSON entitymanager

Return type

list

getCacheFile(config=None, mode=StoreMode.SQL)[source]

get the cache file for this entity manager

Parameters
  • config (StorageConfig) – if None get the cache for my mode

  • mode (StoreMode) – the storeMode to use

getLoD()[source]

Return the LoD of the entities in the list

Returns

a list of Dicts

Return type

list

getSQLDB(cacheFile)[source]

get the SQL database for the given cacheFile

Parameters

cacheFile (string) – the file to get the SQL db from

initSQLDB(sqldb, listOfDicts=None, withCreate: bool = True, withDrop: bool = True, sampleRecordCount=-1)[source]

initialize my sql DB

Parameters
  • listOfDicts (list) – the list of dicts to analyze for type information

  • withDrop (boolean) – true if the existing Table should be dropped

  • withCreate (boolean) – true if the create Table command should be executed - false if only the entityInfo should be returned

  • sampleRecordCount (int) – the number of records to analyze for type information

Returns

the entity information such as CREATE Table command

Return type

EntityInfo

isCached()[source]

check whether there is a file containing cached data for me

removeCacheFile()[source]

remove my cache file

setNone(record, fields)[source]

make sure the given fields in the given record are set to None

Parameters
  • record (dict) – the record to work on

  • fields (list) – the list of fields to set to None

showProgress(msg)[source]

display a progress message

Parameters

msg (string) – the message to display

store(limit=10000000, batchSize=250, append=False, fixNone=True, sampleRecordCount=-1) str[source]

store my list of dicts

Parameters
  • limit (int) – maximum number of records to store

  • batchSize (int) – size of batch for storing

  • append (bool) – True if records should be appended

  • fixNone (bool) – if True make sure the dicts are filled with None references for each record

  • sampleRecordCount (int) – the number of records to analyze for type information

Returns

The cachefile being used

Return type

str

storeLoD(listOfDicts, limit=10000000, batchSize=250, cacheFile=None, append=False, fixNone=True, sampleRecordCount=1) str[source]

store my entities

Parameters
  • listOfDicts (list) – the list of dicts to store

  • limit (int) – maximum number of records to store

  • batchSize (int) – size of batch for storing

  • cacheFile (string) – the name of the storage e.g path to JSON or sqlite3 file

  • append (bool) – True if records should be appended

  • fixNone (bool) – if True make sure the dicts are filled with None references for each record

  • sampleRecordCount (int) – the number of records to analyze for type information

Returns

The cachefile being used

Return type

str
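The interplay of `limit`, `batchSize` and `fixNone` can be illustrated with a small sketch on top of the standard library `sqlite3` module. It is an assumption-laden simplification, not the package's storage code: the function name `store_lod` is hypothetical, columns are created without types, and missing fields are simply stored as NULL:

```python
import sqlite3


def store_lod(connection, table_name, list_of_dicts, limit=None, batch_size=250):
    """Store a list of dicts in an SQLite table, committing once per batch."""
    records = list_of_dicts if limit is None else list_of_dicts[:limit]
    if not records:
        return 0
    # Union of all keys: records lacking a field get NULL for it,
    # comparable to filling the gaps with None.
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    columns = ", ".join(f'"{field}"' for field in fields)
    placeholders = ", ".join("?" for _ in fields)
    connection.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({columns})')
    insert_cmd = f'INSERT INTO "{table_name}" ({columns}) VALUES ({placeholders})'
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        rows = [tuple(record.get(field) for field in fields) for record in batch]
        connection.executemany(insert_cmd, rows)
        connection.commit()
    return len(records)


connection = sqlite3.connect(":memory:")
royals = [{"name": f"Royal {i}", "rank": i} for i in range(5)]
royals.append({"name": "Royal without rank"})
stored = store_lod(connection, "Royal", royals, batch_size=2)
count = connection.execute('SELECT COUNT(*) FROM "Royal"').fetchone()[0]
missing_rank = connection.execute(
    'SELECT "rank" FROM "Royal" WHERE "name" = ?', ("Royal without rank",)
).fetchone()[0]
```

Committing per batch keeps individual transactions small, while `limit` caps the total number of records written.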

storeMode()[source]

return my store mode

lodstorage.jsonable module

This module has a class JSONAble for serialization of tables/list of dicts to and from JSON encoding

Created on 2020-09-03

@author: wf

class lodstorage.jsonable.JSONAble[source]

Bases: object

mixin to allow classes to be JSON serializable see

Constructor

asJSON(asString=True, data=None)[source]

recursively return my dict elements

Parameters

asString (boolean) – if True return my result as a string

checkExtension(jsonFile: str, extension: str = '.json') str[source]

make sure the jsonFile has the given extension e.g. “.json”

Parameters

jsonFile (str) – the jsonFile name - potentially without “.json” suffix

Returns

the jsonFile name with “.json” as an extension guaranteed

Return type

str
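The extension check amounts to a conditional suffix. A minimal sketch, assuming the helper appends the extension only when the name does not already end with it (the function name `check_extension` is hypothetical):

```python
def check_extension(json_file, extension=".json"):
    """Return json_file with the given extension appended if it is missing."""
    if not json_file.endswith(extension):
        json_file = f"{json_file}{extension}"
    return json_file


with_suffix = check_extension("royals")
unchanged = check_extension("royals.json")
pickled = check_extension("royals", extension=".jsonpickle")
```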

fromDict(data: dict)[source]

initialize me from the given data

Parameters

data (dict) – the dictionary to initialize me from

fromJson(jsonStr)[source]

initialize me from the given JSON string

Parameters

jsonStr (str) – the JSON string

getJSONValue(v)[source]

get the value of the given v as JSON

Parameters

v (object) – the value to get

Returns

the value, making sure objects are returned as dicts

getJsonTypeSamples()[source]

does my class provide a “getSamples” method?

static getJsonTypeSamplesForClass(cls)[source]

return the type samples for the given class

Returns

a list of dict that specify the types by example

Return type

list

classmethod getPluralname()[source]

static readJsonFromFile(jsonFilePath)[source]

read json string from the given jsonFilePath

Parameters

jsonFilePath (string) – the path of the file where to read the result from

Returns

the JSON string read from the file

reprDict(srcDict)[source]

get the given srcDict as new dict with fields being converted with getJSONValue

Parameters

srcDict (dict) – the source dictionary

Returns

dict: the converted dictionary

restoreFromJsonFile(jsonFile: str)[source]

restore me from the given jsonFile

Parameters

jsonFile (string) – the jsonFile to restore me from

static singleQuoteToDoubleQuote(singleQuoted, useRegex=False)[source]

convert a single quoted string to a double quoted one

Parameters
  • singleQuoted (str) – a single quoted string e.g. {'cities': [{'name': "Upper Hell's Gate"}]}

  • useRegex (boolean) – True if a regular expression shall be used for matching

Returns

the double quoted version of the string

Return type

string

static singleQuoteToDoubleQuoteUsingBracketLoop(singleQuoted)[source]

convert a single quoted string to a double quoted one using a bracket loop

Parameters

singleQuoted (string) – a single quoted string e.g. {'cities': [{'name': "Upper Hell's Gate"}]}

Returns

the double quoted version of the string e.g.

Return type

string

static singleQuoteToDoubleQuoteUsingRegex(singleQuoted)[source]

convert a single quoted string to a double quoted one using a regular expression

Parameters

singleQuoted (string) – a single quoted string e.g. {'cities': [{'name': "Upper Hell's Gate"}]}

Returns

the double quoted version of the string e.g.

Return type

string
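A minimal sketch of the regex-based conversion, using the same pattern that `JSONAbleSettings.singleQuoteRegex` documents (a single quote that is not preceded by a backslash). The function name is hypothetical and the sketch is not the package's implementation:

```python
import json
import re

# Matches every single quote that is not escaped with a backslash.
single_quote_regex = re.compile("(?<!\\\\)'")


def single_quote_to_double_quote(single_quoted):
    """Replace unescaped single quotes with double quotes."""
    return single_quote_regex.sub('"', single_quoted)


single_quoted = "{'name': 'Berlin', 'population': 3644826}"
double_quoted = single_quote_to_double_quote(single_quoted)
record = json.loads(double_quoted)
```

A plain substitution like this also rewrites apostrophes inside values, so an input such as the "Upper Hell's Gate" example would not survive it intact. That limitation is presumably why a separate bracket loop variant exists.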

static storeJsonToFile(jsonStr, jsonFilePath)[source]

store the given json string to the given jsonFilePath

Parameters
  • jsonStr (string) – the string to store

  • jsonFilePath (string) – the path of the file where to store the result

storeToJsonFile(jsonFile: str, extension: str = '.json', limitToSampleFields: bool = False)[source]

store me to the given jsonFile

Parameters
  • jsonFile (str) – the JSON file name (optionally without extension)

  • extension (str) – the extension to use if not part of the jsonFile name

  • limitToSampleFields (bool) – If True the returned JSON is limited to the attributes/fields that are present in the samples. Otherwise all attributes of the object will be included. Default is False.

toJSON(limitToSampleFields: bool = False)[source]

Parameters

limitToSampleFields (bool) – If True the returned JSON is limited to the attributes/fields that are present in the samples. Otherwise all attributes of the object will be included. Default is False.

Returns

a recursive JSON dump of the dicts of my objects

toJsonAbleValue(v)[source]

return the JSON able value of the given value v

Parameters

v (object) – the value to convert

class lodstorage.jsonable.JSONAbleList(listName: Optional[str] = None, clazz=None, tableName: Optional[str] = None, initList: bool = True, handleInvalidListTypes=False, filterInvalidListTypes=False)[source]

Bases: JSONAble

Container class

Constructor

Parameters
  • listName (str) – the name of the list attribute to be used for storing the List

  • clazz (class) – a class to be used for Object relational mapping (if any)

  • tableName (str) – the name of the “table” to be used

  • initList (bool) – True if the list should be initialized

  • handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered

  • filterInvalidListTypes (bool) – True if invalidListTypes should be deleted

asJSON(asString=True)[source]

recursively return my dict elements

Parameters

asString (boolean) – if True return my result as a string

fromJson(jsonStr, types=None)[source]

initialize me from the given JSON string

Parameters
  • jsonStr (str) – the JSON string

  • types (Types) – the types to be fixed

fromLoD(lod, append: bool = True, debug: bool = False)[source]

load my entityList from the given list of dicts

Parameters
  • lod (list) – the list of dicts to load

  • append (bool) – if True append to my existing entries

Returns

a list of errors (if any)

Return type

list

getJsonData()[source]

get my Jsondata

getList()[source]

get my list

getLoDfromJson(jsonStr: str, types=None, listName: Optional[str] = None)[source]

get a list of Dicts from the given JSON String

Parameters
  • jsonStr (str) – the JSON string

  • types (Types) – the types to be fixed

Returns

a list of dicts

Return type

list

getLookup(attrName: str, withDuplicates: bool = False)[source]

create a lookup dictionary by the given attribute name

Parameters
  • attrName (str) – the attribute to lookup

  • withDuplicates (bool) – whether to retain single values or lists

Returns

a dictionary for lookup, or a tuple of (dictionary, list of duplicates), depending on withDuplicates

readLodFromJsonFile(jsonFile: str, extension: str = '.json')[source]

read the list of dicts from the given jsonFile

Parameters

jsonFile (string) – the jsonFile to read from

Returns

a list of dicts

Return type

list

readLodFromJsonStr(jsonStr) list[source]

restore me from the given jsonStr

Parameters

jsonStr (str) – the JSON string to read the list of dicts from

restoreFromJsonFile(jsonFile: str) list[source]

read my list of dicts and restore it

restoreFromJsonStr(jsonStr: str) list[source]

restore me from the given jsonStr

Parameters

jsonStr (str) – the json string to restore me from

setListFromLoD(lod: list) list[source]

set my list from the given list of dicts

Parameters

lod (list) –

Returns

a list of dicts if no clazz is set

otherwise a list of objects

Return type

list

toJsonAbleValue(v)[source]

make sure we don’t store our meta information clazz, tableName and listName but just the list we are holding

class lodstorage.jsonable.JSONAbleSettings[source]

Bases: object

settings for JSONAble - put in a separate class so they would not be serialized

indent = 4

regular expression to be used for conversion from singleQuote to doubleQuote see https://stackoverflow.com/a/50257217/1497139

singleQuoteRegex = re.compile("(?<!\\\\)'")

class lodstorage.jsonable.Types(name: str, warnOnUnsupportedTypes=True, debug=False)[source]

Bases: JSONAble

holds entity meta Info

Variables

name (string) – entity name = table name

Constructor

Parameters
  • name (str) – the name of the type map

  • warnOnUnsupportedTypes (bool) – if TRUE warn if an item value has an unsupported type

  • debug (bool) – if True - debugging information should be shown

addType(listName, field, valueType)[source]

add the python type for the given field to the typeMap

Parameters
  • listName (string) – the name of the list of the field

  • field (string) – the name of the field

  • valueType (type) – the python type of the field

fixListOfDicts(typeMap, listOfDicts)[source]

fix the type in the given list of Dicts

fixTypes(lod: list, listName: str)[source]

fix the types in the given data structure

Parameters
  • lod (list) – a list of dicts

  • listName (str) – the types to lookup by list name

static forTable(instance, listName: str, warnOnUnsupportedTypes: bool = True, debug=False)[source]

get the types for the list of Dicts (table) in the given instance with the given listName

Parameters
  • instance (object) – the instance to inspect

  • listName (string) – the list of dicts to inspect

  • warnOnUnsupportedTypes (bool) – if TRUE warn if an item value has an unsupported type

  • debug (bool) – True if debugging information should be shown

Returns

a types object

Return type

Types

getType(typeName)[source]

get the type for the given type name

getTypes(listName: str, sampleRecords: list, limit: int = 10)[source]

determine the types for the given sample records

Parameters
  • listName (str) – the name of the list

  • sampleRecords (list) – a list of items

  • limit (int) – the maximum number of items to check

getTypesForItems(listName: str, items: list, warnOnNone: bool = False)[source]

get the types for the given items; as a side effect my types are set

Parameters
  • listName (str) – the name of the list

  • items (list) – a list of items

  • warnOnNone (bool) – if TRUE warn if an item value is None
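Determining types "by example" can be sketched as a scan over sample records that notes the Python type name of the first non-None value per field. This is an illustration only; the function name `get_types_for_items` and the flat field-to-type-name map are assumptions, and the real class keeps its map per list name:

```python
import datetime


def get_types_for_items(items, limit=10):
    """Map each field to the type name of its first non-None sample value."""
    type_map = {}
    for item in items[:limit]:
        for field, value in item.items():
            # None carries no type information, so wait for a later sample.
            if value is not None and field not in type_map:
                type_map[field] = type(value).__name__
    return type_map


samples = [
    {"id": 1, "label": "first", "created": datetime.date(2020, 8, 19), "score": None},
    {"id": 2, "label": "second", "created": datetime.date(2020, 9, 3), "score": 0.5},
]
type_map = get_types_for_items(samples)
```

The resulting names (`int`, `str`, `date`, `float`) line up with the keys of the `typeName2Type` mapping, which is what allows a stored type map to be turned back into Python types.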

typeName2Type = {'bool': <class 'bool'>, 'date': <class 'datetime.date'>, 'datetime': <class 'datetime.datetime'>, 'float': <class 'float'>, 'int': <class 'int'>, 'str': <class 'str'>}

lodstorage.jsonpicklemixin module

class lodstorage.jsonpicklemixin.JsonPickleMixin[source]

Bases: object

allow reading and writing derived objects from a jsonpickle file

asJsonPickle() str[source]

convert me to JSON

Returns

a JSON String with my JSON representation

Return type

str

static checkExtension(jsonFile: str, extension: str = '.json') str[source]

make sure the jsonFile has the given extension e.g. “.json”

Parameters

jsonFile (str) – the jsonFile name - potentially without “.json” suffix

Returns

the jsonFile name with “.json” as an extension guaranteed

Return type

str

debug = False

static readJsonPickle(jsonFileName, extension='.jsonpickle')[source]

Parameters
  • jsonFileName (str) – name of the file (optionally without “.json” postfix)

  • extension (str) – default file extension

writeJsonPickle(jsonFileName: str, extension: str = '.jsonpickle')[source]

write me to the json file with the given name (optionally without postfix)

Parameters
  • jsonFileName (str) – name of the file (optionally without “.json” postfix)

  • extension (str) – default file extension

lodstorage.lod module

Created on 2021-01-31

@author: wf

class lodstorage.lod.LOD(name)[source]

Bases: object

list of Dict aka Table

Constructor

static addLookup(lookup, duplicates, record, value, withDuplicates: bool)[source]

add a single lookup result

Parameters
  • lookup (dict) – the lookup map

  • duplicates (list) – the list of duplicates

  • record (dict) – the current record

  • value (object) – the current value to lookup

  • withDuplicates (bool) – if True duplicates should be allowed and lists returned; if False a separate duplicates list is created

static filterFields(lod: list, fields: list, reverse: bool = False)[source]

filter the given LoD with the given list of fields by either limiting the LoD to the fields or removing the fields contained in the list depending on the state of the reverse parameter

Parameters
  • lod (list) – list of dicts to be filtered

  • fields (list) – list of fields to keep or to remove, depending on reverse

  • reverse (bool) – If True limit dict to the list of given fields. Otherwise exclude the fields from the dict.

Returns

LoD
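A minimal sketch of the two filter modes, written as a standalone function rather than the package's static method (the name `filter_fields` is hypothetical, and returning new dicts instead of modifying the input is an assumption):

```python
def filter_fields(lod, fields, reverse=False):
    """Keep only the given fields (reverse=True) or remove them (reverse=False)."""
    if reverse:
        return [{k: v for k, v in record.items() if k in fields} for record in lod]
    return [{k: v for k, v in record.items() if k not in fields} for record in lod]


cities = [{"name": "Berlin", "country": "DE", "population": 3644826}]
without_population = filter_fields(cities, ["population"])
only_name = filter_fields(cities, ["name"], reverse=True)
```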

static getFields(listOfDicts, sampleCount: Optional[int] = None)[source]

static getLookup(lod: list, attrName: str, withDuplicates: bool = False)[source]

create a lookup dictionary by the given attribute name for the given list of dicts

Parameters
  • lod (list) – the list of dicts to get the lookup dictionary for

  • attrName (str) – the attribute to lookup

  • withDuplicates (bool) – whether to retain single values or lists

Returns

a dictionary for lookup
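The lookup idea can be sketched as follows. The function name `get_lookup` and the choice to always return a `(lookup, duplicates)` pair are assumptions made for the illustration; the documented method may shape its return value differently:

```python
def get_lookup(lod, attr_name, with_duplicates=False):
    """Index records by attr_name; collect or group records with repeated values."""
    lookup = {}
    duplicates = []
    for record in lod:
        if attr_name not in record:
            continue
        value = record[attr_name]
        if with_duplicates:
            # Every key maps to the list of all records sharing the value.
            lookup.setdefault(value, []).append(record)
        elif value in lookup:
            # Keep the first record and report later ones separately.
            duplicates.append(record)
        else:
            lookup[value] = record
    return lookup, duplicates


cities = [
    {"name": "Berlin", "country": "DE"},
    {"name": "Paris", "country": "FR"},
    {"name": "Hamburg", "country": "DE"},
]
by_country, duplicate_records = get_lookup(cities, "country")
grouped, no_duplicates = get_lookup(cities, "country", with_duplicates=True)
```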

classmethod handleListTypes(lod, doFilter=False, separator=',')[source]

handle list types in the given list of dicts

Parameters
  • cls – this class

  • lod (list) – a list of dicts

  • doFilter (bool) – True if records containing lists value items should be filtered

  • separator (str) – the separator to use when converting lists
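List-valued fields cannot be stored directly in a flat table column, so they are either converted or filtered. The sketch below shows one possible reading, in which filtering drops the list-valued fields and conversion joins the list items with the separator. Both the function name `handle_list_types` and this interpretation of `doFilter` are assumptions:

```python
def handle_list_types(lod, do_filter=False, separator=","):
    """Return a copy of lod with list values joined into strings or dropped."""
    result = []
    for record in lod:
        handled = {}
        for key, value in record.items():
            if isinstance(value, list):
                if do_filter:
                    continue  # drop the list-valued field
                value = separator.join(str(item) for item in value)
            handled[key] = value
        result.append(handled)
    return result


records = [{"name": "Berlin", "tags": ["capital", "city"]}]
converted = handle_list_types(records, separator="|")
filtered = handle_list_types(records, do_filter=True)
```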

static intersect(listOfDict1, listOfDict2, key=None)[source]

get the intersection of the two lists of Dicts by the given key

static setNone(record, fields)[source]

make sure the given fields in the given record are set to None

Parameters
  • record (dict) – the record to work on

  • fields (list) – the list of fields to set to None

static setNone4List(listOfDicts, fields)[source]

set the given fields to None for the records in the given listOfDicts if they are not set

Parameters
  • listOfDicts (list) – the list of records to work on

  • fields (list) – the list of fields to set to None
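Filling in missing fields makes every record expose the same keys, which simplifies deriving a table layout. A minimal sketch with hypothetical standalone function names, assuming that existing values are left untouched:

```python
def set_none(record, fields):
    """Add each missing field to the record with the value None."""
    for field in fields:
        if field not in record:
            record[field] = None


def set_none_for_list(list_of_dicts, fields):
    """Apply set_none to every record in the list."""
    for record in list_of_dicts:
        set_none(record, fields)


royals = [{"name": "Elizabeth", "born": 1926}, {"name": "Charles"}]
set_none_for_list(royals, ["born", "title"])
```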

static sortKey(d, key=None)[source]

get the sort key for the given dict d with the given key

lodstorage.mwTable module

Created on 2020-08-21

@author: wf

class lodstorage.mwTable.MediaWikiTable(wikiTable=True, colFormats=None, sortable=True, withNewLines=False)[source]

Bases: object

helper for https://www.mediawiki.org/wiki/Help:Tables

Constructor

addHeader(record)[source]

add the given record as a “sample” header

addRow4Dict(record)[source]

asWikiMarkup()[source]

convert me to MediaWiki markup

Returns

the MediaWiki Markup for this table

Return type

string
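For orientation, the kind of markup such a table helper produces can be sketched with a small standalone function. The name `lod_to_wiki_markup` is hypothetical and the exact layout emitted by `MediaWikiTable` may differ; the sketch follows the table syntax described on the linked MediaWiki help page:

```python
def lod_to_wiki_markup(list_of_dicts, sortable=True):
    """Render a list of dicts as a MediaWiki table; the first record sets the header."""
    if not list_of_dicts:
        return ""
    css_class = "wikitable sortable" if sortable else "wikitable"
    headers = list(list_of_dicts[0].keys())
    lines = [f'{{| class="{css_class}"', "|-"]
    lines.append("! " + " !! ".join(headers))
    for record in list_of_dicts:
        lines.append("|-")
        # Missing or None values are rendered as empty cells.
        cells = ["" if record.get(h) is None else str(record.get(h)) for h in headers]
        lines.append("| " + " || ".join(cells))
    lines.append("|}")
    return "\n".join(lines)


markup = lod_to_wiki_markup([{"name": "Berlin", "country": "DE"}])
```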

fromListOfDicts(listOfDicts)[source]

noneReplace(value)[source]

lodstorage.plot module

Created on 2020-07-05

@author: wf

class lodstorage.plot.Plot(valueList, title, xlabel=None, ylabel=None, gformat='.png', fontsize=12, plotdir=None, debug=False)[source]

Bases: object

create Plot based on counters see https://stackoverflow.com/questions/19198920/using-counter-in-python-to-build-histogram

Constructor

barchart(mode='show')[source]

barchart based histogram for the given counter

hist(mode='show')[source]

create histogram for the given counter

showDebug()[source]

showMe(mode='show', close=True)[source]

show me in the given mode

titleMe()[source]

set my title and labels

lodstorage.query module

Created on 2020-08-22

@author: wf

class lodstorage.query.Endpoint[source]

Bases: JSONAble

a query endpoint

constructor for setting defaults

static getSamples()[source]

class lodstorage.query.EndpointManager[source]

Bases: object

manages a set of SPARQL endpoints

static getEndpointNames(endpointPath=None) list[source]

Returns a list of all available endpoint names

static getEndpoints(endpointPath=None)[source]

get the endpoints for the given endpointPath

class lodstorage.query.Format(value)[source]

Bases: Enum

the supported formats for the results to be delivered

csv = 'csv'

github = 'github'

json = 'json'

latex = 'latex'

mediawiki = 'mediawiki'

tsv = 'tsv'

xml = 'xml'

class lodstorage.query.Query(name: str, query: str, lang='sparql', endpoint: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, prefixes=None, tryItUrl: Optional[str] = None, formats: Optional[list] = None, debug=False)[source]

Bases: object

a Query e.g. for SPARQL

constructor

Parameters
  • name (string) – the name/label of the query

  • query (string) – the native Query text e.g. in SPARQL

  • lang (string) – the language of the query e.g. SPARQL

  • endpoint (string) – the endpoint url to use

  • title (string) – the header/title of the query

  • description (string) – the description of the query

  • prefixes (list) – list of prefixes to be resolved

  • tryItUrl (str) – the url of a “tryit” webpage

  • formats (list) – key,value pairs of ValueFormatters to be applied

  • debug (boolean) – true if debug mode should be switched on

addFormatCallBack(callback)[source]

asWikiMarkup(listOfDicts)[source]

convert the given listOfDicts result to MediaWiki markup

Parameters

listOfDicts (list) – the list of Dicts to convert to MediaWiki markup

Returns

the markup

Return type

string

asWikiSourceMarkup()[source]

convert me to Mediawiki markup for syntax highlighting using the “source” tag

Returns

the Markup

Return type

string

asYaml()[source]

documentQueryResult(qlod: list, limit=None, tablefmt: str = 'mediawiki', tryItUrl: Optional[str] = None, withSourceCode=True, **kwArgs)[source]

document the given query results - note that a copy of the whole list is going to be created for being able to format

Parameters
  • qlod – the list of dicts result

  • limit (int) – the maximum number of records to display in result tabulate

  • tablefmt (str) – the table format to use

  • tryItUrl – the “try it!” url to show

  • withSourceCode (bool) – if True document the source code

Returns

the documentation tabular text for the given parameters

Return type

str

formatWithValueFormatters(lod, tablefmt: str)[source]

format the given list of Dicts with the ValueFormatters

convert the given url and title to a link for the given tablefmt

Parameters
  • url (str) – the url to convert

  • title (str) – the title to show

  • tablefmt (str) – the table format to use

getTryItUrl(baseurl: str)[source]

return the “try it!” url for the given baseurl

Parameters

baseurl (str) – the baseurl to use

Returns

the “try it!” url for the given query

Return type

str

preFormatWithCallBacks(lod, tablefmt: str)[source]

run the configured call backs to pre-format the given list of dicts for the given tableformat

Parameters
  • lod (list) – the list of dicts to handle

  • tablefmt (str) – the table format (according to tabulate) to apply

convert url prefixes to link according to the given table format TODO - refactor as preFormat callback

Parameters
  • lod (list) – the list of dicts to convert

  • prefix (str) – the prefix to strip

  • tablefmt (str) – the tabulate tableformat to use

class lodstorage.query.QueryManager(lang: Optional[str] = None, debug=False, queriesPath=None)[source]

Bases: object

manages pre packaged Queries

Constructor

Parameters
  • lang (string) – the language to use for the queries: sql or sparql

  • debug (boolean) – True if debug information should be shown

static getQueries(queriesPath=None)[source]

get the queries for the given queries Path

class lodstorage.query.QueryResultDocumentation(query, title: str, tablefmt: str, tryItMarkup: str, sourceCodeHeader: str, sourceCode: str, resultHeader: str, result: str)[source]

Bases: object

documentation of a query result

constructor

Parameters
  • query (Query) – the query to be documented

  • title (str) – the title markup

  • tablefmt (str) – the tableformat that has been used

  • tryItMarkup – the “try it!” markup to show

  • sourceCodeHeader (str) – the header title to use for the sourceCode

  • sourceCode (str) – the sourceCode

  • resultHeader (str) – the header title to use for the result

  • result (str) – the result header

asText()[source]

return my text representation

Returns

description, sourceCodeHeader, sourceCode, tryIt link and result table

Return type

str

static uniCode2Latex(text: str, withConvert: bool = False) str[source]

converts unicode text to latex and fixes UTF-8 chars for latex in a certain range:

₀:$_0$ … ₉:$_9$

see https://github.com/phfaist/pylatexenc/issues/72

Parameters
  • text (str) – the string to fix

  • withConvert (bool) – True if unicode to latex library conversion should be used

Returns

latex presentation of UTF-8 char

Return type

str
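The subscript fix described above (₀ becomes `$_0$` and so on up to ₉) can be sketched without any third-party library. The function name `subscripts_to_latex` is hypothetical, and the sketch covers only the subscript digit range, not the optional library conversion:

```python
def subscripts_to_latex(text):
    """Replace the Unicode subscript digits U+2080..U+2089 with LaTeX math subscripts."""
    for digit in range(10):
        text = text.replace(chr(0x2080 + digit), f"$_{digit}$")
    return text


water = subscripts_to_latex("H" + chr(0x2082) + "O")
plain = subscripts_to_latex("no subscripts here")
```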

class lodstorage.query.QuerySyntaxHighlight(query, highlightFormat: str = 'html')[source]

Bases: object

Syntax highlighting for queries with pygments

construct me for the given query and highlightFormat

Parameters
  • query (Query) – the query to do the syntax highlighting for

  • highlightFormat (str) – the highlight format to be used

highlight()[source]

Returns

the result of the syntax highlighting with pygments

Return type

str

class lodstorage.query.ValueFormatter(name: str, formatString: str, regexps: Optional[list] = None)[source]

Bases: object

a value Formatter

constructor

Parameters
  • formatString (str) – the format String to use

  • regexps (list) – the regular expressions to apply

applyFormat(record, key, resultFormat: Format)[source]

apply the given format to the given record

Parameters
  • record (dict) – the record to handle

  • key (str) – the property key

  • resultFormat (Format) – the resultFormat Style to apply

formatsPath = '/Users/wf/Documents/pyworkspace/pyLoDStorage/lodstorage/../sampledata/formats.yaml'

classmethod fromDict(name: str, record: dict)[source]

create a ValueFormatter from the given dict

classmethod getFormats(formatsPath: Optional[str] = None) dict[source]

get the available ValueFormatters

Parameters

formatsPath (str) – the path to the yaml file to read the format specs from

Returns

a map for ValueFormatters by formatter Name

Return type

dict

home = '/Users/wf'

valueFormats = None

class lodstorage.query.YamlPath[source]

Bases: object

static getPaths(yamlFileName: str, yamlPath: Optional[str] = None)[source]

lodstorage.sample module

Created on 2020-08-24

@author: wf

class lodstorage.sample.Cities(load=False)[source]

Bases: JSONAbleList

Constructor

Parameters
  • listName (str) – the name of the list attribute to be used for storing the List

  • clazz (class) – a class to be used for Object relational mapping (if any)

  • tableName (str) – the name of the “table” to be used

  • initList (bool) – True if the list should be initialized

  • handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered

  • filterInvalidListTypes (bool) – True if invalidListTypes should be deleted

class lodstorage.sample.Royal[source]

Bases: JSONAble

i am a single Royal

Constructor

classmethod getSamples()[source]

class lodstorage.sample.Royals(load=False)[source]

Bases: JSONAbleList

a non ORM Royals list

Constructor

Parameters
  • listName (str) – the name of the list attribute to be used for storing the List

  • clazz (class) – a class to be used for Object relational mapping (if any)

  • tableName (str) – the name of the “table” to be used

  • initList (bool) – True if the list should be initialized

  • handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered

  • filterInvalidListTypes (bool) – True if invalidListTypes should be deleted

class lodstorage.sample.RoyalsORMList(load=False)[source]

Bases: JSONAbleList

Constructor

Parameters
  • listName (str) – the name of the list attribute to be used for storing the List

  • clazz (class) – a class to be used for Object relational mapping (if any)

  • tableName (str) – the name of the “table” to be used

  • initList (bool) – True if the list should be initialized

  • handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered

  • filterInvalidListTypes (bool) – True if invalidListTypes should be deleted

class lodstorage.sample.Sample[source]

Bases: object

Sample dataset generator

Constructor

cityList = None

static dob(isoDateString)[source]

get the date of birth from the given iso date string

static getCities()[source]

get a list of cities

static getCountries()[source]

static getRoyals()[source]

static getRoyalsInstances()[source]

static getSample(size)[source]

lodstorage.schema module

Created on 2021-01-26

@author: wf

class lodstorage.schema.Schema(name: str, title: str)[source]

Bases: object

a relational Schema

Constructor

Parameters
  • name (str) – the name of the schema

  • title (str) – the title of the schema

static generalizeColumn(tableList, colName: str)[source]

remove the column with the given name from all tables in the tablelist and return it

Parameters
  • tableList (list) – a list of Tables

  • colName (string) – the name of the column to generalize

Returns

the column having been generalized and removed

Return type

string

static getGeneral(tableList, name: str, debug: bool = False)[source]

derive a general table from the given table list

Parameters
  • tableList (list) – a list of tables

  • name (str) – name of the general table

  • debug (bool) – True if column names should be shown

Returns

a table dict for the generalized table
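
The following is a minimal sketch of what deriving a general table could look like. It assumes that each table is a dict with a "name" and a "columns" list whose entries carry a "name" key, and that "general" means the columns shared by all tables. Both are assumptions for illustration; `common_columns` is a hypothetical helper, not part of lodstorage.

```python
def common_columns(table_list):
    """Return the column names shared by every table, in first-table order."""
    if not table_list:
        return []
    # one set of column names per table, intersected across all tables
    shared = set.intersection(
        *({col["name"] for col in table["columns"]} for table in table_list)
    )
    return [col["name"] for col in table_list[0]["columns"] if col["name"] in shared]


# assumed table structure (hypothetical sample data)
tables = [
    {"name": "Person", "columns": [{"name": "name"}, {"name": "born"}]},
    {"name": "City", "columns": [{"name": "name"}, {"name": "population"}]},
]
general = {"name": "Thing", "columns": [{"name": c} for c in common_columns(tables)]}
```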

static getGeneralViewDDL(tableList, name: str, debug=False) str[source]

get the DDL statement to create a general view

Parameters
  • tableList – the list of tables

  • name (str) – the name of the view

  • debug (bool) – True if debug should be set

class lodstorage.schema.SchemaManager(schemaDefs=None, baseUrl: Optional[str] = None)[source]

Bases: object

a manager for schemas

Constructor

Parameters
  • schemaDefs (dict) – a dictionary of schema names

  • baseUrl (str) – the base url to use for links

lodstorage.sparql module

Created on 2020-08-14

@author: wf

class lodstorage.sparql.SPARQL(url, mode='query', debug=False, typedLiterals=False, profile=False, agent='PyLodStorage', method='POST')[source]

Bases: object

wrapper for SPARQL endpoints, e.g. Apache Jena, Virtuoso, Blazegraph

Variables
  • url – full endpoint url (including mode)

  • mode – ‘query’ or ‘update’

  • debug – True if debugging is active

  • typedLiterals – True if INSERT should be done with typedLiterals

  • profile(boolean) – True if profiling / timing information should be displayed

  • sparql – the SPARQLWrapper2 instance to be used

  • method(str) – the HTTP method to be used ‘POST’ or ‘GET’

Constructor a SPARQL wrapper

Parameters
  • url (string) – the base URL of the endpoint - the mode query/update is going to be appended

  • mode (string) – ‘query’ or ‘update’

  • debug (bool) – True if debugging is to be activated

  • typedLiterals (bool) – True if INSERT should be done with typedLiterals

  • profile (boolean) – True if profiling / timing information should be displayed

  • agent (string) – the User agent to use

  • method (string) – the HTTP method to be used ‘POST’ or ‘GET’

asListOfDicts(records, fixNone: bool = False, sampleCount: Optional[int] = None)[source]

convert SPARQL result back to python native

Parameters
  • records (list) – the list of bindings

  • fixNone (bool) – if True add None values for empty columns in Dict

  • sampleCount (int) – the number of samples to check

Returns

a list of Dicts

Return type

list
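
As a rough illustration of converting bindings back to native Python values, the sketch below works on plain dicts in the W3C SPARQL JSON results layout rather than on SPARQLWrapper objects. The converter table, the function name `bindings_to_lod` and the explicit `variables` argument are assumptions made for this example; the library itself derives the columns from sample records.

```python
from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# hypothetical converter table: XSD datatype IRI -> Python constructor
CONVERTERS = {
    XSD + "integer": int,
    XSD + "decimal": float,
    XSD + "double": float,
    XSD + "boolean": lambda v: v == "true",
    XSD + "date": date.fromisoformat,
    XSD + "dateTime": datetime.fromisoformat,
}


def bindings_to_lod(bindings, variables=None, fix_none=False):
    """Convert SPARQL JSON bindings to a list of dicts with native values."""
    lod = []
    for binding in bindings:
        record = {}
        for var, cell in binding.items():
            # untyped literals and IRIs are kept as strings
            convert = CONVERTERS.get(cell.get("datatype"), str)
            record[var] = convert(cell["value"])
        if fix_none and variables:
            # add None for variables that are unbound in this row
            for var in variables:
                record.setdefault(var, None)
        lod.append(record)
    return lod


bindings = [
    {
        "name": {"type": "literal", "value": "Elizabeth"},
        "age": {"type": "literal", "datatype": XSD + "integer", "value": "94"},
    },
    {"name": {"type": "literal", "value": "Charles"}},
]
lod = bindings_to_lod(bindings, variables=["name", "age"], fix_none=True)
```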

controlChars = ['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\x08', '\t', '\n', '\x0b', '\x0c', '\r', '\x0e', '\x0f', '\x10', '\x11', '\x12', '\x13', '\x14', '\x15', '\x16', '\x17', '\x18', '\x19', '\x1a', '\x1b', '\x1c', '\x1d', '\x1e', '\x1f']
static controlEscape(s)[source]

escape control characters

see https://stackoverflow.com/a/9778992/1497139
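
One way to escape the characters listed in controlChars is a translation table. The exact output format of controlEscape is not stated here, so the `\xNN` notation below is an assumption, and `control_escape` is a hypothetical stand-in.

```python
# the 32 ASCII control characters, as in controlChars
CONTROL_CHARS = [chr(code) for code in range(32)]

# map each control character to a visible \xNN escape (assumed format)
ESCAPE_TABLE = {ord(ch): "\\x{:02x}".format(ord(ch)) for ch in CONTROL_CHARS}


def control_escape(s):
    """Replace every control character in s by its \\xNN escape."""
    return s.translate(ESCAPE_TABLE)


escaped = control_escape("line1\nline2\tend")
```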

getFirst(qLod: list, attr: str)[source]

get the column attr of the first row of the given qLod list

Parameters
  • qLod (list) – the list of dicts (returned by a query)

  • attr (str) – the attribute to retrieve

Returns

the value

Return type

object

getLocalName(name)[source]

retrieve valid localname from a string based primary key https://www.w3.org/TR/sparql11-query/#prefNames

Parameters

name (string) – the name to convert

Returns

a valid local name

Return type

string

getResults(jsonResult)[source]

get the result from the given jsonResult

Parameters

jsonResult – the JSON encoded result

Returns

the list of bindings

Return type

list

getValue(sparqlQuery: str, attr: str)[source]

get the value for the given SPARQL query using the given attr

Parameters
  • sparql (SPARQL) – the SPARQL endpoint to get the value for

  • sparqlQuery (str) – the SPARQL query to run

  • attr (str) – the attribute to get

getValues(sparqlQuery: str, attrList: list)[source]

get values for the given sparqlQuery and attribute list

insert(insertCommand)[source]

run an insert

Parameters

insertCommand (string) – the SPARQL INSERT command

Returns

a response

insertListOfDicts(listOfDicts, entityType, primaryKey, prefixes, limit=None, batchSize=None, profile=False)[source]

insert the given list of dicts mapping datatypes

Parameters
  • entityType (string) – the entityType to use as a

  • primaryKey (string) – the name of the primary key attribute to use

  • prefixes (string) – any PREFIX statements to be used

  • limit (int) – maximum number of records to insert

  • batchSize (int) – number of records to send per request

Returns

a list of errors which should be empty on full success

datatype mapping according to https://www.w3.org/TR/xmlschema-2/#built-in-datatypes

mapped from https://docs.python.org/3/library/stdtypes.html

compare to

  • https://www.w3.org/2001/sw/rdb2rdf/directGraph/

  • http://www.bobdc.com/blog/json2rdf/

  • https://www.w3.org/TR/json-ld11-api/#data-round-tripping

  • https://stackoverflow.com/questions/29030231/json-to-rdf-xml-file-in-python
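
The datatype mapping and batching described for insertListOfDicts can be sketched as follows. The concrete Python-to-XSD pairs are one plausible choice, not necessarily the one lodstorage uses, and `xsd_type`, `typed_literal` and `batches` are hypothetical helper names.

```python
from datetime import date, datetime

# assumed mapping; order matters because bool subclasses int
# and datetime subclasses date
XSD_TYPES = [
    (bool, "xsd:boolean"),
    (int, "xsd:integer"),
    (float, "xsd:double"),
    (datetime, "xsd:dateTime"),
    (date, "xsd:date"),
    (str, "xsd:string"),
]


def xsd_type(value):
    """Return the XSD datatype for a Python value, or None if unmapped."""
    for py_type, xsd in XSD_TYPES:
        if isinstance(value, py_type):
            return xsd
    return None


def typed_literal(value):
    """Render a Python value as a typed SPARQL literal."""
    xsd = xsd_type(value)
    if xsd is None:
        raise TypeError("no XSD mapping for {}".format(type(value).__name__))
    if isinstance(value, bool):
        lexical = "true" if value else "false"
    elif isinstance(value, date):
        lexical = value.isoformat()
    else:
        lexical = str(value)
    # only backslash and double quote are escaped in this sketch
    lexical = lexical.replace("\\", "\\\\").replace('"', '\\"')
    return '"{}"^^{}'.format(lexical, xsd)


def batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]
```

Splitting the records into batches keeps each INSERT request small; errors can then be collected per batch, in line with the returned list of errors.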

insertListOfDictsBatch(listOfDicts, entityType, primaryKey, prefixes, title='batch', batchIndex=None, total=None, startTime=None)[source]

insert a Batch part of listOfDicts

Parameters
  • entityType (string) – the entityType to use as a

  • primaryKey (string) – the name of the primary key attribute to use

  • prefixes (string) – any PREFIX statements to be used

  • title (string) – the title to display for the profiling (if any)

  • batchIndex (int) – the start index of the current batch

  • total (int) – the total number of records for all batches

  • startTime (datetime) – the start of the batch processing

Returns

a list of errors which should be empty on full success

printErrors(errors)[source]

print the given list of errors

Parameters

errors (list) – a list of error strings

Returns

True if the list is empty, else False

Return type

boolean

query(queryString, method='POST')[source]

get a list of results for the given query

Parameters
  • queryString (string) – the SPARQL query to execute

  • method (string) – the HTTP method to use, e.g. POST

Returns

list of bindings

Return type

list

queryAsListOfDicts(queryString, fixNone: bool = False, sampleCount: Optional[int] = None)[source]

get a list of dicts for the given query (to allow round-trip results for insertListOfDicts)

Parameters
  • queryString (string) – the SPARQL query to execute

  • fixNone (bool) – if True add None values for empty columns in Dict

  • sampleCount (int) – the number of samples to check

Returns

a list of Dicts

Return type

list

rawQuery(queryString, method='POST')[source]

query with the given query string

Parameters
  • queryString (string) – the SPARQL query to be performed

  • method (string) – POST or GET - POST is mandatory for update queries

Returns

the raw query result as bindings

Return type

list

static strToDatetime(value, debug=False)[source]

convert a string to a datetime

Parameters

value (str) – the value to convert

Returns

the datetime

Return type

datetime

lodstorage.sql module

Created on 2020-08-24

@author: wf

class lodstorage.sql.EntityInfo(sampleRecords, name, primaryKey=None, debug=False)[source]

Bases: object

holds entity meta Info

Variables
  • name(string) – entity name = table name

  • primaryKey(string) – the name of the primary key column

  • typeMap(dict) – maps column names to python types

  • debug(boolean) – True if debug information should be shown

construct me from the given name and primary key

Parameters
  • name (string) – the name of the entity

  • primaryKey (string) – the name of the primary key column

  • debug (boolean) – True if debug information should be shown

addType(column, valueType, sqlType)[source]

add the python type for the given column to the typeMap

Parameters
  • column (string) – the name of the column

  • valueType (type) – the python type of the column

fixDates(resultList)[source]

fix date entries in the given resultList by parsing the date content e.g. converting ‘1926-04-21’ back to datetime.date(1926, 4, 21)

Parameters

resultList (list) – the list of records to be fixed
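
A minimal sketch of the date fixing step, assuming the date columns are passed in explicitly (the library presumably takes them from its typeMap). `fix_dates` is a hypothetical stand-in.

```python
from datetime import date


def fix_dates(result_list, date_columns):
    """Parse ISO date strings in the given columns in place."""
    for record in result_list:
        for column in date_columns:
            value = record.get(column)
            # only strings are parsed; None and date values are left alone
            if isinstance(value, str):
                record[column] = date.fromisoformat(value)


records = [{"name": "Elizabeth", "born": "1926-04-21"}]
fix_dates(records, ["born"])
```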

getCreateTableCmd(sampleRecords)[source]

get the CREATE TABLE DDL command for the given sample records

Parameters

sampleRecords (list) – a list of Dicts of sample Records

Returns

CREATE TABLE DDL command for this entity info

Return type

string

Example:

CREATE TABLE Person(name TEXT PRIMARY KEY,born DATE,numberInLine INTEGER,wikidataurl TEXT,age FLOAT,ofAge BOOLEAN)
getInsertCmd()[source]

get the INSERT command for this entityInfo

Returns

the INSERT INTO SQL command for this entityInfo

Example:

INSERT INTO Person (name,born,numberInLine,wikidataurl,age,ofAge) values (?,?,?,?,?,?).
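
To illustrate how such commands can be derived from a sample record, here is a small sketch. The type table and the helper names are assumptions; only the resulting command strings follow the examples above.

```python
from datetime import date

# assumed Python-to-SQL type table; bool is checked before int
SQL_TYPES = [
    (bool, "BOOLEAN"),
    (int, "INTEGER"),
    (float, "FLOAT"),
    (date, "DATE"),
    (str, "TEXT"),
]


def sql_type(value):
    """Return the SQL column type for a sample value (TEXT as fallback)."""
    for py_type, name in SQL_TYPES:
        if isinstance(value, py_type):
            return name
    return "TEXT"


def create_table_cmd(name, sample, primary_key=None):
    """Build a CREATE TABLE command from one sample record."""
    decls = []
    for column, value in sample.items():
        decl = "{} {}".format(column, sql_type(value))
        if column == primary_key:
            decl += " PRIMARY KEY"
        decls.append(decl)
    return "CREATE TABLE {}({})".format(name, ",".join(decls))


def insert_cmd(name, sample):
    """Build an INSERT command with one ? placeholder per column."""
    columns = ",".join(sample)
    marks = ",".join("?" for _ in sample)
    return "INSERT INTO {} ({}) values ({})".format(name, columns, marks)


# hypothetical sample record
sample = {
    "name": "Elizabeth",
    "born": date(1926, 4, 21),
    "numberInLine": 0,
    "wikidataurl": "https://example.org/elizabeth",
    "age": 94.0,
    "ofAge": True,
}
ddl = create_table_cmd("Person", sample, primary_key="name")
dml = insert_cmd("Person", sample)
```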
class lodstorage.sql.SQLDB(dbname: str = ':memory:', connection=None, check_same_thread=True, timeout=5, debug=False, errorDebug=False)[source]

Bases: object

Structured Query Language Database wrapper

Variables
  • dbname(string) – name of the database

  • debug(boolean) – True if debug info should be provided

  • errorDebug(boolean) – True if debug info should be provided on errors (should not be used for production since it might reveal data)

Construct me for the given dbname and debug

Parameters
  • dbname (string) – name of the database - default is a RAM based database

  • connection (Connection) – an optional connection to be reused

  • check_same_thread (boolean) – True if object handling needs to be on the same thread see https://stackoverflow.com/a/48234567/1497139

  • timeout (float) – number of seconds for connection timeout

  • debug (boolean) – if True switch on debug

  • errorDebug (boolean) – True if debug info should be provided on errors (should not be used for production since it might reveal data)

RAM = ':memory:'
backup(backupDB, action='Backup', profile=False, showProgress: int = 200, doClose=True)[source]

create backup of this SQLDB to the given backup db

see https://stackoverflow.com/a/59042442/1497139

Parameters
  • backupDB (string) – the path to the backupdb or SQLDB.RAM for in memory

  • action (string) – the action to display

  • profile (boolean) – True if timing information shall be shown

  • showProgress (int) – show progress at each showProgress page (0=show no progress)
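
The backup with progress reporting can be approximated with the standard library's sqlite3.Connection.backup, whose progress callback receives the same (status, remaining, total) triple as backupProgress. This is a sketch under that assumption, not the library's code; `backup_with_progress` is a hypothetical helper.

```python
import sqlite3


def backup_with_progress(source, target, pages_per_step=200):
    """Copy source into target and record (remaining, total) after each step."""
    steps = []

    def progress(status, remaining, total):
        steps.append((remaining, total))

    # pages limits how many pages are copied between progress callbacks
    source.backup(target, pages=pages_per_step, progress=progress)
    return steps


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Person(name TEXT PRIMARY KEY)")
source.execute("INSERT INTO Person (name) values (?)", ("Elizabeth",))
source.commit()

target = sqlite3.connect(":memory:")
steps = backup_with_progress(source, target)
copied = target.execute("SELECT name FROM Person").fetchall()
```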

backupProgress(status, remaining, total)[source]
close()[source]

close my connection

copyTo(copyDB, profile=True)[source]

copy my content to another database

Parameters
  • copyDB (Connection) – the target database

  • profile (boolean) – if True show profile information

createTable(listOfRecords, entityName: str, primaryKey: Optional[str] = None, withCreate: bool = True, withDrop: bool = False, sampleRecordCount=1, failIfTooFew=True)[source]

derive the Data Definition Language (DDL) CREATE TABLE command from the list of records by examining the first record as the defining sample record, then execute the DDL command

auto detect column types see e.g. https://stackoverflow.com/a/57072280/1497139

Parameters
  • listOfRecords (list) – a list of Dicts

  • entityName (string) – the entity / table name to use

  • primaryKey (string) – the key/column to use as a primary key

  • withDrop (boolean) – true if the existing Table should be dropped

  • withCreate (boolean) – true if the create Table command should be executed - false if only the entityInfo should be returned

  • sampleRecordCount (int) – number of sample records expected and to be inspected

  • failIfTooFew (boolean) – raise an Exception if there are too few sample records, else warn only

Returns

meta data information for the created table

Return type

EntityInfo

execute(ddlCmd)[source]

execute the given Data Definition Command

Parameters

ddlCmd (string) – e.g. a CREATE TABLE or CREATE VIEW command

executeDump(connection, dump, title, maxErrors=100, errorDisplayLimit=12, profile=True)[source]

execute the given dump for the given connection

Parameters
  • connection (Connection) – the sqlite3 connection to use

  • dump (string) – the SQL commands for the dump

  • title (string) – the title of the dump

  • maxErrors (int) – maximum number of errors to be tolerated before stopping and doing a rollback

  • profile (boolean) – True if profiling information should be shown

Returns

a list of errors

getDebugInfo(record, index, executeMany)[source]

get the debug info for the given record at the given index depending on the state of executeMany

Parameters
  • record (dict) – the record to show

  • index (int) – the index of the record

  • executeMany (boolean) – if True the record may be valid else not

getTableDict(tableType='table')[source]

get the schema information from this database as a dict

Parameters

tableType (str) – table or view

Returns

Lookup map of tables with columns also being converted to dict

Return type

dict

getTableList(tableType='table')[source]

get the schema information from this database

Parameters

tableType (str) – table or view

Returns

a list as derived from PRAGMA table_info

Return type

list
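
A possible way to collect this schema information with plain sqlite3 is sketched below. The shape of the returned dicts ("name", "columns" with "name", "type", "pk") is an assumption for illustration, and `get_table_list` is a hypothetical stand-in.

```python
import sqlite3


def get_table_list(connection, table_type="table"):
    """List tables (or views) with their columns via PRAGMA table_info."""
    names = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = ? ORDER BY name",
            (table_type,),
        )
    ]
    table_list = []
    for name in names:
        # PRAGMA arguments cannot be bound as parameters, so the name is quoted
        quoted = '"{}"'.format(name.replace('"', '""'))
        info = connection.execute("PRAGMA table_info({})".format(quoted)).fetchall()
        columns = [
            {"name": col_name, "type": col_type, "pk": pk}
            for (_cid, col_name, col_type, _notnull, _default, pk) in info
        ]
        table_list.append({"name": name, "columns": columns})
    return table_list


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Person(name TEXT PRIMARY KEY, born DATE)")
table_list = get_table_list(db)
```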

logError(msg)[source]

log the given error message to stderr

Parameters

msg (str) – the error message to display

progress(action, status, remaining, total)[source]

show progress

query(sqlQuery, params=None)[source]

run the given sqlQuery and return a list of Dicts

Parameters
  • sqlQuery (string) – the SQL query to be executed

  • params (tuple) – the query params, if any

Returns

a list of Dicts

Return type

list

queryAll(entityInfo, fixDates=True)[source]

query all records for the given entityName/tableName

Parameters
  • entityName (string) – name of the entity/table to query

  • fixDates (boolean) – True if date entries should be returned as such and not as strings

queryGen(sqlQuery, params=None)[source]

run the given sqlQuery as a generator of dicts

Parameters
  • sqlQuery (string) – the SQL query to be executed

  • params (tuple) – the query params, if any

Returns

a generator of dicts
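
The difference between a list result and a generator result can be shown with plain sqlite3. This is a sketch of the idea, assuming nothing about the library's internals; `query_gen` is a hypothetical stand-in.

```python
import sqlite3


def query_gen(connection, sql_query, params=None):
    """Yield each result row as a dict keyed by column name."""
    cursor = connection.cursor()
    cursor.execute(sql_query, params or ())
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Person(name TEXT PRIMARY KEY, numberInLine INTEGER)")
db.executemany(
    "INSERT INTO Person (name,numberInLine) values (?,?)",
    [("Elizabeth", 0), ("Charles", 1), ("William", 2)],
)

# rows are produced lazily; list() materializes them when a list is needed
rows = list(
    query_gen(
        db,
        "SELECT name, numberInLine FROM Person WHERE numberInLine < ? ORDER BY numberInLine",
        (2,),
    )
)
```

A generator avoids holding the whole result in memory, which matters for large tables.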

static restore(backupDB, restoreDB, profile=False, showProgress=200, debug=False)[source]

restore the restoreDB from the given backup DB

Parameters
  • backupDB (string) – path to the backupDB e.g. backup.db

  • restoreDB (string) – path to the restoreDB or in Memory SQLDB.RAM

  • profile (boolean) – True if timing information should be shown

  • showProgress (int) – show progress at each showProgress page (0=show no progress)

restoreProgress(status, remaining, total)[source]
showDump(dump, limit=10)[source]

show the given dump up to the given limit

Parameters
  • dump (string) – the SQL dump to show

  • limit (int) – the maximum number of lines to display

store(listOfRecords, entityInfo, executeMany=False, fixNone=False)[source]

store the given list of records based on the given entityInfo

Parameters
  • listOfRecords (list) – the list of Dicts to be stored

  • entityInfo (EntityInfo) – the meta data to be used for storing

  • executeMany (bool) – if True the insert command is done with many/all records at once

  • fixNone (bool) – if True make sure empty columns in the listOfDict are filled with “None” values
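
The effect of executeMany and fixNone can be sketched with plain sqlite3. Named placeholders are used here so that dicts can be bound directly; that choice, the explicit `columns` argument and the function itself are assumptions for this example.

```python
import sqlite3


def store(connection, records, table, columns, execute_many=False, fix_none=False):
    """Insert the records into table, optionally filling missing columns with None."""
    insert_cmd = "INSERT INTO {} ({}) values ({})".format(
        table, ",".join(columns), ",".join(":" + column for column in columns)
    )
    if fix_none:
        # every record gets all columns; missing ones become None (SQL NULL)
        records = [{column: record.get(column) for column in columns} for record in records]
    if execute_many:
        connection.executemany(insert_cmd, records)
    else:
        for record in records:
            connection.execute(insert_cmd, record)
    connection.commit()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Person(name TEXT PRIMARY KEY, numberInLine INTEGER)")
people = [{"name": "Elizabeth", "numberInLine": 0}, {"name": "Harry"}]
store(db, people, "Person", ["name", "numberInLine"], execute_many=True, fix_none=True)
stored = db.execute("SELECT name, numberInLine FROM Person ORDER BY name").fetchall()
```

Without fix_none, the second record would fail to bind because it lacks a value for numberInLine.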

lodstorage.storageconfig module

Created on 2020-08-29

@author: wf

class lodstorage.storageconfig.StorageConfig(mode=StoreMode.SQL, cacheRootDir: Optional[str] = None, cacheDirName: str = 'lodstorage', cacheFile=None, withShowProgress=True, profile=True, debug=False, errorDebug=True)[source]

Bases: object

a storage configuration

Constructor

Parameters
  • mode (StoreMode) – the storage mode e.g. sql

  • cacheRootDir (str) – the cache root directory to use - if None the home directory will be used

  • cacheFile (string) – the common cacheFile to use (if any)

  • withShowProgress (boolean) – True if progress should be shown

  • profile (boolean) – True if timing / profiling information should be shown

  • debug (boolean) – True if debugging information should be shown

  • errorDebug (boolean) – True if debug info should be provided on errors (should not be used for production since it might reveal data)

getCachePath(ensureExists=True) str[source]

get the path to the default cache

Parameters

name (str) – the name of the cache to use

static getDefault(debug=False)[source]
static getJSON(debug=False)[source]
static getJsonPickle(debug=False)[source]
static getSPARQL(prefix, endpoint, host, debug=False)[source]
static getSQL(debug=False)[source]
static getYaml(debug=False)[source]
class lodstorage.storageconfig.StoreMode(value)[source]

Bases: Enum

possible supported storage modes

JSON = 2
JSONPICKLE = 1
SPARQL = 4
SQL = 3
YAML = 5

lodstorage.tabulateCounter module

Created on 2021-06-13

@author: wf

class lodstorage.tabulateCounter.TabulateCounter(counter)[source]

Bases: object

helper for tabulating Counters

Constructor

mostCommonTable(headers=['#', 'key', 'count', '%'], tablefmt='pretty', limit=50)[source]

get the most common Table
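
The rows behind such a table can be computed from a collections.Counter as sketched below. Rendering is left out, and the percentage format is an assumption; `most_common_rows` is a hypothetical helper whose columns follow the default headers '#', 'key', 'count', '%'.

```python
from collections import Counter


def most_common_rows(counter, limit=50):
    """Return (rank, key, count, percent) rows for the most common keys."""
    total = sum(counter.values())
    rows = []
    for index, (key, count) in enumerate(counter.most_common(limit), start=1):
        percent = "{:.2f}".format(count / total * 100)
        rows.append((index, key, count, percent))
    return rows


counter = Counter(["a", "b", "a", "c", "a", "b"])
rows = most_common_rows(counter)
```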

lodstorage.uml module

Created on 2020-09-04

@author: wf

class lodstorage.uml.UML(debug=False)[source]

Bases: object

UML diagrams via plantuml

Constructor

Parameters

debug (boolean) – True if debug information should be shown

mergeSchema(schemaManager, tableList, title=None, packageName=None, generalizeTo=None, withSkin=True)[source]

merge Schema and tableList to PlantUml notation

Parameters
  • schemaManager (SchemaManager) – a schema manager to be used

  • tableList (list) – the tableList list of Dicts from getTableList() to convert

  • title (string) – optional title to be added

  • packageName (string) – optional packageName to be added

  • generalizeTo (string) – optional name of a general table to be derived

  • withSkin (boolean) – if True add default BITPlan skin parameters

Returns

the Plantuml notation for the entities in columns of the given tablelist

Return type

string

skinparams = "\n' BITPlan Corporate identity skin params\n' Copyright (c) 2015-2020 BITPlan GmbH\n' see http://wiki.bitplan.com/PlantUmlSkinParams#BITPlanCI\n' skinparams generated by com.bitplan.restmodelmanager\nskinparam note {\n  BackGroundColor #FFFFFF\n  FontSize 12\n  ArrowColor #FF8000\n  BorderColor #FF8000\n  FontColor black\n  FontName Technical\n}\nskinparam component {\n  BackGroundColor #FFFFFF\n  FontSize 12\n  ArrowColor #FF8000\n  BorderColor #FF8000\n  FontColor black\n  FontName Technical\n}\nskinparam package {\n  BackGroundColor #FFFFFF\n  FontSize 12\n  ArrowColor #FF8000\n  BorderColor #FF8000\n  FontColor black\n  FontName Technical\n}\nskinparam usecase {\n  BackGroundColor #FFFFFF\n  FontSize 12\n  ArrowColor #FF8000\n  BorderColor #FF8000\n  FontColor black\n  FontName Technical\n}\nskinparam activity {\n  BackGroundColor #FFFFFF\n  FontSize 12\n  ArrowColor #FF8000\n  BorderColor #FF8000\n  FontColor black\n  FontName Technical\n}\nskinparam classAttribute {\n  BackGroundColor #FFFFFF\n  FontSize 12\n  ArrowColor #FF8000\n  BorderColor #FF8000\n  FontColor black\n  FontName Technical\n}\nskinparam interface {\n  BackGroundColor #FFFFFF\n  FontSize 12\n  ArrowColor #FF8000\n  BorderColor #FF8000\n  FontColor black\n  FontName Technical\n}\nskinparam class {\n  BackGroundColor #FFFFFF\n  FontSize 12\n  ArrowColor #FF8000\n  BorderColor #FF8000\n  FontColor black\n  FontName Technical\n}\nskinparam object {\n  BackGroundColor #FFFFFF\n  FontSize 12\n  ArrowColor #FF8000\n  BorderColor #FF8000\n  FontColor black\n  FontName Technical\n}\nhide Circle\n' end of skinparams '\n"
tableListToPlantUml(tableList, title=None, packageName=None, generalizeTo=None, withSkin=True)[source]

convert tableList to PlantUml notation

Parameters
  • tableList (list) – the tableList list of Dicts from getTableList() to convert

  • title (string) – optional title to be added

  • packageName (string) – optional packageName to be added

  • generalizeTo (string) – optional name of a general table to be derived

  • withSkin (boolean) – if True add default BITPlan skin parameters

Returns

the Plantuml notation for the entities in columns of the given tablelist

Return type

string

lodstorage.yamlablemixin module

class lodstorage.yamlablemixin.YamlAbleMixin[source]

Bases: object

allow reading and writing derived objects from a yaml file

debug = False
static readYaml(name)[source]
writeYaml(name)[source]

Module contents