lodstorage package¶
Submodules¶
lodstorage.csv module¶
- class lodstorage.csv.CSV(name)[source]¶
Bases:
LOD
helper for converting data in csv format to list of dicts (LoD) and vice versa
Constructor
- static fromCSV(csvString: str, fields: Optional[list] = None, delimiter=',', quoting=2, **kwargs)[source]¶
convert given csv string to list of dicts (LOD)
- Parameters
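csvString (str) – csv string that should be converted to LOD
fields (list) – Names of the headers that should be used. If None it is assumed that the header is given.
delimiter (str) – the field delimiter, default ','
quoting (int) – the csv quoting mode; the default 2 corresponds to csv.QUOTE_NONNUMERIC
kwargs – csv dialect parameters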
- Returns
list of dicts (LoD) containing the content of the given csv string
- static readFile(filename: str) str [source]¶
Reads the given filename and returns it as string
- Parameters
filename (str) – Name of the file that should be returned as string
- Returns
Content of the file as string
- static restoreFromCSVFile(filePath: str, headerNames: Optional[list] = None, withPostfix: bool = False)[source]¶
restore LOD from given csv file
- Parameters
filePath (str) – file name
headerNames (list) – Names of the headers that should be used. If None it is assumed that the header is given.
withPostfix (bool) – If False the file type is appended to given filePath. Otherwise file type MUST be given with filePath.
- Returns
list of dicts (LoD) containing the content of the given csv file
- static storeToCSVFile(lod: list, filePath: str, withPostfix: bool = False)[source]¶
converts the given lod to CSV file.
- Parameters
lod (list) – lod that should be converted to csv file
filePath (str) – file name the csv should be stored to
withPostfix (bool) – If False the file type is appended to given filePath. Otherwise file type MUST be given with filePath.
- Returns
csv string of the given lod
- static toCSV(lod: list, includeFields: Optional[list] = None, excludeFields: Optional[list] = None, delimiter=',', quoting=2, **kwargs)[source]¶
converts the given lod to CSV string. For details about the csv dialect parameters see https://docs.python.org/3/library/csv.html#csv-fmt-params
- Parameters
lod (list) – lod that should be converted to csv string
includeFields (list) – list of fields that should be included in the csv (positive list)
excludeFields (list) – list of fields that should be excluded from the csv (negative list)
kwargs – csv dialect parameters
- Returns
csv string of the given lod
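The conversion between a CSV string and a list of dicts can be illustrated with the standard library alone. The following is a minimal sketch, not the implementation of lodstorage.csv.CSV: the helper names lod_to_csv and csv_to_lod are hypothetical, and taking the header from the first record is an assumption. It uses quoting=csv.QUOTE_NONNUMERIC, which is the numeric value 2 shown as the default in the signatures above.

```python
import csv
import io


def lod_to_csv(lod, delimiter=",", quoting=csv.QUOTE_NONNUMERIC):
    """Serialize a list of dicts to a CSV string (header taken from the first record)."""
    if not lod:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(lod[0].keys()), delimiter=delimiter, quoting=quoting
    )
    writer.writeheader()
    writer.writerows(lod)
    return buffer.getvalue()


def csv_to_lod(csv_string, fields=None, delimiter=",", quoting=csv.QUOTE_NONNUMERIC):
    """Parse a CSV string into a list of dicts; fields=None means the first row is the header."""
    reader = csv.DictReader(
        io.StringIO(csv_string), fieldnames=fields, delimiter=delimiter, quoting=quoting
    )
    return [dict(row) for row in reader]


records = [{"name": "Vienna", "population": 1900000}]
round_trip = csv_to_lod(lod_to_csv(records))
```

With QUOTE_NONNUMERIC the reader converts every unquoted field to float, so the population comes back as a float rather than an int.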
lodstorage.entity module¶
Created on 2020-08-19
@author: wf
- class lodstorage.entity.EntityManager(name, entityName, entityPluralName: str, listName: Optional[str] = None, clazz=None, tableName: Optional[str] = None, primaryKey: Optional[str] = None, config=None, handleInvalidListTypes=False, filterInvalidListTypes=False, listSeparator='⇹', debug=False)[source]¶
Bases:
YamlAbleMixin
,JsonPickleMixin
,JSONAbleList
generic entity manager
Constructor
- Parameters
name (string) – name of this entity manager
entityName (string) – entityType to be managed e.g. Country
entityPluralName (string) – plural of the entityType e.g. Countries
config (StorageConfig) – the configuration to be used; if None a default configuration will be used
handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered
filterInvalidListTypes (bool) – True if invalidListTypes should be deleted
listSeparator (str) – the symbol to use as a list separator
debug (boolean) – override debug setting when default of config is used via config=None
- fromCache(force: bool = False, getListOfDicts=None, append=False, sampleRecordCount=-1)[source]¶
get my entries from the cache or from the callback provided
- Parameters
force (bool) – force ignoring the cache
getListOfDicts (callable) – a function to call for getting the data
append (bool) – True if records should be appended
sampleRecordCount (int) – the number of records to analyze for type information
- Returns
the list of Dicts and as a side effect setting self.cacheFile
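The "cache or callback" behaviour described for fromCache can be pictured as follows. This is a sketch under stated assumptions, not the EntityManager code: the function from_cache, the JSON cache file and the fetch callback are all hypothetical, and the real method also supports other store modes.

```python
import json
import os
import tempfile


def from_cache(cache_file, get_list_of_dicts, force=False):
    """Return records from cache_file, or call get_list_of_dicts and refresh the cache."""
    if not force and os.path.isfile(cache_file):
        with open(cache_file, encoding="utf-8") as f:
            return json.load(f)
    lod = get_list_of_dicts()
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(lod, f)
    return lod


calls = []


def fetch():
    """Hypothetical callback standing in for an expensive query."""
    calls.append(1)
    return [{"name": "Austria"}]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "countries.json")
    first = from_cache(path, fetch)  # cache miss: fetch() runs
    second = from_cache(path, fetch)  # cache hit: fetch() is skipped
    third = from_cache(path, fetch, force=True)  # forced refresh: fetch() runs again
```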
- fromStore(cacheFile=None, setList: bool = True) list [source]¶
restore me from the store
- Parameters
cacheFile (str) – the cacheFile to use; if None use the preconfigured cache file
setList (bool) – if True set my list with the data from the cache file
- Returns
list of dicts or JSON entitymanager
- Return type
list
- getCacheFile(config=None, mode=StoreMode.SQL)[source]¶
get the cache file for this entity manager
- Parameters
config (StorageConfig) – if None get the cache for my mode
mode (StoreMode) – the storeMode to use
- getLoD()[source]¶
Return the LoD of the entities in the list
- Returns
a list of Dicts
- Return type
list
- getSQLDB(cacheFile)[source]¶
get the SQL database for the given cacheFile
- Parameters
cacheFile (string) – the file to get the SQL db from
- initSQLDB(sqldb, listOfDicts=None, withCreate: bool = True, withDrop: bool = True, sampleRecordCount=-1)[source]¶
initialize my sql DB
- Parameters
listOfDicts (list) – the list of dicts to analyze for type information
withDrop (boolean) – true if the existing Table should be dropped
withCreate (boolean) – true if the create Table command should be executed - false if only the entityInfo should be returned
sampleRecordCount (int) – the number of records to analyze for type information
- Returns
the entity information such as CREATE Table command
- Return type
- setNone(record, fields)[source]¶
make sure the given fields in the given record are set to None
- Parameters
record (dict) – the record to work on
fields (list) – the list of fields to set to None
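Filling in missing fields matters when records with different key sets are stored in one table. A minimal sketch of the idea, assuming that existing values are kept and only missing keys are added (the helper name set_none is hypothetical):

```python
def set_none(record: dict, fields: list) -> dict:
    """Add every missing field to record with the value None; existing values are kept."""
    for field in fields:
        record.setdefault(field, None)
    return record


record = set_none({"name": "Vienna"}, ["name", "population"])
```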
- showProgress(msg)[source]¶
display a progress message
- Parameters
msg (string) – the message to display
- store(limit=10000000, batchSize=250, append=False, fixNone=True, sampleRecordCount=-1) str [source]¶
store my list of dicts
- Parameters
limit (int) – maximum number of records to store
batchSize (int) – size of batch for storing
append (bool) – True if records should be appended
fixNone (bool) – if True make sure the dicts are filled with None references for each record
sampleRecordCount (int) – the number of records to analyze for type information
- Returns
The cachefile being used
- Return type
str
- storeLoD(listOfDicts, limit=10000000, batchSize=250, cacheFile=None, append=False, fixNone=True, sampleRecordCount=1) str [source]¶
store my entities
- Parameters
listOfDicts (list) – the list of dicts to store
limit (int) – maximum number of records to store
batchSize (int) – size of batch for storing
cacheFile (string) – the name of the storage e.g path to JSON or sqlite3 file
append (bool) – True if records should be appended
fixNone (bool) – if True make sure the dicts are filled with None references for each record
sampleRecordCount (int) – the number of records to analyze for type information
- Returns
The cachefile being used
- Return type
str
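The interplay of limit and batchSize can be sketched with sqlite3 from the standard library. This is an illustration only, not the storeLoD implementation: the function store_lod is hypothetical, the column list is taken from the first record, and no type analysis of sample records is done.

```python
import sqlite3


def store_lod(conn, table, lod, limit=10_000_000, batch_size=250):
    """Insert at most `limit` records into `table`, committing after every batch."""
    records = lod[:limit]
    if not records:
        return 0
    columns = list(records[0].keys())
    column_list = ", ".join(columns)
    placeholders = ", ".join(f":{c}" for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_list})")
    insert = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    for start in range(0, len(records), batch_size):
        # each slice holds at most batch_size records
        conn.executemany(insert, records[start:start + batch_size])
        conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")
cities = [{"name": f"city{i}", "population": i} for i in range(600)]
stored = store_lod(conn, "city", cities, limit=500)
row_count = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
```

Here 600 records are offered, the limit cuts them to 500, and they are written in two batches of 250.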
lodstorage.jsonable module¶
This module has a class JSONAble for serialization of tables/list of dicts to and from JSON encoding
Created on 2020-09-03
@author: wf
- class lodstorage.jsonable.JSONAble[source]¶
Bases:
object
mixin to allow classes to be JSON serializable see
Constructor
- asJSON(asString=True, data=None)[source]¶
recursively return my dict elements
- Parameters
asString (boolean) – if True return my result as a string
- checkExtension(jsonFile: str, extension: str = '.json') str [source]¶
make sure the jsonFile has the given extension e.g. “.json”
- Parameters
jsonFile (str) – the jsonFile name - potentially without “.json” suffix
- Returns
the jsonFile name with “.json” as an extension guaranteed
- Return type
str
- fromDict(data: dict)[source]¶
initialize me from the given data
- Parameters
data (dict) – the dictionary to initialize me from
- fromJson(jsonStr)[source]¶
initialize me from the given JSON string
- Parameters
jsonStr (str) – the JSON string
- getJSONValue(v)[source]¶
get the value of the given v as JSON
- Parameters
v (object) – the value to get
- Returns
the value, making sure objects are returned as dicts
- static getJsonTypeSamplesForClass(cls)[source]¶
return the type samples for the given class
- Returns
a list of dict that specify the types by example
- Return type
list
- static readJsonFromFile(jsonFilePath)[source]¶
read json string from the given jsonFilePath
- Parameters
jsonFilePath (string) – the path of the file where to read the result from
- Returns
the JSON string read from the file
- reprDict(srcDict)[source]¶
get the given srcDict as new dict with fields being converted with getJSONValue
- Parameters
srcDict (dict) – the source dictionary
- Returns
dict: the converted dictionary
- restoreFromJsonFile(jsonFile: str)[source]¶
restore me from the given jsonFile
- Parameters
jsonFile (string) – the jsonFile to restore me from
- static singleQuoteToDoubleQuote(singleQuoted, useRegex=False)[source]¶
convert a single quoted string to a double quoted one
- Parameters
singleQuoted (str) –
a single quoted string e.g.
{'cities': [{'name': "Upper Hell's Gate"}]}
useRegex (boolean) – True if a regular expression shall be used for matching
- Returns
the double quoted version of the string
- Return type
string
- static singleQuoteToDoubleQuoteUsingBracketLoop(singleQuoted)[source]¶
convert a single quoted string to a double quoted one using a bracket loop
- Parameters
singleQuoted (string) – a single quoted string e.g. {'cities': [{'name': "Upper Hell's Gate"}]}
- Returns
the double quoted version of the string e.g.
- Return type
string
- static singleQuoteToDoubleQuoteUsingRegex(singleQuoted)[source]¶
convert a single quoted string to a double quoted one using a regular expression
- Parameters
singleQuoted (string) – a single quoted string e.g. {'cities': [{'name': "Upper Hell's Gate"}]}
- Returns
the double quoted version of the string e.g.
- Return type
string
- static storeJsonToFile(jsonStr, jsonFilePath)[source]¶
store the given json string to the given jsonFilePath
- Parameters
jsonStr (string) – the string to store
jsonFilePath (string) – the path of the file where to store the result
- storeToJsonFile(jsonFile: str, extension: str = '.json', limitToSampleFields: bool = False)[source]¶
store me to the given jsonFile
- Parameters
jsonFile (str) – the JSON file name (optionally without extension)
extension (str) – the extension to use if not part of the jsonFile name
limitToSampleFields (bool) – If True the returned JSON is limited to the attributes/fields that are present in the samples. Otherwise all attributes of the object will be included. Default is False.
- toJSON(limitToSampleFields: bool = False)[source]¶
- Parameters
limitToSampleFields (bool) – If True the returned JSON is limited to the attributes/fields that are present in the samples. Otherwise all attributes of the object will be included. Default is False.
- Returns
a recursive JSON dump of the dicts of my objects
- class lodstorage.jsonable.JSONAbleList(listName: Optional[str] = None, clazz=None, tableName: Optional[str] = None, initList: bool = True, handleInvalidListTypes=False, filterInvalidListTypes=False)[source]¶
Bases:
JSONAble
Container class
Constructor
- Parameters
listName (str) – the name of the list attribute to be used for storing the List
clazz (class) – a class to be used for Object relational mapping (if any)
tableName (str) – the name of the “table” to be used
initList (bool) – True if the list should be initialized
handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered
filterInvalidListTypes (bool) – True if invalidListTypes should be deleted
- asJSON(asString=True)[source]¶
recursively return my dict elements
- Parameters
asString (boolean) – if True return my result as a string
- fromJson(jsonStr, types=None)[source]¶
initialize me from the given JSON string
- Parameters
jsonStr (str) – the JSON string
types (Types) – the types to be fixed
- fromLoD(lod, append: bool = True, debug: bool = False)[source]¶
load my entityList from the given list of dicts
- Parameters
lod (list) – the list of dicts to load
append (bool) – if True append to my existing entries
- Returns
a list of errors (if any)
- Return type
list
- getLoDfromJson(jsonStr: str, types=None, listName: Optional[str] = None)[source]¶
get a list of Dicts from the given JSON String
- Parameters
jsonStr (str) – the JSON string
types (Types) – the types to be fixed
- Returns
a list of dicts
- Return type
list
- getLookup(attrName: str, withDuplicates: bool = False)[source]¶
create a lookup dictionary by the given attribute name
- Parameters
attrName (str) – the attribute to lookup
withDuplicates (bool) – whether to retain single values or lists
- Returns
a dictionary for lookup, or a tuple (dictionary, list of duplicates), depending on withDuplicates
- readLodFromJsonFile(jsonFile: str, extension: str = '.json')[source]¶
read the list of dicts from the given jsonFile
- Parameters
jsonFile (string) – the jsonFile to read from
- Returns
a list of dicts
- Return type
list
- readLodFromJsonStr(jsonStr) list [source]¶
restore me from the given jsonStr
- Parameters
jsonStr (str) – the JSON string to read the list of dicts from
- restoreFromJsonStr(jsonStr: str) list [source]¶
restore me from the given jsonStr
- Parameters
jsonStr (str) – the json string to restore me from
- class lodstorage.jsonable.JSONAbleSettings[source]¶
Bases:
object
settings for JSONAble - put in a separate class so they would not be serialized
- indent = 4¶
- singleQuoteRegex = re.compile("(?<!\\\\)'")¶
regular expression to be used for conversion from singleQuote to doubleQuote see https://stackoverflow.com/a/50257217/1497139
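The pattern shown for singleQuoteRegex matches a single quote that is not preceded by a backslash. The sketch below shows what such a substitution does; it only reuses the pattern from the listing, and the helper single_to_double is a hypothetical name, not the library method.

```python
import json
import re

# same pattern as shown above: a single quote not preceded by a backslash
single_quote_regex = re.compile("(?<!\\\\)'")


def single_to_double(text: str) -> str:
    """Replace every unescaped single quote with a double quote."""
    return single_quote_regex.sub('"', text)


converted = single_to_double("{'cities': [{'name': 'Vienna'}]}")
parsed = json.loads(converted)
escaped_kept = single_to_double("it\\'s")  # the escaped quote is left untouched
```

A plain substitution like this also replaces apostrophes inside double quoted values such as "Upper Hell's Gate", which would yield invalid JSON; that case needs a different strategy.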
- class lodstorage.jsonable.Types(name: str, warnOnUnsupportedTypes=True, debug=False)[source]¶
Bases:
JSONAble
holds entity meta Info
- Variables
name(string) – entity name = table name
Constructor
- Parameters
name (str) – the name of the type map
warnOnUnsupportedTypes (bool) – if True warn if an item value has an unsupported type
debug (bool) – if True - debugging information should be shown
- addType(listName, field, valueType)[source]¶
add the python type for the given field to the typeMap
- Parameters
listName (string) – the name of the list of the field
field (string) – the name of the field
valueType (type) – the python type of the field
- fixTypes(lod: list, listName: str)[source]¶
fix the types in the given data structure
- Parameters
lod (list) – a list of dicts
listName (str) – the types to lookup by list name
- static forTable(instance, listName: str, warnOnUnsupportedTypes: bool = True, debug=False)[source]¶
get the types for the list of Dicts (table) in the given instance with the given listName
- Parameters
instance (object) – the instance to inspect
listName (string) – the list of dicts to inspect
warnOnUnsupportedTypes (bool) – if True warn if an item value has an unsupported type
debug (bool) – True if debugging information should be shown
- Returns
a types object
- Return type
- getTypes(listName: str, sampleRecords: list, limit: int = 10)[source]¶
determine the types for the given sample records
- Parameters
listName (str) – the name of the list
sampleRecords (list) – a list of items
limit (int) – the maximum number of items to check
- getTypesForItems(listName: str, items: list, warnOnNone: bool = False)[source]¶
get the types for the given items; as a side effect my types are set
- Parameters
listName (str) – the name of the list
items (list) – a list of items
warnOnNone (bool) – if True warn if an item value is None
- typeName2Type = {'bool': <class 'bool'>, 'date': <class 'datetime.date'>, 'datetime': <class 'datetime.datetime'>, 'float': <class 'float'>, 'int': <class 'int'>, 'str': <class 'str'>}¶
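Determining types from sample records and fixing them after a JSON round trip can be sketched as follows. This is a simplified illustration and not the Types class: get_types and fix_types are hypothetical free functions, the type map is a flat dict from field name to type name, and only date and datetime strings are converted.

```python
import datetime

# mirrors the typeName2Type mapping listed above
type_name_to_type = {
    "bool": bool,
    "date": datetime.date,
    "datetime": datetime.datetime,
    "float": float,
    "int": int,
    "str": str,
}


def get_types(sample_records, limit=10):
    """Map each field to the type name of its first non-None value in the samples."""
    type_map = {}
    for record in sample_records[:limit]:
        for field, value in record.items():
            if value is not None:
                type_map.setdefault(field, type(value).__name__)
    return type_map


def fix_types(lod, type_map):
    """Convert ISO strings back to date/datetime objects according to type_map."""
    for record in lod:
        for field, type_name in type_map.items():
            value = record.get(field)
            if isinstance(value, str) and type_name in ("date", "datetime"):
                record[field] = type_name_to_type[type_name].fromisoformat(value)
    return lod


samples = [{"name": "Elizabeth", "born": datetime.date(1926, 4, 21), "age": None}]
types = get_types(samples)
restored = fix_types([{"name": "Elizabeth", "born": "1926-04-21"}], types)
```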
lodstorage.jsonpicklemixin module¶
- class lodstorage.jsonpicklemixin.JsonPickleMixin[source]¶
Bases:
object
allow reading and writing derived objects from a jsonpickle file
- asJsonPickle() str [source]¶
convert me to JSON
- Returns
a JSON String with my JSON representation
- Return type
str
- static checkExtension(jsonFile: str, extension: str = '.json') str [source]¶
make sure the jsonFile has the given extension e.g. “.json”
- Parameters
jsonFile (str) – the jsonFile name - potentially without “.json” suffix
- Returns
the jsonFile name with “.json” as an extension guaranteed
- Return type
str
- debug = False¶
lodstorage.lod module¶
Created on 2021-01-31
@author: wf
- class lodstorage.lod.LOD(name)[source]¶
Bases:
object
list of Dict aka Table
Constructor
- static addLookup(lookup, duplicates, record, value, withDuplicates: bool)[source]¶
add a single lookup result
- Parameters
lookup (dict) – the lookup map
duplicates (list) – the list of duplicates
record (dict) – the current record
value (object) – the current value to lookup
withDuplicates (bool) – if True duplicates should be allowed and lists returned; if False a separate duplicates list is created
- static filterFields(lod: list, fields: list, reverse: bool = False)[source]¶
filter the given LoD with the given list of fields by either limiting the LoD to the fields or removing the fields contained in the list depending on the state of the reverse parameter
- Parameters
lod (list) – list of dicts from which the fields should be excluded
fields (list) – list of fields that should be excluded from the lod
reverse (bool) – If True limit dict to the list of given fields. Otherwise exclude the fields from the dict.
- Returns
LoD
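The effect of the reverse flag is easiest to see on a small example. A minimal sketch following the parameter description above (the function filter_fields is a hypothetical stand-in, and returning new dicts instead of modifying the input is an assumption):

```python
def filter_fields(lod, fields, reverse=False):
    """Drop `fields` from every record, or keep only `fields` when reverse is True."""
    if reverse:
        return [{k: v for k, v in record.items() if k in fields} for record in lod]
    return [{k: v for k, v in record.items() if k not in fields} for record in lod]


cities = [{"name": "Vienna", "country": "AT", "population": 1900000}]
without_population = filter_fields(cities, ["population"])
only_name = filter_fields(cities, ["name"], reverse=True)
```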
- static getLookup(lod: list, attrName: str, withDuplicates: bool = False)[source]¶
create a lookup dictionary by the given attribute name for the given list of dicts
- Parameters
lod (list) – the list of dicts to get the lookup dictionary for
attrName (str) – the attribute to lookup
withDuplicates (bool) – whether to retain single values or lists
- Returns
a dictionary for lookup
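How withDuplicates changes the shape of the lookup can be sketched like this. The code is an assumption based on the descriptions of getLookup and addLookup, not the library source; in particular, always returning a (lookup, duplicates) pair and skipping records without the attribute are choices made for the example.

```python
def get_lookup(lod, attr_name, with_duplicates=False):
    """Build a lookup by attr_name; returns (lookup, duplicates)."""
    lookup = {}
    duplicates = []
    for record in lod:
        if attr_name not in record:
            continue
        value = record[attr_name]
        if with_duplicates:
            # every key maps to a list of all records sharing that value
            lookup.setdefault(value, []).append(record)
        elif value in lookup:
            # the first record wins, later ones are reported separately
            duplicates.append(record)
        else:
            lookup[value] = record
    return lookup, duplicates


royals = [
    {"name": "Elizabeth", "house": "Windsor"},
    {"name": "Charles", "house": "Windsor"},
]
by_house, dups = get_lookup(royals, "house")
by_house_lists, no_dups = get_lookup(royals, "house", with_duplicates=True)
```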
- classmethod handleListTypes(lod, doFilter=False, separator=',')[source]¶
handle list types in the given list of dicts
- Parameters
cls – this class
lod (list) – a list of dicts
doFilter (bool) – True if records containing lists value items should be filtered
separator (str) – the separator to use when converting lists
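List values usually cannot be stored in a single table column, which is why they are either joined into a string or filtered out. A sketch of both options, assuming that filtering removes the whole record and that a new list is returned (handle_list_types is a hypothetical name):

```python
def handle_list_types(lod, do_filter=False, separator=","):
    """Join list values with separator, or drop records containing lists if do_filter."""
    result = []
    for record in lod:
        has_list = any(isinstance(v, list) for v in record.values())
        if has_list and do_filter:
            continue
        result.append(
            {
                k: separator.join(str(item) for item in v) if isinstance(v, list) else v
                for k, v in record.items()
            }
        )
    return result


data = [{"name": "Vienna", "zip": [1010, 1020]}, {"name": "Graz", "zip": 8010}]
joined = handle_list_types(data, separator="⇹")
filtered = handle_list_types(data, do_filter=True)
```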
- static intersect(listOfDict1, listOfDict2, key=None)[source]¶
get the intersection of the two lists of Dicts by the given key
- static setNone(record, fields)[source]¶
make sure the given fields in the given record are set to None
- Parameters
record (dict) – the record to work on
fields (list) – the list of fields to set to None
lodstorage.mwTable module¶
Created on 2020-08-21
@author: wf
- class lodstorage.mwTable.MediaWikiTable(wikiTable=True, colFormats=None, sortable=True, withNewLines=False)[source]¶
Bases:
object
helper for https://www.mediawiki.org/wiki/Help:Tables
Constructor
lodstorage.plot module¶
Created on 2020-07-05
@author: wf
- class lodstorage.plot.Plot(valueList, title, xlabel=None, ylabel=None, gformat='.png', fontsize=12, plotdir=None, debug=False)[source]¶
Bases:
object
create Plot based on counters see https://stackoverflow.com/questions/19198920/using-counter-in-python-to-build-histogram
Constructor
lodstorage.query module¶
Created on 2020-08-22
@author: wf
- class lodstorage.query.Endpoint[source]¶
Bases:
JSONAble
a query endpoint
constructor for setting defaults
- class lodstorage.query.EndpointManager[source]¶
Bases:
object
manages a set of SPARQL endpoints
- class lodstorage.query.Format(value)[source]¶
Bases:
Enum
the supported formats for the results to be delivered
- csv = 'csv'¶
- github = 'github'¶
- json = 'json'¶
- latex = 'latex'¶
- mediawiki = 'mediawiki'¶
- tsv = 'tsv'¶
- xml = 'xml'¶
- class lodstorage.query.Query(name: str, query: str, lang='sparql', endpoint: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, prefixes=None, tryItUrl: Optional[str] = None, formats: Optional[list] = None, debug=False)[source]¶
Bases:
object
a Query e.g. for SPARQL
Constructor
- Parameters
name (string) – the name/label of the query
query (string) – the native Query text e.g. in SPARQL
lang (string) – the language of the query e.g. SPARQL
endpoint (string) – the endpoint url to use
title (string) – the header/title of the query
description (string) – the description of the query
prefixes (list) – list of prefixes to be resolved
tryItUrl (str) – the url of a “tryit” webpage
formats (list) – key,value pairs of ValueFormatters to be applied
debug (boolean) – True if debug mode should be switched on
- asWikiMarkup(listOfDicts)[source]¶
convert the given listOfDicts result to MediaWiki markup
- Parameters
listOfDicts (list) – the list of Dicts to convert to MediaWiki markup
- Returns
the markup
- Return type
string
- asWikiSourceMarkup()[source]¶
convert me to MediaWiki markup for syntax highlighting using the “source” tag
- Returns
the Markup
- Return type
string
- documentQueryResult(qlod: list, limit=None, tablefmt: str = 'mediawiki', tryItUrl: Optional[str] = None, withSourceCode=True, **kwArgs)[source]¶
document the given query results - note that a copy of the whole list is going to be created for being able to format
- Parameters
qlod – the list of dicts result
limit (int) – the maximum number of records to display in result tabulate
tablefmt (str) – the table format to use
tryItUrl – the “try it!” url to show
withSourceCode (bool) – if True document the source code
- Returns
the documentation tabular text for the given parameters
- Return type
str
- formatWithValueFormatters(lod, tablefmt: str)[source]¶
format the given list of Dicts with the ValueFormatters
- getLink(url, title, tablefmt)[source]¶
convert the given url and title to a link for the given tablefmt
- Parameters
url (str) – the url to convert
title (str) – the title to show
tablefmt (str) – the table format to use
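What a link looks like depends on the target table format. The sketch below shows the usual link syntax of three formats; which formats getLink actually distinguishes, and the plain text fallback, are assumptions made for the example.

```python
def get_link(url: str, title: str, tablefmt: str) -> str:
    """Render url and title as a link in the markup of the given table format."""
    if tablefmt == "mediawiki":
        return f"[{url} {title}]"
    if tablefmt == "github":
        return f"[{title}]({url})"
    if tablefmt == "latex":
        return rf"\href{{{url}}}{{{title}}}"
    # hypothetical fallback for formats without link markup
    return f"{title}:{url}"


wiki_link = get_link("https://www.wikidata.org/wiki/Q1741", "Vienna", "mediawiki")
```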
- getTryItUrl(baseurl: str)[source]¶
return the “try it!” url for the given baseurl
- Parameters
baseurl (str) – the baseurl to use
- Returns
the “try it!” url for the given query
- Return type
str
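A "try it!" url typically carries the URL-encoded query text. The sketch below assumes the layout used by the Wikidata Query Service, where the query follows a "#" fragment marker; whether getTryItUrl builds the url exactly this way is not stated here, and get_try_it_url is a hypothetical free function.

```python
import urllib.parse


def get_try_it_url(baseurl: str, query: str) -> str:
    """Append the URL-encoded query to baseurl as a fragment."""
    return f"{baseurl}/#{urllib.parse.quote(query)}"


sparql_query = "SELECT * WHERE { ?s ?p ?o } LIMIT 3"
try_it_url = get_try_it_url("https://query.wikidata.org", sparql_query)
```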
- preFormatWithCallBacks(lod, tablefmt: str)[source]¶
run the configured call backs to pre-format the given list of dicts for the given tableformat
- Parameters
lod (list) – the list of dicts to handle
tablefmt (str) – the table format (according to tabulate) to apply
- prefixToLink(lod: list, prefix: str, tablefmt: str)[source]¶
convert url prefixes to link according to the given table format TODO - refactor as preFormat callback
- Parameters
lod (list) – the list of dicts to convert
prefix (str) – the prefix to strip
tablefmt (str) – the tabulate tableformat to use
- class lodstorage.query.QueryManager(lang: Optional[str] = None, debug=False, queriesPath=None)[source]¶
Bases:
object
manages pre packaged Queries
Constructor
- Parameters
lang (string) – the language to use for the queries sql or sparql
debug (boolean) – True if debug information should be shown
- class lodstorage.query.QueryResultDocumentation(query, title: str, tablefmt: str, tryItMarkup: str, sourceCodeHeader: str, sourceCode: str, resultHeader: str, result: str)[source]¶
Bases:
object
documentation of a query result
constructor
- Parameters
query (Query) – the query to be documented
title (str) – the title markup
tablefmt (str) – the tableformat that has been used
tryItMarkup – the “try it!” markup to show
sourceCodeHeader (str) – the header title to use for the sourceCode
sourceCode (str) – the sourceCode
resultHeader (str) – the header title to use for the result
result (str) – the result
- asText()[source]¶
return my text representation
- Returns
description, sourceCodeHeader, sourceCode, tryIt link and result table
- Return type
str
- static uniCode2Latex(text: str, withConvert: bool = False) str [source]¶
converts unicode text to latex and fixes UTF-8 chars for latex in a certain range:
₀:$_0$ … ₉:$_9$
see https://github.com/phfaist/pylatexenc/issues/72
- Parameters
text (str) – the string to fix
withConvert (bool) – if unicode to latex library conversion should be used
- Returns
latex presentation of UTF-8 char
- Return type
str
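The subscript mapping ₀:$_0$ … ₉:$_9$ mentioned above can be written down directly. This sketch covers only that range and leaves out the optional library conversion; the function name is hypothetical.

```python
def unicode_subscripts_to_latex(text: str) -> str:
    """Replace the Unicode subscript digits U+2080..U+2089 with LaTeX math subscripts."""
    for digit in range(10):
        text = text.replace(chr(0x2080 + digit), f"$_{digit}$")
    return text


water = unicode_subscripts_to_latex("H₂O")
```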
- class lodstorage.query.QuerySyntaxHighlight(query, highlightFormat: str = 'html')[source]¶
Bases:
object
Syntax highlighting for queries with pygments
construct me for the given query and highlightFormat
- Parameters
query (Query) – the query to do the syntax highlighting for
highlightFormat (str) – the highlight format to be used
- class lodstorage.query.ValueFormatter(name: str, formatString: str, regexps: Optional[list] = None)[source]¶
Bases:
object
a value Formatter
constructor
- Parameters
formatString (str) – the format String to use
regexps (list) – the regular expressions to apply
- applyFormat(record, key, resultFormat: Format)[source]¶
apply the given format to the given record
- Parameters
record (dict) – the record to handle
key (str) – the property key
resultFormat (Format) – the resultFormat Style to apply
- formatsPath = '/Users/wf/Documents/pyworkspace/pyLoDStorage/lodstorage/../sampledata/formats.yaml'¶
- classmethod getFormats(formatsPath: Optional[str] = None) dict [source]¶
get the available ValueFormatters
- Parameters
formatsPath (str) – the path to the yaml file to read the format specs from
- Returns
a map for ValueFormatters by formatter Name
- Return type
dict
- home = '/Users/wf'¶
- valueFormats = None¶
lodstorage.sample module¶
Created on 2020-08-24
@author: wf
- class lodstorage.sample.Cities(load=False)[source]¶
Bases:
JSONAbleList
Constructor
- Parameters
listName (str) – the name of the list attribute to be used for storing the List
clazz (class) – a class to be used for Object relational mapping (if any)
tableName (str) – the name of the “table” to be used
initList (bool) – True if the list should be initialized
handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered
filterInvalidListTypes (bool) – True if invalidListTypes should be deleted
- class lodstorage.sample.Royals(load=False)[source]¶
Bases:
JSONAbleList
a non ORM Royals list
Constructor
- Parameters
listName (str) – the name of the list attribute to be used for storing the List
clazz (class) – a class to be used for Object relational mapping (if any)
tableName (str) – the name of the “table” to be used
initList (bool) – True if the list should be initialized
handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered
filterInvalidListTypes (bool) – True if invalidListTypes should be deleted
- class lodstorage.sample.RoyalsORMList(load=False)[source]¶
Bases:
JSONAbleList
Constructor
- Parameters
listName (str) – the name of the list attribute to be used for storing the List
clazz (class) – a class to be used for Object relational mapping (if any)
tableName (str) – the name of the “table” to be used
initList (bool) – True if the list should be initialized
handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered
filterInvalidListTypes (bool) – True if invalidListTypes should be deleted
lodstorage.schema module¶
Created on 2021-01-26
@author: wf
- class lodstorage.schema.Schema(name: str, title: str)[source]¶
Bases:
object
a relational Schema
Constructor
- Parameters
name (str) – the name of the schema
title (str) – the title of the schema
- static generalizeColumn(tableList, colName: str)[source]¶
remove the column with the given name from all tables in the tablelist and return it
- Parameters
tableList (list) – a list of Tables
colName (string) – the name of the column to generalize
- Returns
the column having been generalized and removed
- Return type
string
- static getGeneral(tableList, name: str, debug: bool = False)[source]¶
derive a general table from the given table list
- Parameters
tableList (list) – a list of tables
name (str) – name of the general table
debug (bool) – True if column names should be shown
- Returns
a table dict for the generalized table
lodstorage.sparql module¶
Created on 2020-08-14
@author: wf
- class lodstorage.sparql.SPARQL(url, mode='query', debug=False, typedLiterals=False, profile=False, agent='PyLodStorage', method='POST')[source]¶
Bases:
object
wrapper for SPARQL e.g. Apache Jena, Virtuoso, Blazegraph
- Variables
url – full endpoint url (including mode)
mode – ‘query’ or ‘update’
debug – True if debugging is active
typedLiterals – True if INSERT should be done with typedLiterals
profile(boolean) – True if profiling / timing information should be displayed
sparql – the SPARQLWrapper2 instance to be used
method(str) – the HTTP method to be used ‘POST’ or ‘GET’
Constructor a SPARQL wrapper
- Parameters
url (string) – the base URL of the endpoint - the mode query/update is going to be appended
mode (string) – ‘query’ or ‘update’
debug (bool) – True if debugging is to be activated
typedLiterals (bool) – True if INSERT should be done with typedLiterals
profile (boolean) – True if profiling / timing information should be displayed
agent (string) – the User agent to use
method (string) – the HTTP method to be used ‘POST’ or ‘GET’
- asListOfDicts(records, fixNone: bool = False, sampleCount: Optional[int] = None)[source]¶
convert SPARQL result back to python native
- Parameters
records (list) – the list of bindings
fixNone (bool) – if True add None values for empty columns in Dict
sampleCount (int) – the number of samples to check
- Returns
a list of Dicts
- Return type
list
- controlChars = ['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\x08', '\t', '\n', '\x0b', '\x0c', '\r', '\x0e', '\x0f', '\x10', '\x11', '\x12', '\x13', '\x14', '\x15', '\x16', '\x17', '\x18', '\x19', '\x1a', '\x1b', '\x1c', '\x1d', '\x1e', '\x1f']¶
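The controlChars list holds the 32 ASCII control characters from \x00 to \x1f. A short sketch of how such a list can be generated and, as one plausible use, applied to sanitize a string; how the SPARQL class uses the list internally is not described here, so strip_control_chars is purely illustrative.

```python
# the same 32 characters as listed above, generated instead of spelled out
control_chars = [chr(code) for code in range(0x20)]


def strip_control_chars(text: str) -> str:
    """Remove all ASCII control characters (including tab and newline) from text."""
    return "".join(ch for ch in text if ch not in control_chars)


cleaned = strip_control_chars("Upper\tHell's\x00 Gate\n")
```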
- getFirst(qLod: list, attr: str)[source]¶
get the column attr of the first row of the given qLod list
- Parameters
qLod (list) – the list of dicts (returned by a query)
attr (str) – the attribute to retrieve
- Returns
the value
- Return type
object
- getLocalName(name)[source]¶
retrieve valid localname from a string based primary key https://www.w3.org/TR/sparql11-query/#prefNames
- Parameters
name (string) – the name to convert
- Returns
a valid local name
- Return type
string
- getResults(jsonResult)[source]¶
get the result from the given jsonResult
- Parameters
jsonResult – the JSON encoded result
- Returns
the list of bindings
- Return type
list
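Turning bindings into plain Python dicts can be sketched against the standard SPARQL 1.1 JSON results layout (head.vars and results.bindings). This is not the code of asListOfDicts, which works on the wrapper's result objects; the converter table and the function bindings_as_lod are assumptions made for the example.

```python
import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# datatype IRI -> converter for the lexical value; untyped values stay strings
CONVERTERS = {
    XSD + "integer": int,
    XSD + "decimal": float,
    XSD + "double": float,
    XSD + "boolean": lambda v: v == "true",
    XSD + "date": datetime.date.fromisoformat,
}


def bindings_as_lod(json_result: dict, fix_none: bool = False) -> list:
    """Convert a SPARQL JSON result into a list of dicts with native Python values."""
    variables = json_result["head"]["vars"]
    lod = []
    for binding in json_result["results"]["bindings"]:
        record = {}
        for var, cell in binding.items():
            convert = CONVERTERS.get(cell.get("datatype"), str)
            record[var] = convert(cell["value"])
        if fix_none:
            # unbound variables are missing from a binding; add them as None
            for var in variables:
                record.setdefault(var, None)
        lod.append(record)
    return lod


json_result = {
    "head": {"vars": ["country", "population", "capital"]},
    "results": {
        "bindings": [
            {
                "country": {"type": "literal", "value": "Austria"},
                "population": {
                    "type": "literal",
                    "datatype": XSD + "integer",
                    "value": "8900000",
                },
            }
        ]
    },
}
lod = bindings_as_lod(json_result, fix_none=True)
```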
- getValue(sparqlQuery: str, attr: str)[source]¶
get the value for the given SPARQL query using the given attr
- Parameters
sparql (SPARQL) – the SPARQL endpoint to get the value for
sparqlQuery (str) – the SPARQL query to run
attr (str) – the attribute to get
- getValues(sparqlQuery: str, attrList: list)[source]¶
get Values for the given sparqlQuery and attribute list
- insert(insertCommand)[source]¶
run an insert
- Parameters
insertCommand (string) – the SPARQL INSERT command
- Returns
a response
- insertListOfDicts(listOfDicts, entityType, primaryKey, prefixes, limit=None, batchSize=None, profile=False)[source]¶
insert the given list of dicts mapping datatypes
- Parameters
entityType (string) – the entityType to use as a
primaryKey (string) – the name of the primary key attribute to use
prefixes (string) – any PREFIX statements to be used
limit (int) – maximum number of records to insert
batchSize (int) – number of records to send per request
- Returns
a list of errors which should be empty on full success
datatype mapping according to https://www.w3.org/TR/xmlschema-2/#built-in-datatypes
mapped from https://docs.python.org/3/library/stdtypes.html
compare to https://www.w3.org/2001/sw/rdb2rdf/directGraph/ http://www.bobdc.com/blog/json2rdf/ https://www.w3.org/TR/json-ld11-api/#data-round-tripping https://stackoverflow.com/questions/29030231/json-to-rdf-xml-file-in-python
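The idea of mapping Python types to XSD datatypes when building an INSERT command can be sketched as follows. Everything specific here is an assumption for illustration: the ex: prefix, the naming scheme for subjects and predicates, the choice of xsd:double for float and the helper names. The library's own mapping and naming may differ, and a real implementation would also have to make the primary key a valid local name.

```python
import datetime

# Python type -> XSD datatype (the xsd: prefix must be declared in the PREFIX statements)
XSD_TYPES = {
    bool: "xsd:boolean",
    int: "xsd:integer",
    float: "xsd:double",
    str: "xsd:string",
    datetime.date: "xsd:date",
    datetime.datetime: "xsd:dateTime",
}


def as_literal(value) -> str:
    """Render a Python value as a typed literal; raises KeyError for unsupported types."""
    xsd_type = XSD_TYPES[type(value)]
    if isinstance(value, bool):
        text = "true" if value else "false"
    elif isinstance(value, datetime.date):  # also covers datetime.datetime
        text = value.isoformat()
    else:
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^{xsd_type}'


def build_insert(list_of_dicts, entity_type, primary_key, prefixes):
    """Return (insert_command, errors); errors is empty on full success."""
    triples, errors = [], []
    for index, record in enumerate(list_of_dicts):
        if primary_key not in record:
            errors.append(f"record {index}: missing primary key {primary_key}")
            continue
        subject = f"ex:{entity_type}_{record[primary_key]}"
        for field, value in record.items():
            if value is None:
                continue
            try:
                literal = as_literal(value)
            except KeyError:
                errors.append(
                    f"record {index}: unsupported type {type(value).__name__} for {field}"
                )
                continue
            triples.append(f"  {subject} ex:{entity_type}_{field} {literal} .")
    command = prefixes + "\nINSERT DATA {\n" + "\n".join(triples) + "\n}"
    return command, errors


PREFIXES = (
    "PREFIX ex: <http://example.org/>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>"
)
countries = [{"code": "AT", "name": "Austria", "population": 8900000}]
command, errors = build_insert(countries, "Country", "code", PREFIXES)
```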
- insertListOfDictsBatch(listOfDicts, entityType, primaryKey, prefixes, title='batch', batchIndex=None, total=None, startTime=None)[source]¶
insert a Batch part of listOfDicts
- Parameters
entityType (string) – the entityType to use as a
primaryKey (string) – the name of the primary key attribute to use
prefixes (string) – any PREFIX statements to be used
title (string) – the title to display for the profiling (if any)
batchIndex (int) – the start index of the current batch
total (int) – the total number of records for all batches
startTime (datetime) – the start of the batch processing
- Returns
a list of errors which should be empty on full success
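Splitting a list of dicts into batches, each identified by its start index, can be sketched as follows. The generator batches is a hypothetical helper that only illustrates the slicing; it sends nothing to an endpoint.

```python
def batches(records: list, batch_size: int):
    """Yield (start_index, chunk) pairs of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield start, records[start:start + batch_size]

records = [{"id": i} for i in range(7)]
for batch_index, chunk in batches(records, 3):
    print(f"batch starting at {batch_index}: {len(chunk)} of {len(records)} records")
```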
- printErrors(errors)[source]¶
print the given list of errors
- Parameters
errors (list) – a list of error strings
- Returns
True if the list is empty, otherwise False
- Return type
boolean
- query(queryString, method='POST')[source]¶
get a list of results for the given query
- Parameters
queryString (string) – the SPARQL query to execute
method (string) – the HTTP method to use, e.g. POST
- Returns
list of bindings
- Return type
list
- queryAsListOfDicts(queryString, fixNone: bool = False, sampleCount: Optional[int] = None)[source]¶
get a list of dicts for the given query (to allow round-trip results for insertListOfDicts)
- Parameters
queryString (string) – the SPARQL query to execute
fixNone (bool) – if True add None values for empty columns in Dict
sampleCount (int) – the number of samples to check
- Returns
a list of dicts
- Return type
list
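The effect of fixNone can be illustrated by flattening SPARQL JSON bindings into plain dicts and then filling missing columns. This is a sketch only: bindings_to_lod is a hypothetical helper, it keeps all values as strings, and it does not reproduce the library's datatype handling.

```python
def bindings_to_lod(bindings: list, fix_none: bool = False) -> list:
    """Flatten SPARQL JSON bindings to plain dicts of string values."""
    lod = [{var: cell["value"] for var, cell in row.items()} for row in bindings]
    if fix_none:
        # Collect every variable seen in any row, then fill the gaps with None.
        all_vars = {var for record in lod for var in record}
        for record in lod:
            for var in all_vars:
                record.setdefault(var, None)
    return lod

bindings = [
    {"name": {"type": "literal", "value": "Elizabeth"},
     "born": {"type": "literal", "value": "1926-04-21"}},
    {"name": {"type": "literal", "value": "Charles"}},
]
lod = bindings_to_lod(bindings, fix_none=True)
print(lod)
```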
lodstorage.sql module¶
Created on 2020-08-24
@author: wf
- class lodstorage.sql.EntityInfo(sampleRecords, name, primaryKey=None, debug=False)[source]¶
Bases:
object
holds entity meta Info
- Variables
name(string) – entity name = table name
primaryKey(string) – the name of the primary key column
typeMap(dict) – maps column names to python types
debug(boolean) – True if debug information should be shown
construct me from the given name and primary key
- Parameters
name (string) – the name of the entity
primaryKey (string) – the name of the primary key column
debug (boolean) – True if debug information should be shown
- addType(column, valueType, sqlType)[source]¶
add the python type for the given column to the typeMap
- Parameters
column (string) – the name of the column
valueType (type) – the python type of the column
- fixDates(resultList)[source]¶
fix date entries in the given resultList by parsing the date content e.g. converting ‘1926-04-21’ back to datetime.date(1926, 4, 21)
- Parameters
resultList (list) – the list of records to be fixed
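Converting ISO date strings back to datetime.date objects can be sketched with the standard library. The helper fix_dates and its date_columns argument are assumptions for this example; the documented method takes only resultList.

```python
import datetime

def fix_dates(result_list: list, date_columns: list) -> None:
    """Parse ISO date strings in the given columns in place."""
    for record in result_list:
        for column in date_columns:
            value = record.get(column)
            if isinstance(value, str):
                record[column] = datetime.date.fromisoformat(value)

rows = [{"name": "Elizabeth", "born": "1926-04-21"}]
fix_dates(rows, ["born"])
print(repr(rows[0]["born"]))
```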
- getCreateTableCmd(sampleRecords)[source]¶
get the CREATE TABLE DDL command for the given sample records
- Parameters
sampleRecords (list) – a list of Dicts of sample Records
- Returns
CREATE TABLE DDL command for this entity info
- Return type
string
Example:
CREATE TABLE Person(name TEXT PRIMARY KEY,born DATE,numberInLine INTEGER,wikidataurl TEXT,age FLOAT,ofAge BOOLEAN)
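A command of this shape can be derived from the value types of a single sample record. The sketch below is an illustration under assumptions: SQL_TYPES and create_table_cmd are hypothetical names, and the type table was chosen to match the example above rather than taken from the library.

```python
import datetime
from typing import Optional

# Hypothetical Python-to-SQL type table chosen to match the example above.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "FLOAT",
             bool: "BOOLEAN", datetime.date: "DATE"}

def create_table_cmd(name: str, sample: dict, primary_key: Optional[str] = None) -> str:
    """Build a CREATE TABLE command from the value types of one sample record."""
    columns = []
    for column, value in sample.items():
        sql_type = SQL_TYPES.get(type(value), "TEXT")
        suffix = " PRIMARY KEY" if column == primary_key else ""
        columns.append(f"{column} {sql_type}{suffix}")
    return f"CREATE TABLE {name}({','.join(columns)})"

sample = {"name": "Elizabeth", "born": datetime.date(1926, 4, 21),
          "numberInLine": 0, "ofAge": True}
print(create_table_cmd("Person", sample, "name"))
```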
- class lodstorage.sql.SQLDB(dbname: str = ':memory:', connection=None, check_same_thread=True, timeout=5, debug=False, errorDebug=False)[source]¶
Bases:
object
Structured Query Language Database wrapper
- Variables
dbname(string) – name of the database
debug(boolean) – True if debug info should be provided
errorDebug(boolean) – True if debug info should be provided on errors (should not be used for production since it might reveal data)
Construct me for the given dbname and debug
- Parameters
dbname (string) – name of the database - default is a RAM based database
connection (Connection) – an optional connection to be reused
check_same_thread (boolean) – True if object handling needs to be on the same thread see https://stackoverflow.com/a/48234567/1497139
timeout (float) – number of seconds for connection timeout
debug (boolean) – if True switch on debug
errorDebug (boolean) – True if debug info should be provided on errors (should not be used for production since it might reveal data)
- RAM = ':memory:'¶
- backup(backupDB, action='Backup', profile=False, showProgress: int = 200, doClose=True)[source]¶
create backup of this SQLDB to the given backup db
see https://stackoverflow.com/a/59042442/1497139
- Parameters
backupDB (string) – the path to the backupdb or SQLDB.RAM for in memory
action (string) – the action to display
profile (boolean) – True if timing information shall be shown
showProgress (int) – show progress at each showProgress page (0=show no progress)
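Python's sqlite3 module offers Connection.backup, which copies a database page by page and can report progress through a callback. The following sketch uses that standard-library call directly; treating showProgress as the pages argument is an assumption about how the wrapper uses it.

```python
import sqlite3

def show_progress(status, remaining, total):
    # Called by sqlite3 after each step of copied pages.
    print(f"copied {total - remaining} of {total} pages")

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Person(name TEXT PRIMARY KEY)")
source.execute("INSERT INTO Person VALUES ('Elizabeth')")
source.commit()

target = sqlite3.connect(":memory:")
# pages=200 copies up to 200 pages per step (assumed to correspond to showProgress).
source.backup(target, pages=200, progress=show_progress)
count = target.execute("SELECT COUNT(*) FROM Person").fetchone()[0]
print(count)
```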
- copyTo(copyDB, profile=True)[source]¶
copy my content to another database
- Parameters
copyDB (Connection) – the target database
profile (boolean) – if True show profile information
- createTable(listOfRecords, entityName: str, primaryKey: Optional[str] = None, withCreate: bool = True, withDrop: bool = False, sampleRecordCount=1, failIfTooFew=True)[source]¶
derive the Data Definition Language CREATE TABLE command from a list of records by examining the first record as the defining sample record, then execute the DDL command
auto detect column types see e.g. https://stackoverflow.com/a/57072280/1497139
- Parameters
listOfRecords (list) – a list of Dicts
entityName (string) – the entity / table name to use
primaryKey (string) – the key/column to use as a primary key
withDrop (boolean) – true if the existing Table should be dropped
withCreate (boolean) – true if the create Table command should be executed - false if only the entityInfo should be returned
sampleRecordCount (int) – number of sample records expected and to be inspected
failIfTooFew (boolean) – raise an Exception if there are too few sample records, otherwise only warn
- Returns
meta data information for the created table
- Return type
EntityInfo
- execute(ddlCmd)[source]¶
execute the given Data Definition Command
- Parameters
ddlCmd (string) – e.g. a CREATE TABLE or CREATE VIEW command
- executeDump(connection, dump, title, maxErrors=100, errorDisplayLimit=12, profile=True)[source]¶
execute the given dump for the given connection
- Parameters
connection (Connection) – the sqlite3 connection to use
dump (string) – the SQL commands for the dump
title (string) – the title of the dump
maxErrors (int) – maximum number of errors to be tolerated before stopping and doing a rollback
profile (boolean) – True if profiling information should be shown
- Returns
a list of errors
- getDebugInfo(record, index, executeMany)[source]¶
get the debug info for the given record at the given index depending on the state of executeMany
- Parameters
record (dict) – the record to show
index (int) – the index of the record
executeMany (boolean) – if True the record may be valid else not
- getTableDict(tableType='table')[source]¶
get the schema information from this database as a dict
- Parameters
tableType (str) – table or view
- Returns
Lookup map of tables with columns also being converted to dict
- Return type
dict
- getTableList(tableType='table')[source]¶
get the schema information from this database
- Parameters
tableType (str) – table or view
- Returns
a list as derived from PRAGMA table_info
- Return type
list
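Schema information of this kind can be read from SQLite with sqlite_master and PRAGMA table_info. The sketch below is illustrative only: get_table_list is a hypothetical stand-in, and the shape of the returned dicts is an assumption rather than the library's documented structure.

```python
import sqlite3

def get_table_list(connection, table_type: str = "table") -> list:
    """List tables (or views) with their columns."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = ? ORDER BY name", (table_type,)
    )
    tables = []
    for (name,) in cursor.fetchall():
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        info = connection.execute(f'PRAGMA table_info("{name}")')
        columns = [{"name": row[1], "type": row[2], "pk": row[5]} for row in info]
        tables.append({"name": name, "columns": columns})
    return tables

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Person(name TEXT PRIMARY KEY, age INTEGER)")
tables = get_table_list(con)
print(tables)
```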
- logError(msg)[source]¶
log the given error message to stderr
- Parameters
msg (str) – the error message to display
- query(sqlQuery, params=None)[source]¶
run the given sqlQuery and return a list of Dicts
- Parameters
sqlQuery (string) – the SQL query to be executed
params (tuple) – the query params, if any
- Returns
a list of Dicts
- Return type
list
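With the standard sqlite3 module, rows can be returned as dicts by setting row_factory to sqlite3.Row. This is a minimal sketch of that technique; the free function query below is a stand-in for illustration and not the method's actual implementation.

```python
import sqlite3

def query(connection, sql_query: str, params=None) -> list:
    """Run a query and return each row as a plain dict."""
    connection.row_factory = sqlite3.Row
    cursor = connection.execute(sql_query, params or ())
    return [dict(row) for row in cursor]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Person(name TEXT, age INTEGER)")
con.execute("INSERT INTO Person VALUES ('Elizabeth', 94)")
rows = query(con, "SELECT * FROM Person WHERE age > ?", (18,))
print(rows)
```

Passing values through params instead of formatting them into the SQL string lets SQLite handle quoting.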
- queryAll(entityInfo, fixDates=True)[source]¶
query all records for the given entityName/tableName
- Parameters
entityName (string) – name of the entity/table to query
fixDates (boolean) – True if date entries should be returned as such and not as strings
- queryGen(sqlQuery, params=None)[source]¶
run the given sqlQuery as a generator of dicts
- Parameters
sqlQuery (string) – the SQL query to be executed
params (tuple) – the query params, if any
- Returns
a generator of dicts
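A generator lets the caller stop early without fetching the remaining rows. The sketch below shows the idea with the standard sqlite3 cursor; query_gen is a hypothetical free function written for this example.

```python
import sqlite3

def query_gen(connection, sql_query: str, params=None):
    """Yield one dict per result row without building the full list."""
    cursor = connection.execute(sql_query, params or ())
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Numbers(n INTEGER)")
con.executemany("INSERT INTO Numbers VALUES (?)", [(i,) for i in range(5)])
first_two = []
for record in query_gen(con, "SELECT n FROM Numbers ORDER BY n"):
    first_two.append(record)
    if len(first_two) == 2:
        break  # the remaining rows are never fetched
print(first_two)
```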
- static restore(backupDB, restoreDB, profile=False, showProgress=200, debug=False)[source]¶
restore the restoreDB from the given backup DB
- Parameters
backupDB (string) – path to the backupDB e.g. backup.db
restoreDB (string) – path to the restoreDB or in Memory SQLDB.RAM
profile (boolean) – True if timing information should be shown
showProgress (int) – show progress at each showProgress page (0=show no progress)
- showDump(dump, limit=10)[source]¶
show the given dump up to the given limit
- Parameters
dump (string) – the SQL dump to show
limit (int) – the maximum number of lines to display
- store(listOfRecords, entityInfo, executeMany=False, fixNone=False)[source]¶
store the given list of records based on the given entityInfo
- Parameters
listOfRecords (list) – the list of Dicts to be stored
entityInfo (EntityInfo) – the meta data to be used for storing
executeMany (bool) – if True the insert command is done with many/all records at once
fixNone (bool) – if True make sure empty columns in the listOfDict are filled with “None” values
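Storing many records at once maps naturally onto sqlite3's executemany with named placeholders, which requires every record to provide every column. The sketch below shows why a fixNone-style step matters; the table, column list and INSERT command are assumptions made for this example.

```python
import sqlite3

records = [{"name": "Elizabeth", "age": 94}, {"name": "Charles"}]

# fixNone-style step: make sure every record has every column.
columns = ["name", "age"]
for record in records:
    for column in columns:
        record.setdefault(column, None)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Person(name TEXT PRIMARY KEY, age INTEGER)")
insert_cmd = "INSERT INTO Person (name, age) VALUES (:name, :age)"
con.executemany(insert_cmd, records)  # one call for all records
con.commit()
stored = con.execute("SELECT name, age FROM Person ORDER BY name").fetchall()
print(stored)
```

Without the fill step, the second record would lack a value for :age and the insert would fail.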
lodstorage.storageconfig module¶
Created on 2020-08-29
@author: wf
- class lodstorage.storageconfig.StorageConfig(mode=StoreMode.SQL, cacheRootDir: Optional[str] = None, cacheDirName: str = 'lodstorage', cacheFile=None, withShowProgress=True, profile=True, debug=False, errorDebug=True)[source]¶
Bases:
object
a storage configuration
Constructor
- Parameters
mode (StoreMode) – the storage mode e.g. sql
cacheRootDir (str) – the cache root directory to use - if None the home directory will be used
cacheFile (string) – the common cacheFile to use (if any)
withShowProgress (boolean) – True if progress should be shown
profile (boolean) – True if timing / profiling information should be shown
debug (boolean) – True if debugging information should be shown
errorDebug (boolean) – True if debug info should be provided on errors (should not be used for production since it might reveal data)
lodstorage.uml module¶
Created on 2020-09-04
@author: wf
- class lodstorage.uml.UML(debug=False)[source]¶
Bases:
object
UML diagrams via plantuml
Constructor
- Parameters
debug (boolean) – True if debug information should be shown
- mergeSchema(schemaManager, tableList, title=None, packageName=None, generalizeTo=None, withSkin=True)[source]¶
merge Schema and tableList to PlantUml notation
- Parameters
schemaManager (SchemaManager) – a schema manager to be used
tableList (list) – the tableList list of Dicts from getTableList() to convert
title (string) – optional title to be added
packageName (string) – optional packageName to be added
generalizeTo (string) – optional name of a general table to be derived
withSkin (boolean) – if True add default BITPlan skin parameters
- Returns
the Plantuml notation for the entities in columns of the given tablelist
- Return type
string
- skinparams = "\n' BITPlan Corporate identity skin params\n' Copyright (c) 2015-2020 BITPlan GmbH\n' see http://wiki.bitplan.com/PlantUmlSkinParams#BITPlanCI\n' skinparams generated by com.bitplan.restmodelmanager\nskinparam note {\n BackGroundColor #FFFFFF\n FontSize 12\n ArrowColor #FF8000\n BorderColor #FF8000\n FontColor black\n FontName Technical\n}\nskinparam component {\n BackGroundColor #FFFFFF\n FontSize 12\n ArrowColor #FF8000\n BorderColor #FF8000\n FontColor black\n FontName Technical\n}\nskinparam package {\n BackGroundColor #FFFFFF\n FontSize 12\n ArrowColor #FF8000\n BorderColor #FF8000\n FontColor black\n FontName Technical\n}\nskinparam usecase {\n BackGroundColor #FFFFFF\n FontSize 12\n ArrowColor #FF8000\n BorderColor #FF8000\n FontColor black\n FontName Technical\n}\nskinparam activity {\n BackGroundColor #FFFFFF\n FontSize 12\n ArrowColor #FF8000\n BorderColor #FF8000\n FontColor black\n FontName Technical\n}\nskinparam classAttribute {\n BackGroundColor #FFFFFF\n FontSize 12\n ArrowColor #FF8000\n BorderColor #FF8000\n FontColor black\n FontName Technical\n}\nskinparam interface {\n BackGroundColor #FFFFFF\n FontSize 12\n ArrowColor #FF8000\n BorderColor #FF8000\n FontColor black\n FontName Technical\n}\nskinparam class {\n BackGroundColor #FFFFFF\n FontSize 12\n ArrowColor #FF8000\n BorderColor #FF8000\n FontColor black\n FontName Technical\n}\nskinparam object {\n BackGroundColor #FFFFFF\n FontSize 12\n ArrowColor #FF8000\n BorderColor #FF8000\n FontColor black\n FontName Technical\n}\nhide Circle\n' end of skinparams '\n"¶
- tableListToPlantUml(tableList, title=None, packageName=None, generalizeTo=None, withSkin=True)[source]¶
convert tableList to PlantUml notation
- Parameters
tableList (list) – the tableList list of Dicts from getTableList() to convert
title (string) – optional title to be added
packageName (string) – optional packageName to be added
generalizeTo (string) – optional name of a general table to be derived
withSkin (boolean) – if True add default BITPlan skin parameters
- Returns
the Plantuml notation for the entities in columns of the given tablelist
- Return type
string
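Turning table metadata into PlantUML class notation can be sketched as follows. The input structure (a list of dicts with name and columns keys) and the function table_list_to_plantuml are assumptions for illustration; the real tableList format and the generated notation may differ, and skin parameters, packages and generalization are omitted.

```python
from typing import Optional

def table_list_to_plantuml(table_list: list, title: Optional[str] = None) -> str:
    """Render each table as a PlantUML class with one field per column."""
    lines = ["@startuml"]
    if title:
        lines.append(f"title {title}")
    for table in table_list:
        lines.append(f"class {table['name']} {{")
        for column in table["columns"]:
            lines.append(f"  {column['name']} : {column['type']}")
        lines.append("}")
    lines.append("@enduml")
    return "\n".join(lines)

# Assumed input shape, written by hand for this example.
table_list = [{"name": "Person", "columns": [
    {"name": "name", "type": "TEXT"},
    {"name": "born", "type": "DATE"},
]}]
uml = table_list_to_plantuml(table_list, title="Royals")
print(uml)
```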