aleph Package


Query workflow

To query Aleph, create one of the Queries (ISBNQuery, for example) and put it into the SearchRequest wrapper. Then encode it by calling serialize() and send the message to Aleph's exchange:

    isbnq = ISBNQuery("80-251-0225-4")
    request = SearchRequest(isbnq)

    # amqp.send() stands for whatever AMQP client call you use
    amqp.send(
        message=serialize(request),
        properties="..",
        exchange="ALEPH'S_EXCHANGE"
    )

You will get back an AMQP message which, after decoding with deserialize(), yields a SearchResult.

If you just want to know how many items there are in Aleph, wrap the ISBNQuery with CountRequest instead. Use this rather than calling len() on SearchResult.records, because it puts much less load on Aleph:

    isbnq = ISBNQuery("80-251-0225-4")
    request = CountRequest(isbnq)

    # the rest is the same ...

After decoding, you will get back a CountResult.

Here is an ASCII flow diagram for you:

    ISBNQuery      ----.                                 ,--> CountResult
    AuthorQuery    ----|                                 |       `- num_of_records
    PublisherQuery ----|                                 |
    GenericQuery   ----|      ISBNValidationRequest      |--> SearchResult
                       |                |                |       `- AlephRecord
                       V                |                |
          Count/SearchRequest           |                |--> ISBNValidationResult
                       |                |                |       `- ISBN
                       V                |                |
                  serialize()<----------'          deserialize()
                       |                                 ^
                       V             Client              |
                  AMQPMessage ------> AMQP -------> AMQPMessage
                                      |  ^
                                      V  |
                                      |  ^
                                      V  |
                  AMQPMessage <------ AMQP <------- AMQPMessage
                       |             Service             ^
                       V                                 |
             reactToAMQPMessage() ............... magic_happens()

Neat, isn’t it?

Export workflow

TODO: implement, then write docstring

class aleph.__init__.AuthorQuery

Bases: aleph.__init__.AuthorQuery, aleph.__init__._QueryTemplate

Query Aleph to get books by Author.

class aleph.__init__.GenericQuery

Bases: aleph.__init__.GenericQuery, aleph.__init__._QueryTemplate

Used for generic queries to Aleph.

For details of the base/phrase/... parameters, see aleph.py: searchInAleph().

This is used mainly if you want to search by your own parameters and don't want to use the prepared wrappers (AuthorQuery/ISBNQuery/...).

class aleph.__init__.ISBNQuery

Bases: aleph.__init__.ISBNQuery, aleph.__init__._QueryTemplate

Query Aleph to get books by ISBN.

Note: ISBN is not unique, so you can get back a lot of books with the same ISBN.
Some books also have two or more ISBNs.

class aleph.__init__.PublisherQuery

Bases: aleph.__init__.PublisherQuery, aleph.__init__._QueryTemplate

Query Aleph to get books by Publisher.

aleph.__init__.deserialize(data)

Deserialize classes from JSON data.

aleph.__init__.iiOfAny(instance, classes)

Returns True if instance is an instance of any (iiOfAny) of the classes.

This function doesn't use an isinstance() check; it just compares the class names.

This can be dangerous in general, but it is really useful when you are comparing a class serialized in one module and deserialized in another.

In that case, the module paths in the class internals differ, so isinstance() and type() comparisons fail.

Use this function instead if you want to check the type of your deserialized message.

instance – class instance whose type you want to know
classes – list of classes, or just the class you want to compare; the function automatically converts non-list/non-tuple parameters to a list
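A minimal sketch of the name-based check described above. ii_of_any() is a hypothetical stand-in, not the package's iiOfAny(); it assumes that comparing type(instance).__name__ with each class's __name__ is all that is needed. The two SearchResult namedtuples are made up to imitate a class serialized in one module and deserialized in another:

```python
from collections import namedtuple


def ii_of_any(instance, classes):
    """Return True if `instance` is an instance of any of `classes`,
    comparing class names only, not class identity (sketch)."""
    # Accept a single class as well as a list/tuple of classes.
    if not isinstance(classes, (list, tuple)):
        classes = [classes]

    instance_name = type(instance).__name__
    return any(cls.__name__ == instance_name for cls in classes)


# Two distinct classes that merely share a name, as happens when the same
# namedtuple is defined (or rebuilt) in different modules.
SearchResultA = namedtuple("SearchResult", "records")
SearchResultB = namedtuple("SearchResult", "records")
CountResult = namedtuple("CountResult", "num_of_records")

msg = SearchResultA(records=[])
print(isinstance(msg, SearchResultB))                # False
print(ii_of_any(msg, SearchResultB))                 # True
print(ii_of_any(msg, [CountResult, SearchResultB]))  # True
print(ii_of_any(msg, CountResult))                   # False
```

The trade-off is the one the docstring warns about: any two unrelated classes with the same name will also match.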
aleph.__init__.reactToAMQPMessage(message, response_callback, UUID)

React to the given AMQPMessage. Return data through the given callback function.

message – message encoded in JSON by serialize()
response_callback – function taking exactly ONE parameter, the body of the response message. The function takes care of sending the response over AMQP.

Returns the result of the response_callback() call.

Raise:
ValueError if a message of the wrong type or structure is given.
TODO:
React to Export requests.
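The decode, dispatch and respond cycle described above can be sketched as follows. Everything here is an assumption for illustration: the "__type__" key in the JSON, the namedtuples and count_in_aleph() are hypothetical and not taken from the package, which uses its own serialize()/deserialize() format:

```python
import json
from collections import namedtuple

# Hypothetical stand-ins for the package's query/result structures.
ISBNQuery = namedtuple("ISBNQuery", "ISBN")
CountResult = namedtuple("CountResult", "num_of_records")


def count_in_aleph(query):
    """Hypothetical backend call; a real service would ask Aleph here."""
    return 3


def react_to_message(body, response_callback):
    """Decode a JSON message, dispatch on its type name and pass the
    encoded result to `response_callback` (sketch only)."""
    data = json.loads(body)
    msg_type = data.get("__type__")

    if msg_type == "CountRequest":
        query = ISBNQuery(**data["query"])
        result = CountResult(num_of_records=count_in_aleph(query))
    else:
        raise ValueError("Unknown type of message: %r" % (msg_type,))

    # Tag the response with its type so the client can decode it.
    payload = dict(result._asdict(), __type__=type(result).__name__)
    return response_callback(json.dumps(payload))


request_body = '{"__type__": "CountRequest", "query": {"ISBN": "80-251-0225-4"}}'
response = react_to_message(request_body, lambda body: body)
```

Returning whatever response_callback() returns mirrors the documented "Returns result of response_callback() call".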
aleph.__init__.serialize(data)

Serialize a class hierarchy into JSON.

aleph Module

Aleph X-Service wrapper.

This module allows you to query Aleph’s X-Services module (Aleph server is defined by ALEPH_URL in settings.py).

There are two levels of abstraction:

Lowlevel

You can use these functions to access Aleph:

searchInAleph(base, phrase, considerSimilar, field)
getDocumentIDs(aleph_search_result, [number_of_docs])
downloadMARCXML(doc_id, library)
downloadMARCOAI(doc_id, base)

Workflow:

Aleph works in a strange way: it won't allow you to access the desired information directly.

You have to create a search request by calling searchInAleph() first, which returns a dictionary with a few important pieces of information about the session.

This dictionary can later be used as a parameter to getDocumentIDs(), which gives you a list of DocumentID named tuples.

Named tuples are used because, to access your document, you need not just the document ID number, but also the library ID string.

Depending on your system, there may be just one accessible library or multiple ones, and then you will be glad that you get both pieces of information together.

DocumentID can be used as a parameter to downloadMARCXML().

Let's look at some code:

    ids = getDocumentIDs(searchInAleph("nkc", "test", False, "wrd"))

    # each DocumentID carries three fields: id, library and base
    for doc_id in ids:
        XML = downloadMARCXML(doc_id.id, doc_id.library)

        # processDocument(XML)

Highlevel

So far, there are only getter wrappers:

getISBNsIDs()
getAuthorsBooksIDs()
getPublishersBooksIDs()

And counting functions (they need one request fewer to Aleph than counting the results returned by the getters):

getISBNCount()
getAuthorsBooksCount()
getPublishersBooksCount()

Other noteworthy properties

The properties VALID_ALEPH_BASES and VALID_ALEPH_FIELDS may be specific to our library only, but sadly, I don't really know.

A list of valid bases can be obtained by calling _getListOfBases(), which returns a list of strings.

There is also a defined exception tree; see the AlephException docstring for details.

TODO:
multiple bases in one request?
disable valid fields/bases checking?
_getListOfFields()
exception aleph.aleph.AlephException(message)

Bases: exceptions.Exception

Exception tree:

    AlephException
     |- InvalidAlephBaseException
     |- InvalidAlephFieldException
     |- LibraryNotFoundException
     `- DocumentNotFoundException
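A sketch of how such a tree can be declared and used, assuming only what the signatures state (each exception takes a message). The message attribute and the download() helper are hypothetical; the point is that catching AlephException also catches every exception below it:

```python
class AlephException(Exception):
    """Root of the tree; catch this to handle any Aleph error."""

    def __init__(self, message):
        super(AlephException, self).__init__(message)
        self.message = message  # hypothetical attribute, for illustration


class InvalidAlephBaseException(AlephException):
    pass


class InvalidAlephFieldException(AlephException):
    pass


class LibraryNotFoundException(AlephException):
    pass


class DocumentNotFoundException(AlephException):
    pass


def download(doc_id):
    """Hypothetical helper that fails the way downloadMARCXML() might."""
    raise DocumentNotFoundException("Document %s not found." % doc_id)


try:
    download("000652691")
except AlephException as e:  # also catches all subclasses
    print(type(e).__name__, "-", e.message)
```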
class aleph.aleph.DocumentID

Bases: tuple

DocumentID(id, library, base)

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

__getstate__()

Exclude the OrderedDict from pickling

__repr__()

Return a nicely formatted representation string

base

Alias for field number 2

id

Alias for field number 0

library

Alias for field number 1
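DocumentID behaves like any namedtuple created by collections.namedtuple, which is where the generated methods listed above come from. A short illustration with made-up values (the id "000652691" is invented; NKC01 is the library mentioned in downloadMARCXML()):

```python
from collections import namedtuple

# Field order taken from the signature above: DocumentID(id, library, base).
DocumentID = namedtuple("DocumentID", "id library base")

doc = DocumentID(id="000652691", library="NKC01", base="nkc")

print(doc.library)       # NKC01  (same as doc[1])
print(doc[0] == doc.id)  # True

# Unpack all three fields; unpacking into only two names raises ValueError.
doc_id, library, base = doc
```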

exception aleph.aleph.DocumentNotFoundException(message)

Bases: aleph.aleph.AlephException

exception aleph.aleph.InvalidAlephBaseException(message)

Bases: aleph.aleph.AlephException

exception aleph.aleph.InvalidAlephFieldException(message)

Bases: aleph.aleph.AlephException

exception aleph.aleph.LibraryNotFoundException(message)

Bases: aleph.aleph.AlephException

aleph.aleph.downloadMARCOAI(doc_id, base)

Download the MARC OAI document with the given doc_id from the given (logical) base.

The funny part is that some documents can be obtained in full text only with this function.

doc_id – document id (you will get this from getDocumentIDs())
base – base from which you want to download the Aleph document. This seems to duplicate searchInAleph()'s parameters, but it's just something Aleph's X-Services want, so ...

Returns: MARC XML unicode string.

Raise:
InvalidAlephBaseException
DocumentNotFoundException
aleph.aleph.downloadMARCXML(doc_id, library)

Download the MARC XML document with the given doc_id from the given library.

doc_id – document id (you will get this from getDocumentIDs())
library – NKC01 in our case, but don't worry, getDocumentIDs() adds the library specification into the DocumentID named tuple.

Returns: MARC XML unicode string.

Raise:
LibraryNotFoundException
DocumentNotFoundException
aleph.aleph.getAuthorsBooksCount(author, base='nkc')
aleph.aleph.getAuthorsBooksIDs(author, base='nkc')
aleph.aleph.getDocumentIDs(aleph_search_result, number_of_docs=-1)

Return a list of DocumentID named tuples for the given 'aleph_search_result'.

aleph_search_result – dict returned from searchInAleph()
number_of_docs – how many DocumentIDs from the set given by aleph_search_result should be returned; the default, -1, returns all of them.

The returned DocumentIDs can be used as parameters to downloadMARCXML().

Raise:
AlephException
aleph.aleph.getISBNCount(isbn, base='nkc')
aleph.aleph.getISBNsIDs(isbn, base='nkc')
aleph.aleph.getPublishersBooksCount(publisher, base='nkc')
aleph.aleph.getPublishersBooksIDs(publisher, base='nkc')
aleph.aleph.searchInAleph(base, phrase, considerSimilar, field)

Send a request to the Aleph search engine.

The request itself is pretty useless, but its result can later be passed to getDocumentIDs(), which returns the IDs of the found records.

phrase – what you want to search for
base – which database you want to use
field – where you want to look
considerSimilar – fuzzy search, which is not working at all, so don't use it

Returns: aleph_search_record, which is a dictionary consisting of these fields:
error (optional) – present if there was some form of error
no_entries (int) – number of entries that can be fetched from Aleph
no_records (int) – no idea what this is, but it is always >= no_entries
set_number (int) – important: something like the ID of your request
session-id (str) – used to count users for licensing purposes

example

    {
        'session-id': 'YLI54HBQJESUTS678YYUNKEU4BNAUJDKA914GMF39J6K89VSCB',
        'set_number': 36520,
        'no_records': 1,
        'no_entries': 1
    }

Raise:
AlephException
InvalidAlephBaseException
InvalidAlephFieldException
TODO:
  • support multiple phrases in one request

convertors Module

This module exists to provide the ability to convert from AMQP data structures to Aleph's data structures.

It can convert a MARCXMLRecord to the simplified EPublication data structure.

It can also serialize any namedtuple to JSON.

aleph.convertors.fromJSON(json_data)

Convert a JSON string back to Python structures.

This is necessary because the standard json module can't restore namedtuples: it decodes them as plain lists.

aleph.convertors.toEPublication(marcxml)

Convert a MARCXMLRecord object to the EPublication named tuple (see __init__.py).

marcxml – MARCXMLRecord instance OR string (with <record> tag)

Returns EPublication named tuple.

aleph.convertors.toJSON(structure)

Convert a structure to JSON.

This is necessary because the standard json module can't serialize namedtuples in a reversible way: it encodes them as plain lists, losing the field names and the type.
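To illustrate why a custom converter is needed: json.dumps() accepts a namedtuple but writes it as a plain array, so the type and field names are lost. Below is a minimal sketch of one way to make the round trip work by tagging each namedtuple with a "__type__" key. The key, the TYPES registry and the to_json()/from_json() names are assumptions, not the format toJSON()/fromJSON() actually produce:

```python
import json
from collections import namedtuple

# Hypothetical registry of namedtuple types that may appear in messages.
ISBNQuery = namedtuple("ISBNQuery", "ISBN")
SearchRequest = namedtuple("SearchRequest", "query")
TYPES = {cls.__name__: cls for cls in (ISBNQuery, SearchRequest)}


def _encode(obj):
    """Recursively turn namedtuples into dicts tagged with their class name."""
    if isinstance(obj, tuple) and hasattr(obj, "_fields"):
        data = {name: _encode(value) for name, value in zip(obj._fields, obj)}
        data["__type__"] = type(obj).__name__
        return data
    if isinstance(obj, (list, tuple)):
        return [_encode(item) for item in obj]
    if isinstance(obj, dict):
        return {key: _encode(value) for key, value in obj.items()}
    return obj


def _decode(data):
    """json object_hook: rebuild namedtuples from tagged dicts."""
    type_name = data.pop("__type__", None)
    if type_name is None:
        return data
    return TYPES[type_name](**data)


def to_json(structure):
    return json.dumps(_encode(structure))


def from_json(json_data):
    # object_hook runs on the innermost dicts first, so nested
    # namedtuples are already rebuilt when their parent is decoded.
    return json.loads(json_data, object_hook=_decode)


request = SearchRequest(ISBNQuery("80-251-0225-4"))
print(json.dumps(request))                   # [["80-251-0225-4"]]
print(from_json(to_json(request)) == request)  # True
```

Plain tuples that are not namedtuples still come back as lists, just as with the standard module.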

export Module

This module is used to put data into Aleph.

It is based on a custom-made web form, which is currently used by publishers to report new books.

The source code of this form is not available at the moment (it was created by a third party), but it may become available in the future. This will depend heavily on the number of people who use this project.

The most important function in this module is exportEPublication(epub), which does everything needed to export an EPublication structure to Aleph.

This whole module is highly dependent on processes which are defined as import processes at the Czech National Library.

If you want to use the export ability in your library, you should rewrite this and make sure that you are sending the data somewhere where someone will process it. Otherwise, you can fill your library's database with crap.

exception aleph.export.ExportException(message)

Bases: exceptions.Exception

exception aleph.export.ExportRejectedException(message)

Bases: aleph.export.ExportException

exception aleph.export.InvalidISBNException(message)

Bases: aleph.export.ExportException

class aleph.export.PostData(epub)

This class is used to transform data from EPublication into a dictionary, which is sent as a POST request to the Aleph third-party web form.

http://aleph.nkp.cz/F/?func=file&file_name=service-isbn

The class exists because there are 29 POST parameters with internal dependencies, which need to be processed and validated before they can be passed to the web form.

get_POST_data()

aleph.export.exportEPublication(epub)

Send the epub EPublication object to Aleph, where it will be processed by librarians.

isbn Module

This module provides functionality to validate ISBN checksums and to compute ISBN checksum digits.

aleph.isbn.get_isbn10_checksum(isbn)
aleph.isbn.get_isbn13_checksum(isbn)
aleph.isbn.is_isbn10_valid(isbn)
aleph.isbn.is_isbn13_valid(isbn)
aleph.isbn.is_valid_isbn(isbn)
aleph.isbn.isbn_cleaner(fn)

Decorator for calling other functions from this module.
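For reference, the standard check-digit rules can be sketched like this: ISBN-10 weights the first nine digits 10 down to 2 and works modulo 11 (a check value of 10 is written as X), while ISBN-13 alternates weights 1 and 3 and works modulo 10. The function names and the string-based interface are assumptions; the real get_isbn10_checksum() and friends may take different input types:

```python
def _clean(isbn):
    """Strip hyphens/spaces, roughly what a cleaner decorator might do (assumption)."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_checksum(first_nine):
    """Check digit for ISBN-10: weights 10..2, result mod 11 ('X' for 10)."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_checksum(first_twelve):
    """Check digit for ISBN-13: alternating weights 1 and 3, result mod 10."""
    total = sum((3 if i % 2 else 1) * int(d) for i, d in enumerate(first_twelve))
    return str((10 - total % 10) % 10)


def is_valid_isbn(isbn):
    isbn = _clean(isbn)
    if len(isbn) == 10 and isbn[:9].isdigit():
        return isbn10_checksum(isbn[:9]) == isbn[9]
    if len(isbn) == 13 and isbn.isdigit():
        return isbn13_checksum(isbn[:12]) == isbn[12]
    return False


print(isbn10_checksum("802510225"))        # 4
print(is_valid_isbn("80-251-0225-4"))      # True
print(is_valid_isbn("978-0-306-40615-7"))  # True
print(is_valid_isbn("80-251-0225-5"))      # False
```

The ISBN "80-251-0225-4" is the one used in the query examples at the top of this page.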

marcxml Module

Module for parsing and high-level processing of MARC XML records.

About the format and how the class works: a standard MARC record is made of three parts:

leader – binary something, you can probably ignore it
controlfields – MARC fields < 10
datafields – important information you actually want

The basic MARC XML scheme uses this structure:

    <record>
    <controlfield tag="001">data</controlfield>
    ...
    <controlfield tag="010">data</controlfield>
    <datafield tag="011" ind1=" " ind2=" ">
        <subfield code="scode">data</subfield>
        <subfield code="a">data</subfield>
        <subfield code="a">another data, but same code!</subfield>
        ...
        <subfield code="scode+">another data</subfield>
    </datafield>
    ...
    <datafield tag="999" ind1=" " ind2=" ">
    ...
    </datafield>
    </record>

<leader> is optional and is parsed into MARCXMLRecord.leader as a string.

<controlfield>s are optional and are parsed as a dictionary into MARCXMLRecord.controlfields; the dictionary for the data from the example would look like this:

    MARCXMLRecord.controlfields = {
        "001": "data",
        ...
        "010": "data"
    }

<datafield>s are non-optional and are parsed into MARCXMLRecord.datafields, which is a slightly more complicated dictionary. It is complicated mainly because the tag parameter is not unique, so there can be multiple <datafield>s with the same tag!

scode is always one character (ASCII lowercase) or a digit.

    MARCXMLRecord.datafields = {
        "011": [{
            "ind1": " ",
            "ind2": " ",
            "scode": ["data"],
            "scode+": ["another data"]
        }],

        # real example
        "928": [{
            "ind1": "1",
            "ind2": " ",
            "a": ["Portál"]
        }],

        "910": [
            {
                "ind1": "1",
                "ind2": " ",
                "a": ["ABA001"]
            },
            {
                "ind1": "2",
                "ind2": " ",
                "a": ["BOA001"],
                "b": ["2-1235.975"]
            },
            {
                "ind1": "3",
                "ind2": " ",
                "a": ["OLA001"],
                "b": ["1-218.844"]
            }
        ]
    }

As you can see in the 910 record example, sometimes there are multiple records in a list!

NOTICE THAT RECORDS ARE STORED IN AN ARRAY, NO MATTER WHETHER THERE IS JUST ONE RECORD OR MULTIPLE RECORDS. THE SAME APPLIES TO SUBFIELDS.

The example above corresponds to this piece of real-world code:

    <datafield tag="910" ind1="1" ind2=" ">
        <subfield code="a">ABA001</subfield>
    </datafield>
    <datafield tag="910" ind1="2" ind2=" ">
        <subfield code="a">BOA001</subfield>
        <subfield code="b">2-1235.975</subfield>
    </datafield>
    <datafield tag="910" ind1="3" ind2=" ">
        <subfield code="a">OLA001</subfield>
        <subfield code="b">1-218.844</subfield>
    </datafield>
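The datafields layout can be reproduced with the standard library in a few lines. This is a sketch of the described structure, not the MARCXMLRecord parser; it assumes plain MARC XML without the http://www.loc.gov/MARC21/slim namespace (namespaced input would need qualified tag names) and ignores OAI documents:

```python
import xml.etree.ElementTree as ET

MARC_XML = """<record>
<controlfield tag="001">cpk19990652691</controlfield>
<datafield tag="910" ind1="1" ind2=" ">
  <subfield code="a">ABA001</subfield>
</datafield>
<datafield tag="910" ind1="2" ind2=" ">
  <subfield code="a">BOA001</subfield>
  <subfield code="b">2-1235.975</subfield>
</datafield>
</record>"""


def parse_record(xml):
    """Build (controlfields, datafields) in the layout described above."""
    root = ET.fromstring(xml)
    controlfields = {cf.get("tag"): cf.text or ""
                     for cf in root.findall("controlfield")}

    datafields = {}
    for df in root.findall("datafield"):
        record = {"ind1": df.get("ind1"), "ind2": df.get("ind2")}
        for sf in df.findall("subfield"):
            # Subfield values always go into a list, even a single one.
            record.setdefault(sf.get("code"), []).append(sf.text or "")
        # Datafields sharing a tag are collected in a list as well.
        datafields.setdefault(df.get("tag"), []).append(record)

    return controlfields, datafields


controlfields, datafields = parse_record(MARC_XML)
```

After parsing, datafields["910"] is a list with two dicts, matching the "910" entry of the example above.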

OAI

To keep things from being too simple, there is also another type of MARC XML document: the OAI format.

OAI documents are a little bit different, but almost the same in structure.

The leader is optional and is stored in MARCXMLRecord.controlfields["LDR"], but also in MARCXMLRecord.leader for backward compatibility.

<controlfield> is renamed to <fixfield> and its "tag" parameter to "id".

The <datafield> tag is named <varfield> instead, its "tag" parameter is "id", and ind1/ind2 are named i1/i2, but work the same way.

The "code" parameter of <subfield>s is renamed to "label".

Real-world example:

    <oai_marc>
      <fixfield id="LDR">-----nam-a22------aa4500</fixfield>
      <fixfield id="FMT">BK</fixfield>
      <fixfield id="001">cpk19990652691</fixfield>
      <fixfield id="003">CZ-PrNK</fixfield>
      <fixfield id="005">20130513104801.0</fixfield>
      <fixfield id="007">tu</fixfield>
      <fixfield id="008">990330m19981999xr-af--d------000-1-cze--</fixfield>
      <varfield id="015" i1=" " i2=" ">
        <subfield label="a">cnb000652691</subfield>
      </varfield>
      <varfield id="020" i1=" " i2=" ">
        <subfield label="a">80-7174-091-8 (sv. 1 : váz.) :</subfield>
        <subfield label="c">Kč 182,00</subfield>
      </varfield>
      ...
    </oai_marc>

Full documentation

A description of the simplified MARCXML schema can be found at http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd

A full description of MARCXML with the definition of each element can be found at http://www.loc.gov/standards/marcxml/mrcbxmlfile.dtd (19492 lines of code)

A description of MARC OAI can be found at http://www.openarchives.org/OAI/oai_marc.xsd

class aleph.marcxml.Corporation

Bases: aleph.marcxml.Corporation

Some information about corporations (fields 110, 610, 710, 810).

Properties:
.name
.place
.date

class aleph.marcxml.MARCXMLRecord(xml=None)

Class for serialization/deserialization of MARCXML and MARC OAI documents.

This class parses everything between <root> elements. It checks whether there is a root element, so please give it the full XML.

The internal format is described in the module docstring. You can access the internal data directly, or use a few handy methods on two different levels of abstraction:

No abstraction at all

You can choose to access the data directly; for this, there are a few important properties:

.leader (string)
.oai_marc (bool)
.controlfields (dict)
.datafields (dict of arrays of dicts of arrays of strings ^-^)

.controlfields is a simple, easy-to-use dictionary, where keys are field identifiers (string, 3 chars, all digits). The value is always a string.

.datafields is a little bit more complicated: it is a dictionary of arrays of dictionaries, which consist of arrays of strings and two special parameters.

It sounds horrible, but it is not that hard to understand:

    .datafields = {
        "011": [{"ind1": " ", "ind2": " "}],  # array of 0 or more dicts
        "012": [
            {
                "a": ["a) subsection value"],
                "b": ["b) subsection value"],
                "ind1": " ",
                "ind2": " "
            },
            {
                "a": [
                    "multiple values in a) subsections are possible!",
                    "another value in a) subsection"
                ],
                "c": [
                    "subsection identifier is always one character long"
                ],
                "ind1": " ",
                "ind2": " "
            }
        ]
    }

Notice the ind1/ind2 keywords, which are reserved indicators used in a few cases throughout the MARC standard.

The dict structure is not that hard to understand, but it is kind of long to access, so there are also some slightly higher-level access methods.

Lowlevel abstraction

To access data a little bit more easily, there are two methods to access and two methods to add data to the internal dictionaries:

.addControlField(name, value)
.addDataField(name, i1, i2, subfields_dict)

The names are self-describing, IMHO. subfields_dict is expected and enforced to be a dictionary with one-character-long keys and lists of strings as values.

Getters are also simple to use:

.getControlRecord(controlfield)
.getDataRecords(datafield, subfield, throw_exceptions)

.getControlRecord() is basically just a wrapper over .controlfields and works the same way as accessing .controlfields[controlfield].

.getDataRecords(datafield, subfield, throw_exceptions) returns a list of MarcSubrecord* objects with the information from section datafield, subsection subfield.

If the throw_exceptions parameter is set to False, the method returns an empty list instead of raising KeyError.

*As I said, the method returns a list of MarcSubrecord objects. They are almost the same as normal strings (they actually subclass str), but they define a few important methods which can make your life a little bit easier:

.getI1()
.getI2()
.getOtherSubfiedls()

.getOtherSubfiedls() returns a dictionary with the other subsections, i.e. those besides the subfield requested by calling .getDataRecords().

Highlevel abstractions

There are also a lot of high-level getters:

.getName()
.getSubname()
.getPrice()
.getPart()
.getPartName()
.getPublisher()
.getPubDate()
.getPubOrder()
.getFormat()
.getPubPlace()
.getAuthors()
.getCorporations()
.getDistributors()
.getISBNs()
.getBinding()
.getOriginals()

addControlField(name, value)
addDataField(name, i1, i2, subfields_dict)

Add a new datafield into self.datafields.

name – name of the datafield
i1 – value of the i1/ind1 parameter
i2 – value of the i2/ind2 parameter
subfields_dict – dictionary containing subfields in this format:

    {
        "field_id": ["subfield data", ...],
        ...
        "z": ["X0456b"]
    }

field_id can be only one character long!

The function takes care of OAI MARC.

getAuthors()

Return a list of authors represented as Person objects.

getBinding()
getControlRecord(controlfield)

Return the record from the given controlfield. Returned type: str.

getCorporations(roles=['dst'])

Return a list of Corporation objects specified by the roles parameter.

roles – specify which types of corporations you need. Set to ["any"] for any role, ["dst"] for distributors, etc. See http://www.loc.gov/marc/relators/relaterm.html for details.
getDataRecords(datafield, subfield, throw_exceptions=True)

Return the content of the given subfield in the datafield.

datafield – string with the section name (for example "001", "100", "700")
subfield – string with the subfield name (for example "a", "1", etc.)
throw_exceptions – if True, KeyError is raised if the method couldn't find the given datafield/subfield. If False, an empty list [] is returned.

Returns a list of MarcSubrecords. MarcSubrecord is practically the same thing as a string, but has the .getI1() and .getI2() methods defined.

Believe me, you will need this, because MARC XML depends on them from time to time (author names, for example).

getDistributors()

Return a list of distributors. Each distributor is represented as a Corporation object.

getFormat(undefined='')
getI(num)

Get the current name of the i1/ind1 parameter based on self.oai_marc.

This method is used mainly internally, but it can be handy if you work with the raw MARC XML object and don't use the getters.
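The name mapping getI() describes fits into one small function. get_i() and read_indicator() are hypothetical names used only for this sketch, which assumes num is 1 or 2:

```python
import xml.etree.ElementTree as ET


def get_i(num, oai_marc):
    """Return the attribute name of indicator `num`: i1/i2 in OAI, ind1/ind2 otherwise."""
    if num not in (1, 2):
        raise ValueError("num must be 1 or 2, not %r" % (num,))
    return ("i%d" if oai_marc else "ind%d") % num


def read_indicator(element, num, oai_marc):
    """Read indicator `num` from a <datafield> or <varfield> element."""
    return element.get(get_i(num, oai_marc))


marc = ET.fromstring('<datafield tag="015" ind1="1" ind2=" "/>')
oai = ET.fromstring('<varfield id="015" i1="1" i2=" "/>')

print(read_indicator(marc, 1, oai_marc=False))  # 1
print(read_indicator(oai, 1, oai_marc=True))    # 1
```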

getISBNs()

Return a list of ISBN strings.

getName()
getOriginals()

Return a list of original names.

getPart(undefined='')
getPartName(undefined='')
getPrice(undefined='')
getPubDate(undefined='')
getPubOrder(undefined='')
getPubPlace(undefined='')
getPublisher(undefined='')
getSubname(undefined='')
toXML()

Convert the object back to an XML string.

If everything works as expected, the returned string should be the same as the parsed one.

class aleph.marcxml.MarcSubrecord(arg, ind1, ind2, other_subfields)

Bases: str

This class is used to store data returned from the .getDataRecords() method of MARCXMLRecord.

It looks kind of like overkill, but when you are parsing values from subrecords of MARC XML, you need to know the context in which the subrecord is put.

Specifically the i1/i2 values, but sometimes it is useful to have access even to the other subfields of this subrecord.

This class provides this access through the .getI1()/.getI2() and .getOtherSubfiedls() getters. As a bonus, it is also fully convertible to a string, in which case only the value of the subrecord is preserved.

getI1()
getI2()
getOtherSubfiedls()
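A stripped-down sketch of such a str subclass, assuming only the constructor signature shown above; the attribute names i1, i2 and other_subfields are hypothetical, and the real class may store them differently:

```python
class MarcSubrecord(str):
    """str subclass that remembers its datafield context (sketch)."""

    def __new__(cls, arg, ind1, ind2, other_subfields):
        # Only `arg` becomes the string value itself.
        return super(MarcSubrecord, cls).__new__(cls, arg)

    def __init__(self, arg, ind1, ind2, other_subfields):
        self.i1 = ind1
        self.i2 = ind2
        self.other_subfields = other_subfields

    def getI1(self):
        return self.i1

    def getI2(self):
        return self.i2

    def getOtherSubfiedls(self):  # spelling kept from the documented API
        return self.other_subfields


rec = MarcSubrecord("BOA001", "2", " ", {"b": ["2-1235.975"]})
print(rec)               # BOA001
print(rec.getI1())       # 2
print(rec == "BOA001")   # True
```

Calling str(rec) returns a plain str, which drops the indicators, as the docstring says.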
class aleph.marcxml.Person

Bases: aleph.marcxml.Person

This class represents information about persons as defined in the MARC standards.

Properties:
.name
.second_name
.surname
.title
aleph.marcxml.resorted(values)

Sort values, but put numbers after the alphabetically sorted words.

This function is here for outputs, to be diff-compatible with Aleph.
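One possible reading of "numbers after words", assuming the values are strings and that "numbers" means strings made only of digits (for example one-character subfield codes). This is a sketch, not the package's resorted():

```python
def resorted(values):
    """Sort strings alphabetically, then move all-digit strings to the end."""
    values = sorted(values)
    words = [value for value in values if not value.isdigit()]
    numbers = [value for value in values if value.isdigit()]
    return words + numbers


print(resorted(["b", "1", "a", "9"]))  # ['a', 'b', '1', '9']
```

Digit strings are still compared as text here, so "10" would sort before "9"; that is fine for one-character codes, but a different key would be needed for longer numbers.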

settings Module
