aleph Package¶
- Query workflow ————————————————————–
To query Aleph, create one of the query objects (ISBNQuery, for example) and wrap it in a SearchRequest. Then encode it by calling toAMQPMessage() and send the message to Aleph's exchange.
---
isbnq = ISBNQuery("80-251-0225-4")
request = SearchRequest(isbnq)

amqp.send(
    message = serialize(request),
    properties = "..",
    exchange = "ALEPH'S_EXCHANGE"
)
---
You will get back an AMQP message which, after decoding with fromAMQPMessage(), gives you a SearchResult.
If you only want to know how many items there are in Aleph, wrap the ISBNQuery in a CountRequest instead. Use this rather than calling len() on SearchResult.records, because it puts much less load on Aleph:
---
isbnq = ISBNQuery("80-251-0225-4")
request = CountRequest(isbnq)

# rest is same..
---
After decoding, you will get back a CountResult.
Here is an ASCII flow diagram for you:
ISBNQuery      ----.                             ,--> CountResult
AuthorQuery    ----|                             |       `- num_of_records
PublisherQuery ----|                             |
GenericQuery   ----|   ISBNValidationRequest     |--> SearchResult
Neat, isn’t it?
- Export workflow ————————————————————-
TODO: implement, then write docstring
- class aleph.__init__.AuthorQuery[source]¶
Bases: aleph.__init__.AuthorQuery, aleph.__init__._QueryTemplate
Query Aleph to get books by Author.
- class aleph.__init__.GenericQuery[source]¶
Bases: aleph.__init__.GenericQuery, aleph.__init__._QueryTemplate
Used for generic queries to aleph.
For details of base/phrase/.. parameters, see aleph.py : searchInAleph().
This is used mainly if you want to search by your own parameters and don’t want to use prepared wrappers (AuthorQuery/ISBNQuery/..).
- class aleph.__init__.ISBNQuery[source]¶
Bases: aleph.__init__.ISBNQuery, aleph.__init__._QueryTemplate
Query Aleph to get books by ISBN.
- Note: ISBN is not unique, so you can get back a lot of books with the same ISBN.
- Some books also have two or more ISBNs.
- class aleph.__init__.PublisherQuery[source]¶
Bases: aleph.__init__.PublisherQuery, aleph.__init__._QueryTemplate
Query Aleph to get books by Publisher.
- aleph.__init__.iiOfAny(instance, classes)[source]¶
Returns true if instance is an instance of any (iiOfAny) of the classes.
This function doesn't use an isinstance() check; it just compares the class names.
This can be generally dangerous, but it is really useful when you are comparing a class serialized in one module and deserialized in another.
In that case the module paths in the class internals differ, so isinstance() and type() comparisons fail.
Use this function instead if you want to check the type of your deserialized message.
instance – class instance whose type you want to know
classes – list of classes, or just the class you want to compare; the function automatically converts non-list/non-tuple parameters to a list
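A minimal sketch of the name-based check described above, not the package's actual implementation. The helper name ii_of_any and the two stand-in ISBNQuery classes are assumptions made for illustration; they mimic a class that exists once locally and once as a copy created during deserialization.

```python
from collections import namedtuple


def ii_of_any(instance, classes):
    """Return True if the instance's class name matches any of `classes`."""
    # Accept a single class as well as a list/tuple of classes.
    if not isinstance(classes, (list, tuple)):
        classes = [classes]

    # Compare names only, so a class recreated under a different module
    # path still matches.
    return type(instance).__name__ in [cls.__name__ for cls in classes]


# Two distinct classes that share one name (hypothetical stand-ins).
LocalQuery = namedtuple("ISBNQuery", ["ISBN"])
RemoteQuery = namedtuple("ISBNQuery", ["ISBN"])

query = RemoteQuery("80-251-0225-4")
print(isinstance(query, LocalQuery))  # False, the classes are different objects
print(ii_of_any(query, LocalQuery))   # True, the class names are equal
```

Because only names are compared, two unrelated classes that happen to share a name will also match, which is the danger mentioned above.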
- aleph.__init__.reactToAMQPMessage(message, response_callback, UUID)[source]¶
React to the given AMQPMessage. Return data through the given callback function.
message – message encoded in JSON by serialize()
response_callback – function taking exactly ONE parameter - the message's body with the response. The function takes care of sending the response over AMQP.
Returns the result of the response_callback() call.
- Raise:
- ValueError if bad type of message structure is given.
- TODO:
- React to Export requests.
aleph Module¶
Aleph X-Service wrapper.
This module allows you to query Aleph’s X-Services module (Aleph server is defined by ALEPH_URL in settings.py).
There are two levels of abstraction:
- Lowlevel —————————————————————–
You can use these functions to access Aleph:
searchInAleph(base, phrase, considerSimilar, field)
getDocumentIDs(aleph_search_result, [number_of_docs])
downloadMARCXML(doc_id, library)
downloadMARCOAI(doc_id, base)
Workflow:
Aleph works in a strange way: it won't allow you to access the desired information directly.
You have to create a search request by calling searchInAleph() first, which returns a dictionary with a few important pieces of information about the session.
This dictionary can later be passed to getDocumentIDs(), which gives you a list of DocumentID named tuples.
Named tuples are used because, to access your document, you need not only the document ID number but also the library ID string.
Depending on your system, there may be just one accessible library or multiple ones, and then you will be glad to get both pieces of information together.
DocumentID can be used as a parameter to downloadMARCXML().
Let's look at some code:
---
ids = getDocumentIDs(searchInAleph("nkc", "test", False, "wrd"))
for id_num, library, base in ids:  # DocumentID has three fields
    XML = downloadMARCXML(id_num, library)

    # processDocument(XML)
---
- Highlevel —————————————————————-
So far, there are only getter wrappers:
getISBNsIDs()
getAuthorsBooksIDs()
getPublishersBooksIDs()
And counting functions (they need one request to Aleph fewer than counting the results from the getters):
getISBNCount()
getAuthorsBooksCount()
getPublishersBooksCount()
- Other noteworthy properties ———————————————-
Properties VALID_ALEPH_BASES and VALID_ALEPH_FIELDS may be specific only to our library, but sadly, I don't really know.
A list of valid bases can be obtained by calling _getListOfBases(), which returns a list of strings.
There is also a defined exception tree - see the AlephException docstring for details.
- TODO:
- multiple bases in one request?
- disable valid fields/bases checking?
- _getListOfFields()
- class aleph.aleph.DocumentID¶
Bases: tuple
DocumentID(id, library, base)
- __getnewargs__()¶
Return self as a plain tuple. Used by copy and pickle.
- __getstate__()¶
Exclude the OrderedDict from pickling
- __repr__()¶
Return a nicely formatted representation string
- base¶
Alias for field number 2
- id¶
Alias for field number 0
- library¶
Alias for field number 1
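The field layout listed above can be reproduced with collections.namedtuple. This is only a sketch of an equivalent structure; the document number used below is a made-up illustrative value.

```python
from collections import namedtuple

# Same field order as documented: id (0), library (1), base (2).
DocumentID = namedtuple("DocumentID", ["id", "library", "base"])

# "000000123" is a hypothetical document number used only for illustration.
doc = DocumentID("000000123", "NKC01", "nkc")

# Fields are reachable by name, by index, or by unpacking all three at once.
print(doc.library)
print(doc[1] == doc.library)
doc_id, library, base = doc
```

Since the tuple has three fields, code that unpacks it in a loop has to name all three, even when it only needs the id and the library.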
- exception aleph.aleph.DocumentNotFoundException(message)[source]¶
Bases: aleph.aleph.AlephException
- exception aleph.aleph.InvalidAlephBaseException(message)[source]¶
Bases: aleph.aleph.AlephException
- exception aleph.aleph.InvalidAlephFieldException(message)[source]¶
Bases: aleph.aleph.AlephException
- exception aleph.aleph.LibraryNotFoundException(message)[source]¶
Bases: aleph.aleph.AlephException
- aleph.aleph.downloadMARCOAI(doc_id, base)[source]¶
Download MARC OAI document with given doc_id from given (logical) base.
The funny part is that some documents can be obtained in their full text only with this function.
doc_id – document id (you will get this from getDocumentIDs())
base – base from which you want to download the Aleph document. This seems to duplicate searchInAleph()'s parameters, but it's just something Aleph's X-Services want, so ..
Returns: MARC XML unicode string.
- Raise:
- InvalidAlephBaseException
- DocumentNotFoundException
- aleph.aleph.downloadMARCXML(doc_id, library)[source]¶
Download MARC XML document with given doc_id from given library.
doc_id – document id (you will get this from getDocumentIDs())
library – NKC01 in our case, but don't worry, getDocumentIDs() adds the library specification into the DocumentID named tuple.
Returns: MARC XML unicode string.
- Raise:
- LibraryNotFoundException
- DocumentNotFoundException
- aleph.aleph.getDocumentIDs(aleph_search_result, number_of_docs=-1)[source]¶
Return a list of DocumentID named tuples for the given aleph_search_result.
aleph_search_result – dict returned from searchInAleph()
number_of_docs – how many DocumentIDs from the set given by aleph_search_result should be returned; default -1 for all of them.
The returned DocumentIDs can be used as parameters to downloadMARCXML().
- Raise:
- AlephException
- aleph.aleph.searchInAleph(base, phrase, considerSimilar, field)[source]¶
Send request to the aleph search engine.
The request itself is pretty useless, but its result can later be used as a parameter for getDocumentIDs(), whose output is then used to fetch records from Aleph.
phrase – what you want to search for
base – which database you want to use
field – where you want to look
considerSimilar – fuzzy search, which is not working at all, so don't use it
Returns: aleph_search_record, which is a dictionary consisting of these fields:
error (optional) – present if there was some form of error
no_entries (int) – number of entries that can be fetched from Aleph
no_records (int) – no idea what this is, but it is always >= no_entries
set_number (int) – important - something like the ID of your request
session-id (str) – used to count users for licensing purposes
Example:
{
    'session-id': 'YLI54HBQJESUTS678YYUNKEU4BNAUJDKA914GMF39J6K89VSCB',
    'set_number': 36520,
    'no_records': 1,
    'no_entries': 1
}
- Raise:
- AlephException
- InvalidAlephBaseException
- InvalidAlephFieldException
- TODO:
- support multiple phrases in one request
convertors Module¶
This module exists to provide ability to convert from AMQP data structures to Aleph’s data structures.
It can convert MARCXMLRecord to EPublication simplified data structure.
It can also serialize any namedtuple to JSON.
- aleph.convertors.fromJSON(json_data)[source]¶
Convert JSON string back to python structures.
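This is necessary because the standard json module can't serialize namedtuples in a way that can be restored: it writes them out as plain arrays, so the type and field names are lost.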
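The problem, and one possible way around it, can be sketched as follows. This is not the module's real code: the Book namedtuple, the KNOWN_TYPES registry, the "__nt_name" marker key and both function names are assumptions chosen for the example.

```python
import json
from collections import namedtuple

# Hypothetical structure standing in for the package's namedtuples.
Book = namedtuple("Book", ["title", "isbns"])

# Hypothetical registry of namedtuple classes that may be restored.
KNOWN_TYPES = {"Book": Book}


def to_json(data):
    """Serialize nested data, tagging each namedtuple with its class name."""
    def encode(obj):
        if isinstance(obj, tuple) and hasattr(obj, "_fields"):
            tagged = {key: encode(val) for key, val in obj._asdict().items()}
            tagged["__nt_name"] = type(obj).__name__
            return tagged
        if isinstance(obj, (list, tuple)):
            return [encode(item) for item in obj]
        if isinstance(obj, dict):
            return {key: encode(val) for key, val in obj.items()}
        return obj

    return json.dumps(encode(data))


def from_json(json_data):
    """Rebuild namedtuples from dictionaries carrying the name marker."""
    def decode(obj):
        if isinstance(obj, list):
            return [decode(item) for item in obj]
        if isinstance(obj, dict):
            fields = {key: decode(val) for key, val in obj.items()}
            name = fields.pop("__nt_name", None)
            return KNOWN_TYPES[name](**fields) if name else fields
        return obj

    return decode(json.loads(json_data))


book = Book("Example title", ["80-251-0225-4"])
print(json.dumps(book))   # plain array, the field names are gone
restored = from_json(to_json(book))
print(restored == book and isinstance(restored, Book))
```

Restoring only classes listed in an explicit registry keeps the decoder from instantiating arbitrary names found in incoming messages.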
export Module¶
This module is used to put data into Aleph.
It is based on a custom-made web form, which is currently used by publishers to report new books.
The source code of this form is not available at the moment (it was created by a third party), but it may become available in the future. This will depend heavily on the number of people who use this project.
The most important function in this module is exportEPublication(epub), which does everything needed to export an EPublication structure to Aleph.
This whole module is highly dependent on processes that are defined as import processes at the Czech National Library.
If you want to use the export ability in your library, you should rewrite this module and make sure that you are sending data somewhere where someone will process them. Otherwise, you can fill your library's database with crap.
- exception aleph.export.ExportRejectedException(message)¶
Bases: aleph.export.ExportException
- exception aleph.export.InvalidISBNException(message)[source]¶
Bases: aleph.export.ExportException
- class aleph.export.PostData(epub)¶
This class is used to transform data from EPublication into a dictionary, which is sent as a POST request to the Aleph third-party web form.
http://aleph.nkp.cz/F/?func=file&file_name=service-isbn
A class is used because there are 29 POST parameters with internal dependencies, which need to be processed and validated before they can be passed to the web form.
- get_POST_data()¶
- aleph.export.exportEPublication(epub)¶
Send epub EPublication object to Aleph, where it will be processed by librarians.
isbn Module¶
This module provides functionality to validate ISBN checksums and to compute ISBN checksum digits.
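The two checksum schemes can be sketched as below. The function names are assumptions and do not come from the isbn module; the arithmetic follows the standard ISBN-10 (weights 10 down to 2, modulo 11) and ISBN-13 (alternating weights 1 and 3, modulo 10) rules.

```python
def _digits(isbn):
    """Drop separators and map the ISBN-10 check character X to 10."""
    chars = [char for char in isbn.upper() if char in "0123456789X"]
    return [10 if char == "X" else int(char) for char in chars]


def isbn10_check_digit(first_nine):
    """Check digit for nine ISBN-10 digits (10 stands for 'X')."""
    total = sum(d * w for d, w in zip(first_nine, range(10, 1, -1)))
    return (11 - total % 11) % 11


def isbn13_check_digit(first_twelve):
    """Check digit for twelve ISBN-13 digits."""
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(first_twelve))
    return (10 - total % 10) % 10


def is_valid_isbn(isbn):
    """Validate the checksum of an ISBN-10 or ISBN-13 string."""
    digits = _digits(isbn)
    if 10 in digits[:-1]:          # X is only legal as the last character
        return False
    if len(digits) == 10:
        return isbn10_check_digit(digits[:9]) == digits[9]
    if len(digits) == 13:
        return isbn13_check_digit(digits[:12]) == digits[12]
    return False


print(is_valid_isbn("80-251-0225-4"))      # True
print(is_valid_isbn("978-0-306-40615-7"))  # True
print(is_valid_isbn("80-251-0225-5"))      # False
```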
marcxml Module¶
Module for parsing and high-level processing of MARC XML records.
About the format and how the class works: a standard MARC record is made of three parts:
leader – binary something, you can probably ignore it
controlfields – MARC fields < 10
datafields – important information you actually want
Basic MARC XML scheme uses this structure:
---
<record>
    <controlfield tag="001">data</controlfield>
    ...
    <controlfield tag="010">data</controlfield>
    <datafield tag="011" ind1=" " ind2=" ">
        <subfield code="scode">data</subfield>
        <subfield code="a">data</subfield>
        <subfield code="a">another data, but same code!</subfield>
        ...
        <subfield code="scode+">another data</subfield>
    </datafield>
    ...
    <datafield tag="999" ind1=" " ind2=" ">
        ...
    </datafield>
</record>
---
<leader> is optional and is parsed into MARCXMLRecord.leader as a string.
<controlfield>s are optional and are parsed as a dictionary into MARCXMLRecord.controlfields; the dictionary for the data from the example would look like this:
---
MARCXMLRecord.controlfields = {
    "001": "data",
    ...
    "010": "data"
}
---
<datafield>s are non-optional and are parsed into MARCXMLRecord.datafields, which is a slightly more complicated dictionary. It is complicated mainly because the tag parameter is not unique, so there can be several <datafield>s with the same tag!
scode is always one character (ASCII lowercase) or a number.
---
MARCXMLRecord.datafields = {
    "011": [{
        "ind1": " ",
        "ind2": " ",
        "scode": ["data"],
        "scode+": ["another data"]
    }],

    # real example
    "928": [{
        "ind1": "1",
        "ind2": " ",
        "a": ["Portál"]
    }],

    "910": [
        {
            "ind1": "1",
            "ind2": " ",
            "a": ["ABA001"]
        },
        {
            "ind1": "2",
            "ind2": " ",
            "a": ["BOA001"],
            "b": ["2-1235.975"]
        },
        {
            "ind1": "3",
            "ind2": " ",
            "a": ["OLA001"],
            "b": ["1-218.844"]
        }
    ]
}
---
As you can see in the 910 record example, sometimes there are multiple records in a list!
NOTICE THAT RECORDS ARE STORED IN AN ARRAY, NO MATTER WHETHER THERE IS JUST ONE RECORD OR MULTIPLE RECORDS. THE SAME APPLIES TO SUBFIELDS.
The example above corresponds to this piece of real-world code:
---
<datafield tag="910" ind1="1" ind2=" ">
    <subfield code="a">ABA001</subfield>
</datafield>
<datafield tag="910" ind1="2" ind2=" ">
    <subfield code="a">BOA001</subfield>
    <subfield code="b">2-1235.975</subfield>
</datafield>
<datafield tag="910" ind1="3" ind2=" ">
    <subfield code="a">OLA001</subfield>
    <subfield code="b">1-218.844</subfield>
</datafield>
---
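How such XML might end up in the dictionary layout shown earlier can be sketched with the standard library's xml.etree.ElementTree. This is an illustration, not the parser used by MARCXMLRecord; the function name parse_datafields and the wrapping <record> element are assumptions, and real MARC XML usually carries an XML namespace, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<record>
<datafield tag="910" ind1="1" ind2=" ">
    <subfield code="a">ABA001</subfield>
</datafield>
<datafield tag="910" ind1="2" ind2=" ">
    <subfield code="a">BOA001</subfield>
    <subfield code="b">2-1235.975</subfield>
</datafield>
</record>"""


def parse_datafields(xml):
    """Build {tag: [{"ind1": .., "ind2": .., code: [values]}]} from XML."""
    datafields = {}
    for field in ET.fromstring(xml).iter("datafield"):
        record = {"ind1": field.get("ind1"), "ind2": field.get("ind2")}
        for sub in field.iter("subfield"):
            # Subfield codes may repeat, so every code maps to a list.
            record.setdefault(sub.get("code"), []).append(sub.text)
        # Tags may repeat too, so every tag maps to a list of records.
        datafields.setdefault(field.get("tag"), []).append(record)
    return datafields


fields = parse_datafields(SAMPLE)
print(fields["910"][1]["b"])
```

Using setdefault() on both levels is what guarantees that records and subfields are always stored in lists, even when there is only one of them.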
- OAI ———————————————————————-
To keep things from being too simple, there is also another type of MARC XML document - the OAI format.
OAI documents are a little bit different, but almost the same in structure.
leader is optional and is stored in MARCXMLRecord.controlfields["LDR"], but also in MARCXMLRecord.leader for backward compatibility.
<controlfield> is renamed to <fixfield> and its "tag" parameter to "id".
<datafield> is named <varfield>, its "tag" parameter is "id", and ind1/ind2 are named i1/i2, but they work the same way.
<subfield>'s parameter "code" is renamed to "label".
Real world example:
---
<oai_marc>
    <fixfield id="LDR">-----nam-a22------aa4500</fixfield>
    <fixfield id="FMT">BK</fixfield>
    <fixfield id="001">cpk19990652691</fixfield>
    <fixfield id="003">CZ-PrNK</fixfield>
    <fixfield id="005">20130513104801.0</fixfield>
    <fixfield id="007">tu</fixfield>
    <fixfield id="008">990330m19981999xr-af--d------000-1-cze--</fixfield>
    <varfield id="015" i1=" " i2=" ">
        <subfield label="a">cnb000652691</subfield>
    </varfield>
    <varfield id="020" i1=" " i2=" ">
        <subfield label="a">80-7174-091-8 (sv. 1 : váz.) :</subfield>
        <subfield label="c">Kč 182,00</subfield>
    </varfield>
    ...
</oai_marc>
---
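Since the two dialects differ only in element and attribute names, one parser can serve both by switching a name table. The sketch below takes the OAI names from the real-world example above; the parse() function and both tables are assumptions for illustration and not the class's real internals.

```python
import xml.etree.ElementTree as ET

# Element and attribute names for each dialect (hypothetical lookup tables).
MARCXML_NAMES = dict(control="controlfield", data="datafield", tag="tag",
                     i1="ind1", i2="ind2", code="code")
OAI_NAMES = dict(control="fixfield", data="varfield", tag="id",
                 i1="i1", i2="i2", code="label")

OAI_SAMPLE = """<oai_marc>
<fixfield id="FMT">BK</fixfield>
<fixfield id="001">cpk19990652691</fixfield>
<varfield id="015" i1=" " i2=" ">
    <subfield label="a">cnb000652691</subfield>
</varfield>
</oai_marc>"""


def parse(xml, oai_marc=False):
    """Return (controlfields, datafields) for either dialect."""
    names = OAI_NAMES if oai_marc else MARCXML_NAMES
    root = ET.fromstring(xml)

    controlfields = {
        field.get(names["tag"]): field.text
        for field in root.iter(names["control"])
    }

    datafields = {}
    for field in root.iter(names["data"]):
        # Indicators are stored under ind1/ind2 regardless of the dialect.
        record = {"ind1": field.get(names["i1"]),
                  "ind2": field.get(names["i2"])}
        for sub in field.iter("subfield"):
            record.setdefault(sub.get(names["code"]), []).append(sub.text)
        datafields.setdefault(field.get(names["tag"]), []).append(record)

    return controlfields, datafields


controlfields, datafields = parse(OAI_SAMPLE, oai_marc=True)
print(controlfields["001"])
print(datafields["015"][0]["a"])
```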
- Full documentation ——————————————————-
Description of simplified MARCXML schema can be found at http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd
Full description of MARCXML with definition of each element can be found at http://www.loc.gov/standards/marcxml/mrcbxmlfile.dtd (19492 lines of code)
Description of MARC OAI can be found at http://www.openarchives.org/OAI/oai_marc.xsd
- class aleph.marcxml.Corporation[source]¶
Bases: aleph.marcxml.Corporation
Information about corporations (fields 110, 610, 710, 810).
- Properties:
- .name
- .place
- .date
- class aleph.marcxml.MARCXMLRecord(xml=None)[source]¶
Class for serialization/deserialization of MARCXML and MARC OAI documents.
This class parses everything between the <root> elements. It checks whether there is a root element, so please give it the full XML.
The internal format is described in the module docstring. You can access the internal data directly, or use a few handy methods on two different levels of abstraction:
- No abstraction at all ————————————————
You can choose to access the data directly; for this use, there are a few important properties:
.leader (string)
.oai_marc (bool)
.controlfields (dict)
.datafields (dict of arrays of dict of arrays of strings ^-^)
.controlfields is a simple and easy to use dictionary, where keys are field identifiers (string, 3 chars, all chars digits). The value is always a string.
.datafields is a little bit more complicated: it is a dictionary consisting of arrays of dictionaries, which consist of arrays of strings and two special parameters.
It sounds horrible, but it is not that hard to understand:
---
.datafields = {
    "011": [{"ind1": " ", "ind2": " "}],  # array of 0 or more dicts
    "012": [
        {
            "a": ["a) subsection value"],
            "b": ["b) subsection value"],
            "ind1": " ",
            "ind2": " "
        },
        {
            "a": [
                "multiple values in a) subsections are possible!",
                "another value in a) subsection"
            ],
            "c": [
                "subsection identificator is always one character long"
            ],
            "ind1": " ",
            "ind2": " "
        }
    ]
}
---
Notice the ind1/ind2 keywords, which are reserved indicators and are used in a few cases throughout the MARC standard.
The dict structure is not that hard to understand, but it is kind of long-winded to access, so there are also somewhat higher-level access methods.
- Lowlevel abstraction ————————————————-
To make data access a little bit easier, there are two methods for adding data to the internal dictionaries and two methods for reading it:
.addControlField(name, value)
.addDataField(name, i1, i2, subfields_dict)
The names are, in my opinion, self-describing. subfields_dict is expected and enforced to be a dictionary with one-character-long keys and lists of strings as values.
Getters are also simple to use:
.getControlRecord(controlfield)
.getDataRecords(datafield, subfield, throw_exceptions)
.getControlRecord() is basically just a wrapper over .controlfields and works the same way as accessing .controlfields[controlfield].
.getDataRecords(datafield, subfield, throw_exceptions) returns a list of MarcSubrecord* objects with information from the subsection subfield of the section datafield.
If the throw_exceptions parameter is set to False, the method returns an empty list instead of throwing KeyError.
*As I said, the function returns a list of MarcSubrecord objects. They are almost the same thing as normal strings (they are actually subclassed strings), but they define a few important methods, which can make your life a little bit easier:
.getI1()
.getI2()
.getOtherSubfiedls()
.getOtherSubfiedls() returns a dictionary with the subsections other than the subfield requested by calling .getDataRecords().
- Highlevel abstractions ———————————————–
There are also a lot of high-level getters:
.getName()
.getSubname()
.getPrice()
.getPart()
.getPartName()
.getPublisher()
.getPubDate()
.getPubOrder()
.getFormat()
.getPubPlace()
.getAuthors()
.getCorporations()
.getDistributors()
.getISBNs()
.getBinding()
.getOriginals()
- addDataField(name, i1, i2, subfields_dict)[source]¶
Add new datafield into self.datafields.
name – name of the datafield
i1 – value of the i1/ind1 parameter
i2 – value of the i2/ind2 parameter
subfields_dict – dictionary containing subfields in this format:
{
    "field_id": ["subfield data",],
    ...
    "z": ["X0456b"]
}
field_id can be only one character long!
Function takes care of OAI MARC.
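The constraints on subfields_dict can be sketched as a small validation step in front of the insert. The function names, the free-standing datafields argument and the choice of ValueError are assumptions for the example; the real method may report bad input differently.

```python
def check_subfields(subfields_dict):
    """Raise ValueError unless the dict maps 1-char keys to lists of strings."""
    if not isinstance(subfields_dict, dict):
        raise ValueError("subfields_dict must be a dictionary")

    for key, values in subfields_dict.items():
        if not isinstance(key, str) or len(key) != 1:
            raise ValueError("subfield key %r must be one character" % (key,))
        if not isinstance(values, list) or \
           not all(isinstance(value, str) for value in values):
            raise ValueError("subfield %r must be a list of strings" % (key,))


def add_data_field(datafields, name, i1, i2, subfields_dict):
    """Append one validated record to datafields[name]."""
    check_subfields(subfields_dict)

    # Copy the subfields and store the indicators next to them.
    record = dict(subfields_dict, ind1=i1, ind2=i2)
    datafields.setdefault(name, []).append(record)


datafields = {}
add_data_field(datafields, "910", "1", " ", {"a": ["ABA001"]})
print(datafields)
```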
- getCorporations(roles=['dst'])[source]¶
Return a list of Corporation objects specified by the roles parameter.
- roles – specify which types of corporations you need. Set to ["any"] for any role, ["dst"] for distributors, etc. See http://www.loc.gov/marc/relators/relaterm.html for details.
- getDataRecords(datafield, subfield, throw_exceptions=True)[source]¶
Return the content of the given subfield in the datafield.
- datafield – string with the section name (for example "001", "100", "700")
- subfield – string with the subfield name (for example "a", "1", etc.)
- throw_exceptions – if True, KeyError is raised if the method couldn't find the given datafield/subfield. If False, a blank array [] is returned.
Returns a list of MarcSubrecords. MarcSubrecord is practically the same thing as a string, but it also defines the .getI1() and .getI2() methods.
Believe me, you will need this, because MARC XML depends on them from time to time (names of authors, for example).
- getDistributors()[source]¶
Return list of distributors. Each distributor is represented as Corporation object.
- class aleph.marcxml.MarcSubrecord(arg, ind1, ind2, other_subfields)[source]¶
Bases: str
This class is used to store data returned from the .getDataRecords() method of MARCXMLRecord.
It may look like overkill, but when you are parsing values from MARC XML subrecords, you need to know the context in which the subrecord is placed.
Specifically the i1/i2 values, but sometimes it is useful to have access even to the other subfields of this subrecord.
This class provides this access through the .getI1()/.getI2() and .getOtherSubfiedls() getters. As a bonus, it is also fully convertible to a string, in which case only the value of the subrecord is preserved.
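The idea of a string that remembers its context can be sketched with a str subclass. The class SubrecordSketch and the function get_data_records are assumptions written for this example, not the package's code; in particular, raising KeyError only when nothing at all was found is a guess at the behaviour.

```python
class SubrecordSketch(str):
    """A string value that also carries its indicators and sibling subfields."""

    def __new__(cls, value, ind1, ind2, other_subfields):
        # str is immutable, so the value has to be set in __new__.
        obj = super().__new__(cls, value)
        obj.ind1 = ind1
        obj.ind2 = ind2
        obj.other_subfields = other_subfields
        return obj

    def getI1(self):
        return self.ind1

    def getI2(self):
        return self.ind2

    def getOtherSubfiedls(self):
        return self.other_subfields


def get_data_records(datafields, datafield, subfield, throw_exceptions=True):
    """Collect the values of one subfield from every record of a datafield."""
    output = []
    for record in datafields.get(datafield, []):
        others = {key: val for key, val in record.items()
                  if key not in ("ind1", "ind2", subfield)}
        for value in record.get(subfield, []):
            output.append(
                SubrecordSketch(value, record["ind1"], record["ind2"], others)
            )

    if not output and throw_exceptions:
        raise KeyError("%s / %s not found" % (datafield, subfield))
    return output


datafields = {"910": [
    {"ind1": "2", "ind2": " ", "a": ["BOA001"], "b": ["2-1235.975"]},
    {"ind1": "3", "ind2": " ", "a": ["OLA001"], "b": ["1-218.844"]},
]}
signatures = get_data_records(datafields, "910", "b")
print(signatures[0], signatures[0].getI1(), signatures[0].getOtherSubfiedls())
```

Because the objects are real strings, they compare equal to plain strings and can be passed to any code that expects str; the extra context is simply ignored there.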
- class aleph.marcxml.Person[source]¶
Bases: aleph.marcxml.Person
This class represents information about persons as defined in the MARC standards.
- Properties:
- .name
- .second_name
- .surname
- .title