Metadata-Version: 2.2
Name: pelicandbms
Version: 0.5.70
Summary: Ultrafast binary JSON-oriented NoSQL
Author-email: Dmitry Vorontsov <dv1555@hotmail.com>
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: filelock

Pelican is a lightning-fast serverless JSON-based DBMS with improved performance for all atomic CRUD operations. It lets you work with large collections without losing performance as data volume grows: the speed of basic operations does not depend on the size of the database.

The strength of document-oriented NoSQL databases is their natural simplicity, but they are usually not very fast (unless they are full-scale backend databases such as MongoDB). Pelican addresses performance in all critical operations: insert, upsert, update, delete and get.

* **Instant insert, upsert, update, delete and get operations** thanks to a special storage architecture (records are appended to the end of the file, and data is stored in binary form). Operation time does not depend on how many records the collection already holds or on the file size.
* Object versioning
* Additional approaches for even greater speed
* Support for **transactions** (sessions), use of custom handlers in a transaction
* Pointers to data are always **stored in RAM** with concurrent change tracking: data is recalculated from disk only if it has been modified by another process.
* Files are locked for writing only briefly (the data is prepared in advance), which makes multi-threaded use easier
* **ACID for multi-user and multi-threaded operation**
* Two types of indexes for key types of queries - **hash index** and **special B-tree** for full-text search.
* Syntax similar to MongoDB, including a query language that mirrors MongoDB's
* Written in **pure Python**, about 2000 lines in total.
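
The storage idea behind the first and fifth bullets can be illustrated with a toy model. The sketch below is not Pelican's implementation; it is a minimal, assumed design (the class ``AppendOnlyStore`` and its methods are hypothetical names, and JSON text stands in for the binary record format) that shows why appending records and keeping pointers in RAM makes the cost of a write or a read by ID independent of how much data the file already holds.

.. code-block:: Python

    import json
    import os
    import tempfile


    class AppendOnlyStore:
        """Toy append-only document store with an in-memory pointer table."""

        def __init__(self, path):
            self.path = path
            self.offsets = {}  # _id -> (byte offset, length) of the latest version
            open(self.path, "ab").close()  # create the file if it is missing

        def put(self, doc_id, document):
            # Serialize first, then append: the write cost depends only on the
            # size of this record, not on how many records already exist.
            payload = json.dumps(document).encode("utf-8")
            with open(self.path, "ab") as f:
                f.seek(0, os.SEEK_END)
                offset = f.tell()
                f.write(payload)
            self.offsets[doc_id] = (offset, len(payload))

        def get(self, doc_id):
            # One dictionary lookup plus one seek, regardless of file size.
            entry = self.offsets.get(doc_id)
            if entry is None:
                return None
            offset, length = entry
            with open(self.path, "rb") as f:
                f.seek(offset)
                return json.loads(f.read(length).decode("utf-8"))

        def delete(self, doc_id):
            # The bytes stay on disk; only the pointer is dropped.
            self.offsets.pop(doc_id, None)


    if __name__ == "__main__":
        with tempfile.TemporaryDirectory() as tmp:
            store = AppendOnlyStore(os.path.join(tmp, "goods.bin"))
            store.put("2", {"name": "Peach", "price": 100})
            store.put("2", {"name": "Peach", "price": 99})  # newer version appended
            print(store.get("2"))  # {'name': 'Peach', 'price': 99}

In this model an update is just another append with the pointer moved to the newest record, so older versions remain in the file until a separate compaction step reclaims the space.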

Why Pelican?
------------------

Pelican was written for situations where you need a local, serverless database with a JSON-oriented interface, for example in a mobile solution, and where performance requirements are high: large collections (1,000,000+ documents in a collection) require fast, almost instantaneous execution of some operations:

* Adding, changing or deleting a document in a collection of 1,000,000 documents takes 1-2 milliseconds; the running time does not depend on collection size.
* Finding an element by equality in a collection of more than 1,000,000 entries takes 1-2 microseconds.
* Real-time search for occurrences of a string in a large collection, without freezes.
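
The equality lookup in the second bullet is the kind of query a hash index serves. As a rough illustration only (plain Python dictionaries, hypothetical function names, no claim about Pelican's internals), the sketch below contrasts a linear scan, whose cost grows with the collection, with a prebuilt hash index, whose lookup cost does not.

.. code-block:: Python

    def build_hash_index(documents, field):
        """Map each value of `field` to the list of documents that hold it."""
        index = {}
        for doc in documents:
            if field in doc:
                index.setdefault(doc[field], []).append(doc)
        return index


    def find_by_scan(documents, field, value):
        # Touches every document: cost grows with the collection size.
        return [doc for doc in documents if doc.get(field) == value]


    def find_by_index(index, value):
        # A single dictionary lookup, however large the collection is.
        return index.get(value, [])


    goods = [
        {"name": "Apple", "barcode": "22000001441"},
        {"name": "Pear", "barcode": "22000001442"},
    ]
    barcode_index = build_hash_index(goods, "barcode")
    print(find_by_index(barcode_index, "22000001442"))

The trade-off is that the index has to be kept up to date on every write, which is why the examples below distinguish stored and dynamic indexes and offer reindexing.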


.. code-block:: Python

    import os
    import queue
    import threading
    import time
    from pathlib import Path

    from pelicandb import Pelican, DBSession, feed


    """
    Basic Examples: CRUD Operations

    """

    #Initializing the database, path= path to the database directory
    db = Pelican("samples_db1",path=os.path.dirname(Path(__file__).parent))

    #adding a document without an ID
    id = db["goods"].insert({"name":"Banana"})
    print("Added :",id,sep=" ")

    #adding a document with an ID
    try:
        id = db["goods"].insert({"name":"Banana", "_id":"1"})
    except Exception:
        print("the document already exists")

    #Upsert
    db["goods"].insert({"name":"Peach", "price":100, "_id":"2"}, upsert=True)
    db["goods"].insert({"name":"Peach", "price":99, "_id":"2"}, upsert=True)

    #Insert array of documents
    ids = db["goods"].insert([{"name":"Apple", "price":60}, {"name":"Pear", "price":70}], upsert=True)
    print("Added:",ids,sep=" ")

    #All documents in the collection
    result = db["goods"].all()
    print(result)

    #Get by ID
    result = db["goods"].get("2")
    print(result)

    #...same thing via find
    result = db["goods"].find({"_id":"2"})
    print(result)


    #Get a specific version of a document by id
    result = db["goods"].get_version("2",0)
    print(result)

    #search by condition #1
    result = db["goods"].find({"name":"Peach"})
    print(result)

    #search by condition #2
    result = db["goods"].find({"price":{"$lte":70}})
    print(result)

    #search by condition #3
    result = db["goods"].find({"name":{"$regex":"Pea"}})
    print(result)

    #Update - search, update collection documents
    #the condition works as in find, and the data argument works as in insert/upsert
    db["goods"].update({"name":"Peach"},{"updated":True})
    print(db["goods"].find({"name":"Peach"}))

    #Delete - search, delete collection documents
    #the condition works as in find
    db["goods"].delete({"name":"Peach"})

    #condition as a list
    db["goods"].delete(["1","2"])

    #shrink deleted entries (optional)
    db['goods'].shrink()

    #complete deletion of the entire collection
    db["goods"].clear()


    """
    Indexes: hash index and text index

    """
    # hash indexes
    db['goods'].register_hash_index("hash_barcode","barcode", dynamic=False) #stored index on the barcode field
    #after adding documents, indexes are automatically updated
    db["goods"].insert([{"name":"Apple", "price":60, "barcode":"22000001441" }, {"name":"Pear", "price":70,"barcode":"22000001442"}], upsert=True)
    #search by index works accordingly
    r = db['goods'].get_by_index(db["hash_barcode"],"22000001442")
    print(r)
    db['goods'].register_hash_index("hash_barcode_dynamic","barcode", dynamic=True) #dynamic index on the barcode field
    #for a dynamic index it makes sense to reindex at startup
    db['goods'].reindex_hash("hash_barcode_dynamic")
    r = db['goods'].get_by_index(db["hash_barcode_dynamic"],"22000001442")
    print(r)

    #text indexes
    #registering a text index
    db['goods'].register_text_index("text_regular","name", dynamic=False) #stored text index on the name field
    #if necessary (the database already contains data), re-index
    db['goods'].reindex_text("text_regular")
    db["goods"].insert([{"name":"Apple Golden", "price":60, "barcode":"22000001443" }], upsert=True)
    t = db['text_regular'].search_text_index("Appl")
    print(t)


    """
    Transactions and stored procedures. Search functions. Triggers. 
    """
    #Regular transaction. Data is written to the database only after all operations have completed. If any of them fails, the transaction is not committed.
    #For example, when this code is run a second time, the first operation raises an error
    try:
        with DBSession(db) as s:
            
            docs = [{"name":"Item #1","_id":"12"},{"name":"Item #2","_id":"121"} ]
            id = db["goods2"].insert(docs, upsert=False, session=s)
            id = db["goods2"].insert(docs, upsert=True, session=s)
    except Exception as e:
        print("Transaction not committed:" + str(e))

    #Using a function to search instead of conditions
    def check_name(document, value):
        return document.get("name") == value
    #we pass the function as a parameter, it works with the document
    res2 = db['goods'].find([check_name,"Pear"])


    #Using a Function as a "Before Write" Trigger
    def update_document_before_change(type,document):
        if document is None:
            raise ValueError("Document is null")
        #the handler may receive a single document or a list of documents
        if isinstance(document, list):
            for doc in document:
                doc['Checked'] = True
        else:
            document['Checked'] = True

    db["goods2"].register_before_change_handler(update_document_before_change)
    id = db["goods2"].insert([{"name1":"Banana","_id":"111222"}], upsert=True)


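    #Using a function to validate documents before writing

    def check_document_before_change(type,document):
        if document is None:
            raise ValueError("Document is null")
        #check each document, whether one document or a list was passed
        docs = document if isinstance(document, list) else [document]
        for doc in docs:
            if 'barcode' not in doc:
                #raise ValueError("No barcode in document")
                print("No barcode in document")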

    db["goods2"].register_before_change_handler(check_document_before_change)
    id = db["goods2"].insert([{"name1":"Banana","_id":"111222"}], upsert=True)
    try:
        with DBSession(db) as s:
            
            docs = [{"name":"Item #1","_id":"12"},{"name":"Item #2","_id":"121"} ]
            id = db["goods2"].insert(docs, upsert=True, session=s)
            id = db["goods3"].insert(docs, upsert=True, session=s)
    except Exception as e:
        print("Transaction not committed:" + str(e))



    """
    Additional tricks to improve performance
    """
    #1. Working with a pre-initialized database stack
    #some dictionary with databases
    db = Pelican("samples_db1",path=os.path.dirname(Path(__file__).parent))
    dbmap = {"samples_db1":db}

    #2. Initialization (preemptive reading of tables)
    dbmap["samples_db1"].initialize()

    #3. Using the singleton option to skip change checking (if there are no parallel writes)
    db = Pelican("samples_db1",path=os.path.dirname(Path(__file__).parent), singleton=True)

    #4. Using the RAM=True option to place collection data in memory (default RAM=True)
    db = Pelican("samples_db1",path=os.path.dirname(Path(__file__).parent), RAM = True)

    #5. Working with indexes in a background thread (by default, when documents change, indexes are written synchronously)
    q = queue.Queue()
    def indexing(q):
        while True:
            task = q.get()
            
            documents = task[0]
            collection_name = task[1]
            db_name = task[2]
            operation = task[3]

            if operation=="add":
                dbmap[db_name][collection_name]._add_values_to_unique_indexes(documents)
                dbmap[db_name][collection_name]._add_values_to_text_indexes(documents)
            elif operation=="delete":
                dbmap[db_name][collection_name]._delete_values_from_unique_indexes(documents)
                dbmap[db_name][collection_name]._delete_values_from_text_indexes(documents)

            q.task_done()


    tinput = threading.Thread(target=indexing, args=(q,))
    tinput.daemon = True
    tinput.start()  

    #the queue is passed as a parameter to the database object
    db2 = Pelican("samples_db2",path=os.path.dirname(Path(__file__).parent),RAM = False, queue=q, singleton=True)

    dbmap["samples_db2"] = db2

    db2['goods'].register_hash_index("hash_barcode","barcode", dynamic=False)
    db2["goods"].insert([{"name":"Apple", "price":60, "barcode":"22000001441" }, {"name":"Pear", "price":70,"barcode":"22000001442"}], upsert=True)

    #Because indexes are written asynchronously here, index lookups may lag behind the writes, hence the sleep
    time.sleep(1)
    r = db2['goods'].get_by_index(db2["hash_barcode"],"22000001442")
    print(r)


    """
    Simplified work with ready-made datasets (data synchronization). It exists so that a message can be passed to the database as is, without parsing its JSON beforehand. To do this, the message must be in a special format.
    """
    #The first parameter is the "database stack" from the previous example. The second parameter of feed is either a string or a Python object with commands in the following format
    """
    {
        "<database_name_in_stack>": {
            "<collection>": {
                "uid": "<(optional) Operation ID (for response)>",
                "<command:insert/upsert/update/delete/get/find>": <document (for update: [<condition>, <document>])>
            }
        }
    }
    """

    #Example 1: upsert 1 document
    res = feed(dbmap,[
        {
            "samples_db1": {
                "goods_1": {
                    "uid": "23",
                    "upsert": {
                        "name": "banana"
                    }
                }
            }
        }
    ])

    #Example 2 with a transaction (a transaction is written as a list, in square brackets)
    res = feed(dbmap,[
        {
            "samples_db1": [ #transaction
                {
                    "goods_1": {
                        "upsert": {
                            
                            "name": "banana"
                        },
                        "uid": "s1"

                    }
                },
                {
                    "operations_1": {
                        "insert": {
                            
                            "type": "client_operation"
                        },
                        "uid": "s2"
                    }
                }
            ]
        }
    ])

    #Example 3 with search
    res2 = feed(dbmap,[
        {
            "samples_db1": {
                "goods_1": {
                    "uid": "23",
                    "find": {
                        "name": "banana"
                    }
                }
            }
        }
    ])
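
Because ``feed`` messages are plain nested dictionaries, they can be assembled programmatically instead of being written out by hand. The helper below is only a sketch based on the format shown in the examples above: ``make_operation`` and ``make_message`` are hypothetical names that are not part of Pelican, and the set of commands is taken from the format description. It builds the message structure and does not call the database itself.

.. code-block:: Python

    ALLOWED_COMMANDS = {"insert", "upsert", "update", "delete", "get", "find"}


    def make_operation(collection, command, payload, uid=None):
        """Build the {"<collection>": {...}} part of a feed message."""
        if command not in ALLOWED_COMMANDS:
            raise ValueError("unknown command: " + repr(command))
        body = {command: payload}
        if uid is not None:
            body["uid"] = uid  # optional operation ID (for the response)
        return {collection: body}


    def make_message(db_name, operations, transaction=False):
        """Wrap operations for one database; a list marks a transaction."""
        operations = list(operations)
        if transaction:
            return {db_name: operations}
        if len(operations) != 1:
            raise ValueError("without a transaction, pass exactly one operation")
        return {db_name: operations[0]}


    message = make_message(
        "samples_db1",
        [make_operation("goods_1", "upsert", {"name": "banana"}, uid="23")],
    )
    print(message)
    # The result has the same shape as Example 1 and could be passed on as
    # feed(dbmap, [message]).

Keeping the builder separate from the call to ``feed`` makes it easy to inspect or log a message before it is sent.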
