Module cubicweb.dataimport


This module provides tools to import tabular data.



Example of use (run this with `cubicweb-ctl shell instance import-script.py`):

.. sourcecode:: python

  from cubicweb.dataimport import *
  # define data generators
  GENERATORS = []

  USERS = [('Prenom', 'firstname', ()),
           ('Nom', 'surname', ()),
           ('Identifiant', 'login', ()),
           ]

  def gen_users(ctl):
      for row in ctl.iter_and_commit('utilisateurs'):
          entity = mk_entity(row, USERS)
          entity['upassword'] = u'motdepasse'
          ctl.check('login', entity['login'], None)
          ctl.store.add('CWUser', entity)
          email = {'address': row['email']}
          ctl.store.add('EmailAddress', email)
          ctl.store.relate(entity['eid'], 'use_email', email['eid'])
          ctl.store.rql('SET U in_group G WHERE G name "users", U eid %(x)s', {'x':entity['eid']})

  CHK = [('login', check_doubles, 'Utilisateurs Login',
          'Deux utilisateurs ne devraient pas avoir le même login.'),
         ]

  GENERATORS.append( (gen_users, CHK) )

  # create controller
  if 'cnx' in globals():
      ctl = CWImportController(RQLObjectStore(cnx))
  else:
      print('debug mode (not connected)')
      print('run through cubicweb-ctl shell to access an instance')
      ctl = CWImportController(ObjectStore())
  ctl.askerror = 1
  ctl.generators = GENERATORS
  ctl.data['utilisateurs'] = lazytable(ucsvreader(open('users.csv')))
  # run
  ctl.run()

.. BUG files with only one column are not parsable
.. TODO calling rollback() is not possible yet

Classes
  catch_error
Helper for @contextmanager decorator.
  ObjectStore
Store objects in memory for faster validation (development mode)
  RQLObjectStore
ObjectStore that works with an actual RQL repository (production mode)
  CWImportController
Controller of the data import process.
  NoHookRQLObjectStore
RQLObjectStore variant that works with an actual RQL repository while bypassing hooks (production mode)
  MetaGenerator
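
The import example above relies on only two store operations: add() fills in item['eid'], and relate() links two eids. The following minimal Python 3 sketch mimics that data flow in memory. InMemoryStoreSketch is a hypothetical name, and numbering eids from 1 and returning the eid from add() are assumptions, not ObjectStore's documented behaviour.

```python
import itertools


class InMemoryStoreSketch:
    """Hypothetical in-memory store mimicking the add()/relate() calls of the example."""

    def __init__(self):
        self._eids = itertools.count(1)   # assumption: eids start at 1
        self.entities = {}                # eid -> (etype, item)
        self.relations = set()            # {(eid_from, rtype, eid_to), ...}

    def add(self, etype, item):
        """Give item a fresh eid, stored in item['eid'], and remember it."""
        eid = next(self._eids)
        item['eid'] = eid
        self.entities[eid] = (etype, item)
        return eid

    def relate(self, eid_from, rtype, eid_to):
        """Record a relation between two previously added items."""
        self.relations.add((eid_from, rtype, eid_to))


store = InMemoryStoreSketch()
user = {'login': 'jdoe'}
email = {'address': 'jdoe@example.org'}
store.add('CWUser', user)
store.add('EmailAddress', email)
store.relate(user['eid'], 'use_email', email['eid'])
print(store.relations)  # {(1, 'use_email', 2)}
```

Keeping everything in plain dicts and sets is what makes an in-memory store suitable for fast validation runs without a repository.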
Functions
 
count_lines(stream_or_filename)
 
ucsvreader_pb(stream_or_path, encoding='utf-8', separator=',', quote='"', skipfirst=False, withpb=True)
Same as ucsvreader, but displays a progress bar while iterating over rows.
 
ucsvreader(stream, encoding='utf-8', separator=',', quote='"', skipfirst=False)
A csv reader that accepts files with any encoding and outputs unicode strings
 
callfunc_every(func, number, iterable)
Yield the items of iterable one by one and call func every `number` iterations. func is always called at the end as well.
 
lazytable(reader)
The first row is taken to be the header of the table and used to output a dict for each row of data.
 
mk_entity(row, map)
Return a dict made from sanitized mapped values.
 
tell(msg)
 
confirm(question)
A confirm function that asks for yes/no/abort and exits on abort.
 
optional(value)
Checker to filter optional fields.
 
required(value)
Raise ValueError if value is empty.
 
todatetime(format='%d/%m/%Y')
Return a transformation function that turns a string input value into a datetime.datetime instance, using the given format.
 
call_transform_method(methodname, *args, **kwargs)
Return the value returned by calling the given method on the input.
 
call_check_method(methodname, *args, **kwargs)
Check that the value returned by calling the given method on the input is true; otherwise raise ValueError.
 
check_doubles(buckets)
Extract the keys that have more than one item in their bucket.
 
check_doubles_not_none(buckets)
Same as check_doubles, but ignores the None key.
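
Three of the helpers that are only summarized above (lazytable, callfunc_every and check_doubles) can be approximated with the standard library as follows. This is a hedged Python 3 sketch, not the module's code: the *_sketch names are hypothetical, csv.reader on an in-memory text stream stands in for ucsvreader, and returning (key, count) pairs from check_doubles_sketch is an assumption.

```python
import csv
import io


def lazytable_sketch(reader):
    """Yield one dict per data row, keyed by the values of the first (header) row."""
    rows = iter(reader)
    header = next(rows)
    for row in rows:
        yield dict(zip(header, row))


def callfunc_every_sketch(func, number, iterable):
    """Yield items one by one, calling func every `number` items and once at the end."""
    for count, item in enumerate(iterable, 1):
        yield item
        if count % number == 0:
            func()
    func()


def check_doubles_sketch(buckets):
    """Return (key, count) pairs for keys whose bucket holds more than one item."""
    return [(key, len(items)) for key, items in buckets.items() if len(items) > 1]


# Usage: read a small CSV, call a callback every 2 rows, then look for duplicate logins.
data = io.StringIO("login,surname\njdoe,Doe\nasmith,Smith\njdoe,Dupont\n")
commits = []
rows = list(callfunc_every_sketch(lambda: commits.append('commit'), 2,
                                  lazytable_sketch(csv.reader(data))))
buckets = {}
for row in rows:
    buckets.setdefault(row['login'], []).append(row)
print(len(commits))                   # 2
print(check_doubles_sketch(buckets))  # [('jdoe', 2)]
```

Because both lazytable_sketch and callfunc_every_sketch are generators, rows are produced one at a time and a large file never has to be held in memory unless the caller materializes it, as the usage lines do with list().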
Function Details

mk_entity(row, map)


Return a dict made from sanitized mapped values.

A ValueError may be raised when a checker finds an unexpected value.

>>> row = {'myname': u'dupont'}
>>> map = [('myname', u'name', (call_transform_method('title'),))]
>>> mk_entity(row, map)
{'name': u'Dupont'}
>>> row = {'myname': u'dupont', 'optname': u''}
>>> map = [('myname', u'name', (call_transform_method('title'),)),
...        ('optname', u'MARKER', (optional,))]
>>> mk_entity(row, map)
{'name': u'Dupont', 'MARKER': None}
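
The mapping logic described for mk_entity can be sketched as below. Each mapping tuple is (source column, destination key, checkers): the value is copied from the row and passed through the checkers in order, stopping early when one returns None, and the result is keyed by the destination name. The *_sketch functions are hypothetical stand-ins for mk_entity, call_transform_method and optional, and the error-message wording is an assumption.

```python
def call_transform_method_sketch(methodname, *args, **kwargs):
    """Return a transformer that calls value.<methodname>(*args, **kwargs)."""
    def transform(value):
        return getattr(value, methodname)(*args, **kwargs)
    return transform


def optional_sketch(value):
    """Return None for empty values, which stops the checker chain."""
    return value if value else None


def mk_entity_sketch(row, mapping):
    """Build a dict by copying row[src] to dest and running it through checkers."""
    entity = {}
    for src, dest, checkers in mapping:
        value = row[src]
        try:
            for checker in checkers:
                value = checker(value)
                if value is None:
                    # a checker returned None: skip the remaining checkers
                    break
        except ValueError as err:
            # re-raise with the name of the offending source column
            raise ValueError('error with %r field: %s' % (src, err)) from err
        entity[dest] = value
    return entity


row = {'myname': 'dupont', 'optname': ''}
mapping = [('myname', 'name', (call_transform_method_sketch('title'),)),
           ('optname', 'MARKER', (optional_sketch,))]
print(mk_entity_sketch(row, mapping))  # {'name': 'Dupont', 'MARKER': None}
```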

optional(value)


Checker to filter optional fields.

If value is undefined (e.g. an empty string), return None, which stops the checkers validation chain.

The usual practice is to put the 'optional' check first in the chain, so that empty values do not make later checkers raise ValueError.

>>> MAPPER = [(u'value', 'value', (optional, int))]
>>> row = {'value': u''}
>>> mk_entity(row, MAPPER)
{'value': None}
>>> row = {'value': u'100'}
>>> mk_entity(row, MAPPER)
{'value': 100}
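
Because optional only short-circuits on empty values, a non-empty value that a later checker rejects still raises ValueError. A minimal sketch, assuming that every falsy value counts as undefined; optional_sketch and run_checkers are hypothetical names.

```python
def optional_sketch(value):
    """Return the value unchanged, or None if it is empty (falsy)."""
    return value if value else None


def run_checkers(value, checkers):
    """Apply checkers in order, stopping as soon as one returns None."""
    for checker in checkers:
        value = checker(value)
        if value is None:
            break
    return value


chain = (optional_sketch, int)
print(run_checkers('', chain))     # None: int() is never called
print(run_checkers('100', chain))  # 100
try:
    run_checkers('XXX', chain)     # non-empty, so int() still runs and fails
except ValueError as err:
    print('rejected:', err)
```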

required(value)


Raise ValueError if value is empty.

This check is usually placed last in the chain.
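
A minimal sketch of why required usually comes last: earlier checkers may clean the value (here, stripping whitespace), and only then is emptiness tested. required_sketch and apply_checkers are hypothetical names, and the error message is an assumption.

```python
def required_sketch(value):
    """Raise ValueError if value is empty; otherwise return it unchanged."""
    if not value:
        raise ValueError('a value is required')
    return value


def apply_checkers(value, checkers):
    """Pass value through each checker in turn."""
    for checker in checkers:
        value = checker(value)
    return value


# required_sketch is placed last, so it tests the cleaned-up value
LOGIN_CHECKERS = (str.strip, required_sketch)

print(apply_checkers('  jdoe ', LOGIN_CHECKERS))  # jdoe
try:
    apply_checkers('   ', LOGIN_CHECKERS)  # blank once stripped
except ValueError as err:
    print('rejected:', err)  # rejected: a value is required
```

Placed first instead, required_sketch would accept '   ' because a whitespace-only string is not empty.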

todatetime(format='%d/%m/%Y')


Return a transformation function that turns a string input value into a datetime.datetime instance, using the given format.

Chain it with the todate or totime functions from logilab.common.date if you want a date or time instance instead of a datetime.
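
A minimal sketch of the kind of closure todatetime returns, assuming it wraps datetime.strptime (todatetime_sketch is a hypothetical name). The standard datetime.date() method stands in here for the todate helper from logilab.common.date.

```python
from datetime import datetime


def todatetime_sketch(format='%d/%m/%Y'):
    """Return a function turning a string into a datetime using `format`."""
    def transform(value):
        # datetime.strptime raises ValueError on input that does not match,
        # which fits the checker-chain convention of this module
        return datetime.strptime(value, format)
    return transform


to_dt = todatetime_sketch()
print(to_dt('25/12/2009'))         # 2009-12-25 00:00:00
print(to_dt('25/12/2009').date())  # 2009-12-25
```

Returning a function rather than parsing directly lets the format be fixed once and the result dropped into a mapping's checker tuple like any other checker.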