---
title: Xml Parser
keywords: fastai
sidebar: home_sidebar
summary: "This parser contains functions to extract data from vasprun.xml. All functions in the xml parser can work without arguments if the working directory contains `vasprun.xml`."
description: "This parser contains functions to extract data from vasprun.xml. All functions in the xml parser can work without arguments if the working directory contains `vasprun.xml`."
nb_path: "XmlElementTree.ipynb"
---
{% raw %}
{% endraw %} {% raw %}

  Index 
  XmlElementTree● 
  StaticPlots 
  InteractivePlots 
  Utilities 
  StructureIO 
  Widgets 

{% endraw %}
  • Almost every object in this module returns a Dict2Data object with attributes accessible via dot notation. This object can be transformed to a dictionary by the to_dict() method on the object.
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

dict2tuple[source]

dict2tuple(name, d)

Converts a dictionary (nested as well) to a namedtuple, accessible by index and dot notation as well as by unpacking.

  • Parameters
    • name: Name of the tuple.
    • d : Dictionary, nested works as well.
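
As an illustration only (not pivotpy's actual implementation), such a nested dict-to-namedtuple conversion can be sketched with the standard library:

```python
from collections import namedtuple

def dict2tuple(name, d):
    # Build a namedtuple class from the dictionary keys, recursing into
    # nested dictionaries so every level supports dot notation,
    # indexing and unpacking.
    fields = list(d.keys())
    values = [dict2tuple(k, v) if isinstance(v, dict) else v
              for k, v in d.items()]
    return namedtuple(name, fields)(*values)

nt = dict2tuple('Data', {'A': 1, 'B': {'C': 2}})
print(nt.B.C)   # 2  (nested dot access)
print(nt[0])    # 1  (index access)
```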
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class Dict2Data[source]

Dict2Data(d)

  • Returns a Data object with dictionary keys as attributes of Data, accessible by dot notation or by key. Once an attribute is created, it cannot be changed from outside.
  • Parameters
    • d : Python dictionary (nested as well) containing any python data types.
  • Methods
    • to_dict : Converts a Data object to a dictionary if possible, otherwise raises a relevant error.
    • to_json : Converts to a json str, or saves to file if outfile is given. Accepts indent as a parameter.
    • to_pickle: Converts to a bytes str, or saves to file if outfile is given.
    • to_tuple : Converts to a namedtuple.
  • Example

    x = Dict2Data({'A':1,'B':{'C':2}})
    x
    Data(
        A = 1
        B = Data(
            C = 2
        )
    )
    x.B.to_dict()
    {'C': 2}
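
The immutability and nesting behavior described above can be sketched in a few lines; FrozenData here is a hypothetical stand-in, not pivotpy's actual Dict2Data:

```python
class FrozenData:
    # Minimal sketch of a Dict2Data-like container: dictionary keys
    # become attributes, nested dicts become nested FrozenData, and
    # attributes cannot be rebound once created.
    def __init__(self, d):
        for k, v in d.items():
            # bypass the frozen __setattr__ below during construction
            super().__setattr__(k, FrozenData(v) if isinstance(v, dict) else v)

    def __setattr__(self, name, value):
        raise AttributeError(f"Can not set attribute {name!r} from outside.")

    def to_dict(self):
        # recursively unwrap back to a plain dictionary
        return {k: (v.to_dict() if isinstance(v, FrozenData) else v)
                for k, v in self.__dict__.items()}

x = FrozenData({'A': 1, 'B': {'C': 2}})
print(x.B.C)          # 2
print(x.B.to_dict())  # {'C': 2}
```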

{% endraw %} {% raw %}
{% endraw %} {% raw %}

Dict2Data.to_dict[source]

Dict2Data.to_dict()

Converts a Dict2Data object (root or nested level) to a dictionary.

Dict2Data.to_json[source]

Dict2Data.to_json(outfile=None, indent=1)

Dumps a Dict2Data object (root or nested level) to json.

  • Parameters
    • outfile : Default is None and returns a string. If given, writes to the file.
    • indent : Json indent. Default is 1.

Dict2Data.to_pickle[source]

Dict2Data.to_pickle(outfile=None)

Dumps a Dict2Data object (root or nested level) to pickle.

  • Parameters
    • outfile : Default is None and returns a string. If given, writes to the file.

Dict2Data.to_tuple[source]

Dict2Data.to_tuple()

Creates a namedtuple.

{% endraw %} {% raw %}
x = Dict2Data({'A':1,'B':2})
print('Dict: ',x.to_dict())
print('JSON: ',x.to_json())
print('Pickle: ',x.to_pickle())
print('Tuple: ',x.to_tuple())
x['A']
Dict:  {'A': 1, 'B': 2}
JSON:  {
 "A": 1,
 "B": 2
}
Pickle:  b'\x80\x03}q\x00(X\x01\x00\x00\x00Aq\x01K\x01X\x01\x00\x00\x00Bq\x02K\x02u.'
Tuple:  Data(A=1, B=2)
1
{% endraw %}

Parser Functions

{% raw %}

read_asxml[source]

read_asxml(path=None)

  • Reads a big vasprun.xml file into memory once so that further commands can be applied to it. If the current folder contains a vasprun.xml file, it is picked automatically.

  • Parameters

    • path : Path/To/vasprun.xml
  • Returns

    • xml_data : Xml object to use in other functions
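
A minimal sketch of what read_asxml presumably does under the hood, using only the standard library (the name read_asxml_sketch and the error handling are illustrative assumptions, not pivotpy's exact code):

```python
import os
import xml.etree.ElementTree as ET

def read_asxml_sketch(path=None):
    # Fall back to vasprun.xml in the current directory when no path
    # is given, then parse the whole file once into an ElementTree.
    path = path or os.path.join(os.getcwd(), 'vasprun.xml')
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path!r} does not exist.")
    return ET.parse(path).getroot()
```

The returned Element can then be handed to the other parser functions as xml_data.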
{% endraw %} {% raw %}
{% endraw %} {% raw %}

xml2dict[source]

xml2dict(xmlnode_or_filepath)

Converts an xml node or xml file content to a dictionary. All output text is in string format, so further processing is required to convert it into data types, split it, etc.

  • The only parameter xmlnode_or_filepath is either a path to an xml file or an xml.etree.ElementTree.Element object.
  • Each node has tag, text, attr, nodes attributes. Every text element can be accessed via the xml2dict()['nodes'][index]['nodes'][index]... tree, which makes it simple.
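
A hedged sketch of such a recursive converter, matching the described tag/text/attr/nodes layout (illustrative only, not pivotpy's exact implementation):

```python
import xml.etree.ElementTree as ET

def xml2dict_sketch(node):
    # Every node maps to {'tag', 'text', 'attr', 'nodes'} where
    # 'nodes' is a list of child dictionaries; all text stays str.
    if isinstance(node, str):          # also allow a file path as input
        node = ET.parse(node).getroot()
    return {'tag': node.tag,
            'text': node.text,
            'attr': node.attrib,
            'nodes': [xml2dict_sketch(child) for child in node]}

root = ET.fromstring('<a x="1"><b>hello</b></a>')
d = xml2dict_sketch(root)
print(d['nodes'][0]['text'])  # 'hello'
```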
{% endraw %} {% raw %}
{% endraw %} {% raw %}

exclude_kpts[source]

exclude_kpts(xml_data=None)

  • Returns the number of kpoints to exclude, i.e. those taken from IBZKPT.
  • Parameters
    • xml_data : From read_asxml function.
  • Returns
    • int : Number of kpoints to exclude.
{% endraw %} {% raw %}
{% endraw %} {% raw %}

get_ispin[source]

get_ispin(xml_data=None)

  • Returns the value of ISPIN.
  • Parameters
    • xml_data : From read_asxml function.
  • Returns
    • int : Value of ISPIN.
{% endraw %} {% raw %}
{% endraw %} {% raw %}

get_summary[source]

get_summary(xml_data=None)

  • Returns an overview of system parameters.
  • Parameters
    • xml_data : From read_asxml function.
  • Returns
    • Data : pivotpy.Dict2Data with attributes accessible via dot notation.
{% endraw %} {% raw %}
{% endraw %} {% raw %}
import pivotpy.vr_parser as vp
xml_data=vp.read_asxml(path= '../vasprun.xml')
get_summary(xml_data=xml_data).to_tuple()
Data(SYSTEM='AlAs', NION=2, NELECT=8, TypeION=2, ElemName=['Al', 'As'], ElemIndex=[0, 1, 2], E_Fermi=3.72526782, ISPIN=1, fields=['s', 'py', 'pz', 'px', 'dxy', 'dyz', 'dz2', 'dxz', 'x2-y2'], incar=INCAR(SYSTEM='AlAs', PREC='high', ALGO='N', NELMIN='7', EDIFF='0.00000100', ISMEAR='0', SIGMA='0.10000000', LORBIT='11', KPOINT_BSE='-1     0     0     0', LHFCALC='T', HFSCREEN='0.20100000', PRECFOCK='fast'))
{% endraw %} {% raw %}

join_ksegments[source]

join_ksegments(kpath, kseg_inds=[])

Joins each broken segment of a kpath to the previous one. kseg_inds should be a list of the first index of each next segment.
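
A sketch of what joining could look like, assuming each listed index starts a segment that should be shifted back to where the previous one ended (illustrative, not the library's exact code):

```python
def join_ksegments_sketch(kpath, kseg_inds=()):
    # For each first-index of a new segment, shift that segment and
    # everything after it so it starts where the previous segment ended.
    kpath = list(kpath)
    for ind in kseg_inds:
        shift = kpath[ind] - kpath[ind - 1]
        kpath = kpath[:ind] + [k - shift for k in kpath[ind:]]
    return kpath

# A path broken at index 3: [0, 1, 2, 5, 6] -> [0, 1, 2, 2, 3]
print(join_ksegments_sketch([0, 1, 2, 5, 6], kseg_inds=[3]))
```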

{% endraw %} {% raw %}

get_kpts[source]

get_kpts(xml_data=None, skipk=0, kseg_inds=[])

Returns kpoints and calculated kpath.

  • Parameters
    • xml_data : From read_asxml function.
    • skipk : int, number of initial kpoints to skip.
    • kseg_inds : list of indices of kpoints where the path is broken.
  • Returns
    • Data : pivotpy.Dict2Data with attributes kpath and kpoints.

{% endraw %} {% raw %}
{% endraw %} {% raw %}
get_kpts(xml_data=xml_data,skipk=10)
Data(
    NKPTS = 126
    kpoints = <ndarray:shape=(126, 3)>
    kpath = <list:len=126>
)
{% endraw %} {% raw %}

get_tdos[source]

get_tdos(xml_data=None, spin_set=1, elim=[])

  • Returns the total dos for a spin_set (default 1) and energy limit. For spin-polarized calculations, it gives SpinUp and SpinDown keys as well.
  • Parameters
    • xml_data : From read_asxml function.
    • spin_set : int, default is 1.
    • elim : List [min,max] of energy; default is empty.
  • Returns
    • Data : pivotpy.Dict2Data with attributes E_Fermi, ISPIN, tdos.
{% endraw %} {% raw %}
{% endraw %} {% raw %}
get_tdos(xml_data=xml_data,spin_set=1,elim=[])
Data(
    E_Fermi = 3.72526782
    ISPIN = 1
    tdos = <ndarray:shape=(301, 3)>
)
{% endraw %} {% raw %}

get_evals[source]

get_evals(xml_data=None, skipk=None, elim=[])

  • Returns eigenvalues as a numpy array. For spin-polarized calculations, it gives SpinUp and SpinDown keys as well.
  • Parameters
    • xml_data : From read_asxml function.
    • skipk : Number of initial kpoints to skip.
    • elim : List [min,max] of energy; default is empty.
  • Returns
    • Data : pivotpy.Dict2Data with attribute evals and related parameters.
{% endraw %} {% raw %}
{% endraw %} {% raw %}
get_evals(xml_data=xml_data,skipk=10,elim=[-5,5])
Data(
    E_Fermi = 3.72526782
    ISPIN = 1
    NBANDS = 7
    evals = <ndarray:shape=(126, 7)>
    indices = range(1, 8)
)
{% endraw %} {% raw %}

get_bands_pro_set[source]

get_bands_pro_set(xml_data=None, spin_set=1, skipk=0, bands_range=None, set_path=None)

  • Returns the bands projection of a spin_set (default 1). For spin-polarized calculations, it gives SpinUp and SpinDown keys as well.
  • Parameters
    • xml_data : From read_asxml function.
    • skipk : Number of initial kpoints to skip (default 0).
    • spin_set : Spin set to get; default is 1.
    • bands_range : If elim is used in get_evals, that returns a bands_range to use here. Note that range(0,2) will give 2 bands 0,1 but the tuple (0,2) will give 3 bands 0,1,2.
    • set_path : path/to/_set[1,2,3,4].txt; works if split_vasprun is used beforehand.
  • Returns
    • Data : pivotpy.Dict2Data with attributes of bands projections and related parameters.
{% endraw %} {% raw %}
{% endraw %} {% raw %}
get_bands_pro_set(xml_data,skipk=0,spin_set=1,bands_range=range(0, 1))
Data(
    labels = ['s', 'py', 'pz', 'px', 'dxy', 'dyz', 'dz2', 'dxz', 'x2-y2']
    pros = <ndarray:shape=(2, 136, 1, 9)>
)
{% endraw %} {% raw %}

get_dos_pro_set[source]

get_dos_pro_set(xml_data=None, spin_set=1, dos_range=None)

  • Returns the dos projection of a spin_set (default 1) as a numpy array. For spin-polarized calculations, it gives SpinUp and SpinDown keys as well.
  • Parameters
    • xml_data : From read_asxml function.
    • spin_set : Spin set to get; default 1.
    • dos_range : If elim is used in get_tdos, that returns a dos_range to use here.
  • Returns
    • Data : pivotpy.Dict2Data with attributes of dos projections and related parameters.
{% endraw %} {% raw %}
{% endraw %} {% raw %}

get_structure[source]

get_structure(xml_data=None)

  • Returns the structure's volume, basis, positions and rec_basis.
  • Parameters
    • xml_data : From read_asxml function.
  • Returns
    • Data : pivotpy.Dict2Data with attributes volume, basis, positions, rec_basis and labels.
{% endraw %} {% raw %}
{% endraw %} {% raw %}
get_structure(xml_data=xml_data)
Data(
    SYSTEM =  AlAs
    volume = 45.73530449
    basis = <ndarray:shape=(3, 3)>
    rec_basis = <ndarray:shape=(3, 3)>
    positions = <ndarray:shape=(2, 3)>
    labels = ['Al 1', 'As 1']
    unique = Data(
        Al = range(0, 1)
        As = range(1, 2)
    )
)
{% endraw %}

Quick Export for Bandstructure

A fully comprehensive command that uses all these functions and returns data for spin set 1 (sets 1 and 2 for spin-polarized calculations) is available for immediate use: export_vasprun().

{% raw %}

export_vasprun[source]

export_vasprun(path=None, skipk=None, elim=[], kseg_inds=[], shift_kpath=0, try_pwsh=True)

  • Returns a full dictionary of all objects from a vasprun.xml file. It first tries to load the data exported by powershell's Export-VR (Vasprun), which is very fast for large files. It is recommended to export large files in powershell first.
  • Parameters
    • path : Path to vasprun.xml file. Default is './vasprun.xml'.
    • skipk : Default is None. Automatically detects kpoints to skip.
    • elim : List [min,max] of energy interval. Default is [], covers all bands.
    • kseg_inds : List of indices of kpoints where path is broken.
    • shift_kpath: Default 0. Can be used to merge multiple calculations on single axes side by side.
    • try_pwsh : Default is True and tries to load data exported by Vasp2Visual in Powershell.
  • Returns
    • Data : Data accessible via dot notation containing nested Data objects:
      • sys_info : System Information
      • dim_info : Contains information about dimensions of returned objects.
      • kpoints : numpy array of kpoints with IBZKPT points excluded
      • kpath : 1D numpy array directly accessible for plot.
      • bands : Data containing bands.
      • tdos : Data containing total dos.
      • pro_bands : Data containing bands projections.
      • pro_dos : Data containing dos projections.
      • poscar : Data containing basis,positions, rec_basis and volume.
{% endraw %} {% raw %}
{% endraw %} {% raw %}
export_vasprun(path='E:/Research/graphene_example/ISPIN_1/bands/vasprun.xml',elim=[-1,0],try_pwsh=True)
Loading from PowerShell Exported Data...
Data(
    sys_info = Data(
        SYSTEM = C2
        NION = 2
        NELECT = 8
        TypeION = 1
        ElemName = ['C']
        E_Fermi = -3.3501
        fields = ['s', 'py', 'pz', 'px', 'dxy', 'dyz', 'dz2', 'dxz', 'x2-y2']
        incar = Data(
            SYSTEM = C2
            PREC = high
            ALGO = N
            LSORBIT = T
            NELMIN = 7
            ISMEAR = 0
            SIGMA = 0.10000000
            LORBIT = 11
            GGA = PS
        )
        ElemIndex = [0, 2]
        ISPIN = 1
    )
    dim_info = Data(
        kpoints = (NKPTS,3)
        kpath = (NKPTS,1)
        bands = ⇅(NKPTS,NBANDS)
        dos = ⇅(grid_size,3)
        pro_dos = ⇅(NION,grid_size,en+pro_fields)
        pro_bands = ⇅(NION,NKPTS,NBANDS,pro_fields)
    )
    kpoints = <ndarray:shape=(90, 3)>
    kpath = <list:len=90>
    bands = Data(
        E_Fermi = -3.3501
        ISPIN = 1
        NBANDS = 21
        evals = <ndarray:shape=(90, 21)>
        indices = range(1, 22)
    )
    tdos = Data(
        E_Fermi = -3.3501
        ISPIN = 1
        tdos = <ndarray:shape=(301, 3)>
    )
    pro_bands = Data(
        labels = ['s', 'py', 'pz', 'px', 'dxy', 'dyz', 'dz2', 'dxz', 'x2-y2']
        pros = <ndarray:shape=(2, 90, 21, 9)>
    )
    pro_dos = Data(
        labels = ['s', 'py', 'pz', 'px', 'dxy', 'dyz', 'dz2', 'dxz', 'x2-y2']
        pros = <ndarray:shape=(2, 301, 10)>
    )
    poscar = Data(
        SYSTEM = C2
        volume = 105.49324928
        basis = <ndarray:shape=(3, 3)>
        rec_basis = <ndarray:shape=(3, 3)>
        positions = <ndarray:shape=(2, 3)>
        labels = ['C 1', 'C 2']
        unique = Data(
            C = range(0, 2)
        )
    )
)
{% endraw %} {% raw %}
{% endraw %}

Joining Multiple Calculations

  • Sometimes one may need to compare two or more bandstructures in the same figure; for that, it is easy to export two calculations and plot them on the same axis.
  • In another situation, if you have a large supercell and split the calculation into multiple ones, joining those calculations works the same way: add the last value of the first kpath to all values of the next kpath, then the next last value to the kpath after that, and so on. By using shift_kpath in export_vasprun and plotting each export on the same axis, the bandstructures are aligned side by side.
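
The shifting idea from the second bullet can be sketched by hand with plain lists (shift_kpath in export_vasprun automates this; the kpath values below are made up for illustration):

```python
kpath1 = [0.0, 0.5, 1.0]     # first calculation
kpath2 = [0.0, 0.25, 0.5]    # second calculation, starts at 0 again

shift = kpath1[-1]           # last value of the first kpath
kpath2_shifted = [k + shift for k in kpath2]
print(kpath2_shifted)        # [1.0, 1.25, 1.5]
```

Plotting kpath1 and kpath2_shifted on the same axis places the second bandstructure immediately after the first.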

Load Exported Vasprun from PowerShell

On Windows, it works automatically. On Linux/Mac it may require the path to the powershell executable.

{% raw %}

load_export[source]

load_export(path='./vasprun.xml', kseg_inds=[], shift_kpath=0, path_to_ps='pwsh', skipk=None, max_filled=10, max_empty=10, keep_files=True)

  • Returns a full dictionary of all objects from vasprun.xml file exported using powershell.
  • Parameters
    • path : Path to vasprun.xml file. Default is './vasprun.xml'.
    • skipk : Default is None. Automatically detects kpoints to skip.
    • path_to_ps : Path to powershell.exe. Automatically picks on Windows and Linux if added to PATH.
    • kseg_inds : List of indices of kpoints where path is broken.
    • shift_kpath: Default 0. Can be used to merge multiple calculations side by side.
    • keep_files : Default is True. Set to False to clean up the exported text files.
    • max_filled : Number of filled bands below and including VBM. Default is 10.
    • max_empty : Number of empty bands above VBM. Default is 10.
  • Returns
    • Data : Data accessible via dot notation containing nested Data objects:
      • sys_info : System Information
      • dim_info : Contains information about dimensions of returned objects.
      • kpoints : numpy array of kpoints with IBZKPT points excluded
      • kpath : 1D numpy array directly accessible for plot.
      • bands : Data containing bands.
      • tdos : Data containing total dos.
      • pro_bands : Data containing bands projections.
      • pro_dos : Data containing dos projections.
      • poscar : Data containing basis,positions, rec_basis and volume.
{% endraw %} {% raw %}
{% endraw %}

This back-and-forth data transport is required in the pivotpy-dash app, where data is stored in the browser in json format but needs to be python objects for the figures.

Write Clean data to JSON or Pickle file

Use dump_dict to write the output of export_vasprun or load_export to a pickle/json file. Pickle is useful for quick loading in python, while json is useful for transferring data to any language.

{% raw %}

dump_dict[source]

dump_dict(dict_data=None, dump_to='pickle', outfile=None, indent=1)

  • Dumps an export_vasprun or load_export Data object, or any dictionary, to a json or pickle string/file. It converts Dict2Data to a dictionary before serializing to json/pickle, so json/pickle.loads() of the dump is a plain dictionary; pass that to Dict2Data to make it accessible via dot notation again.
  • Parameters
    • dict_data : Any dictionary/Dict2Data object containing numpy arrays, including export_vasprun or load_export output.
    • dump_to : Default is 'pickle'; 'json' is also accepted.
    • outfile : Default is None and returns a string. The file name does not require an extension.
    • indent : Default is 1. Only works for json.
{% endraw %} {% raw %}
{% endraw %} {% raw %}

load_from_dump[source]

load_from_dump(file_or_str, keep_as_dict=False)

  • Loads a json/pickle dumped file or string by auto-detecting its type.
  • Parameters
    • file_or_str : Filename of a pickle/json file, or its string content.
    • keep_as_dict: Default is False and returns a Data object. If True, returns a dictionary.
{% endraw %} {% raw %}
{% endraw %} {% raw %}
import pivotpy as pp 
evr = pp.Vasprun('../vasprun.xml').data
{% endraw %} {% raw %}
s = dump_dict(evr.poscar,dump_to='pickle')
#print(s)
load_from_dump(s)
Data(
    SYSTEM =  AlAs
    volume = 45.73530449
    basis = <ndarray:shape=(3, 3)>
    rec_basis = <ndarray:shape=(3, 3)>
    positions = <ndarray:shape=(2, 3)>
    labels = ['Al 1', 'As 1']
    unique = Data(
        Al = range(0, 1)
        As = range(1, 2)
    )
)
{% endraw %}

Parse Text Files with Flexibility

  • The function islice2array is used to read text files which have patterns of text and numbers inline, such as EIGENVAL and PROCAR. With all the options of this function, reading and parsing such files should take only a few lines of code. It can be used to read txt, csv and tsv files as well, with efficient speed.
  • It reads a file without fully loading it into memory, and you can still access slices of the data in the file. That partial data fetching from a file is very handy.
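
The partial-read idea can be sketched with itertools.islice from the standard library (read_slice is an illustrative helper, not part of pivotpy):

```python
from itertools import islice

def read_slice(path, start, nlines):
    # Read only lines [start, start + nlines) of a text file,
    # parsing each line into floats, without loading the whole
    # file into memory.
    with open(path) as f:
        return [[float(x) for x in line.split()]
                for line in islice(f, start, start + nlines)]
```

islice advances the file pointer lazily, which is why reading a slice from the middle of a huge file stays cheap.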
{% raw %}

islice2array[source]

islice2array(path_or_islice, dtype=float, delimiter='\\s+', include=None, exclude='#', raw=False, fix_format=True, start=0, nlines=None, count=-1, cols=None, new_shape=None)

  • Reads a sliced array from txt/csv type files and returns it as an array. Also handles unequal column lengths by returning a 1D array. It is faster than loading the whole file into memory. This single function can be used to parse EIGENVAL, PROCAR, DOSCAR and similar files with just a combination of the exclude, include, start, nlines and count arguments.
  • Parameters
    • path_or_islice: Path/to/file or itertools.islice(file_object). islice is interesting when you want to read different slices of an opened file and do not want to open it again and again. For reference on how to use it just execute pivotpy.export_potential?? in a notebook cell or ipython terminal to see how islice is used extensively.
    • dtype: float by default. Data type of the output array.
    • start,nlines: The index of the line to start reading from, and the number of lines after start, respectively. Only work if path_or_islice is a file path. Both could be None or int, while start could be a list to read slices from the file, provided that nlines is an int. The spacing between adjacent indices in start should be equal to or greater than nlines, as the pointer in the file does not go back on its own. These parameters are in the output of slice_data.

      Note: start should count comments if exclude is None. You can use the slice_data function to get a dictionary of start, nlines, count, cols, new_shape and unpack it in the arguments instead of thinking too much.

    • count: np.size(output_array) = nrows x ncols; if it is known before execution, performance is increased. This parameter is in the output of slice_data.
    • delimiter: Default is \s+. Could be any kind of delimiter valid in numpy and in the file.
    • cols: List of indices of columns to pick. Useful when reading a file like PROCAR which, e.g., has text and numbers inline. This parameter is in the output of slice_data.
    • include: Default is None and includes everything. String of patterns separated by | to keep; could be a regular expression.
    • exclude: Default is '#' to remove comments. String of patterns separated by | to drop; could be a regular expression.
    • raw : Default is False; if True, returns a list of raw strings. Useful for selecting cols.
    • fix_format: Default is True; it separates numbers with poor formatting, like 1.000-2.000 into 1.000 2.000, which is useful for PROCAR. Keep it False if you want to read strings literally.
    • new_shape : Tuple of shape. Default is None. Will try to reshape into this shape; if that fails, falls back to 2D or 1D. This parameter is in the output of slice_data.
  • Examples

    islice2array('path/to/PROCAR',start=3,include='k-point',cols=[3,4,5])[:2]
    array([[ 0.125,  0.125,  0.125],
           [ 0.375,  0.125,  0.125]])

    islice2array('path/to/EIGENVAL',start=7,exclude='E',cols=[1,2])[:2]
    array([[-11.476913,   1.      ],
           [  0.283532,   1.      ]])

    Note: Slicing a dimension to 100% of its data is faster than, say, 80% for inner dimensions; so if you have to slice more than 50% of an inner dimension, just load the full data and slice it afterwards.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

slice_data[source]

slice_data(dim_inds, old_shape)

  • Returns a dictionary that can be unpacked in the arguments of the islice2array function. This function works only for regular txt/csv/tsv data files which have rectangular data written.
  • Parameters
    • dim_inds : List of indices, arrays or ranges to pick from each dimension. Inner dimensions are more towards the right. The last item in dim_inds is considered to be the columns. If you want to include all values in a dimension, you can put -1 in that dimension. Note that negative indexing does not work in file reading; -1 is a special case to fetch all items.
    • old_shape: Shape of the data set, including the columns length in the right-most place.
  • Example
    • You have data as a 3D array where the third dimension is along the columns:

      0 0
      0 2
      1 0
      1 2

    • To pick [[0,2], [1,2]], you need to give

      slice_data(dim_inds = [[0,1],[1],-1], old_shape=(2,2,2))
      {'start': array([1, 3]), 'nlines': 1, 'count': 2}

    • Unpack the above dictionary in islice2array and you will get the output array.
  • Note that dimensions are packed from right to left; e.g. 0,2 is repeating in the 2nd column.
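
The right-to-left packing can be sketched as plain index arithmetic; line_starts below is an illustrative helper (not pivotpy's slice_data) that reproduces the start indices from the example above:

```python
from itertools import product

def line_starts(dim_inds, old_shape):
    # The last entry of dim_inds selects columns; the rest select
    # lines. Dimensions are packed right (fastest) to left (slowest),
    # and -1 means "take every index in this dimension".
    row_shape = old_shape[:-1]
    inds = [range(n) if sel == -1 else sel
            for sel, n in zip(dim_inds[:-1], row_shape)]
    # stride of a dimension = product of all dimensions to its right
    strides = [1] * len(row_shape)
    for i in range(len(row_shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * row_shape[i + 1]
    return sorted(sum(i * s for i, s in zip(idx, strides))
                  for idx in product(*inds))

print(line_starts([[0, 1], [1], -1], (2, 2, 2)))  # [1, 3]
```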
{% endraw %} {% raw %}
{% endraw %} {% raw %}
slice_data([list(range(1,7)),-1,-1,range(2)],old_shape=[52,768,64,9])
{'start': array([ 49152,  98304, 147456, 196608, 245760, 294912]),
 'nlines': 49152,
 'count': 294912,
 'cols': [0, 1],
 'new_shape': (6, 768, 64, 9)}
{% endraw %}

Process Large vasprun.xml Files

You can split a large vasprun.xml file into a small _vasprun.xml file, which does not contain projected data, and _set[1,2,3,4].txt file(s), which contain the projected data of each spin set. These spin set text files can be processed efficiently by the islice2array function.

{% raw %}

split_vasprun[source]

split_vasprun(path=None)

  • Splits a given vasprun.xml file into a smaller _vasprun.xml file plus _set[1,2,3,4].txt files which contain the projected data for each spin set.
  • Parameters
    • path: path/to/vasprun.xml file.
  • Output
    • _vasprun.xml file without projected data.
    • _set1.txt for the projected data of a collinear calculation.
    • _set1.txt for spin-up and _set2.txt for spin-down data in the spin-polarized case.
    • _set[1,2,3,4].txt for each spin set of non-collinear calculations.
{% endraw %} {% raw %}
{% endraw %} {% raw %}


{% endraw %}