parsnip
Overview
An interface for reading CIF files in Python.
parsnip reads CIF 1.1 files and supports many features of the CIF 2.0 and mmCIF formats.
Creating a CifFile object provides easy access to name-value pairs, as well
as loop_-delimited loops. Data entries can be extracted as Python primitives or
NumPy arrays for further use.
The CIF Format
This is an example of a simple CIF file. A key (data name or tag) must start with
an underscore, and is separated from the data value with whitespace characters.
A table begins with the loop_ keyword, and contains a header block and a data
block. The vertical position of a tag in the table headings corresponds with the
horizontal position of the associated column in the table values.
# A header describing this portion of the file
data_cif_Cu-FCC
# Several key-value pairs
_journal_year 1999
_journal_page_first 0
_journal_page_last 123
_chemical_name_mineral 'Copper FCC'
_chemical_formula_sum 'Cu'
# Key-value pairs describing the unit cell (Å and °)
_cell_length_a 3.6
_cell_length_b 3.6
_cell_length_c 3.6
_cell_angle_alpha 90.0
_cell_angle_beta 90.0
_cell_angle_gamma 90.0
# A table with 6 columns and one row
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_type_symbol
_atom_site_Wyckoff_label
Cu1 0.0000000000 0.0000000000 0.0000000000 Cu a
_symmetry_space_group_name_H-M 'Fm-3m' # One more key-value pair
# A table with two columns and four rows:
loop_
_symmetry_equiv_pos_site_id
_symmetry_equiv_pos_as_xyz
1 x,y,z
96 z,y+1/2,x+1/2
118 z+1/2,-y,x+1/2
192 z+1/2,y+1/2,x
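The layout rules described above (keys start with an underscore, values follow after whitespace, and a loop_ header lists one column label per line before the rows) can be illustrated with a small standard-library sketch. This is not how parsnip parses files; the function name parse_simple_cif and the token regex are hypothetical, and the sketch ignores multi-line text fields and other parts of the CIF grammar.

```python
import re

# Tokens are quoted strings, comments, or runs of non-whitespace characters.
TOKEN = re.compile(r"'[^']*'|#.*|\S+")


def parse_simple_cif(text):
    """Split simple CIF text into key-value pairs and (labels, rows) tables."""
    pairs, loops = {}, []
    current = None  # the loop currently being read, or None outside a loop
    for line in text.splitlines():
        tokens = [t for t in TOKEN.findall(line) if not t.startswith("#")]
        if not tokens or tokens[0].startswith("data_"):
            continue  # blank lines, comment lines, and the data block header
        if tokens[0] == "loop_":
            current = {"labels": [], "values": []}
            loops.append(current)
        elif (tokens[0].startswith("_") and len(tokens) == 1
              and current is not None and not current["values"]):
            current["labels"].append(tokens[0])  # a column label in the header
        elif tokens[0].startswith("_"):
            pairs[tokens[0]] = " ".join(tokens[1:])  # a key-value pair
            current = None  # a key-value pair ends any open loop
        elif current is not None:
            current["values"].extend(tokens)  # a row of table data
    tables = []
    for loop in loops:
        n = len(loop["labels"])
        values = loop["values"]
        rows = [values[i:i + n] for i in range(0, len(values), n)]
        tables.append((loop["labels"], rows))
    return pairs, tables


sample = """\
data_cif_Cu-FCC
_cell_length_a 3.6
_chemical_name_mineral 'Copper FCC'
loop_
_symmetry_equiv_pos_site_id
_symmetry_equiv_pos_as_xyz
1 x,y,z
96 z,y+1/2,x+1/2
"""
pairs, tables = parse_simple_cif(sample)
```

Here the n-th label in the loop header names the n-th value of every row, which is the vertical-to-horizontal correspondence noted above.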
Classes:
CifFile – Parser for CIF files.
- class CifFile(file, cast_values=False, strict=False)
Bases: object
Parser for CIF files.
Example
To get started, simply provide a filename:
>>> from parsnip import CifFile
>>> cif = CifFile("example_file.cif")
>>> print(cif)
CifFile(file='example_file.cif') : 12 data entries, 2 data loops
Data entries are accessible via the pairs and loops attributes:
>>> cif.pairs
{'_journal_year': '1999', '_journal_page_first': '0', ...}
>>> cif.loops[0]
array([[('Cu1', '0.0000000000', '0.0000000000', '0.0000000000', 'Cu', 'a')]], dtype=...)
>>> cif.loops[1]
array([[('1', 'x,y,z')], [('96', 'z,y+1/2,x+1/2')], [('118', 'z+1/2,-y,x+1/2')], [('192', 'z+1/2,y+1/2,x')]], dtype=...)
Tip
See the docs for __getitem__ and get_from_loops to query for data by key or column label.
- Parameters:
Attributes:
pairs – A dict containing key-value pairs extracted from the file.
loops – A list of data tables (loop_'s) extracted from the file.
lattice_vectors – The lattice vectors of the unit cell, with \(\vec{a_1}\perp[100]\).
loop_labels – A list of column labels for each data array.
symops – Extract the symmetry operations in a parsable algebraic form.
wyckoff_positions – Extract symmetry-irreducible, fractional x,y,z coordinates.
cast_values – Whether to cast "number-like" values to ints & floats.
PATTERNS – Regex patterns used when parsing files.
Methods:
__getitem__(index) – Return an item or list of items from pairs() and loops().
get_from_pairs(index) – Return an item or items from the dictionary of key-value pairs.
get_from_loops(index) – Return a column or columns from the matching table in loops.
read_cell_params([degrees, normalize]) – Read the unit cell parameters (lengths and angles).
build_unit_cell([n_decimal_places, ...]) – Reconstruct fractional atomic positions from Wyckoff sites and symops.
set_wyckoff_positions(wyckoff_sites) – Set the Wyckoff sites in the CIF file data.
structured_to_unstructured(arr) – Convert a structured (column-labeled) array to a standard unstructured array.
- property pairs
A dict containing key-value pairs extracted from the file.
Numeric values will be parsed to int or float if possible. In these cases, precision specifiers will be stripped.
- property loops
A list of data tables (loop_’s) extracted from the file.
These are stored as numpy structured arrays, which can be indexed by column labels. See the structured_to_unstructured helper function below for details on converting to standard arrays.
- Returns:
A list of structured arrays containing table data from the file.
- Return type:
- __getitem__(index)
Return an item or list of items from pairs() and loops().
This getter searches the entire CIF state to identify the input keys, returning None if the key does not match any data. Matching columns from loop tables are returned as 1D arrays.
Tip
This method of accessing data is recommended for most uses, as it ensures data is returned wherever possible. get_from_loops() may be useful when multi-column slices of an array are needed.
Example
Indexing the class with a single key:
>>> cif["_journal_year"]
'1999'
>>> cif["_atom_site_label"]
array([['Cu1']], dtype='<U12')
Indexing with a list of keys:
>>> cif[["_chemical_name_mineral", "_symmetry_equiv_pos_as_xyz"]]
["'Copper FCC'", array([['x,y,z'], ['z,y+1/2,x+1/2'], ['z+1/2,-y,x+1/2'], ['z+1/2,y+1/2,x']], dtype='<U14')]
Wildcards are supported for lookups with this method:
>>> cif[["_journal*", "_atom_site_fract_?"]]
[['1999', '0', '123'], ...array([['0.0000000000', '0.0000000000', '0.0000000000']], dtype='<U12')]
- get_from_pairs(index)
Return an item or items from the dictionary of key-value pairs.
Tip
This method supports unix-style wildcards. Use * to match any number of any character, and ? to match any single character. If a wildcard matches more than one key, a list is returned for that index. The ordering of array data resulting from wildcard queries matches the ordering of the matching keys in the file. Lookups using this method are case-insensitive, per the CIF specification.
Indexing with a string returns the value from the pairs() dict. Indexing with an Iterable of strings returns a list of values, with None as a placeholder for keys that did not match any data.
Example
Indexing the class with a single key:
>>> cif.get_from_pairs("_journal_year")
'1999'
Indexing with a list of keys:
>>> cif.get_from_pairs(["_journal_page_first", "_journal_page_last"])
['0', '123']
Indexing with wildcards:
>>> cif.get_from_pairs("_journal*")
['1999', '0', '123']
Single-character wildcards can generalize keys across CIF and mmCIF files:
>>> cif.get_from_pairs("_symmetry?space_group_name_H-M")
"'Fm-3m'"
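The wildcard and case-insensitivity rules described in the tip can be reproduced with the standard library's fnmatch module. The sketch below only illustrates the matching semantics; it makes no claim about how parsnip implements lookups, and match_keys is a hypothetical helper. The key list mixes a CIF-style and an mmCIF-style key purely for demonstration.

```python
from fnmatch import fnmatchcase


def match_keys(pattern, keys):
    """Return the keys matching a unix-style wildcard pattern, ignoring case."""
    pattern = pattern.lower()
    # Lowercase both sides so the comparison is case-insensitive,
    # and preserve the order in which the keys were given.
    return [key for key in keys if fnmatchcase(key.lower(), pattern)]


keys = [
    "_journal_year",
    "_journal_page_first",
    "_journal_page_last",
    "_symmetry_space_group_name_H-M",  # CIF-style key
    "_symmetry.space_group_name_H-M",  # mmCIF-style key
]

journal_keys = match_keys("_journal*", keys)
space_group_keys = match_keys("_symmetry?space_group_name_H-M", keys)
upper_case_lookup = match_keys("_JOURNAL_YEAR", keys)
```

The single-character wildcard matches both the underscore and the period, which is why one query can cover the CIF and mmCIF spellings of the same key.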
- get_from_loops(index)
Return a column or columns from the matching table in loops.
If index is a single string, a single column will be returned from the matching table. If index is an Iterable of strings, the corresponding table slices will be returned. Slices from the same table will be grouped in the output array, but slices from different tables will be returned separately.
Tip
It is highly recommended that queries across multiple loops are made in separate calls to this function. This helps ensure output data is ordered as expected and allows for easier handling of cases where non-matching keys are provided.
Example
Extract a single column from a single table:
>>> cif.get_from_loops("_symmetry_equiv_pos_as_xyz")
array([['x,y,z'], ['z,y+1/2,x+1/2'], ['z+1/2,-y,x+1/2'], ['z+1/2,y+1/2,x']], dtype='<U14')
Extract multiple columns from a single table:
>>> table_1_cols = ["_symmetry_equiv_pos_site_id", "_symmetry_equiv_pos_as_xyz"]
>>> table_1 = cif.get_from_loops(table_1_cols)
>>> table_1
array([['1', 'x,y,z'], ['96', 'z,y+1/2,x+1/2'], ['118', 'z+1/2,-y,x+1/2'], ['192', 'z+1/2,y+1/2,x']], dtype='<U14')
Wildcard patterns are accepted for single input keys:
>>> assert (cif.get_from_loops("_symmetry_equiv_pos*") == table_1).all()
Extract multiple columns from multiple loops:
>>> table_1_cols = ["_symmetry_equiv_pos_site_id", "_symmetry_equiv_pos_as_xyz"]
>>> table_2_cols = ["_atom_site_type_symbol", "_atom_site_Wyckoff_label"]
>>> [cif.get_from_loops(cols) for cols in (table_1_cols, table_2_cols)]
[array([['1', 'x,y,z'], ['96', 'z,y+1/2,x+1/2'], ['118', 'z+1/2,-y,x+1/2'], ['192', 'z+1/2,y+1/2,x']], dtype='<U14'), array([['Cu', 'a']], dtype='<U12')]
Caution
Returned arrays will match the ordering of input index keys if all indices correspond to a single table. Indices that match multiple loops will return all possible matches, in the order of the input loops. Lists of input that correspond with multiple loops will return data from those loops in the order they were read from the file.
Case where ordering of output matches the input file, not the provided keys:
>>> cif.get_from_loops([*table_1_cols, *table_2_cols])
[array([['Cu', 'a']], dtype='<U12'), array([['1', 'x,y,z'], ['96', 'z,y+1/2,x+1/2'], ['118', 'z+1/2,-y,x+1/2'], ['192', 'z+1/2,y+1/2,x']], dtype='<U14')]
- Parameters:
index (str | Iterable[str]) – A column name or list of column names.
- Returns:
A list of unstructured arrays corresponding with matches from the input keys. If the resulting list would have length 1, the data is returned directly instead. See the note above for data ordering.
- Return type:
list[numpy.ndarray] | numpy.ndarray
- read_cell_params(degrees=True, normalize=False)
Read the unit cell parameters (lengths and angles).
- Parameters:
- Returns:
The box vector lengths (in angstroms) and angles (in degrees or radians) \((L_1, L_2, L_3, \alpha, \beta, \gamma)\).
- Return type:
- Raises:
ValueError – If the stored data cannot form a valid box.
- build_unit_cell(n_decimal_places=4, additional_columns=None, parse_mode='python_float', verbose=False)
Reconstruct fractional atomic positions from Wyckoff sites and symops.
Rather than storing an entire unit cell’s atomic positions, CIF files instead include the data required to recreate those positions based on symmetry rules. Symmetry operations (stored as strings of x,y,z position permutations) are applied to the Wyckoff (symmetry irreducible) positions to create a list of possible atomic sites. These are then wrapped into the unit cell and filtered for uniqueness to yield the final crystal.
Tip
If the parsed unit cell has more atoms than expected, decrease n_decimal_places to account for noise. If the unit cell has fewer atoms than expected, increase n_decimal_places to ensure atoms are compared with sufficient precision. In many cases, setting parse_mode='sympy' can improve the accuracy of reconstructed unit cells.
Example
Construct the atomic positions of the FCC unit cell from its Wyckoff sites:
>>> pos = cif.build_unit_cell()
>>> pos
array([[0. , 0. , 0. ], [0. , 0.5, 0.5], [0.5, 0. , 0.5], [0.5, 0.5, 0. ]])
Reconstruct a unit cell with its associated atomic labels. The ordering of the auxiliary data array will match the ordering of the atomic positions:
>>> data = cif.build_unit_cell(additional_columns=["_atom_site_type_symbol"])
>>> data[0] # Chemical symbol for the atoms at each lattice site
array([['Cu'], ['Cu'], ['Cu'], ['Cu']], dtype='<U12')
>>> data[1] # Lattice positions
array([[0. , 0. , 0. ], [0. , 0.5, 0.5], [0.5, 0. , 0.5], [0.5, 0.5, 0. ]])
>>> assert (pos==data[1]).all()
- Parameters:
n_decimal_places (int, optional) – The number of decimal places to round each position to for the uniqueness comparison. Ideally this should be set to the number of decimal places included in the CIF file, but 3 and 4 work in most cases. Default value = 4
additional_columns (str | Iterable[str] | None, optional) – A column name or list of column names from the loop containing the Wyckoff site positions. This data is replicated alongside the atomic coordinates and returned in an auxiliary array. Default value = None
parse_mode ({'sympy', 'python_float'}, optional) – Whether to parse lattice sites symbolically (parse_mode='sympy') or numerically (parse_mode='python_float'). Sympy is typically more accurate, but may be slower. Default value = 'python_float'
verbose (bool, optional) – Whether to print debug information about the uniqueness checks. Default value = False
- Returns:
The full unit cell of the crystal structure.
- Return type:
\((N, 3)\) numpy.ndarray[float]
- Raises:
ValueError – If the stored data cannot form a valid box.
ValueError – If the additional_columns are not properly associated with the Wyckoff positions.
ImportError – If parse_mode='sympy' and Sympy is not installed.
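The reconstruction procedure described for build_unit_cell (apply each symmetry operation to each Wyckoff position, wrap into the unit cell, round, and keep unique sites) can be sketched with the standard library alone. This is an illustrative approximation rather than parsnip's implementation: apply_symop and build_cell are hypothetical names, and the small term parser only handles operations built from x, y, z and simple fractions, as in the example file.

```python
import re
from fractions import Fraction

# One term of a symmetry operation: an optional sign, then x, y, z, or a fraction.
TERM = re.compile(r"([+-]?)\s*([xyz]|\d+(?:/\d+)?)")


def apply_symop(symop, site):
    """Apply one 'x,y,z'-style operation to a fractional site of Fractions."""
    values = dict(zip("xyz", site))
    result = []
    for component in symop.split(","):
        total = Fraction(0)
        for sign, token in TERM.findall(component):
            term = values[token] if token in values else Fraction(token)
            total += -term if sign == "-" else term
        result.append(total % 1)  # wrap the coordinate into [0, 1)
    return tuple(result)


def build_cell(symops, sites, n_decimal_places=4):
    """Generate the unique sites produced by all symops acting on all sites."""
    seen, cell = set(), []
    for site in sites:
        site = tuple(Fraction(s) for s in site)
        for op in symops:
            pos = apply_symop(op, site)
            # Round for the uniqueness comparison; "% 1.0" re-wraps values
            # that round up to exactly 1.0.
            key = tuple(round(float(p), n_decimal_places) % 1.0 for p in pos)
            if key not in seen:
                seen.add(key)
                cell.append(key)
    return cell


symops = ["x,y,z", "z,y+1/2,x+1/2", "z+1/2,-y,x+1/2", "z+1/2,y+1/2,x"]
cell = build_cell(symops, [("0.0000000000", "0.0000000000", "0.0000000000")])
```

In this sketch, a smaller n_decimal_places merges positions that differ only by noise, while a larger value keeps nearby positions distinct, which mirrors the trade-off described in the tip above.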
- property box
Read the unit cell as a freud or HOOMD box-like object.
Important
cif.box returns box extents and tilt factors, while CifFile.read_cell_params returns unit cell vector lengths and angles. See the box-like documentation linked above for more details.
Example
This property provides a convenient interface to create box objects.
>>> box = cif.box
>>> print(box)
(3.6, 3.6, 3.6, 0.0, 0.0, 0.0)
>>> import freud, hoomd
>>> freud.Box(*box)
freud.box.Box(Lx=3.6, Ly=3.6, Lz=3.6, xy=0, xz=0, yz=0, ...)
>>> hoomd.Box(*box)
hoomd.box.Box(Lx=3.6, Ly=3.6, Lz=3.6, xy=0.0, xz=0.0, yz=0.0)
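To make the distinction between cell parameters and box extents concrete, the following sketch converts lengths and angles into (Lx, Ly, Lz, xy, xz, yz) using the triclinic box convention documented by HOOMD-blue. It is a standalone illustration under that assumption, not the code behind cif.box, and cell_to_box is a hypothetical name.

```python
import math


def cell_to_box(a, b, c, alpha, beta, gamma):
    """Convert cell lengths and angles (degrees) to (Lx, Ly, Lz, xy, xz, yz)."""
    cos_a, cos_b, cos_g = (
        math.cos(math.radians(angle)) for angle in (alpha, beta, gamma)
    )
    lx = a
    a2x = b * cos_g                      # x component of the second vector
    ly = math.sqrt(b**2 - a2x**2)
    xy = a2x / ly
    a3x = c * cos_b                      # x component of the third vector
    yz_lz = (b * c * cos_a - a2x * a3x) / ly
    lz = math.sqrt(c**2 - a3x**2 - yz_lz**2)
    return (lx, ly, lz, xy, a3x / lz, yz_lz / lz)


cubic_box = cell_to_box(3.6, 3.6, 3.6, 90.0, 90.0, 90.0)
hexagonal_box = cell_to_box(1.0, 1.0, 1.0, 90.0, 90.0, 120.0)
```

For the cubic example all tilt factors vanish (up to floating-point error), so the box extents equal the cell lengths. For the hexagonal cell, Ly is shorter than b and xy is nonzero, which shows why the two representations should not be used interchangeably.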
- property lattice_vectors: ndarray[3, 3, float64]
The lattice vectors of the unit cell, with \(\vec{a_1}\perp[100]\).
Important
The lattice vectors are stored as columns of the returned matrix, similar to freud to_matrix(). This matrix must be transposed when creating a Freud box or transforming fractional coordinates to absolute.
Example
The box matrix can be used to transform fractional coordinates to absolute coordinates after transposing to row-major form.
>>> lattice_vectors = cif.lattice_vectors
>>> lattice_vectors
array([[3.6, 0.0, 0.0], [0.0, 3.6, 0.0], [0.0, 0.0, 3.6]])
>>> cif.build_unit_cell() @ lattice_vectors.T # Calculate absolute positions
array([[0.0, 0.0, 0.0], [0.0, 1.8, 1.8], [1.8, 0.0, 1.8], [1.8, 1.8, 0.0]])
- Returns:
The lattice vectors of the unit cell \(\vec{a_1}, \vec{a_2},\vec{a_3}\).
- Return type:
\((3, 3)\) numpy.ndarray
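The column convention noted above can be checked by hand. The sketch below builds a matrix whose columns are the lattice vectors from the cell parameters, assuming the common convention that places the first vector along the x axis and the second in the xy plane, and then converts fractional coordinates to absolute ones. Both function names are hypothetical and plain lists stand in for NumPy arrays.

```python
import math


def lattice_vectors(a, b, c, alpha, beta, gamma):
    """Return a 3x3 nested list whose COLUMNS are a1, a2, a3 (angles in degrees)."""
    cos_a, cos_b, cos_g = (
        math.cos(math.radians(angle)) for angle in (alpha, beta, gamma)
    )
    sin_g = math.sin(math.radians(gamma))
    a1 = (a, 0.0, 0.0)
    a2 = (b * cos_g, b * sin_g, 0.0)
    a3y = c * (cos_a - cos_b * cos_g) / sin_g
    a3 = (c * cos_b, a3y, math.sqrt(c**2 - (c * cos_b) ** 2 - a3y**2))
    # Row i holds component i of every vector, so each vector is a column.
    return [[a1[i], a2[i], a3[i]] for i in range(3)]


def fractional_to_absolute(fractional, matrix):
    """Equivalent to `fractional @ matrix.T` for a column-vector matrix."""
    return [
        [sum(site[j] * matrix[i][j] for j in range(3)) for i in range(3)]
        for site in fractional
    ]


matrix = lattice_vectors(3.6, 3.6, 3.6, 90.0, 90.0, 90.0)
absolute = fractional_to_absolute([[0.0, 0.5, 0.5], [0.5, 0.5, 0.0]], matrix)
```

Each absolute position is the sum of the fractional coordinates times the corresponding lattice vectors, which is why the column-major matrix must be transposed before a row-vector product.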
- property loop_labels: list[tuple[str, ...]]
A list of column labels for each data array.
This property is equivalent to [arr.dtype.names for arr in self.loops].
- property symops: ndarray | None
Extract the symmetry operations in a parsable algebraic form.
Example
>>> cif.symops
array([['x,y,z'], ['z,y+1/2,x+1/2'], ['z+1/2,-y,x+1/2'], ['z+1/2,y+1/2,x']], dtype='<U14')
- Returns:
An array containing the symmetry operations, or None if none are found.
- Return type:
\((N,1)\) numpy.ndarray[str]
- property wyckoff_positions
Extract symmetry-irreducible, fractional x,y,z coordinates.
- Returns:
Symmetry-irreducible positions of atoms in fractional coordinates.
- Return type:
\((N, 3)\) numpy.ndarray
- set_wyckoff_positions(wyckoff_sites)
Set the Wyckoff sites in the CIF file data.
This method updates the values of the Wyckoff position coordinates in the corresponding loop structure. The input is a NumPy array of floating point values, which will be converted to strings for storage.
If the provided array has a different number of rows than the existing data, the loop will be resized. When adding new sites, placeholder data (“?”) will be used for non-coordinate columns. When removing sites, rows are removed from the end of the loop.
Danger
Changing the Wyckoff positions may invalidate other keys in the original file, most commonly by changing the _chemical_formula_sum and space group data. Correct structures will be built when using build_unit_cell(), but use of keys related to structural or chemical data is discouraged once the basis has been modified. Refer to Refining and Experimenting with Structures for further details.
- Parameters:
wyckoff_sites (\((N, 3)\) numpy.ndarray) – The new Wyckoff site data.
- Raises:
ValueError – If the Wyckoff position keys cannot be found in any loop, or if the input array does not have 3 columns.
- property cast_values
Whether to cast “number-like” values to ints & floats.
Caution
When set to True after construction, the values are modified in-place. This action cannot be reversed.
- Type:
bool
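As a rough illustration of what casting "number-like" values involves, the sketch below converts strings to int or float and drops a trailing precision specifier such as the (2) in 3.6(2). The regular expression and the cast_number_like helper are assumptions made for this example and may differ from parsnip's own rules.

```python
import re

# A number, optionally followed by a parenthesized precision specifier.
NUMBER = re.compile(
    r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\(\d+\))?$"
)


def cast_number_like(value):
    """Return an int or float for number-like strings, else the input unchanged."""
    match = NUMBER.match(value)
    if match is None:
        return value  # not number-like: leave the string as it is
    text = match.group(1)  # the number without its precision specifier
    try:
        return int(text)
    except ValueError:
        return float(text)


examples = ["1999", "3.6(2)", "0.0000000000", "'Cu'", "x,y,z"]
converted = [cast_number_like(value) for value in examples]
```

Because the original string, including any precision specifier, is discarded once a value is cast, a conversion like this cannot be undone, consistent with the caution above.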
- classmethod structured_to_unstructured(arr)
Convert a structured (column-labeled) array to a standard unstructured array.
This is useful when extracting entire loops from loops for use in other programs. This classmethod calls np.lib.recfunctions.structured_to_unstructured on the input data to ensure the resulting array is properly laid out in memory, with additional checks to ensure the output properly reflects the underlying data. See this page in the structured array docs for more information.
- Parameters:
arr (numpy.ndarray | numpy.recarray) – The structured array to convert.
- Returns:
An unstructured array containing a copy of the data from the input.
- Return type:
- PATTERNS: ClassVar = {'block_delimiter': '(data_)[\t ]*+([^\\n]*+)', 'bracket': '(\\[|\\])', 'comment': '#.*?$', 'key_list': '_\\S+?(?=\\s|$)', 'key_value_general': '^(_\\S+?)\\s++((?s:.)+?)$', 'loop_delimiter': '(loop_)[\t ]*+([^\\n]*+)', 'space_delimited_data': '(;[^;]*?;|\'(?:\'\\S|[^\'])*\'|[^\';\\"\\s]*+)'}
Regex patterns used when parsing files.
Caution
This dictionary can be modified to change parsing behavior, although doing so is not recommended. Changes to this variable are shared across all instances of the class.
Please refer to the CIF grammar for further details.
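Two of the simpler patterns listed above can be tried directly with Python's re module. The sketch below copies the 'comment' and 'key_list' patterns as plain strings and applies them to a short loop header; the flags and the order of operations are assumptions for illustration and are not taken from parsnip's source. Several of the other patterns use possessive quantifiers such as *+, which the standard re module accepts only from Python 3.11 onward.

```python
import re

COMMENT = r"#.*?$"            # the 'comment' pattern from PATTERNS
KEY_LIST = r"_\S+?(?=\s|$)"   # the 'key_list' pattern from PATTERNS

text = """loop_
_atom_site_label  # column 1
_atom_site_type_symbol
Cu1 Cu
"""

# Remove comments line by line, then collect every data name in the header.
without_comments = re.sub(COMMENT, "", text, flags=re.MULTILINE)
keys = re.findall(KEY_LIST, without_comments)
```

The loop_ keyword itself is not matched by the key pattern, because its underscore is followed by whitespace rather than by further characters.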