Reconstructing Noisy Unit Cells

Diffraction experiments and other techniques for quantifying structure typically offer limited measurement precision. As a result, the Wyckoff position data recorded in some CIF files, particularly older ones, may make reproduction of the original structure challenging. Here, we explore how parsnip’s build_unit_cell method can be tuned to accurately reproduce structures with complicated geometries, using alpha-Selenium as an example.

# A header describing this portion of the file
data_cif_Se-hP3

_chemical_name_mineral 'alpha-Selenium'
_chemical_formula_sum 'Se'

# Key-value pairs describing the unit cell (Å and °)
# Note the cell angles 90-90-120 indicate a hexagonal structure.
_cell_length_a    4.36620
_cell_length_b    4.36620
_cell_length_c    4.95360
_cell_angle_alpha 90.00000
_cell_angle_beta  90.00000
_cell_angle_gamma 120.00000

loop_
_space_group_symop_id
_space_group_symop_operation_xyz
1 x,y,z
2 -y,x-y,z+1/3
3 -x+y,-x,z+2/3
4 x-y,-y,-z+2/3
5 y,x,-z
6 -x,-x+y,-z+1/3

loop_
_atom_site_type_symbol
_atom_site_symmetry_multiplicity
_atom_site_Wyckoff_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
Se   3 a 0.22540 0.00000 0.33333

Note that the basis positions for alpha-Selenium are provided to five decimal places of accuracy, while the symmetry operations are provided in a rational form.
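To make the size of this mismatch concrete, the short sketch below applies two of the symmetry operations listed above to the recorded z coordinate by hand. It uses plain Python arithmetic and is only an illustration, not a description of how parsnip evaluates the operations internally; the variable names are chosen for this example.

```python
import math

z = 0.33333  # _atom_site_fract_z as recorded in the CIF

# Operation 2 (z+1/3) and operation 5 (-z, wrapped into [0, 1)) both
# generate the z coordinate of the same symmetry-equivalent Se atom.
z_from_op2 = z + 1 / 3
z_from_op5 = (-z) % 1.0

# With an exact z = 1/3 both would equal 2/3; the truncated decimal
# leaves a small gap between the two images instead.
gap = z_from_op5 - z_from_op2
print(f"{z_from_op2:.8f} {z_from_op5:.8f} {gap:.2e}")
# prints: 0.66666333 0.66667000 6.67e-06
```

The gap is twice the truncation error in z, so any duplicate check has to tolerate differences of at least that size.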

>>> from parsnip import CifFile
>>> cif = CifFile("hP3.cif")
>>> # Let's make sure we reconstruct the unit cell's three atoms
>>> correct_uc = cif.build_unit_cell()
>>> correct_uc
array([[0.2254    , 0.        , 0.33333   ],
       [0.        , 0.2254    , 0.66666333],
       [0.7746    , 0.7746    , 0.99999667]])
>>> site_multiplicity = int(cif["_atom_site_symmetry_multiplicity"].squeeze())
>>> assert len(correct_uc) == site_multiplicity

parsnip’s default settings are able to correctly reproduce the unit cell – but the mismatch between numerical data and the symmetry operation strings can cause issues. If we truncate the Wyckoff position data, even by one decimal place, the reconstructed crystal contains duplicate atoms:

--- hP3.cif
+++ hP3-four-decimal-places.cif
@@ -30,4 +30,4 @@
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
-Se   3 a 0.22540 0.00000 0.33333
+Se   3 a 0.2254  0.0000  0.3333

Rebuilding the crystal from this file does not raise an exception, but the resulting unit cell is wrong:

>>> lower_precision_cif = CifFile("hP3-four-decimal-places.cif")
>>> uc = lower_precision_cif.build_unit_cell()
>>> uc
array([[0.2254    , 0.        , 0.3333    ],  # A
       [0.        , 0.2254    , 0.66663333],  # B
       [0.7746    , 0.7746    , 0.99996667],  # C
       [0.2254    , 0.        , 0.33336667],  # A
       [0.        , 0.2254    , 0.6667    ]]) # B
>>> uc.shape == correct_uc.shape # Our unit cell has duplicate atoms!
False

By default, parsnip rounds positions to four decimal places when reconstructing crystals. This default yields the best overall accuracy (tested with several thousand CIFs), but it is not the best choice for every file. A good rule of thumb is to use one fewer decimal place than the CIF file contains. This ensures positions are rounded sufficiently to detect duplicate atoms, while avoiding issues in complex structures where Wyckoff positions may be very close to one another. For the four-decimal-place file above, that means n_decimal_places=3, which yields the correct structure once again:

>>> cif = CifFile("hP3-four-decimal-places.cif")
>>> four_decimal_places = cif.build_unit_cell(n_decimal_places=3)
>>> four_decimal_places
array([[0.2254    , 0.        , 0.3333    ],
       [0.        , 0.2254    , 0.66663333],
       [0.7746    , 0.7746    , 0.99996667]])
>>> assert four_decimal_places.shape == correct_uc.shape
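The effect of the rounding precision can also be reproduced by hand. The following is a minimal sketch under stated assumptions, not parsnip's actual implementation: the six symmetry operations from the CIF are transcribed as Python lambdas, and count_unique_sites (a hypothetical helper written for this example) rounds each image to the requested number of decimal places, wraps it into the unit cell, and counts the distinct results.

```python
import numpy as np

# The six symmetry operations from the CIF above, transcribed by hand.
SYMOPS = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (-y, x - y, z + 1 / 3),
    lambda x, y, z: (-x + y, -x, z + 2 / 3),
    lambda x, y, z: (x - y, -y, -z + 2 / 3),
    lambda x, y, z: (y, x, -z),
    lambda x, y, z: (-x, -x + y, -z + 1 / 3),
]


def count_unique_sites(basis, n_decimal_places):
    """Count distinct symmetry images of `basis` after rounding.

    Hypothetical helper for illustration only; parsnip's own duplicate
    detection may differ in its details.
    """
    images = np.array([op(*basis) for op in SYMOPS])
    scale = 10**n_decimal_places
    # Round to integer keys, then wrap into the unit cell. Integer keys
    # avoid comparing floats that differ only by rounding noise.
    keys = np.round(images * scale).astype(int) % scale
    return len(np.unique(keys, axis=0))


full_precision = (0.2254, 0.0, 0.33333)
truncated = (0.2254, 0.0, 0.3333)

print(count_unique_sites(full_precision, 4))  # 3
print(count_unique_sites(truncated, 4))       # 5
print(count_unique_sites(truncated, 3))       # 3
```

In this sketch the truncated file produces five distinct sites at four decimal places because images such as 0.3333 and 0.33336667 round to different values, and dropping to three decimal places merges them again.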

Important

Rounding of Wyckoff positions is an intermediate step in the unit cell reconstruction, and does not negatively impact the accuracy of the returned data. The unit cell is always returned in the full precision of the input CIF:

>>> import numpy as np
>>> cif = CifFile("hP3-four-decimal-places.cif")
>>> one_decimal_place = cif.build_unit_cell(n_decimal_places=1)
>>> np.testing.assert_array_equal(one_decimal_place, four_decimal_places)

Symbolic Parsing

In some cases, particularly in structures with many atoms, careful tuning of numerical precision is not enough to accurately reproduce a crystal. parsnip includes a specialized parser that uses rational arithmetic to correctly compare fractions that differ only by a few units in the last place. To enable this, install the sympy library and set parse_mode="sympy" when building the unit cell.

>>> cif = CifFile("hP3.cif")
>>> symbolic = cif.build_unit_cell(n_decimal_places=4, parse_mode="sympy")
>>> symbolic
array([[0.2254    , 0.        , 0.33333   ],
       [0.        , 0.2254    , 0.66666333],
       [0.7746    , 0.7746    , 0.99999667]])
>>> assert symbolic.shape == correct_uc.shape
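The benefit of rational arithmetic can be illustrated without parsnip. The sketch below uses Python's standard-library fractions module as a stand-in for sympy, and the coordinate values are illustrative rather than taken from the alpha-Selenium CIF. It shows a shifted coordinate that should coincide with an existing site but fails an exact comparison in floating point.

```python
from fractions import Fraction

# A site at x = 0.1 translated by 0.2 should coincide with a site at 0.3.
# In binary floating point the sum is off by one unit in the last place.
float_match = (0.1 + 0.2) == 0.3

# Parsing the same decimal strings as exact rationals removes the error.
exact_match = (Fraction("0.1") + Fraction("0.2")) == Fraction("0.3")

print(float_match)  # False
print(exact_match)  # True
```

Exact rationals remove floating-point rounding noise from the comparison, but they cannot restore digits that were truncated when the CIF was written.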