Powder diffraction/Indexing powder patterns

source: WolfWiki http://wikis.lib.ncsu.edu/index.php/Powder_diffraction/Indexing_powder_patterns

Indexing a pattern

Indexing a pattern is a process in which each peak is identified by the Miller indices (hkl) of the lattice plane that produces it. Precise values for the lattice parameters are then computed by a least-squares refinement of all the positional information available.

A successful indexation answers a number of questions:

what is the symmetry of the material?
does the pattern really arise from one single (or at least preponderant) phase?
is it possible to find an isostructural material, the pattern (or even structure) of which is known?

Success in indexing is by no means guaranteed from the onset. Success depends on:

the presence or absence of lines from other phases (absence please)
the precision in the value of the angle (high precision please)
the size of the unit cell (not too big please)
the symmetry of the phase (cubic, hexagonal and tetragonal are fine, orthorhombic not too bad, monoclinic gets iffy, triclinic only on a lucky day)

There are a number of computer programs that will do it for you, but you still need to know what you are doing, because they usually give you a whole list of possible solutions from which you need to select the correct one. It is also possible to do it by hand, particularly with a spreadsheet program like Excel, and some experience in fitting by hand is desirable to sharpen your judgment in using the programs.

Indexing by hand

Putting lines on reserve

First thing to do: determine the peak positions as precisely as you can. If peaks overlap and separate positions are hard to determine, do collect the information but leave it aside from any refinement at first. The same goes for impurity lines: check whether lines could result from small amounts of unreacted material or known side products. Only the strongest lines of those should be visible (if at all), and these lines should be held in reserve as well. (They could be overlapping with a line that does come from your phase.)

The working variable Q

The best quantity to work with is either sin²θ, usually multiplied by 10⁵, or the squared scattering vector Q=q². Working with the scattering vector q:

q = 2π(2sinθ/λ)

has the advantage that the wavelength dependence of the scattering angle has been removed. This is particularly handy when comparing data from various beam lines (0.922 Å for X7B, 0.62 Å for X6B, 0.12 Å at Argonne) and X-ray tubes (1.5405 Å for copper, ... for cobalt).

So: prepare a list of Q=q² values in a column of your spreadsheet. Intensity values are not critical (they are not related to the unit cell size but to its contents), but they are often important in the sense that weak peaks could be impurity peaks.
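
For readers who prefer a script to a spreadsheet column, the conversion can be sketched in a few lines of Python. The function name and the example peak positions are made up for illustration; only the formula q = 2π(2sinθ/λ) comes from the text above.

```python
import math

def two_theta_to_Q(two_theta_deg, wavelength):
    """Return Q = q^2 (in 1/Å^2) for a peak at 2θ (degrees) and a wavelength in Å."""
    theta = math.radians(two_theta_deg / 2.0)
    q = 2.0 * math.pi * (2.0 * math.sin(theta) / wavelength)  # q = 2π(2 sinθ/λ)
    return q * q

# Hypothetical peak positions in degrees 2θ, assumed measured with Cu radiation
peaks = [21.50, 30.60, 37.70]
Q_list = [two_theta_to_Q(tt, 1.5405) for tt in peaks]
```

Because Q depends only on the lattice spacing, the same reflection measured at two different wavelengths should give the same Q, which is a quick sanity check on the conversion.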

How many parameters are needed to describe the values of q² depends on the symmetry:

n	symmetry	Q(hkl; ABCDEF)	short	relation to a, b, c
1	cubic	q² = [h²+k²+l²].A	Q = [M²].A	a = (2π)/√A
2	tetragonal	q² = [h²+k²].A + [l²].C	Q = [HK].A + [L²].C	a = (2π)/√A; c = (2π)/√C
2	hexagonal	q² = [h²+k²+h.k].A + [l²].C	Q = [HK].A + [L²].C	a = (4π)/√(3A); c = (2π)/√C
3	orthorhombic	q² = [h²].A + [k²].B + [l²].C		a = (2π)/√A; b = (2π)/√B; c = (2π)/√C
4	monoclinic	q² = [h²].A + [k²].B + [l²].C + [h.l].D		a.sinβ = (2π)/√A; b = (2π)/√B; c.sinβ = (2π)/√C; cosβ = -D/(2√(AC))
6	triclinic	q² = [h²].A + [k²].B + [l²].C + [h.l].D + [h.k].E + [k.l].F		(see Cullity page 501)
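
The master formulas in the table can be collected in one small Python function, shown here as a sketch. The function name and the way the symmetry is passed as a string are choices made for this example, not part of the original method.

```python
def q_squared(h, k, l, symmetry, A, B=0.0, C=0.0, D=0.0, E=0.0, F=0.0):
    """Q = q^2 for reflection (hkl), following the table of master formulas."""
    if symmetry == "cubic":
        return (h*h + k*k + l*l) * A
    if symmetry == "tetragonal":
        return (h*h + k*k) * A + l*l * C
    if symmetry == "hexagonal":
        return (h*h + k*k + h*k) * A + l*l * C
    if symmetry in ("orthorhombic", "monoclinic", "triclinic"):
        # One expression covers all three; the caller leaves the cross
        # terms D, E, F at zero where the symmetry does not use them.
        return h*h*A + k*k*B + l*l*C + h*l*D + h*k*E + k*l*F
    raise ValueError("unknown symmetry: " + symmetry)
```

For a monoclinic cell, `q_squared(2, 1, 1, ...)` and `q_squared(-2, 1, 1, ...)` differ by 4D, because only the [h.l].D term changes sign.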

In addition there are various conditions that determine which (hkl) can be observed and which cannot, depending on which Bravais lattice the unit cell belongs to. More about that below.

The parameters A,B,C,D,E and F can be called cell parameters but they are not the same as a,b,c,α,β and γ. However, one set of parameters can be converted into the other, as shown in the table.
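
As an illustration of the conversion, the monoclinic relations from the table can be written out in Python. This is a sketch only: the function names are invented here, and the inverse function exists purely as a round-trip check.

```python
import math

def monoclinic_cell(A, B, C, D):
    """Convert regression parameters A, B, C, D into a, b, c (Å) and β (degrees)."""
    cos_beta = -D / (2.0 * math.sqrt(A * C))
    beta = math.acos(cos_beta)
    a = 2.0 * math.pi / (math.sqrt(A) * math.sin(beta))   # a.sinβ = 2π/√A
    b = 2.0 * math.pi / math.sqrt(B)
    c = 2.0 * math.pi / (math.sqrt(C) * math.sin(beta))   # c.sinβ = 2π/√C
    return a, b, c, math.degrees(beta)

def monoclinic_params(a, b, c, beta_deg):
    """Inverse conversion (cell to A, B, C, D), used only to check the round trip."""
    beta = math.radians(beta_deg)
    A = (2.0 * math.pi / (a * math.sin(beta))) ** 2
    B = (2.0 * math.pi / b) ** 2
    C = (2.0 * math.pi / (c * math.sin(beta))) ** 2
    D = -2.0 * math.cos(beta) * math.sqrt(A * C)
    return A, B, C, D
```

With D = 0 the angle comes out as 90° and the expressions reduce to the orthorhombic row of the table.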

Notice that for orthorhombic and higher symmetries it makes no difference whether an index is negative or positive: the squares ensure that reflections like (-100) and (100) are superposed in the pattern, so when indexing we can consider them identical. However, the D, E and F terms of the monoclinic and triclinic cases do require that we worry about signs. For monoclinic, (-211) and (211) are two separate lines whose Q values differ by 2*[hl]D = 4D.

The unknowns of our sudoku

Each line has its own (hkl) values that are initially unknown and so is the symmetry as well as the values of A,B,C,D,E and F (or subset thereof depending on symmetry).

This means the problem is a kind of scientific sudoku: people who like solving puzzles usually do pretty well. Unfortunately the degree of difficulty of the puzzle is not known in advance either, although a pattern with few peaks at lower angles, separated by relatively large empty spaces, is usually easier than one with lots of closely spaced low-angle peaks. The latter typically means a big cell with low symmetry.

There is a variety of strategies.

Working your way down the symmetries

Strategy

First assume that the symmetry is cubic. If that does not work because it leaves too many lines unexplained, go to tetragonal or hexagonal, then to orthorhombic, etc.

Let's start with the simplest case:
If a material is primitive cubic Q= q² = [h²+k²+l²].A = [M²].A.
As h,k,l are integers, the value of [M²] can be:

1,2,3,4,5,6,..,8,9,10,11,12,13,14,..,16,17, etc.

This means that the first line should have [M²]=1, thus its Q value gives us Â, an initial estimate for A. If we divide all the Q values by this estimate, we should approximately get to see the above series. Of course, there are always errors in the Q values, so in reality we might see something like:

1, 1.9923, 3.0032, 3.9899, ...,6.1034,...., 8.1132

This could be 1,2,3,4,6,8, with the [M²]=5 line missing because its intensity is very low. (Remember: that depends on the atoms inside the unit cell.) However, is this really true? Or is the similarity fortuitous?
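
The divide-by-the-first-line test is easy to try outside the spreadsheet as well. The Python sketch below uses invented function names; the ratio list is the illustrative one quoted above, treated as Q values with Q₁ = 1.

```python
def allowed_m2(limit):
    """All values h^2+k^2+l^2 <= limit that integers h, k, l can produce."""
    n = int(limit ** 0.5) + 1
    values = {h*h + k*k + l*l
              for h in range(n + 1) for k in range(n + 1) for l in range(n + 1)}
    return sorted(v for v in values if 0 < v <= limit)

def cubic_ratios(Q_values):
    """Divide every Q by the first one and pair each ratio with the nearest integer."""
    first = Q_values[0]
    return [(Q / first, round(Q / first)) for Q in Q_values]

# The example ratios from the text
example = [1.0, 1.9923, 3.0032, 3.9899, 6.1034, 8.1132]
nearest = [m2 for _, m2 in cubic_ratios(example)]
all_allowed = all(m2 in allowed_m2(20) for m2 in nearest)
```

A nearest integer that is not in the allowed list (7, 15, ...) would be a warning that the primitive cubic assumption or the first-line assignment is wrong.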

The need for repetitive regression

The problem we are encountering here is another nasty property of our little sudoku:

The first line is usually something like (100), not (325) because the latter would mean that all lower indexed lines (at lower angles) must somehow have escaped detection and that is unlikely.

Thus, attributing (hkl) values (indexing) is easiest for the first line. However, the estimate Â that we obtain from it is the least precise of all lines!

After all, the positional error ΔQ in the line with [M²]=16 is typically not very different from the line with [M²]=1 but the estimate for A we get from these two lines is:

Â₁= Q₁ / 1

Â₁₆= Q₁₆ / 16

Thus the error in Â₁₆ = ΔQ/16; sixteen times smaller than in Â₁! Clearly we would like to get our estimates out of the higher angle lines, but these are the hardest to identify as long as all we have are the inferior estimates from the first lines of the spectrum. Therein lies the catch-22 of our sudoku game.

We need to somehow involve the higher lines to improve our estimate Â, but they are not always easily identified. This is true for cubic phases but even more so for the lower symmetries where we have more parameters to estimate. It already helps a lot if we can use a few lower angle lines at once to get a better estimate. We usually do that by linear regression. Spreadsheets have made this job a lot easier which is why indexing by hand is no longer the monastic job it used to be.

depending on the assumed symmetry, create columns for the assigned values of [M²] or [HK] and [L²] etc. (see formulas above).
try to identify a few lines by giving them [M²] or [HK] and [L²] values.
select an empty range of n columns and 5 rows, where n is the number of parameters (cubic n=1; hex, tetr n=2; orth n=3; mono n=4; tri n=6)
type =LINEST( ..range of q².., ..range of [M²] etc.., 0, 1)
enter the formula with Ctrl+Shift+Enter; note that LINEST lists the coefficients in the reverse order of the input columns

The 0 in the Linest function ensures that the model applied does not include a constant term (which it should not according to the above formulas, but sometimes such a term does occur if the calibration of the zero point of the angular values θ is not correct.)
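
The same regression without a constant term can be sketched in plain Python. This is not what LINEST does internally; it simply solves the normal equations by Gaussian elimination, and it will fail with a division by zero if the indexed lines do not determine all parameters. The function name and the tetragonal example values (A = 0.25, C = 0.1) are hypothetical.

```python
def fit_no_intercept(X, y):
    """Least-squares solution of y ≈ X·p with no constant term.

    X holds one row of index terms per line (e.g. [HK], [L²]); y holds the
    measured Q values. Solves (XᵀX) p = Xᵀy by Gauss-Jordan elimination.
    """
    n = len(X[0])
    # Augmented matrix [XᵀX | Xᵀy]
    M = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical tetragonal lines (100), (001), (110), (101), (111)
X = [[1, 0], [0, 1], [2, 0], [1, 1], [2, 1]]      # columns: [HK], [L²]
Q = [0.25, 0.10, 0.50, 0.35, 0.60]
A_est, C_est = fit_no_intercept(X, Q)
```

Unlike LINEST, this sketch returns the parameters in the same order as the input columns.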

Of course we need to identify enough lines to do a regression: the number of lines with a relatively reliable indexing must exceed the number of parameters by at least one. (That means that for triclinic we need to guess the (hkl) values of at least 7 lines just to get started!)

Once we have obtained improved values for Â (and/or the other parameters) we can try to identify what the indexing of the following lines can be. It is advisable to add another column in which we calculate the estimated value of Q based on calculation from the estimated cell parameters. Also prepare a column with the deviations between calculated and measured Q.

As new lines are added to the regression, the deviations between the calculated and the measured values should get smaller and the regression fit should get better.

If that does not happen and there are lines we cannot ascribe within reasonable error to a set of indices we may have to drop the symmetry and add one or more parameters (change the model).

If it does happen we should add the newly identified lines and redo the regression to get better values yet for the cell parameters.

We should also observe that the correlation R² of the regression improves when we add the new lines. It is often advisable to compute Fisher's Z = 1/2 ln[(1+R)/(1-R)] and look at that number: it keeps increasing steadily rather than converging to unity as R does. (It often gets tedious to distinguish R=.99989 from R=.99998.)
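
As a small illustration, Fisher's Z can be computed as follows in Python. The function name is invented for this example; if the regression output gives R², take its square root first to obtain R.

```python
import math

def fisher_z(r):
    """Fisher's Z = 1/2 ln[(1+R)/(1-R)] for a correlation coefficient R with |R| < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

# The two nearly indistinguishable R values mentioned in the text
z_before = fisher_z(0.99989)
z_after = fisher_z(0.99998)
```

The two R values differ only in the fifth decimal, while the corresponding Z values differ in the first.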

For lower symmetries it is advisable to revisit the earlier lines each time a regression is redone to see if alternative indexes do not lead to a better fit. For lower symmetries there are often multiple possibilities for indexing because potential lines may lie within precision from each other, particularly at higher angles. Because lines usually do differ in intensity there is usually only one correct assignment, because one strong line dominates. If that is not so the line is usually broadened by overlap. It may be necessary to put such a line on reserve because it may otherwise interfere with the regression game.

Thus we keep doing the following:

using the best estimated values we calculate what the indexing of a higher line might be
we add it to the regression
if it deteriorates the regression, we check whether another possibility exists, not only for the new line but also for all earlier ones
we continue doing this until we can explain all observed lines.
we then check if we can explain the lines on reserve: do broad lines indeed correspond to expected overlaps?
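
The first step of this loop, proposing an indexing for a new line from the current estimate, can be sketched for the cubic case in Python. The function name, the tolerance and the example numbers are assumptions made for illustration.

```python
def cubic_candidates(Q_obs, A_est, tol, max_index=8):
    """Return (M2, Q_calc, deviation) for each M² whose calculated Q is within tol of Q_obs."""
    m2_values = sorted({h*h + k*k + l*l
                        for h in range(max_index + 1)
                        for k in range(max_index + 1)
                        for l in range(max_index + 1)} - {0})
    hits = []
    for m2 in m2_values:
        Q_calc = m2 * A_est
        if abs(Q_calc - Q_obs) <= tol:
            hits.append((m2, Q_calc, Q_obs - Q_calc))
    return hits

# Hypothetical current estimate A = 0.1 and a new line observed at Q = 1.601
hits = cubic_candidates(1.601, 0.1, tol=0.005)
```

An empty list means the line cannot be explained with the current model and tolerance; more than one hit means the assignment is ambiguous and the line may be better kept on reserve.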

Systematic extinctions

The [M²] series: 1,2,3,4,5,6,..,8,9 etc. only has missing values like 7 because it is not possible to add three square integers to produce 7. This series holds for primitive cubic lattices. There are also body centered (I) and face centered cubic (F) lattices. In both these cases there are extinction conditions.

For example, for I cells the sum h+k+l must be even for the reflection to be observed. As a result the [M²] series becomes:

..,2,..,4,..,6,..,8,..,10,..,12,..,14,..,16, etc.

The problem here is that the first line is not (100), it is actually (110).

If -ignorant of that fact- we now divide all Q values by the first one Q₁ we should get something like:

1, 2.0032, 3.103, 3.989, 4.923, 6.0123, 7.0123

The (almost) 7 value is a dead giveaway that we are dealing with a bcc case, because 7 cannot occur in a primitive series. Thus we should multiply everything by 2 to get the proper series. Of course the series would look a lot nicer if we based our estimate of A on more than just the first line.

In fcc cases we have a similar problem. For this lattice the hkl values have to be either all odd or all even. Again (100) is forbidden as 1 is odd but 0 is even. The first observed line is now (111) (M²=3; all odd) and the second one (200) (M²=4; all even). The resulting series looks like:

3,4,8,11,12,16,19,20 etc.

If we simply divide by Q₁ we get something like:

1, 1.3423, 2.675, 3.712, 3.989, ...

This does not seem to make sense at all until we multiply everything by 3! Again the systematic extinction causes the first line to be something else than (100).
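
The multiply-by-2-or-3 trick can be tried systematically. The Python sketch below assumes the first observed line has M² = 1, 2 or 3 for P, I and F respectively, and uses a loose, arbitrarily chosen tolerance because the example ratios above are rough. All function names are invented for this illustration.

```python
def allowed_series(lattice, limit):
    """M² = h²+k²+l² values allowed for a P, I or F cubic lattice, up to limit."""
    n = int(limit ** 0.5) + 1
    out = set()
    for h in range(n + 1):
        for k in range(n + 1):
            for l in range(n + 1):
                m2 = h*h + k*k + l*l
                if not 0 < m2 <= limit:
                    continue
                same_parity = len({h % 2, k % 2, l % 2}) == 1
                if (lattice == "P"
                        or (lattice == "I" and (h + k + l) % 2 == 0)
                        or (lattice == "F" and same_parity)):
                    out.add(m2)
    return sorted(out)

def guess_cubic_lattice(ratios, tol=0.25):
    """Scale the ratios Q/Q1 by 1, 2 and 3 and report which lattice series they fit."""
    fits = []
    for mult, lattice in ((1, "P"), (2, "I"), (3, "F")):
        scaled = [r * mult for r in ratios]
        allowed = set(allowed_series(lattice, round(max(scaled)) + 1))
        if all(abs(s - round(s)) <= tol and round(s) in allowed for s in scaled):
            fits.append((lattice, [round(s) for s in scaled]))
    return fits

bcc_example = [1, 2.0032, 3.103, 3.989, 4.923, 6.0123, 7.0123]
fcc_example = [1, 1.3423, 2.675, 3.712, 3.989]
```

Note that a short I-centered pattern that stops before the seventh line is indistinguishable from a primitive one by this test alone, which is the ambiguity described above.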

Lower symmetry lattices also have various Bravais lattices. Tetragonal has I, some of the hexagonal groups have R, orthorhombic has F, I and C, and monoclinic has C centering. Each of these has its own systematic extinctions.

Other strategies and handy tricks

Zone finding

Another way of getting improved values is zone finding. Regardless of what the symmetry or the Miller index is one can always say that:

Q(2h,2k,2l) = 4.Q(hkl)

Q(3h,3k,3l)= 9 Q(hkl)

In general the nth harmonic of a reflection has a Q value that is n² times larger.

This allows us to get a better Q value for one of the low lines provided we can find a few of its harmonics.
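
A search for harmonics is easy to automate. The following Python sketch uses invented function names, a hypothetical line list and an arbitrary relative tolerance. The refined value is a least-squares estimate through the origin, which weights the higher harmonics more strongly on the assumption that all lines carry a similar error ΔQ.

```python
def find_harmonics(Q_values, max_order=4, rel_tol=0.02):
    """Return (i, j, n) whenever Q_values[j] ≈ n² · Q_values[i] for n = 2..max_order."""
    hits = []
    for i, Qi in enumerate(Q_values):
        for j, Qj in enumerate(Q_values):
            for n in range(2, max_order + 1):
                if abs(Qj - n * n * Qi) <= rel_tol * Qj:
                    hits.append((i, j, n))
    return hits

def refined_Q(Q_values, i, hits):
    """Least-squares estimate of Q_i from the line itself and its harmonics."""
    pairs = [(1, Q_values[i])] + [(n, Q_values[j]) for (a, j, n) in hits if a == i]
    num = sum(n * n * Q for n, Q in pairs)
    den = sum(n ** 4 for n, _ in pairs)
    return num / den

# Hypothetical line list: the first line is slightly off, its harmonics are not
Q_lines = [0.101, 0.25, 0.400, 0.900]
hits = find_harmonics(Q_lines)
better = refined_Q(Q_lines, 0, hits)
```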

Series hunting

As shown above cubic lattices have their characteristic series. To some extent that is also true for the a,b planes of hexagonal and tetragonal:

tetragonal: [h²+k²] = 1,2,4,5,8,9,10,13,16,17,18,..

hexagonal: [h²+k²+ hk] = 1,3,4,7,9,12,..

The c-axis produces its own series:

[l²] = 1,4,9,16,25,..
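
These series can be generated rather than memorized. A minimal Python sketch, with a function name and index limit chosen for this example:

```python
def plane_series(symmetry, limit, max_index=6):
    """Sorted distinct values of h²+k² (tetragonal) or h²+k²+hk (hexagonal) up to limit."""
    values = set()
    for h in range(max_index + 1):
        for k in range(max_index + 1):
            v = h*h + k*k + (h*k if symmetry == "hexagonal" else 0)
            if 0 < v <= limit:
                values.add(v)
    return sorted(values)

tetragonal = plane_series("tetragonal", 18)
hexagonal = plane_series("hexagonal", 12)
c_axis = [l * l for l in range(1, 6)]
```

Dividing a set of measured Q values by the lowest one and comparing the ratios with these lists is the series-hunting equivalent of the cubic [M²] test.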

If the shape of the unit cell is highly anisotropic, one of the two series dominates. E.g. if the c-axis is long, the first few reflections may well just come from the axis series. Often such materials produce needle-like crystals. The other extreme is platelets, which show mostly the (a,b) series. However, there is a caveat here: in such cases there is often preferential orientation in the sample which, depending on the sample geometry, might enhance one series and suppress the other. E.g. in the case of needles in a capillary perpendicular to the beam (as in the INEL) the (potentially many) l-reflections are suppressed.

Another point to be aware of is that non-symmorphic symmetry elements like screw axes and glide planes cause extinctions, particularly of axis reflections. E.g. in P6₁ only every sixth (00l) reflection will be visible.

The addition and subtraction game

The Q values are often additive. Let's consider the orthorhombic case:

Q₁₀₀ = A

Q₀₂₀ = 4B

Q₁₂₀ = A+ 4B

Clearly: Q₁₂₀ = A + 4B = Q₁₀₀ + Q₀₂₀

This means that if (100) is missing (possibly extinct for reasons of symmetry), we can get an estimate for A by taking the difference between Q₁₂₀ and Q₀₂₀.

This holds true for all values of k:

Q₁₃₀ - Q₀₃₀ = A

Q₁₄₀ - Q₀₄₀ = A

Q₁₅₀ - Q₀₅₀ = A

This means that if we prepare a table of all possible differences ΔQ we should see the value of A (and 4A and 9A etc.) pop up again and again. Unfortunately you also get an awful lot of numbers that do not mean anything, so it is easy to get lost in all the numbers.
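
A table of differences is tedious by hand but short in code. The Python sketch below bins all pairwise differences and lists the ones that recur. The function name, the bin width and the orthorhombic example (A = 0.05, B = 0.03, lines (0k0) and (1k0) for k = 2, 3, 4) are assumptions for illustration.

```python
from collections import Counter

def recurring_differences(Q_values, bin_width, min_count=2):
    """Count pairwise differences |Q_j − Q_i|, binned to bin_width, most frequent first."""
    counts = Counter()
    for i, Qi in enumerate(Q_values):
        for Qj in Q_values[i + 1:]:
            counts[round(abs(Qj - Qi) / bin_width)] += 1
    return [(b * bin_width, n) for b, n in counts.most_common() if n >= min_count]

# Hypothetical lines: Q(0k0) = k²·B and Q(1k0) = A + k²·B with A = 0.05, B = 0.03
Q_lines = [0.12, 0.17, 0.27, 0.32, 0.48, 0.53]
common = recurring_differences(Q_lines, bin_width=0.01)
```

In this example the value 0.05 (that is, A) comes out on top, but several other differences also occur twice without corresponding to any single parameter, which is exactly the clutter warned about above.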

Generating hkl lists

During the regression game it is often very useful to have a complete list of possible hkl values at hand, preferably showing the calculated value of Q based on the current best estimate of the cell parameters. Generating such a list by hand is tedious in Excel, but with a modicum of Visual Basic programming it is quite easy.

Going to the VBA IDE

To go to the Visual Basic IDE, press Alt+F11. (This works on most computers unless some other program, such as your webcam software, is activated by that key combination.) In that case:

in Excel prior to 2007: go to Tools, Macro and in the pop-up menu opt for Visual Basic
in Excel 2007: first make sure the Developer tab is on your ribbon. Click the big Office button at the top left, click Excel Options and tick the Show Developer tab in the Ribbon option. Then go back to the Developer tab; there should now be an icon that takes you to the VB IDE.

Once in the IDE you can see your current projects on the left (if not, go to View and ask for the Project Explorer). In the Project Explorer window, click on the workbook you want the code to be put into. Then go to Insert and insert a module in the current workbook. An empty window should open in the middle of the screen. This is where you can write your programs. The following code generates all hkl values up to (888) under the condition that the sum of the indices is even (i.e. I-centering):

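Sub Icentered()
' List all (hkl) up to (888) allowed by I-centering: h+k+l even
Dim h As Long, k As Long, l As Long, a As Long
Sheets.Add               ' new worksheet to hold the output
a = 0                    ' counts the reflections written so far
For h = 0 To 8
 For k = h To 8          ' k starts at h: (120) and (210) coincide for tetragonal
  For l = 0 To 8
  ' keep only even sums, and skip (000), which is not a reflection
  If (h + k + l) Mod 2 = 0 And h + k + l > 0 Then
   a = a + 1
   Cells(a, 1) = h
   Cells(a, 2) = k
   Cells(a, 3) = l
  End If
  Next l
 Next k
Next h
End Sub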

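Sub Fcentered()
' List all (hkl) up to (888) allowed by F-centering: all even or all odd
Dim h As Long, k As Long, l As Long, a As Long
Dim hev As Boolean, kev As Boolean, lev As Boolean
Dim alleven As Boolean, allodd As Boolean
Sheets.Add               ' new worksheet to hold the output
a = 0                    ' counts the reflections written so far
For h = 0 To 8
 For k = h To 8          ' k starts at h, as in Icentered
  For l = 0 To 8
   hev = (h Mod 2 = 0)   ' True if the index is even
   kev = (k Mod 2 = 0)
   lev = (l Mod 2 = 0)
   alleven = hev And kev And lev
   allodd = (Not hev) And (Not kev) And (Not lev)
   ' skip (000), which is not a reflection
   If (alleven Or allodd) And h + k + l > 0 Then
    a = a + 1
    Cells(a, 1) = h
    Cells(a, 2) = k
    Cells(a, 3) = l
   End If
  Next l
 Next k
Next h
End Sub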

Just copy and paste one of these and see what happens when you hit F5 to run the module. There is an icon at the top left, just under 'File', with the Excel 'X' symbol that takes you back to the spreadsheet.

As you can see, the code is pretty simple, and you can adapt it as the need arises. First a new worksheet is generated to put everything in. The variable a is just a counter that counts the reflections being generated. Then there are three nested loops that increment h, k and l from zero to 8. Here k starts from the current value of h because (120) and (210) end up at the same Q value for tetragonal, so only the reflections with k ≥ h are needed.

Then there is an If statement that only allows output if the sum h+k+l is even for body centered. For face centered the indices must be either all even or all odd, which is guaranteed with a bit of boolean algebra. The Cells statement puts the output in the new sheet.

Once you have this list it is not hard to add some columns in which you use the master formulas and the currently best cell parameters to calculate Q for each reflection. Subsequently sort the whole thing by Q and you have a complete list of where to expect reflections based on your current estimates. If you change the estimates you might have to sort again because some lines might swap positions.
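
For comparison, the whole procedure (generate, calculate Q, sort) fits in one short Python function. This is a sketch for the I-centered tetragonal case only; the function name and the parameter values in the example are hypothetical.

```python
def hkl_list(A, C, max_index=8, centering="I"):
    """Tetragonal reflections with k >= h, filtered by centering, sorted by calculated Q."""
    rows = []
    for h in range(max_index + 1):
        for k in range(h, max_index + 1):
            for l in range(max_index + 1):
                if h == k == l == 0:
                    continue                      # (000) is not a reflection
                if centering == "I" and (h + k + l) % 2 != 0:
                    continue                      # I-centering: h+k+l must be even
                Q = (h*h + k*k) * A + l*l * C     # tetragonal master formula
                rows.append((Q, h, k, l))
    rows.sort()                                   # tuples sort on Q first
    return rows

# Hypothetical current estimates A = 0.25, C = 0.1
rows = hkl_list(0.25, 0.1, max_index=2)
```

Because the list is rebuilt and sorted on every call, there is no need to re-sort by hand when the estimates change.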

Logarithmic plotting

In the old days you could buy specially prepared logarithmic charts for indexing visually for the hexagonal and tetragonal cases. Such charts are known as Bunn charts and Hull-Davey charts. They have probably not been used in many years, but they are pretty easy to generate in a spreadsheet. The idea of a Hull-Davey chart is based on plotting the logarithm of Q against all the allowed values of some function of (hkl). For example for cubic:

Q= [h²+k²+l²].A

lnQ= ln[h²+k²+l²]+lnA

Thus if we plot lnQ against ln[h²+k²+l²] we should get a straight line of slope unity and an intercept of lnA. In other words: if we prepare a list of values of ln[h²+k²+l²] and one of lnQ values we should be able to make them coincide if we shift one of them by the appropriate amount (being lnA).

For hexagonal and tetragonal cases there is an additional parameter. One way to deal with that is to introduce an anisotropy parameter ξ=c/a. If this ratio is unity you have a cubic (or pseudo-cubic) cell. You can rewrite the master equation for tetragonal as:

lnQ = lnA + ln[h²+k²+{l²/ξ²}]

So now we can plot the quantity ln[h²+k²+{l²/ξ²}] as a function of the anisotropy parameter ξ for each reflection and try to make our list of lnQ values match by shifting both horizontally (ξ) and vertically (lnA). We could easily superpose our measured lnQ values on the vertical scale of the graph and make them shift horizontally (by changing the x coordinate) and vertically (by adding a constant term to lnQ) until we have a match.

Once you have generated a suitable list of possible reflections it is not hard to generate a Hull-Davey plot. All you need to do is generate a row of ξ values at the top of the sheet and fill a table with the above formula. Make sure you reference the right cells, using dollar signs to prevent a change in reference when copying the formula. The F4 key lets you toggle between A7, $A$7, A$7 and $A7.
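
The table behind such a chart can also be sketched in Python. The function name, the reflection list and the ξ grid below are illustrative assumptions; plotting is left out.

```python
import math

def hull_davey_table(reflections, xi_values):
    """One row of ln[h²+k²+l²/ξ²] per (h, k, l), with one entry per ξ value."""
    return {hkl: [math.log(hkl[0]**2 + hkl[1]**2 + hkl[2]**2 / xi**2)
                  for xi in xi_values]
            for hkl in reflections}

# Hypothetical grid of anisotropy values ξ = c/a and a few low-index reflections
xi_grid = [0.5, 1.0, 2.0]
table = hull_davey_table([(1, 0, 0), (0, 0, 1), (1, 0, 1), (1, 1, 0)], xi_grid)
```

Each row is one curve of the chart. At the correct ξ, the measured lnQ values minus the tabulated values should all give the same constant, lnA.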

Although such graphic methods are now pretty feasible given the abilities of modern spreadsheets, they have been almost completely abandoned in favor of indexing programs like TREOR. However, it should be kept in mind that such programs are not nearly as interactive as a spreadsheet is. Most of them were written in the 1980s and are console based. They also often fail to give results if there are shortcomings in the data, like impurity peaks. Interpreting the output of the program then often becomes a sudoku in its own right.

Pseudo-cells

Sometimes the cubic indexing almost works, at least at low q values, but at higher values things start to fall apart, e.g. there are two lines you could call M²=19. This is a sure sign that the symmetry is actually lower but the shape of the cell is almost cubic. It could be e.g. a tetragonal cell with ξ=c/a very close to one. This means that for lower reflections the distance between, say, (102) and (201) is too small to result in separate peaks, but the higher reflections do get resolved. This is not an easy case, because the lower lines are now almost all broadened doublets and the measured Q value is less precise. The best thing is to first treat the pattern as cubic to get a rough average of a and c, and then take only the higher lines for which you can assign tetragonal indices to refine the tetragonal cell. To really get good cell parameter values the best thing is to do profile fitting with a Rietveld program, but then you must have a structure model.

On the other hand, if the distortion from higher symmetry is a bit bigger, this may help to go down the symmetries more easily. The lines will have split enough that they are distinct rather than overlapping, but they are close enough that you can still identify them as pairs or triplets. In first instance you could assign e.g. cubic indices, but you need to attribute e.g. M²=16 twice or thrice. You then figure out which one is e.g. (400) and which (004) in a tetragonal setting, or (400), (040) and (004) in an orthorhombic one. In such cases the pseudo-cubic indexing actually helps you to narrow down the possibilities for the lower symmetry indices.

Leftovers

Sometimes you get a great fit to the vast majority of strong lines, but there are some weak lines you just cannot identify. They could of course come from an impurity that you have not been able to identify, either because it too is unknown or because some of its lines coincide with those of your main phase, which makes it hard to use, for example, the Hanawalt method. However, there could also be another explanation: if your phase has a superstructure, this can lead to extra lines because the real unit cell is bigger than the subcell. Often you need to double one of the axes to explain the lines, but if on top of that the superstructure is incommensurate, the superstructure lines may not be in a simple relationship with the substructure lines even though they do come from one and the same phase. This means that leftover lines do not prove you have an impurity.