Latin Hypercube Samples (lhs)
1.0
R, C++, and Rcpp code to generate Latin hypercube samples
|
From the original documentation by Owen:
From: owen@ stat .stan ford .edu
These programs construct and manipulate orthogonal arrays. They were prepared by
- Art Owen
- Department of Statistics
- Sequoia Hall
- Stanford CA 94305
They may be freely used and shared. This code comes with no warranty of any kind. Use it at your own risk.
I thank the Semiconductor Research Corporation and the National Science Foundation for supporting this work.
I thank Randall Tobias of SAS Inc. for many helpful electronic discussions that lead to improvements in these programs.
An orthogonal array
A
is a matrix ofn
rows,k
columns with every element being one ofq
symbols0,...,q-1
. The array has strengtht
if, in everyn
byt
submatrix, theq^t
possible distinct rows, all appear the same number of times. This number is the index of the array, commonly denotedlambda
. Clearly,lambda*q^t = n
. Geometrically, if one were to "plot" the submatrix with one plotting axis for each of thet
columns and one point int
dimensional space for each row, the result would be a grid ofq^t
distinct points. There would belambda
"overstrikes" at each point of the grid.The notation for such an array is
OA( n, k, q, t )
.If
n <= q^(t+1)
, then then
rows "should" plot asn
distinct points in everyn
byt+1
dimensional subarray. When this fails to hold, the array has the "coincidence defect".Owen (1992,199?) describes some uses for randomized orthogonal arrays, in numerical integration, computer experiments and visualization of functions. Those references contain further references to the literature, that provide further explanations. A strength 1 randomized orthogonal array is a Latin hypercube sample, essentially so or exactly so, depending on the definition used for Latin hypercube sampling. The arrays constructed here have strength 2 or more, it being much easier to construct arrays of strength 1.
The randomization is achieved by independent uniform permutation of the symbols in each column.
To investigate a function
f
ofd
variables, one has to have an array withk >= d
. One may also have a maximum value ofn
in mind and a minimum value for the numberq
of distinct levels to investigate.It is entirely possible that no array of strength
t > 1
is compatible with these conditions. The programs below provide some choices to pick from, hopefully without too much of a compromise.The constructions used are based on published algorithms that exploit properties of Galois fields. Because of this the number of levels
q
must be a prime power. That isq = p^r
wherep
is prime andr >= 1
is an integer.The Galois field arithmetic for the prime powers is based on tables published by Knuth and Alanen (1964) below. The resulting fields have been tested by the methods described in Appendix 2 of that paper and they passed. This is more a test of the accuracy of my transcription than of the original tables.
The designs given here require a prime power for the number of levels. They presently work for the following prime powers:
All Primes All prime powers
q = p^r
wherep < 50
andq < 10^9
Here are some of the smaller prime powers:
- Powers of 2: 4 8 16 32 64 128 256 512
- Powers of 3: 9 27 81 243 729
- Powers of 5: 25 125 625
- Powers of 7: 49 343
- Square of 11: 121
- Square of 13: 169
Here are some useful primes:
- 2,3,5,7,11,13,17,19,23,29,31,37,101,251,401
The first row are small primes, the second row are primes that are 1 more than a "round number". The small primes lead to small arrays. An array with 101 levels is useful for exploring a function at levels 0.00 0.01 through 1.00. Keep in mind that a strength 2 array on 101 levels requires
101^2 = 10201
experimental runs, so it is only useful where large experiments are possible.Note that some of these will require more memory than your computer has. For example, with a large prime like 10663, the program knows the Galois field, but can't allocate enough memory:
bose 10663
- Unable to allocate 1927'th row in an integer matrix.
- Unable to allocate space for Galois field on 10663 elements.
- Construction failed for
GF(10663)
.- Could not construct Galois field needed for Bose design.
The smallest prime power not covered is
53^2 = 2809
. The smallest strength 2 array with 2809 symbols has2809^2 = 7890481
rows. Therefore the missing prime powers are only needed in certain enormous arrays, not in the small ones of most practical use. In any event there are some large primes and prime powers in the program if an enormous array is needed.To add
GF(p^r)
for some new prime power p^r, consult Alanen and Knuth for instructions on how to search for an appropriate indexing polynomial, and for how to translate that polynomial into a replacement rule forx^r
.
It is faster to generate only the columns you need. For example
bose 101 4
only generates the first 4 columns of the array, whereasbose 101
generates 102 columns. If you only want 4 columns the former saves a lot of time.Passing the
q n k
on the command line is more difficult than letting the computer figure them out, but it allows more error checking.In practical use, I would try first to use a Bose design. Then I would consider either an Addelman- Kempthorne or Bose-Bush design to see whether it could accommodate the desired number of columns with fewer runs. Obviously this advice depends on the sort of problems I expect to handle. When a very large number of runs is possible a Bush design may work well, since it can have high strength.
Here are the references for the constructions used:
- S. Addelman and O. Kempthorne (1961) Annals of Mathematical Statistics, Vol 32 pp 1167-1176.
- J.D. Alanen and D.E. Knuth (1964) Sankhya Ser. A Vol. 26, pp 305-328
- R.C. Bose (1938) Sankhya Vol 3 pp 323-338
- K.A. Bush (1952) Annals of Mathematical Statistics, Vol 23 pp 426-434
- R.C. Bose and K.A. Bush (1952) Annals of Mathematical Statistics, Vol 23 pp 508-524.
This book provides a large list of orthogonal array constructions:
- Aloke Dey (1985) "Orthogonal Fractional Factorial Designs" Halstead Press
These papers discuss randomized orthogonal arrays, the second is being revised in parallel with development of the software described here:
- A.B. Owen (1992) Statistica Sinica, v2 n2 pp 439-452
- A.B. Owen (199?) Annals of Statistics, to appear "Lattice Sampling Revisited: Monte Carlo Variance of Means Over Randomized Orthogonal Arrays"
- H.D. Patterson (1954) J.R.S.S. B 16, 140-149
These papers discuss Latin hypercube sampling:
- M.D. McKay, W.J. Conover and R.J. Beckman (1979) Technometrics 21, 239-245
- A.B. Owen (1992) J.R.S.S. B 541-551
- H.D. Patterson (1954) J.R.S.S. B 16, 140-149
- M. Stein (1987) Technometrics 29, 143-151
Galois fields are implemented through arrays that store their addition and multiplication tables. Some space could have been saved by using powers of primitive marks in place of the multiplication table. But since the multiplication tables itself is only as large as the smallest possible column in a strength 2 array it was not considered to be a burden. Subtraction and division are implemented through vectors of additive and multiplicative inverses, derived from the tables. The tables for
GF(p^r)
are constructed using a representation of the field elements as polynomials inx
with coefficients as integers modulop
and a special rule (derived from minimal polynomials) for handling products involvingx^r
. These rules are taken from published references. The rules have not all been checked for accuracy, because some of the fields are very large (e.g. 16807 elements).The functions that manipulate orthogonal arrays keep the arrays in integer matrices. This might be a problem for applications that require enormous arrays. The reason for keeping them in memory is that it makes it easier for others to lift out the functions and embed them in applications or to put on a GUI front end. It was also thought that any array that is too large to store in a computer, is likely to be too large to use in integration/experimentation on that same computer. The arrays are generated row by row, so it is not too hard to change the program to place the elements on an output stream as they are computed and do away with the storage.
The functions that test the strength of the arrays may be very far from optimally fast.
When compiling oalib
these preprocessor directives are used: