"Genetic Algorithm tutorial 001.ps.gz" - читать интересную книгу автора

A Genetic Algorithm Tutorial

Darrell Whitley Computer Science Department, Colorado State University

Fort Collins, CO 80523 [email protected]

Abstract This tutorial covers the canonical genetic algorithm as well as more experimental forms of genetic algorithms, including parallel island models and parallel cellular genetic algorithms. The tutorial also illustrates genetic search by hyperplane sampling. The theoretical foundations of genetic algorithms are reviewed, include the schema theorem as well as recently developed exact models of the canonical genetic algorithm.

Keywords: Genetic Algorithms, Search, Parallel Algorithms

1 Introduction Genetic Algorithms are a family of computational models inspired by evolution. These algorithms encode a potential solution to a specific problem on a simple chromosome-like data structure and apply recombination operators to these structures so as to preserve critical information. Genetic algorithms are often viewed as function optimizers, although the range of problems to which genetic algorithms have been applied is quite broad.

An implementation of a genetic algorithm begins with a population of (typically random) chromosomes. One then evaluates these structures and allocates reproductive opportunities in such a way that those chromosomes which represent a better solution to the target problem are given more chances to "reproduce" than those chromosomes which are poorer solutions. The "goodness" of a solution is typically defined with respect to the current population.

This particular description of a genetic algorithm is intentionally abstract because in some sense, the term genetic algorithm has two meanings. In a strict interpretation, the genetic algorithm refers to a model introduced and investigated by John Holland (1975) and by students of Holland (e.g., DeJong, 1975). It is still the case that most of the existing theory for genetic algorithms applies either solely or primarily to the model introduced by Holland, as well as variations on what will be referred to in this paper as the canonical genetic algorithm. Recent theoretical advances in modeling genetic algorithms also apply primarily to the canonical genetic algorithm (Vose, 1993).

In a broader usage of the term, a genetic algorithm is any population-based model that uses selection and recombination operators to generate new sample points in a search space. Many genetic algorithm models have been introduced by researchers largely working from

1

an experimental perspective. Many of these researchers are application oriented and are typically interested in genetic algorithms as optimization tools.

The goal of this tutorial is to present genetic algorithms in such a way that students new to this field can grasp the basic concepts behind genetic algorithms as they work through the tutorial. It should allow the more sophisticated reader to absorb this material with relative ease. The tutorial also covers topics, such as inversion, which have sometimes been misunderstood and misused by researchers new to the field.

The tutorial begins with a very low level discussion of optimization to both introduce basic ideas in optimization as well as basic concepts that relate to genetic algorithms. In section 2 a canonical genetic algorithm is reviewed. In section 3 the principle of hyperplane sampling is explored and some basic crossover operators are introduced. In section 4 various versions of the schema theorem are developed in a step by step fashion and other crossover operators are discussed. In section 5 binary alphabets and their effects on hyperplane sampling are considered. In section 6 a brief criticism of the schema theorem is considered and in section 7 an exact model of the genetic algorithm is developed. The last three sections of the tutorial cover alternative forms of genetic algorithms and evolutionary computational models, including specialized parallel implementations.

1.1 Encodings and Optimization Problems Usually there are only two main components of most genetic algorithms that are problem dependent: the problem encoding and the evaluation function.

Consider a parameter optimization problem where we must optimize a set of variables either to maximize some target such as profit, or to minimize cost or some measure of error. We might view such a problem as a black box with a series of control dials representing different parameters; the only output of the black box is a value returned by an evaluation function indicating how well a particular combination of parameter settings solves the optimization problem. The goal is to set the various parameters so as to optimize some output. In more traditional terms, we wish to minimize (or maximize) some function F (X1; X2; :::; XM ).

Most users of genetic algorithms typically are concerned with problems that are nonlinear. This also often implies that it is not possible to treat each parameter as an independent variable which can be solved in isolation from the other variables. There are interactions such that the combined effects of the parameters must be considered in order to maximize or minimize the output of the black box. In the genetic algorithm community, the interaction between variables is sometimes referred to as epistasis.

The first assumption that is typically made is that the variables representing parameters can be represented by bit strings. This means that the variables are discretized in an a priori fashion, and that the range of the discretization corresponds to some power of 2. For example, with 10 bits per parameter, we obtain a range with 1024 discrete values. If the parameters are actually continuous then this discretization is not a particular problem. This assumes, of course, that the discretization provides enough resolution to make it possible to adjust the output with the desired level of precision. It also assumes that the discretization is in some sense representative of the underlying function.

2

If some parameter can only take on an exact finite set of values then the coding issue becomes more difficult. For example, what if there are exactly 1200 discrete values which can be assigned to some variable Xi. We need at least 11 bits to cover this range, but this codes for a total of 2048 discrete values. The 848 unnecessary bit patterns may result in no evaluation, a default worst possible evaluation, or some parameter settings may be represented twice so that all binary strings result in a legal set of parameter values. Solving such coding problems is usually considered to be part of the design of the evaluation function.