## Predictive Modelling

Although a GIS is an excellent way of visualising data and producing maps from it, a GIS also allows you to create new data using statistically based gridding techniques, or predictive maps using spatial data modelling techniques. This modelling is where businesses can add real value to their data, rather than just using it passively to generate maps and figures. Kenex considers that the real value in your data, once it has been converted to digital form, is created using these modelling techniques.

Basic statistical gridding allows you to predict unknown values within a single layer such as topography, geochemistry, hydrology or climate data. However, the real power of GIS comes when spatial modelling is applied to combine several layers and predict outcomes based on probability, such as:

- Mineral prospectivity
- Renewable energy project planning
- Agricultural sustainability
- Geotechnical risk
- Environmental risk
- Conservation planning
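The single-layer gridding mentioned above can be sketched with inverse distance weighting, one common statistical interpolation technique. This is a minimal Python illustration with invented elevation samples, not the specific algorithm of any particular GIS package:

```python
import math

def idw(known, qx, qy, power=2.0):
    """Inverse-distance-weighted estimate at (qx, qy) from known (x, y, value) samples."""
    num, den = 0.0, 0.0
    for x, y, v in known:
        d = math.hypot(qx - x, qy - y)
        if d == 0.0:
            return v  # query point coincides with a sample
        w = 1.0 / d ** power  # nearer samples get larger weights
        num += w * v
        den += w
    return num / den

# Hypothetical elevation samples: (x, y, metres)
samples = [(0, 0, 100.0), (10, 0, 120.0), (0, 10, 140.0), (10, 10, 160.0)]
print(idw(samples, 5, 5))  # equidistant from all four samples, so their mean
```

Repeating the estimate over every cell of a grid turns scattered point samples into a continuous surface.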

Spatial modelling uses multiple layers, or themes, related to the object or occurrence being searched for to statistically predict the areas where it is most likely to be found. For example, conservation workers may be trying to find and protect a rare animal. They can use existing digital data about the conditions of the region they are searching (e.g. vegetation types), along with their knowledge of the animal's preferred habitat, to model areas where the animal is likely to be found. This animal might always live in an alpine climate, mostly be found in rock bluffs or boulder piles, and commonly be found on north-facing slopes. Spatial modelling can then use themes of climate, land cover and topographic slope to identify places where the rare animal might live.

The key to the success of spatial data modelling is the way it takes into account how strongly a particular theme is related to the occurrences being modelled (i.e. the probability of an occurrence happening in the area of that theme) and then combines all the themes, weighted according to their importance, to make a prediction. When the three themes (climate, land cover and slope) are combined, a probability map is produced that shows the probability of an occurrence in any given area. The probability map then allows you to rank the likely occurrence of the animal, so the conservation worker can focus their conservation efforts on the areas where the animal is most likely to be found.
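The idea of ranking areas by weighted themes can be sketched in Python. The cells, themes and weights below are entirely hypothetical, and the simple additive score stands in for the probabilistic combination a real model would use:

```python
# Each cell records which habitat themes are present (hypothetical data).
cells = {
    "A": {"alpine", "rock_bluff", "north_facing"},
    "B": {"alpine", "north_facing"},
    "C": {"rock_bluff"},
}

# Expert-assigned weights: how strongly each theme predicts the animal.
weights = {"alpine": 3.0, "rock_bluff": 2.0, "north_facing": 1.0}

# Score each cell by summing the weights of the themes present in it.
score = {cell: sum(weights[t] for t in themes) for cell, themes in cells.items()}
ranked = sorted(score, key=score.get, reverse=True)
print(ranked)  # cells ordered from most to least likely habitat
```

Conservation effort would then be directed to the highest-ranked cells first.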

As you can imagine, spatial modelling is a powerful tool that can be applied widely. Since the initial use of Weights of Evidence modelling in medical diagnosis and research, spatial data modelling has been successfully applied to mineral prospectivity, forestry, conservation, petroleum exploration, and landslide occurrence, and could even be used to locate ideal housing and community locations for new families!

Kenex specialises in all aspects of spatial modelling and, importantly, is experienced in its application in a business environment. Kenex has also been involved in research into the application of spatial data modelling in digital mapping software and regularly runs workshops at international conferences.

The simplest type of predictive spatial analysis combines maps, with the chosen input variables represented by series of integer values, using arithmetic operators. This type of analysis takes no account of the relative importance of the variables being used and is based on expert opinion. Fuzzy Logic techniques address the relative importance of the data being used, but still rely on expert opinion to derive the weights that rank each variable's importance in the map combination. Weights of Evidence, in contrast, uses statistical analysis of the map layers together with a training dataset to make less subjective decisions about how the map layers in a model are combined. Neural network techniques, developed to mimic the thought processes of the human brain, are entirely data-driven but can be difficult to interpret. More details of these techniques and their application are given on our links page.
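The simplest overlay, combining integer maps cell by cell with an arithmetic operator, can be illustrated as follows; the two binary evidence grids are invented for the example:

```python
# Two binary evidence maps on the same 3x3 grid (1 = favourable, 0 = not).
slope_ok = [[1, 1, 0],
            [1, 0, 0],
            [1, 1, 1]]
cover_ok = [[1, 0, 0],
            [1, 1, 0],
            [0, 1, 1]]

# Arithmetic overlay: add the maps cell by cell.
combined = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(slope_ok, cover_ok)]
print(combined)  # cells scoring 2 satisfy both criteria
```

Every input map counts equally here, which is exactly the weakness the weighted techniques below address.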

The three main spatial modelling techniques are outlined below:

## Weights of Evidence modelling

Weights of Evidence is a Bayesian statistical approach that allows for the analysis and combination of data to predict the occurrence of events. It is based on the relationship between the presence or absence of a characteristic or pattern and the occurrence of an event. The technique was initially developed as a diagnostic tool in medicine. In spatial analysis, it has been used extensively in the exploration and mining fields.

An estimate of the prior probability of an occurrence can be calculated by dividing the total number of training occurrences distributed over the region being targeted by the area of that region. Two weights can then be computed for each class in each theme of the model: a W+ value from the presence of a feature (or training point) within the class area, and a W- value from its absence from the class area. The contrast value C, calculated as the difference between the two, can be used as a measure of the strength of correlation between the theme being tested and the occurrence of the feature being modelled, e.g. the correlation between the theme of Alpine Tussock and a rare Powelliphanta New Zealand land snail. A unit area is chosen to represent the potential areal extent of the occurrences being modelled and is used as a grid for the spatial calculations. A probability, or statistical value of importance, can then be calculated for every variable to be input into the model, based on the prior probability and the presence or absence of the variable in question. The odds of occurrence (logits) are then used to combine the statistically valid variables that make up the model and produce a probability map.
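The calculation can be sketched as follows. The unit-cell counts and snail numbers are invented for illustration, and the final step combines a single theme only; a full model would add the W+ or W- of every theme to the prior log-odds before converting back to a probability:

```python
import math

def wofe(n_cells, n_occ, n_class, n_occ_in_class):
    """Weights of Evidence for one binary evidence class, from unit-cell counts."""
    # Probability of being inside the class, given an occurrence / no occurrence.
    p_b_d  = n_occ_in_class / n_occ
    p_b_nd = (n_class - n_occ_in_class) / (n_cells - n_occ)
    w_plus  = math.log(p_b_d / p_b_nd)                    # weight where class present
    w_minus = math.log((1 - p_b_d) / (1 - p_b_nd))        # weight where class absent
    return w_plus, w_minus, w_plus - w_minus              # contrast C = W+ - W-

# Hypothetical example: 10,000 unit cells, 50 snail occurrences,
# 2,000 cells of alpine tussock containing 40 of those occurrences.
w_plus, w_minus, contrast = wofe(10_000, 50, 2_000, 40)
print(round(w_plus, 2), round(w_minus, 2), round(contrast, 2))

# Combining: prior log-odds plus the weight of the theme at this cell.
prior = 50 / 10_000
posterior_logit = math.log(prior / (1 - prior)) + w_plus  # cell inside the class
posterior = 1 / (1 + math.exp(-posterior_logit))
print(round(posterior, 4))  # posterior probability for an alpine-tussock cell
```

A large positive contrast, as here, indicates the theme is a strong predictor; repeating the posterior calculation for every cell yields the probability map.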

## Fuzzy Logic modelling

Fuzzy Logic deals with the concept of 'partial truth', i.e. truth values between completely true and completely false. It was introduced by Dr Lotfi Zadeh of UC Berkeley in the 1960s as a means to model the uncertainty of natural language. Zadeh argued that rather than regarding fuzzy theory as a single theory, we should regard fuzzification as a methodology for generalising any specific theory. Thus researchers have also introduced 'fuzzy calculus', 'fuzzy differential equations', and so on.

Fuzzy Logic is a popular and easily understood method for combining mineral exploration datasets using subjective judgement. Each exploration dataset to be used is weighted using a fuzzy membership function, which expresses the degree of importance of the various map layers as predictors of the deposit type under consideration. Themes may be combined by a variety of fuzzy combination operators (fuzzy AND, fuzzy OR, fuzzy gamma, etc.) according to a scheme that may be represented with an inference network. The output from the fuzzy logic module is a map showing mineral favourability, combining the effects of the input evidential themes. No prior knowledge of mineral occurrence locations is required, so this 'knowledge-driven' method complements the data-driven Weights of Evidence method, which requires that a set of training points (mineral occurrences) be known within the study area.
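The common fuzzy combination operators can be sketched in a few lines of Python; the membership values below are hypothetical:

```python
def fuzzy_and(*m):
    return min(m)   # pessimistic: limited by the weakest evidence

def fuzzy_or(*m):
    return max(m)   # optimistic: driven by the strongest evidence

def fuzzy_gamma(memberships, gamma=0.7):
    """Gamma operator: a compromise between the algebraic product and sum."""
    product = 1.0
    for m in memberships:
        product *= m
    algebraic_sum = 1.0
    for m in memberships:
        algebraic_sum *= (1.0 - m)
    algebraic_sum = 1.0 - algebraic_sum
    return (algebraic_sum ** gamma) * (product ** (1.0 - gamma))

# Hypothetical fuzzy memberships for one cell: geochemistry, structure, host rock.
cell = [0.8, 0.6, 0.9]
print(fuzzy_and(*cell))             # 0.6
print(fuzzy_or(*cell))              # 0.9
print(round(fuzzy_gamma(cell), 3))  # between the AND and OR results
```

The gamma parameter lets the modeller tune how conservative or generous the combination is, which is exactly where expert opinion enters the method.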

Example of a Fuzzy Logic Decision Tree

## Neural Network modelling

Neural Network analysis is a popular method for multivariate prediction; two techniques are currently in common use.

**Self-organizing Neural Networks** automatically group the input feature vectors
into classes. This classification extracts knowledge from the data, in that
known properties of one member of a class usually apply to its other members.
The process is iterative: a coarse, arbitrary grouping is made initially, and
successive iterations refine it until the clusters no longer change. Each
output feature vector has an associated fuzzy value of membership in one or
more clusters.

**Fuzzy Neural Networks** use extra weights and relationships between variables
to better model the output as a function of the inputs. The user associates the
output values with the appropriate input feature vectors (fuzzy-membership
values) and presents all of the associated data to the network. The network
then learns these input-output associations and interpolates any given input
feature vector in terms of the learned ones to provide an output
fuzzy-membership value or vector (the combined data membership value or
values). The main advantage of neural networks over the fuzzy logic, weights of
evidence and logistic regression methods is that nonlinear relationships can be
more readily modelled.
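As a toy illustration of that nonlinearity advantage, the hand-weighted two-layer network below computes XOR, a relationship that no single linear combination of the inputs (and hence no logistic-regression-style model) can reproduce. It is purely illustrative, not one of the two techniques described above:

```python
def step(x):
    """Threshold activation: fires (1.0) when the weighted sum is positive."""
    return 1.0 if x > 0.0 else 0.0

def forward(x1, x2):
    """Two-layer network with hand-set weights computing XOR."""
    h_or  = step(1.0 * x1 + 1.0 * x2 - 0.5)      # hidden unit acting as OR
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)      # hidden unit acting as AND
    return step(1.0 * h_or - 1.0 * h_and - 0.5)  # OR-but-not-AND, i.e. XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, forward(a, b))  # XOR truth table: 0, 1, 1, 0
```

In a real model the weights would be learned from training data rather than set by hand, but the hidden layer is what lets the network capture interactions like this one.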