Lecture 7: Understanding raster data and cell-based modeling

 

(Modified from Chapters 4, 5, & 7 in “Using ArcGIS Spatial Analyst”)

 

1.   Understanding a raster dataset

2.   Coordinate space and the raster dataset

3.   Discrete and continuous data

4.   The resolution of a raster dataset

5.   Raster encoding

6.   Representing features in a raster dataset

7.   Assigning attributes to a raster dataset

8.   Using feature data directly in Spatial Analyst

9.   Deriving raster datasets from existing maps

 

Lecture Objectives:

·       About the structure of raster datasets

·       The importance of coordinate space and raster datasets

·       The difference between discrete and continuous types of raster datasets

·       About the resolution or cell size when creating a raster dataset

·       How raster datasets are encoded and how points, lines, and polygons are represented as cells

·       Other issues you need to be aware of, such as when adding other attributes to raster datasets and creating raster datasets from existing maps

 

1. Understanding a raster dataset

·       Raster data is generally divided into two categories: thematic data and image data.

·       The values in thematic raster data represent some measured quantity or classification of a particular phenomenon such as elevation, pollution concentration, or population.

·       For example, in a landcover map the value 5 may represent forest, and the value 7 may represent water.

·       The values of cells in an image represent reflected or emitted light or energy such as that of a satellite image or a scanned photograph.

·       The analysis tools of Spatial Analyst are primarily intended for use on thematic raster data.

·       ERDAS Imagine will be used in the next section of the course to analyze satellite imagery.

·       All Spatial Analyst functions process the first band of any raster dataset. This section will provide an overview of raster data and how it is created.

 

 

The composition of a raster dataset

·       Raster datasets describe the location and characteristics of an area.

·       Raster datasets describe a single theme (landuse, soils, roads, elevation, etc.) and provide complete coverage of an area through cells.

 

 

The cell

·       Cells, or pixels, make up a raster dataset

·       All cells in a raster must be the same size

·       Cells can be any size you desire

 

 

Rows and columns

·       Cells are arranged in rows and columns, where the rows run parallel to the x-axis of a Cartesian plane and the columns run parallel to the y-axis

·       Each cell has a unique row and column address

 

Values

·       Each cell has a specific value assigned to it

·       Values can represent magnitude, distance, or relationship of the cell on a continuous surface (e.g., elevation, slope, aspect)

·       Values can also represent categorical data such as soil type, soil texture, or landuse class

·       Both integer and floating-point values are supported in Spatial Analyst.

o      Integer values are best used to represent categorical data, and floating-point values to represent continuous surfaces (elevation, slope, flow accumulation).

o      Floating-point data takes up a lot more disk space and is more difficult to handle

o      An 8-bit integer can store 256 distinct values, so it often makes sense to rescale floating-point values into integer values.
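The rescaling mentioned above can be sketched as follows. This is a minimal illustration, not a Spatial Analyst function: the name rescale_to_uint8 and the sample elevations are hypothetical, and a simple linear stretch between the minimum and maximum is assumed.

```python
def rescale_to_uint8(values):
    """Linearly rescale floating-point values to integers in 0..255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant surface carries no range to stretch.
        return [0 for _ in values]
    scale = 255.0 / (hi - lo)
    return [int(round((v - lo) * scale)) for v in values]


# Hypothetical elevation values in meters.
elevations = [102.4, 310.75, 518.1, 933.9]
print(rescale_to_uint8(elevations))
```

The smallest input maps to 0 and the largest to 255. Values that differ by less than one step of the stretch collapse to the same integer, which is the precision given up in exchange for smaller storage.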

 

Zones

·       Any two or more cells with the same value belong to the same zone.

·       A zone can consist of cells that are connected, disconnected, or both.

·       Zones, therefore, are NOT analogous to polygons, which are only contiguous closed areas.

·       Zones whose cells are connected usually represent single features of an area, such as a building, a lake, a road, or a power line.

·       Assemblages of entities, such as forest stands in a state, soil types in a county, or the single-family houses in a town, are features of an area that will most likely be represented by zones made up of many disconnected groups of connected cells.

·       Every cell in a raster belongs to a zone.

·       Some raster datasets contain only a few zones, while others contain many.

 

 

Regions

·       Each group of connected cells in a zone is considered a region.

·       A zone that consists of a single group of connected cells has only one region.

·       Zones can be composed of as many regions as are necessary to represent a feature; the number of cells that make up a region has no practical limits.

·       Spatial Analyst provides the tools needed to turn regions into individual zones.

·       In the raster dataset above, Zone 2 consists of two regions, Zone 4 of three regions, and Zone 5 of only one region.
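The distinction between zones and regions can be sketched in a few lines. The grid below is a hypothetical example (not the lecture figure) built so that Zone 2 has two regions, Zone 4 has three, and Zone 5 has one. The function name count_regions is illustrative, and cells are assumed to be connected only through shared edges (four neighbors); a diagonal rule would give different counts.

```python
def count_regions(grid, zone_value):
    """Count groups of edge-connected cells that share zone_value."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != zone_value or (r, c) in seen:
                continue
            # An unvisited cell of the zone starts a new region.
            regions += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and (nr, nc) not in seen and grid[nr][nc] == zone_value:
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return regions


grid = [
    [2, 2, 4, 4],
    [2, 5, 5, 4],
    [4, 5, 2, 2],
    [4, 4, 2, 4],
]
for zone in (2, 4, 5):
    print(zone, count_regions(grid, zone))
```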

 

NoData

·       If a cell is assigned the NoData value, then either no information or insufficient information about the particular characteristics of the location the cell represents is available.

·       The NoData value, sometimes also referred to as the null value, is treated differently from any other value by all operators and functions.

 

 

·       Cells with NoData values are processed in one of two ways:

1.     Assigning NoData to the output cell location if the NoData value exists:

o      for the location on any of the inputs in an operator or local function,

o      in its neighborhood in a focal function,

o      or in its zone in a zonal function.

2.     Ignoring the NoData cell and completing the calculations with all valid values.

 

·       The second option, to ignore the NoData cell, is not possible when using operators between two datasets or with local functions.

·       When a NoData cell is within the neighborhood of a cell in a focal function or a zone of a zonal function, by default, the sum, median, variety, majority, or minority of all cells with known values can be calculated and assigned to the output raster dataset (this default can be overridden). 
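The two ways of handling NoData can be sketched as follows. This is an illustration only: None stands in for the NoData value, and local_add and focal_sum are hypothetical helper names, not Spatial Analyst calls. The focal neighborhood is assumed to be a 3 x 3 square clipped at the raster edge.

```python
NODATA = None  # hypothetical stand-in for the NoData value


def local_add(a, b):
    """Cell-by-cell sum; NoData on either input gives NoData on output."""
    return [[NODATA if x is NODATA or y is NODATA else x + y
             for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def focal_sum(grid, ignore_nodata=True):
    """3 x 3 focal sum, with the neighborhood clipped at the raster edge."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            valid = [v for v in window if v is not NODATA]
            if not valid or (not ignore_nodata and len(valid) < len(window)):
                out_row.append(NODATA)   # option 1: NoData propagates
            else:
                out_row.append(sum(valid))  # option 2: use known values only
        out.append(out_row)
    return out


a = [[1, 2], [NODATA, 4]]
b = [[10, 20], [30, 40]]
print(local_add(a, b))                      # NoData appears where a has NoData
print(focal_sum(a))                         # NoData cell is ignored
print(focal_sum(a, ignore_nodata=False))    # NoData spreads to every neighborhood
```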

 

The associated table

·       Integer (categorical) raster datasets usually have an attribute table associated with them (VAT = value attribute table).

·       The first item in the table is Value, which stores the value assigned to each zone of a raster.

·       A second item, Count, stores the total number of cells in the dataset that belong to each zone. Both Value and Count are mandatory items.

·       An essentially limitless number of optional items can be incorporated into the table to represent the other attributes of the zone.
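A value attribute table can be approximated in a few lines. The sketch below is an assumption-laden illustration: build_vat is a hypothetical name, the Landcover item is an optional attribute added for the example, and the codes 5 (forest) and 7 (water) follow the landcover example given earlier in these notes.

```python
from collections import Counter


def build_vat(grid, nodata=None):
    """Return one record per zone with the mandatory Value and Count items."""
    counts = Counter(v for row in grid for v in row if v != nodata)
    return [{"Value": value, "Count": counts[value]} for value in sorted(counts)]


landcover = [
    [5, 5, 7],
    [5, 7, 7],
    [5, 5, 5],
]
vat = build_vat(landcover)

# Optional items can be attached to each zone record.
names = {5: "forest", 7: "water"}
for record in vat:
    record["Landcover"] = names[record["Value"]]
    print(record)
```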

 

 

2. Coordinate space and the raster dataset

·       Converting a raster dataset from a nonreal-world coordinate system (image space) to a real-world coordinate system is called georeferencing.

·       For a raster dataset, the orientation of the cells is determined by the x- and y-axes of the coordinate system.

·       Cell boundaries are parallel to the x- and y-axis, and the cells are square in map coordinates.

·       Cells are always referenced by an (x,y) location in map coordinate space and never by specifying a row and column location.

·       The x,y Cartesian coordinate system associated with a raster dataset that is in a real-world coordinate space is defined with respect to a map projection.

·       Map projections transform the three-dimensional surface of the earth to allow the raster to be displayed and stored as a two-dimensional map.

·       The process of rectifying a raster dataset to map coordinates or converting a raster dataset from one projection to another is referred to as geometric transformation.

 

Georeferencing a raster dataset

·       To georeference a raster dataset from image space to a real-world coordinate system, you need to know the location of recognizable features in both coordinate spaces.

 

Polynomial transformation

·       A polynomial transformation computed using the specified control points is applied so the input locations approximate the specified output locations using a least-squares fit.
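As a rough sketch of the idea, the code below fits a first-order polynomial, x' = a0 + a1*col + a2*row and likewise for y', to control points by least squares. The function names, the control points, and the choice of a first-order polynomial are all assumptions made for illustration; higher-order fits and the actual Spatial Analyst implementation are not shown.

```python
def solve3(m, v):
    """Solve a 3 x 3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(a[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = (a[r][3] - s) / a[r][r]
    return x


def fit_first_order(image_pts, map_pts):
    """Least-squares coefficients mapping (col, row) to map (x, y)."""
    basis = [(1.0, c, r) for c, r in image_pts]
    # Normal equations: (B^T B) coeffs = B^T target
    btb = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in (0, 1):
        bty = [sum(b[i] * p[axis] for b, p in zip(basis, map_pts)) for i in range(3)]
        coeffs.append(solve3(btb, bty))
    return coeffs  # [x coefficients, y coefficients]


def apply_transform(coeffs, col, row):
    (a0, a1, a2), (b0, b1, b2) = coeffs
    return a0 + a1 * col + a2 * row, b0 + b1 * col + b2 * row


# Hypothetical control points: image (col, row) and matching map (x, y).
image_pts = [(0, 0), (100, 0), (0, 100), (100, 100)]
map_pts = [(1000.0, 5000.0), (4000.0, 5000.0), (1000.0, 2000.0), (4000.0, 2000.0)]
coeffs = fit_first_order(image_pts, map_pts)
print(apply_transform(coeffs, 50, 50))
```

With more control points than unknowns, the fit no longer passes exactly through every point; the residual at each control point is then a useful check on the georeferencing.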

 

Projecting raster datasets

·       The cells of a raster dataset will always be square and of equal area with respect to the Cartesian coordinate system (map coordinate space) associated with the raster dataset.

·       The shape and area a cell represents on the surface of the earth will never be constant across a raster dataset.

·       Since the area represented (on the face of the earth) by the cells will vary across the raster dataset, the output cell size and the number of rows and columns may change when projected. Converting from one projection to another can also change the shape and area a cell represents on the surface of the earth.

·       Each projection treats the relationship between a three-dimensional world and a two-dimensional one differently.

·       You should be aware of the properties and assumptions for each projection before selecting one. When displaying and performing analysis with raster datasets, they should be in the same coordinate space and in the same projection.

·       If two raster datasets are in different coordinate systems, the values of the coordinates are on different scales.

·       Errors will occur when comparing such datasets because they will represent different locations.

 

Geometric transformation

·       When you rectify a raster dataset, project it, convert the raster dataset from one projection to another, or change the cell size, you are performing a geometric transformation.

·       Geometric transformation is the process of changing the geometry of a raster dataset from one coordinate space to another.

·       Because the cell centers of the output cells do not match the cell centers of the input raster, a resampling technique must be used to derive a value for each cell center on the output raster.

 

Resampling is the process of determining new values for cells in an output raster that result from applying a geometric transformation to an input raster dataset.

 

The three techniques for determining output values are nearest neighbor assignment, bilinear interpolation, and cubic convolution.

 

Nearest neighbor assignment

·       Nearest neighbor assignment determines the location of the closest cell center on the input raster and assigns the value of that cell to the cell on the output raster.

·       Always used for categorical data (nominal or ordinal) – why?
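A minimal sketch of nearest neighbor assignment when only the cell size changes is shown below. The function name and the landcover grid are hypothetical, and both rasters are assumed to share the same upper-left corner. Because each output cell simply copies an existing input value, no new category codes can be created, which is one way to answer the question above.

```python
def resample_nearest(grid, in_size, out_size):
    """Resample a raster to a new cell size by nearest neighbor assignment."""
    rows, cols = len(grid), len(grid[0])
    out_rows = max(1, round(rows * in_size / out_size))
    out_cols = max(1, round(cols * in_size / out_size))
    out = []
    for r in range(out_rows):
        # Output cell center, measured down from the top edge.
        y = (r + 0.5) * out_size
        src_r = min(rows - 1, int(y // in_size))
        out_row = []
        for c in range(out_cols):
            x = (c + 0.5) * out_size
            src_c = min(cols - 1, int(x // in_size))
            # The input cell containing the center is the nearest one.
            out_row.append(grid[src_r][src_c])
        out.append(out_row)
    return out


landcover = [
    [5, 5, 7, 7],
    [5, 5, 7, 7],
    [5, 7, 7, 7],
    [7, 7, 7, 7],
]
coarse = resample_nearest(landcover, in_size=10, out_size=20)
print(coarse)
```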

 

Bilinear interpolation

·       Bilinear interpolation uses the value of the four nearest input cell centers to determine the value on the output raster.

·       The new value for the output cell is a weighted average of these four values, adjusted to account for their distance from the center of the output cell in the input raster.

·       This interpolation method results in a smoother-looking surface than can be obtained using nearest neighbor.

·       Preferred for continuous data (elevation, slope, salinity, etc.)
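The weighted average can be sketched as below. The function name bilinear and the elevation values are illustrative assumptions; positions are given in fractional (row, col) units measured between input cell centers, and the grid is assumed to be at least 2 x 2.

```python
import math


def bilinear(grid, row, col):
    """Interpolate a value at a fractional (row, col) cell-center position."""
    rows, cols = len(grid), len(grid[0])
    # Upper-left of the four surrounding cell centers, kept inside the grid.
    r0 = min(max(int(math.floor(row)), 0), rows - 2)
    c0 = min(max(int(math.floor(col)), 0), cols - 2)
    fr, fc = row - r0, col - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c0 + 1] * fc
    bottom = grid[r0 + 1][c0] * (1 - fc) + grid[r0 + 1][c0 + 1] * fc
    return top * (1 - fr) + bottom * fr


elevation = [
    [100.0, 110.0],
    [120.0, 130.0],
]
print(bilinear(elevation, 0.5, 0.5))   # midway between all four centers
```

The interpolated value (115.0 at the midpoint) does not occur anywhere in the input, which is acceptable for elevation but would be meaningless for category codes.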

 

Cubic convolution

·       The weighted average is calculated from the 16 nearest input cell centers and their values.

·       Cubic convolution will have a tendency to sharpen the data more than bilinear interpolation since more cells are involved in the calculation of the output value.

 

3. Discrete and continuous data

·       It is important to understand the type of data you are modeling, whether it is continuous or discrete, when making decisions based on the resulting values.

·       Discrete data, which is sometimes called categorical or discontinuous data, mainly represents objects in both the feature and raster data storage systems.

 

 

·       A discrete object has known and definable boundaries (e.g., a lake, a building)

·       A continuous surface represents phenomena where each location on the surface is a measure of the concentration level or its relationship from a fixed point in space or from an emitting source.

·       Continuous data is also referred to as field, nondiscrete, or surface data.

·       One type of continuous surface is derived from those characteristics that define a surface, where each location is measured from a fixed registration point.

o      elevation

o      aspect

 

·       Another type of continuous surface includes phenomena that progressively vary as they move across a surface from a source.

o      fluid and air movement.

o      characterized by the type or manner in which the phenomenon moves.

o      diffusion or any other locomotion where the phenomenon moves from areas with high concentration to those with less concentration, until the concentration level evens out.

o      salt concentration moving through either the ground or water, contamination level moving away from a hazardous spill or a nuclear reactor, and heat from a forest fire. In this type of continuous surface, there has to be a source.

o      concentration is always greater near the source, and diminishes as a function of distance and the medium the substance is moving through.

 

·       The means of locomotion affect the surface concentration.

·       Other locomotion surfaces include dispersal of animal populations and the spreading of a disease.

·       When representing and modeling many features, the boundaries are not clearly continuous or discrete.

·       A continuum is created in representing geographic features, with the extremes being pure discrete and pure continuous features.

·       Most features fall somewhere between the extremes. Illustrations of features that fall along the continuum are soil types, edges of forests, boundaries of wetlands, and geographic markets influenced by a television advertising campaign.

·       The determining factor for where a feature falls on the continuous-to-discrete spectrum is the ease in defining the feature’s boundaries.

·       No matter where on the continuum the feature falls, the grid-cell storage can represent it to a greater or lesser accuracy.

·       The validity and accuracy of boundaries of the input data must be understood.

 

                  

 

4. The resolution of a raster dataset

 

·       The size chosen for a raster cell of a study area depends on the data resolution required for the most detailed analysis.

·       The cell must be small enough to capture the required detail, but large enough so that computer storage and analysis can be performed efficiently.

·       The more homogeneous an area is for critical variables such as topography and landuse, the larger the cell size can be without affecting accuracy.

·       Before specifying the cell size, the following factors should be considered:

o      The resolution of the input data

o      The size of the resultant database and disk capacity

o      The desired response time

o      The application and analysis that is to be performed

·       A cell size finer than the input resolution will not produce more accurate data than the input data.

·       It is generally accepted that the resultant raster dataset should be the same or coarser than the input data.

·       Spatial Analyst allows for raster datasets of different resolutions to be stored and analyzed together in the same database
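The storage side of the cell-size trade-off is simple arithmetic, sketched below. The function name, the 10 km by 10 km study area, and the figure of 4 bytes per cell (an uncompressed 32-bit value) are assumptions for illustration only.

```python
import math


def raster_size(width, height, cell_size, bytes_per_cell=4):
    """Return (rows, cols, bytes) for an area covered at a given cell size."""
    cols = math.ceil(width / cell_size)
    rows = math.ceil(height / cell_size)
    return rows, cols, rows * cols * bytes_per_cell


# Hypothetical 10 km x 10 km study area at three cell sizes (meters).
for cell_size in (30, 10, 5):
    print(cell_size, raster_size(10000, 10000, cell_size))
```

Halving the cell size quadruples the number of cells, so storage and processing time grow much faster than the gain in detail.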

 

 

 

 

5. Raster encoding

·       The process of creating a raster dataset is like draping a fishnet containing square cells over the study area.

·       A code is assigned to each cell according to the feature that is at the center of the cell.

·       The code or value of a cell is a numeric value that corresponds to an attribute type.

·       Numeric values speed processing and allow for data compression.

·       Each cell represents a specified portion of the world.

·       The main consideration is that the size be appropriate for the analysis

 

6. Representing features in a raster dataset

 

When converting points, polylines, and polygons to a raster, you should be aware of how the raster dataset will represent the features.

 

Linear data

·       Linear data is all of those features that, at a certain resolution, appear only as a polyline such as a road, a stream, or a power line.

·       A line by definition does not have area.

·       In Spatial Analyst, a polyline can be represented only by a series of connected cells.

·       As with a point, the accuracy of the representation will vary according to the scale of the data and the resolution of the raster dataset.

 

Polygon data

·       Polygonal or areal data is best represented by a series of connected cells that best portrays its shape.

·       Trying to represent the smooth boundaries of a polygon with a series of square cells does present some problems, the most infamous of which is called the jaggies, an effect that resembles stair steps.

 

Point data

·       A point feature is any object at a given resolution that can be identified as being without area. Although a well, a telephone pole, or the location of an endangered plant are all features that can be rendered as points at some resolutions, at other resolutions they do in fact have area.

·       Point features are represented by the smallest unit of a raster, a cell. It is important to remember that a cell has area as a property.

·       The smaller the cell size, the smaller the area and thus the closer the representation of the point feature.

·       Points with area have an accuracy of plus or minus half the cell size.

·       This is the trade-off that must be made when working with a cell-based system. Having all data types (points, polylines, and polygons) in the same format and being able to use them interchangeably in the same language is more important to many users than a loss of accuracy.

·       The accuracy of the above representation is dependent on:

o      the scale of the data

o      the size of the cell.
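The half-cell accuracy of a rasterized point can be sketched as follows. The function names, the raster origin, and the sample point are hypothetical. Row and column indices are used here only as internal bookkeeping, with row 0 assumed to be the top row.

```python
def point_to_cell(x, y, x_min, y_max, cell_size):
    """Return (row, col) of the cell containing a point; row 0 is the top row."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col


def cell_center(row, col, x_min, y_max, cell_size):
    """Return the map (x, y) of a cell center."""
    return (x_min + (col + 0.5) * cell_size,
            y_max - (row + 0.5) * cell_size)


# Hypothetical raster with its upper-left corner at (0, 100) and 10 m cells.
well = (37.0, 62.0)
row, col = point_to_cell(well[0], well[1], 0.0, 100.0, 10.0)
cx, cy = cell_center(row, col, 0.0, 100.0, 10.0)
print((row, col), (cx, cy), (well[0] - cx, well[1] - cy))
```

The rasterized location differs from the true point by 2 m in x and 3 m in y, both within half of the 10 m cell size.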

 

 

7. Assigning attributes to a raster dataset

 

·       The value associated with a cell is an identifier that defines to which class, group, category, or member the cell belongs.

·       There is usually a one-to-many relationship between the cell values (or codes) and the number of cells that are assigned the code.

·       The field you use in the conversion process will affect the analysis that you can perform on the dataset.

·       If you have a polygon feature dataset that contains landuse type and owner for each parcel in a town, you can use either attribute.

·       If you use landuse type you will be able to ask questions, such as Where are all the agricultural areas that are available for building?

·       However, you cannot add (join) the owner attribute to the raster dataset because this is a many-to-one relationship.

·       You can also relate the landuse type to the relational table because each parcel will have one landuse type.

·       Where this logic collapses is when one person owns multiple parcels with different landuse types.

·       In this case, you may have to use parcel-ID or some other unique feature when converting.

·       Usually with continuous data, each cell has a unique value and will have no attribute table, so there will be no attributes to relate.

·       In this case, the many-to-one issue will not be applicable.

·       When in doubt, the most detailed breakdown should be used.

·       Information can be grouped much more easily than it can be split.

 

 

8. Using feature data directly in Spatial Analyst

·       Several of the Spatial Analyst dialog boxes allow you to enter a point, polyline, or polygon feature directly into the function.

·       There are two ways that features are handled in Spatial Analyst.

·       It either processes the feature data directly, or it converts it to a raster and then processes it.

·       The output resolution can be a specific cell size, or the maximum or minimum cell size of the other input raster datasets.

o      The default is the cell size of the coarsest input raster dataset.

·       You will know if the input can be either a feature or a raster dataset because when you open the browser to enter the input, the browser will say Raster datasets and feature classes in the Show of type input field, and both feature and raster data will be displayed in the browser.

·       If only rasters are allowed as input, then the browser will say Raster Datasets in the Show of type input field, and only raster datasets will be displayed.

·       Some browsers will allow the input of both feature and raster data

 

         

 

 

9. Deriving raster datasets from existing maps

When creating raster datasets from existing maps (data entry), several factors must be considered to allow for full utilization of the input data.

 

Selecting maps

When selecting maps for the creation of the database, you must be aware of:

·       The age and date of the map

·       The cartographic accuracy

·       The resolution and detail

·       The compatibility of the map with other input maps

 

Potential errors

Even with current maps that are accurate, at the same resolution, with the desired amount of detail, and compatible, errors can still occur. Some of the most common errors include:

·       Drafting errors

·       Different cartographic projections used to draft the original data

·       Different photographic projections used to draft the original data

·       Physical changes in the materials used for the maps (shrinking or swelling)

 

 

Understanding cell-based modeling (Chapter 5)

 

1.   Understanding analysis in Spatial Analyst

2.   The operators and functions of Spatial Analyst

3.   NoData and how it affects analysis

4.   Values and what they represent

5.   The analysis environment

6.   The cell size and analysis

7.   Handling projections during analysis

 

 

1. Understanding analysis in Spatial Analyst

 

·       The easiest way to understand cell-based modeling is from the perspective of an individual cell (the worm’s-eye approach) as opposed to from the entire raster (the bird’s-eye approach).

·       To do so, think of yourself as a cell in a raster dataset.

·       You represent a location, and you have a value.

·       All Spatial Analyst operators and functions will ask you to change your value (or keep it the same) based on a set series of rules.

·       For you to calculate an output value for your location using any Spatial Analyst operation or function, there are three things you need to know:

1.     your value.

2.     the manipulation of the operator or function.

3.     which other cell locations and their values need to be included in your calculations.

 

2. The operators and functions of Spatial Analyst

 

The functions associated with raster-cell cartographic modeling can be divided into five types:

A.   Those that work on single cells (local functions)

B.   Those that work on cells within a neighborhood (focal functions)

C.   Those that work on cells within zones (zonal functions)

D.   Those that work on all cells within the raster (global functions)

E.   Those that, when combined in a series, perform a specific application (application functions)

 

A. Local functions

·       Local, or per-cell, functions compute an output raster dataset where the output value at each location is a function of the value associated with that location on one or more raster datasets.

·       A per-cell (local) function can be applied to a single raster dataset or to multiple raster datasets.

·       For a single dataset, examples of per-cell functions are the trigonometric functions (for example, sin), or the exponential and logarithmic functions (for example, exponential or log).

·       Examples of local functions that work on multiple raster datasets are functions that return the minimum, maximum, majority, or minority value for all the values of the input raster datasets at each cell location.
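A per-cell function can be sketched with one small helper. The name local_apply and the sample rasters are illustrative assumptions; all inputs are assumed to have the same number of rows and columns.

```python
import math


def local_apply(func, *rasters):
    """Apply func cell by cell to one or more rasters of the same shape."""
    return [[func(*cells) for cells in zip(*rows)] for rows in zip(*rasters)]


# Single raster: square root of each cell.
print(local_apply(math.sqrt, [[1.0, 4.0], [9.0, 16.0]]))

# Multiple rasters: maximum value at each cell location.
a = [[1, 5], [3, 2]]
b = [[4, 2], [3, 8]]
c = [[0, 9], [1, 1]]
print(local_apply(max, a, b, c))
```

Each output cell depends only on the cells at the same location, so the result never looks at neighbors.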

 

 

B. Focal functions

·       Focal, or neighborhood, functions produce an output raster dataset in which the output value at each location is a function of the input value at a location and the values of the cells in a specified neighborhood around that location.

·       A neighborhood configuration determines which cells surrounding the processing cell should be used in the calculation of each output value.

·       Neighborhood functions can return the mean, standard deviation, sum, or range of values within the immediate or extended neighborhood.

·       Generally used for ratio (true zero) data, but can be used with categorical data as well (majority).
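A neighborhood function can be sketched by passing the statistic in as a parameter. The names focal_stat, mean, and value_range are hypothetical, and a square neighborhood clipped at the raster edge is assumed; other neighborhood shapes (circle, annulus, wedge) are not shown.

```python
def focal_stat(grid, stat, radius=1):
    """Apply stat to the square neighborhood around every cell."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(stat(window))
        out.append(out_row)
    return out


def mean(values):
    return sum(values) / len(values)


def value_range(values):
    return max(values) - min(values)


grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(focal_stat(grid, mean))
print(focal_stat(grid, value_range))
```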

 

 

C. Zonal functions

·       Zonal functions compute an output raster dataset where the output value for each location depends on the value of the cell at the location and the association that location has within a cartographic zone.

·       Zonal functions are similar to focal functions except that the definition of which cells to include in the processing (the neighborhood) in a zonal function is defined by the configuration of the zones or features in the input zone dataset, not by a specified neighborhood shape.

·       Each zone can be unique.

·       Operations that can be completed on these cells return the mean, sum, minimum, maximum, or range of values from the first dataset that fall within a specified zone of the second.
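A zonal mean can be sketched as below. The function name and the two sample rasters are assumptions; the zone raster and the value raster are assumed to have the same shape and to contain no NoData cells.

```python
def zonal_mean(zones, values):
    """Assign every cell the mean of the value raster within its zone."""
    totals, counts = {}, {}
    for zone_row, value_row in zip(zones, values):
        for z, v in zip(zone_row, value_row):
            totals[z] = totals.get(z, 0.0) + v
            counts[z] = counts.get(z, 0) + 1
    means = {z: totals[z] / counts[z] for z in totals}
    return [[means[z] for z in zone_row] for zone_row in zones]


zones = [
    [1, 1, 2],
    [1, 2, 2],
]
values = [
    [10, 20, 30],
    [30, 50, 70],
]
print(zonal_mean(zones, values))
```

Unlike a focal function, the set of cells that contributes to each output is fixed by the zone raster, so cells of the same zone receive the same result even when they are not adjacent.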

 

         

D. Global functions

·       Global, or per-raster, functions compute an output raster dataset in which the output value at each cell location is potentially a function of all the cells in the input raster datasets.

·       There are two groups of global functions: Euclidean distance and weighted distance.

o      Euclidean distance global functions assign to each cell in the output raster dataset its distance from the closest source cell (a source may be the location from which to start a new road).

o      The direction of the closest source cell can also be assigned as the value of each cell location in an additional output raster dataset.

o      By applying a global function to a weighted (cost) surface, you can determine the cost of moving from a destination cell (the location where you wish to end the road) to the nearest source cell.

o      To take this one step further, the shortest path over a cost surface can be calculated over a non-networked surface from a source cell to a destination cell. In all the global calculations, knowledge of the entire surface is necessary to return the solution.
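A Euclidean distance surface can be sketched with a brute-force search. The function name and the 3 x 3 example are hypothetical, at least one source cell is assumed, and distances are measured between cell centers. Comparing every cell with every source is slow on large rasters and is used here only because it is easy to read.

```python
import math


def euclidean_distance(sources, cell_size=1.0):
    """Distance from every cell center to the closest source cell center."""
    src = [(r, c)
           for r, row in enumerate(sources)
           for c, is_source in enumerate(row) if is_source]
    return [[min(math.hypot(r - sr, c - sc) for sr, sc in src) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(sources)]


# One source cell in the upper-left corner, 10 m cells.
sources = [
    [True, False, False],
    [False, False, False],
    [False, False, False],
]
for row in euclidean_distance(sources, cell_size=10.0):
    print([round(d, 1) for d in row])
```

Every output value depends on where all the source cells are, which is what makes this a global rather than a focal calculation.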

 

 

 

E. Application functions

·       There is a wide series of cell-based modeling functions that are developed to solve specific applications.

·       Some of the application functions are more general in scope, such as surface analysis, while other application functions are more narrowly defined, such as the hydrologic analysis functions.

·       The categorization of the application functions is an aid to group and understand the wide variety of Spatial Analyst operators and functions.

 

Density

The Density function distributes a measured quantity of an input point layer throughout a landscape to produce a continuous surface.

 

Surface generation (Spatial statistics)

·       The surface functions use the surface representation of a raster dataset to represent height, concentration, or magnitude (for example, elevation, pollution, or noise).

·       Surface generation functions, called surface interpolators, create a continuous surface from sampled point values.

·       Surface generation functions make predictions for all locations in a raster dataset whether a measurement has been taken at the location or not.

o      Inverse Distance Weighted (IDW) - each measured point's influence on the predicted value decreases as its distance from the prediction location increases.

o      Polynomial trend surface is conceptually similar to taking a piece of paper and trying to pass it through measured points that are raised to the height of their values. That paper is fitted so that overall it fits best to all the points.

o      Spline is conceptually like taking a rubber membrane and, once the measured points are raised to the height of their values, trying to fit it through the points the best you can. The criterion imposed on fitting this membrane is that it must pass through the measured points.

o      Kriging is a statistical method that quantifies the correlation of the measured points through variography. When making a prediction for an unknown location, kriging weights the nearby measured points by their configuration around the prediction location and uses the fitted model from variography to determine a value.

o      Geostatistical Analyst provides additional tools for more advanced surface generation.
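Of the interpolators listed above, inverse distance weighting is the simplest to sketch. The function name idw, the sample points, and the power of 2 are assumptions for illustration; the search radius and other options offered by real tools are not shown.

```python
import math


def idw(samples, x, y, power=2.0):
    """Predict a value at (x, y) from (x, y, value) samples by IDW."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            # The prediction location coincides with a measured point.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Two hypothetical measurements 10 units apart.
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
print(idw(samples, 5.0, 0.0))   # equal distances give the plain average
print(idw(samples, 2.0, 0.0))   # closer to the first sample, so nearer 10
```

A larger power makes nearby points dominate more strongly and produces a surface with sharper local detail.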

 

Surface analysis - the premise behind the surface analysis functions is that additional information can be derived by producing new data and identifying patterns in existing surfaces.

 

·       Slope identifies the slope, or maximum rate of change, from each cell to its neighbors. An output slope raster dataset can be calculated as either a percentage of slope (for example, 10 percent slope) or a degree of slope (for example, 45-degree slope).
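The two slope units are related through the tangent: percent slope is rise over run times 100, and degree slope is the arctangent of rise over run. The helper names below are hypothetical and only illustrate that relationship.

```python
import math


def percent_to_degrees(percent):
    """Convert percent slope (rise / run * 100) to degrees."""
    return math.degrees(math.atan(percent / 100.0))


def degrees_to_percent(degrees):
    """Convert slope in degrees to percent slope."""
    return math.tan(math.radians(degrees)) * 100.0


print(round(percent_to_degrees(10.0), 2))   # a 10 percent slope in degrees
print(round(degrees_to_percent(45.0), 2))   # a 45-degree slope in percent
```

A 45-degree slope equals a 100 percent slope, and percent slope grows without bound as the surface approaches vertical, so the two scales should not be mixed in one analysis.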

 

·       Aspect identifies the steepest downslope direction from each cell to its neighbors. The value of the output raster dataset represents the compass direction of the aspect: 0 is true north, a 90-degree aspect is to the east, and so forth.

 

·       Hillshade is used to determine the hypothetical illumination of a surface for either analysis or graphical display. For analysis, hillshade can be used to determine the length of time and intensity of the sun in a given location. For graphical display, hillshade can greatly enhance the relief of a surface.

 

·       Viewshed identifies either how many of the observation points specified on the input observation raster dataset can be seen from each cell or which cell locations can be seen from each observation point.

 

·       Curvature measures how the slope of the surface changes at each cell. It calculates the second derivative of the input-surface raster dataset - the slope of the slope.

o      The result of the curvature function can be used to describe the physical characteristics of a surface, such as the erosion and runoff processes within a landscape.

o      The slope identifies the overall rate of downward movement, and aspect defines the direction of flow.

o      The profile curvature is the shape of the surface in the direction of the slope.

o      The planform curvature defines the shape of the surface perpendicular to the direction of the slope.

 

·       Contour produces an output polyline dataset.

o      The value of each line represents all contiguous locations with the same height, magnitude, or concentration of whatever the values on the input dataset represent.

o      The function does not connect cell centers; it interpolates a line that represents locations with the same magnitude.

 

Hydrologic analysis

·       The shape of a surface determines how water will flow across it.

·       The hydrologic modeling functions provide methods for describing the hydrologic characteristics of a surface.

·       Using an elevation raster dataset as input, it is possible to model where water will flow, create watersheds and stream networks, and derive other hydrologic characteristics.

·       The hydrologic modeling functions are available through the RasterHydrologyOp or through Map Algebra via the Raster Calculator.

 

 

Watersheds for each section of a stream network

 

 

Geometric transformation

·       The geometric transformation functions either change the location of each cell in the raster dataset or alter the geometric distribution of the cells within a dataset to correct a distortion.

·       The mosaicking functions (another geometric transformation) combine multiple raster datasets representing adjacent areas into a single raster dataset.

 

Generalization

·       Sometimes a raster dataset contains data that is erroneous or irrelevant to the analysis at hand or is more detailed than you need.

·       For instance, if a raster dataset was derived from the classification of a satellite image, it may contain many small and isolated areas that are misclassified.

·       The generalization functions assist with identifying such areas and automating the assignment of more reliable values to the cells that make up the areas.

·       The generalization functions are available through the RasterGeneralizeOp or through Map Algebra via the Raster Calculator.

·       These tools provide capabilities for aggregation, edge smoothing, intelligent noise removal, and more.

·       Nibble - removes single, misclassified cells in the classified image.

·       BoundaryClean and MajorityFilter - smooth the boundaries between different zones.

·       Expand - expands specified zones.

·       Shrink - shrinks specified zones.

·       Thin - thins linear features in a raster.
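
To make the idea of a majority-based generalization concrete, here is a small sketch. It is an assumption-laden illustration, not the MajorityFilter tool itself: the hypothetical function majority_filter replaces a cell with the most common value among its (up to eight) neighbors when more than half of them agree.

```python
from collections import Counter


def majority_filter(raster):
    """Replace each cell by the majority value of its neighbors (sketch)."""
    rows, cols = len(raster), len(raster[0])
    out = [row[:] for row in raster]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbors that fall inside the raster.
            neighbors = [
                raster[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            value, count = Counter(neighbors).most_common(1)[0]
            # Only change the cell when a strict majority agrees.
            if count > len(neighbors) / 2:
                out[r][c] = value
    return out


# A single cell classified as 7 inside a zone of 5s is reassigned to 5.
classified = [[5, 5, 5],
              [5, 7, 5],
              [5, 5, 5]]
print(majority_filter(classified))
```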

 

 

The base classification from a satellite image

 

Effect of Nibble applied to the base classification

 

Effects of MajorityFilter applied to the output from Nibble

 

 

Resolution altering

·       The resolution altering functions change the resolution of an existing raster dataset.

·       If one raster dataset has a finer resolution than the rest, you may wish to resample it to the resolution of the coarser datasets so that all the raster datasets share the same resolution.

·       This speeds up processing and reduces the data size.

·       The two principal ways to determine values when changing the resolution of a raster dataset are interpolation and aggregation.

·       One group of resampling interpolation functions uses either the nearest-neighbor, bilinear, or cubic method on the values of the input raster dataset.

·       A second group of resampling interpolation functions uses a specified statistical aggregation method within a neighborhood to derive values.

·       Unlike the cell size setting in the analysis environment, the resolution altering functions are applied only to the resultant dataset.

·       The aggregation functions group a series of cells to the same value.

·       To perform an aggregation, the block functions are implemented.

·       With a block function, Spatial Analyst calculates a specified statistic within nonoverlapping neighborhoods.
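
A block function can be sketched in a few lines. The example below is illustrative only: the function name block_aggregate and the sample values are hypothetical, and the raster is held as a plain list of rows.

```python
def block_aggregate(raster, factor, statistic):
    """Apply `statistic` to nonoverlapping factor x factor blocks."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            # Gather the cells of one block (edge blocks may be smaller).
            block = [
                raster[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            ]
            out_row.append(statistic(block))
        out.append(out_row)
    return out


def mean(values):
    return sum(values) / len(values)


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(block_aggregate(grid, 2, mean))  # one mean per 2 x 2 block
print(block_aggregate(grid, 2, max))   # one maximum per 2 x 2 block
```

Each output cell summarizes one block, so a 4 x 4 input aggregated by a factor of 2 yields a 2 x 2 output.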

 

 

 

The effect on the raster of resampling to a coarser resolution

 

 

 

3. NoData and how it affects analysis

·       Every cell location in a raster has a value assigned to it.

·       When inadequate information is available for a cell location, the location can be assigned NoData. NoData and 0 are not the same: 0 is a valid value.

·       The fact that a location can have NoData instead of a valid value has ramifications in operators and functions.

·       NoData means that not enough information is known about a cell location to assign it a value. There are two ways that a location with NoData can be treated in the computation of an expression:

·       Return NoData for the location no matter what.

·       Ignore the NoData and compute with the available values.

·                 If NoData exists in any of the input raster datasets in the Spatial Analyst expression, the output values will be affected.

·                 The behavior of NoData is addressed for each operator and function in the online command references.

·                 It is important to understand how NoData is handled in a particular function before making a decision.

·                 You may need to know if a location with NoData on the output ever had a value, or if it received NoData from the operator or function.

·                 Sometimes, when locations receive values, it may be important to know if the output value really is the actual minimum or maximum value, or if it is the minimum or maximum value of the existing known values.
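
The two treatments of NoData can be compared with a small cell-by-cell minimum. This is a sketch under stated assumptions: NoData is represented by Python's None, and the function names are hypothetical rather than Spatial Analyst operators.

```python
NODATA = None  # assumption: None stands in for NoData


def min_strict(values):
    """Return NoData if any input is NoData."""
    if any(v is NODATA for v in values):
        return NODATA
    return min(values)


def min_ignore(values):
    """Ignore NoData and use the known values only."""
    known = [v for v in values if v is not NODATA]
    return min(known) if known else NODATA


def local_op(rasters, op):
    """Apply `op` to the stack of values at each cell location."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[op([ras[r][c] for ras in rasters]) for c in range(cols)]
            for r in range(rows)]


a = [[3, NODATA], [0, 4]]
b = [[1, 2], [5, NODATA]]
print(local_op([a, b], min_strict))  # NoData wherever either input lacks a value
print(local_op([a, b], min_ignore))  # minimum of the known values
```

Note that the 0 in raster a is treated as an ordinary value, and that the second result cannot tell you whether 2 and 4 are true minimums or only the minimums of the known values.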

 

4. Values and what they represent

 

Measurement values can be broken into four types: ratio, interval, ordinal, and nominal.

 

·       Ratio - the values from the ratio measurement system are derived relative to a fixed zero point on a linear scale.

o      Mathematical operations can be used on these values with predictable and meaningful results.

o      Examples of ratio measurements are age, distance, weight, and volume.  

 

·       Interval - time of day, years on a calendar, the Fahrenheit temperature scale, and pH value are all examples of interval measurements.

o      These are values on a linear calibrated scale, but they are not relative to a true zero point in time or space.

o      Because there is no true zero point, relative comparisons can be made between the measurements, but ratio and proportion determinations are not as useful.

 

·       Ordinal - values determine position.

o      These measurements show place, such as first, second, and third, but they do not establish magnitude or relative proportions.

o      How much better, worse, prettier, healthier, or stronger something is cannot be demonstrated from ordinal numbers.

 

 

·       Nominal - values associated with this measurement system are used to identify one instance from another.

 

 

o      They may also establish the group, class, member, or category with which the object is associated.

o      These values are qualities, not quantities, with no relation to a fixed point or a linear scale.

o      Coding schemes for landuse, soil types, or any other attribute qualify as a nominal measurement.

o      Other nominal values are social security numbers, ZIP Codes, and telephone numbers.

o      Spatial Analyst does not distinguish between the four different types of measurements when asked to process or manipulate the values.

o      Most mathematical operations work well on ratio values, but when interval, ordinal, or nominal values are multiplied, divided, or evaluated for the square root, the results are typically meaningless.

o      On the other hand, subtraction, addition, and Boolean determinations can be very meaningful when used on interval and ordinal values.

o      Attribute handling within and between raster datasets is most effective and efficient when using nominal measurements.

 

5. The analysis environment

 

Spatial Analyst allows you to process a subset of cells and to specify the resolution in which to process them.

·       analysis extent

 

·       mask

 

 

·       cell size

 

6. The cell size and analysis

 

·       Cells in different raster datasets do not need to be stored in the same resolution.

·       However, when processing multiple datasets together, the cell resolution, like the registration, must be the same.

·       When multiple raster datasets are input into any Spatial Analyst function and their resolutions are different, one or more of the input datasets will be automatically resampled to the resolution of the coarsest input, using nearest neighbor assignment.

·       The nearest neighbor assignment resampling technique is used since it is applicable to both discrete and continuous value types, while bilinear and cubic are only applicable to continuous data.

·       A resampling technique is necessary because rarely do the centers of the input cells align with the transformed cell centers of the desired resolution.

·       The default resampling to the coarsest resolution of the input rasters can be changed in the Cell Size tab of the Options dialog box to a specific cell size or to the minimum of the input raster datasets.

·       Caution must be taken when specifying a finer cell size than the coarsest input because the resolution of the output cannot be more accurate than the coarsest of the inputs.

·       Specifying a cell size of 50 meters when the input raster datasets are 100 meters creates an output raster with a cell size of 50 meters, but the accuracy is still that of the 100-meter input.

·       When performing analysis, make sure you are asking appropriate questions of the cell size.

·       That is, you will not study mouse movement when the cell size is five kilometers, and you will not want to use five kilometer cells when studying the effects of global warming over the earth.
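
Nearest neighbor assignment can be sketched as follows. This is an illustration, not the Spatial Analyst implementation: the function name resample_nearest is hypothetical, both rasters are assumed to share the same upper-left corner, and each output cell takes the value of the input cell that contains its center.

```python
def resample_nearest(raster, in_cell, out_cell):
    """Resample a raster (list of rows) to a new cell size."""
    rows, cols = len(raster), len(raster[0])
    out_rows = max(1, round(rows * in_cell / out_cell))
    out_cols = max(1, round(cols * in_cell / out_cell))
    out = []
    for r in range(out_rows):
        # Input row that contains the center of this output row.
        src_r = min(rows - 1, int((r + 0.5) * out_cell // in_cell))
        out_row = []
        for c in range(out_cols):
            src_c = min(cols - 1, int((c + 0.5) * out_cell // in_cell))
            out_row.append(raster[src_r][src_c])
        out.append(out_row)
    return out


coarse = [[1, 2],
          [3, 4]]
# Going from 100 m cells to 50 m cells only repeats values:
# the output has more cells but no additional detail.
for row in resample_nearest(coarse, in_cell=100, out_cell=50):
    print(row)
```

Because values are copied rather than averaged, the method is safe for discrete codes such as landuse classes.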

 

7.   Handling projections during analysis

 

·       Raster datasets must be registered with one another before completing any analysis or processing between them.

·       Each location on the ground must be represented by the same x,y cell address on the different input datasets.

·       This means that the input raster datasets have to be in the same coordinate space or coordinate system (in the same projection).

·       The coordinate space of the output will be dependent on the coordinate space of the input datasets.

·       If two or more input raster datasets in an expression are not in the same coordinate space, Spatial Analyst will automatically put them in the same space using the following rules.

 

The default behavior:

·       If only one raster dataset is input, the output will be in the same coordinate space as the input (a very common situation).

·       If multiple raster and feature datasets are in the same coordinate space, the output will also be in that same coordinate space.

·       If more than one raster dataset is input, the output will be in the same coordinate space as the first input.

·       If feature and raster data with different coordinate spaces are input to the same function, the feature dataset will be projected to the coordinate space of the raster; the output will be in the coordinate space of the raster.

·       If feature data is input, the output will be in the same coordinate space as the first input.

 

Overriding the default

·       On the General tab of the Options dialog box, you can set the coordinate space of all output raster datasets to be the same as that specified for the data frame.

·       Automatically transforming a raster or feature dataset into the common coordinate system in the cases identified above is referred to as projecting on the fly.

·       To maintain the speed of on-the-fly projection, a low-order polynomial transformation is applied to the dataset.

·       The on-the-fly projection transformation is less accurate than if you project the dataset using the geometric transformation.

 

 

Performing spatial analysis

(From chapter 7:  you will be working through the examples in Lab 4 described in this chapter)

 

1.    Mapping distance

2.    Performing surface analysis

3.    Calculating cell statistics

4.    Calculating neighborhood statistics

5.    Calculating zonal statistics

6.    Reclassifying your data

7.    Using the raster calculator

8.    Mapping density

9.    Converting your data

 

1. Mapping distance

What are distance mapping functions?

·       The distance mapping functions are global functions. They compute an output raster dataset where the output value at each location is potentially a function of all the cells in the input raster datasets. There are several distance mapping tools for measuring both straight line (Euclidean) distance and distance measured in terms of other factors, such as the cost to travel over the landscape. The outputs from the Straight Line Distance functions are normally used directly, while the outputs from the Cost Weighted Distance functions are most commonly used to compute shortest (or least-cost) paths.

 

Straight Line Distance functions

·       The Straight Line Distance function measures the straight line distance from each cell to the closest source (the source identifies the objects of interest, such as wells, roads, or a school). The distance is measured from cell center to cell center.

o      Example of usage: What is the distance to the closest town?

 

Cost Weighted Distance functions

·       The Cost Weighted Distance function modifies the Straight Line Distance by some other factor, which is a cost to travel through any given cell. For example, it may be shorter to climb over the mountain to the destination, but it is faster to walk around it.

·       The Cost Weighted Allocation function identifies the nearest source cell based on accumulated travel cost.

·       The Cost Weighted Direction function provides a road map, identifying the route to take from any cell, along the least-cost path, back to the nearest source.

·       The Distance and Direction raster datasets are normally created to serve as inputs to the pathfinding function, the shortest (or least-cost) path.

 

Why is it useful to map distance?

·       By mapping distance, you can find out information such as the distance to the nearest hospital from certain areas for an emergency helicopter, or find all fire hydrants within 500 meters of a burning building.

·       Alternatively, you could find the shortest (or least-cost) path from one location to another, based on some cost factor.

 

 

Straight line distance

What are the Straight Line Distance functions?

The Straight Line Distance functions describe each cell’s relationship to a source or a set of sources. There are three potential outputs from this function.

 

Primary output:

Straight Line Distance gives the distance from each cell in the raster to the closest source.

 

Example of usage: What is the distance to the closest town?

Optional outputs:

Straight Line Allocation identifies which cells are allocated to which source based on closest proximity.

 

Example of usage: Which town am I closest to?

 

Straight Line Direction gives the direction from each cell to the closest source.

Example of usage: What is the direction to the closest town?

 

The source

The source identifies the location of the objects of interest, such as wells, shopping malls, roads, forest stands, and so on. If the source is a raster, it must contain only the values of the source cells - all other cells must be NoData. If the source is a feature, it will internally be transformed into a grid when you run the function.

 

The straight line distance raster

The straight line distance raster contains the measured distance from every cell to the nearest source. The distances are measured in projection units, such as feet or meters, and are computed from cell center to cell center.
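
A brute-force sketch of this calculation is shown below. It is illustrative only: the function name straight_line_distance is hypothetical, the source raster uses None for NoData, and every cell is simply compared against every source cell, which would be too slow for large rasters.

```python
import math


def straight_line_distance(source, cell_size):
    """Distance from each cell center to the nearest source cell center."""
    rows, cols = len(source), len(source[0])
    # Source cells are the cells that hold a value (everything else is None).
    sources = [(r, c) for r in range(rows) for c in range(cols)
               if source[r][c] is not None]
    return [[min(math.hypot(r - sr, c - sc) for sr, sc in sources) * cell_size
             for c in range(cols)]
            for r in range(rows)]


towns = [[1, None, None],
         [None, None, None],
         [None, None, None]]
for row in straight_line_distance(towns, cell_size=10):
    print([round(d, 1) for d in row])
```

Distances come out in the same units as the cell size, for example meters when the cell size is given in meters.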

 

The Straight Line Distance function is used frequently as a standalone function for applications such as finding the nearest hospital for an emergency helicopter. Alternatively, this function can be used when creating a suitability map, when you need to include data representing the distance from a certain object.

 

In the example below, the distance to each town is found. This sort of information could be extremely useful for planning a hiking trip. You may want to stay within a certain distance of a town in case of emergencies, or know how much further you have to travel to pick up supplies.

 

The straight line distance to the nearest town from every location.

 

·       The Straight Line Allocation function assigns each cell the value of the source to which it is closest. The nearest source is determined by the Straight Line Distance.

o      Example of usage: Which town am I closest to?

 

·       The Straight Line Direction function computes the direction to the nearest source, measured in degrees.

o      Example of usage: What is the direction to the closest town?
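
Allocation and direction can be derived in the same pass as distance. The sketch below is an illustration under stated assumptions: the function name is hypothetical, row 0 is taken to be the northern edge, and directions are compass bearings in degrees with source cells set to 0 and due north reported as 360 (an assumed convention chosen so that the two cases stay distinguishable).

```python
import math


def allocate_and_direct(source):
    """Return (allocation, direction) rasters for a source raster."""
    rows, cols = len(source), len(source[0])
    sources = [(r, c, source[r][c]) for r in range(rows) for c in range(cols)
               if source[r][c] is not None]
    alloc = [[None] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Nearest source by straight line distance between cell centers.
            sr, sc, value = min(
                sources, key=lambda s: math.hypot(r - s[0], c - s[1]))
            alloc[r][c] = value
            if (sr, sc) != (r, c):
                east, north = sc - c, r - sr  # row index grows southward
                bearing = math.degrees(math.atan2(east, north)) % 360.0
                direction[r][c] = bearing if bearing > 0 else 360.0
    return alloc, direction


towns = [[1, None, None],
         [None, None, None],
         [None, None, 2]]
alloc, direction = allocate_and_direct(towns)
print(alloc)      # which town each cell is closest to
print(direction)  # bearing from each cell to that town
```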


 


 

Allocation

 

What is the Allocation function?

The Allocation function allows you to identify which cells belong to which source based on closest proximity (in a straight line). An output raster is produced that records the identity of the closest source cell for each cell. Each cell in an allocation raster receives the value of the source cell to which it will be allocated. Note that the Allocation function can also be performed via the Straight Line Distance function or the Cost Weighted Distance function.

 

Performing the Allocation function via the Straight Line Distance function allows you to find which cells are allocated to which source based on closest proximity, in a straight line.

 

Performing the Allocation function via the Cost Weighted Distance function takes the cost of traveling over the land into consideration rather than the straight line distance.

 

Why use the Allocation function?

Use the Allocation function to perform analyses such as:

  • Identifying the customers served by a series of stores
  • Finding out which hospital is the closest
  • Finding areas with a shortage of fire hydrants
  • Locating areas that are not served by a chain of supermarkets

 

The example below identifies the areas of land supported by a recreation site. You can easily identify the areas that may be in need of more recreation sites (mainly areas in the northeast half of the raster).

 

 

 

Cost weighted distance

What is cost weighted distance mapping?

Cost weighted distance mapping finds the least accumulative cost from each cell to the nearest, cheapest source. Cost can be money, time, or preference.

 

The functions that perform cost weighted distance mapping are similar to the Straight Line Distance functions, but instead of calculating the actual distance from one point to another, they compute the accumulative cost of traveling from each cell to the nearest source, based on the cell’s distance from each source and the cost to travel through it (for example, it is easier to walk through a meadow than a swamp).

 

Why use the Cost Weighted Distance function?

 

·       Cost weighted distance modeling is useful whenever movement is based on geographic factors, such as animal migration studies or consumer travel behavior.

·       Cost weighted distance may also be used to minimize construction costs for routing new roads, transmission lines, or pipelines.

·       The straight line distance between two points is not necessarily the best. In the graphic to the left, the shortest path over the mountain takes three hours.

·       The longer path around only takes two hours. If time were a cost, then the route with the longer distance should be taken.

·       However, the aim may be to climb over the mountain. Applying cost weighted distance enables you to specify preferences in your input data.

·       It may, for example, take longer to travel over the mountain due to steep slopes, so steep slopes should be given a higher cost when finding a suitable path from A to B.

 

 

Example: Finding the least-cost route for a road

·       In the following example, the Cost Weighted Distance functions are used to find the least-cost path for a new road.

·       The Cost Weighted Distance function is the prerequisite to the Shortest Path function, which is discussed in the next section.

·       The Shortest Path function determines the actual route for the road.

·       To calculate the least accumulative cost from each cell to the nearest source, the Cost Weighted Distance function needs a source and a cost raster.

 

The source

The source, as you can see in the graphic below (in red), is the starting point for the proposed road.

 

 

The cost raster

·       The cost raster identifies the cost of traveling through every cell.

·       To create this raster, you need to identify the cost of constructing a road through each cell.

·       Although the cost raster is a single dataset, it is often used to represent several criteria.

·       In the following example, landuse and slope influence the construction costs.

·       These datasets are in different measurement systems (landuse type and percent slope), so they cannot be compared relative to one another and must be reclassified to a common scale.

 

Creating a cost raster:

 

Reclassifying your datasets to a common scale

·       In this example, slope and landuse have been reclassified on a scale of 1 to 10.

·       The attributes of each dataset should be examined, in turn, to decide on their contribution to the cost of building a road.

·       For example, it is more costly to traverse steep slopes, so steeper slopes will be assigned higher costs when reclassifying this dataset.

·       The graphics below display the results.
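
Reclassification to a common scale might look like the sketch below. Everything specific here is an assumption for illustration: the function names, the equal-interval breaks for percent slope, and the landuse codes and costs in the lookup table are hypothetical and do not come from the lab data.

```python
def reclass_equal_interval(raster, low, high, classes=10):
    """Map continuous values to classes 1..classes using equal intervals."""
    width = (high - low) / classes

    def to_class(value):
        index = int((value - low) / width) + 1
        return max(1, min(classes, index))  # clamp values at the range edges

    return [[to_class(v) for v in row] for row in raster]


def reclass_lookup(raster, table):
    """Map category codes to costs using a lookup table."""
    return [[table[v] for v in row] for row in raster]


slope_percent = [[0, 4.9],
                 [25, 50]]
landuse = [[1, 2],
           [2, 1]]
landuse_cost = {1: 2, 2: 8}  # hypothetical: code 1 is cheap, code 2 is costly

print(reclass_equal_interval(slope_percent, low=0, high=50))  # steeper = higher
print(reclass_lookup(landuse, landuse_cost))
```

Both outputs are now on the same 1 to 10 cost scale, so they can be compared and combined.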

 

Reclassified Landuse

 

Reclassified Slope

 

 

Weighting datasets according to percent influence

·       The next step in producing the cost raster is to combine the reclassified datasets.

·       The simplest approach is to just add them together.

·       However, you may know that some factors are more important than others.

·       For instance, avoiding steep slopes may be twice as important as the landuse type, so you might, for example, give this dataset an influence of 66 percent and the landuse dataset an influence of 34 percent (to make 100%).

·       The following diagram shows the conceptual process:

 

Slope * 0.66 = weighted slope

Landuse * 0.34 = weighted landuse

         

 

Combining the datasets

The final cost raster is the result of adding the weighted datasets together.

Weighted slope + weighted landuse = final cost raster

 

Taking this example, the following diagram shows the final cost raster, the result of reclassifying the datasets of slope and landuse, weighting each by 0.66 and 0.34, respectively, then combining the weighted datasets.
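
The weighting and combining steps amount to a weighted sum per cell. A minimal sketch follows, using the 0.66 and 0.34 weights from the text; the function name weighted_overlay and the sample cost values are hypothetical.

```python
def weighted_overlay(layers):
    """Combine (raster, weight) pairs into one cost raster.

    All rasters are assumed to be reclassified to a common scale and to
    have the same number of rows and columns.
    """
    first = layers[0][0]
    rows, cols = len(first), len(first[0])
    return [[sum(weight * raster[r][c] for raster, weight in layers)
             for c in range(cols)]
            for r in range(rows)]


slope_cost = [[1, 6],
              [10, 3]]
landuse_cost = [[2, 8],
                [4, 4]]

cost = weighted_overlay([(slope_cost, 0.66), (landuse_cost, 0.34)])
for row in cost:
    print([round(v, 2) for v in row])
```

Because the weights sum to 1, the combined raster stays on the same 1 to 10 scale as its inputs.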

 

The cells shaded dark blue are the most suitable cells through which to route the road, as they are the least costly.

 

 

The Cost Weighted Distance function

·       Using the cost raster and the source, the Cost Weighted Distance function produces an output raster in which each cell is assigned a value that is the least accumulative cost of getting back to the source.

·       Using our example, the function takes the cost raster and calculates a value for each cell in the output cost-weighted distance raster that is the accumulated least cost of getting from that cell to the nearest source.

·       Every cell in the cost-weighted distance raster is assigned a value that represents the sum of the minimum travel costs that would be incurred by traveling back along the least-cost path to its nearest source.

·       In the example below, the accumulated least costly way of getting from the cell colored dark red to the school is 10.5.
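
The accumulation can be sketched with a standard shortest-path search (Dijkstra's algorithm) over the grid. This is an illustration under stated assumptions, not the tool's code: the function name cost_distance is hypothetical, moves are allowed to all eight neighbors, the cost of a move is taken as the average of the two cell costs times the distance traveled, and diagonal moves are longer by a factor of the square root of 2.

```python
import heapq
import math


def cost_distance(cost, sources, cell_size=1.0):
    """Least accumulated cost from every cell back to its nearest source."""
    rows, cols = len(cost), len(cost[0])
    acc = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        acc[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acc[r][c]:
            continue  # stale entry; a cheaper route was already found
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                step = math.hypot(dr, dc) * cell_size
                nd = d + step * (cost[r][c] + cost[nr][nc]) / 2
                if nd < acc[nr][nc]:
                    acc[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return acc


cost = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
for row in cost_distance(cost, sources=[(0, 0)]):
    print([round(v, 2) for v in row])
```

In this small example the cheapest way to reach the far corner goes around the expensive center cell rather than straight through it.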

 

 

·       Two additional outputs: direction and allocation rasters can be created from the Cost Weighted Distance function.

·       These are explained below. Both the cost-weighted distance and direction rasters are required if you want to go on to calculate the least-cost (shortest) path between source locations and destination locations.

 

Direction

·       The cost-weighted distance raster tells you the least accumulated cost of getting from each cell to the nearest source, but it doesn’t tell you which way to go to get there.

·       The direction raster provides a road map, identifying the route to take from any cell, along the least-cost path, back to the nearest source.

 

Cost Weighted Direction

Direction Coding

 

·       The algorithm for computing the direction raster assigns a code to each cell that identifies which one of its neighboring cells is on the least-cost path back to the nearest source.

·       In the direction coding diagram above, 0 represents every cell in the cost-weighted distance raster.

·       Each cell is assigned a value representing the direction of the nearest, cheapest cell on the route of the least costly path to the nearest source.

 

·       For example, in the graphic above, the cheapest way to get from the cell with a value of 10.5 is to go diagonally, through the cell with a value of 5.7, to the source, the school site.

·       The direction algorithm assigns a value of 4 to the cell with a value of 10.5, and 4 to the cell with a value of 5.7, because this is the direction of the least-cost path back to the source from each of these cells.

·       This process is done for all cells in the cost-weighted distance raster to produce the direction raster, which tells you the direction to travel from every cell in the cost-weighted distance raster back to the source.

 

         

 

Cost Weighted Distance

 

 

Direction

 


 

Allocation

The cost-allocation raster identifies the nearest source from each cell in the cost-weighted distance raster. It is conceptually similar to the Straight Line Distance Allocation function, where each cell is assigned to its nearest source cell. However, near is expressed in terms of accumulated travel cost.

 

All cells are allocated to the school source.

 

 

 

Shortest Path

What is the Shortest Path function?

·       The Shortest Path function determines the path from a destination point to a source.

·       Once you have performed the Cost Weighted Distance function, creating distance and direction rasters, you can then compute the least-cost (or shortest) path from a chosen destination to your source point, which in our original example was the starting point for the new road.

Why find the shortest path?

·       The shortest path travels from the destination to the source and is guaranteed to be the cheapest route (relative to the cost units defined by the original cost raster).

·       Use it to find the best route for a new road in terms of construction costs, or to identify the path to take from several suburban locations (sources) to the closest shopping mall (destinations).

 

 

·       You can see two potential paths for the new road in the diagram above (in purple and red) to illustrate an important point.

·       The purple line represents the path created using a cost raster where each input raster (landuse and slope) had the same influence.

·       The red line represents the path created using a cost raster where the slope input raster had a weight (or influence) of 66 percent.

·       By giving the slope input raster a higher weight, more attention was given to avoiding steeper slopes in the red path.

·       It is important to spend time considering how to weight the rasters that make up the cost raster.

·       How you weight your rasters depends on your application and the results you wish to achieve.

 

2. Performing surface analysis

·       You can gain additional information by producing a new dataset that identifies a specific pattern within an original dataset.

·       Patterns that were not readily apparent in the original surface can be derived, such as contours, angle of slope, steepest downslope direction (aspect), shaded relief (hillshade), and viewshed.

·       Contours can be useful for finding areas of the same value.

·       You may be interested in obtaining elevation values for specific locations and examining the overall gradation of the land.

·       You may, for instance, want to know the variations in the slope of the landscape because you want to find the areas most at risk of landslide based on the angle of slopes in an area (steeper slopes being those most at risk).

 

 

                   

Input Elevation Raster                          Output Contours

Steeper angle of slope

Output Slope

 

You may be a farmer interested in locating a field on an area with a southerly aspect. You can create a hillshade for both analytical and graphical purposes.

Output Aspect

 

Graphically, a hillshade can provide an attractive and realistic backdrop showing how other layers are distributed in relation to the terrain relief.

 

Output hillshade

 

From an analytical point of view, you can, for instance, analyze how the landscape is illuminated at various times of the day by lowering and raising the sun angle used in the analysis.

             

          Azimuth 45°                                         Azimuth 315°

 

Calculating viewshed is useful when you want to know how visible objects will be. For instance, you might want to find the location with the most expansive view in an area because you want to know the best location for a lookout. Display a hillshade transparently underneath the result from the Viewshed function.

 

                

          Input elevation raster                           Output viewshed                        Combined

 

Contour

What are contours?

·       Contours are polylines that connect points of equal value (such as elevation, temperature, precipitation, pollution, or atmospheric pressure).

·       The distribution of the polylines shows how values change across a surface.

·       Where there is little change in a value, the polylines are spaced farther apart.

·       Where the values rise or fall rapidly, the polylines are closer together.

 

Why create contours?

·       By following the polyline of a particular contour, you can identify which locations have the same value.

·       Contours are also a useful surface representation because they allow you to simultaneously visualize flat and steep areas (distance between contours) and ridges and valleys (converging and diverging polylines).

·       The example below shows an input elevation dataset and the output contour dataset.

·       The areas where the contours are closer together indicate the steeper locations.

·       They correspond with the areas of higher elevation (in white on the input elevation dataset).

 

        

Input elevation dataset                         Output contour dataset

 

The Contour attribute table contains an elevation attribute for each contour polyline.

 

 

 

What is slope?

The Slope function calculates the maximum rate of change between each cell and its neighbors - for example, the steepest downhill descent for the cell (the maximum change in elevation over distance between the cell and its eight neighbors). Every cell in the output raster has a slope value. The lower the slope value, the flatter the terrain; the higher the slope value, the steeper the terrain. The output slope dataset can be calculated as percent slope or degree of slope.
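
The description above can be turned into a small sketch that finds, for each cell, the largest change in elevation over distance among its eight neighbors. This follows the wording of the paragraph only; the function name is hypothetical, and the actual tool may estimate slope from the neighborhood in a different way.

```python
import math


def max_slope_percent(elevation, cell_size):
    """Percent slope as the maximum rate of change to any of 8 neighbors."""
    rows, cols = len(elevation), len(elevation[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    run = math.hypot(dr, dc) * cell_size  # diagonals are longer
                    rise = abs(elevation[r][c] - elevation[nr][nc])
                    best = max(best, rise / run)
            out[r][c] = best * 100.0
    return out


elevation = [[10, 10, 10],
             [10, 10, 10],
             [0, 0, 0]]
for row in max_slope_percent(elevation, cell_size=10):
    print(row)
```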

 

Degree of slope = Θ, where tan Θ = rise / run

Percent of slope = (rise / run) * 100

 

 

 

Degree of slope =             30                                 45                     76

Percent slope =                58                               100                    401

 

·       When the slope angle equals 45 degrees, the rise is equal to the run.

·       Expressed as a percentage, the slope of this angle is 100 percent.

·       Note that as the slope approaches vertical (90°), the percentage slope approaches infinity.
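
The relationship between the two slope measures can be checked with a short conversion sketch. The function names are hypothetical; the formulas follow directly from tan Θ = rise / run.

```python
import math


def degrees_to_percent(slope_degrees):
    """Percent slope = tan(angle) * 100."""
    return math.tan(math.radians(slope_degrees)) * 100.0


def percent_to_degrees(slope_percent):
    """Degree of slope = arctan(percent / 100)."""
    return math.degrees(math.atan(slope_percent / 100.0))


for angle in (30, 45, 89):
    print(angle, "degrees =", round(degrees_to_percent(angle)), "percent")
```

The last value shows how quickly percent slope grows as the angle approaches vertical.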

 

·       The Slope function is most frequently run on an elevation dataset, as the following diagrams show.

·       Steeper slopes are shaded red on the output slope dataset.

·       It can also be used with other types of continuous data, such as population, to identify sharp changes in value. 

 

What is aspect?

·       Aspect identifies the steepest downslope direction from each cell to its neighbors.

·       It can be thought of as slope direction or the compass direction a hill faces.

·       It is measured clockwise in degrees from 0 (due north) to 360 (again due north, coming full circle).

·       The value of each cell in an aspect dataset indicates the direction the cell’s slope faces.

·       Flat slopes have no direction and are given a value of -1.

·       The diagram below shows an input elevation dataset and the output aspect raster.
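
A sketch of an aspect calculation for a single 3 x 3 window is given below. It is an illustration under stated assumptions, not the Aspect tool: the function name is hypothetical, row 0 of the window is taken to be north, and the gradient is estimated from the four edge neighbors only.

```python
import math


def aspect_degrees(window, cell_size):
    """Compass direction (degrees) of steepest descent for a 3x3 window.

    Returns -1 for a flat window, matching the convention in the text.
    """
    # Rate of elevation increase toward the east and toward the north.
    dz_east = (window[1][2] - window[1][0]) / (2 * cell_size)
    dz_north = (window[0][1] - window[2][1]) / (2 * cell_size)
    if dz_east == 0 and dz_north == 0:
        return -1
    # Downslope points opposite to the gradient; atan2(east, north)
    # gives a bearing measured clockwise from north.
    return math.degrees(math.atan2(-dz_east, -dz_north)) % 360.0


falls_to_east = [[30, 20, 10]] * 3
falls_to_south = [[30, 30, 30], [20, 20, 20], [10, 10, 10]]
flat = [[5, 5, 5]] * 3

print(aspect_degrees(falls_to_east, 10))   # east-facing slope
print(aspect_degrees(falls_to_south, 10))  # south-facing slope
print(aspect_degrees(flat, 10))            # flat
```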

 



 

 

Why use the Aspect function?

With the Aspect function, you can:

·       Find all north-facing slopes on a mountain as part of a search for the best slopes for ski runs.

·       Calculate the solar illumination for each location in a region as part of a study to determine the diversity of life at each site.

·       Find all southerly slopes in a mountainous region to identify locations where the snow is likely to melt first as part of a study to identify those residential locations that are likely to be hit by meltwater first.

·       Identify areas of flat land to find an area for a plane to land in an emergency.

 

Hillshade

·       The Hillshade function obtains the hypothetical illumination of a surface by determining illumination values for each cell in a raster.

·       It does this by setting a position for a hypothetical light source and calculating the illumination values of each cell in relation to neighboring cells.

·       It can greatly enhance the visualization of a surface for analysis or graphical display.

·       By default, shadow and light are shades of gray associated with integers from 0 to 255 (increasing from black to white).

 

Azimuth is the angular direction of the sun, measured from north in clockwise degrees from 0 to 360. An azimuth of 90 is east. The default is 315 (NW).

Altitude is the slope or angle of the illumination source above the horizon. The units are in degrees, from 0 (on the horizon) to 90 degrees (overhead). The default is 45 degrees.
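
How azimuth and altitude combine with slope and aspect can be sketched with a commonly used illumination formula. This is an assumption for illustration and is not taken from these notes: the function name hillshade_value is hypothetical, and the formula shown is one standard way to compute a 0 to 255 shade value from the angles defined above.

```python
import math


def hillshade_value(slope_deg, aspect_deg, azimuth_deg=315.0, altitude_deg=45.0):
    """Illumination of one cell on a 0 (black) to 255 (white) scale."""
    zenith = math.radians(90.0 - altitude_deg)  # angle of the sun from overhead
    slope = math.radians(slope_deg)
    # Angle between the sun's direction and the direction the slope faces.
    delta = math.radians(azimuth_deg - aspect_deg)
    value = 255.0 * (math.cos(zenith) * math.cos(slope)
                     + math.sin(zenith) * math.sin(slope) * math.cos(delta))
    return max(0, min(255, round(value)))


print(hillshade_value(0, 0))      # flat ground under the default sun position
print(hillshade_value(45, 315))   # 45 degree slope facing the light source
print(hillshade_value(45, 135))   # 45 degree slope facing away from it
```

Changing azimuth_deg or altitude_deg in the call moves the hypothetical light source, which is how the illumination at different times of day can be explored.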

 

  

 

Using hillshading for display

·       By placing an elevation raster on top of a created hillshade, then making the elevation raster transparent, you can create realistic images of the landscape.

·       Add other layers, such as roads or streams, to further increase the