Generating random locations along roads using PostGIS
[Category: Programming and Databases] [link] [Date: 2013-12-16 01:32:41]
After my previous article pertaining to rolling accumulated totals using windowed functions (link), it dawned on me that this same technique could be used to generate random placements along a set of roads. In order to do this, each segment is defined as a start and end point of one continuous line - then a random location is chosen along this combined line. Thus, road segments that are longer are more likely to receive placements. The query could be re-written a number of ways to accommodate different needs such as equal-spaced points, no weighting based on segment length, or weighting based on some other variable.
Random locations along a line segment are commonly used as either representations of some form of demand, or as sample points for a study or survey. Perhaps random locations along creeks within a township need to be generated to take an unbiased sample of water quality, or locations are created within an area to represent buildings or people. In my demonstration below, I use 2010 Tiger Edges in Kenton County, KY provided by the Census Bureau and filter only to secondary roads (mtfcc='S1200') to limit my result set.
The picture above shows data generated using the query below.
select ss.id ,sr.tlid ,st_line_interpolate_point(st_linemerge(sr.geom) ,(ss.randlength-sr.startlength)/(sr.endlength-sr.startlength)) geom from ( select n.n id,random()*l.totallength randlength from data.numbers n cross join (select sum(st_length(geom)) totallength from data.edges2010 where mtfcc='S1200') l where n.n<=100 ) ss join ( select tlid ,geom ,sum(st_length(geom)) over (partition by 1 order by tlid)-st_length(geom) startlength ,sum(st_length(geom)) over (partition by 1 order by tlid) endlength from data.edges2010 where mtfcc='S1200' ) sr on ss.randlength between sr.startlength and sr.endlength
There's quite a bit going on in the query above. The ss subquery uses a numbers table containing a single column counting from 1 to 1,000,000. Numbers tables are useful for expanding information in a query. In this case, the expansion is on the number of random locations to generate. PostgreSQL also has a generate_series() function which can be used to achieve the same result. Within the sr subquery, the magic of windowed function support for aggregates is seen in the startlength and endlength fields. Finally, the linear referencing function st_line_interpolate_point() is used to locate the point along the segment in question.