Query to select dangling line segments

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Query to select dangling line segments

Brian Stempin
Hi All!
I'm currently trying to write a query to do the following: I need to find all line segments that have an end point or a start point that doesn't overlap with everything.  Here's what I had so far:

SELECT t1.osm_id, ST_StartPoint(t1.way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
"OSMData".osm_mn_data_highway_20101129_101234 t2
WHERE t1.way && t2.way
AND t1.osm_id <> t2.osm_id
AND NOT ST_Intersects(ST_StartPoint(t1.way), t2.way)
UNION
SELECT t1.osm_id, ST_EndPoint(t1.way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
"OSMData".osm_mn_data_highway_20101129_101234 t2
WHERE t1.way && t2.way
AND t1.osm_id <> t2.osm_id
AND NOT ST_Intersects(ST_EndPoint(t1.way), t2.way)

way = GEOMETRY
osm_id = INTEGER (id)

(I'm working on some OpenStreetMap data in case you're curious)
This query doesn't give me what I want.  If there exists a t2 within t1's bounding box that doesn't intersect t1's start/end point, then it gets included into the list (false positives).  I want to select rows who's geometry's start/end point does not intersect *anything*.  I have a feeling that I should be using some sort of grouping operator, but I'm lost.

Any help would be greatly appreciated.

Brian Stempin

_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Query to select dangling line segments

Kevin Neufeld-2
On 11/29/2010 11:54 AM, Brian Stempin wrote:
I want to select rows who's geometry's start/end point does not intersect *anything*

Is your dataset fully noded (where the only intersection between geometries occur at the endpoints)?

If so, you could perform a very quick GROUP BY query to find degree-1 nodes in your network.

SELECT min(osm_id), pt
FROM (
   SELECT osm_id, ST_StartPoint(way) AS pt
   FROM "OSMData".osm_mn_data_highway_20101129_101234

   UNION ALL

   SELECT osm_id, ST_EndPoint(way) AS pt
   FROM "OSMData".osm_mn_data_highway_20101129_101234
) AS grouped
GROUP BY pt
HAVING count(*) = 1;



Note though, that this GROUP BY approach uses the bounding boxes of the endpoints, *not* the geometries themselves.  So for this to work, the precision of your dataset must be less than what can be represented in the float4 representation of the bounding box.
ie.
SELECT st_astext(st_collect(column1))
FROM ( VALUES
  ('POINT(0 0)'::geometry),
  ('POINT(0 1)'::geometry),
  ('POINT(0 0.0000001)'::geometry)
) AS foo
GROUP BY column1;
        st_astext       
-------------------------
 MULTIPOINT(0 0,0 1e-07)
 MULTIPOINT(0 1)
(2 rows)


See how the first and last points have the same bounding box though different geometries?.

As long as your data is fully noded and the precision is less than what can be represented as a float4, then this approach works very fast (no expensive spatial predicate operations)

Cheers,
Kevin



_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Query to select dangling line segments

Brian Stempin
Hi Kevin,
Thanks for taking a stab at my problem.

Response in-line:

On Mon, Nov 29, 2010 at 4:36 PM, Kevin Neufeld <kneufeld.ca@gmail.com> wrote:
On 11/29/2010 11:54 AM, Brian Stempin wrote:
I want to select rows who's geometry's start/end point does not intersect *anything*

Is your dataset fully noded (where the only intersection between geometries occur at the endpoints)?

There are places where segments run through endpoints and segments run through other segments.
 

If so, you could perform a very quick GROUP BY query to find degree-1 nodes in your network.

SELECT min(osm_id), pt
FROM (
   SELECT osm_id, ST_StartPoint(way) AS pt

   FROM "OSMData".osm_mn_data_highway_20101129_101234

   UNION ALL

   SELECT osm_id, ST_EndPoint(way) AS pt

   FROM "OSMData".osm_mn_data_highway_20101129_101234
) AS grouped
GROUP BY pt
HAVING count(*) = 1;

Yeah, I'm getting waaay too many results from this query.  I take it that this is because my data is not fully noded?
 
Note though, that this GROUP BY approach uses the bounding boxes of the endpoints, *not* the geometries themselves.  So for this to work, the precision of your dataset must be less than what can be represented in the float4 representation of the bounding box.

My data is precise enough to support this.
 
ie.
SELECT st_astext(st_collect(column1))
FROM ( VALUES
  ('POINT(0 0)'::geometry),
  ('POINT(0 1)'::geometry),
  ('POINT(0 0.0000001)'::geometry)
) AS foo
GROUP BY column1;
        st_astext       
-------------------------
 MULTIPOINT(0 0,0 1e-07)
 MULTIPOINT(0 1)
(2 rows)


See how the first and last points have the same bounding box though different geometries?.

As long as your data is fully noded and the precision is less than what can be represented as a float4, then this approach works very fast (no expensive spatial predicate operations)

Cheers,
Kevin

Thanks for the stab.  I'll stew on this more when I get home.  Time to catch the train!
Brian 

_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Query to select dangling line segments

Kevin Neufeld-2
On 11/29/2010 2:16 PM, Brian Stempin wrote:
Is your dataset fully noded (where the only intersection between geometries occur at the endpoints)?

There are places where segments run through endpoints and segments run through other segments.

Ah.  Then in that case the GROUP BY approach won't work for you.  I used it quite successfully over a large fully-noded stream network I used to work on where I needed to look for locations of interest (head waters, degree-2 nodes, confluences of 3 streams coming together, etc).

So for you, what if you reversed the logic of your query?  Do a self-join as you specified, but look for the ids that intersect and then negate it.
ie.

-- list all ids where the startpoint doesn't intersect
SELECT osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND ST_Intersects(ST_StartPoint(t1.way), t2.way)

)

-- do the same for endpoints.


Alternatively, you should get similar results by doing a LEFT JOIN and filter all cases that don't match.
-- This is of course untested, but here's the idea
SELECT t1.osm_id
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
   LEFT JOIN "OSMData".osm_mn_data_highway_20101129_101234 t2
      ON (t1.osm_id <> t2.osm_id AND
          ST_Intersects(ST_StartPoint(t1.way), t2.way))
WHERE t2.osm_id IS NULL;


Cheers,
Kevin


_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Query to select dangling line segments

Brian Stempin
Clever!  I think that first one will do it for me.  I'll try it later today and get back to you.  Thanks for the boost!

Brian

_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Query to select dangling line segments

Brian Stempin

So, I decided to run with the following query:

SELECT osm_id 
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND ST_Intersects(ST_StartPoint(t1.way), t2.way)

)
UNION
SELECT osm_id 
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND ST_Intersects(ST_EndPoint(t1.way), t2.way)

)

In my dataset, this takes ~ 8 seconds to run.  Being a bit of a performance junkee, I modified it to the following:

SELECT osm_id, ST_StartPoint(way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND t2.way ~ t1.way
   AND ST_Intersects(ST_StartPoint(t1.way), t2.way)

)
UNION
SELECT osm_id, ST_EndPoint(way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND t2.way ~ t1.way
   AND ST_Intersects(ST_EndPoint(t1.way), t2.way)

)

Note the addition of the "t2.way ~ t1.way" bit.  "A ~ B" means "A's bounding box contains B's bounding box."  Since in my case B is a point, I eliminate a lot of comparisons by only looking at shapes who's bounding box B is contained in.  This reduced my query time from 8.x seconds to 3.4x seconds.

Thanks a ton, Kevin!

Brian
PS -- anyone else spot any other improvements that I can make?  I plan on writing about this in a blog article as part of a larger piece later tonight/tomorrow.

_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Query to select dangling line segments

Kevin Neufeld-2
Glad you got it working. 

Just a little note of warning ... I noticed you are using "UNION", not "UNION ALL".  The difference is that the later will perform a simple concatenation of the two results sets.  The former is kind of like a set union, returning results that are in the first or second query, removing duplicate rows.  In your 8 second query, not a big deal, but in your 3 second query, you've include a geometry object.  In determining record equivalence to filter duplicate rows, it is the = operator that is invoked against two geometry objects, which goes back to comparing bounding box equivalence, not geometry equivalence.

i.e.
SELECT 'POINT(0 0)'::geometry
UNION
SELECT 'POINT(0 0.0000001)'::geometry;
                  geometry                 
--------------------------------------------
 010100000000000000000000000000000000000000
(1 row)


SELECT 'POINT(0 0)'::geometry
UNION ALL
SELECT 'POINT(0 0.0000001)'::geometry;
                  geometry                 
--------------------------------------------
 010100000000000000000000000000000000000000
 0101000000000000000000000048AFBC9AF2D77A3E
(2 rows)


Not a big deal, but I just wanted to make sure you knew what was happening with your query.
UNION ALL should also be faster as it's not trying to merge two results sets together.

Cheers,
Kevin


On 11/30/2010 2:34 PM, Brian Stempin wrote:

So, I decided to run with the following query:

SELECT osm_id 
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND ST_Intersects(ST_StartPoint(t1.way), t2.way)

)
UNION
SELECT osm_id 
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND ST_Intersects(ST_EndPoint(t1.way), t2.way)

)

In my dataset, this takes ~ 8 seconds to run.  Being a bit of a performance junkee, I modified it to the following:

SELECT osm_id, ST_StartPoint(way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND t2.way ~ t1.way
   AND ST_Intersects(ST_StartPoint(t1.way), t2.way)

)
UNION
SELECT osm_id, ST_EndPoint(way)
FROM "OSMData".osm_mn_data_highway_20101129_101234 t1
WHERE osm_id NOT IN (

   -- list all ids where the startpoint intersects something.
   SELECT t1.osm_id
   FROM "OSMData".osm_mn_data_highway_20101129_101234 t1,
   "OSMData".osm_mn_data_highway_20101129_101234 t2
   WHERE t1.osm_id <> t2.osm_id
   AND t2.way ~ t1.way
   AND ST_Intersects(ST_EndPoint(t1.way), t2.way)

)

Note the addition of the "t2.way ~ t1.way" bit.  "A ~ B" means "A's bounding box contains B's bounding box."  Since in my case B is a point, I eliminate a lot of comparisons by only looking at shapes who's bounding box B is contained in.  This reduced my query time from 8.x seconds to 3.4x seconds.

Thanks a ton, Kevin!

Brian
PS -- anyone else spot any other improvements that I can make?  I plan on writing about this in a blog article as part of a larger piece later tonight/tomorrow.
_______________________________________________ postgis-users mailing list [hidden email] http://postgis.refractions.net/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Query to select dangling line segments

Brian Stempin
Good catch!  This could have really hurt if I wasn't paying attention.  Thanks!
Speaking of not paying attention:  it turns out that "~" doesn't work in this case.  I'm guessing that since the point's BB isn't completely inside of it's line's BB (they usually share an edge), I end up getting every single start/end point.  The "&&" operator, however, does work and appears to speed things up.

Brian

_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users