Recheck on Points broken

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Recheck on Points broken

Regina Obe-2
I know this is a long thread:

https://trac.osgeo.org/postgis/ticket/3418#comment:9

So I'll try to summarize.

We've got a problem with Recheck when comparing 2 geometry points.

Presumably the issue is because our index calculation (point / point) is
returning a distance > actual distance which is not allowed by recheck
logic.

I'm assuming this has something to do with our float/float8 whatever calcs
(though I thought we switched to float8 for index (or was that just for
bbox) and index is still based on float4? )

Anyrate if there is nothing that can be done about ensuring index distance
is always less or equal to actual distance, I think the best course of
action is to disable recheck for point geometries (which is what PostgreSQL
code does for their point to point).

I'm not quite sure how to determine that in this code to do that?

http://postgis.net/docs/doxygen/2.4/df/d5e/gserialized__gist__2d_8c_source.h
tml#l01210

1204         {
 1205                 /* In all cases, since we only have keys (boxes) we'll
return */
 1206                 /* the minimum possible distance, which is the
box2df_distance */
 1207                 /* and let the recheck sort things out in the case of
leaves */
 1208                 distance = (double)box2df_distance(entry_box,
&query_box);
 1209
 1210                 if (GIST_LEAF(entry))
 1211                         *recheck = true;
 1212         }


Thanks,
Regina

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Recheck on Points broken

Sandro Santilli-3
On Tue, Jan 03, 2017 at 12:53:59AM -0500, Regina Obe wrote:

> https://trac.osgeo.org/postgis/ticket/3418#comment:9

> I'm assuming this has something to do with our float/float8 whatever calcs
> (though I thought we switched to float8 for index (or was that just for
> bbox) and index is still based on float4? )

Index is still float4

> Anyrate if there is nothing that can be done about ensuring index distance
> is always less or equal to actual distance

I remember we were all happy about having *real* distance with a more
recent PostgreSQL. If that's not the case (as the user reports) we should
try to find a way to provide both mechanisms. Two different operators, maybe ?

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Recheck on Points broken

Regina Obe-2

>> Anyrate if there is nothing that can be done about ensuring index
>> distance is always less or equal to actual distance

> I remember we were all happy about having *real* distance with a more recent PostgreSQL. If that's not the case (as the user reports) we should try to find a way to provide both mechanisms.
> Two different operators, maybe ?
> --strk;

We've already got two different operators.  We have <#> which doesn't use recheck and that's what I told the user to use, until we resolve this.

The thing is the <-> I think pretty much for all reasonable purposes returns the right order for points (and in fact in this case yields the same value and <-> I think in all cases it will).  

So it's just the float4/float8 mess why it screws up and it Errors noisily in PostgreSQL land where we cannot control.

I hate telling people to if you have point / point use <#> and if you have others you can safely use  <->.  That's sooo unfriendly.

I think the best thing is just to not have point-point checks use recheck.  Can we do that easily? That's what PostgreSQL point/point does.
That way we don't fall into the PostgreSQL box trap of

"Hey -- how come your index distance is bigger than your computed distance.  What kind of monster are you? Screw you!"

Thanks,
Regina

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Recheck on Points broken

Paul Ramsey-3
Are we 100% sure this is a float/double issue? The distance between two float boxes should always be *smaller* than the actual distance between two objects, since the float box extrema are always moved *away* from the original object when the float boxes are created. So from a "theoretical mechanics" perspective this shouldn't actually be the source of the problem.

From a real world point-of-view, how could it still be the problem?
- maybe the box generation code has a short circuit for points which doesn't do the correct thing? Pretty sure it doesn't, but.
- something else?

P


On Tue, Jan 3, 2017 at 4:54 AM, Regina Obe <[hidden email]> wrote:

>> Anyrate if there is nothing that can be done about ensuring index
>> distance is always less or equal to actual distance

> I remember we were all happy about having *real* distance with a more recent PostgreSQL. If that's not the case (as the user reports) we should try to find a way to provide both mechanisms.
> Two different operators, maybe ?
> --strk;

We've already got two different operators.  We have <#> which doesn't use recheck and that's what I told the user to use, until we resolve this.

The thing is the <-> I think pretty much for all reasonable purposes returns the right order for points (and in fact in this case yields the same value and <-> I think in all cases it will).

So it's just the float4/float8 mess why it screws up and it Errors noisily in PostgreSQL land where we cannot control.

I hate telling people to if you have point / point use <#> and if you have others you can safely use  <->.  That's sooo unfriendly.

I think the best thing is just to not have point-point checks use recheck.  Can we do that easily? That's what PostgreSQL point/point does.
That way we don't fall into the PostgreSQL box trap of

"Hey -- how come your index distance is bigger than your computed distance.  What kind of monster are you? Screw you!"

Thanks,
Regina

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Recheck on Points broken

Regina Obe-2

Nope not 100% sure was just a guess based on what Robert Haas said and my assumption that we don't generate boxes for points?

I thought we don't cause it's just baggage, which means we are using the point where as in other cases it is a box distance.

 

I did try to put a sniffer on where it's measuring distance adist[i], bdist[i] around 396 but always got a backend crash.

 

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/executor/nodeIndexscan.c;h=3143bd94ec4499fba94b41693538b785c4b32e6c;hb=HEAD#l396

 

372 static int

373 cmp_orderbyvals(const Datum *adist, const bool *anulls,

374                 const Datum *bdist, const bool *bnulls,

375                 IndexScanState *node)

376 {

377     int         i;

378     int         result;

379

 380     for (i = 0; i < node->iss_NumOrderByKeys; i++)

381     {

382         SortSupport ssup = &node->iss_SortSupport[i];

383

 384         /*

385          * Handle nulls.  We only need to support NULLS LAST ordering, because

386          * match_pathkeys_to_index() doesn't consider indexorderby

387          * implementation otherwise.

388          */

389         if (anulls[i] && !bnulls[i])

390             return 1;

391         else if (!anulls[i] && bnulls[i])

392             return -1;

393         else if (anulls[i] && bnulls[i])

394             return 0;

395

 396         result = ssup->comparator(adist[i], bdist[i], ssup);

397         if (result != 0)

398             return result;

399     }

400

 401     return 0;

402 }

 

You will probably have much better luck since I was just fumbling my way thru the code.

 

Thanks,

Regina

 

 

 

From: postgis-devel [mailto:[hidden email]] On Behalf Of Paul Ramsey
Sent: Tuesday, January 03, 2017 10:00 AM
To: PostGIS Development Discussion <[hidden email]>
Subject: Re: [postgis-devel] Recheck on Points broken

 

Are we 100% sure this is a float/double issue? The distance between two float boxes should always be *smaller* than the actual distance between two objects, since the float box extrema are always moved *away* from the original object when the float boxes are created. So from a "theoretical mechanics" perspective this shouldn't actually be the source of the problem.

 

From a real world point-of-view, how could it still be the problem?

- maybe the box generation code has a short circuit for points which doesn't do the correct thing? Pretty sure it doesn't, but.

- something else?

P

 

On Tue, Jan 3, 2017 at 4:54 AM, Regina Obe <[hidden email]> wrote:


>> Anyrate if there is nothing that can be done about ensuring index
>> distance is always less or equal to actual distance

> I remember we were all happy about having *real* distance with a more recent PostgreSQL. If that's not the case (as the user reports) we should try to find a way to provide both mechanisms.
> Two different operators, maybe ?
> --strk;

We've already got two different operators.  We have <#> which doesn't use recheck and that's what I told the user to use, until we resolve this.

The thing is the <-> I think pretty much for all reasonable purposes returns the right order for points (and in fact in this case yields the same value and <-> I think in all cases it will).

So it's just the float4/float8 mess why it screws up and it Errors noisily in PostgreSQL land where we cannot control.

I hate telling people to if you have point / point use <#> and if you have others you can safely use  <->.  That's sooo unfriendly.

I think the best thing is just to not have point-point checks use recheck.  Can we do that easily? That's what PostgreSQL point/point does.
That way we don't fall into the PostgreSQL box trap of

"Hey -- how come your index distance is bigger than your computed distance.  What kind of monster are you? Screw you!"

Thanks,
Regina


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel

 


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Recheck on Points broken

Regina Obe-2
In reply to this post by Paul Ramsey-3

Just to add to the mix, here is Robert Haas' later comment to my further questions from full conversation thread –

 

https://www.postgresql.org/message-id/flat/000001d26260%24d4a2d7b0%247de88710%24%40pcorp.us#000001d26260$d4a2d7b0$7de88710$@pcorp.us

 

On Tue, Jan 3, 2017 at 12:36 AM, Regina Obe <lr(at)pcorp(dot)us> wrote:

>> cmp would return 0 if the estimated distance returned by the index AM were greater than the actual distance.

>> The estimated distance can be less than the actual distance, but it isn't allowed to be more.  See gist_bbox_distance for an example of a "lossy" distance calculation, and more generally "git show 35fcb1b3d038a501f3f4c87c05630095abaaadab".

> 

> Did you mean would return < 0 ?

 

Yes, sorry.

 

> Since I thought 0 meant exact and not where it's Erroring?

> 

> I think for points then maybe we should turn it off, as this could just be floating point issues with the way we compute the index.

> That would explain why it doesn't happen for other cases like  polygon / point in our code

> or polygon /polygon in our code since the box box distance in our code would always be <= actual distance for those.

> 

> So maybe the best course of action is just for us inspect the geometries and if both are points just disable recheck.

> 

> It's still not quite clear to me even looking at that git commit, why those need to error instead of going thru recheck aside from efficiency.

 

The code that reorders the returned tuples assumes that (1) the actual

distance is always greater than or equal to the estimated distance and

(2) the index returns the tuples in order of increasing estimated

distance.  Imagine that the estimated distances are 0, 1, 2, 3... and

the real distances are 2,3,4,5...  When it sees the

estimated-distance-0 tuple it computes that the actual distance is 2,

but it doesn't know whether there's going to be a tuple later with an

actual distance between 0 and 2, so it buffers the tuple. When it sees

the estimated-distance-1 tuple it computes that the actual distance is

2, and now it knows there won't be any more estimated or actual

distances between 0 and 1, but there could still be a tuple with an

estimated distance of 1 and 2 whose actual distance is also between 1

and 2, so it buffers the second tuple as well.  When it sees the third

tuple, with estimated distance 2, it now knows that there won't be any

further tuples whose estimated or actual distance is less than 2.  So

now it can emit the first tuple that it saw, because that had an

actual distance of 2 and from this point forward the index will never

return anything with a smaller estimated or actual distance.  The

estimated-distance-1 tuple still has to stay in the buffer, though,

until we see a tuple whose estimated distance is greater than that

tuple's actual distance (3).

 

 

From: Regina Obe [mailto:[hidden email]]
Sent: Tuesday, January 03, 2017 1:53 PM
To: 'PostGIS Development Discussion' <[hidden email]>
Subject: RE: [postgis-devel] Recheck on Points broken

 

Nope not 100% sure was just a guess based on what Robert Haas said and my assumption that we don't generate boxes for points?

I thought we don't cause it's just baggage, which means we are using the point where as in other cases it is a box distance.

 

I did try to put a sniffer on where it's measuring distance adist[i], bdist[i] around 396 but always got a backend crash.

 

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/executor/nodeIndexscan.c;h=3143bd94ec4499fba94b41693538b785c4b32e6c;hb=HEAD#l396

 

372 static int

373 cmp_orderbyvals(const Datum *adist, const bool *anulls,

374                 const Datum *bdist, const bool *bnulls,

375                 IndexScanState *node)

376 {

377     int         i;

378     int         result;

379

 380     for (i = 0; i < node->iss_NumOrderByKeys; i++)

381     {

382         SortSupport ssup = &node->iss_SortSupport[i];

383

 384         /*

385          * Handle nulls.  We only need to support NULLS LAST ordering, because

386          * match_pathkeys_to_index() doesn't consider indexorderby

387          * implementation otherwise.

388          */

389         if (anulls[i] && !bnulls[i])

390             return 1;

391         else if (!anulls[i] && bnulls[i])

392             return -1;

393         else if (anulls[i] && bnulls[i])

394             return 0;

395

 396         result = ssup->comparator(adist[i], bdist[i], ssup);

397         if (result != 0)

398             return result;

399     }

400

 401     return 0;

402 }

 

You will probably have much better luck since I was just fumbling my way thru the code.

 

Thanks,

Regina

 

 

 

From: postgis-devel [[hidden email]] On Behalf Of Paul Ramsey
Sent: Tuesday, January 03, 2017 10:00 AM
To: PostGIS Development Discussion <[hidden email]>
Subject: Re: [postgis-devel] Recheck on Points broken

 

Are we 100% sure this is a float/double issue? The distance between two float boxes should always be *smaller* than the actual distance between two objects, since the float box extrema are always moved *away* from the original object when the float boxes are created. So from a "theoretical mechanics" perspective this shouldn't actually be the source of the problem.

 

From a real world point-of-view, how could it still be the problem?

- maybe the box generation code has a short circuit for points which doesn't do the correct thing? Pretty sure it doesn't, but.

- something else?

P

 

On Tue, Jan 3, 2017 at 4:54 AM, Regina Obe <[hidden email]> wrote:


>> Anyrate if there is nothing that can be done about ensuring index
>> distance is always less or equal to actual distance

> I remember we were all happy about having *real* distance with a more recent PostgreSQL. If that's not the case (as the user reports) we should try to find a way to provide both mechanisms.
> Two different operators, maybe ?
> --strk;

We've already got two different operators.  We have <#> which doesn't use recheck and that's what I told the user to use, until we resolve this.

The thing is the <-> I think pretty much for all reasonable purposes returns the right order for points (and in fact in this case yields the same value and <-> I think in all cases it will).

So it's just the float4/float8 mess why it screws up and it Errors noisily in PostgreSQL land where we cannot control.

I hate telling people to if you have point / point use <#> and if you have others you can safely use  <->.  That's sooo unfriendly.

I think the best thing is just to not have point-point checks use recheck.  Can we do that easily? That's what PostgreSQL point/point does.
That way we don't fall into the PostgreSQL box trap of

"Hey -- how come your index distance is bigger than your computed distance.  What kind of monster are you? Screw you!"

Thanks,
Regina


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel

 


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Recheck on Points broken

Regina Obe-2
In reply to this post by Paul Ramsey-3

Okay I did some more snooping of the distance outputs.

 

Using this query that consistently fails:

 

create table test (geo geometry);

insert into test values

  ('0101000020E61000007D91D0967329E4BF6631B1F9B8D64A40'::geometry),

  ('0101000020E6100000E2AFC91AF510C1BFCDCCCCCCCCAC4A40'::geometry);

 

set enable_seqscan = false;

select *,  '0101000020E610000092054CE0D6DDE5BFCDCCCCCCCCAC4A40'::geometry <-> geo , ST_Distance('0101000020E610000092054CE0D6DDE5BFCDCCCCCCCCAC4A40'::geometry, geo)

from test ORDER BY geo <->

('0101000020E610000092054CE0D6DDE5BFCDCCCCCCCCAC4A40'::geometry);

 

     ?column?      |    st_distance

-------------------+-------------------

0.331823813642119 | 0.331823813642119

              0.55 |              0.55

(2 rows)

 

The issue seems to be with the second record that the approximate distance 0.5500000111921  is just a wee bit bigger than the actual distances.

Which as you said should never happen.

 

NOTICE:  bbox2df_distance: (0.331818528656)

NOTICE:  bbox2df_distance: (0.550000011921)

NOTICE:  index returned tuples in wrong order. cmp: -1

 

Does this create beaming lights in your head?

 

From: postgis-devel [mailto:[hidden email]] On Behalf Of Paul Ramsey
Sent: Tuesday, January 03, 2017 10:00 AM
To: PostGIS Development Discussion <[hidden email]>
Subject: Re: [postgis-devel] Recheck on Points broken

 

Are we 100% sure this is a float/double issue? The distance between two float boxes should always be *smaller* than the actual distance between two objects, since the float box extrema are always moved *away* from the original object when the float boxes are created. So from a "theoretical mechanics" perspective this shouldn't actually be the source of the problem.

 

From a real world point-of-view, how could it still be the problem?

- maybe the box generation code has a short circuit for points which doesn't do the correct thing? Pretty sure it doesn't, but.

- something else?

P

 

On Tue, Jan 3, 2017 at 4:54 AM, Regina Obe <[hidden email]> wrote:


>> Anyrate if there is nothing that can be done about ensuring index
>> distance is always less or equal to actual distance

> I remember we were all happy about having *real* distance with a more recent PostgreSQL. If that's not the case (as the user reports) we should try to find a way to provide both mechanisms.
> Two different operators, maybe ?
> --strk;

We've already got two different operators.  We have <#> which doesn't use recheck and that's what I told the user to use, until we resolve this.

The thing is the <-> I think pretty much for all reasonable purposes returns the right order for points (and in fact in this case yields the same value and <-> I think in all cases it will).

So it's just the float4/float8 mess why it screws up and it Errors noisily in PostgreSQL land where we cannot control.

I hate telling people to if you have point / point use <#> and if you have others you can safely use  <->.  That's sooo unfriendly.

I think the best thing is just to not have point-point checks use recheck.  Can we do that easily? That's what PostgreSQL point/point does.
That way we don't fall into the PostgreSQL box trap of

"Hey -- how come your index distance is bigger than your computed distance.  What kind of monster are you? Screw you!"

Thanks,
Regina


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel

 


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Loading...