I'd like to remove this Toast discussion

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

I'd like to remove this Toast discussion

Regina Obe-2
https://postgis.net/docs/performance_tips.html#small_tables_large_objects

This is the last bit of business before I can release PostGIS 2.3.3.

I'd like to get rid of this whole discussion of TOAST as I don't think it's
relevant anymore.

Am I correct in assuming that?

My assumption of why I think it's not relevant anymore is that we store
cached boxes for most geometries except for Points and 2 point lines.
So isn't it so that even if there is no index on a geometry, we do not need
to fully detoast (or possibly not to detoast at all) the geometry to read
the bounding box.
That it is part of the head of the geometry?  

I did a quick test of my states table - 52 records (granted they aren't
huge, but toast size is about 14 MB)

When I do something like below without forcing enable_seqscan = false; it
doesn't use an index (well sometimes it does even without urging)

SELECT stusps
from tiger_data.state_all
WHERE the_geom &&
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5);

But whether it uses an index or not, the timing is still about the same -
under 10ms.

Though interestingly switching to

SELECT stusps
FROM tiger_data.state_all
WHERE ST_Intersects(the_geom,
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5));

Does always force an index scan.


Thanks,
Regina


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: I'd like to remove this Toast discussion

Paul Ramsey-3


On Thu, Jun 29, 2017 at 10:31 PM, Regina Obe <[hidden email]> wrote:
https://postgis.net/docs/performance_tips.html#small_tables_large_objects

This is the last bit of business before I can release PostGIS 2.3.3.

I'd like to get rid of this whole discussion of TOAST as I don't think it's
relevant anymore.

Am I correct in assuming that?

My assumption of why I think it's not relevant anymore is that we store
cached boxes for most geometries except for Points and 2 point lines.
So isn't it so that even if there is no index on a geometry, we do not need
to fully detoast (or possibly not to detoast at all) the geometry to read
the bounding box.
That it is part of the head of the geometry?

No, that is not so. If the object is large enough that it requires toasting, the whole thing is stuffed off to the side table and replaced with a reference in the main tuple. In order to get at that bbox on the header, we have to .... go to the toast tables and re-assemble the thing. 

"Worse", since we use compression on large objects, the object is compressed in the side tables and is pulled back in its entirety, then decompressed, then just the header is read.

Changing that would involve storing objects uncompressed and implementing any compression we wanted internal to the postgis library. So we'd have some logic to compress anything larger than N vertices, but would always leave the header uncompressed so we could peak at it for free.

That would *still* leave the toasting problem, which I'd have to re-confirm whether a pg_datum_slice cleanly pulls out just one piece of the toasted object or actually pulls the whole thing back first.

P.
 

I did a quick test of my states table - 52 records (granted they aren't
huge, but toast size is about 14 MB)

When I do something like below without forcing enable_seqscan = false; it
doesn't use an index (well sometimes it does even without urging)

SELECT stusps
from tiger_data.state_all
WHERE the_geom &&
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5);

But whether it uses an index or not, the timing is still about the same -
under 10ms.

Though interestingly switching to

SELECT stusps
FROM tiger_data.state_all
WHERE ST_Intersects(the_geom,
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5));

Does always force an index scan.


Thanks,
Regina


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: I'd like to remove this Toast discussion

Javier Santana-2

When I implemented the st_getgeometrytype version [1] (that Paul finally merged into postgis master) it was faster but Paul is right. My proposal here would be to change the pglz_decompress [2] function, called from PG_DETOAST_DATUM_SLICE (which is the macro we use to get a slice of the data). 

The algorithm seems easy enough to change and it might get into postgres 10.

In any case, there are two things we could do to improve this:
1) use the same pglz_compress postgres has but calling it from postgis source and have a HEADER + COMPRESSED_GEOMETRY
2) if we do that, we could encode the geometry by component so it  compress better

On Fri, Jun 30, 2017 at 5:18 PM Paul Ramsey <[hidden email]> wrote:
On Thu, Jun 29, 2017 at 10:31 PM, Regina Obe <[hidden email]> wrote:
https://postgis.net/docs/performance_tips.html#small_tables_large_objects

This is the last bit of business before I can release PostGIS 2.3.3.

I'd like to get rid of this whole discussion of TOAST as I don't think it's
relevant anymore.

Am I correct in assuming that?

My assumption of why I think it's not relevant anymore is that we store
cached boxes for most geometries except for Points and 2 point lines.
So isn't it so that even if there is no index on a geometry, we do not need
to fully detoast (or possibly not to detoast at all) the geometry to read
the bounding box.
That it is part of the head of the geometry?

No, that is not so. If the object is large enough that it requires toasting, the whole thing is stuffed off to the side table and replaced with a reference in the main tuple. In order to get at that bbox on the header, we have to .... go to the toast tables and re-assemble the thing. 

"Worse", since we use compression on large objects, the object is compressed in the side tables and is pulled back in its entirety, then decompressed, then just the header is read.

Changing that would involve storing objects uncompressed and implementing any compression we wanted internal to the postgis library. So we'd have some logic to compress anything larger than N vertices, but would always leave the header uncompressed so we could peak at it for free.

That would *still* leave the toasting problem, which I'd have to re-confirm whether a pg_datum_slice cleanly pulls out just one piece of the toasted object or actually pulls the whole thing back first.

P.
 

I did a quick test of my states table - 52 records (granted they aren't
huge, but toast size is about 14 MB)

When I do something like below without forcing enable_seqscan = false; it
doesn't use an index (well sometimes it does even without urging)

SELECT stusps
from tiger_data.state_all
WHERE the_geom &&
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5);

But whether it uses an index or not, the timing is still about the same -
under 10ms.

Though interestingly switching to

SELECT stusps
FROM tiger_data.state_all
WHERE ST_Intersects(the_geom,
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5));

Does always force an index scan.


Thanks,
Regina


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
--
-- CARTO CTO 

_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: I'd like to remove this Toast discussion

Paul Ramsey-3
I feel like if we did it PgSQL side we'd have to be further up the chain, working around the area of detoast_datum_slice. We might also need a change to the decompressor though, to support partial decompression. Annoying, I know I've looked at this problem several times and put it aside each time, but I cannot remember why :)

P

On Fri, Jun 30, 2017 at 8:44 AM, Javier Santana <[hidden email]> wrote:

When I implemented the st_getgeometrytype version [1] (that Paul finally merged into postgis master) it was faster but Paul is right. My proposal here would be to change the pglz_decompress [2] function, called from PG_DETOAST_DATUM_SLICE (which is the macro we use to get a slice of the data). 

The algorithm seems easy enough to change and it might get into postgres 10.

In any case, there are two things we could do to improve this:
1) use the same pglz_compress postgres has but calling it from postgis source and have a HEADER + COMPRESSED_GEOMETRY
2) if we do that, we could encode the geometry by component so it  compress better

On Fri, Jun 30, 2017 at 5:18 PM Paul Ramsey <[hidden email]> wrote:
On Thu, Jun 29, 2017 at 10:31 PM, Regina Obe <[hidden email]> wrote:
https://postgis.net/docs/performance_tips.html#small_tables_large_objects

This is the last bit of business before I can release PostGIS 2.3.3.

I'd like to get rid of this whole discussion of TOAST as I don't think it's
relevant anymore.

Am I correct in assuming that?

My assumption of why I think it's not relevant anymore is that we store
cached boxes for most geometries except for Points and 2 point lines.
So isn't it so that even if there is no index on a geometry, we do not need
to fully detoast (or possibly not to detoast at all) the geometry to read
the bounding box.
That it is part of the head of the geometry?

No, that is not so. If the object is large enough that it requires toasting, the whole thing is stuffed off to the side table and replaced with a reference in the main tuple. In order to get at that bbox on the header, we have to .... go to the toast tables and re-assemble the thing. 

"Worse", since we use compression on large objects, the object is compressed in the side tables and is pulled back in its entirety, then decompressed, then just the header is read.

Changing that would involve storing objects uncompressed and implementing any compression we wanted internal to the postgis library. So we'd have some logic to compress anything larger than N vertices, but would always leave the header uncompressed so we could peak at it for free.

That would *still* leave the toasting problem, which I'd have to re-confirm whether a pg_datum_slice cleanly pulls out just one piece of the toasted object or actually pulls the whole thing back first.

P.
 

I did a quick test of my states table - 52 records (granted they aren't
huge, but toast size is about 14 MB)

When I do something like below without forcing enable_seqscan = false; it
doesn't use an index (well sometimes it does even without urging)

SELECT stusps
from tiger_data.state_all
WHERE the_geom &&
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5);

But whether it uses an index or not, the timing is still about the same -
under 10ms.

Though interestingly switching to

SELECT stusps
FROM tiger_data.state_all
WHERE ST_Intersects(the_geom,
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5));

Does always force an index scan.


Thanks,
Regina


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
--
-- CARTO CTO 

_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: I'd like to remove this Toast discussion

Javier Santana-2
Yes, we'd need to change all the way to pglz_compress. 

So maybe the second option would be better (implement postgis own compression/decompression)

On Fri, Jun 30, 2017 at 5:51 PM Paul Ramsey <[hidden email]> wrote:
I feel like if we did it PgSQL side we'd have to be further up the chain, working around the area of detoast_datum_slice. We might also need a change to the decompressor though, to support partial decompression. Annoying, I know I've looked at this problem several times and put it aside each time, but I cannot remember why :)


P

On Fri, Jun 30, 2017 at 8:44 AM, Javier Santana <[hidden email]> wrote:

When I implemented the st_getgeometrytype version [1] (that Paul finally merged into postgis master) it was faster but Paul is right. My proposal here would be to change the pglz_decompress [2] function, called from PG_DETOAST_DATUM_SLICE (which is the macro we use to get a slice of the data). 

The algorithm seems easy enough to change and it might get into postgres 10.

In any case, there are two things we could do to improve this:
1) use the same pglz_compress postgres has but calling it from postgis source and have a HEADER + COMPRESSED_GEOMETRY
2) if we do that, we could encode the geometry by component so it  compress better

On Fri, Jun 30, 2017 at 5:18 PM Paul Ramsey <[hidden email]> wrote:
On Thu, Jun 29, 2017 at 10:31 PM, Regina Obe <[hidden email]> wrote:
https://postgis.net/docs/performance_tips.html#small_tables_large_objects

This is the last bit of business before I can release PostGIS 2.3.3.

I'd like to get rid of this whole discussion of TOAST as I don't think it's
relevant anymore.

Am I correct in assuming that?

My assumption of why I think it's not relevant anymore is that we store
cached boxes for most geometries except for Points and 2 point lines.
So isn't it so that even if there is no index on a geometry, we do not need
to fully detoast (or possibly not to detoast at all) the geometry to read
the bounding box.
That it is part of the head of the geometry?

No, that is not so. If the object is large enough that it requires toasting, the whole thing is stuffed off to the side table and replaced with a reference in the main tuple. In order to get at that bbox on the header, we have to .... go to the toast tables and re-assemble the thing. 

"Worse", since we use compression on large objects, the object is compressed in the side tables and is pulled back in its entirety, then decompressed, then just the header is read.

Changing that would involve storing objects uncompressed and implementing any compression we wanted internal to the postgis library. So we'd have some logic to compress anything larger than N vertices, but would always leave the header uncompressed so we could peak at it for free.

That would *still* leave the toasting problem, which I'd have to re-confirm whether a pg_datum_slice cleanly pulls out just one piece of the toasted object or actually pulls the whole thing back first.

P.
 

I did a quick test of my states table - 52 records (granted they aren't
huge, but toast size is about 14 MB)

When I do something like below without forcing enable_seqscan = false; it
doesn't use an index (well sometimes it does even without urging)

SELECT stusps
from tiger_data.state_all
WHERE the_geom &&
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5);

But whether it uses an index or not, the timing is still about the same -
under 10ms.

Though interestingly switching to

SELECT stusps
FROM tiger_data.state_all
WHERE ST_Intersects(the_geom,
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5));

Does always force an index scan.


Thanks,
Regina


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
--
-- CARTO CTO 

_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel

_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
--
-- CARTO CTO 

_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: I'd like to remove this Toast discussion

Regina Obe-2
In reply to this post by Paul Ramsey-3

You probably pushed it aside because all the solutions sound painful and in the large scheme of things, it probably isn't that big of a deal compared to other issues we have.

 

I actually think we make too big of a deal of it in the docs.  I don't think it affects that many people to warrant it being in front and center cover.

 

1)      First of all, am I correct in saying this issue only affects folks where they happen to have so few records relative toast size that the planner chooses not to use an index scan.

2)      For a table that big with so few records, I bet it is used often in which case, it is pushed into shared memory and already detoasted.  So the pain might only be felt once.

3)      I suspect index-only scan logic also mitigates some of the issue encouraging it in many cases to use the index.

 

Thanks,

Regina

 

 

From: postgis-devel [mailto:[hidden email]] On Behalf Of Paul Ramsey
Sent: Friday, June 30, 2017 11:52 AM
To: PostGIS Development Discussion <[hidden email]>
Subject: Re: [postgis-devel] I'd like to remove this Toast discussion

 

I feel like if we did it PgSQL side we'd have to be further up the chain, working around the area of detoast_datum_slice. We might also need a change to the decompressor though, to support partial decompression. Annoying, I know I've looked at this problem several times and put it aside each time, but I cannot remember why :)

P

 

On Fri, Jun 30, 2017 at 8:44 AM, Javier Santana <[hidden email]> wrote:

 

When I implemented the st_getgeometrytype version [1] (that Paul finally merged into postgis master) it was faster but Paul is right. My proposal here would be to change the pglz_decompress [2] function, called from PG_DETOAST_DATUM_SLICE (which is the macro we use to get a slice of the data). 

 

The algorithm seems easy enough to change and it might get into postgres 10.

 

In any case, there are two things we could do to improve this:
1) use the same pglz_compress postgres has but calling it from postgis source and have a HEADER + COMPRESSED_GEOMETRY
2) if we do that, we could encode the geometry by component so it  compress better

 

On Fri, Jun 30, 2017 at 5:18 PM Paul Ramsey <[hidden email]> wrote:

On Thu, Jun 29, 2017 at 10:31 PM, Regina Obe <[hidden email]> wrote:

https://postgis.net/docs/performance_tips.html#small_tables_large_objects

This is the last bit of business before I can release PostGIS 2.3.3.

I'd like to get rid of this whole discussion of TOAST as I don't think it's
relevant anymore.

Am I correct in assuming that?

My assumption of why I think it's not relevant anymore is that we store
cached boxes for most geometries except for Points and 2 point lines.
So isn't it so that even if there is no index on a geometry, we do not need
to fully detoast (or possibly not to detoast at all) the geometry to read
the bounding box.
That it is part of the head of the geometry?

 

No, that is not so. If the object is large enough that it requires toasting, the whole thing is stuffed off to the side table and replaced with a reference in the main tuple. In order to get at that bbox on the header, we have to .... go to the toast tables and re-assemble the thing. 

 

"Worse", since we use compression on large objects, the object is compressed in the side tables and is pulled back in its entirety, then decompressed, then just the header is read.

 

Changing that would involve storing objects uncompressed and implementing any compression we wanted internal to the postgis library. So we'd have some logic to compress anything larger than N vertices, but would always leave the header uncompressed so we could peak at it for free.

 

That would *still* leave the toasting problem, which I'd have to re-confirm whether a pg_datum_slice cleanly pulls out just one piece of the toasted object or actually pulls the whole thing back first.

 

P.

 


I did a quick test of my states table - 52 records (granted they aren't
huge, but toast size is about 14 MB)

When I do something like below without forcing enable_seqscan = false; it
doesn't use an index (well sometimes it does even without urging)

SELECT stusps
from tiger_data.state_all
WHERE the_geom &&
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5);

But whether it uses an index or not, the timing is still about the same -
under 10ms.

Though interestingly switching to

SELECT stusps
FROM tiger_data.state_all
WHERE ST_Intersects(the_geom,
ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
5));

Does always force an index scan.


Thanks,
Regina


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel

_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel

--

-- CARTO CTO 


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel

 


_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel
Loading...