Hello PostGIS raster experts,
I have loaded a raster representing slope for an area of Washington State (USA). The source was a large Imagine (img/ige) file (about 4.3 GB, 31935 x 34649 pixels at 10 m pixel size), which I copied to the db using raster2pgsql -I -C -e -Y -F -d -s 2926 -t 1000x1000 slope_ps.img gis.slope | psql dbname I used the tiling of 1000x1000 after numerous attempts to load the raster without tiling, and using several different file types (TIFF, ArcInfo grid, ArcInfo ASCII), each of which caused a Segmentation Error/Core dump. Now that I have a working raster, I can get a single point value using this query: SELECT rid, ST_Value(rast, the_geom, false) FROM slope, (SELECT the_geom FROM twins_withn_wa LIMIT 1) AS foo; rid | st_value -----+----------------- ... | 589 | 8.1965799331665 ... | (1120 rows) Time: 4768.648 ms 4 s seems like a long time to get one value from a raster. Here is the output of EXPLAIN ANALYZE, which seems to show that the nested loop is taking most of the time: QUERY PLAN ---------------------------------------------------------------------- Nested Loop (cost=0.00..312.67 rows=1120 width=164) (actual time=7.604..4736.852 rows=1120 loops=1) -> Limit (cost=0.00..0.26 rows=1 width=128) (actual time=0.005..0.005 rows=1 loops=1) -> Seq Scan on twins_withn_wa (cost=0.00..3043.02 rows=11702 width=128) (actual time=0.004..0.004 rows=1 loops=1) -> Seq Scan on slope (cost=0.00..21.20 rows=1120 width=36) (actual time=0.004..0.890 rows=1120 loops=1) Total runtime: 4737.278 ms (5 rows) I am running on a fairly beefy RHEL 6.3 machine with 16 G RAM (8 GB allocated to PostgreSQL), PostGIS 2.0.1 on PostgreSQL 9.1.1: select postgis_full_version(); postgis_full_version ----------------------------------------------------------------------- POSTGIS="2.0.1 r9979" GEOS="3.3.5-CAPI-1.7.5" PROJ="Rel. 4.7.1, 23 September 2009" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.7.6" RASTER (1 row) select version(); version -------------------------------------------------------------------------- PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6), 64-bit (1 row) I have two questions: Q1: What effect does different tile size have on storage and performance? Q2: We will need to be getting raster values from a large number of point tables over a large number of raster tables. Is there any method to speed up the queries? Any pointers will be greatly appreciated! -- -P. ************************************************************** Philip M. Hurvitz, PhD | Research Assistant Professor | UW-CBE Urban Form Lab | 1107 NE 45th Street, Suite 535 | Box 354802 University of Washington, Seattle, Washington 98195-4802, USA [hidden email] | http://gis.washington.edu/phurvitz "What is essential is invisible to the eye." -de Saint-Exupéry ************************************************************** _______________________________________________ postgis-users mailing list [hidden email] http://postgis.refractions.net/mailman/listinfo/postgis-users |
Hello,
I'm not really what you can call an expert, but I can try to help you Q1 : Tiling will help you if you make an index on your raster table (using GIST and st_convexhull). I don't see any index in what you described. Q2 : Tiling +Index will speed up your query. You should also reduce time using st_intersects(rast,the_geom) in the where clause. Hugues PS : My mail problem should be resolved so if anyone see this mail as a new thread, please tell me. -------- Message d'origine-------- De: [hidden email] de la part de Phil Hurvitz Date: mer. 19/09/2012 03:11 À: [hidden email] Objet : [postgis-users] raster loading and ST_Value performance Hello PostGIS raster experts, I have loaded a raster representing slope for an area of Washington State (USA). The source was a large Imagine (img/ige) file (about 4.3 GB, 31935 x 34649 pixels at 10 m pixel size), which I copied to the db using raster2pgsql -I -C -e -Y -F -d -s 2926 -t 1000x1000 slope_ps.img gis.slope | psql dbname I used the tiling of 1000x1000 after numerous attempts to load the raster without tiling, and using several different file types (TIFF, ArcInfo grid, ArcInfo ASCII), each of which caused a Segmentation Error/Core dump. Now that I have a working raster, I can get a single point value using this query: SELECT rid, ST_Value(rast, the_geom, false) FROM slope, (SELECT the_geom FROM twins_withn_wa LIMIT 1) AS foo; rid | st_value -----+----------------- ... | 589 | 8.1965799331665 ... | (1120 rows) Time: 4768.648 ms 4 s seems like a long time to get one value from a raster. Here is the output of EXPLAIN ANALYZE, which seems to show that the nested loop is taking most of the time: QUERY PLAN ---------------------------------------------------------------------- Nested Loop (cost=0.00..312.67 rows=1120 width=164) (actual time=7.604..4736.852 rows=1120 loops=1) -> Limit (cost=0.00..0.26 rows=1 width=128) (actual time=0.005..0.005 rows=1 loops=1) -> Seq Scan on twins_withn_wa (cost=0.00..3043.02 rows=11702 width=128) (actual time=0.004..0.004 rows=1 loops=1) -> Seq Scan on slope (cost=0.00..21.20 rows=1120 width=36) (actual time=0.004..0.890 rows=1120 loops=1) Total runtime: 4737.278 ms (5 rows) I am running on a fairly beefy RHEL 6.3 machine with 16 G RAM (8 GB allocated to PostgreSQL), PostGIS 2.0.1 on PostgreSQL 9.1.1: select postgis_full_version(); postgis_full_version ----------------------------------------------------------------------- POSTGIS="2.0.1 r9979" GEOS="3.3.5-CAPI-1.7.5" PROJ="Rel. 4.7.1, 23 September 2009" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.7.6" RASTER (1 row) select version(); version -------------------------------------------------------------------------- PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6), 64-bit (1 row) I have two questions: Q1: What effect does different tile size have on storage and performance? Q2: We will need to be getting raster values from a large number of point tables over a large number of raster tables. Is there any method to speed up the queries? Any pointers will be greatly appreciated! -- -P. ************************************************************** Philip M. Hurvitz, PhD | Research Assistant Professor | UW-CBE Urban Form Lab | 1107 NE 45th Street, Suite 535 | Box 354802 University of Washington, Seattle, Washington 98195-4802, USA [hidden email] | http://gis.washington.edu/phurvitz "What is essential is invisible to the eye." -de Saint-Exupéry ************************************************************** _______________________________________________ postgis-users mailing list [hidden email] http://postgis.refractions.net/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [hidden email] http://postgis.refractions.net/mailman/listinfo/postgis-users |
In reply to this post by Philip M. Hurvitz
> Q1: What effect does different tile size have on storage and performance?
Biggest tiles = longer to load in memory and less efficient indexing I suggest you experiment with much smaller tiles: 100x100 and 10x10 > Q2: We will need to be getting raster values from a large number of > point tables over a large number of raster tables. Is there any method > to speed up the queries? If your rasters are all of the same size and cover the same footprint I would suggest to make them different bands of the same raster. So for the same point you will search for the right tile only once. Pierre _______________________________________________ postgis-users mailing list [hidden email] http://postgis.refractions.net/mailman/listinfo/postgis-users |
In reply to this post by Philip M. Hurvitz
Thanks to "Francois Hugues" <[hidden email]> and Pierre
Racine <[hidden email]> for their excellent suggestions. > I have two questions: > > Q1: What effect does different tile size have on storage and performance? > > Q2: We will need to be getting raster values from a large number of > point tables over a large number of raster tables. Is there any > method to speed up the queries? Hugues suggested: > Q1 : Tiling will help you if you make an index on your raster table (using GIST and st_convexhull). I don't see any index in what you described. The "-I" flag in raster2pgsql created the indexes, so all my test rasters had indexes built at load time. > Q2 : Tiling +Index will speed up your query. You should also reduce time using st_intersects(rast,the_geom) in the where clause. Using the ST_Intersects sped things up greatly, from >4 s to 30 ms for one point. Wow! SELECT rid, ST_Value(rast, the_geom, false) FROM slope, (SELECT the_geom FROM twins_withn_wa LIMIT 1) AS foo WHERE ST_Intersects(rast, the_geom); rid | st_value -----+----------------- 589 | 8.1965799331665 (1 row) Time: 30.720 ms Pierre suggested: >> Q1: What effect does different tile size have on storage and performance? > > Biggest tiles = longer to load in memory and less efficient indexing > > I suggest you experiment with much smaller tiles: 100x100 and 10x10 Smaller tiles also helped speed things up. The rasters are shown with names corresponding to the tile size (1000, 100, and 10). Getting ~9000 point values with the 1000x1000 raster took about 42 s (pretty fast with Hugues' ST_Intersects suggestion), but for the 10x10 tile, the values were returned in about 1 s. Wow wow! SELECT rid, ST_Value(rast, the_geom, true) FROM slope1000, (SELECT the_geom FROM twins_withn_wa) AS foo WHERE ST_Intersects(rast, the_geom); rid | st_value ------+---------------------- 11 | 3.69029211997986 11 | 1.24964261054993 ... Time: 42094.863 ms SELECT rid, ST_Value(rast, the_geom, true) FROM slope100, (SELECT the_geom FROM twins_withn_wa) AS foo WHERE ST_Intersects(rast, the_geom); rid | st_value --------+---------------------- 58365 | 8.1965799331665 36233 | ... Time: 1606.054 ms SELECT rid, ST_Value(rast, the_geom, true) FROM slope10, (SELECT the_geom FROM twins_withn_wa) AS foo WHERE ST_Intersects(rast, the_geom); rid | st_value ----------+---------------------- 5843074 | 8.1965799331665 3619526 | ... Time: 1099.805 ms >> Q2: We will need to be getting raster values from a large number of >> point tables over a large number of raster tables. Is there any method >> to speed up the queries? > > If your rasters are all of the same size and cover the same footprint I would suggest to make them different bands of the same raster. So for the same point you will search for the right tile only once. I will keep that suggestion for future use. We will have that situation for some of our data. In any case, if I had to wait only 1 s for each query, that would be sufficient without needing to band the raster data. -P. ************************************************************** Philip M. Hurvitz, PhD | Research Assistant Professor | UW-CBE Urban Form Lab | 1107 NE 45th Street, Suite 535 | Box 354802 University of Washington, Seattle, Washington 98195-4802, USA [hidden email] | http://gis.washington.edu/phurvitz "What is essential is invisible to the eye." -de Saint-Exupéry ************************************************************** On 2012-09-18 6:11 PM, Phil Hurvitz wrote: > Hello PostGIS raster experts, > > I have loaded a raster representing slope for an area of Washington > State (USA). The source was a large Imagine (img/ige) file (about 4.3 > GB, 31935 x 34649 pixels at 10 m pixel size), which I copied to the db > using > > raster2pgsql -I -C -e -Y -F -d -s 2926 -t 1000x1000 slope_ps.img > gis.slope | psql dbname > > I used the tiling of 1000x1000 after numerous attempts to load the > raster without tiling, and using several different file types (TIFF, > ArcInfo grid, ArcInfo ASCII), each of which caused a Segmentation > Error/Core dump. > > Now that I have a working raster, I can get a single point value using > this query: > > SELECT rid, > ST_Value(rast, the_geom, false) FROM slope, > (SELECT the_geom FROM twins_withn_wa LIMIT 1) AS foo; > > rid | st_value > -----+----------------- > ... | > 589 | 8.1965799331665 > ... | > (1120 rows) > Time: 4768.648 ms > > 4 s seems like a long time to get one value from a raster. Here is the > output of EXPLAIN ANALYZE, which seems to show that the nested loop is > taking most of the time: > > QUERY PLAN > ---------------------------------------------------------------------- > Nested Loop (cost=0.00..312.67 rows=1120 width=164) (actual > time=7.604..4736.852 rows=1120 loops=1) > -> Limit (cost=0.00..0.26 rows=1 width=128) (actual > time=0.005..0.005 rows=1 loops=1) > -> Seq Scan on twins_withn_wa (cost=0.00..3043.02 rows=11702 > width=128) (actual time=0.004..0.004 rows=1 loops=1) > -> Seq Scan on slope (cost=0.00..21.20 rows=1120 width=36) (actual > time=0.004..0.890 rows=1120 loops=1) > Total runtime: 4737.278 ms > (5 rows) > > > I am running on a fairly beefy RHEL 6.3 machine with 16 G RAM (8 GB > allocated to PostgreSQL), PostGIS 2.0.1 on PostgreSQL 9.1.1: > > select postgis_full_version(); > postgis_full_version > ----------------------------------------------------------------------- > POSTGIS="2.0.1 r9979" GEOS="3.3.5-CAPI-1.7.5" PROJ="Rel. 4.7.1, 23 > September 2009" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.7.6" > RASTER > (1 row) > > select version(); > version > -------------------------------------------------------------------------- > PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) > 4.4.5 20110214 (Red Hat 4.4.5-6), 64-bit > (1 row) > > I have two questions: > > Q1: What effect does different tile size have on storage and performance? > > Q2: We will need to be getting raster values from a large number of > point tables over a large number of raster tables. Is there any method > to speed up the queries? > > Any pointers will be greatly appreciated! > postgis-users mailing list [hidden email] http://postgis.refractions.net/mailman/listinfo/postgis-users |
Free forum by Nabble | Edit this page |