Quantcast

raster loading and ST_Value performance

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

raster loading and ST_Value performance

Phil Hurvitz-2
Hello PostGIS raster experts,

I have loaded a raster representing slope for an area of Washington
State (USA). The source was a large Imagine (img/ige) file (about 4.3
GB, 31935 x 34649 pixels at 10 m pixel size), which I copied to the db
using

raster2pgsql  -I -C -e -Y -F -d -s 2926 -t 1000x1000 slope_ps.img
gis.slope | psql dbname

I used the tiling of 1000x1000 after numerous attempts to load the
raster without tiling, and using several different file types (TIFF,
ArcInfo grid, ArcInfo ASCII), each of which caused a Segmentation
Error/Core dump.

Now that I have a working raster, I can get a single point value using
this query:

SELECT rid,
ST_Value(rast, the_geom, false) FROM slope,
(SELECT the_geom FROM twins_withn_wa LIMIT 1) AS foo;

  rid |    st_value
-----+-----------------
  ... |
  589 | 8.1965799331665
  ... |
(1120 rows)
Time: 4768.648 ms

4 s seems like a long time to get one value from a raster. Here is the
output of EXPLAIN ANALYZE, which seems to show that the nested loop is
taking most of the time:

                                                          QUERY PLAN
----------------------------------------------------------------------
  Nested Loop  (cost=0.00..312.67 rows=1120 width=164) (actual
time=7.604..4736.852 rows=1120 loops=1)
    ->  Limit  (cost=0.00..0.26 rows=1 width=128) (actual
time=0.005..0.005 rows=1 loops=1)
          ->  Seq Scan on twins_withn_wa  (cost=0.00..3043.02 rows=11702
width=128) (actual time=0.004..0.004 rows=1 loops=1)
    ->  Seq Scan on slope  (cost=0.00..21.20 rows=1120 width=36) (actual
time=0.004..0.890 rows=1120 loops=1)
  Total runtime: 4737.278 ms
(5 rows)


I am running on a fairly beefy RHEL 6.3 machine with 16 G RAM (8 GB
allocated to PostgreSQL), PostGIS 2.0.1 on PostgreSQL 9.1.1:

select postgis_full_version();
postgis_full_version
-----------------------------------------------------------------------
  POSTGIS="2.0.1 r9979" GEOS="3.3.5-CAPI-1.7.5" PROJ="Rel. 4.7.1, 23
September 2009" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.7.6" RASTER
(1 row)

select version();
                                                    version
--------------------------------------------------------------------------
  PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.5 20110214 (Red Hat 4.4.5-6), 64-bit
(1 row)

I have two questions:

Q1: What effect does different tile size have on storage and performance?

Q2: We will need to be getting raster values from a large number of
point tables over a large number of raster tables. Is there any method
to speed up the queries?

Any pointers will be greatly appreciated!

--
-P.

**************************************************************
Philip M. Hurvitz, PhD | Research Assistant Professor | UW-CBE
Urban Form Lab  | 1107 NE 45th Street, Suite 535  | Box 354802
University of Washington, Seattle, Washington  98195-4802, USA
[hidden email] | http://gis.washington.edu/phurvitz
"What is essential is invisible to the eye." -de Saint-Exupéry
**************************************************************
_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: raster loading and ST_Value performance

Hugues François
Hello,

I'm not really what you can call an expert, but I can try to help you

Q1 : Tiling will help you if you make an index on your raster table (using GIST and st_convexhull). I don't see any index in what you described.

Q2 : Tiling +Index will speed up your query. You should also reduce time using st_intersects(rast,the_geom) in the where clause.

Hugues

PS : My mail problem should be resolved so if anyone see this mail as a new thread, please tell me.


-------- Message d'origine--------
De: [hidden email] de la part de Phil Hurvitz
Date: mer. 19/09/2012 03:11
À: [hidden email]
Objet : [postgis-users] raster loading and ST_Value performance
 
Hello PostGIS raster experts,

I have loaded a raster representing slope for an area of Washington
State (USA). The source was a large Imagine (img/ige) file (about 4.3
GB, 31935 x 34649 pixels at 10 m pixel size), which I copied to the db
using

raster2pgsql  -I -C -e -Y -F -d -s 2926 -t 1000x1000 slope_ps.img
gis.slope | psql dbname

I used the tiling of 1000x1000 after numerous attempts to load the
raster without tiling, and using several different file types (TIFF,
ArcInfo grid, ArcInfo ASCII), each of which caused a Segmentation
Error/Core dump.

Now that I have a working raster, I can get a single point value using
this query:

SELECT rid,
ST_Value(rast, the_geom, false) FROM slope,
(SELECT the_geom FROM twins_withn_wa LIMIT 1) AS foo;

  rid |    st_value
-----+-----------------
  ... |
  589 | 8.1965799331665
  ... |
(1120 rows)
Time: 4768.648 ms

4 s seems like a long time to get one value from a raster. Here is the
output of EXPLAIN ANALYZE, which seems to show that the nested loop is
taking most of the time:

                                                          QUERY PLAN
----------------------------------------------------------------------
  Nested Loop  (cost=0.00..312.67 rows=1120 width=164) (actual
time=7.604..4736.852 rows=1120 loops=1)
    ->  Limit  (cost=0.00..0.26 rows=1 width=128) (actual
time=0.005..0.005 rows=1 loops=1)
          ->  Seq Scan on twins_withn_wa  (cost=0.00..3043.02 rows=11702
width=128) (actual time=0.004..0.004 rows=1 loops=1)
    ->  Seq Scan on slope  (cost=0.00..21.20 rows=1120 width=36) (actual
time=0.004..0.890 rows=1120 loops=1)
  Total runtime: 4737.278 ms
(5 rows)


I am running on a fairly beefy RHEL 6.3 machine with 16 G RAM (8 GB
allocated to PostgreSQL), PostGIS 2.0.1 on PostgreSQL 9.1.1:

select postgis_full_version();
postgis_full_version
-----------------------------------------------------------------------
  POSTGIS="2.0.1 r9979" GEOS="3.3.5-CAPI-1.7.5" PROJ="Rel. 4.7.1, 23
September 2009" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.7.6" RASTER
(1 row)

select version();
                                                    version
--------------------------------------------------------------------------
  PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.5 20110214 (Red Hat 4.4.5-6), 64-bit
(1 row)

I have two questions:

Q1: What effect does different tile size have on storage and performance?

Q2: We will need to be getting raster values from a large number of
point tables over a large number of raster tables. Is there any method
to speed up the queries?

Any pointers will be greatly appreciated!

--
-P.

**************************************************************
Philip M. Hurvitz, PhD | Research Assistant Professor | UW-CBE
Urban Form Lab  | 1107 NE 45th Street, Suite 535  | Box 354802
University of Washington, Seattle, Washington  98195-4802, USA
[hidden email] | http://gis.washington.edu/phurvitz
"What is essential is invisible to the eye." -de Saint-Exupéry
**************************************************************
_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users

winmail.dat (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: raster loading and ST_Value performance

Pierre Racine-2
In reply to this post by Phil Hurvitz-2
> Q1: What effect does different tile size have on storage and performance?

Biggest tiles = longer to load in memory and less efficient indexing

I suggest you experiment with much smaller tiles: 100x100 and 10x10

> Q2: We will need to be getting raster values from a large number of
> point tables over a large number of raster tables. Is there any method
> to speed up the queries?

If your rasters are all of the same size and cover the same footprint I would suggest to make them different bands of the same raster. So for the same point you will search for the right tile only once.

Pierre
_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: raster loading and ST_Value performance (summary)

Phil Hurvitz-2
In reply to this post by Phil Hurvitz-2
Thanks to "Francois Hugues" <[hidden email]> and Pierre
Racine <[hidden email]> for their excellent suggestions.

 > I have two questions:
 >
 > Q1: What effect does different tile size have on storage and performance?
 >
 > Q2: We will need to be getting raster values from a large number of
 > point tables over a large number of raster tables. Is there any
 > method to speed up the queries?

Hugues suggested:

> Q1 : Tiling will help you if you make an index on your raster table (using GIST and st_convexhull). I don't see any index in what you described.

The "-I" flag in raster2pgsql created the indexes, so all my test
rasters had indexes built at load time.

> Q2 : Tiling +Index will speed up your query. You should also reduce time using st_intersects(rast,the_geom) in the where clause.

Using the ST_Intersects sped things up greatly, from >4 s to 30 ms for
one point. Wow!

SELECT rid, ST_Value(rast, the_geom, false) FROM slope, (SELECT the_geom
FROM twins_withn_wa LIMIT 1) AS foo WHERE ST_Intersects(rast, the_geom);
  rid |    st_value
-----+-----------------
  589 | 8.1965799331665
(1 row)

Time: 30.720 ms

Pierre suggested:

>> Q1: What effect does different tile size have on storage and performance?
>
> Biggest tiles = longer to load in memory and less efficient indexing
>
> I suggest you experiment with much smaller tiles: 100x100 and 10x10

Smaller tiles also helped speed things up. The rasters are shown with
names corresponding to the tile size (1000, 100, and 10). Getting ~9000
point values with the 1000x1000 raster took about 42 s (pretty fast with
Hugues' ST_Intersects suggestion), but for the 10x10 tile, the values
were returned in about 1 s. Wow wow!

SELECT rid, ST_Value(rast, the_geom, true) FROM slope1000,
(SELECT the_geom FROM twins_withn_wa) AS foo
WHERE ST_Intersects(rast, the_geom);
  rid  |       st_value
------+----------------------
    11 |     3.69029211997986
    11 |     1.24964261054993
  ...
Time: 42094.863 ms

SELECT rid, ST_Value(rast, the_geom, true) FROM slope100,
(SELECT the_geom FROM twins_withn_wa) AS foo
WHERE ST_Intersects(rast, the_geom);
   rid   |       st_value
--------+----------------------
   58365 |      8.1965799331665
   36233 |
  ...
Time: 1606.054 ms

SELECT rid, ST_Value(rast, the_geom, true) FROM slope10,
(SELECT the_geom FROM twins_withn_wa) AS foo
WHERE ST_Intersects(rast, the_geom);
    rid    |       st_value
----------+----------------------
   5843074 |      8.1965799331665
   3619526 |
  ...
Time: 1099.805 ms


>> Q2: We will need to be getting raster values from a large number of
>> point tables over a large number of raster tables. Is there any method
>> to speed up the queries?
>
> If your rasters are all of the same size and cover the same footprint I would suggest to make them different bands of the same raster. So for the same point you will search for the right tile only once.

I will keep that suggestion for future use. We will have that situation
for some of our data. In any case, if I had to wait only 1 s for each
query, that would be sufficient without needing to band the raster data.

-P.

**************************************************************
Philip M. Hurvitz, PhD | Research Assistant Professor | UW-CBE
Urban Form Lab  | 1107 NE 45th Street, Suite 535  | Box 354802
University of Washington, Seattle, Washington  98195-4802, USA
[hidden email] | http://gis.washington.edu/phurvitz
"What is essential is invisible to the eye." -de Saint-Exupéry
**************************************************************

On 2012-09-18 6:11 PM, Phil Hurvitz wrote:

> Hello PostGIS raster experts,
>
> I have loaded a raster representing slope for an area of Washington
> State (USA). The source was a large Imagine (img/ige) file (about 4.3
> GB, 31935 x 34649 pixels at 10 m pixel size), which I copied to the db
> using
>
> raster2pgsql  -I -C -e -Y -F -d -s 2926 -t 1000x1000 slope_ps.img
> gis.slope | psql dbname
>
> I used the tiling of 1000x1000 after numerous attempts to load the
> raster without tiling, and using several different file types (TIFF,
> ArcInfo grid, ArcInfo ASCII), each of which caused a Segmentation
> Error/Core dump.
>
> Now that I have a working raster, I can get a single point value using
> this query:
>
> SELECT rid,
> ST_Value(rast, the_geom, false) FROM slope,
> (SELECT the_geom FROM twins_withn_wa LIMIT 1) AS foo;
>
>   rid |    st_value
> -----+-----------------
>   ... |
>   589 | 8.1965799331665
>   ... |
> (1120 rows)
> Time: 4768.648 ms
>
> 4 s seems like a long time to get one value from a raster. Here is the
> output of EXPLAIN ANALYZE, which seems to show that the nested loop is
> taking most of the time:
>
>                                                           QUERY PLAN
> ----------------------------------------------------------------------
>   Nested Loop  (cost=0.00..312.67 rows=1120 width=164) (actual
> time=7.604..4736.852 rows=1120 loops=1)
>     ->  Limit  (cost=0.00..0.26 rows=1 width=128) (actual
> time=0.005..0.005 rows=1 loops=1)
>           ->  Seq Scan on twins_withn_wa  (cost=0.00..3043.02 rows=11702
> width=128) (actual time=0.004..0.004 rows=1 loops=1)
>     ->  Seq Scan on slope  (cost=0.00..21.20 rows=1120 width=36) (actual
> time=0.004..0.890 rows=1120 loops=1)
>   Total runtime: 4737.278 ms
> (5 rows)
>
>
> I am running on a fairly beefy RHEL 6.3 machine with 16 G RAM (8 GB
> allocated to PostgreSQL), PostGIS 2.0.1 on PostgreSQL 9.1.1:
>
> select postgis_full_version();
> postgis_full_version
> -----------------------------------------------------------------------
>   POSTGIS="2.0.1 r9979" GEOS="3.3.5-CAPI-1.7.5" PROJ="Rel. 4.7.1, 23
> September 2009" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.7.6"
> RASTER
> (1 row)
>
> select version();
>                                                     version
> --------------------------------------------------------------------------
>   PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
> 4.4.5 20110214 (Red Hat 4.4.5-6), 64-bit
> (1 row)
>
> I have two questions:
>
> Q1: What effect does different tile size have on storage and performance?
>
> Q2: We will need to be getting raster values from a large number of
> point tables over a large number of raster tables. Is there any method
> to speed up the queries?
>
> Any pointers will be greatly appreciated!
>
_______________________________________________
postgis-users mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-users
Loading...