CPU tuning

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

CPU tuning

Olivier Leprêtre

Hi,

 

I have a pgsql postgis function that last about an hour on an Hp Envy (W10, i7-6500 CPU 2,5 Ghz 8go). This function is calculating minimum distance between each nodes of table x (12000 nodes) and table y (42000 nodes)

 

FOR row IN

                   SELECT code,ST_AsEwkt(ST_StartPoint(geom)) as geom1  FROM t_x

LOOP

                   FOR row1 IN

                            SELECT code, ST_Distance(ST_AsEwkt(geom)::text,row.geom1) FROM t_y order by ST_Distance(ST_AsEwkt(geom)::text,row.geom1) LIMIT 1

 

                   LOOP

                   END LOOP;

....

 

 

What I don't understand is that postgresql server process CPU is stable during all that time at 30/31% with no other process running. Details of each processor 1 to 3 shows that all 4 uses 30%. Disk and network usage are 0% (database is local) Ram usage is 47% and Postgresql server uses process only 7,4 Mo.

 

My question is "Why postgres process uses such a precise and constant cpu and doesn't grow up to more than 30% ?" I didn't find any parameter forcing such a limit in order to reduce duration of this calculation.

 

Thanks for any answer,

 

Olivier




Avast logo

L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
www.avast.com



_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: CPU tuning

Rémi Cura
Hey,
I
​'m afraid you may not use the most efficient approach.

Assuming you want to find for each node of table x the closest node of table y,
it takes less than 1 second on my computer.



DROP TABLE  IF EXISTS  test_1;
CREATE TABLE test_1 AS
SELECT s AS gid, ST_makePoint(random()*1000,random()*1000) AS geom
FROM generate_series(1,12000) AS s ;
CREATE INDEX ON test_1 USING GIST(geom) ;
ALTER TABLE test_1 ADD PRIMARY KEY (gid) ;


DROP TABLE  IF EXISTS  test_2;
CREATE TABLE test_2 AS
SELECT s AS gid, ST_makePoint( random()*1000,random()*1000) AS geom
FROM generate_series(1,50000) AS s ;
CREATE INDEX ON test_2 USING GIST(geom) ;
ALTER TABLE test_2 ADD PRIMARY KEY (gid) ; 

-- option 1 : you have lots of ram : brute force
DROP TABLE IF EXISTS test_results ;
CREATE TABLE test_results AS
SELECT DISTINCT ON (test_1.gid ) test_1.gid AS gid1, test_2.gid AS gid2-- , ST_MakeLine(test_1.geom,test_2.geom) AS sline
FROM test_1, test_2
ORDER BY test_1.gid ,ST_Distance(test_1.geom,test_2.geom) ASC ;


--option2  : you have limited ram : snipping : force index use

DROP FUNCTION IF EXISTS test_sdist(geom1 geometry) ;
CREATE OR REPLACE FUNCTION test_sdist(geom1 geometry, OUT gid2 bigint, OUT geom2 geometry) AS
 $$
    --given a point, find the closes one in test_2
    DECLARE
    BEGIN
        SELECT gid, geom INTO gid2, geom2
        FROM test_2
        ORDER BY ST_Distance(geom2,test_2.geom) ASC
        LIMIT 1  ;
        RETURN ;
    END;
$$ LANGUAGE plpgsql;

DROP TABLE IF EXISTS test_result_2 ;
CREATE TABLE IF NOT EXISTS test_result_2 AS
SELECT test_1.gid AS gid1, f.gid2 AS gid2
FROM test_1, test_sdist(test_1.geom) AS f

--option 3 : you could use a recursive CTE to avoid creating a dedicated dummy function,
--           this would be a pain to write though


Cheers,
Rémi-C​


2017-01-08 16:36 GMT+01:00 Olivier Leprêtre <[hidden email]>:

Hi,

 

I have a pgsql postgis function that last about an hour on an Hp Envy (W10, i7-6500 CPU 2,5 Ghz 8go). This function is calculating minimum distance between each nodes of table x (12000 nodes) and table y (42000 nodes)

 

FOR row IN

                   SELECT code,ST_AsEwkt(ST_StartPoint(geom)) as geom1  FROM t_x

LOOP

                   FOR row1 IN

                            SELECT code, ST_Distance(ST_AsEwkt(geom)::text,row.geom1) FROM t_y order by ST_Distance(ST_AsEwkt(geom)::text,row.geom1) LIMIT 1

 

                   LOOP

                   END LOOP;

....

 

 

What I don't understand is that postgresql server process CPU is stable during all that time at 30/31% with no other process running. Details of each processor 1 to 3 shows that all 4 uses 30%. Disk and network usage are 0% (database is local) Ram usage is 47% and Postgresql server uses process only 7,4 Mo.

 

My question is "Why postgres process uses such a precise and constant cpu and doesn't grow up to more than 30% ?" I didn't find any parameter forcing such a limit in order to reduce duration of this calculation.

 

Thanks for any answer,

 

Olivier




Avast logo

L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
www.avast.com



_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: CPU tuning

Olivier Leprêtre

Hi,

 

Thanks for your answer and this way of calculation. I don't know which machine you have but on mine, Option 1 launched from pgadmin takes 2,564,423 ms  (about 45 mn) a bit less than my original code (2,823,272 ms). So if this takes less than 1 second on yours, there is definitively a problem on my configuration or my machine which is nevertheless a very slow one : Hp Envy (Windows 10, i7-6500 CPU 2,5 Ghz 8go). I'm using postgresql/postgis 9.4 with its standard configuration.

 

During the option 1 query, the processor used again 30/35% CPU during all time. Difference is about memory, it grows up to 1Gb (shared buffer is set to this value) and then begin using disk about 10 Mo/s.  So I'm still wondering why the cpu doesn't go further than 30% and why this take such a difference on your machine. Is it a linux OS ?

 

Olivier

 

 

De : postgis-users [mailto:[hidden email]] De la part de Rémi Cura
Envoyé : lundi 9 janvier 2017 10:52
À : PostGIS Users Discussion
Objet : Re: [postgis-users] CPU tuning

 

Hey,

I

'm afraid you may not use the most efficient approach.

Assuming you want to find for each node of table x the closest node of table y,

it takes less than 1 second on my computer.




DROP TABLE  IF EXISTS  test_1;
CREATE TABLE test_1 AS
SELECT s AS gid, ST_makePoint(random()*1000,random()*1000) AS geom
FROM generate_series(1,12000) AS s ;
CREATE INDEX ON test_1 USING GIST(geom) ;
ALTER TABLE test_1 ADD PRIMARY KEY (gid) ;


DROP TABLE  IF EXISTS  test_2;
CREATE TABLE test_2 AS
SELECT s AS gid, ST_makePoint( random()*1000,random()*1000) AS geom
FROM generate_series(1,50000) AS s ;
CREATE INDEX ON test_2 USING GIST(geom) ;
ALTER TABLE test_2 ADD PRIMARY KEY (gid) ; 

-- option 1 : you have lots of ram : brute force
DROP TABLE IF EXISTS test_results ;
CREATE TABLE test_results AS
SELECT DISTINCT ON (test_1.gid ) test_1.gid AS gid1, test_2.gid AS gid2-- , ST_MakeLine(test_1.geom,test_2.geom) AS sline
FROM test_1, test_2
ORDER BY test_1.gid ,ST_Distance(test_1.geom,test_2.geom) ASC ;


--option2  : you have limited ram : snipping : force index use

DROP FUNCTION IF EXISTS test_sdist(geom1 geometry) ;
CREATE OR REPLACE FUNCTION test_sdist(geom1 geometry, OUT gid2 bigint, OUT geom2 geometry) AS
 $$
    --given a point, find the closes one in test_2
    DECLARE
    BEGIN
        SELECT gid, geom INTO gid2, geom2
        FROM test_2
        ORDER BY ST_Distance(geom2,test_2.geom) ASC
        LIMIT 1  ;
        RETURN ;
    END;
$$ LANGUAGE plpgsql;

DROP TABLE IF EXISTS test_result_2 ;
CREATE TABLE IF NOT EXISTS test_result_2 AS
SELECT test_1.gid AS gid1, f.gid2 AS gid2
FROM test_1, test_sdist(test_1.geom) AS f

--option 3 : you could use a recursive CTE to avoid creating a dedicated dummy function,

--           this would be a pain to write though

Cheers,

Rémi-C

 

 

2017-01-08 16:36 GMT+01:00 Olivier Leprêtre <[hidden email]>:

Hi,

 

I have a pgsql postgis function that last about an hour on an Hp Envy (W10, i7-6500 CPU 2,5 Ghz 8go). This function is calculating minimum distance between each nodes of table x (12000 nodes) and table y (42000 nodes)

 

FOR row IN

                   SELECT code,ST_AsEwkt(ST_StartPoint(geom)) as geom1  FROM t_x

LOOP

                   FOR row1 IN

                            SELECT code, ST_Distance(ST_AsEwkt(geom)::text,row.geom1) FROM t_y order by ST_Distance(ST_AsEwkt(geom)::text,row.geom1) LIMIT 1

 

                   LOOP

                   END LOOP;

....

 

 

What I don't understand is that postgresql server process CPU is stable during all that time at 30/31% with no other process running. Details of each processor 1 to 3 shows that all 4 uses 30%. Disk and network usage are 0% (database is local) Ram usage is 47% and Postgresql server uses process only 7,4 Mo.

 

My question is "Why postgres process uses such a precise and constant cpu and doesn't grow up to more than 30% ?" I didn't find any parameter forcing such a limit in order to reduce duration of this calculation.

 

Thanks for any answer,

 

Olivier

 


Avast logo

L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
www.avast.com

 


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

 




Avast logo

L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
www.avast.com



_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: CPU tuning

Rémi Cura
Hey,
option 2 is the fastest one ;-)
, option 1 is simply a pure SQL rewrite of your query.
The machine I used for test is way weaker than yours (I'd say 2x weaker).


postgres 9.4 is non-parallelised, what you see is that postgres use 100% of a core, but the core beig used is rotated,
which mean the average usage of your cpu is 100% / 4 (your number of core),
which might be the explaination for your 30%.

What you see is expected : postgres is going to use the max amount of ram, then use the harddrive when no more ram is available.
This is not the case with option 2, as very few ram is needed.

Don't forget to create the appropriate indexes on your table t_y.

Cheers,
Rémi-C

2017-01-09 13:56 GMT+01:00 Olivier Leprêtre <[hidden email]>:

Hi,

 

Thanks for your answer and this way of calculation. I don't know which machine you have but on mine, Option 1 launched from pgadmin takes 2,564,423 ms  (about 45 mn) a bit less than my original code (2,823,272 ms). So if this takes less than 1 second on yours, there is definitively a problem on my configuration or my machine which is nevertheless a very slow one : Hp Envy (Windows 10, i7-6500 CPU 2,5 Ghz 8go). I'm using postgresql/postgis 9.4 with its standard configuration.

 

During the option 1 query, the processor used again 30/35% CPU during all time. Difference is about memory, it grows up to 1Gb (shared buffer is set to this value) and then begin using disk about 10 Mo/s.  So I'm still wondering why the cpu doesn't go further than 30% and why this take such a difference on your machine. Is it a linux OS ?

 

Olivier

 

 

De : postgis-users [mailto:[hidden email]] De la part de Rémi Cura
Envoyé : lundi 9 janvier 2017 10:52
À : PostGIS Users Discussion
Objet : Re: [postgis-users] CPU tuning

 

Hey,

I

'm afraid you may not use the most efficient approach.

Assuming you want to find for each node of table x the closest node of table y,

it takes less than 1 second on my computer.




DROP TABLE  IF EXISTS  test_1;
CREATE TABLE test_1 AS
SELECT s AS gid, ST_makePoint(random()*1000,random()*1000) AS geom
FROM generate_series(1,12000) AS s ;
CREATE INDEX ON test_1 USING GIST(geom) ;
ALTER TABLE test_1 ADD PRIMARY KEY (gid) ;


DROP TABLE  IF EXISTS  test_2;
CREATE TABLE test_2 AS
SELECT s AS gid, ST_makePoint( random()*1000,random()*1000) AS geom
FROM generate_series(1,50000) AS s ;
CREATE INDEX ON test_2 USING GIST(geom) ;
ALTER TABLE test_2 ADD PRIMARY KEY (gid) ; 

-- option 1 : you have lots of ram : brute force
DROP TABLE IF EXISTS test_results ;
CREATE TABLE test_results AS
SELECT DISTINCT ON (test_1.gid ) test_1.gid AS gid1, test_2.gid AS gid2-- , ST_MakeLine(test_1.geom,test_2.geom) AS sline
FROM test_1, test_2
ORDER BY test_1.gid ,ST_Distance(test_1.geom,test_2.geom) ASC ;


--option2  : you have limited ram : snipping : force index use

DROP FUNCTION IF EXISTS test_sdist(geom1 geometry) ;
CREATE OR REPLACE FUNCTION test_sdist(geom1 geometry, OUT gid2 bigint, OUT geom2 geometry) AS
 $$
    --given a point, find the closes one in test_2
    DECLARE
    BEGIN
        SELECT gid, geom INTO gid2, geom2
        FROM test_2
        ORDER BY ST_Distance(geom2,test_2.geom) ASC
        LIMIT 1  ;
        RETURN ;
    END;
$$ LANGUAGE plpgsql;

DROP TABLE IF EXISTS test_result_2 ;
CREATE TABLE IF NOT EXISTS test_result_2 AS
SELECT test_1.gid AS gid1, f.gid2 AS gid2
FROM test_1, test_sdist(test_1.geom) AS f

--option 3 : you could use a recursive CTE to avoid creating a dedicated dummy function,

--           this would be a pain to write though

Cheers,

Rémi-C

 

 

2017-01-08 16:36 GMT+01:00 Olivier Leprêtre <[hidden email]>:

Hi,

 

I have a pgsql postgis function that last about an hour on an Hp Envy (W10, i7-6500 CPU 2,5 Ghz 8go). This function is calculating minimum distance between each nodes of table x (12000 nodes) and table y (42000 nodes)

 

FOR row IN

                   SELECT code,ST_AsEwkt(ST_StartPoint(geom)) as geom1  FROM t_x

LOOP

                   FOR row1 IN

                            SELECT code, ST_Distance(ST_AsEwkt(geom)::text,row.geom1) FROM t_y order by ST_Distance(ST_AsEwkt(geom)::text,row.geom1) LIMIT 1

 

                   LOOP

                   END LOOP;

....

 

 

What I don't understand is that postgresql server process CPU is stable during all that time at 30/31% with no other process running. Details of each processor 1 to 3 shows that all 4 uses 30%. Disk and network usage are 0% (database is local) Ram usage is 47% and Postgresql server uses process only 7,4 Mo.

 

My question is "Why postgres process uses such a precise and constant cpu and doesn't grow up to more than 30% ?" I didn't find any parameter forcing such a limit in order to reduce duration of this calculation.

 

Thanks for any answer,

 

Olivier

 


Avast logo

L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
www.avast.com

 


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

 




Avast logo

L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
www.avast.com



_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: CPU tuning

Marcone

2017-01-09 12:21 GMT-02:00 Rémi Cura <[hidden email]>:
postgres 9.4 is non-parallelised, what you see is that postgres use 100% of a core, but the core beig used is rotated,
which mean the average usage of your cpu is 100% / 4 (your number of core),
which might be the explaination for your 30%.

Exactly! I was write something like this.

Please, try the query:

FOR row in
SELECT DISTINCT ON (t_x.code, t_y.code)
t_x.code,
t_y.code,
ST_Distance(t_x.geom, t_y.geom)
FROM t_x, t_y
ORDER BY t_x.code,
t_y.code,
ST_Distance(t_x.geom, t_y.geom)
LOOP
-- Do something here
END LOOP;

The query above use less conversions, may be more efficient.



_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: CPU tuning

Olivier Leprêtre

Well, thanks, could be the explanation.

 

I tried the query but I get SQLSTATE 53200, out of memory after a few seconds.

 

De : postgis-users [mailto:[hidden email]] De la part de Marcone
Envoyé : lundi 9 janvier 2017 15:31
À : PostGIS Users Discussion
Objet : Re: [postgis-users] CPU tuning

 

 

2017-01-09 12:21 GMT-02:00 Rémi Cura <[hidden email]>:

postgres 9.4 is non-parallelised, what you see is that postgres use 100% of a core, but the core beig used is rotated,

which mean the average usage of your cpu is 100% / 4 (your number of core),

which might be the explaination for your 30%.


Exactly! I was write something like this.

 

Please, try the query:

 

FOR row in

            SELECT DISTINCT ON (t_x.code, t_y.code)

                        t_x.code,

                        t_y.code,

                        ST_Distance(t_x.geom, t_y.geom)

            FROM t_x, t_y

            ORDER BY t_x.code,

                        t_y.code,

                        ST_Distance(t_x.geom, t_y.geom)

LOOP

            -- Do something here

END LOOP;


The query above use less conversions, may be more efficient.

 

 




Avast logo

L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
www.avast.com



_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Loading...