Duplicate tags appearing in postgis git repo

Duplicate tags appearing in postgis git repo

John Harvey
Hello,

Today I noticed that a bunch of tags appeared in the postgis git repository that I think were added accidentally.

To see what I mean, try this command:

Every tag seems to be duplicated, but with single-quotes around them.
Example:
22140385fe341da6c7df020abc245efc00dba3c2 refs/tags/'2.3.2'
22140385fe341da6c7df020abc245efc00dba3c2 refs/tags/2.3.2
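
For anyone who wants to check a mirror for this kind of duplication, here is a minimal sketch. It assumes the input is the two-column "hash, ref" listing printed by `git ls-remote --tags`, as in the example above; `find_quoted_duplicates` is a hypothetical helper name, not part of git.

```shell
# Reads `git ls-remote --tags` output on stdin and prints each tag name
# that also exists a second time wrapped in single quotes.
# find_quoted_duplicates is a hypothetical helper name.
find_quoted_duplicates() {
    awk -v q="'" '
        # Remember every tag name with the refs/tags/ prefix removed.
        { sub(/^refs\/tags\//, "", $2); seen[$2] = 1 }
        END {
            for (name in seen) {
                quoted = q name q
                if (quoted in seen) print name
            }
        }
    ' | sort
}
```

Possible usage, with `origin` standing in for whichever remote you want to inspect: `git ls-remote --tags origin | find_quoted_duplicates`.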

Is this something that can/should be cleaned up?

Thank you!
  -John

_______________________________________________
postgis-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-devel

Re: Duplicate tags appearing in postgis git repo

Sandro Santilli-3
On Wed, May 17, 2017 at 04:29:37PM -0400, John Harvey wrote:

> Every tag seems to be duplicated, but with single-quotes around them.
> Example:
> 22140385fe341da6c7df020abc245efc00dba3c2 refs/tags/'2.3.2'
> 22140385fe341da6c7df020abc245efc00dba3c2 refs/tags/2.3.2

Thanks for the report. Problem fixed with this change:

    https://git.osgeo.org/gogs/postgis/repository-sync-scripts/commit/ae8f95154311e9a5e6df8a7517f33615a032117d

And mess cleaned up with this:

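  # For every configured remote, delete each tag whose name contains a
  # single quote by pushing an empty source (":<ref>") to that remote.
  git remote | while read -r remote; do
    git ls-remote --tags "$remote" | grep "'" |
        awk '{print $2}' | sed 's/^/:/' | tr -s '\n' '\0' |
        xargs -0 git push "$remote"
  done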
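
Since pushing `:<ref>` deletes `<ref>` on the remote, it may be worth previewing the refspecs before running a loop like the one above. The following is only a sketch of that dry-run step: it assumes the same `git ls-remote --tags` output format, and `quoted_tag_delete_refspecs` is a hypothetical helper name.

```shell
# Reads `git ls-remote --tags` output on stdin and prints one delete
# refspec (":<ref>") for every tag whose name contains a single quote.
# Nothing is pushed; this only shows what a cleanup would remove.
# quoted_tag_delete_refspecs is a hypothetical helper name.
quoted_tag_delete_refspecs() {
    awk -v q="'" 'index($2, q) { print ":" $2 }'
}
```

Possible usage: `git ls-remote --tags origin | quoted_tag_delete_refspecs`, where `origin` is just an example remote name. An empty result means there is nothing to clean up on that remote.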

Let me know if you find any other issues with the
[CodeMirrors](http://trac.osgeo.org/postgis/wiki/CodeMirrors).

--strk;

Re: Duplicate tags appearing in postgis git repo

John Harvey
Hi strk,

Thanks for the fix!  It is definitely cleaner now (I can confirm that the quote marks are gone).

Unfortunately, I've found a secondary issue.
Some tags have now moved with the new method of tagging.

Below is the complete list of tags that have moved:

refs/tags/2.0.3:
from 02d28254bf3437a13be5cbab9012302684794572
to   a3b3de2840ed773a9cd5fc2d7575a0b0cedacb08

refs/tags/2.1.0beta1:
from dd1d97cd2c2bd53b1ccacf122f35121196b9889d
to   d80c727c9748d04bc7027dede60137c5656ed909

refs/tags/2.1.3:
from f3ec236dca1151b756a1b9bdbe8256f07595c01c
to   9e63c55cc0968b991c8893f086c4bb7e81133ffb

refs/tags/2.1.6:
from 4771ab13d58bf41cffbe4f25b951278dc2de90fd
to   5bf45bc83b5b5e01afc8022d229d8f6987ef1226

refs/tags/2.1.7:
from 719d4afebe6d735fa080c98ecc83cafccddea3d8
to   478b1f9267fbec154b681bb740774c44bd20662e

refs/tags/2.2.0:
from 142fe43e2c73af4cdf0fcfe3d3f4cf335fc91047
to   04baae8928e98e2efab49b2b5d9662c0b2b3739e

refs/tags/2.2.0rc1:
from b9fbdaeb2b88767bf4002a8cfe00523e3d47ccef
to   443882c0517384dac0053584742eb2fda5089b96
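
A list like the one above can be produced by comparing two saved tag listings. This is a rough sketch under the assumption that both files hold `git ls-remote --tags` output (hash, then ref) captured before and after the change; `moved_tags` and the snapshot file names are hypothetical.

```shell
# usage: moved_tags OLD_SNAPSHOT NEW_SNAPSHOT
# Each snapshot is a file containing saved `git ls-remote --tags` output.
# Prints every ref present in both files whose hash differs.
# moved_tags is a hypothetical helper name.
moved_tags() {
    awk '
        # First file: remember the old hash of each ref.
        NR == FNR { old[$2] = $1; next }
        # Second file: report refs whose hash has changed.
        ($2 in old) && old[$2] != $1 {
            printf "%s:\nfrom %s\nto   %s\n\n", $2, old[$2], $1
        }
    ' "$1" "$2"
}
```

Possible usage: save `git ls-remote --tags origin > tags-before.txt` at one point in time and `tags-after.txt` later, then run `moved_tags tags-before.txt tags-after.txt`. Tags that only exist in one of the two snapshots are not reported.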

If I had to guess, the new tagging method applies tags to the latest commit of certain branches (at least, that's how it looks in the git tree).  Such an algorithm would not account for the case where a tag was applied and further commits later landed on that branch.  I haven't seen the code, though-- this is just a guess from looking at the tree itself.

I suspect this is a problem.

Regards,
  -John



Re: Duplicate tags appearing in postgis git repo

Sandro Santilli-3
On Fri, May 19, 2017 at 10:17:38AM -0400, John Harvey wrote:
> Hi strk,
>
> Thanks for the fix!  It is definitely cleaner now (I can confirm that the
> quote marks are gone).
>
> Unfortunately, I've found a secondary issue.
> Some tags have now moved with the new method of tagging.

[...]

> If I were to make a guess, it appears that the new tagging method applies
> tags on the latest commit of certain branches (at least that's how it
> appears while looking at the git-tree).  However, such an algorithm would
> not take into account the case where tags were applied but yet further
> commits occurred along those branches after-the-fact.  I haven't seen the
> code, though-- this is just a guess from looking at the tree itself.
>
> I'm guessing that this might be an issue.

The tagging method can be seen on the now public repository:
https://git.osgeo.org/gogs/postgis/repository-sync-scripts/src/master/git-svn-sync.sh#L26

I'm afraid SVN doesn't have a way to publish immutable tags, so
if a "tag branch" was committed to after the fact, we'd have no
way to know when the tag was made. Am I right in that regard ?

The init script reconstructs the mirror from a git repository, but the
original git repository is now lost due to force-pushing new tag
values.  Some local ones may still exist, as yours and mine somewhere.
Do we really want to reconstruct those tags from our local values of
tags ? Note that only recently I added a -f to the `git tag` call in
that syncer, so it could very well be that the previous tag just
pointed to whatever state the SVN "tag/branch" had at time of
mirroring...
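
To make the trade-off around `-f` concrete: a sync step could refuse to move a tag that already exists instead of forcing it. The sketch below is not the code in git-svn-sync.sh; `tag_if_absent` is a hypothetical helper, and it has to be run inside a git repository.

```shell
# usage: tag_if_absent NAME COMMIT
# Creates the tag only when it does not exist yet, so an already
# published tag is never moved (unlike `git tag -f`).
# tag_if_absent is a hypothetical helper name.
tag_if_absent() {
    name=$1
    commit=$2
    if git rev-parse -q --verify "refs/tags/$name" >/dev/null; then
        echo "tag $name already exists, leaving it in place" >&2
        return 0
    fi
    git tag "$name" "$commit"
}
```

With this approach a tag keeps whatever commit it was first given, which is only the right answer if that first placement matched the release.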

--strk;

Re: Duplicate tags appearing in postgis git repo

John Harvey
Hi strk,

> I'm afraid SVN doesn't have a way to publish immutable tags, so
> if a "tag branch" was committed into after-the-fact, we'd have no
> way to know when the tag was made. Am I right in that regard ?

If I remember right (it's been a few years), on a past SVN project we handled it this way:
There were separate "tags" and "branches" areas.  When it was time to tag, we copied the branch into the "tags" area via "svn cp", and then I think we locked that tag.  This way, the branch could move forward and the tag could not be touched.

The current postgis project appears to use a model of branches and tags, but the tags do not appear to be locked.
Therefore, it is conceivable that tags can move forward in time.
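
One common way to approximate locked tags in Subversion is a server-side pre-commit hook. The following is only a sketch of the check such a hook might run, and not something taken from the postgis repository. It assumes tags live under a top-level `tags/` directory and that the input is `svnlook changed` output, where the first four columns hold the status and the path follows; `tags_untouched` is a hypothetical helper name.

```shell
# Reads `svnlook changed` output on stdin. Succeeds only if nothing under
# tags/ is touched, except for adding a new top-level tag directory
# (which is what creating a tag with "svn cp" looks like).
# tags_untouched is a hypothetical helper name.
tags_untouched() {
    awk '
        { path = substr($0, 5) }
        path ~ "^tags/" {
            # Creating tags/<name>/ itself is allowed.
            if (substr($0, 1, 1) == "A" && path ~ "^tags/[^/]+/$") next
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

Inside a pre-commit hook, where `$1` is the repository path and `$2` the transaction name, this could be wired up as `svnlook changed -t "$2" "$1" | tags_untouched || exit 1`.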
 
> The init script reconstructs the mirror from a git repository, but the
> original git repository is now lost due to force-pushing new tag
> values.  Some local ones may still exist, as yours and mine somewhere.
> Do we really want to reconstruct those tags from our local values of
> tags ? Note that only recently I added a -f to the `git tag` call in
> that syncer, so it could very well be that the previous tag just
> pointed to whatever state the SVN "tag/branch" had at time of
> mirroring...

From what I see, the "old" tags in git seem to correspond to the points where the release tarballs were made, judging both by date and by the commit messages (listed below):
  • 2.2.0 - Tagged release 2.2.0
  • 2.2.0rc1 - Set version numbers
  • 2.1.7 - Tagged release 2.1.7
  • 2.1.6 - Tagged release 2.1.6
  • 2.1.3 - Set final version to 2.1.3
  • 2.1.0beta1 - repoint to OSGEO download section
  • 2.0.3 - Release 2.0.3 tagged
I don't think the old tag placement was an arbitrary point in time-- for these 7 items, it appears to line up with when the source tarballs were created and released.

The thing that is bothering me about this is that with the tags moved, the project has sacrificed build reproducibility.  I can see that contributors can add (and have added) content to a subversion tag long after a release occurred.  When this happened, it made it so that a build made from source on that tag (in both subversion and in git) no longer matches the release tarball's content.  That is what worries me.

Let's look at a single example: 2.2.0.
  1. The tagged release happened on 10/7/2015.
  2. The release was announced: http://postgis.net/2015/10/07/postgis-2.2.0/
  3. The corresponding source tarball was posted: http://download.osgeo.org/postgis/source/postgis-2.2.0.tar.gz
  4. 4 months after release, 7 code commits were added to this tag.  At this point, the subversion tag does not match the released tarball.
  5. A few days ago, the git tag was moved on top of these 7 commits.  At this point, the git tag does not match the released tarball.
The correct placement of this tag needs to be the 10/7/2015 commit, where the source tarball was generated.  I can't think of any room for another answer here-- reproducibility to match that release tarball should be a must-have with any version control system.
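
A check along these lines can be sketched as follows. It is an illustration only: it assumes it is run inside a git clone, that the tarball unpacks into a single top-level directory, and that `tar` supports `--strip-components`; `compare_tag_to_tarball` is a hypothetical helper name. Release tarballs often ship generated files that are not in version control, so some "Only in" lines may be expected and are not necessarily a mismatch.

```shell
# usage: compare_tag_to_tarball TAG TARBALL
# Exports TAG with `git archive`, unpacks TARBALL next to it and runs a
# recursive diff. Returns diff's exit status (0 means identical trees).
# compare_tag_to_tarball is a hypothetical helper name.
compare_tag_to_tarball() {
    tag=$1
    tarball=$2
    work=$(mktemp -d) || return 2
    mkdir "$work/tag" "$work/tarball"
    # Tree as recorded in version control at the tag.
    git archive "$tag" | tar -xf - -C "$work/tag"
    # Tree as shipped; drop the single top-level directory.
    tar -xzf "$tarball" -C "$work/tarball" --strip-components=1
    status=0
    diff -r "$work/tag" "$work/tarball" || status=$?
    rm -rf "$work"
    return "$status"
}
```

Possible usage, after downloading the tarball mentioned above: `compare_tag_to_tarball 2.2.0 postgis-2.2.0.tar.gz`.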

From what I can see, there are two ways to fix things:
  1. Delete the erroneous commits on top of the tags.  This will let the automated tag-pusher work correctly.  It also has the benefit of bringing subversion back in line with the previously released source tarballs.  However, people may not want to do this.
  2. Move the specifically noted tags from my previous message back to their original positions.  This fixes git (and may be easy to do), but does not fix subversion.
I'm not really too invested in which path is chosen to fix it, but I do think it's a large CM problem if the release tarballs don't line up with the built source tags.  Others can share their perspective as well; I'm a build/release engineer by trade, so I must admit that I may have some bias in this area.

Thanks!
  -John


Re: Duplicate tags appearing in postgis git repo

Sandro Santilli-3
On Fri, May 19, 2017 at 01:14:04PM -0400, John Harvey wrote:

> Let's look at a singular example: 2.2.0.
>
>    1. The tagged release happened  on 10/7/2015.
>    2. The release was announced:
>    http://postgis.net/2015/10/07/postgis-2.2.0/
>    3. The corresponding source tarball was posted:
>    http://download.osgeo.org/postgis/source/postgis-2.2.0.tar.gz
>    4. 4 months after release, 7 code commits were added to this tag.  At
>    this point, the subversion tag does not match the released tarball.

Ok, this is weird.

I first looked in a git repository created by running the mirroring
script's init with git version 2.11.0, and in that repository the
2.2.0 tag does *not* have those commits.

The git repository used for the mirror sync, on the other hand, does
have those 7 extra commits. There I have git version 2.1.4.

Now I'm curious to know if those 7 commits are also really in SVN
and what others do get by running the init scripts.

--strk;

Re: Duplicate tags appearing in postgis git repo

Sandro Santilli-3
On Fri, May 19, 2017 at 09:50:57PM +0200, Sandro Santilli wrote:
>
> Now I'm curious to know if those 7 commits are also really in SVN
> and what others do get by running the init scripts.

So, those 7 commits are really in SVN:
https://trac.osgeo.org/postgis/log/tags/2.2.0

But with the last commit, the codebase is exactly the same as it was
at the beginning, as the last commits were reverts after realizing the
mistake.

   git diff --stat svn/tags/2.2.0 svn/tags/2.2.0~7 # empty result.
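
The same "did the content end up identical?" question can also be answered without reading diff output, by comparing the tree objects the two commits point to. This is a small sketch to be run inside a git repository; `same_tree` is a hypothetical helper name.

```shell
# usage: same_tree REV_A REV_B
# Returns 0 when both revisions point to the same tree (identical
# content), 1 when the trees differ, 2 when a revision cannot be resolved.
# same_tree is a hypothetical helper name.
same_tree() {
    a=$(git rev-parse -q --verify "$1^{tree}") || return 2
    b=$(git rev-parse -q --verify "$2^{tree}") || return 2
    [ "$a" = "$b" ]
}
```

For the case above, `same_tree svn/tags/2.2.0 svn/tags/2.2.0~7` would be expected to succeed if the diff is indeed empty.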

In the other cases there are indeed some persisting changes, but chances
are they really got into the packages. I haven't checked each in turn to
tell for sure. The full report of changes between shifted tags is here:

   http://strk.kbt.io/tmp/postgis-tag-shifts.txt

John, any chance you have easy tooling to make these checks ?
You can see the changes are really small and mostly on documentation
or build script tweaks.

--strk;

Re: Duplicate tags appearing in postgis git repo

John Harvey
Hi strk,

> John, any chance you have easy tooling to make these checks ?
> You can see the changes are really small and mostly on documentation
> or build script tweaks.

Sounds good to me.

Below are the results of my analysis:
  1. 4 out of 7 of the tags are good to go with their new tag placement.  As you had suspected, the later commits did in fact make it into the final source tarballs.  These are: 2.1.3, 2.1.6, 2.1.7, 2.2.0rc1.
  2. 2.2.0 is also safe to keep in its new location based on your analysis.  There are 3 commits and 3 reversions, making the effective latest commit functionally the same as the old tag.  I'm good with keeping the tag at its new location.
  3. 2.0.3 is a case where the new tag placement is broken.  The source tarball was created before the latest commit.  So, for consistency, we need a solution here, which may be as simple as reverting the latest commit in svn and then tagging that location, or moving the tag back to its old location.
  4. I'm unsure if there ever was a release package for 2.1.0beta1.  I could not find it, nor could I find a release notification in the blog history.  If there was one, I'd be happy to inspect it; however, if there wasn't, I'm okay with the new location since it doesn't really matter too much.

Regards,
  -John



Re: Duplicate tags appearing in postgis git repo

Sandro Santilli-3
On Mon, May 22, 2017 at 08:19:08AM -0400, John Harvey wrote:

>    3. 2.0.3 is a case where the new tag placement is broken.  The source
>    tarball was created before the latest commit.  So, for consistency, we need
>    a solution here, which may just be as simple as reverting the latest commit
>    in svn and then tagging that location, or moving the tag back to its old
>    location.

The commit is this:

    commit a3b3de2840ed773a9cd5fc2d7575a0b0cedacb08
    Author: Regina Obe <[hidden email]>
    Date:   Sun Mar 3 17:00:50 2013 +0000

        correct download links

        git-svn-id: http://svn.osgeo.org/postgis/tags/2.0.3@11132 b70326c6-7e19-0410-871a-916f4a2858ee

I'm fine with reverting it in the 2.0.3 tag (keeping it in 2.0
branch). Regina do you agree ?

--strk;

Re: Duplicate tags appearing in postgis git repo

Regina Obe-2
Yes agree


Re: Duplicate tags appearing in postgis git repo

Sandro Santilli-3
On Mon, May 22, 2017 at 10:14:31AM -0400, Regina Obe wrote:
> Yes agree

Can you also do it yourself, from SVN ?
I don't want to mess things up further via git-svn :P

--strk;


Re: Duplicate tags appearing in postgis git repo

John Harvey
Hi Regina / strk,

Sorry to bump the thread-- just hoping to get this final item closed out.
If it helps, I can try to make the revert in SVN.

Regards,
  -John
