Python Semantic Release’s Version Bumping Algorithm¶
Below is a technical description of the algorithm which Python Semantic Release uses to calculate a new version for a project.
Assumptions¶
At runtime, we are in a Git repository with HEAD referring to a commit on some branch of the repository (i.e. not in detached HEAD state).
We know in advance whether we want to produce a prerelease or not (based on the configuration and command-line flags).
We can parse the tags of the repository into semantic versions, as we are given the format that those Git tags should follow via configuration, but cannot cherry-pick only tags that apply to commits on specific branches. We must parse all tags in order to ensure we have parsed any that might apply to commits in this branch’s history.
If we can identify a commit as a
merge-base
between our HEAD commit and one or more tags, then that merge-base should be unique.We know ahead of time what
prerelease_token
to use for prereleases - e.g.rc
.We know ahead of time whether
major
changes introduced by commits should cause the new version to remain on0.y.z
if the project is already on a0.
version - see major_on_zero.
Implementation¶
Parse all the Git tags of the repository into semantic versions, and sort in descending (most recent first) order according to semver precedence. Ignore any tags which do not correspond to valid semantic vesrions according to
tag_format
.Find the
merge-base
of HEAD and the latest tag according to the sort above. Call this commitM
. If there are no tags in the repo’s history, we setM=HEAD
.Find the latest non-prerelease version whose tag references a commit that is an ancestor of
M
. We do this via a breadth-first search through the commit lineage, starting againstM
, and for each tag checking if the tag corresponds to that commit. We break from the search when we find such a tag. If no such tag is found, see 4a). Else, suppose that tag corresponds to a commitL
- goto 4b).If no commit corresponding to the last non-prerelease version is found, the entire history of the repository is considered. We parse every commit that is an ancestor of HEAD to determine the type of change introduced - either
major
,minor
,patch
,prerelease_revision
orno_release
. We store this levels in aset
as we only require the distinct types of change that were introduced.However, if we found a commit
L
which is the commit against which the last non-prerelease was tagged, then we parse only the commits from HEAD as far back asL
, to understand what changes have been introduced since the previous non-prerelease. We store these levels - eithermajor
,minor
,patch
,prerelease_revision
, orno_release
, in a set, as we only require the distinct types of change that were introduced.We look for tags that correspond to each commit during this process, to identify the latest pre-release that was made within HEAD’s ancestry.
If there have been no changes since the last non-prerelease, or all commits since that release result in a
no_release
type according to the commit parser, then we terminate the algorithm.If we have not exited by this point, we know the following information:
The latest version, by semver precedence, within the whole repository. Call this
LV
. This might not be within the ancestry of HEAD.The latest version, prerelease or non-prerelease, within the whole repository. Call this
LVH
. This might not be within the ancestry of HEAD. This may be the same asLV
.The latest non-prerelease version within the ancestry of HEAD. Call this
LVHF
. This may be the same asLVH
.The most significant type of change introduced by the commits since the previous full release. Call this
level
Whether or not we wish to produce a prerelease from this version increment. Call this a boolean flag,
prerelease
. (Assumption)Whether or not to increment the major digit if a major change is introduced against an existing
0.
version. Call thismajor_on_zero
, a boolean flag. (Assumption)Using this information, the new version is decided according to the following criteria:
If
LV
has a major digit of0
,major_on_zero
isFalse
andlevel
ismajor
, reducelevel
tominor
.If
prerelease=True
, then
Diff
LV
withLVHF
, to understand if themajor
,minor
orpatch
digits have changed. For example, diffing1.2.1
and1.2.0
is apatch
diff, while diffing2.1.1
and1.17.2
is amajor
diff. Call thisDIFF
If
DIFF
is less semantically significant thanlevel
, for example ifDIFF=patch
andlevel=minor
, then
Increment the digit of
LVF
corresponding tolevel
, for example the minor digit iflevel=minor
, setting all less significant digits to zero.Add
prerelease_token
as a suffix result of 1., together with a prerelease revision number of1
. Return this new version and terminate the algorithm.Thus if
DIFF=patch
,level=minor
,prerelease=True
,prerelease_token="rc"
, andLVF=1.1.1
, then the version returned by the algorithm is1.2.0-rc.1
.If
DIFF
is semantically less significant than or equally significant tolevel
, then this means that the significance of change introduced bylevel
is already reflected in a prerelease version that has been created since the last full release. For example, ifLVHF=1.1.1
,LV=1.2.0-rc.1
andlevel=minor
.In this case we:
If the prerelease token of
LV
is different fromprerelease_token
, take the major, minor and patch digits ofLV
and construct a prerelease version using our givenprerelease_token
and a prerelease revision of1
. We then return this version and terminate the algorithm.For example, if
LV=1.2.0-rc.1
andprerelease_token=alpha
, we return1.2.0-alpha.1
.If the prerelease token of
LV
is the same asprerelease_token
, we increment the revision number ofLV
, return this version, andterminate the algorithm. For example, if
LV=1.2.0-rc.1
andprerelease_token=rc
, we return1.2.0-rc.2
.If
prerelease=False
, then
If
LV
is not a prerelease, then we increment the digit ofLV
corresponding tolevel
, for example the minor digit iflevel=minor
, setting all less significant digits to zero. We return the result of this and terminate the algorithm.If
LV
is a prerelease, then:
Diff
LV
withLVHF
, to understand if themajor
,minor
orpatch
digits have changed. Call thisDIFF
If
DIFF
is less semantically significant thanlevel
, then
Increment the digit of
LV
corresponding tolevel
, for example the minor digit iflevel=minor
, setting all less significant digits to zero.Remove the prerelease token and revision number from the result of i., (“Finalize” the result of i.) return the result and terminate the algorithm.
For example, if
LV=1.2.2-alpha.1
andlevel=minor
, we return1.3.0
.
If
DIFF
is semantically less significant than or equally significant tolevel
, then we finalizeLV
, return the result and terminate the algorithm.
Complexity¶
Space:
A list of parsed tags takes O(number of tags)
in space. Parsing each commit during
the breadth-first search between merge-base
and the latest tag in the ancestry
of HEAD takes at worst O(number of commits)
in space to track visited commits.
Therefore worst-case space complexity will be linear in the number of commits in the
repo, unless the number of tags significantly exceeds the number of commits
(in which case it will be linear in the number of tags).
Time:
Assuming using regular expression parsing of each tag is a constant-time operation, then the following steps contribute to the time complexity of the algorithm:
Parsing each tag -
O(number of tags)
Sorting tags by semver precedence -
O(number of tags * log(number of tags))
Finding the merge-base of HEAD and the latest release tag -
O(number of commits)
(worst case)Parsing each commit and checking each tag against each commit -
O(number of commits) + O(number of tags * number of commits)
(worst case)
Overall, assuming that the number of tags is less than or equal to the number of commits in the repository, this would lead to a worst-case time complexity that’s quadratic in the number of commits in the repo.