Commit Parsing

The semver level that should be bumped on a release is determined by the commit messages since the last release. In order to be able to decide the correct version and generate the changelog, the content of those commit messages must be parsed. By default this package uses a parser for the Angular commit message style:

<type>(<scope>): <subject>
<BLANK LINE>
<body>
<BLANK LINE>
<footer>

The body or footer can begin with BREAKING CHANGE: followed by a short description to create a major release.

Note

Python Semantic Release is able to parse more than just the body and footer sections (in fact, they are processed in a loop so you can write as many paragraphs as you need). It also supports having multiple breaking changes in one commit.

However, other tools may not do this, so if you plan to use any similar programs then you should try to stick to the official format.

More information about the style can be found in the angular commit guidelines.

Built-in Commit Parsers

The following parsers are built in to Python Semantic Release:

semantic_release.commit_parser.AngularCommitParser

The default parser, which uses the Angular commit style with the following differences:

  • Multiple BREAKING CHANGE: paragraphs are supported

  • revert is not currently supported

The default configuration options for semantic_release.commit_parser.AngularCommitParser are:

[tool.semantic_release.commit_parser_options]
allowed_tags = [
    "build",
    "chore",
    "ci",
    "docs",
    "feat",
    "fix",
    "perf",
    "style",
    "refactor",
    "test",
]
minor_tags = ["feat"]
patch_tags = ["fix", "perf"]

semantic_release.history.EmojiCommitParser

Parser for commits using one or more emojis as tags in the subject line.

If a commit contains multiple emojis, the one with the highest priority (major, minor, patch, none) or the one listed first is used as the changelog section for that commit. Commits containing no emojis go into an “Other” section.

The default settings are for Gitmoji.

The default configuration options for semantic_release.commit_parser.EmojiCommitParser are:

[tool.semantic_release.commit_parser_options]
major_tags = [":boom:"]
minor_tags = [
    ":sparkles:",
    ":children_crossing:",
    ":lipstick:",
    ":iphone:",
    ":egg:",
    ":chart_with_upwards_trend:",
]
patch_tags = [
    ":ambulance:",
    ":lock:",
    ":bug:",
    ":zap:",
    ":goal_net:",
    ":alien:",
    ":wheelchair:",
    ":speech_balloon:",
    ":mag:",
    ":apple:",
    ":penguin:",
    ":checkered_flag:",
    ":robot:",
    ":green_apple:",
]

semantic_release.history.scipy_parser

A parser for scipy-style commits with the following differences:

  • Beginning a paragraph inside the commit with BREAKING CHANGE declares a breaking change. Multiple BREAKING CHANGE paragraphs are supported.

  • A scope (following the tag in parentheses) is supported

The default configuration options for semantic_release.commit_parser.ScipyCommitParser are:

[tool.semantic_release.commit_parser_options]
allowed_tags = [
    "API",
    "DEP",
    "ENH",
    "REV",
    "BUG",
    "MAINT",
    "BENCH",
    "BLD",
    "DEV",
    "DOC",
    "STY",
    "TST",
    "REL",
    "FEAT",
    "TEST",
]
major_tags = ["API"]
minor_tags = ["DEP", "DEV", "ENH", "REV", "FEAT"]
patch_tags = ["BLD", "BUG", "MAINT"]

semantic_release.history.TagCommitParser

The original parser from v1.0.0 of Python Semantic Release. Similar to the emoji parser above, but with less features.

The default configuration options for semantic_release.commit_parser.TagCommitParser are:

[tool.semantic_release.commit_parser_options]
minor_tag = ":sparkles:"
patch_tag = ":nut_and_bolt:"

Writing your own parser

If you would prefer to use an alternative commit style, for example to adjust the different type values that are associated with a particular commit, this is possible.

The commit_parser option, if set to a string which does not match one of Python Semantic Release’s inbuilt commit parsers, will be used to attempt to dynamically import a custom commit parser class. As such you will need to ensure that your custom commit parser is import-able from the environment in which you are running Python Semantic Release. The string should be structured in the standard module:attr format; for example, to import the class MyCommitParser from the file custom_parser.py at the root of your repository, you should specify "commit_parser=custom_parser:MyCommitParser" in your configuration, and run the semantic-release command line interface from the root of your repository. Equally you can ensure that the module containing your parser class is installed in the same virtual environment as semantic-release. If you can run python -c "from $MODULE import $CLASS" successfully, specifying commit_parser="$MODULE:$CLASS" is sufficient. You may need to set the PYTHONPATH environment variable to the directory containing the module with your commit parser.

Python Semantic Release provides several building blocks to help you write your parser. To maintain compatibility with how Python Semantic Release will invoke your parser, you should use the appropriate object as described below, or create your own object as a subclass of the original which maintains the same interface. Type parameters are defined where appropriate to assist with static type-checking.

Tokens

The tokens built into Python Semantic Release’s commit parsing mechanism are inspired by both the error-handling mechanism in Rust’s error handling and its implementation in black. It is documented that catching exceptions in Python is slower than the equivalent guard implemented using if/else checking when exceptions are actually caught, so although try/except blocks are cheap if no exception is raised, commit parsers should always return an object such as semantic_release.ParseError instead of raising an error immediately. This is to avoid catching a potentially large number of parsing errors being caught as the commit history of a repository is being parsed. Python Semantic Release does not raise an exception if a commit cannot be parsed.

Python Semantic Release uses semantic_release.ParsedCommit as the return type of a successful parse operation, and semantic_release.ParseError as the return type from an unsuccessful parse of a commit. semantic_release.ParsedCommit is a namedtuple which has the following fields:

  • bump: a semantic_release.LevelBump indicating what type of change this commit introduces.

  • type: the type of the commit as a string, per the commit message style. This is up to the parser to implement; for example, the semantic_release.commit_parser.EmojiCommitParser parser fills this field with the emoji representing the most significant change for the commit. The field is named after the representation in the Angular commit specification.

  • scope: The scope, as a string, parsed from the commit. Commit styles which do not have a meaningful concept of “scope” should fill this field with an empty string.

  • descriptions: A list of paragraphs (strings) (delimited by a double-newline) from the commit message.

  • breaking_descriptions: A list of paragraphs (strings) which are deemed to identify and describe breaking changes by the parser. An example would be a paragraph which begins with the text BREAKING CHANGE:.

  • commit: The original commit object that was parsed.

semantic_release.ParseError is a namedtuple which has the following fields:

  • commit: The original commit object that was parsed.

  • error: A string with a meaningful error message as to why the commit parsing failed.

In addition, semantic_release.ParseError implements an additional method, raise_error. This method raises a semantic_release.CommitParseError with the message contained in the error field, as a convenience.

ParsedCommit and ParseError objects also make the following attributes available, each implemented as a property which is computed, as a convenience for template authors - therefore custom implementations should ensure these properties can also be computed:

  • message: the message attribute of the commit; where the message is of type bytes this should be decoded to a UTF-8 string.

  • hexsha: the hexsha attribute of the commit, representing its hash.

  • short_hash: the first 7 characters of the hexsha attribute of the commit.

In Python Semantic Release, the class semantic_release.ParseResult is defined as ParseResultType[ParsedCommit, ParseError], as a convenient shorthand.

semantic_release.ParseResultType is a generic type, which is the Union of its two type parameters. One of the types in this union should be the type returned on a successful parse of the commit, while the other should be the type returned on an unsuccessful parse of the commit.

A custom parser result type, therefore, could be implemented as follows:

  • MyParsedCommit subclasses ParsedCommit

  • MyParseError subclasses ParseError

  • MyParseResult = ParseResultType[MyParsedCommit, MyParseError]

Internally, Python Semantic Release uses isinstance to determine if the result of parsing a commit was a success or not, so you should check that your custom result and error types return True from isinstance(<object>, ParsedCommit) and isinstance(<object>, ParseError) respectively.

While it’s not advisable to remove any of the fields that are available in the built-in token types, currently only the bump field of the successful result type is used to determine how the version should be incremented as part of this release. However, it’s perfectly possible to add additional fields to your tokens which can be populated by your parser; these fields will then be available on each commit in your changelog template, so you can make additional information available.

Parser Options

To provide options to the commit parser which is configured in the configuration file, Python Semantic Release includes a semantic_release.ParserOptions class. Each parser built into Python Semantic Release has a corresponding “options” class, which subclasses semantic_release.ParserOptions.

The configuration in commit_parser_options is passed to the “options” class which is specified by the configured commit_parser - more information on how this is specified is below.

The “options” class is used to validate the options which are configured in the repository, and to provide default values for these options where appropriate.

If you are writing your own parser, you should accompany it with an “options” class which accepts the appropriate keyword arguments. This class’ __init__ method should store the values that are needed for parsing appropriately.

Commit Parsers

The commit parsers that are built into Python Semantic Release implement an instance method called parse, which takes a single parameter commit of type git.objects.commit.Commit, and returns the type semantic_release.ParseResultType.

To be compatible with Python Semantic Release, a commit parser must subclass semantic_release.CommitParser. A subclass must implement the following:

  • A class-level attribute parser_options, which must be set to semantic_release.ParserOptions or a subclass of this.

  • An __init__ method which takes a single parameter, options, that should be of the same type as the class’ parser_options attribute.

  • A method, parse, which takes a single parameter commit that is of type git.objects.commit.Commit, and returns semantic_release.token.ParseResult, or a subclass of this.

By default, the constructor for semantic_release.CommitParser will set the options parameter on the options attribute of the parser, so there is no need to override this in order to access self.options during the parse method. However, if you have any parsing logic that needs to be done only once, it may be a good idea to perform this logic during parser instantiation rather than inside the parse method. The parse method will be called once per commit in the repository’s history during parsing, so the effect of slow parsing logic within the parse method will be magnified significantly for projects with sizeable Git histories.

Commit Parsers have two type parameters, “TokenType” and “OptionsType”. The first is the type which is returned by the parse method, and the second is the type of the “options” class for this parser.

Therefore, a custom commit parser could be implemented via:

class MyParserOptions(semantic_release.ParserOptions):
    def __init__(self, message_prefix: str) -> None:
        self.prefix = message_prefix * 2


class MyCommitParser(
    semantic_release.CommitParser[semantic_release.ParseResult, MyParserOptions]
):
    def parse(self, commit: git.objects.commit.Commit) -> semantic_release.ParseResult:
        ...