Thursday, March 7, 2019

Exclude files from being exported into the zip/tar source archives on github.com

GitHub.com (and probably GitLab too) provides various ways to export the contents of a Git branch, tag or release as Zip or Tar archives. When creating a release, these tar-/zipballs are automatically created and added to the release. I often find archives that contain a lot of files not useful to the end user, like .github directories, Git related files (.gitignore, .gitattributes) or CI related files (.travis.yml, .appveyor.yml). Sometimes they also contain directories (e.g. for test files) that upstream keeps in Git but that are not needed for the source distribution. There is an easy way to keep these files out of the automatically created source archives and keep the latter clean: use the export-ignore attribute in the .gitattributes file:

# don't export the github-pages source
/docs export-ignore
# don't export some other irrelevant directories
/foo export-ignore
# don't export the files necessary for CI
Gemfile export-ignore
.appveyor.yml export-ignore
.travis.yml export-ignore
# ignore Git related files
.gitattributes export-ignore
.gitignore export-ignore
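
To check whether the rules took effect, one can list the contents of a downloaded tarball and look for paths that should have been excluded. The following Python sketch does this with the standard tarfile module. The function name find_leaked_paths and the patterns passed to it are assumptions made for illustration; they are not part of Git or GitHub.

```python
import fnmatch
import io
import tarfile


def find_leaked_paths(archive_bytes, patterns):
    """Return archive paths that match a pattern which should have been export-ignored.

    The top-level directory found in GitHub source archives
    (e.g. "project-1.0.0/") is stripped before matching.
    """
    leaked = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            # Drop the leading "project-X.Y.Z/" path component.
            _, _, relative = member.name.partition("/")
            if not relative:
                continue
            first = relative.split("/")[0]
            # Match the whole relative path, or its first component for directories.
            if any(
                fnmatch.fnmatch(relative, pattern) or fnmatch.fnmatch(first, pattern)
                for pattern in patterns
            ):
                leaked.append(relative)
    return sorted(leaked)
```

Passing the bytes of a release tarball together with a list such as ["docs", ".travis.yml", ".gitignore"] should return an empty list once the .gitattributes rules are committed and a new archive has been generated.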

Sunday, February 24, 2019

Jekyll and GitHub pages: access the download URL (aka browser_download_url) for an asset of your latest release via site.github

Add the download URL of an asset of your latest release

A frequent question is how to get the download URL for an asset (e.g. a setup file) of the latest release of a project. In my case I provide an executable that includes the version number in its name, together with the source as Zip and Tar archives. Others provide versioned source tarballs or executables that differ from the source tarballs of the Git repository.

project-X.Y.Z-setup.exe
project-X.Y.Z-src.tar.gz

To get the download URL(s) for the asset(s) using the GitHub API, one can fetch and process this URL (replacing USER and PROJECT with the GitHub user account and project name accordingly):

https://api.github.com/repos/USER/PROJECT/releases/latest

Note that the download URL of each asset is provided by the browser_download_url field of the objects in the assets list:

{
  ...
  "assets": [
    {
      ...
      "browser_download_url": "...",
      ...
    }
  ]
}
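
Processing this response outside of Jekyll could look like the following Python sketch. It assumes that each asset object also carries a name field next to browser_download_url, as the GitHub API response does. The function names are made up for this example.

```python
import json
import urllib.request


def asset_download_urls(release):
    """Map each asset name to its browser_download_url for one release object."""
    return {
        asset["name"]: asset["browser_download_url"]
        for asset in release.get("assets", [])
    }


def fetch_latest_release(user, project):
    """Fetch the latest release of USER/PROJECT from the GitHub API (needs network access)."""
    url = f"https://api.github.com/repos/{user}/{project}/releases/latest"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Calling asset_download_urls(fetch_latest_release("USER", "PROJECT")) would then give a dictionary of file names and their download URLs. Unauthenticated API requests are rate limited, so this is meant for occasional use only.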

The content provided by the API is also available to Jekyll sites hosted on GitHub Pages via the site.github namespace. You can easily inspect the whole content of this namespace by putting this somewhere in your code:

{{ site.github | inspect }}

You'll find that you can even access detailed author and project information. To get the download URL of my asset, I just access the first list entry like this:

{{ site.github.latest_release.assets[0].browser_download_url }}

or this approach (less typing):

{% assign release = site.github.latest_release %}
{{ release.assets[0].browser_download_url }}

I use this to create structured data in JSON-LD for a software application. I can even access the file size, the creation and publication date of my asset. The following shows the JSON-LD snippet I add to one of my GitHub project pages (I replaced fixed content with dots):

{% assign release = site.github.latest_release %}
{
  "@context": "http://schema.org/",
  "@type": "SoftwareApplication",
  "name": "...",
  "softwareVersion": "{{ release.tag_name | strip | remove: 'v' }}",
  "alternateName": [
    "...",
    "{{ release.name }}"
  ],
  "description": "...",
  "applicationCategory": "...",
  "inLanguage": ["..", ".."],
  "operatingSystem": [
    "...",
    "..."
  ],
  "downloadUrl": "{{ release.assets[0].browser_download_url }}",
  "fileSize": "{{ release.assets[0].size | divided_by: 1024 }}",
  "releaseNotes": "{{ release.html_url }}",
  "license": "...",
  "url": "{{ site.github.repository_url }}",
  "datePublished": "{{ release.published_at }}",
  "dateCreated": "{{ release.created_at }}",
  "author":    {%- include json/person.json -%},
  "publisher": {%- include json/publisher.json -%}
}

If there is more than one asset (the GitHub repository source tarball and zipball are not assets), one probably has to use a more flexible approach than accessing the first list entry via assets[0] as shown above. If there are several assets and the asset file name is created the same way for every release but includes the version number (see the file name examples from the beginning of this post), there is another approach that might be used. One can process:

{{ site.github.latest_release.tag_name }}

and create the download URL like this:

{{ site.github.releases_url }}/latest/download/foo-{{ site.github.latest_release.tag_name | strip | remove: 'v' }}-setup.exe
{{ site.github.releases_url }}/latest/download/foo-{{ site.github.latest_release.tag_name | strip | remove: 'v' }}-src.tar.gz

Because it is common to tag the version as vX.Y.Z, the leading v is removed from the version tag in the examples above.
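
The same idea, picking an asset by its versioned file name instead of its list position, can be sketched in Python for a release object as returned by the API endpoint mentioned earlier. The helper names and the name field of the asset objects are assumptions for this example. Note one difference: the Liquid remove filter deletes every "v" in the tag, while this sketch only drops a single leading "v".

```python
def version_from_tag(tag_name):
    """Strip whitespace and one leading "v", so "v1.2.3" becomes "1.2.3"."""
    tag = tag_name.strip()
    return tag[1:] if tag.startswith("v") else tag


def find_asset_url(release, project, suffix):
    """Return the download URL of the asset named PROJECT-VERSION<suffix>, or None."""
    expected = f"{project}-{version_from_tag(release['tag_name'])}{suffix}"
    for asset in release.get("assets", []):
        if asset.get("name") == expected:
            return asset.get("browser_download_url")
    return None
```

With a release tagged v1.2.3, find_asset_url(release, "foo", "-setup.exe") looks for an asset named foo-1.2.3-setup.exe and returns None if the release does not contain it, which makes a missing upload easy to notice.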

Using the approach above, one can even loop over site.github.releases and create a changelog/news page automatically for all releases! Maybe you can share your ideas about the suggested approaches on my Gist page.