Mirroring NuGet.org

Published on Sunday, January 22, 2017

Downloading the full set of nupkgs from a feed to a local folder is useful for storing an offline copy, for caching, or for writing scripts that analyze the files and metadata in packages.

The v3.0.0 feed format used by NuGet.org makes it easier to discover packages and incremental changes to a feed through the feed catalog, an append-only record of all changes. The packages themselves can then be downloaded with simple web requests, or with the existing NuGet libraries, which also support feed authentication.
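As a rough illustration of the "simple web requests" route, the Python sketch below reads a v3 service index, looks up a resource by its @type, and builds a nupkg download URL. It assumes the PackageBaseAddress/3.0.0 (flat container) resource and its lowercase {id}/{version}/{id}.{version}.nupkg URL pattern, and that the version passed in is already normalized. It does not handle authenticated feeds, which is where the NuGet libraries come in.

```python
import json
import urllib.request

SERVICE_INDEX = "https://api.nuget.org/v3/index.json"


def load_service_index(url: str = SERVICE_INDEX) -> dict:
    """Download and parse a v3 service index (network access required)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def find_resource(index: dict, resource_type: str) -> str:
    """Return the @id of the first resource whose @type matches exactly."""
    for resource in index.get("resources", []):
        if resource.get("@type") == resource_type:
            return resource["@id"]
    raise KeyError(resource_type)


def nupkg_url(base_address: str, package_id: str, version: str) -> str:
    """Build a flat-container download URL; id and version are lowercased."""
    pid, ver = package_id.lower(), version.lower()
    return f"{base_address.rstrip('/')}/{pid}/{ver}/{pid}.{ver}.nupkg"
```

For example, nupkg_url(find_resource(load_service_index(), "PackageBaseAddress/3.0.0"), "Newtonsoft.Json", "9.0.1") would point at the flat-container copy of that package version.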

NuGetMirror.exe

NuGetMirror.exe is a simple tool that puts all this together and allows you to mirror NuGet v3 feeds to disk, including NuGet.org itself.

The app has two commands: list and nupkgs. The list command displays all packages in the feed; the nupkgs command downloads them to a folder on disk.

Using NuGetMirror.exe to download all of NuGet.org

Here is what to expect when mirroring the full set of packages on NuGet.org.

Requirements

At the time of writing, NuGet.org contains around 800K packages in total, taking up roughly 850 GB on disk. This will keep growing, so make sure you have enough disk space before attempting to download all packages.
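Those two figures work out to an average of roughly 1 MB per package (treating GB as 10^9 bytes). A quick pre-flight check along these lines, sketched in Python with the standard shutil.disk_usage call, can catch a too-small disk before a long download starts. The 25% headroom factor is an arbitrary assumption to leave room for growth, not something NuGetMirror does.

```python
import shutil

# Approximate figures from the post (January 2017); both keep growing.
PACKAGE_COUNT = 800_000
TOTAL_BYTES = 850 * 10**9  # ~850 GB, treating GB as 10**9 bytes


def average_package_size(total_bytes: int = TOTAL_BYTES, count: int = PACKAGE_COUNT) -> float:
    """Average nupkg size in bytes implied by the totals."""
    return total_bytes / count


def has_enough_space(path: str, required_bytes: int = TOTAL_BYTES, headroom: float = 1.25) -> bool:
    """True if the volume holding `path` has required_bytes * headroom bytes free.

    The 1.25 headroom factor is an arbitrary assumption, not a NuGetMirror setting.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes * headroom
```

Here average_package_size() returns 1062500.0, or about 1.06 MB per package.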

Initial sync

The command to download all packages is straightforward: give NuGetMirror the v3 feed URL and the output folder path.

Since NuGet.org contains some old and invalid packages, I recommend using the --ignore-errors flag, at least for the first run. This keeps you from coming back a day later only to find that the command stopped halfway through.

NuGetMirror.exe nupkgs https://api.nuget.org/v3/index.json -o d:\output --ignore-errors
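The post does not describe how --ignore-errors is implemented internally. Conceptually, it amounts to recording a failure and moving on instead of aborting the whole run. The Python sketch below shows that pattern; mirror_all and the download callback are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable


def mirror_all(
    urls: Iterable[str],
    download: Callable[[str], None],
    ignore_errors: bool = True,
) -> list[tuple[str, str]]:
    """Download every URL; collect failures instead of aborting when ignore_errors is set."""
    failures: list[tuple[str, str]] = []
    for url in urls:
        try:
            download(url)
        except Exception as exc:  # one bad package should not stop the whole run
            if not ignore_errors:
                raise
            failures.append((url, str(exc)))
    return failures
```

Returning the failures, rather than only logging them, makes it easy to retry just the packages that failed once the initial sync has finished.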

Incremental updates

NuGetMirror keeps track of the current state by writing cursor.json to the output folder. This file contains a date and time that matches the last commit time from the catalog that was processed.
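The exact layout of cursor.json is not described in the post, so the sketch below assumes a hypothetical {"cursor": "<ISO 8601 time>"} shape. It shows the general idea: a missing file means "start from the first commit" (which is also why deleting cursor.json triggers a full re-check), and writing to a temporary file followed by os.replace avoids leaving a half-written cursor behind if the process is interrupted.

```python
import json
import os
from datetime import datetime
from typing import Optional

CURSOR_FILE = "cursor.json"


def load_cursor(output_dir: str) -> Optional[datetime]:
    """Return the saved commit time, or None when no cursor exists yet."""
    path = os.path.join(output_dir, CURSOR_FILE)
    if not os.path.exists(path):
        return None  # first run, or the cursor was deleted: start from the beginning
    with open(path, encoding="utf-8") as f:
        return datetime.fromisoformat(json.load(f)["cursor"])  # hypothetical key


def save_cursor(output_dir: str, commit_time: datetime) -> None:
    """Write to a temp file, then swap it in, so a crash never leaves a partial cursor."""
    path = os.path.join(output_dir, CURSOR_FILE)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"cursor": commit_time.isoformat()}, f)  # hypothetical shape
    os.replace(tmp_path, path)
```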

During incremental updates, the cursor time is used as the start time. All commits between that start time and 10 minutes before the current time are read. This delay is needed to ensure that the feed has made all new packages fully available for download.
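Here is a Python sketch of that window, assuming the public catalog layout: the catalog index lists pages, each with a commitTimeStamp for its most recent commit, and each page lists items with their own commitTimeStamp. Pages whose latest commit is at or before the cursor can be skipped entirely, and items are kept when start < commit time <= now minus 10 minutes. Treating the start as exclusive and the end as inclusive is an assumption about how the boundaries are handled. Catalog timestamps can carry seven fractional digits, more than datetime stores, so the parser truncates to microseconds.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SETTLE_DELAY = timedelta(minutes=10)


def parse_catalog_time(stamp: str) -> datetime:
    """Parse e.g. '2017-01-22T10:15:30.1234567Z' as UTC, truncating to microseconds."""
    base, _, fraction = stamp.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    micros = int((fraction + "000000")[:6]) if fraction else 0
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def commit_window(cursor: Optional[datetime], now: datetime) -> tuple[datetime, datetime]:
    """Start at the cursor (or the beginning of time), stop 10 minutes before now."""
    start = cursor if cursor is not None else datetime.min.replace(tzinfo=timezone.utc)
    return start, now - SETTLE_DELAY


def pages_to_read(catalog_index: dict, start: datetime) -> list[str]:
    """A page's commitTimeStamp is its newest commit, so older pages are skipped."""
    return [page["@id"] for page in catalog_index.get("items", [])
            if parse_catalog_time(page["commitTimeStamp"]) > start]


def items_in_window(page: dict, start: datetime, end: datetime) -> list[dict]:
    """Keep catalog items with start < commitTimeStamp <= end."""
    return [item for item in page.get("items", [])
            if start < parse_catalog_time(item["commitTimeStamp"]) <= end]
```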

Packages added after the start time are downloaded to disk. Modified packages are downloaded as well, overwriting the originals.
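A package can be committed more than once inside the same window (for example, when it is edited), so only its latest catalog item matters. The sketch below collapses items to one per package id and version, then sorts them into new and modified based on whether a file already exists on disk. It assumes the items are passed in ascending commit order, and hypothetical_nupkg_path is a made-up folder layout that may not match what NuGetMirror actually writes.

```python
import os


def hypothetical_nupkg_path(output_dir: str, package_id: str, version: str) -> str:
    """Hypothetical layout; NuGetMirror's real folder structure may differ."""
    pid, ver = package_id.lower(), version.lower()
    return os.path.join(output_dir, pid, f"{pid}.{ver}.nupkg")


def latest_per_package(items: list[dict]) -> dict[tuple[str, str], dict]:
    """Items must be in ascending commit order, so a later edit replaces an earlier one."""
    latest: dict[tuple[str, str], dict] = {}
    for item in items:
        latest[(item["nuget:id"].lower(), item["nuget:version"].lower())] = item
    return latest


def plan_downloads(items: list[dict], output_dir: str) -> tuple[list[str], list[str]]:
    """Split target paths into new files and files that will be overwritten."""
    new: list[str] = []
    modified: list[str] = []
    for pid, ver in latest_per_package(items):
        path = hypothetical_nupkg_path(output_dir, pid, ver)
        (modified if os.path.exists(path) else new).append(path)
    return new, modified
```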

Each run writes the list of updated nupkg paths to updatedFiles.txt in the root of the output folder.
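One way to use updatedFiles.txt is to feed only the changed packages into an analysis script. The sketch assumes the file holds one nupkg path per line. It opens each nupkg as a zip archive (which is what a nupkg is) and reads the id and version from the .nuspec manifest at the archive root, matching element names without their XML namespace because nuspec files come with several namespace versions.

```python
import os
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional


def read_updated_paths(output_dir: str) -> list[str]:
    """Read updatedFiles.txt, assuming one nupkg path per line."""
    with open(os.path.join(output_dir, "updatedFiles.txt"), encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def nuspec_id_version(nupkg_path: str) -> tuple[Optional[str], Optional[str]]:
    """Return (id, version) from the .nuspec at the root of a nupkg."""
    with zipfile.ZipFile(nupkg_path) as archive:
        name = next(n for n in archive.namelist()
                    if n.endswith(".nuspec") and "/" not in n)
        root = ET.fromstring(archive.read(name))

    def first_text(local_name: str) -> Optional[str]:
        # Compare local names so any nuspec namespace version matches.
        for element in root.iter():
            if element.tag.rsplit("}", 1)[-1] == local_name and element.text:
                return element.text.strip()
        return None

    return first_text("id"), first_text("version")
```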

Verifying an existing folder

To re-run NuGetMirror against all packages, delete the cursor.json file in the root of the output folder. The tool then goes back to the first package and verifies that every package is valid and up to date. If a package on disk already matches the commit time in the feed and the nupkg can be read successfully, it is skipped rather than downloaded again.
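The skip check can be sketched as two tests: the file's recorded commit time must match the feed, and the archive must open cleanly with a .nuspec at its root. How NuGetMirror records the commit time for each file is not stated in the post. This Python sketch assumes, hypothetically, that the mirror stamps each file's modification time with its catalog commit time, and allows a couple of seconds of slack for filesystems with coarse timestamps.

```python
import os
import zipfile
import zlib
from datetime import datetime

TIME_SLACK_SECONDS = 2.0  # FAT, for example, stores mtimes with 2-second resolution


def stamp_commit_time(path: str, commit_time: datetime) -> None:
    """Hypothetical: record the catalog commit time as the file's mtime."""
    ts = commit_time.timestamp()
    os.utime(path, (ts, ts))


def nupkg_is_readable(path: str) -> bool:
    """True if the zip opens, every member passes its CRC check, and a root .nuspec exists."""
    try:
        with zipfile.ZipFile(path) as archive:
            if archive.testzip() is not None:
                return False  # testzip returns the first member that failed
            return any(name.endswith(".nuspec") and "/" not in name
                       for name in archive.namelist())
    except (OSError, zipfile.BadZipFile, zlib.error):
        return False


def can_skip(path: str, commit_time: datetime) -> bool:
    """Skip only when the file exists, matches the commit time, and reads cleanly."""
    if not os.path.exists(path):
        return False
    if abs(os.path.getmtime(path) - commit_time.timestamp()) > TIME_SLACK_SECONDS:
        return False
    return nupkg_is_readable(path)
```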

Getting NuGetMirror