Use the W&B Python SDK to construct artifacts from W&B Runs. You can add files, directories, URIs, and files from parallel runs to artifacts. After you add a file to an artifact, save the artifact to the W&B Server or your own private server. Each artifact is associated with a run.
For information on how to track external files, such as files stored in Amazon S3, see the Track external files page.
Construct an artifact
Construct a W&B Artifact in three steps:
- Create an artifact Python object with wandb.Artifact()
- Add one or more files to the artifact
- Save your artifact to the W&B server
1. Create an artifact Python object with wandb.Artifact()
Initialize the wandb.Artifact() class to create an artifact object. Specify the following parameters:
- Name: The name of your artifact. The name should be unique, descriptive, and easy to remember.
- Type: The type of artifact. The type should be simple, descriptive, and correspond to a single step of your machine learning pipeline. Common artifact types include 'dataset' or 'model'.
Artifacts cannot share the same name, regardless of type. In other words, you cannot create an artifact named cats of type dataset and another artifact named cats of type model.
You can optionally provide a description and metadata when you initialize an artifact object. For more information on available attributes and parameters, see the wandb.Artifact Class definition in the Python SDK Reference Guide.
Copy and paste the following code snippet to create an artifact object. Replace the <name> and <type> placeholders with your own values:
import wandb
# Create an artifact object
artifact = wandb.Artifact(name="<name>", type="<type>")
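Optionally, include a description and metadata when you create the artifact. The following is a minimal sketch; the name, description, and metadata values are placeholders:
import wandb

# The name, description, and metadata values below are placeholders.
artifact = wandb.Artifact(
    name="cats-dataset",
    type="dataset",
    description="Labeled cat images for the classifier example",
    metadata={"num_images": 1200, "source": "internal-collection"},
)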
2. Add one or more files to the artifact
Add files, directories, external URI references (such as Amazon S3) and more to your artifact object.
To add a single file, use the artifact object’s Artifact.add_file() method:
artifact.add_file(local_path="path/to/file.txt", name="<name>")
To add a directory, use the Artifact.add_dir() method:
artifact.add_dir(local_path="path/to/directory", name="<name>")
See the next section, Add files to an artifact, for more information on how to add different file types to an artifact.
3. Save your artifact to the W&B server
Save your artifact to the W&B server. Use the run object’s wandb.Run.log_artifact() method to save the artifact.
with wandb.init(project="<project>", job_type="<job-type>") as run:
    run.log_artifact(artifact)
When to use wandb.Run.log_artifact() or Artifact.save()
- Use wandb.Run.log_artifact() to create a new artifact and associate it with a specific run.
- Use Artifact.save() to update an existing artifact without creating a new run, as sketched after this list.
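For example, a minimal sketch of updating an existing artifact's metadata through the Public API and persisting the change with Artifact.save(); the artifact path and metadata key are placeholders:
import wandb

# Fetch an existing artifact through the Public API; no new run is started.
api = wandb.Api()
artifact = api.artifact("<entity>/<project>/<artifact-name>:latest")

# Update a field and persist the change to the existing artifact.
artifact.metadata["notes"] = "updated without creating a new run"  # placeholder key/value
artifact.save()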
Putting this all together, the following code snippet demonstrates how to create a dataset artifact, add a file to the artifact, and save the artifact to W&B:
import wandb
artifact = wandb.Artifact(name="<name>", type="<type>")
artifact.add_file(local_path="path/to/file.txt", name="<name>")
artifact.add_dir(local_path="path/to/directory", name="<name>")
with wandb.init(project="<project>", job_type="<job-type>") as run:
    run.log_artifact(artifact)
Each time you log an artifact with the same name and type, W&B creates a new version of that artifact. For more information, see Create a new artifact version.
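As a rough sketch, logging under the same artifact name and type from two runs produces consecutive versions (the project and artifact names below are placeholders):
import wandb
from pathlib import Path

# Log under the same artifact name and type from two separate runs.
for i in range(2):
    Path("data.txt").write_text(f"iteration {i}\n")  # contents change between runs
    with wandb.init(project="<project>") as run:
        artifact = wandb.Artifact(name="example-dataset", type="dataset")
        artifact.add_file("data.txt")
        run.log_artifact(artifact)  # the first run creates v0, the second v1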
W&B performs wandb.Run.log_artifact() uploads asynchronously for performance. This can cause surprising behavior when logging artifacts in a loop. For example:
with wandb.init() as run:
    for i in range(10):
        a = wandb.Artifact(
            name="race",
            type="dataset",
            metadata={
                "index": i,
            },
        )
        # ... add files to artifact a ...
        run.log_artifact(a)
The artifact version v0 is NOT guaranteed to have an index of 0 in its metadata because artifacts may be logged in an arbitrary order.
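If the version order needs to match the loop order, one option (at the cost of upload parallelism) is to block on each artifact before logging the next. A sketch using Artifact.wait(), which blocks until the logged artifact is committed:
import wandb

with wandb.init() as run:
    for i in range(10):
        a = wandb.Artifact(name="race", type="dataset", metadata={"index": i})
        # ... add files to artifact a ...
        run.log_artifact(a)
        a.wait()  # block until this artifact is committed before logging the next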
Add files to an artifact
The following sections demonstrate how to add different types of objects to an artifact. Assume you have a directory with the following structure as you read through the examples:
root-directory
|-- hello.txt
|-- images/
|   |-- cat.png
|   |-- dog.png
|-- checkpoints/
|   |-- model.h5
|-- models/
|   |-- model.h5
Add a single file
Use wandb.Artifact.add_file() to add a single local file to an artifact. Provide the local path to the file as the local_path parameter:
import wandb
# Initialize an artifact object
artifact = wandb.Artifact(name="<name>", type="<type>")
# Add a single file
artifact.add_file(local_path="path/file.format")
For example, suppose you have a file called 'hello.txt' in your local working directory:
artifact.add_file("hello.txt")
The artifact now contains a single file, hello.txt.
Optionally, pass a different name to the name parameter to rename the file within the artifact object itself. Continuing the previous example:
artifact.add_file(
local_path="hello.txt",
name="new/path/hello_world.txt"
)
The artifact now also contains the file at new/path/hello_world.txt.
The following table shows how different API calls produce different artifact contents:
| API Call | Resulting artifact |
|---|---|
| artifact.new_file('hello.txt') | hello.txt |
| artifact.add_file('model.h5') | model.h5 |
| artifact.add_file('checkpoints/model.h5') | model.h5 |
| artifact.add_file('model.h5', name='models/mymodel.h5') | models/mymodel.h5 |
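The new_file() call in the first row creates a file directly inside the artifact and adds it in a single step, so there is no separate local file to pass in. A minimal sketch, continuing the artifact object from above:
# Create and write a file inside the artifact in one step.
with artifact.new_file("hello.txt", mode="w") as f:
    f.write("hello world")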
Add multiple files
Use the wandb.Artifact.add_dir() method to add multiple files from a local directory to an artifact. Provide the local path to the directory as the local_path parameter.
import wandb
# Initialize an artifact object
artifact = wandb.Artifact(name="<name>", type="<type>")
# Add a local directory to the artifact
artifact.add_dir(local_path="path/to/directory", name="optional-prefix")
The following table shows how different API calls produce different artifact contents:
| API Call | Resulting artifact |
|---|---|
| artifact.add_dir('images') | cat.png<br>dog.png |
| artifact.add_dir('images', name='images') | images/cat.png<br>images/dog.png |
Add a URI reference
Artifacts track checksums and other information for reproducibility if the URI has a scheme that the W&B library knows how to handle.
Add an external URI reference to an artifact with the wandb.Artifact.add_reference() method. Replace the 'uri' string with your own URI. Optionally pass the desired path within the artifact for the name parameter.
# Add a URI reference
artifact.add_reference(uri="uri", name="optional-name")
Artifacts currently support the following URI schemes:
- http(s)://: A path to a file accessible over HTTP. The artifact will track checksums in the form of ETags and size metadata if the HTTP server supports the ETag and Content-Length response headers.
- s3://: A path to an object or object prefix in S3. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
- gs://: A path to an object or object prefix in GCS. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
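For example, a sketch of referencing an S3 prefix; the bucket and prefix below are placeholders that match the table that follows:
import wandb

artifact = wandb.Artifact(name="animals", type="dataset")
# Nothing is uploaded to W&B; only checksums, sizes, and version metadata
# for the objects under the prefix are recorded.
artifact.add_reference(uri="s3://my-bucket/images")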
The following table shows how different API calls produce different artifact contents:
| API call | Resulting artifact contents |
|---|---|
| artifact.add_reference('s3://my-bucket/model.h5') | model.h5 |
| artifact.add_reference('s3://my-bucket/checkpoints/model.h5') | model.h5 |
| artifact.add_reference('s3://my-bucket/model.h5', name='models/mymodel.h5') | models/mymodel.h5 |
| artifact.add_reference('s3://my-bucket/images') | cat.png<br>dog.png |
| artifact.add_reference('s3://my-bucket/images', name='images') | images/cat.png<br>images/dog.png |
Add files to artifacts from parallel runs
For large datasets or distributed training, multiple parallel runs might need to contribute to a single artifact.
import wandb
import time
# This example uses Ray to run jobs in parallel
# for demonstration purposes.
import ray
ray.init()
artifact_type = "dataset"
artifact_name = "parallel-artifact"
table_name = "distributed_table"
parts_path = "parts"
num_parallel = 5
# Each batch of parallel writers should have its own
# unique group name.
group_name = "writer-group-{}".format(round(time.time()))
@ray.remote
def train(i):
    """
    Our writer job. Each writer adds one table to the artifact.
    """
    with wandb.init(group=group_name) as run:
        artifact = wandb.Artifact(name=artifact_name, type=artifact_type)

        # Add data to a wandb table.
        table = wandb.Table(columns=["a", "b", "c"], data=[[i, i * 2, 2**i]])

        # Add the table to a folder in the artifact
        artifact.add(table, "{}/table_{}".format(parts_path, i))

        # Upserting the artifact creates or appends data to the artifact
        run.upsert_artifact(artifact)

# Launch your runs in parallel
result_ids = [train.remote(i) for i in range(num_parallel)]

# Join on all the writers to make sure their files have
# been added before finishing the artifact.
ray.get(result_ids)

# Once all the writers are finished, finish the artifact
# to mark it ready.
with wandb.init(group=group_name) as run:
    artifact = wandb.Artifact(artifact_name, type=artifact_type)

    # Create a "PartitionedTable" pointing to the folder of tables
    # and add it to the artifact.
    artifact.add(wandb.data_types.PartitionedTable(parts_path), table_name)

    # finish_artifact finalizes the artifact, disallowing future "upserts"
    # to this version.
    run.finish_artifact(artifact)
The following code snippet shows how to use the W&B Public API to list the files in a run, including their names and URLs. Replace the <entity/project/run-id> placeholder with your own values:
from wandb.apis.public.files import Files
from wandb.apis.public.api import Api

# Initialize the Public API client
api = Api()

# Example run object
run = api.run("<entity/project/run-id>")

# Create a Files object to iterate over files in the run
files = Files(api.client, run)

# Iterate over files
for file in files:
    print(f"File Name: {file.name}")
    print(f"File URL: {file.url}")
    print(f"Path to file in the bucket: {file.direct_url}")
See the File Class for more information on available attributes and methods.