The Nautilus Ceph storage cluster can be accessed via the S3 protocol. It uses our own storage, which is free for our users and is not related to Amazon or any commercial cloud.
Ceph filesystems data use
Credit: Ceph data usage
You should request your credentials (key and secret) in Matrix chat. Go there and let admins know you'd like to access S3, and which pool works best for you.
S3 regions settings
Use the appropriate endpoint URL for your S3 client or library.
West pool (default): outside endpoint https://s3-west.nrp-nautilus.io
Note that the inside endpoint is http (without SSL) and the outside endpoint is https (with SSL). You can use the outside endpoint within the Kubernetes cluster, but the traffic will then go through a load balancer. With the inside endpoint, multiple parallel requests from one or many machines can hit multiple separate OSDs and therefore achieve very high bandwidth when reading training sets.
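As a rough illustration of the inside/outside choice, the sketch below picks an endpoint based on whether the code appears to be running in a Kubernetes pod. The function name `pick_endpoint` is hypothetical, and the inside endpoint is passed in by the caller because its URL is not reproduced in this section. Note the assumption: the check only detects that the process runs in some Kubernetes pod, not specifically in this cluster.

```python
import os

# Outside endpoint as used elsewhere in this document.
OUTSIDE_ENDPOINT = "https://s3-west.nrp-nautilus.io"


def pick_endpoint(inside_endpoint, environ=None):
    """Return the inside endpoint when running in a pod, else the outside one.

    `inside_endpoint` must be supplied by the caller (take it from the
    regions settings); it is not hard-coded here.
    """
    env = os.environ if environ is None else environ
    # Kubernetes sets KUBERNETES_SERVICE_HOST in every pod's environment.
    if env.get("KUBERNETES_SERVICE_HOST"):
        return inside_endpoint
    return OUTSIDE_ENDPOINT
```

Passing `environ` explicitly is only a convenience for testing; in normal use the function reads the real process environment.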
The easiest way to access S3 is Rclone.
Use these options:
Storage: Amazon S3 Compliant Storage Providers
S3 provider: Ceph Object Storage
AWS Access Key ID, AWS Secret Access Key: ask in Matrix chat
Endpoint: use the endpoint from the S3 regions settings section
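The options above end up as one section in the rclone configuration file. The following Python sketch renders such a section as text; it is an illustration rather than the official setup procedure. The remote name `nautilus-s3` is borrowed from the GitLab CI example in this document, the key values are placeholders, and the function name `render_rclone_conf` is hypothetical.

```python
import configparser
import io


def render_rclone_conf(access_key, secret_key, endpoint, remote="nautilus-s3"):
    """Render one rclone remote section for a Ceph S3 endpoint as INI text."""
    # interpolation=None keeps any '%' characters in secrets literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser[remote] = {
        "type": "s3",
        "provider": "Ceph",
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        "endpoint": endpoint,
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder credentials; request real ones as described above.
    print(render_rclone_conf("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY",
                             "https://s3-west.nrp-nautilus.io"))
```

Running `rclone config file` shows where rclone expects this file on your machine.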
S3cmd is an open-source tool for accessing S3.
To configure, create the `~/.s3cfg` file with your access key, secret key, and endpoint. Use the outside (https) endpoint if you're accessing from outside of the cluster, or the inside (http) endpoint if accessing from inside.
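As an illustration of what such a file can contain, the Python sketch below renders a minimal `~/.s3cfg` from an endpoint URL. It assumes the common s3cmd keys `access_key`, `secret_key`, `host_base`, `host_bucket`, and `use_https`; setting `host_bucket` to the same host as `host_base` is an assumption made here to keep requests path-style. The function name `render_s3cfg` is hypothetical and the exact file used on this cluster may differ.

```python
import configparser
import io
from urllib.parse import urlsplit


def render_s3cfg(access_key, secret_key, endpoint_url):
    """Render a minimal s3cmd config for the given endpoint URL."""
    parts = urlsplit(endpoint_url)
    parser = configparser.ConfigParser(interpolation=None)
    parser["default"] = {
        "access_key": access_key,
        "secret_key": secret_key,
        # s3cmd expects a host (optionally host:port), not a full URL.
        "host_base": parts.netloc,
        "host_bucket": parts.netloc,
        # https for the outside endpoint, http for the inside one.
        "use_https": "True" if parts.scheme == "https" else "False",
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()
```

The same function covers both cases: an `https://` URL yields `use_https = True`, an `http://` URL yields `use_https = False`.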
Run `s3cmd ls` to see the available buckets.
Upload files with `s3cmd put FILE s3://BUCKET`.
Or, to make an uploaded file public, add the `-P` flag: `s3cmd put -P FILE s3://BUCKET`.
Using AWS S3 tool
First add your credentials to ~/.aws/credentials.
If you are familiar with the AWS CLI you can create an additional profile preserving your AWS credentials by adding it to ~/.aws/credentials:
If you don't use AWS then you can just add credentials to [default] and skip the [profile] selection.
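To illustrate adding a profile without losing existing AWS credentials, the Python sketch below merges a new section into the text of `~/.aws/credentials`. The profile name `nautilus` and the function name `add_profile` are hypothetical. One caveat of this approach: `configparser` drops comments when it rewrites the file.

```python
import configparser
import io


def add_profile(existing_text, profile, access_key, secret_key):
    """Return credentials-file text with `profile` added or replaced."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(existing_text)  # keeps sections such as [default]
    parser[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    current = "[default]\naws_access_key_id = AWSKEY\naws_secret_access_key = AWSSECRET\n"
    print(add_profile(current, "nautilus", "YOUR_ACCESS_KEY", "YOUR_SECRET_KEY"))
```

A named profile is then selected on the command line with `--profile nautilus`; credentials stored under `[default]` need no such flag.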
We recommend using awscli-plugin-endpoint to store the endpoint URL in `.aws/config` instead of typing the endpoint on every command. Install the plugin with `pip install awscli-plugin-endpoint`; the awscli-plugin-endpoint README.md lists the remaining setup steps. If you do not wish to add this plugin, add `--endpoint-url https://s3-west.nrp-nautilus.io` to all commands below.
Your .aws/config file should look like:
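As a sketch of the layout the plugin expects, the Python snippet below renders a `.aws/config` with a `[plugins]` section and per-service `endpoint_url` entries, following the format shown in the awscli-plugin-endpoint README as best understood here. The profile name `nautilus` and the function name `render_aws_config` are hypothetical; check the README if the plugin does not pick the settings up.

```python
def render_aws_config(endpoint, profile="default"):
    """Render .aws/config text enabling awscli-plugin-endpoint for S3."""
    # In .aws/config, named profiles use a "profile " prefix; default does not.
    section = "default" if profile == "default" else f"profile {profile}"
    lines = [
        "[plugins]",
        "endpoint = awscli_plugin_endpoint",
        "",
        f"[{section}]",
        "s3 =",
        f"    endpoint_url = {endpoint}",
        "s3api =",
        f"    endpoint_url = {endpoint}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_aws_config("https://s3-west.nrp-nautilus.io", "nautilus"))
```

Both `s3` and `s3api` get the endpoint so that either mode of the CLI reaches the Ceph cluster rather than Amazon.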
Using AWS CLI
The AWS CLI (command line interface) has two modes of operation for S3:
`aws s3` is used for basic file manipulations (copy, list, delete, move, etc.), and
`aws s3api` is used for creating/deleting buckets, manipulating permissions, etc.
You can specify the endpoint on the command line (example: `aws --endpoint-url https://s3-west.nrp-nautilus.io s3 ls s3://bucket-name/path`) or via the S3 endpoint plugin (which is sometimes hard to install).
Create a bucket:
List objects in the bucket:
Upload a file:
Upload a file and make it publicly accessible:
You can now access this file via a browser at https://s3-west.nrp-nautilus.io/my-bucket/hello.txt
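The browser URL follows the path-style pattern endpoint/bucket/key. A small Python sketch of that construction, with a hypothetical helper name `public_url`, might look like this; it percent-encodes characters such as spaces while keeping `/` separators in the key.

```python
from urllib.parse import quote


def public_url(bucket, key, endpoint="https://s3-west.nrp-nautilus.io"):
    """Build the path-style URL of a publicly readable object."""
    # quote() leaves "/" intact by default, so nested keys keep their shape.
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"


if __name__ == "__main__":
    print(public_url("my-bucket", "hello.txt"))
```

The URL only works in a browser if the object was uploaded as publicly accessible.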
Download a file:
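The operations listed above map onto `aws s3` subcommands. The Python sketch below builds one plausible command line for each, using the `my-bucket` and `hello.txt` names from the browser example; the helper `aws_s3` is hypothetical, and the explicit `--endpoint-url` can be dropped if the endpoint plugin is configured.

```python
import shlex

ENDPOINT = "https://s3-west.nrp-nautilus.io"


def aws_s3(*args, endpoint=ENDPOINT):
    """Return the argv list for an `aws s3` call against the given endpoint."""
    return ["aws", "--endpoint-url", endpoint, "s3", *args]


# Bucket and file names are illustrative.
COMMANDS = {
    "create a bucket": aws_s3("mb", "s3://my-bucket"),
    "list objects": aws_s3("ls", "s3://my-bucket"),
    "upload a file": aws_s3("cp", "hello.txt", "s3://my-bucket/hello.txt"),
    "upload a public file": aws_s3("cp", "hello.txt", "s3://my-bucket/hello.txt",
                                   "--acl", "public-read"),
    "download a file": aws_s3("cp", "s3://my-bucket/hello.txt", "hello.txt"),
}

if __name__ == "__main__":
    for label, argv in COMMANDS.items():
        print(f"{label}: {shlex.join(argv)}")
```

To execute a command rather than print it, pass the list to `subprocess.run(argv, check=True)`; building a list instead of a string avoids shell quoting problems with unusual file names.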
Give multiple users full access to the bucket
When multiple users need to access a bucket, you can set those permissions with a bucket policy. You set the bucket policy using the `aws s3api put-bucket-policy` command.
Create a `policy.json` file, replacing USER# and BUCKETNAME with your own users and bucket name. The policy described here gives all listed users full control over the bucket; more granular bucket policies are supported as well.
More detailed policy.json examples at: https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-policy.html
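As one possible shape for such a policy, the Python sketch below generates a `policy.json` that allows every S3 action on a bucket and its objects for a list of users. It assumes Ceph's user ARN form `arn:aws:iam:::user/NAME` for users without a tenant; the function name `full_access_policy` is hypothetical and the policy actually used on this cluster may differ.

```python
import json


def full_access_policy(bucket, users):
    """Build a bucket policy granting the given users all S3 actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed ARN form for Ceph users without a tenant.
                "Principal": {"AWS": [f"arn:aws:iam:::user/{u}" for u in users]},
                "Action": "s3:*",
                # The bucket itself plus every object inside it.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # USER1, USER2 and BUCKETNAME are placeholders.
    with open("policy.json", "w") as fh:
        json.dump(full_access_policy("BUCKETNAME", ["USER1", "USER2"]), fh, indent=2)
```

The resulting file can then be applied with `aws s3api put-bucket-policy --bucket BUCKETNAME --policy file://policy.json`. Narrowing `Action` to, for example, `s3:GetObject` and `s3:ListBucket` would give read-only access instead.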
Cyberduck is a free S3 client for Mac and Windows. It can be used to upload and download files to/from S3 buckets. To use Cyberduck with Ceph S3 endpoints you need to leverage "deprecated" path style requests. The simplest way to do this is to install the appropriate profile into Cyberduck referenced in the Cyberduck profiles documentation, S3 (Deprecated path style requests).cyberduckprofile.
Once you add the profile, you can connect to the S3 endpoint by entering the endpoint hostname in the "Server" field. If you enter it as a URL instead of a hostname, it will likely trigger the selection of a different and undesired connection profile. For example, to connect to the S3 endpoint for the PRP project's western region, you would enter `s3-west.nrp-nautilus.io` in the "Server" field. You can then enter your access key and secret key in the "Access Key ID" and "Secret Access Key" fields, respectively.
S3 from tensorflow
## Setting up s3fs (posix mount)

To mount an S3 bucket as a filesystem, use [s3fs-fuse]. Also see the [FUSE docs](/userdocs/storage/fuse/).

### Example mounting commands

For access from outside the cluster, point s3fs at the outside (https) endpoint.
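As a hedged sketch of an s3fs mount, the Python snippet below writes the `ACCESS_KEY:SECRET_KEY` password file that s3fs reads and builds a mount command using the `passwd_file`, `url`, and `use_path_request_style` options. The helper names, bucket, and paths are hypothetical; path-style requests are assumed here because the Cyberduck notes in this document indicate the Ceph endpoints need them.

```python
import os
import shlex


def write_passwd_file(path, access_key, secret_key):
    """Write the s3fs credentials file with owner-only permissions."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"{access_key}:{secret_key}\n")
    # Enforce 0600 even if the file already existed with looser permissions.
    os.chmod(path, 0o600)


def s3fs_command(bucket, mountpoint, passwd_file,
                 endpoint="https://s3-west.nrp-nautilus.io"):
    """Return the argv list for mounting `bucket` at `mountpoint`."""
    return [
        "s3fs", bucket, mountpoint,
        "-o", f"passwd_file={passwd_file}",
        "-o", f"url={endpoint}",
        "-o", "use_path_request_style",
    ]


if __name__ == "__main__":
    print(shlex.join(s3fs_command("my-bucket", "/mnt/my-bucket",
                                  os.path.expanduser("~/.passwd-s3fs"))))
```

From inside the cluster, the same command would take the inside (http) endpoint as its `endpoint` argument.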
## Using S3 in GitLab CI

In your GitLab project, go to `Settings`->`CI/CD`, open the `Variables` tab, and add the variables holding your S3 credentials: `ACCESS_KEY_ID` and `SECRET_ACCESS_KEY`. Choose `protect variable` and `mask variable`. Your `.gitlab-ci.yml` file can look like:
- apt-get update && apt-get install -y curl unzip
- curl https://rclone.org/install.sh | bash
- rclone config create nautilus-s3 s3 endpoint https://s3-west.nrp-nautilus.io provider Ceph access_key_id $ACCESS_KEY_ID secret_access_key $SECRET_ACCESS_KEY
- rclone ls "nautilus-s3:"