⚠️ Note: This page is within the system namespace/directory because, as of 2024-09-20, the RUB importer in ReSeeD only supports importing from a subdirectory of the same S3 share ("share" meaning "parent of an S3 bucket") that ReSeeD also uses to store the imported data (configured using the S3_ENDPOINT, S3_ACCESS_KEY, S3_SECRET_KEY, and S3_REGION variables in the .env file). The name of the S3 bucket from which the RUB importer imports the data is configured in the S3_FILE_UPLOAD_BUCKET variable in the .env file. Because access to this S3 share requires sysadmin access anyway, this page belongs in the system directory of the wiki for the time being.
This process uses the Bulkrax CSV from S3 parser to do imports. Metadata is prepared in CSV files, data for each dataset is provided in distinct folders (see below).
The data to be imported needs to have the following file structure:
There should be a file called metadata.csv. The format of the metadata.csv file is explained in The metadata CSV format section (below).
Each dataset should be placed in its own directory, at the path given for it in metadata.csv in the column dataset_path.
Within each dataset path, there should be a directory named data where all the data for the dataset is placed.
An example data structure for 2 datasets is shown below:
cl-reseed_import/set1/
├── dataset1
│   └── data
│       └── 1529
│           ├── folder_1
│           │   ├── another_file.exe
│           │   └── some_other_file.json
│           ├── my_software.exe
│           └── mydata.json
├── dataset2
│   └── data
│       ├── AV02CP07GI0
│       │   ├── anat
│       │   │   └── sub-AV02CP07GI0_T1w.nii
│       │   └── func
│       │       └── sub-AV02CP07GI0_task-rest_bold.nii
│       ├── CHANGES
│       ├── README
│       ├── dataset_description.json
│       ├── participants.json
│       └── participants.tsv
└── metadata.csv
The example zip file Example_RUB_import_data.zip has the datasets and the metadata.csv structured as needed.
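Before uploading, the layout described above can be checked locally. The following is a minimal sketch using only the Python standard library; the function name check_import_layout is hypothetical and not part of ReSeeD or Bulkrax, and it only checks what this page states (a metadata.csv at the top level, and a non-empty data directory under every dataset_path).

```python
import csv
from pathlib import Path


def check_import_layout(root: Path) -> list[str]:
    """Return a list of problems found in a local import directory.

    Hypothetical helper: it mirrors the layout rules on this page,
    not the checks the importer itself performs.
    """
    metadata_file = root / "metadata.csv"
    if not metadata_file.is_file():
        return [f"missing {metadata_file}"]

    problems = []
    with metadata_file.open(newline="", encoding="utf-8") as handle:
        for record_no, row in enumerate(csv.DictReader(handle), start=1):
            dataset_path = (row.get("dataset_path") or "").strip()
            if not dataset_path:
                problems.append(f"record {record_no}: dataset_path is empty")
                continue
            # Every dataset path must contain a directory named "data".
            data_dir = root / dataset_path / "data"
            if not data_dir.is_dir():
                problems.append(f"record {record_no}: missing directory {data_dir}")
            elif not any(data_dir.iterdir()):
                problems.append(f"record {record_no}: {data_dir} is empty")
    return problems
```

Calling check_import_layout(Path("set1")) on the unzipped example data should return an empty list if the structure matches the description above.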
Upload the data you want to import (for example: the unzipped data in Example_RUB_import_data.zip) into the S3 bucket that ReSeeD has access to, for example cl-reseed_import.
This bucket name needs to be entered in the importer form, in the field Specify a bucket name with prefix.
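Any S3 client can be used for the upload. As one possible approach, the sketch below uses the third-party boto3 library and keeps the local directory layout as object keys. The function names plan_uploads and upload are hypothetical, and how the connection arguments map to the S3_ENDPOINT, S3_ACCESS_KEY, S3_SECRET_KEY, and S3_REGION values, and how bucket and key_prefix map to your bucket, is an assumption to verify against your own setup.

```python
from pathlib import Path


def plan_uploads(local_root: Path, key_prefix: str = "") -> list[tuple[Path, str]]:
    """Map every file under local_root to an S3 object key, keeping the layout."""
    prefix = key_prefix.strip("/")
    plan = []
    for path in sorted(local_root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(local_root).as_posix()
            plan.append((path, f"{prefix}/{relative}" if prefix else relative))
    return plan


def upload(local_root: Path, bucket: str, key_prefix: str, endpoint_url: str,
           access_key: str, secret_key: str, region: str) -> None:
    """Upload the planned files with boto3 (third-party: pip install boto3)."""
    import boto3  # imported lazily so plan_uploads works without boto3 installed

    client = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name=region,
    )
    for path, key in plan_uploads(local_root, key_prefix):
        client.upload_file(str(path), bucket, key)
```

Printing the result of plan_uploads first is a cheap way to confirm that metadata.csv and the dataset directories will end up side by side under the same prefix before anything is transferred.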
Log into ReSeeD as an administrator.
On the dashboard you should see the options Importers and Exporters. Click on Importers.
On the importers page, click on New in the top left corner. This opens the importer form.
Fill in the importer form with the values listed in the table below and click on Create and import.
Field name | Value | Note |
---|---|---|
Name | Any identifiable name for the import | |
Administrative Set | RUB publication workflow | This will apply this workflow to all imported datasets |
Frequency | Once | We are running a one-off import |
Limit | 0 or leave blank | This will import all records in the metadata.csv file |
Parser | CSV from S3 - ReSeed CSV parser for work (Datasets) from local S3 | This will choose the parser for ReSeed |
Visibility | Private | The workflow will need all datasets to be private until published |
Rights statement | Leave blank | It will pick up the rights statement from the CSV file |
Specify a bucket name with prefix | cl-reseed_import | The bucket name with the prefix. You could also add a path within the bucket, for example: cl-reseed_import/set1 |

The metadata CSV format

Column header | Cardinality | Format | Example 1 | Example 2 |
---|---|---|---|---|
title | One | String. The title of the dataset. | Test dataset 1 for import | Test dataset 2 for import |
dataset_path | One | String. Folder path within the bucket. | dataset1 | dataset2 |
alternative_title | Zero or more | String. The alternative title(s) of the dataset. Multiple values should be separated with a semi-colon. | The rhythms of old men who hit things with sticks | The rhythms of old men who hit things with sticks; Huh? |
description | Zero or one | String. Description of the dataset. | A collection of rhythms from veteran rock drummers | A collection of rhythms from veteran rock drummers |
contributor | Zero or more | Names should be entered in the format: LAST_NAME, FORENAME(S). Multiple contributors should be separated with a semi-colon. The order of names is significant in relating them to contributor_orcid and contributor_affiliation. | Starr, Ringo; Bonham, John; Densmore, John; Moon, Keith | Starr, Ringo; Bonham, John; Densmore, John; Moon, Keith |
contributor_orcid | Zero or more | ORCIDs should be entered in their full https format. The order of ORCIDs is significant in relating them to contributor. ORCIDs should be separated with a semi-colon. It should ideally have the same number of semi-colons as contributor. | ;;https://orcid.org/0000-0001-5109-3700; | https://orcid.org/0000-0001-0001-3700;;; |
contributor_affiliation | Zero or more | String. The order of affiliations is significant in relating them to contributor. Affiliations should be separated with a semi-colon. It should ideally have the same number of semi-colons as contributor. | The Beatles; Led Zeppelin; The Doors; The Who | The Beatles;;The Doors; |
creator | One or more | Names should be entered in the format: LAST_NAME, FORENAME(S). Multiple creators should be separated with a semi-colon. The order of names is significant in relating them to creator_orcid and creator_affiliation. | Lennon, John | Lennon, John; McCartney, Paul |
creator_orcid | One or more | ORCIDs should be entered in their full https format. The order of ORCIDs is significant in relating them to creator. ORCIDs should be separated with a semi-colon. It should ideally have the same number of semi-colons as creator. | https://orcid.org/0000-0001-5109-3700 | https://orcid.org/0000-0001-5109-3700;https://orcid.org/0000-0001-5109-3701 |
creator_affiliation | One or more | String. The order of affiliations is significant in relating them to creator. Affiliations should be separated with a semi-colon. It should ideally have the same number of semi-colons as creator. | The Beatles | The Beatles;The Beatles |
keyword | One or more | String. Multiple keywords should be separated with a semi-colon. | drumming | drumming; pop stars |
resource_type | One or more | Must be one or more of: Book, BookChapter, Collection, ComputationalNotebook, ConferencePaper, DataPaper, Dataset, Dissertation, Event, Image, InteractiveResource, Journal, JournalArticle, Model, OutputManagementPlan, PeerReview, PhysicalObject, Preprint, Report, Service, Software, Sound, Standard, Text, Workflow, Other. If the value is not one of the allowed values, we will set it to Dataset. | Dataset | Dataset |
license | One | Must be one of: http://rightsstatements.org/vocab/InC/1.0/, https://creativecommons.org/licenses/by/4.0/, https://creativecommons.org/licenses/by-sa/4.0/, https://creativecommons.org/licenses/by-nd/4.0/, https://creativecommons.org/licenses/by-nc/4.0/, https://creativecommons.org/licenses/by-nc-nd/4.0/, https://creativecommons.org/licenses/by-nc-sa/4.0/, http://creativecommons.org/publicdomain/zero/1.0/, http://creativecommons.org/publicdomain/mark/1.0/, http://www.apache.org/licenses/LICENSE-2.0, http://www.gnu.org/licenses/gpl.html, http://opensource.org/licenses/MIT. If the license URI is not one of the allowed values, we will ignore it. | http://creativecommons.org/publicdomain/mark/1.0/ | http://opensource.org/licenses/MIT |
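date | Zero or more | Dates should be entered in the format: YYYY-MM-DD. The examples follow each date with a date type (Created, Published) and separate multiple dates with a semi-colon. | 2024-05-29 Created; 2024-06-10 Published | 2024-05-29 Created; 2024-06-10 Published |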
subject | Zero or more | String. Multiple subjects should be separated with a semi-colon. | drumming | Drumming; music |
language | Zero or more | String. Multiple languages should be separated with a semi-colon. | English | English |
location | Zero or more | String. Multiple locations should be separated with a semi-colon. | London | |
software_version | Zero or more | String. Multiple software versions should be separated with a semi-colon. | | |
funder_identifier | Zero or more | Identifiers should be entered as full URIs. Multiple funder identifiers should be separated with a semi-colon. The order of identifiers is significant in relating them to funder_name, award_number, award_uri, and award_title. | http://dx.doi.org/10.13039/501100001659 | http://dx.doi.org/10.13039/501100001659;http://dx.doi.org/10.13039/50110000165999 |
funder_name | Zero or more | Multiple funder names should be separated with a semi-colon. It should ideally have the same number of semi-colons as funder_identifier. The order of funder names is significant in relating them to funder_identifier, award_number, award_uri, and award_title. | DFG | DFG;RUB |
award_number | Zero or more | Multiple award numbers should be separated with a semi-colon. It should ideally have the same number of semi-colons as funder_identifier. The order of award numbers is significant in relating them to funder_identifier, funder_name, award_uri, and award_title. | A0001 | A0001;W3asxa3 |
award_uri | Zero or more | Multiple award URIs should be separated with a semi-colon. It should ideally have the same number of semi-colons as funder_identifier. The order of award URIs is significant in relating them to funder_identifier, funder_name, award_number, and award_title. | | |
award_title | Zero or more | Multiple award titles should be separated with a semi-colon. It should ideally have the same number of semi-colons as funder_identifier. The order of award titles is significant in relating them to funder_identifier, funder_name, award_number, and award_uri. | | |
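
Because several columns are matched by position, a mismatch in the number of semi-colons is easy to introduce by hand. The sketch below, using only the Python standard library, shows one way to write a metadata.csv and check a row against the table above. The names write_metadata and check_row are hypothetical, the findings are advisory (the table only says the counts should "ideally" match), and the ORCID pattern is an assumption about what "full https format" means; none of this reflects the importer's own validation.

```python
import csv
import io
import re

COLUMNS = [
    "title", "dataset_path", "alternative_title", "description",
    "contributor", "contributor_orcid", "contributor_affiliation",
    "creator", "creator_orcid", "creator_affiliation", "keyword",
    "resource_type", "license", "date", "subject", "language", "location",
    "software_version", "funder_identifier", "funder_name",
    "award_number", "award_uri", "award_title",
]

# Columns with cardinality "One" or "One or more" in the table above.
REQUIRED = ("title", "dataset_path", "creator", "creator_orcid",
            "creator_affiliation", "keyword", "resource_type", "license")

# Columns whose semi-colon separated values are matched up by position.
ALIGNED = {
    "creator": ("creator_orcid", "creator_affiliation"),
    "contributor": ("contributor_orcid", "contributor_affiliation"),
    "funder_identifier": ("funder_name", "award_number", "award_uri", "award_title"),
}

# Assumed shape of a full https ORCID URL.
ORCID_URL = re.compile(r"https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def split_values(cell):
    """Split a cell on semi-colons, keeping empty positions."""
    return [part.strip() for part in cell.split(";")] if cell.strip() else []


def check_row(row):
    """Return advisory findings for one metadata row (a dict of strings)."""
    findings = []
    for column in REQUIRED:
        if not (row.get(column) or "").strip():
            findings.append(f"{column} is required")
    for lead, followers in ALIGNED.items():
        expected = len(split_values(row.get(lead) or ""))
        for column in followers:
            values = split_values(row.get(column) or "")
            if values and len(values) != expected:
                findings.append(
                    f"{column} has {len(values)} positions, {lead} has {expected}")
    for column in ("creator_orcid", "contributor_orcid"):
        for value in split_values(row.get(column) or ""):
            if value and not ORCID_URL.fullmatch(value):
                findings.append(f"{column}: {value!r} is not a full https ORCID URL")
    return findings


def write_metadata(rows, handle):
    """Write rows as CSV; the csv module quotes names such as 'Lennon, John'."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, for example open("metadata.csv", "w", newline="", encoding="utf-8"). Letting the csv module do the quoting matters here because names in the LAST_NAME, FORENAME(S) format contain commas.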