The DataPLANT DataHUB is a science gateway backed primarily by the open-source GitLab framework. It provides an entry point to our various services, starting with a versioned, generated web page and additional modules for community interaction. The DataHUB platform is where DataPLANT Annotated Research Contexts (ARCs) evolve until they reach a certain state. This can happen either on the central DataPLANT instance or on one of various on-premise installations. To allow more sites to join the DataHUB federation, we created a Docker image that eases on-premise installation. Like other DataPLANT tools, the package is available on GitHub.
Storage resources are used to keep both the necessary service configurations and user data. Depending on the service provided locally, storage may be provided as a traditional network file system such as NFS or SMB, or as object storage. The on-premise storage has to implement the redundancy necessary to keep user data secure to the required level.
An authentication instance is required to authenticate users and the services behind them. The DataPLANT user management builds on existing authentication and authorization infrastructures (AAIs). Well-established services such as Life Sciences AAI and ORCID can be combined with local authentication within the central DataPLANT authentication service. The infrastructure relies on Keycloak, which supports modern authentication protocols such as OpenID Connect and SAML and thereby allows the integration of multiple AAIs and identity brokering. Different roles can be assigned depending on the source of the account or on specific attributes. Permissions are then derived from these roles to differentiate between users. They range from privileged users with full access to the data and the ability to create archives and publications, to users who only have a reporting function and/or read-only access to raw data. All of this is still at an early stage and will need to be refined as the infrastructure sees more production use.
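The mapping from account source and attributes to roles, and from roles to permissions, can be illustrated with a minimal Python sketch. It is not the DataPLANT or Keycloak configuration: the role names (`maintainer`, `reporter`), the permission names, the identity provider labels, and the `project_member` attribute are all hypothetical and were chosen only to mirror the range of users described above.

```python
from dataclasses import dataclass, field

# Hypothetical permission names, mirroring the range described in the text.
READ_RAW = "read_raw_data"
WRITE_DATA = "write_data"
CREATE_ARCHIVE = "create_archive"
REPORT = "report"

# Hypothetical roles; each role maps to a fixed set of permissions.
ROLE_PERMISSIONS = {
    "maintainer": frozenset({READ_RAW, WRITE_DATA, CREATE_ARCHIVE, REPORT}),
    "reporter": frozenset({READ_RAW, REPORT}),
}


@dataclass
class Account:
    username: str
    identity_provider: str  # assumed labels, e.g. "local", "orcid"
    attributes: dict = field(default_factory=dict)


def assign_roles(account: Account) -> set:
    """Derive roles from the account source or from specific attributes."""
    roles = set()
    # Assumption: locally authenticated accounts are treated as privileged.
    if account.identity_provider == "local":
        roles.add("maintainer")
    # Assumption: a brokered account can be promoted by an attribute.
    if account.attributes.get("project_member") is True:
        roles.add("maintainer")
    # Every other account falls back to the least privileged role.
    if not roles:
        roles.add("reporter")
    return roles


def permissions_for(account: Account) -> set:
    """Collect the union of permissions granted by all assigned roles."""
    permissions = set()
    for role in assign_roles(account):
        permissions |= ROLE_PERMISSIONS[role]
    return permissions


local_user = Account("alice", "local")
orcid_user = Account("bob", "orcid")
orcid_member = Account("carol", "orcid", {"project_member": True})
```

In this sketch `permissions_for(orcid_user)` contains only the reporting and read-only permissions, while `local_user` and `orcid_member` additionally receive the permission to create archives. In a real deployment, a comparable mapping would more likely be expressed in the identity broker's own configuration than in application code.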