Data Management @ TACC

data@tacc.utexas.edu

by Chris Jordan ctjordan@tacc.utexas.edu

Webcast

Discussion

Slides

Life cycle of the data

  • Generation for specific purposes
  • Creation of metadata
  • Direct use in research/experimentation
  • Selection and Publication of data
  • Retirement of inaccurate/outmoded data
  • Archival of not immediately useful data
  • Long-term preservation
  • Incorporation into larger repositories

Open Access, multi-collection repositories

Collection Structure

  • documents labeled with date and type
  • consistent in structure
  • linkage to background/foreground
  • internal conventions, i.e. norms

  • Data Management Plan

Data Type

Structured

  • SQL/RDBMS
  • semi-structured, HTML/XML

Unstructured

  • NoSQL
  • images, audio, mesh data...

HDF5 - Hierarchical Data Format

  • self-description
  • inclusion of metadata
  • coherence in storing output data

Metadata

metadata: data about data, header

  • Provenance: "audit trail" for research
  • Reproducibility
  • Observational data: date/time/sensor...
  • Simulational: software, hardware, exec. params.

  • crucial to record as early as possible

  • automate extraction/creation
  • enforce good recording
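
As a minimal sketch of automated metadata creation, the Python below writes a JSON "sidecar" file next to each output file, capturing the kinds of fields listed above (date/time, software, hardware, execution parameters). The function names, the `.meta.json` suffix, and the field layout are assumptions made for illustration, not a TACC convention.

```python
import hashlib
import json
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_metadata(data_path, exec_params):
    """Collect basic provenance metadata for one output file (hypothetical layout)."""
    data_path = Path(data_path)
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    return {
        "file": data_path.name,
        "size_bytes": data_path.stat().st_size,
        "sha256": digest,
        # Recorded at write time, i.e. as early as possible.
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "software": {
            "python": platform.python_version(),
            "script": Path(sys.argv[0]).name,
        },
        "hardware": {
            "machine": platform.machine(),
            "node": platform.node(),
        },
        "exec_params": exec_params,
    }


def write_sidecar(data_path, exec_params):
    """Write <name>.meta.json next to the data file and return its path."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    metadata = build_metadata(data_path, exec_params)
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar


if __name__ == "__main__":
    # Demo on a throwaway file; the parameters are made-up examples.
    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp) / "run_001.dat"
        output.write_bytes(b"simulated output")
        sidecar_path = write_sidecar(output, {"steps": 1000, "dt": 0.01})
        print(sidecar_path.read_text())
```

Calling `write_sidecar` at the end of every job is one way to make recording the default rather than something left to memory.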

Archiving Research Data

  • Repositories
    • centralized
  • Accessibility
    • maintenance and performance
  • Evolving
    • flexibility to configure
    • seamless transition
    • long-term agreement
  • Publishability
    • scale of data
  • Preparation
    • map of data
    • mapping as processing
  • Execution
    • follow practices & permission
    • collective note/database/workflow
    • batch data management scripts
    • cron to automate permissions & movement
    • use logs/job output
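
One possible shape for a "map of data" and a batch data management script is sketched below in Python: it walks a directory tree, records each file's relative path, size, and SHA-256 checksum in a CSV manifest, and can compare two trees to check that a copy is complete. The function names and the CSV layout are illustrative assumptions. A script like this could be scheduled with cron, with its output kept as a log.

```python
import csv
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Return sorted (relative_path, size_bytes, sha256) rows for every file under root."""
    root = Path(root)
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            rel = full.relative_to(root).as_posix()
            rows.append((rel, full.stat().st_size, sha256_of(full)))
    return sorted(rows)


def write_manifest(root, manifest_path):
    """Write the manifest as CSV and return the number of files recorded.

    Keep manifest_path outside root so later runs do not list the manifest itself.
    """
    rows = build_manifest(root)
    with open(manifest_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "size_bytes", "sha256"])
        writer.writerows(rows)
    return len(rows)


def copies_match(src_root, dst_root):
    """True if both trees hold the same relative paths with identical contents."""
    return build_manifest(src_root) == build_manifest(dst_root)


if __name__ == "__main__":
    # Demo on throwaway directories.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        (data / "raw").mkdir(parents=True)
        (data / "raw" / "a.txt").write_text("alpha")
        (data / "b.txt").write_text("beta")
        count = write_manifest(data, Path(tmp) / "manifest.csv")
        print(f"recorded {count} files")
```

Comparing manifests before deleting a source copy is a cheap check that a backup or transfer actually succeeded.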

keep multiple backups
don't use a personal hard drive
don't use a commercial "cloud" as the primary data store

Workflow Management tutorials

TACC Stockyard

$WORK: 1TB/2TB by request

project-term: 1 year storage

$SCRATCH: computational work

$HOME: personal files

Corral: 5TB free, open web access, persistent DNS/Web for long-lived URLs, VMs as backend

http://www.fishesoftexas.org
http://arctos.database.museum

Ranch: 160PB for personal archive

XSEDE/Wrangler: 500TB flash HP storage

GridFTP/Globus Online: web-based, graphical interfaces, useful for XSEDE

iRODS: APIs, web/GUI/cmd, automated enforcement

Get Help

Published: Thu 06 April 2017. By Dongming Jin
