data@tacc.utexas.edu
by Chris Jordan ctjordan@tacc.utexas.edu
Life cycle of the data
- Generation for specific purposes
- Creation of metadata
- Direct use in research/experimentation
- Selection and Publication of data
- Retirement of inaccurate/outmoded data
- Archival of not immediately useful data
- Long-term preservation
- Incorporation into larger repositories
Open Access, multi-collection repositories
Collection Structure
- dated, type doc
- consistent in structure
- linkage to background/foreground
- internal conventions, i.e. norms
- Data Management Plan
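
One way to make the "dated", "consistent in structure" and "internal conventions" points above checkable is to validate file names automatically. The sketch below is only an illustration: the naming convention `<YYYY-MM-DD>_<type>_<description>.<ext>` and the names `NAME_PATTERN`, `parse_name` and `nonconforming` are assumptions made for this example, not a convention from the talk.

```python
import re
from datetime import datetime

# Hypothetical convention (an assumption for this sketch):
#   <YYYY-MM-DD>_<type>_<description>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"_(?P<type>[a-z]+)"
    r"_(?P<desc>[a-z0-9-]+)"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    try:
        # Reject strings that look like dates but are not, e.g. 2014-13-40.
        datetime.strptime(match.group("date"), "%Y-%m-%d")
    except ValueError:
        return None
    return match.groupdict()


def nonconforming(filenames):
    """List the file names that break the convention."""
    return [name for name in filenames if parse_name(name) is None]
```

Running `nonconforming` over a directory listing before data is published gives a quick report of the files that need renaming.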
Data Type
Structured
- SQL/RDBMS
- semi-structured, HTML/XML
Unstructured
- NoSQL
- images, audio, mesh data...
HDF5 - Hierarchical Data Formats
- self-description
- inclusion of metadata
- coherence in storing output data
Metadata
metadata
: data about data, header
- Provenance: "audit trail" for research
- Reproducibility
- Observational data: date/time/sensor...
- Simulation data: software, hardware, execution parameters
- crucial to record as early as possible
- automate extraction/creation
- enforce good recording
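
The "automate extraction/creation" point can be sketched as a small helper that records software, hardware and execution parameters next to each output file at the moment the file is produced. This is a minimal sketch under stated assumptions: the JSON sidecar layout, the `.meta.json` suffix and the function names `build_provenance` and `write_sidecar` are hypothetical choices for this example, not a TACC tool or format.

```python
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def build_provenance(exec_params, code_version):
    """Collect basic provenance for a simulation run.

    exec_params:  dict of the parameters the run was started with
    code_version: a string the caller supplies (e.g. a release tag)
    """
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "software": {
            "code_version": code_version,
            "python": platform.python_version(),
        },
        "hardware": {
            "machine": platform.machine(),
            "node": platform.node(),
            "system": platform.system(),
        },
        "exec_params": dict(exec_params),
    }


def write_sidecar(output_path, provenance):
    """Write the provenance as JSON next to the output file and return its path."""
    sidecar = Path(str(output_path) + ".meta.json")
    sidecar.write_text(json.dumps(provenance, indent=2, sort_keys=True))
    return sidecar
```

Calling `write_sidecar` from the same script that writes the output means the metadata is created at the same moment as the data rather than reconstructed later.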
Archiving Research Data
- Repositories
- centralized
- Accessibility
- maintenance and performance
- Evolving
- flexibility to configure
- seamless transition
- long-term agreement
- Publish-ability
- scale of data
- Preparation
- map of data
- mapping as processing
- Execution
- follow practices & permission
- collective note/database/workflow
- batch data management scripts
- cron to automate permissions & movement
- use logs/job output
keep multiple backups
don't use personal hard drive
don't use commercial "cloud" as primary data store
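
The "batch data management scripts" and "use logs" points above can be sketched as a script that moves old files out of a working directory, verifies each copy with a checksum, and logs every action. This is a rough illustration only: the function names, the age threshold and the directory arguments are assumptions for the example, and a real archive system may have its own transfer tools that should be preferred.

```python
import hashlib
import logging
import shutil
import time
from pathlib import Path

log = logging.getLogger("archive")


def sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_old_files(source_dir, archive_dir, max_age_days):
    """Move files older than max_age_days from source_dir to archive_dir.

    Each file is copied first, the copy is verified by checksum, and only
    then is the original deleted. Returns the list of archived paths.
    """
    cutoff = time.time() - max_age_days * 86400
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(source_dir).iterdir()):
        # Skip directories and files modified more recently than the cutoff.
        if not path.is_file() or path.stat().st_mtime > cutoff:
            continue
        target = archive_dir / path.name
        shutil.copy2(path, target)
        if sha256(path) != sha256(target):
            log.error("checksum mismatch for %s; original kept", path)
            target.unlink()
            continue
        path.unlink()
        log.info("archived %s -> %s", path, target)
        moved.append(target)
    return moved
```

To run it unattended, the script could call `logging.basicConfig(filename="archive.log", level=logging.INFO)` and then `archive_old_files(...)`, and be scheduled from cron (for example nightly), so that the log file becomes the record of what was moved and when.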
TACC Stockyard
$work
: 1TB/2TB by request
project-term
: 1 year storage
$scratch
: computational work
$home
: personal files
Corral
: 5TB free, open web access, persistent DNS/Web for long-lived URLs, VMs as backend
http://www.fishesoftexas.org
http://arctos.database.museum
Ranch
: 160PB for personal Archive
XSEDE/Wrangler
: 500TB of high-performance flash storage
GridFTP/Globus Online
: web-based, graphical interfaces, useful for XSEDE
iRODS
: APIs, web/GUI/cmd, automated enforcement