Status
HDF5 is a great mechanism for storing large numerical arrays of homogeneous type, for data models that can be organized hierarchically and benefit from tagging of datasets with arbitrary metadata. It is quite different from SQL-style relational databases.

If you have trouble finding HDF5 with CMake, try our FindHDF5.cmake, which is more up to date than the FindHDF5.cmake included with CMake. An example CMake project for writing network data to HDF5 in C: CMakeLists.txt. A simple HDF5 read/write example is given below.

To install the library — Linux: apt install libhdf5-dev; Mac: brew install hdf5.

The h5py package is a Pythonic interface to the HDF5 binary data format. It lets you store huge amounts of numerical data and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk as if they were real NumPy arrays.
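As a companion to the paragraph above, a minimal h5py read/write round trip; the file name and dataset name here are arbitrary choices, not anything mandated by the library:

```python
import h5py
import numpy as np

# Write a small integer array to an HDF5 dataset
with h5py.File("example.h5", "w") as f:
    f.create_dataset("dset", data=np.arange(10, dtype=np.int32))

# Read it back; slicing a Dataset reads only the requested
# elements from disk, which is what makes huge files practical
with h5py.File("example.h5", "r") as f:
    data = f["dset"][2:5]

print(data.tolist())  # [2, 3, 4]
```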
Currently tested with Node.js v9.9.0 and V8 6.2.414.46-node.22.
If your native HDF5 libraries aren't in the default location, you can set the path with the --hdf5_home_linux switch on this project as well as dependent projects.
If you want static native linking, set --link_type to static.
For Mac and Windows the switches are --hdf5_home_mac and --hdf5_home_win.
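For example, an install against a non-default location might look like the commands below; the /usr/local/hdf5 path is an assumption, so substitute your own install prefix:

```shell
# Point the build at a custom HDF5 install
npm install hdf5 --hdf5_home_linux=/usr/local/hdf5

# Same, but linking the native library statically
npm install hdf5 --hdf5_home_linux=/usr/local/hdf5 --link_type=static
```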
Dependencies
- HDF5 C Library 1.10.0-patch1 (prior 1.8.x releases are untested but should work)
native install on Ubuntu
If you don't already have HDF5 installed, or don't know where your native hdf5 install is located:
The installed location could be /usr/lib/x86_64-linux-gnu/hdf5/serial
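If the package came in through apt, one way to confirm the location; these commands assume Debian/Ubuntu tooling:

```shell
# List the files installed by the dev package and pick out the shared libraries
dpkg -L libhdf5-dev | grep 'libhdf5.*\.so'

# Or inspect the serial-build directory directly
ls /usr/lib/x86_64-linux-gnu/hdf5/serial
```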
Compiling
The code requires a gcc compiler supporting C++11 on Linux, Mac OSX and Windows. The binding.gyp defines the cflags with -std=c++11.
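A minimal binding.gyp along those lines might look like the sketch below; the target name and source file are illustrative, not the project's actual values:

```
# binding.gyp is a gyp file (JSON-like, parsed by node-gyp; comments are allowed)
{
  "targets": [
    {
      "target_name": "hdf5",
      "sources": ["src/hdf5.cc"],
      "cflags": ["-std=c++11"],
      "xcode_settings": {"OTHER_CFLAGS": ["-std=c++11"]}
    }
  ]
}
```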
In a working copy of git
NODE_PATH is still used for the mocha tests.
Environment Variables
The path to the HDF5 shared objects must be added to the runtime library search path.
for linux example:
for Mac OSX example:
for Windows example:
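A sketch of the three cases; the /usr/local/hdf5 paths and the HDF5 version shown are assumptions, so substitute your actual install location:

```shell
# Linux: add the HDF5 library directory to the runtime search path
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/hdf5/lib

# Mac OSX: the equivalent variable is DYLD_LIBRARY_PATH
# export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/usr/local/hdf5/lib

# Windows (cmd.exe): prepend the HDF5 bin directory to PATH
# set PATH=C:\Program Files\HDF_Group\HDF5\1.10.0\bin;%PATH%
```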
If you want one of the third party filters available put its install path on HDF5_PLUGIN_PATH
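For example (the plugin directory shown is an assumption; use wherever the filter was installed):

```shell
# Make third-party HDF5 filter plugins visible to the library at runtime
export HDF5_PLUGIN_PATH=/usr/local/hdf5/lib/plugin
```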
Running Tests
The tests are based on co-mocha
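Assuming the suite is wired through the standard npm test script, a typical invocation might be:

```shell
# NODE_PATH must point at the built module (see above);
# build/Release is the usual node-gyp output directory
NODE_PATH=build/Release npm test
```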
Introduced in release: 1.18.
Hierarchical Data Format (HDF) is a set of file formats designed to store and organize large amounts of data [1]. Originally developed at the National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF [2].
This plugin enables Apache Drill to query HDF5 files.
Configuring the HDF5 Format Plugin
This plugin has three configuration options, which are described in the table below.
| Option | Default | Description |
|---|---|---|
| type | (none) | Set to "hdf5" to make use of this plugin. |
| extensions | ".h5" | A list of the file extensions used to identify HDF5 files. Typically HDF5 uses .h5 or .hdf5 as file extensions. |
| defaultPath | null | The default path defines which path Drill will query for data. Typically this should be left as null in the configuration file; its usage is explained below. |
Example Configuration
For most uses, the configuration below will suffice to enable Drill to query HDF5 files.
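A sketch of such a format configuration, following the option table above; it would go under the formats section of a storage plugin such as dfs, and should be treated as a starting point rather than the canonical file:

```json
"hdf5": {
  "type": "hdf5",
  "extensions": ["h5"],
  "defaultPath": null
}
```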
Usage
Since HDF5 can be viewed as a file system within a file, a single file can contain many datasets. For instance, if you have a simple HDF5 file, a star query will produce the following result:
The actual data in this file is mapped to a column called int_data. In order to access the data effectively, use Drill's FLATTEN() function on the int_data column, which produces the following result:

```sql
apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
```

Once the data is in this form, you can access it similarly to how you might access nested data in JSON or other files.
However, a better way to query the actual data in an HDF5 file is to use the defaultPath field in your query. If the defaultPath field is defined in the query, or via the plugin configuration, Drill will return only the data, rather than the file metadata.

Note: Once you have determined which dataset you are querying, it is advisable to use this method to query HDF5 data.
Note: Datasets larger than 16MB will be truncated in the metadata view.
You can set the defaultPath variable either in the plugin configuration or at query time using the table() function, as shown in the example below.
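A sketch of an inline defaultPath query; the workspace, file name, and dataset path are hypothetical:

```sql
SELECT *
FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'));
```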
If the data in defaultPath is a column, the column name will be the last part of the path. If the data is multidimensional, the columns will be named <data_type>_col_n; a column of integers will therefore be called int_col_1.

Attributes
Occasionally, HDF5 paths will contain attributes. Drill will map these to a map data structure called attributes. You can access the individual fields within the attributes map by using the structure table.map.key. Note that you will have to give the table an alias for this to work properly.

Known Limitations
There are several limitations of the HDF5 format plugin in Drill.
- Drill cannot read unsigned 64-bit integers. When the plugin encounters this data type, it will write an INFO message to the log.
- While Drill can read compressed HDF5 files, Drill cannot read individual compressed fields within an HDF5 file.
- HDF5 files can contain nested datasets of up to n dimensions. Since Drill works best with two-dimensional data, datasets with more than two dimensions are reduced to two dimensions.
- HDF5 has a COMPOUND data type. At present, Drill supports reading COMPOUND data types that contain multiple datasets, but does not support COMPOUND fields with multidimensional columns; Drill will ignore such columns within COMPOUND fields.
[1] https://en.wikipedia.org/wiki/Hierarchical_Data_Format
[2] https://www.hdfgroup.org