| Apache Parquet binary column format | |
|---|---|
| Collect: | standard_name(s) name(s) |
| Include: | Activity names |

| | Estimated | Available |
|---|---|---|
| RAM (GB) | ... | ... |
| Time (min) | ... | ... |
| Rows | ... | |
| File size (MB) | ... | |
% File: read_parquet.m
% Read parquet files from STOQS and report the time it takes
campaign_url = "https://stoqs.mbari.org/stoqs_all_dorado";
api_call = "/api/measuredparameter.parquet?collect=name&activity__name__contains=";
surveys = ["Dorado389_2020_006_06_006_06_decim", "Dorado389_2020_035_00_035_00_decim"];
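% An empty CertificateFilename turns off server certificate validation; Timeout is in seconds
wo = weboptions('CertificateFilename', '', 'Timeout', 300);
for i = 1:length(surveys)
    % Download the survey as a .parquet file and time the transfer
    tic;
    disp(strcat("Reading ", surveys(i), " as .parquet"));
    parquet_url = strcat(campaign_url, api_call, surveys(i));
    file = websave(strcat(surveys(i), '.parquet'), parquet_url, wo);
    disp(strcat("Wrote ", file));
    toc;
    % Read the file's metadata and contents into a table and time the read
    tic;
    info = parquetinfo(file);
    T = parquetread(file);
    disp(strcat("Read ", file));
    toc;
end
% List the column names of the last file read
disp(info.VariableNames)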
Output from running the script in MATLAB:
>> read_parquet
Reading Dorado389_2020_006_06_006_06_decim as .parquet
Wrote /u/mccann/matlab/Dorado389_2020_006_06_006_06_decim.parquet
Elapsed time is 14.278701 seconds.
Read /u/mccann/matlab/Dorado389_2020_006_06_006_06_decim.parquet
Elapsed time is 0.410685 seconds.
Reading Dorado389_2020_035_00_035_00_decim as .parquet
Wrote /u/mccann/matlab/Dorado389_2020_035_00_035_00_decim.parquet
Elapsed time is 14.070438 seconds.
Read /u/mccann/matlab/Dorado389_2020_035_00_035_00_decim.parquet
Elapsed time is 0.097718 seconds.
Columns 1 through 11
"adinos" "altitude" "bbp420 (m-1)" "bbp700 (m-1)" "bg_biolum (ph L^..." "biolume" "diatoms" "fl700_uncorr" "fluo" "hdinos" "intflash (ph s^{..."
Columns 12 through 19
"mepCountList (co..." "nbflash_high (L^..." "nbflash_low (L^{..." "nitrate (micromo..." "oxygen (ml/l)" "pitch (degree)" "profile" "roll (degree)"
Columns 20 through 30
"salinity" "sepCountList (co..." "sigmat" "spice" "temperature (Cel..." "yaw (degree)" "platform" "timevalue" "depth" "latitude" "longitude"
The notebook visualize_parquet.ipynb shows how to read and visualize Parquet data in Python.
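For readers who want a starting point in Python before opening the notebook, the request made by read_parquet.m can be sketched as below. This is a minimal sketch and not the notebook's contents: the helper names `parquet_url` and `read_survey` are hypothetical, the URL pieces are copied from the MATLAB script above, and it assumes pandas is installed with a Parquet engine such as pyarrow. Unlike the MATLAB script, it leaves certificate validation on.

```python
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# URL pieces copied from read_parquet.m
CAMPAIGN_URL = "https://stoqs.mbari.org/stoqs_all_dorado"
API_PATH = "/api/measuredparameter.parquet"


def parquet_url(survey, campaign_url=CAMPAIGN_URL):
    """Build the same request URL that read_parquet.m uses for one survey (hypothetical helper)."""
    query = urlencode({"collect": "name", "activity__name__contains": survey})
    return f"{campaign_url}{API_PATH}?{query}"


def read_survey(survey, timeout=300):
    """Download one survey's Parquet export and return it as a pandas DataFrame (hypothetical helper)."""
    # Imported here so the URL helper works without pandas installed;
    # pandas needs a Parquet engine (pyarrow or fastparquet) to parse the file.
    import pandas as pd

    # Same 300 second timeout as the MATLAB weboptions call
    with urlopen(parquet_url(survey), timeout=timeout) as response:
        data = response.read()
    return pd.read_parquet(io.BytesIO(data))
```

Calling `read_survey("Dorado389_2020_006_06_006_06_decim")` would be expected to return a DataFrame whose columns correspond to the variable names printed by the MATLAB script, though that has not been verified here.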