When planning for EML at the Arctic LTER, we decided to replace our text-based metadata entry form with a form based on an Excel worksheet. Most of our site’s researchers use Excel, and there has been some interest in keeping the metadata and data in the same Excel workbook. After looking at the Florida Coastal Everglades (FCE) LTER’s Excel metadata file, we developed an Excel worksheet using the following design criteria.
The Worksheet for Entering Metadata
Figure 1. The worksheet for entering metadata
The design of this worksheet (Figure 1) is based on the Arctic LTER’s old metadata entry form, which was developed in 1989. To ease the transition, some categories retained their names instead of using the EML tag names. Several new categories were added based on the EML Best Practices, but the full granularity of EML was not implemented. For example, Associated Party does not include address information, since these parties are often students, summer RAs or others whose addresses change.
Comments are used extensively throughout the sheet to aid in filling out the form. Text boxes are used for sections with more than 256 characters, since moving or copying an Excel worksheet truncates the text in cells to 256 characters. Data validation lists are used to create drop-down lists for units, measurement scale and number types.
At this time the methods section includes only one text box, where all the information about methods is entered. We decided to keep it simple and not split out instrumentation, sampling, etc. A future version may split out this information.
The variable description (dataTable/entityGroup) section includes columns for attribute name, description, units, measurement scale, number type and missing values. The data validation lists for units, measurement scale and number type make it easier to fill in the required values. Not all the EML units are included in the list on the metadata worksheet, only the metric ones. Researchers may enter their own units if none on the list apply. When the file is processed by the information manager (IM), the units are checked and flagged if they do not conform to EML standards. The IM then decides whether a custom unit needs to be defined.
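The unit check described above can be sketched in a few lines. The sketch below is in Python rather than the VBA the macro actually uses, and it is only an illustration: the `UNIT_ALIASES` table, the function names and the case-insensitive matching are assumptions, not the workbook's real contents or logic.

```python
# Hypothetical excerpt of a units table: canonical EML unit -> aliases.
# The aliases are illustrative, not taken from the actual worksheet.
UNIT_ALIASES = {
    "celsius": ["degC", "deg C", "degrees celsius"],
    "meter": ["m", "meters", "metre"],
    "milligramsPerLiter": ["mg/L", "mg/l"],
}


def build_lookup(unit_aliases):
    """Map every lower-cased unit name and alias to its canonical unit."""
    lookup = {}
    for unit, aliases in unit_aliases.items():
        lookup[unit.lower()] = unit
        for alias in aliases:
            lookup[alias.lower()] = unit
    return lookup


def check_units(entered_units, unit_aliases=UNIT_ALIASES):
    """Split entries into recognized units and entries flagged for review.

    Returns (resolved, flagged): resolved maps each recognized entry to
    its canonical unit name; flagged lists the entries that matched
    neither a unit name nor an alias.
    """
    lookup = build_lookup(unit_aliases)
    resolved, flagged = {}, []
    for entry in entered_units:
        key = entry.strip().lower()
        if key in lookup:
            resolved[entry] = lookup[key]
        else:
            flagged.append(entry)
    return resolved, flagged


resolved, flagged = check_units(["degC", "meter", "furlongs"])
print(resolved)  # recognized entries with their canonical names
print(flagged)   # entries the IM would need to review
```

Flagged entries are returned rather than rejected, which mirrors the workflow above: the IM looks at each one and decides whether a custom unit is needed. Case-insensitive matching is a simplification and would be wrong for any pair of units that differ only by case.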
The Information Manager’s Excel Workbook
This workbook is used by the IM to process the metadata worksheet. It has three worksheets and one macro module:
The NameOfRanges worksheet lists the range names. It is used to check for valid names in the metadata sheet.
The EML worksheet holds the common information, along with the EML tags, that is included in every EML file. Information such as the metadata provider, site intellectual rights, site access, etc. is entered here. In addition, there may be information on file locations and URLs. This worksheet is also intended for LTER site-specific information, which keeps such information out of the macro code.
The Units worksheet is where all the units are listed. It is similar to the FCE Excel metadata unit worksheet. Custom units and their EML tags are defined here, and an alias column allows the IM to enter commonly used aliases for EML units. This sheet is used to check the units that researchers enter in the metadata form.
The VBA macro module, Excel2EML, is used by the IM to process the metadata worksheet and create an EML file. The macro’s main procedure steps through the metadata worksheet. Separate procedures check the metadata worksheet for proper named ranges, for errors in units and for missing required data. The package ID is checked against the harvest list: the macro either finds the ID and checks the revision number, or assigns the next highest number to the dataset. If a new number is used, the necessary elements are created and added to the harvest list. The actual EML tags are generated in separate procedures, usually functions that return EML elements for the different sections. This helps with error checking and with the placement of elements in the EML file.
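As an illustration of the package ID step, here is a minimal Python sketch (the macro itself is VBA). It assumes package IDs of the form scope.identifier.revision, models the harvest list as a plain dictionary from dataset identifier to latest revision, and interprets "checks the revision number" as making sure the revision moves forward. All of these, along with the function name, are assumptions made for the example.

```python
def resolve_package_id(scope, identifier, revision, harvest):
    """Return (package_id, is_new) and update the harvest mapping.

    harvest is a hypothetical stand-in for the harvest list: a dict
    mapping dataset identifier (int) to its latest revision (int).
    identifier may be None when the dataset has no number yet.
    """
    if identifier in harvest:
        # Known dataset: make sure the revision moves forward.
        if revision <= harvest[identifier]:
            revision = harvest[identifier] + 1
        is_new = False
    else:
        # Unknown dataset: assign the next highest identifier.
        identifier = max(harvest, default=0) + 1
        revision = 1
        is_new = True
    harvest[identifier] = revision
    return f"{scope}.{identifier}.{revision}", is_new


harvest = {1: 3, 2: 1}
print(resolve_package_id("knb-lter-arc", 1, 3, harvest))
print(resolve_package_id("knb-lter-arc", None, 1, harvest))
```

Returning the `is_new` flag lets the caller decide when a new harvest list entry has to be written out, which corresponds to the "necessary elements are created and added" step above.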
One of the more difficult parts to automate is generating the URLs for the EML and data files. At present, for ARC, some assumptions are made about the underlying directory structure where the files are located. In a newer version of the metadata form, however, we decided to have the IM enter the distribution URLs for the metadata and data files in the metadata worksheet. This avoids editing the macro code when file paths change or when sites differ.
Once the metadata worksheet is processed, the macro calls HTML Tidy (tidy.exe) to tidy up the code and to remove any invalid characters. Excel often inserts special characters that are not allowed in EML files. The macro then validates the EML file using the KNB’s EMLparser site on the web.
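To show the kind of cleanup involved, the following Python sketch replaces a few characters that spreadsheet text commonly contains (curly quotes, dashes, non-breaking spaces) with plain ASCII equivalents. It is not what HTML Tidy does internally and is not part of the macro; the replacement table and the `to_plain_ascii` name are assumptions chosen for illustration.

```python
# Illustrative replacements for characters often found in spreadsheet text.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}


def to_plain_ascii(text, placeholder="?"):
    """Replace known special characters, then mask any other non-ASCII one."""
    cleaned = text.translate(str.maketrans(REPLACEMENTS))
    return "".join(ch if ord(ch) < 128 else placeholder for ch in cleaned)


print(to_plain_ascii("site\u2019s \u201cdata\u201d 5\u20136 \u00b5m"))
# prints: site's "data" 5-6 ?m
```

Masking unknown characters with a visible placeholder, rather than dropping them silently, makes it easier to spot text that needs a manual fix before validation.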
More thought and work are still needed on the metadata entry worksheet and on the macro. The coding for taxonomic coverage and for a more complete methods section is not yet finished. And, as in any programming project, more error checking and documentation are needed.
Presently Plum Island Ecosystems (PIE) LTER and ARC are using the metadata worksheet. Sevilleta (SEV) LTER and Hubbard Brook Experimental Forest (HBEF) have expressed interest in using the metadata worksheet.
In the future better tools may supersede this simple form. One could also have used Perl to accomplish the same things I have done in a VBA macro. A stylesheet could also have been used.