public class Eml200DataSource extends ResultRecord implements DataCacheListener, Previewable
The Eml200DataSource is used to gain access to a wide variety of data packages that have been described using Ecological Metadata Language (EML). Each data package contains an EML metadata description and one or more data entities (data tables, spatial raster images, spatial vector images). The data packages can be accessed from the local filesystem or through any EcoGrid server that provides access to its collection of data objects.
The metadata provided by the EML description of the data allows the data to be easily ingested into Kepler and exposed for use in downstream components. The Eml200DataSource handles all of the mechanical issues associated with parsing the metadata, downloading the data from remote servers if applicable, understanding the logical structure of the data, and emitting the data for downstream use when required. The supported data transfer protocols include http, ftp, file, ecogrid and srb.
After parsing the EML metadata, the actor automatically reconfigures its exposed ports to provide one port for each attribute in the first entity that is described in the EML description. For example, if the first entity is a data table with four columns, the ports might be "Site", "Date", "Plot", and "Rainfall" if that's what the data set contained. These details are obtained from the EML document.
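The port layout described above can be pictured with a small, self-contained model. This is only a sketch: `PortLayoutSketch`, `portNamesForFirstEntity`, and the entity names are hypothetical, and the map of entity name to attribute names stands in for the parsed EML document rather than any Kepler class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; not the Kepler implementation. */
final class PortLayoutSketch {

    /**
     * Returns one port name per attribute of the first entity, in declaration order.
     * The map is assumed to preserve the order in which entities appear in the EML.
     */
    static List<String> portNamesForFirstEntity(Map<String, List<String>> entities) {
        for (List<String> attributes : entities.values()) {
            return new ArrayList<>(attributes); // only the first entity is used
        }
        return new ArrayList<>(); // no entities described
    }

    public static void main(String[] args) {
        Map<String, List<String>> entities = new LinkedHashMap<>();
        // Hypothetical entities; the attribute names follow the example in the text.
        entities.put("rainfall_table", Arrays.asList("Site", "Date", "Plot", "Rainfall"));
        entities.put("site_table", Arrays.asList("Site", "Latitude", "Longitude"));
        System.out.println(portNamesForFirstEntity(entities));
    }
}
```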
By default, the ports created by the Eml200DataSource represent fields in the data entities, and one tuple of data is emitted on these ports during each fire cycle. Alternatively, the actor can be configured so that the ports instead represent an array of values for a field ("AsColumnVector"), or so that the ports represent an entire table of data ("AsTable") formatted in comma-separated-value (CSV) format.
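The difference between the default per-field output and the column-vector output can be sketched as row-wise versus column-wise access to the same table. The class and method names below are hypothetical and the rows are made-up sample values; nothing here calls Kepler code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; not the Kepler implementation. */
final class FieldVersusColumnSketch {

    /** Per-field style: the values the ports would carry on firing number {@code firing}. */
    static List<String> tupleForFiring(List<List<String>> rows, int firing) {
        return rows.get(firing);
    }

    /** Column-vector style: every value of one field gathered into a single array. */
    static List<String> columnVector(List<List<String>> rows, int columnIndex) {
        List<String> column = new ArrayList<>();
        for (List<String> row : rows) {
            column.add(row.get(columnIndex));
        }
        return column;
    }

    public static void main(String[] args) {
        // Columns: Site, Date, Plot, Rainfall (sample values only).
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("A", "2004-01-01", "1", "12.5"),
                Arrays.asList("B", "2004-01-02", "2", "0.0"));
        System.out.println(tupleForFiring(rows, 0)); // one tuple per fire cycle
        System.out.println(columnVector(rows, 3));   // all Rainfall values at once
    }
}
```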
If more than one data entity is described in the EML metadata, then the output of the actor defaults to the first entity listed in the EML. To select the other entities, one must provide a query statement that describes the filter and join that should be used to produce the data to be output. This is accomplished by selecting 'Open actor', which shows the Query configuration dialog, which can be used to select the columns to be output and any filtering constraints to be applied.
Modifier and Type | Field and Description |
---|---|
Parameter | checkVersion - Determines whether the remote source should be queried for the latest revision of the metadata file. |
FileParameter | dataFilePath - The file path for locating a data file that is available from the local file system. |
StringParameter | dataOutputFormat - The format of the output to be produced for the data entity. |
static Settable.Visibility | DEFAULTFORSCHEMA |
static Settable.Visibility | DEFAULTFORSQL |
FileParameter | emlFilePath - The file path for locating an EML file that is available from a local file. |
StringParameter | fileExtensionFilter - Specifies a file extension that is used to limit the array of filenames returned by the data source actor when "As UnCompressed File Name" is selected as the output type. |
Parameter | isLenient - Determines whether extra data columns that are NOT described in the EML should be ignored (isLenient=true) or whether an error should be raised when the data and EML description do not match (isLenient=false). TRUE: extra data columns are ignored. FALSE: an error is raised when data and metadata conflict. |
StringAttribute | schemaDef - Schema definition for the entities in this package. |
StringParameter | selectedEntity - If this EML package has multiple entities, this parameter specifies which entity should be used for output. |
StringAttribute | sqlDef - The SQL command which will be applied to the data entity to filter data values. |
Constructor and Description |
---|
Eml200DataSource(CompositeEntity container, java.lang.String name) - Construct an actor with the given container and name. |
Modifier and Type | Method and Description |
---|---|
void | attributeChanged(Attribute attribute) - Callback for changes in attribute values. |
java.lang.Object | clone(Workspace workspace) - Clone the Eml200DataSource into the specified workspace. |
void | complete(DataCacheObject aItem) |
void | fire() - Send a record's tokens over the ports on each fire event. |
static void | generateDocumentationForInstance(Eml200DataSource emlActor) - Adds default documentation to the actor specified in the parameter. |
java.util.Vector | getColumns() - Accessor to the _columns member. |
java.net.URL | getDocumentation() - Get a URL pointer to the documentation for this data source. |
java.util.Vector<Entity> | getEntityList() |
java.io.Reader | getFullRecord() - Overrides the method in the parent class, ResultRecord. |
java.util.Vector | gotRowVectorFromSource() - Reads a row vector from the data source, either from the result set produced by executing the data query or from the delimited reader that reads the data input stream (_reader). |
void | initialize() - Initialize the actor prior to running in the workflow. |
boolean | isEndOfResultset() - Determines whether the result set is complete. |
boolean | postfire() - Used only when the output format is a byte array. |
boolean | prefire() - If the trigger input is connected and it has no input or an unknown state, then return false. |
void | preinitialize() - Create receivers and declare delay dependencies. |
void | preview() - Implementers should provide some sort of preview (typically presenting a GUI) of the actor that is implementing this interface. |
void | stop() - Callback method that indicates that the workflow is being stopped. |
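
The lifecycle methods in the summary (initialize, prefire, fire, postfire) follow the usual actor iteration pattern: fire only when prefire returns true, and stop iterating once postfire returns false. A minimal sketch of that pattern follows, assuming a hypothetical `MiniActor` interface and a simple driver loop. It does not reproduce the internal logic of Eml200DataSource or of any director.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustrative model only; not the Kepler or Ptolemy implementation. */
final class FiringCycleSketch {

    /** Hypothetical stand-in for the lifecycle methods named in the method summary. */
    interface MiniActor {
        void initialize();
        boolean prefire();
        void fire();
        boolean postfire();
    }

    /** Emits one row per firing and asks to stop once the rows are exhausted. */
    static final class RowSource implements MiniActor {
        private final List<String> rows;
        private final List<String> emitted = new ArrayList<>();
        private Iterator<String> cursor;

        RowSource(List<String> rows) { this.rows = rows; }

        public void initialize() { cursor = rows.iterator(); emitted.clear(); }
        public boolean prefire() { return cursor.hasNext(); }   // ready only if a row remains
        public void fire() { emitted.add(cursor.next()); }      // "send" one row
        public boolean postfire() { return cursor.hasNext(); }  // false ends the iteration
        List<String> emitted() { return emitted; }
    }

    /** Drives the actor: prefire, fire, postfire, repeated until either returns false. */
    static int run(MiniActor actor) {
        actor.initialize();
        int firings = 0;
        boolean keepGoing = true;
        while (keepGoing && actor.prefire()) {
            actor.fire();
            firings++;
            keepGoing = actor.postfire();
        }
        return firings;
    }

    public static void main(String[] args) {
        RowSource source = new RowSource(Arrays.asList("row1", "row2", "row3"));
        System.out.println(run(source) + " firings, emitted " + source.emitted());
    }
}
```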
public static final Settable.Visibility DEFAULTFORSQL
public static final Settable.Visibility DEFAULTFORSCHEMA
public FileParameter emlFilePath
public FileParameter dataFilePath
public StringAttribute sqlDef
public StringAttribute schemaDef
public StringParameter dataOutputFormat
As Field: This is the default. One output port is created for each field (aka column/attribute/variable) that is described in the EML metadata for the data entity. If the SQL statement has been used to subset the data, then only those fields selected in the SQL statement will be configured as ports.
As Table: The selected entity will be sent out as a string which contains the entire entity data. It has three output ports: DataTable - the data itself, Delimiter - the delimiter used to separate fields, and NumColumns - the number of fields in the table.
As Row: In this output format, one tuple of selected data is formatted as an array and sent out. It has only one output port (DataRow) and the data type is a record containing each of the individual fields.
As Byte Array: Selected data will be sent out as an array of bytes which are read from the data file. This is the raw data being sent in binary format. It has two output ports: BinaryData - contains the data itself, and EndOfStream - a flag that indicates whether the end of the data stream has been reached.
As UnCompressed File Name: This format is only used when the entity is a compressed file (zip, tar, etc.). The compressed archive file is uncompressed after it is downloaded. It has only one output port, which will contain an array of the filenames of all of the uncompressed files from the archive. If the parameter "Target File Extension in Compressed File" is provided, then the returned array will instead contain only the files with the given file extension.
As Cache File Name: Kepler stores downloaded data files from remote sites in its cache system. This output format will send the local cache file path for the entity so that workflow designers can directly access the cache files. It has two output ports: CacheLocalFileName - the local file path, and CacheResourceName - the data link in the EML for this entity.
As Column Vector: This output format is similar to "As Field". The difference is that instead of sending out a single value on each port, it sends out an array of all of the data for that field. The type of each port is an array of the base type for the field.
As ColumnBased Record: This output format will send all data on one port using a Record structure that encapsulates the entire data object. The Record will contain one array for each of the fields in the data, and the type of that array will be determined by the type of the field it represents.
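To make the "As Table" description concrete, the sketch below builds the three values it names (DataTable, Delimiter, NumColumns) from an in-memory table. The `AsTableSketch` and `TableOutput` names are hypothetical, rows are joined with a newline as an assumption, and fields that contain the delimiter are not quoted, which a real CSV writer would need to handle.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; not the Kepler implementation. */
final class AsTableSketch {

    /** Holds the three values that the "As Table" description lists as output ports. */
    static final class TableOutput {
        final String dataTable;
        final String delimiter;
        final int numColumns;

        TableOutput(String dataTable, String delimiter, int numColumns) {
            this.dataTable = dataTable;
            this.delimiter = delimiter;
            this.numColumns = numColumns;
        }
    }

    /** Joins each row with the delimiter and terminates it with a newline (an assumption). */
    static TableOutput asTable(List<List<String>> rows, String delimiter) {
        StringBuilder table = new StringBuilder();
        for (List<String> row : rows) {
            table.append(String.join(delimiter, row)).append('\n');
        }
        int numColumns = rows.isEmpty() ? 0 : rows.get(0).size();
        return new TableOutput(table.toString(), delimiter, numColumns);
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("A", "1", "12.5"),
                Arrays.asList("B", "2", "0.0"));
        TableOutput out = asTable(rows, ",");
        System.out.print(out.dataTable);
        System.out.println("delimiter=" + out.delimiter + " numColumns=" + out.numColumns);
    }
}
```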
public StringParameter fileExtensionFilter
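The effect of the extension filter on the array of uncompressed file names can be sketched as a plain suffix match. The helper below is hypothetical; treating the match as case-insensitive and accepting the extension with or without a leading dot are assumptions, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustrative model only; not the Kepler implementation. */
final class ExtensionFilterSketch {

    /** Returns all names when no extension is given, else only names ending in that extension. */
    static List<String> filterByExtension(List<String> fileNames, String extension) {
        if (extension == null || extension.trim().isEmpty()) {
            return new ArrayList<>(fileNames); // no filter configured
        }
        String wanted = extension.trim().toLowerCase(Locale.ROOT);
        if (wanted.startsWith(".")) {
            wanted = wanted.substring(1); // accept "csv" and ".csv" alike (assumption)
        }
        String suffix = "." + wanted;
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (name.toLowerCase(Locale.ROOT).endsWith(suffix)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> unpacked = Arrays.asList("plots.csv", "readme.txt", "SITES.CSV");
        System.out.println(filterByExtension(unpacked, "csv"));
    }
}
```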
public Parameter isLenient
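The lenient and strict behaviors described for isLenient can be modeled as a check of each data row against the number of columns described in the metadata. This is a sketch under stated assumptions: the helper name is hypothetical, extra columns are assumed to be trailing, and a row with fewer columns than described is treated as an error in both modes because the documentation does not say how that case is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; not the Kepler implementation. */
final class LenientRowSketch {

    /**
     * Returns the row limited to the described columns.
     * Lenient mode drops trailing extras; strict mode rejects any mismatch.
     */
    static List<String> reconcileRow(List<String> row, int describedColumns, boolean isLenient) {
        if (row.size() == describedColumns) {
            return new ArrayList<>(row);
        }
        if (isLenient && row.size() > describedColumns) {
            return new ArrayList<>(row.subList(0, describedColumns)); // ignore extra columns
        }
        throw new IllegalArgumentException("Row has " + row.size()
                + " columns but the metadata describes " + describedColumns);
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("A", "2004-01-01", "undocumented");
        System.out.println(reconcileRow(row, 2, true));
        try {
            reconcileRow(row, 2, false);
        } catch (IllegalArgumentException e) {
            System.out.println("strict mode rejected the row: " + e.getMessage());
        }
    }
}
```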
public Parameter checkVersion
public StringParameter selectedEntity
public Eml200DataSource(CompositeEntity container, java.lang.String name) throws NameDuplicationException, IllegalActionException
Parameters:
container - The container.
name - The name of this actor.
Throws:
IllegalActionException - If the actor cannot be contained by the proposed container.
NameDuplicationException - If the container already has an actor with this name.
public java.util.Vector getColumns()
public java.util.Vector<Entity> getEntityList()
public void preinitialize() throws IllegalActionException
Specified by: preinitialize in interface Initializable
Overrides: preinitialize in class AtomicActor<TypedIOPort>
Throws:
IllegalActionException - Not thrown in this base class.
public void initialize() throws IllegalActionException
Specified by: initialize in interface Initializable
Overrides: initialize in class AtomicActor<TypedIOPort>
Throws:
IllegalActionException
public java.util.Vector gotRowVectorFromSource() throws java.lang.Exception
Throws:
java.lang.Exception
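The delimited-reader path of gotRowVectorFromSource can be pictured as reading one line at a time and splitting it into a Vector of fields. The sketch below is a hypothetical stand-in: the class and method names are invented, and returning an empty Vector at end of input is an assumption, since the documentation does not state how the end of data is signalled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.regex.Pattern;

/** Illustrative model only; not the Kepler implementation. */
final class RowVectorSketch {

    /** Returns the next row as a Vector of fields, or an empty Vector at end of input. */
    static Vector<String> nextRowVector(BufferedReader reader, String delimiter) throws IOException {
        Vector<String> row = new Vector<>();
        String line = reader.readLine();
        if (line == null) {
            return row; // end of input (assumed convention)
        }
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        row.addAll(Arrays.asList(line.split(Pattern.quote(delimiter), -1)));
        return row;
    }

    /** Reads every row from the given text, wrapping I/O failures in an unchecked exception. */
    static List<Vector<String>> readAll(String text, String delimiter) {
        List<Vector<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            Vector<String> row = nextRowVector(reader, delimiter);
            while (!row.isEmpty()) {
                rows.add(row);
                row = nextRowVector(reader, delimiter);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(readAll("A|1|12.5\nB|2|0.0\n", "|"));
    }
}
```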
public boolean prefire() throws IllegalActionException
Specified by: prefire in interface Executable
Overrides: prefire in class Source
Throws:
IllegalActionException - If checking the trigger for a token throws it or if the super class throws it.
public void fire() throws IllegalActionException
Specified by: fire in interface Executable
Overrides: fire in class Source
Throws:
IllegalActionException - If there is no director.
public boolean postfire() throws IllegalActionException
Specified by: postfire in interface Executable
Overrides: postfire in class AtomicActor<TypedIOPort>
Throws:
IllegalActionException - If there is a problem reading the file.
public void attributeChanged(Attribute attribute) throws IllegalActionException
Overrides: attributeChanged in class NamedObj
Parameters:
attribute - The attribute that changed.
Throws:
IllegalActionException - If the change is not acceptable to this container (not thrown in this base class).
public java.lang.Object clone(Workspace workspace) throws java.lang.CloneNotSupportedException
Overrides: clone in class ResultRecord
Parameters:
workspace - The workspace for the new object.
Throws:
java.lang.CloneNotSupportedException - If a derived class contains an attribute that cannot be cloned.
See Also: NamedObj.exportMoML(Writer, int, String), NamedObj.setDeferringChangeRequests(boolean)
public static void generateDocumentationForInstance(Eml200DataSource emlActor)
Parameters:
emlActor - the instance to which documentation will be added
public void preview()
Specified by: preview in interface Previewable
public java.net.URL getDocumentation()
Specified by: getDocumentation in interface DataSourceInterface
Overrides: getDocumentation in class ResultRecord
public void complete(DataCacheObject aItem)
Specified by: complete in interface DataCacheListener
public void stop()
Specified by: stop in interface Executable
Overrides: stop in class AtomicActor<TypedIOPort>
public boolean isEndOfResultset() throws java.sql.SQLException
Throws:
java.sql.SQLException
public java.io.Reader getFullRecord()
Overrides: getFullRecord in class ResultRecord