public abstract class PrefetchableTextFilesFirehoseFactory<T> extends AbstractTextFilesFirehoseFactory<T>
When connect(StringInputRowParser, File) is called, it caches objects on local disk
up to maxCacheCapacityBytes. These caches are NOT deleted until the process terminates, and thus can be used for
future reads.
Once the prefetchTriggerBytes threshold is crossed, a background prefetch thread automatically starts to fetch the remaining
objects.
Failed fetches are retried, bounded by maxFetchRetry.
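A bounded retry governed by a setting such as maxFetchRetry can be sketched as follows. This is a minimal illustration, not the class's actual implementation: FetchRetrySketch, Fetch, and fetchWithRetry are hypothetical names, and treating maxFetchRetry as the number of retries after the first attempt is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a bounded fetch retry; not part of the documented API. */
final class FetchRetrySketch {

    /** Hypothetical stand-in for a download step that may fail with an IOException. */
    interface Fetch<T> {
        T run() throws IOException;
    }

    /**
     * Runs the fetch once and, on IOException, retries it up to maxFetchRetry more times.
     * Assumption: maxFetchRetry counts retries, so at most maxFetchRetry + 1 attempts are made.
     */
    static <T> T fetchWithRetry(Fetch<T> fetch, int maxFetchRetry) {
        if (maxFetchRetry < 0) {
            throw new IllegalArgumentException("maxFetchRetry must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxFetchRetry; attempt++) {
            try {
                return fetch.run();
            } catch (IOException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw new UncheckedIOException(
            "fetch failed after " + (maxFetchRetry + 1) + " attempts", last);
    }
}
```

A production version would typically also wait between attempts and delete any partially downloaded file before retrying; both are left out here to keep the sketch short.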
This implementation can be useful when the cost of reading input objects is high, as when reading from AWS S3, because
batch tasks like IndexTask or HadoopIndexTask can read the whole data twice, once for determining partition specs and
once for generating segments, if the intervals of GranularitySpec are not specified.
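The benefit of a size-capped local cache when the same input is read twice can be illustrated with a small sketch. Everything here is hypothetical (CappedDiskCacheSketch, read, remoteReads, and the remote function standing in for object storage); only the idea of caching on local disk up to maxCacheCapacityBytes comes from the description above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a size-capped local disk cache; not the documented class. */
final class CappedDiskCacheSketch {
    private final Path cacheDir;
    private final long maxCacheCapacityBytes;
    private final Map<String, Path> cachedFiles = new HashMap<>();
    private long cachedBytes = 0;
    private int remoteReads = 0;

    CappedDiskCacheSketch(long maxCacheCapacityBytes) {
        try {
            this.cacheDir = Files.createTempDirectory("cache-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.maxCacheCapacityBytes = maxCacheCapacityBytes;
    }

    /** Returns the object's content, preferring the local copy over the remote source. */
    String read(String name, Function<String, String> remote) {
        try {
            Path file = cachedFiles.get(name);
            if (file != null) {
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
            remoteReads++;
            String content = remote.apply(name);
            byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
            // Cache only while the total stays within the configured capacity.
            if (cachedBytes + bytes.length <= maxCacheCapacityBytes) {
                Path target = Files.createTempFile(cacheDir, "object-", ".txt");
                Files.write(target, bytes);
                target.toFile().deleteOnExit(); // kept until the process terminates
                cachedFiles.put(name, target);
                cachedBytes += bytes.length;
            }
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of reads that had to go to the remote source. */
    int remoteReads() {
        return remoteReads;
    }
}
```

In this sketch, an object that fits within the capacity is fetched remotely only on the first pass, while an object that does not fit is fetched again on every pass.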
If a cached file exists, the firehose uses a LineIterator reading that file.
If an object has to be downloaded first, it returns a LineIterator only when the
download operation is successfully finished.
Otherwise, it returns a LineIterator which directly reads the stream opened by
AbstractTextFilesFirehoseFactory.openObjectStream(T). If there is an IOException, it will throw it and the read will fail.

Constructor and Description |
---|
PrefetchableTextFilesFirehoseFactory(Long maxCacheCapacityBytes, Long maxFetchCapacityBytes, Long prefetchTriggerBytes, Long fetchTimeout, Integer maxFetchRetry) |
Modifier and Type | Method and Description |
---|---|
Firehose | connect(StringInputRowParser firehoseParser, File temporaryDirectory) Initialization method that connects up the fire hose. |
long | getFetchTimeout() |
long | getMaxCacheCapacityBytes() |
long | getMaxFetchCapacityBytes() |
int | getMaxFetchRetry() |
long | getPrefetchTriggerBytes() |
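Methods inherited from class AbstractTextFilesFirehoseFactory: initObjects, openObjectStream, wrapObjectStream
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface FirehoseFactory: connect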
public long getMaxCacheCapacityBytes()
public long getMaxFetchCapacityBytes()
public long getPrefetchTriggerBytes()
public long getFetchTimeout()
public int getMaxFetchRetry()
public Firehose connect(StringInputRowParser firehoseParser, File temporaryDirectory) throws IOException
PrefetchableTextFilesFirehoseFactory may use a temporary directory to cache data in it.
connect in interface FirehoseFactory<StringInputRowParser>
connect in class AbstractTextFilesFirehoseFactory<T>
Parameters:
firehoseParser - an input row parser
temporaryDirectory - a directory where temporary files are stored
Throws:
IOException
Copyright © 2011–2018. All rights reserved.