public abstract class HadoopTask extends AbstractTask
Modifier and Type | Field and Description |
---|---|
protected static com.google.common.base.Predicate&lt;URL&gt; | IS_DRUID_URL: buildClassLoader(TaskToolbox) has outdated javadocs referencing this field; TODO: update. |
Modifier | Constructor and Description |
---|---|
protected | HadoopTask(String id, String dataSource, List&lt;String&gt; hadoopDependencyCoordinates, Map&lt;String,Object&gt; context) |
Modifier and Type | Method and Description |
---|---|
static ClassLoader | buildClassLoader(List&lt;String&gt; hadoopDependencyCoordinates, List&lt;String&gt; defaultHadoopCoordinates) |
protected ClassLoader | buildClassLoader(TaskToolbox toolbox): This makes an isolated classloader that has classes loaded in the "proper" priority. |
List&lt;String&gt; | getHadoopDependencyCoordinates() |
static &lt;InputType,OutputType&gt; OutputType | invokeForeignLoader(String clazzName, InputType input, ClassLoader loader): This method tries to isolate class loading during a Function call. |
Methods inherited from class AbstractTask
canRestore, equals, getClasspathPrefix, getContext, getDataSource, getGroupId, getId, getNodeType, getQueryRunner, getTaskResource, hashCode, stopGracefully, success, toString

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface Task
getContextValue, getContextValue, getPriority, getType, isReady, run
protected static final com.google.common.base.Predicate<URL> IS_DRUID_URL
buildClassLoader(TaskToolbox) has outdated javadocs referencing this field; TODO: update.

protected ClassLoader buildClassLoader(TaskToolbox toolbox) throws MalformedURLException

This makes an isolated classloader that has classes loaded in the "proper" priority. The URLs in the resulting ClassLoader are loaded in this priority:

1. Non-Druid jars (see IS_DRUID_URL) found in the ClassLoader for HadoopIndexTask.class. This will probably be the ApplicationClassLoader
2. Hadoop jars found in the hadoop dependency coordinates directory, loaded in the order they are specified in
3. Druid jars (see IS_DRUID_URL) found in the ClassLoader for HadoopIndexTask.class
4. Extension URLs maintaining the order specified in the extensions list in the extensions config
At one point I tried making each one of these steps a URLClassLoader, but it is not easy to make a properly
predictive IS_DRUID_URL
that captures all things which reference druid classes. This led to a case where
the class loader isolation worked great for stock druid, but failed in many common use cases including extension
jars on the classpath which were not listed in the extensions list.
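As a rough illustration only of the kind of predicate being discussed (the actual IS_DRUID_URL definition is not shown on this page, and the real field is a Guava com.google.common.base.Predicate), a filename-based URL filter of this general shape might look like:

```java
import java.net.URL;
import java.util.function.Predicate;

class DruidUrlPredicateSketch {
    // Hypothetical stand-in for IS_DRUID_URL, using java.util.function.Predicate
    // instead of Guava's Predicate to stay dependency-free. The real predicate's
    // matching rules may differ.
    static final Predicate<URL> IS_DRUID_URL_SKETCH = url -> {
        if (url == null) {
            return false;
        }
        String path = url.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        // Heuristic: treat jars whose file name starts with "druid" as Druid jars.
        return name.startsWith("druid") && name.endsWith(".jar");
    };
}
```

As the surrounding text notes, a purely name-based heuristic like this misses extension jars that reference Druid classes, which is why the isolation strategy below does not rely on the predicate being exhaustive.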
As such, the current approach is to make a list of URLs for a URLClassLoader based on the priority above, and use
THAT ClassLoader with a null parent as the isolated loader for running hadoop or hadoop-like driver tasks.
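The approach described here can be sketched as follows. This is an illustrative outline, not Druid's actual implementation; the four list parameters are placeholders for the URL sets gathered in priority steps 1-4 above:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

class IsolatedLoaderSketch {
    // Hypothetical sketch: concatenate the URL lists in priority order and
    // build a single URLClassLoader with a null parent, so that only the
    // bootstrap loader sits above it and the application classpath cannot
    // leak classes into the isolated loader.
    static ClassLoader makeIsolatedLoader(List<URL> nonDruidJars,
                                          List<URL> hadoopJars,
                                          List<URL> druidJars,
                                          List<URL> extensionUrls) {
        List<URL> ordered = new ArrayList<>();
        ordered.addAll(nonDruidJars);   // 1. non-Druid application jars
        ordered.addAll(hadoopJars);     // 2. hadoop dependency jars
        ordered.addAll(druidJars);      // 3. Druid jars
        ordered.addAll(extensionUrls);  // 4. extension jars, in config order
        return new URLClassLoader(ordered.toArray(new URL[0]), null);
    }
}
```

Because URLClassLoader searches its URLs in order, placing the lists in this sequence is what gives the "proper" priority described above.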
Such an approach combined with reasonable exclusions in io.druid.cli.PullDependencies#exclusions tries to maintain
sanity in a ClassLoader where all jars (which are isolated by extension ClassLoaders in the Druid framework) are
jumbled together into one ClassLoader for Hadoop and Hadoop-like tasks (Spark for example).

Parameters:
toolbox - The toolbox to pull the default coordinates from if not present in the task

Throws:
MalformedURLException - from Initialization.getClassLoaderForExtension

public static ClassLoader buildClassLoader(List&lt;String&gt; hadoopDependencyCoordinates, List&lt;String&gt; defaultHadoopCoordinates) throws MalformedURLException

Throws:
MalformedURLException
public static <InputType,OutputType> OutputType invokeForeignLoader(String clazzName, InputType input, ClassLoader loader)
Type Parameters:
InputType - The input type of the method.
OutputType - The output type of the method. The result of runTask must be castable to this type.

Parameters:
clazzName - The Class which has a static method called `runTask`
input - The input for `runTask`; `input.getClass()` must be the class of the input for runTask
loader - The loader to use as the context class loader during invocation

Copyright © 2011–2018. All rights reserved.
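The reflective pattern that the invokeForeignLoader contract describes can be sketched like this. It is a hedged illustration rather than the actual Druid source: the Echo target class and the exception handling are invented for the example.

```java
import java.lang.reflect.Method;

class ForeignLoaderSketch {
    // Sketch of the documented contract: resolve `clazzName` through `loader`,
    // make `loader` the thread's context class loader for the duration of the
    // call, invoke the static runTask(input) method, then restore the old loader.
    @SuppressWarnings("unchecked")
    static <InputType, OutputType> OutputType invokeForeignLoader(
            String clazzName, InputType input, ClassLoader loader) {
        Thread current = Thread.currentThread();
        ClassLoader oldLoader = current.getContextClassLoader();
        try {
            current.setContextClassLoader(loader);
            Class<?> clazz = Class.forName(clazzName, true, loader);
            // Look up runTask by the concrete class of `input`, per the
            // `input.getClass()` requirement in the parameter docs.
            Method runTask = clazz.getMethod("runTask", input.getClass());
            return (OutputType) runTask.invoke(null, input); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        } finally {
            current.setContextClassLoader(oldLoader);
        }
    }

    // Toy target class (invented for this example) exposing the expected
    // static `runTask` entry point.
    public static class Echo {
        public static String runTask(String s) {
            return s + "!";
        }
    }
}
```

Restoring the previous context class loader in a finally block matters here: the isolated loader should only be visible to code running inside the runTask call, not to the rest of the task's thread.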