public abstract class HadoopTask extends AbstractTask
Modifier and Type | Field and Description |
---|---|
protected static com.google.common.base.Predicate&lt;URL&gt; | IS_DRUID_URL: buildClassLoader(TaskToolbox) has outdated javadocs referencing this field; TODO: update. |
Modifier | Constructor and Description |
---|---|
protected | HadoopTask(String id, String dataSource, List&lt;String&gt; hadoopDependencyCoordinates, Map&lt;String,Object&gt; context) |
Modifier and Type | Method and Description |
---|---|
static ClassLoader | buildClassLoader(List&lt;String&gt; hadoopDependencyCoordinates, List&lt;String&gt; defaultHadoopCoordinates) |
protected ClassLoader | buildClassLoader(TaskToolbox toolbox): This makes an isolated classloader that has classes loaded in the "proper" priority. |
List&lt;String&gt; | getHadoopDependencyCoordinates() |
static &lt;InputType,OutputType&gt; OutputType | invokeForeignLoader(String clazzName, InputType input, ClassLoader loader): This method tries to isolate class loading during a Function call. |
Methods inherited from class AbstractTask
canRestore, equals, getClasspathPrefix, getContext, getDataSource, getGroupId, getId, getNodeType, getQueryRunner, getTaskResource, hashCode, stopGracefully, success, toString

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface Task
getContextValue, getContextValue, getPriority, getType, isReady, run
protected static final com.google.common.base.Predicate<URL> IS_DRUID_URL
buildClassLoader(TaskToolbox) has outdated javadocs referencing this field; TODO: update.

protected ClassLoader buildClassLoader(TaskToolbox toolbox) throws MalformedURLException

This makes an isolated classloader that has classes loaded in the "proper" priority. The URLs in the resulting ClassLoader are loaded in this priority:

1. Non-Druid jars (see IS_DRUID_URL) found in the ClassLoader for HadoopIndexTask.class. This will probably be the ApplicationClassLoader
2. Hadoop jars found in the hadoop dependency coordinates directory, loaded in the order they are specified in
3. Druid jars (see IS_DRUID_URL) found in the ClassLoader for HadoopIndexTask.class
4. Extension URLs maintaining the order specified in the extensions list in the extensions config
At one point I tried making each one of these steps a URLClassLoader, but it is not easy to make a properly
predictive IS_DRUID_URL
that captures all things which reference druid classes. This led to a case where
the class loader isolation worked great for stock druid, but failed in many common use cases including extension
jars on the classpath which were not listed in the extensions list.
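As a rough illustration only of the kind of predicate being discussed (the actual IS_DRUID_URL definition is not shown on this page, and the real field is a Guava com.google.common.base.Predicate), a filename-based URL filter of this general shape might look like:

```java
import java.net.URL;
import java.util.function.Predicate;

class DruidUrlPredicateSketch {
    // Hypothetical stand-in for IS_DRUID_URL, using java.util.function.Predicate
    // instead of Guava's Predicate to stay dependency-free. The real predicate's
    // matching rules may differ.
    static final Predicate<URL> IS_DRUID_URL_SKETCH = url -> {
        if (url == null) {
            return false;
        }
        String path = url.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        // Heuristic: treat jars whose file name starts with "druid" as Druid jars.
        return name.startsWith("druid") && name.endsWith(".jar");
    };
}
```

As the surrounding text notes, a purely name-based heuristic like this misses extension jars that reference Druid classes, which is why the isolation strategy below does not rely on the predicate being exhaustive.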
As such, the current approach is to make a list of URLs for a URLClassLoader based on the priority above, and use
THAT ClassLoader with a null parent as the isolated loader for running hadoop or hadoop-like driver tasks.
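The approach described here can be sketched as follows. This is an illustrative outline, not Druid's actual implementation; the four list parameters are placeholders for the URL sets gathered in priority steps 1-4 above:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

class IsolatedLoaderSketch {
    // Hypothetical sketch: concatenate the URL lists in priority order and
    // build a single URLClassLoader with a null parent, so that only the
    // bootstrap loader sits above it and the application classpath cannot
    // leak classes into the isolated loader.
    static ClassLoader makeIsolatedLoader(List<URL> nonDruidJars,
                                          List<URL> hadoopJars,
                                          List<URL> druidJars,
                                          List<URL> extensionUrls) {
        List<URL> ordered = new ArrayList<>();
        ordered.addAll(nonDruidJars);   // 1. non-Druid application jars
        ordered.addAll(hadoopJars);     // 2. hadoop dependency jars
        ordered.addAll(druidJars);      // 3. Druid jars
        ordered.addAll(extensionUrls);  // 4. extension jars, in config order
        return new URLClassLoader(ordered.toArray(new URL[0]), null);
    }
}
```

Because URLClassLoader searches its URLs in order, placing the lists in this sequence is what gives the "proper" priority described above.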
Such an approach combined with reasonable exclusions in io.druid.cli.PullDependencies#exclusions tries to maintain
sanity in a ClassLoader where all jars (which are isolated by extension ClassLoaders in the Druid framework) are
jumbled together into one ClassLoader for Hadoop and Hadoop-like tasks (Spark for example).

Parameters:
toolbox - The toolbox to pull the default coordinates from if not present in the task

Throws:
MalformedURLException - from Initialization.getClassLoaderForExtension

public static ClassLoader buildClassLoader(List&lt;String&gt; hadoopDependencyCoordinates, List&lt;String&gt; defaultHadoopCoordinates) throws MalformedURLException

Throws:
MalformedURLException
public static <InputType,OutputType> OutputType invokeForeignLoader(String clazzName, InputType input, ClassLoader loader)
Type Parameters:
InputType - The input type of the method.
OutputType - The output type of the method. The result of runTask must be castable to this type.

Parameters:
clazzName - The Class which has a static method called `runTask`
input - The input for `runTask`; `input.getClass()` must be the class of the input for runTask
loader - The loader to use as the context class loader during invocation

Copyright © 2011–2018. All rights reserved.
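The reflective pattern that the invokeForeignLoader contract describes can be sketched like this. It is a hedged illustration rather than the actual Druid source: the Echo target class and the exception handling are invented for the example.

```java
import java.lang.reflect.Method;

class ForeignLoaderSketch {
    // Sketch of the documented contract: resolve `clazzName` through `loader`,
    // make `loader` the thread's context class loader for the duration of the
    // call, invoke the static runTask(input) method, then restore the old loader.
    @SuppressWarnings("unchecked")
    static <InputType, OutputType> OutputType invokeForeignLoader(
            String clazzName, InputType input, ClassLoader loader) {
        Thread current = Thread.currentThread();
        ClassLoader oldLoader = current.getContextClassLoader();
        try {
            current.setContextClassLoader(loader);
            Class<?> clazz = Class.forName(clazzName, true, loader);
            // Look up runTask by the concrete class of `input`, per the
            // `input.getClass()` requirement in the parameter docs.
            Method runTask = clazz.getMethod("runTask", input.getClass());
            return (OutputType) runTask.invoke(null, input); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        } finally {
            current.setContextClassLoader(oldLoader);
        }
    }

    // Toy target class (invented for this example) exposing the expected
    // static `runTask` entry point.
    public static class Echo {
        public static String runTask(String s) {
            return s + "!";
        }
    }
}
```

Restoring the previous context class loader in a finally block matters here: the isolated loader should only be visible to code running inside the runTask call, not to the rest of the task's thread.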