Class DocumentFilter
- java.lang.Object
-
- org.apache.manifoldcf.core.connector.BaseConnector
-
- org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
-
- org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter
-
- All Implemented Interfaces:
org.apache.manifoldcf.agents.interfaces.IPipelineConnector, org.apache.manifoldcf.agents.interfaces.ITransformationConnector, org.apache.manifoldcf.core.interfaces.IConnector
public class DocumentFilter extends org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected static class | DocumentFilter.SpecPacker |
-
Field Summary
Fields
Modifier and Type | Field | Description
protected static java.lang.String[] | activitiesList |
protected static java.lang.String | ACTIVITY_FILTER |
Fields inherited from class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
_rcsid
-
-
Constructor Summary
Constructors
Constructor | Description
DocumentFilter() | Constructor.
-
Method Summary
Modifier and Type | Method | Description
int | addOrReplaceDocumentWithException(java.lang.String documentURI, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, org.apache.manifoldcf.agents.interfaces.RepositoryDocument document, java.lang.String authorityNameString, org.apache.manifoldcf.agents.interfaces.IOutputAddActivity activities) | Add (or replace) a document in the output data store using the connector.
protected boolean | checkDateIndexable(DocumentFilter.SpecPacker sp, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.util.Date date, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) |
boolean | checkDateIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.util.Date date, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) | Detect if a document date is acceptable or not.
protected boolean | checkLengthIndexable(DocumentFilter.SpecPacker sp, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, long length, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) |
boolean | checkLengthIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, long length, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) |
protected boolean | checkMimeTypeIndexable(DocumentFilter.SpecPacker sp, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String mimeType, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) |
boolean | checkMimeTypeIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String mimeType, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) | Detect if a mime type is indexable or not.
protected boolean | checkURLIndexable(DocumentFilter.SpecPacker sp, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String url, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) |
boolean | checkURLIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String url, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) |
protected static void | fillInContentsSpecificationMap(java.util.Map<java.lang.String,java.lang.Object> paramMap, org.apache.manifoldcf.core.interfaces.Specification os) |
protected static java.util.Set<java.lang.String> | fillSet(java.lang.String input) |
java.lang.String[] | getActivitiesList() | Return a list of activities that this connector generates.
java.lang.String | getFormCheckJavascriptMethodName(int connectionSequenceNumber) | Obtain the name of the form check javascript method to call.
java.lang.String | getFormPresaveCheckJavascriptMethodName(int connectionSequenceNumber) | Obtain the name of the form presave check javascript method to call.
org.apache.manifoldcf.core.interfaces.VersionContext | getPipelineDescription(org.apache.manifoldcf.core.interfaces.Specification os) | Get an output version string, given an output specification.
void | outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber, int actualSequenceNumber, java.lang.String tabName) | Output the specification body section.
void | outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber, java.util.List<java.lang.String> tabsArray) | Output the specification header section.
java.lang.String | processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber) | Process a specification post.
void | viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber) | View specification.
Methods inherited from class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
checkDocumentIndexable, requestInfo
-
Methods inherited from class org.apache.manifoldcf.core.connector.BaseConnector
check, clearThreadContext, connect, deinstall, disconnect, getConfiguration, install, isConnected, outputConfigurationBody, outputConfigurationBody, outputConfigurationHeader, outputConfigurationHeader, outputConfigurationHeader, pack, packFixedList, packList, packList, poll, processConfigurationPost, processConfigurationPost, setThreadContext, unpack, unpackFixedList, unpackList, viewConfiguration, viewConfiguration
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.manifoldcf.core.interfaces.IConnector
check, clearThreadContext, connect, deinstall, disconnect, getConfiguration, install, isConnected, outputConfigurationBody, outputConfigurationHeader, poll, processConfigurationPost, setThreadContext, viewConfiguration
-
Field Detail
-
ACTIVITY_FILTER
protected static final java.lang.String ACTIVITY_FILTER
- See Also:
- Constant Field Values
-
activitiesList
protected static final java.lang.String[] activitiesList
-
-
Method Detail
-
getActivitiesList
public java.lang.String[] getActivitiesList()
Return a list of activities that this connector generates. The connector does NOT need to be connected before this method is called.
- Specified by:
getActivitiesList in interface org.apache.manifoldcf.agents.interfaces.ITransformationConnector
- Overrides:
getActivitiesList in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Returns:
- the set of activities.
-
getPipelineDescription
public org.apache.manifoldcf.core.interfaces.VersionContext getPipelineDescription(org.apache.manifoldcf.core.interfaces.Specification os) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get an output version string, given an output specification. The output version string is used to uniquely describe the pertinent details of the output specification and the configuration, to allow the Connector Framework to determine whether a document will need to be output again. Note that the contents of the document cannot be considered by this method, and that a different version string (defined in IRepositoryConnector) is used to describe the version of the actual document. This method presumes that the connector object has been configured, and it is thus able to communicate with the output data store should that be necessary.
- Specified by:
getPipelineDescription in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
getPipelineDescription in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
os - is the current output specification for the job that is doing the crawling.
- Returns:
- a string, of unlimited length, which uniquely describes output configuration and specification in such a way that if two such strings are equal, the document will not need to be sent again to the output data store.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
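The version-string contract described for getPipelineDescription can be illustrated with a minimal, self-contained sketch. It is not the DocumentFilter implementation: the class FilterSpecSketch, its fields, and the packing format are assumptions made for illustration, and no ManifoldCF types are used. The idea shown is only that equal specifications should produce equal strings, and different specifications different strings.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for a filter specification; not a ManifoldCF class. */
final class FilterSpecSketch {
    private final TreeSet<String> allowedMimeTypes; // sorted, so iteration order is stable
    private final long maxLength;

    FilterSpecSketch(List<String> allowedMimeTypes, long maxLength) {
        this.allowedMimeTypes = new TreeSet<>(allowedMimeTypes);
        this.maxLength = maxLength;
    }

    /**
     * Packs the specification into a deterministic string. Each value is
     * length-prefixed so that different value lists cannot collide.
     */
    String toVersionString() {
        StringBuilder sb = new StringBuilder();
        sb.append(allowedMimeTypes.size()).append(':');
        for (String mimeType : allowedMimeTypes) {
            sb.append(mimeType.length()).append(':').append(mimeType);
        }
        sb.append('|').append(maxLength);
        return sb.toString();
    }

    public static void main(String[] args) {
        FilterSpecSketch first = new FilterSpecSketch(List.of("text/html", "application/pdf"), 100L);
        FilterSpecSketch second = new FilterSpecSketch(List.of("application/pdf", "text/html"), 100L);
        // Same specification in a different order: the strings are expected to match.
        System.out.println(first.toVersionString().equals(second.toVersionString()));
    }
}
```

Sorting before packing is one way to keep the string independent of the order in which values were entered, so that a cosmetic reordering does not force documents to be sent again.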
-
checkDateIndexable
public boolean checkDateIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.util.Date date, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Detect if a document date is acceptable or not. This method is used to determine whether it makes sense to fetch a document in the first place.
- Specified by:
checkDateIndexable in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
checkDateIndexable in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
outputDescription - is the document's output version.
date - is the date of the document.
activities - is an object including the activities that can be performed by this method.
- Returns:
- true if the document with that date can be accepted by this connector.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
checkDateIndexable
protected boolean checkDateIndexable(DocumentFilter.SpecPacker sp, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.util.Date date, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
checkMimeTypeIndexable
public boolean checkMimeTypeIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String mimeType, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Detect if a mime type is indexable or not. This method is used by participating repository connectors to filter out, in advance, unusable documents that would otherwise be passed to this connector.
- Specified by:
checkMimeTypeIndexable in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
checkMimeTypeIndexable in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
outputDescription - is the document's output version.
mimeType - is the mime type of the document.
- Returns:
- true if the mime type is indexable by this connector.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
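As a rough illustration of a mime type pre-filter of the kind described above, the sketch below checks a type against an allowed set and then defers to a later pipeline stage. Everything in it is an assumption for illustration: MimeTypeFilterSketch is not a ManifoldCF class, the downstream stage is modeled as a plain java.util.function.Predicate, and rejecting a null mime type is a choice made here, not documented behavior.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical mime type pre-filter; it does not use ManifoldCF types. */
final class MimeTypeFilterSketch {
    private final Set<String> allowed = new HashSet<>();

    MimeTypeFilterSketch(String... allowedMimeTypes) {
        for (String mimeType : allowedMimeTypes) {
            allowed.add(normalize(mimeType));
        }
    }

    /** Drops parameters such as "; charset=UTF-8" and lowercases the rest. */
    private static String normalize(String mimeType) {
        int semicolon = mimeType.indexOf(';');
        String bare = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return bare.trim().toLowerCase(Locale.ROOT);
    }

    /**
     * Returns true only if this filter allows the type and the downstream
     * stage (an assumed stand-in for the rest of the pipeline) also accepts it.
     */
    boolean checkMimeTypeIndexable(String mimeType, Predicate<String> downstream) {
        if (mimeType == null || !allowed.contains(normalize(mimeType))) {
            return false; // assumption: unknown or missing types are rejected
        }
        return downstream.test(mimeType);
    }
}
```

A caller could construct `new MimeTypeFilterSketch("text/html", "application/pdf")` and pass `m -> true` as the downstream stage when nothing further in the pipeline needs to be consulted.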
-
checkMimeTypeIndexable
protected boolean checkMimeTypeIndexable(DocumentFilter.SpecPacker sp, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String mimeType, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
checkLengthIndexable
public boolean checkLengthIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, long length, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
- Specified by:
checkLengthIndexable in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
checkLengthIndexable in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
checkLengthIndexable
protected boolean checkLengthIndexable(DocumentFilter.SpecPacker sp, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, long length, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
checkURLIndexable
public boolean checkURLIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String url, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
- Specified by:
checkURLIndexable in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
checkURLIndexable in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
checkURLIndexable
protected boolean checkURLIndexable(DocumentFilter.SpecPacker sp, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String url, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
addOrReplaceDocumentWithException
public int addOrReplaceDocumentWithException(java.lang.String documentURI, org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, org.apache.manifoldcf.agents.interfaces.RepositoryDocument document, java.lang.String authorityNameString, org.apache.manifoldcf.agents.interfaces.IOutputAddActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption, java.io.IOException
Add (or replace) a document in the output data store using the connector. This method presumes that the connector object has been configured, and it is thus able to communicate with the output data store should that be necessary.
- Specified by:
addOrReplaceDocumentWithException in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
addOrReplaceDocumentWithException in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
documentURI - is the URI of the document. The URI is presumed to be the unique identifier which the output data store will use to process and serve the document. This URI is constructed by the repository connector which fetches the document, and is thus universal across all output connectors.
outputDescription - is the description string that was constructed for this document by the getPipelineDescription() method.
document - is the document data to be processed (handed to the output data store).
authorityNameString - is the name of the authority responsible for authorizing any access tokens passed in with the repository document. May be null.
activities - is the handle to an object that the implementer of an output connector may use to perform operations, such as logging processing activity.
- Returns:
- the document status (accepted or permanently rejected).
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
java.io.IOException
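The accepted-or-rejected return value can be pictured with a small sketch of a filtering stage that either rejects a document itself or hands it to the next stage and returns that stage's status. The names here (AcceptRejectSketch, DownstreamStage, STATUS_ACCEPTED, STATUS_REJECTED) and the numeric values are hypothetical placeholders, not the constants or interfaces defined by ManifoldCF, and the length limit is just one example of a filtering criterion.

```java
/** Hypothetical filtering stage; names and status values are placeholders. */
final class AcceptRejectSketch {
    static final int STATUS_ACCEPTED = 0; // placeholder value for "accepted"
    static final int STATUS_REJECTED = 1; // placeholder value for "permanently rejected"

    /** Assumed stand-in for whatever stage follows the filter in the pipeline. */
    interface DownstreamStage {
        int send(String documentURI, long lengthInBytes);
    }

    /**
     * Rejects documents longer than the limit without forwarding them;
     * otherwise forwards the document and reports the downstream status.
     */
    static int addOrReplace(String documentURI, long lengthInBytes,
                            long maxLengthInBytes, DownstreamStage next) {
        if (lengthInBytes > maxLengthInBytes) {
            return STATUS_REJECTED;
        }
        return next.send(documentURI, lengthInBytes);
    }
}
```

Returning the downstream status unchanged keeps the filter transparent: when it lets a document through, the caller sees exactly what the rest of the pipeline decided.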
-
fillInContentsSpecificationMap
protected static void fillInContentsSpecificationMap(java.util.Map<java.lang.String,java.lang.Object> paramMap, org.apache.manifoldcf.core.interfaces.Specification os)
-
getFormCheckJavascriptMethodName
public java.lang.String getFormCheckJavascriptMethodName(int connectionSequenceNumber)
Obtain the name of the form check javascript method to call.
- Specified by:
getFormCheckJavascriptMethodName in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
getFormCheckJavascriptMethodName in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
connectionSequenceNumber - is the unique number of this connection within the job.
- Returns:
- the name of the form check javascript method.
-
getFormPresaveCheckJavascriptMethodName
public java.lang.String getFormPresaveCheckJavascriptMethodName(int connectionSequenceNumber)
Obtain the name of the form presave check javascript method to call.
- Specified by:
getFormPresaveCheckJavascriptMethodName in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
getFormPresaveCheckJavascriptMethodName in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
connectionSequenceNumber - is the unique number of this connection within the job.
- Returns:
- the name of the form presave check javascript method.
-
outputSpecificationHeader
public void outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber, java.util.List<java.lang.String> tabsArray) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, java.io.IOException
Output the specification header section. This method is called in the head section of a job page which has selected a pipeline connection of the current type. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the job editing HTML.
- Specified by:
outputSpecificationHeader in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
outputSpecificationHeader in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
out - is the output to which any HTML should be sent.
locale - is the preferred locale of the output.
os - is the current pipeline specification for this connection.
connectionSequenceNumber - is the unique number of this connection within the job.
tabsArray - is a list of tab names. Add to this list any tab names that are specific to the connector.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
-
outputSpecificationBody
public void outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber, int actualSequenceNumber, java.lang.String tabName) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, java.io.IOException
Output the specification body section. This method is called in the body section of a job page which has selected a pipeline connection of the current type. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is "editjob".
- Specified by:
outputSpecificationBody in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
outputSpecificationBody in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
out - is the output to which any HTML should be sent.
locale - is the preferred locale of the output.
os - is the current pipeline specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
actualSequenceNumber - is the connection within the job that has currently been selected.
tabName - is the current tab name.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
-
processSpecificationPost
public java.lang.String processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Process a specification post. This method is called at the start of a job's edit or view page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the transformation specification accordingly. The name of the posted form is "editjob".
- Specified by:
processSpecificationPost in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
processSpecificationPost in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
variableContext - contains the post data, including binary file-upload information.
locale - is the preferred locale of the output.
os - is the current pipeline specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
- Returns:
- null if all is well, or a string error message if there is an error that should prevent saving of the job (and cause a redirection to an error page).
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
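The "null if all is well, otherwise an error message" contract of processSpecificationPost can be sketched without any ManifoldCF types. In the example below the posted form data and the specification are both modeled as plain maps, and the field name "maxlength" and the "s<number>_" prefix are assumptions made up for illustration; the real form field names are not given on this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical post handler; maps stand in for the post data and the specification. */
final class SpecificationPostSketch {
    /**
     * Reads one posted field for the given connection and updates the
     * specification. Returns null on success, or an error message that
     * should prevent the job from being saved.
     */
    static String processPost(Map<String, String> postData,
                              Map<String, String> specification,
                              int connectionSequenceNumber) {
        // Assumed naming scheme: fields are prefixed with the connection's sequence number.
        String fieldName = "s" + connectionSequenceNumber + "_maxlength";
        String raw = postData.get(fieldName);
        if (raw == null) {
            return null; // nothing was posted for this connection; leave the specification alone
        }
        try {
            long value = Long.parseLong(raw.trim());
            if (value < 0) {
                return "Maximum length must not be negative";
            }
            specification.put("maxlength", Long.toString(value));
            return null;
        } catch (NumberFormatException e) {
            return "Maximum length must be an integer: " + raw;
        }
    }

    public static void main(String[] args) {
        Map<String, String> post = new HashMap<>();
        post.put("s2_maxlength", "1024");
        Map<String, String> spec = new HashMap<>();
        String error = processPost(post, spec, 2);
        System.out.println(error == null ? "saved: " + spec : "error: " + error);
    }
}
```

Treating a missing field as "nothing posted" matters because, as the description notes, the method is called whenever form data might have been posted, not only when it actually was.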
-
viewSpecification
public void viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, java.io.IOException
View specification. This method is called in the body section of a job's view page. Its purpose is to present the pipeline specification information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags.
- Specified by:
viewSpecification in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
viewSpecification in class org.apache.manifoldcf.agents.transformation.BaseTransformationConnector
- Parameters:
out - is the output to which any HTML should be sent.
locale - is the preferred locale of the output.
os - is the current pipeline specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
-
fillSet
protected static java.util.Set<java.lang.String> fillSet(java.lang.String input)
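This page gives no description for fillSet, so its behavior cannot be stated from the signature alone. Purely as a guess at what a helper with this signature might do, the sketch below splits a multi-line string into a set of trimmed, non-empty entries. The class name, the line-based splitting, and the handling of null are all assumptions and should not be read as the actual DocumentFilter logic.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper with the same signature shape as fillSet; behavior is assumed. */
final class FillSetSketch {
    /** Splits the input on line breaks and collects trimmed, non-empty lines. */
    static Set<String> fillSet(String input) {
        Set<String> result = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        if (input == null) {
            return result; // assumption: a missing value means an empty set
        }
        for (String line : input.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```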
-