Class GTSConnector

  • All Implemented Interfaces:
    org.apache.manifoldcf.agents.interfaces.IOutputConnector, org.apache.manifoldcf.agents.interfaces.IPipelineConnector, org.apache.manifoldcf.core.interfaces.IConnector

    public class GTSConnector
    extends org.apache.manifoldcf.agents.output.BaseOutputConnector
    This is the output connector for the MetaCarta appliance. It establishes a notion of collection(s) a document is ingested into, as well as the idea of a document template for the output.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      protected static class  GTSConnector.ReaderListener
      Reader listener object that extracts the app name
    • Constructor Summary

      Constructors 
      Constructor Description
      GTSConnector()
      Constructor.
    • Method Summary

      Methods
      Modifier and Type Method Description
      int addOrReplaceDocumentWithException​(java.lang.String documentURI, org.apache.manifoldcf.core.interfaces.VersionContext pipelineDescription, org.apache.manifoldcf.agents.interfaces.RepositoryDocument document, java.lang.String authorityNameString, org.apache.manifoldcf.agents.interfaces.IOutputAddActivity activities)
      Add (or replace) a document in the output data store using the connector.
      java.lang.String check()
      Test the connection.
      boolean checkDocumentIndexable​(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.io.File localFile, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities)
      Pre-determine whether a document (passed here as a File object) is indexable by this connector.
      boolean checkMimeTypeIndexable​(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String mimeType, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities)
      Detect if a mime type is indexable or not.
      void connect​(org.apache.manifoldcf.core.interfaces.ConfigParams configParameters)
      Connect.
      void disconnect()
      Close the connection.
      protected static int fingerprint​(java.io.File file)
      Fingerprint a file! Pass in the name of the (local) temporary file that we should be looking at.
      java.lang.String[] getActivitiesList()
      Return the list of activities that this connector supports (i.e. writes into the log).
      protected static java.lang.String getAppName​(java.io.File documentPath)
      Get a binary document's APPNAME field, or return null if the document does not seem to be an OLE compound document.
      org.apache.manifoldcf.core.interfaces.VersionContext getPipelineDescription​(org.apache.manifoldcf.core.interfaces.Specification spec)
      Get an output version string, given an output specification.
      protected void getSession()
      Set up a session.
      protected static java.lang.String hexprint​(byte x)  
      protected static boolean isStrange​(byte x)
      Check if a byte is not a typical ASCII character.
      protected static boolean isText​(byte[] beginChunk, int chunkLength)
      Test to see if a document is text or not.
      protected static boolean isWhiteSpace​(byte x)
      Check if a byte is a whitespace character.
      protected static char nibbleprint​(int x)  
      void outputConfigurationBody​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.lang.String tabName)
      Output the configuration body section.
      void outputConfigurationHeader​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.util.List<java.lang.String> tabsArray)
      Output the configuration header section.
      void outputSpecificationBody​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber, int actualSequenceNumber, java.lang.String tabName)
      Output the specification body section.
      void outputSpecificationHeader​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber, java.util.List<java.lang.String> tabsArray)
      Output the specification header section.
      java.lang.String processConfigurationPost​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
      Process a configuration post.
      java.lang.String processSpecificationPost​(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber)
      Process a specification post.
      protected static int recognizeApp​(java.lang.String appName)
      Translate a string application name to one of the kinds of documents we care about.
      void removeDocument​(java.lang.String documentURI, java.lang.String outputDescription, org.apache.manifoldcf.agents.interfaces.IOutputRemoveActivity activities)
      Remove a document using the connector.
      void viewConfiguration​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
      View configuration.
      void viewSpecification​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification os, int connectionSequenceNumber)
      View specification.
      • Methods inherited from class org.apache.manifoldcf.agents.output.BaseOutputConnector

        checkDateIndexable, checkLengthIndexable, checkURLIndexable, getFormCheckJavascriptMethodName, getFormPresaveCheckJavascriptMethodName, noteAllRecordsRemoved, noteJobComplete, requestInfo
      • Methods inherited from class org.apache.manifoldcf.core.connector.BaseConnector

        clearThreadContext, deinstall, getConfiguration, install, isConnected, outputConfigurationBody, outputConfigurationHeader, outputConfigurationHeader, pack, packFixedList, packList, packList, poll, processConfigurationPost, setThreadContext, unpack, unpackFixedList, unpackList, viewConfiguration
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface org.apache.manifoldcf.core.interfaces.IConnector

        clearThreadContext, deinstall, getConfiguration, install, isConnected, poll, setThreadContext
    • Constructor Detail

      • GTSConnector

        public GTSConnector()
        Constructor.
    • Method Detail

      • getActivitiesList

        public java.lang.String[] getActivitiesList()
        Return the list of activities that this connector supports (i.e. writes into the log).
        Specified by:
        getActivitiesList in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
        Overrides:
        getActivitiesList in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Returns:
        the list.
      • connect

        public void connect​(org.apache.manifoldcf.core.interfaces.ConfigParams configParameters)
        Connect.
        Specified by:
        connect in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        connect in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        configParameters - is the set of configuration parameters, which in this case describe the target appliance, basic auth configuration, etc. (This formerly came out of the ini file.)
      • disconnect

        public void disconnect()
                        throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Close the connection. Call this before discarding the connection.
        Specified by:
        disconnect in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        disconnect in class org.apache.manifoldcf.core.connector.BaseConnector
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • getSession

        protected void getSession()
                           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Set up a session.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • check

        public java.lang.String check()
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Test the connection. Returns a string describing the connection integrity.
        Specified by:
        check in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        check in class org.apache.manifoldcf.core.connector.BaseConnector
        Returns:
        the connection's status as a displayable string.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • checkMimeTypeIndexable

        public boolean checkMimeTypeIndexable​(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription,
                                              java.lang.String mimeType,
                                              org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities)
                                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                              org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Detect if a mime type is indexable or not. Participating repository connectors call this method to filter out unusable documents before they are passed to this output connector.
        Specified by:
        checkMimeTypeIndexable in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
        Overrides:
        checkMimeTypeIndexable in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Parameters:
        mimeType - is the mime type of the document.
        Returns:
        true if the mime type is indexable by this connector.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
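        The page does not say which mime types this connector accepts. As a rough illustration of the pre-filtering idea only, the sketch below checks a mime type against an allowlist. The class name MimeFilterSketch, the method isIndexable, and the contents of ACCEPTED are all hypothetical and are not taken from GTSConnector.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of mime-type pre-filtering; not the GTSConnector code.
class MimeFilterSketch {
    // Assumed allowlist, chosen only for illustration.
    private static final Set<String> ACCEPTED = new HashSet<>(Arrays.asList(
        "text/plain", "text/html", "application/pdf", "application/msword"));

    static boolean isIndexable(String mimeType) {
        if (mimeType == null) {
            return false; // assumption: an unknown type is treated as not indexable
        }
        // Drop parameters such as "; charset=utf-8" and normalize case.
        int semi = mimeType.indexOf(';');
        String base = (semi >= 0 ? mimeType.substring(0, semi) : mimeType)
            .trim().toLowerCase(Locale.ROOT);
        return ACCEPTED.contains(base);
    }

    public static void main(String[] args) {
        System.out.println(isIndexable("text/html; charset=utf-8"));
        System.out.println(isIndexable("image/png"));
    }
}
```

        A repository connector would make a check of this kind before fetching a document, so that documents the output side cannot use are skipped early.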
      • checkDocumentIndexable

        public boolean checkDocumentIndexable​(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription,
                                              java.io.File localFile,
                                              org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities)
                                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                              org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Pre-determine whether a document (passed here as a File object) is indexable by this connector. This method is used by participating repository connectors to help reduce the number of unmanageable documents that are passed to this output connector in advance of an actual transfer. This hook is provided mainly to support search engines that only handle a small set of accepted file types.
        Specified by:
        checkDocumentIndexable in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
        Overrides:
        checkDocumentIndexable in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Parameters:
        localFile - is the local file to check.
        Returns:
        true if the file is indexable.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getPipelineDescription

        public org.apache.manifoldcf.core.interfaces.VersionContext getPipelineDescription​(org.apache.manifoldcf.core.interfaces.Specification spec)
                                                                                    throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                                                           org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get an output version string, given an output specification. The output version string is used to uniquely describe the pertinent details of the output specification and the configuration, to allow the Connector Framework to determine whether a document will need to be output again. Note that the contents of the document cannot be considered by this method, and that a different version string (defined in IRepositoryConnector) is used to describe the version of the actual document. This method presumes that the connector object has been configured, and it is thus able to communicate with the output data store should that be necessary.
        Specified by:
        getPipelineDescription in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
        Overrides:
        getPipelineDescription in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Parameters:
        spec - is the current output specification for the job that is doing the crawling.
        Returns:
        a string, of unlimited length, which uniquely describes output configuration and specification in such a way that if two such strings are equal, the document will not need to be sent again to the output data store.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
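        To make the version-string contract concrete, here is a minimal sketch that packs a set of collection names and a document template into one deterministic string. The class VersionStringSketch, the method buildVersionString, the length-prefixed encoding, and the decision to ignore collection order are assumptions made for illustration; the actual packing used by GTSConnector is not described on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a deterministic output version string.
class VersionStringSketch {
    static String buildVersionString(Collection<String> collections, String template) {
        // Sort so that the same set of collections always yields the same string
        // (assumption: collection order does not matter to the output store).
        List<String> sorted = new ArrayList<>(collections);
        Collections.sort(sorted);
        StringBuilder sb = new StringBuilder();
        sb.append(sorted.size()).append('+');
        for (String c : sorted) {
            appendField(sb, c);
        }
        appendField(sb, template == null ? "" : template);
        return sb.toString();
    }

    // Length-prefix each field so that field boundaries are unambiguous.
    private static void appendField(StringBuilder sb, String value) {
        sb.append(value.length()).append(':').append(value);
    }

    public static void main(String[] args) {
        String before = buildVersionString(Arrays.asList("news", "maps"), "default");
        String after = buildVersionString(Arrays.asList("maps", "news"), "default");
        // Equal strings mean the document would not need to be sent again.
        System.out.println(before.equals(after));
    }
}
```

        Two jobs whose specifications produce equal strings are treated as equivalent for re-indexing purposes; any change to a collection name or to the template changes the string.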
      • addOrReplaceDocumentWithException

        public int addOrReplaceDocumentWithException​(java.lang.String documentURI,
                                                     org.apache.manifoldcf.core.interfaces.VersionContext pipelineDescription,
                                                     org.apache.manifoldcf.agents.interfaces.RepositoryDocument document,
                                                     java.lang.String authorityNameString,
                                                     org.apache.manifoldcf.agents.interfaces.IOutputAddActivity activities)
                                              throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                     org.apache.manifoldcf.agents.interfaces.ServiceInterruption,
                                                     java.io.IOException
        Add (or replace) a document in the output data store using the connector. This method presumes that the connector object has been configured, and it is thus able to communicate with the output data store should that be necessary.
        Specified by:
        addOrReplaceDocumentWithException in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
        Overrides:
        addOrReplaceDocumentWithException in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Parameters:
        documentURI - is the URI of the document. The URI is presumed to be the unique identifier which the output data store will use to process and serve the document. This URI is constructed by the repository connector which fetches the document, and is thus universal across all output connectors.
        pipelineDescription - includes the description string that was constructed for this document by the getPipelineDescription() method.
        document - is the document data to be processed (handed to the output data store).
        authorityNameString - is the name of the authority responsible for authorizing any access tokens passed in with the repository document. May be null.
        activities - is the handle to an object that the implementer of a pipeline connector may use to perform operations, such as logging processing activity, or sending a modified document to the next stage in the pipeline.
        Returns:
        the document status (accepted or permanently rejected).
        Throws:
        java.io.IOException - only if there's a stream error reading the document data.
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • removeDocument

        public void removeDocument​(java.lang.String documentURI,
                                   java.lang.String outputDescription,
                                   org.apache.manifoldcf.agents.interfaces.IOutputRemoveActivity activities)
                            throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                   org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Remove a document using the connector. Note that the last outputDescription is included, since it may be necessary for the connector to use such information to know how to properly remove the document.
        Specified by:
        removeDocument in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
        Overrides:
        removeDocument in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Parameters:
        documentURI - is the URI of the document. The URI is presumed to be the unique identifier which the output data store will use to process and serve the document. This URI is constructed by the repository connector which fetches the document, and is thus universal across all output connectors.
        outputDescription - is the last description string that was constructed for this document by the getPipelineDescription() method above.
        activities - is the handle to an object that the implementer of an output connector may use to perform operations, such as logging processing activity.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • outputConfigurationHeader

        public void outputConfigurationHeader​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                              org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                              java.util.Locale locale,
                                              org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
                                              java.util.List<java.lang.String> tabsArray)
                                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                              java.io.IOException
        Output the configuration header section. This method is called in the head section of the connector's configuration page. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the configuration editing HTML.
        Specified by:
        outputConfigurationHeader in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        outputConfigurationHeader in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        threadContext - is the local thread context.
        out - is the output to which any HTML should be sent.
        parameters - are the configuration parameters, as they currently exist, for this connection being configured.
        tabsArray - is the list of tab names. Add to this list any tab names that are specific to the connector.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • outputConfigurationBody

        public void outputConfigurationBody​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                            org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                            java.util.Locale locale,
                                            org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
                                            java.lang.String tabName)
                                     throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                            java.io.IOException
        Output the configuration body section. This method is called in the body section of the connector's configuration page. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is "editconnection".
        Specified by:
        outputConfigurationBody in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        outputConfigurationBody in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        threadContext - is the local thread context.
        out - is the output to which any HTML should be sent.
        parameters - are the configuration parameters, as they currently exist, for this connection being configured.
        tabName - is the current tab name.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • processConfigurationPost

        public java.lang.String processConfigurationPost​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                                         org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
                                                         java.util.Locale locale,
                                                         org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
                                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Process a configuration post. This method is called at the start of the connector's configuration page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the configuration parameters accordingly. The name of the posted form is "editconnection".
        Specified by:
        processConfigurationPost in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        processConfigurationPost in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        threadContext - is the local thread context.
        variableContext - is the set of variables available from the post, including binary file post information.
        parameters - are the configuration parameters, as they currently exist, for this connection being configured.
        Returns:
        null if all is well, or a string error message if there is an error that should prevent saving of the connection (and cause a redirection to an error page).
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
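        The return contract (null on success, an error message otherwise) can be sketched without the ManifoldCF types. In the sketch below, plain maps stand in for IPostParameters and ConfigParams, and the form field name "ingesturi" is hypothetical; none of this is taken from the GTSConnector source.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "null means OK" contract of a configuration post.
class ConfigPostSketch {
    static String processPost(Map<String, String> posted, Map<String, String> parameters) {
        String uri = posted.get("ingesturi"); // assumed field name
        if (uri == null) {
            return null; // nothing was posted for this field; leave parameters as they are
        }
        String trimmed = uri.trim();
        try {
            URI parsed = new URI(trimmed);
            if (!parsed.isAbsolute()) {
                return "Ingestion URI must be absolute: " + trimmed;
            }
        } catch (URISyntaxException e) {
            return "Invalid ingestion URI: " + e.getMessage();
        }
        parameters.put("ingesturi", trimmed);
        return null; // all is well; the connection may be saved
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new HashMap<>();
        Map<String, String> posted = new HashMap<>();
        posted.put("ingesturi", "http://appliance.example.com/ingest");
        System.out.println(processPost(posted, parameters));
    }
}
```

        Returning a message rather than throwing lets the caller redirect to an error page without saving the connection.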
      • viewConfiguration

        public void viewConfiguration​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                      org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                      java.util.Locale locale,
                                      org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                      java.io.IOException
        View configuration. This method is called in the body section of the connector's view configuration page. Its purpose is to present the connection information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags.
        Specified by:
        viewConfiguration in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        viewConfiguration in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        threadContext - is the local thread context.
        out - is the output to which any HTML should be sent.
        parameters - are the configuration parameters, as they currently exist, for this connection being configured.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • outputSpecificationHeader

        public void outputSpecificationHeader​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                              java.util.Locale locale,
                                              org.apache.manifoldcf.core.interfaces.Specification os,
                                              int connectionSequenceNumber,
                                              java.util.List<java.lang.String> tabsArray)
                                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                              java.io.IOException
        Output the specification header section. This method is called in the head section of a job page which has selected a pipeline connection of the current type. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the job editing HTML.
        Specified by:
        outputSpecificationHeader in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
        Overrides:
        outputSpecificationHeader in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Parameters:
        out - is the output to which any HTML should be sent.
        locale - is the preferred locale of the output.
        os - is the current pipeline specification for this connection.
        connectionSequenceNumber - is the unique number of this connection within the job.
        tabsArray - is the list of tab names. Add to this list any tab names that are specific to the connector.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • outputSpecificationBody

        public void outputSpecificationBody​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                            java.util.Locale locale,
                                            org.apache.manifoldcf.core.interfaces.Specification os,
                                            int connectionSequenceNumber,
                                            int actualSequenceNumber,
                                            java.lang.String tabName)
                                     throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                            java.io.IOException
        Output the specification body section. This method is called in the body section of a job page which has selected a pipeline connection of the current type. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is "editjob".
        Specified by:
        outputSpecificationBody in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
        Overrides:
        outputSpecificationBody in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Parameters:
        out - is the output to which any HTML should be sent.
        locale - is the preferred locale of the output.
        os - is the current pipeline specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        actualSequenceNumber - is the connection within the job that has currently been selected.
        tabName - is the current tab name.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • processSpecificationPost

        public java.lang.String processSpecificationPost​(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
                                                         java.util.Locale locale,
                                                         org.apache.manifoldcf.core.interfaces.Specification os,
                                                         int connectionSequenceNumber)
                                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Process a specification post. This method is called at the start of a job's edit or view page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the transformation specification accordingly. The name of the posted form is "editjob".
        Specified by:
        processSpecificationPost in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
        Overrides:
        processSpecificationPost in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Parameters:
        variableContext - contains the post data, including binary file-upload information.
        locale - is the preferred locale of the output.
        os - is the current pipeline specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        Returns:
        null if all is well, or a string error message if there is an error that should prevent saving of the job (and cause a redirection to an error page).
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • viewSpecification

        public void viewSpecification​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                      java.util.Locale locale,
                                      org.apache.manifoldcf.core.interfaces.Specification os,
                                      int connectionSequenceNumber)
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                      java.io.IOException
        View specification. This method is called in the body section of a job's view page. Its purpose is to present the pipeline specification information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags.
        Specified by:
        viewSpecification in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
        Overrides:
        viewSpecification in class org.apache.manifoldcf.agents.output.BaseOutputConnector
        Parameters:
        out - is the output to which any HTML should be sent.
        locale - is the preferred locale of the output.
        connectionSequenceNumber - is the unique number of this connection within the job.
        os - is the current pipeline specification for this job.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • fingerprint

        protected static int fingerprint​(java.io.File file)
                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Fingerprint a file! Pass in the name of the (local) temporary file that we should be looking at. This method will read it as needed until the file has been identified (or found to remain "unknown"). The code here has been lifted algorithmically from products/ShareCrawler/Fingerprinter.pas.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • getAppName

        protected static java.lang.String getAppName​(java.io.File documentPath)
                                              throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Get a binary document's APPNAME field, or return null if the document does not seem to be an OLE compound document.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
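        The "does not seem to be an OLE compound document" test can be illustrated with the well-known eight-byte signature that opens every OLE compound file (D0 CF 11 E0 A1 B1 1A E1). The sketch below only checks that signature; it does not extract the APPNAME field, and the class and method names are hypothetical rather than taken from GTSConnector.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: detect the OLE compound file signature.
class OleMagicSketch {
    private static final byte[] OLE_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the first 'length' bytes of 'data' begin with the OLE signature.
    static boolean startsWithOleSignature(byte[] data, int length) {
        if (length < OLE_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE_MAGIC.length; i++) {
            if (data[i] != OLE_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Convenience wrapper that reads just the header bytes of a local file.
    static boolean hasOleSignature(File file) throws IOException {
        byte[] header = new byte[OLE_MAGIC.length];
        int off = 0;
        try (InputStream in = new FileInputStream(file)) {
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                off += n;
            }
        }
        return startsWithOleSignature(header, off);
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(hasOleSignature(new File(args[0])));
        }
    }
}
```

        A caller following the documented contract would return null when this check fails and only then go on to parse the document's property streams.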
      • recognizeApp

        protected static int recognizeApp​(java.lang.String appName)
        Translate a string application name to one of the kinds of documents we care about.
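        This page does not list the document kinds or the matching rules, so the following is only a guess at the shape of such a translation. The constants KIND_UNKNOWN, KIND_WORD, KIND_EXCEL and KIND_POWERPOINT, the class AppKindSketch, and the substring matching are all hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of mapping an application name to a document kind.
class AppKindSketch {
    // Assumed kinds; the real return values are not documented on this page.
    static final int KIND_UNKNOWN = 0;
    static final int KIND_WORD = 1;
    static final int KIND_EXCEL = 2;
    static final int KIND_POWERPOINT = 3;

    static int recognize(String appName) {
        if (appName == null) {
            return KIND_UNKNOWN;
        }
        String name = appName.toLowerCase(Locale.ROOT);
        // Check "powerpoint" first so it is not shadowed by a looser match.
        if (name.contains("powerpoint")) {
            return KIND_POWERPOINT;
        }
        if (name.contains("excel")) {
            return KIND_EXCEL;
        }
        if (name.contains("word")) {
            return KIND_WORD;
        }
        return KIND_UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(recognize("Microsoft Office Word"));
    }
}
```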
      • isText

        protected static boolean isText​(byte[] beginChunk,
                                        int chunkLength)
        Test to see if a document is text or not. The first n bytes are passed in, and this code returns "true" if it thinks they represent text. The code has been lifted algorithmically from products/Sharecrawler/Fingerprinter.pas, which was based on "perldoc -f -T".
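        For readers unfamiliar with the Perl -T file test, its documented heuristic is roughly: a chunk containing a zero byte is binary, and otherwise the chunk is binary if more than a third of its bytes are "strange" (odd control codes or bytes with the high bit set). The sketch below implements that simplified rule. It skips Perl's UTF-8 check, treats an empty chunk as text, and uses its own guesses for what counts as whitespace and as strange, so it may differ from the actual isText, isStrange and isWhiteSpace methods.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a Perl -T style text heuristic.
class TextSniffSketch {
    // Assumed whitespace set: space, tab, line feed, carriage return, form feed.
    static boolean isWhiteSpace(byte x) {
        return x == ' ' || x == '\t' || x == '\n' || x == '\r' || x == '\f';
    }

    // Assumed definition: control codes, DEL, and high-bit bytes are strange.
    // Java bytes are signed, so bytes with the high bit set are negative.
    static boolean isStrange(byte x) {
        return (x < 0x20 || x == 0x7f) && !isWhiteSpace(x);
    }

    static boolean looksLikeText(byte[] beginChunk, int chunkLength) {
        int strange = 0;
        for (int i = 0; i < chunkLength; i++) {
            byte b = beginChunk[i];
            if (b == 0) {
                return false; // any zero byte marks the chunk as binary
            }
            if (isStrange(b)) {
                strange++;
            }
        }
        // Text unless more than one third of the bytes are strange.
        return strange * 3 <= chunkLength;
    }

    public static void main(String[] args) {
        byte[] text = "Hello, world\n".getBytes(StandardCharsets.US_ASCII);
        byte[] zipHeader = {0x50, 0x4b, 0x03, 0x04, 0x00};
        System.out.println(looksLikeText(text, text.length));
        System.out.println(looksLikeText(zipHeader, zipHeader.length));
    }
}
```

        Passing the chunk length separately, as the documented signature does, allows a caller to reuse one fixed-size read buffer even when the file is shorter than the buffer.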
      • isStrange

        protected static boolean isStrange​(byte x)
        Check if a byte is not a typical ASCII character.
      • isWhiteSpace

        protected static boolean isWhiteSpace​(byte x)
        Check if a byte is a whitespace character.
      • hexprint

        protected static java.lang.String hexprint​(byte x)
      • nibbleprint

        protected static char nibbleprint​(int x)
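        Neither hexprint nor nibbleprint is described on this page. Judging by the names and signatures alone, they plausibly format a byte as two hexadecimal digits and a four-bit value as one digit. The sketch below shows that reading; the uppercase digits and the treatment of the byte as unsigned are assumptions.

```java
// Hypothetical sketch of hex formatting helpers; behavior inferred from names only.
class HexPrintSketch {
    // Format a value in the range 0..15 as a single hexadecimal digit.
    static char nibbleprint(int x) {
        return (x < 10) ? (char) ('0' + x) : (char) ('A' + (x - 10));
    }

    // Format a byte as two hexadecimal digits, treating it as unsigned.
    static String hexprint(byte x) {
        int value = x & 0xff;
        return "" + nibbleprint(value >> 4) + nibbleprint(value & 0x0f);
    }

    public static void main(String[] args) {
        System.out.println(hexprint((byte) 0x4f));
    }
}
```

        Helpers of this kind are typically used to render unexpected bytes readably in log or error messages.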