Class ThrottledFetcher
- java.lang.Object
-
- org.apache.manifoldcf.crawler.connectors.rss.ThrottledFetcher
-
public class ThrottledFetcher extends java.lang.Object
This class uses httpclient to fetch content from web servers while controlling the fetch rate in two ways: first, it controls the overall bandwidth used per server, and second, it limits the number of simultaneous open connections per server. It can also limit the maximum number of fetches per time period per server; however, this functionality is not strictly necessary at this time because the CF scheduler does that at a higher layer.
An instance of this class would very probably need to have a lifetime consistent with the long-term nature of these values, and be static.
This class sets up a separate HTTP connection pool for each server, so that the task of limiting the number of connections can be delegated to the httpclient library. As a consequence, periodic polling is needed to determine when idle pooled connections can be freed.
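The per-server connection limit described above can be pictured with a small sketch. This is not the ManifoldCF implementation, which delegates the limit to httpclient connection pools; the class name PerServerConnectionLimiter and its methods are hypothetical, and the sketch assumes that a non-blocking permit check is an acceptable stand-in for the real pool behavior.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical illustration only; not part of ManifoldCF.
class PerServerConnectionLimiter {
  // One semaphore per server name; its permits are the allowed simultaneous connections.
  private final Map<String, Semaphore> permitsByServer = new ConcurrentHashMap<>();

  /**
   * Tries to reserve a connection slot for the server without blocking.
   * The limit is fixed the first time a given server name is seen (an assumption of this sketch).
   */
  boolean tryAcquire(String serverName, int connectionLimit) {
    Semaphore permits =
        permitsByServer.computeIfAbsent(serverName, name -> new Semaphore(connectionLimit));
    return permits.tryAcquire();
  }

  /** Returns a previously reserved slot for the server. */
  void release(String serverName) {
    Semaphore permits = permitsByServer.get(serverName);
    if (permits != null) {
      permits.release();
    }
  }
}
```

Keying the map by server name keeps the limits independent, so a slow server that has used all of its slots does not block fetches from other servers.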
-
-
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| protected static class | ThrottledFetcher.AbortChecker | This class furnishes an abort signal whenever the job activity says it should. |
| protected static class | ThrottledFetcher.ExecuteMethodThread | This thread does the actual socket communication with the server. |
| protected static class | ThrottledFetcher.ThrottledConnection | This class represents an established connection to a URL. |
| protected static class | ThrottledFetcher.ThrottledInputstream | This class throttles an input stream based on the specified byte rate parameters. |
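As a rough illustration of what throttling an input stream by byte rate can look like, the sketch below pauses after each read so that the average rate stays under a cap. It is not the ThrottledInputstream source: the class name RateLimitedInputStream, the helper delayMillis, and the single maxBytesPerSecond parameter are all assumptions made for the example.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;

// Hypothetical illustration only; not the ThrottledInputstream implementation.
class RateLimitedInputStream extends FilterInputStream {
  private final double maxBytesPerSecond;
  private final long startMillis = System.currentTimeMillis();
  private long totalBytes = 0;

  RateLimitedInputStream(InputStream in, double maxBytesPerSecond) {
    super(in);
    this.maxBytesPerSecond = maxBytesPerSecond;
  }

  /** Milliseconds to pause so that totalBytes over the elapsed time stays within the rate. */
  static long delayMillis(long totalBytes, long elapsedMillis, double maxBytesPerSecond) {
    long earliestAllowedMillis = (long) Math.ceil(totalBytes * 1000.0 / maxBytesPerSecond);
    return Math.max(0L, earliestAllowedMillis - elapsedMillis);
  }

  @Override
  public int read() throws IOException {
    int b = super.read();
    if (b >= 0) {
      pause(1);
    }
    return b;
  }

  @Override
  public int read(byte[] buffer, int offset, int length) throws IOException {
    int count = super.read(buffer, offset, length);
    if (count > 0) {
      pause(count);
    }
    return count;
  }

  // Records the bytes just read and sleeps if the stream is ahead of its allowed rate.
  private void pause(int bytesJustRead) throws IOException {
    totalBytes += bytesJustRead;
    long elapsed = System.currentTimeMillis() - startMillis;
    long delay = delayMillis(totalBytes, elapsed, maxBytesPerSecond);
    if (delay > 0) {
      try {
        Thread.sleep(delay);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new InterruptedIOException("interrupted while throttling");
      }
    }
  }
}
```

Sleeping after the read rather than before it keeps the arithmetic simple: the stream only needs the running byte total and the elapsed time to decide how long to wait.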
-
Field Summary
Fields
| Modifier and Type | Field | Description |
| --- | --- | --- |
| static java.lang.String | _rcsid | |
| protected static int | globalHandleCount | This counter keeps track of the total outstanding handles across everything, because we do try to control that |
| protected static java.lang.Integer | globalHandleCounterLock | This is the lock object for that global handle counter |
| protected static int | READ_CHUNK_LENGTH | The read chunk length |
| protected static boolean | recordEverything | This flag determines whether we record everything to the disk, as a means of doing a web snapshot |
| protected int | refCount | Reference count for how many connections to this pool there are |
| protected java.util.Map<java.lang.String,org.apache.manifoldcf.connectorcommon.interfaces.IConnectionThrottler> | serverMap | This hash maps the server string (without port) to a pool throttling object, where we can track the statistics and make sure we throttle appropriately |
-
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| ThrottledFetcher() | Constructor. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| IThrottledConnection | createConnection(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, java.lang.String throttleGroupName, java.lang.String serverName, int connectionLimit, int connectionTimeoutMilliseconds, java.lang.String proxyHost, int proxyPort, java.lang.String proxyAuthDomain, java.lang.String proxyAuthUsername, java.lang.String proxyAuthPassword, org.apache.manifoldcf.crawler.interfaces.IAbortActivity activities) | Establish a connection to a specified URL. |
| void | noteConnectionEstablished() | Note that there is a repository connection that is using this object. |
| void | noteConnectionReleased() | Connection pool no longer needed. |
| void | poll() | Poll. |
| protected static void | registerGlobalHandle(int maxHandles) | Note that we're about to need a handle (and make sure we have enough) |
| protected static void | releaseGlobalHandle() | Note that we're done with a handle (so we can free it) |
-
-
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
recordEverything
protected static final boolean recordEverything
This flag determines whether we record everything to the disk, as a means of doing a web snapshot
- See Also:
- Constant Field Values
-
READ_CHUNK_LENGTH
protected static final int READ_CHUNK_LENGTH
The read chunk length
- See Also:
- Constant Field Values
-
globalHandleCount
protected static int globalHandleCount
This counter keeps track of the total number of outstanding handles across everything, because that total is also controlled.
-
globalHandleCounterLock
protected static java.lang.Integer globalHandleCounterLock
This is the lock object for that global handle counter
-
serverMap
protected final java.util.Map<java.lang.String,org.apache.manifoldcf.connectorcommon.interfaces.IConnectionThrottler> serverMap
This hash maps the server string (without port) to a pool throttling object, where we can track the statistics and make sure we throttle appropriately
-
refCount
protected int refCount
Reference count for how many connections to this pool there are
-
-
Method Detail
-
registerGlobalHandle
protected static void registerGlobalHandle(int maxHandles) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note that we're about to need a handle (and make sure we have enough)
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
releaseGlobalHandle
protected static void releaseGlobalHandle()
Note that we're done with a handle (so we can free it)
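The register/release pairing above amounts to a guarded global counter. The following is a minimal sketch of that idea, not the actual method bodies: the class GlobalHandleCounter is hypothetical, it assumes that exceeding maxHandles is reported as an error, and it substitutes IllegalStateException for ManifoldCFException and a plain Object lock for the documented Integer lock so that it compiles without the library.

```java
// Hypothetical illustration only; not the ThrottledFetcher source.
class GlobalHandleCounter {
  // Dedicated lock object guarding the counter.
  private static final Object LOCK = new Object();
  private static int handleCount = 0;

  /** Reserves one handle, or fails if maxHandles are already outstanding. */
  static void register(int maxHandles) {
    synchronized (LOCK) {
      if (handleCount >= maxHandles) {
        throw new IllegalStateException("Exceeded limit of " + maxHandles + " handles");
      }
      handleCount++;
    }
  }

  /** Gives back one handle; ignores unbalanced calls instead of going negative. */
  static void release() {
    synchronized (LOCK) {
      if (handleCount > 0) {
        handleCount--;
      }
    }
  }

  /** Current number of outstanding handles. */
  static int outstanding() {
    synchronized (LOCK) {
      return handleCount;
    }
  }
}
```

A caller would typically pair the two calls with try/finally so that a handle is released even when the fetch fails.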
-
createConnection
public IThrottledConnection createConnection(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, java.lang.String throttleGroupName, java.lang.String serverName, int connectionLimit, int connectionTimeoutMilliseconds, java.lang.String proxyHost, int proxyPort, java.lang.String proxyAuthDomain, java.lang.String proxyAuthUsername, java.lang.String proxyAuthPassword, org.apache.manifoldcf.crawler.interfaces.IAbortActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Establish a connection to a specified URL.
- Parameters:
serverName - is the FQDN of the server, e.g. foo.metacarta.com
connectionLimit - is the maximum desired outstanding connections at any one time.
connectionTimeoutMilliseconds - is the number of milliseconds to wait for the connection before timing out.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
poll
public void poll() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Poll. This method is designed to allow idle connections to be closed and freed.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
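One way to picture what a periodic poll accomplishes is the sketch below, which evicts pooled connections that have been idle longer than a threshold. It is an assumption-laden illustration, not the behavior of poll() itself: the class IdleConnectionPool, the maxIdleMillis threshold, and the explicit nowMillis argument (used so the logic is deterministic) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not the ThrottledFetcher source.
class IdleConnectionPool {
  /** Stand-in for a pooled connection; a real pool would hold a socket. */
  static final class Pooled {
    final String server;
    final long idleSinceMillis;
    boolean closed = false;

    Pooled(String server, long idleSinceMillis) {
      this.server = server;
      this.idleSinceMillis = idleSinceMillis;
    }
  }

  private final List<Pooled> idle = new ArrayList<>();
  private final long maxIdleMillis;

  IdleConnectionPool(long maxIdleMillis) {
    this.maxIdleMillis = maxIdleMillis;
  }

  /** Records that a connection to the server became idle at nowMillis. */
  synchronized void returnToPool(String server, long nowMillis) {
    idle.add(new Pooled(server, nowMillis));
  }

  /** Closes and removes connections idle for at least maxIdleMillis; returns how many were freed. */
  synchronized int poll(long nowMillis) {
    int freed = 0;
    for (Iterator<Pooled> it = idle.iterator(); it.hasNext();) {
      Pooled connection = it.next();
      if (nowMillis - connection.idleSinceMillis >= maxIdleMillis) {
        connection.closed = true; // a real pool would close the socket here
        it.remove();
        freed++;
      }
    }
    return freed;
  }

  /** Number of connections currently held idle. */
  synchronized int idleCount() {
    return idle.size();
  }
}
```

Because nothing inside the pool runs on its own, some outside caller has to invoke the poll method at intervals, which matches the need for periodic polling noted in the class description.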
-
noteConnectionEstablished
public void noteConnectionEstablished()
Note that there is a repository connection that is using this object.
-
noteConnectionReleased
public void noteConnectionReleased()
Connection pool no longer needed. Call this to indicate that this object no longer needs to keep its pools available, for the moment.
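The two note methods, together with the refCount field, suggest a simple reference-counting scheme. The sketch below shows one plausible shape for it; the class RefCountedPools and the poolsOpen flag are hypothetical, and the rule that pools are freed when the count returns to zero is an assumption rather than documented behavior.

```java
// Hypothetical illustration only; not the ThrottledFetcher source.
class RefCountedPools {
  private int refCount = 0;
  private boolean poolsOpen = false;

  /** A repository connection has started using this object. */
  synchronized void noteConnectionEstablished() {
    refCount++;
    poolsOpen = true;
  }

  /** A repository connection no longer needs the pools. */
  synchronized void noteConnectionReleased() {
    if (refCount == 0) {
      return; // ignore unbalanced calls
    }
    refCount--;
    if (refCount == 0) {
      poolsOpen = false; // a real implementation would free pooled resources here
    }
  }

  /** Whether any user still needs the pools kept available. */
  synchronized boolean poolsOpen() {
    return poolsOpen;
  }
}
```

Counting references lets several repository connections share one fetcher while still allowing its pools to be torn down once the last user is gone.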
-
-