Provide the base classes icat.dumpfile.DumpFileReader and icat.dumpfile.DumpFileWriter that define the API and the logic for reading and writing ICAT data files. The actual work is done in file-format-specific modules, which should provide subclasses that implement the abstract methods.
Bases: object
Base class for backends that read a data file.
Iterate over the chunks in the data file.
Yield some data object in each iteration. This data object is specific to the implementing backend and should be passed as the data argument to icat.dumpfile.DumpFileReader.getobjs_from_data().
Iterate over the objects in a data chunk.
Yield a new entity object in each iteration. The object is initialized from the data, but not yet created at the client.
Iterate over the objects in the data file.
Yield a new entity object in each iteration. The object is initialized from the data, but not yet created at the client.
| Parameters: | objindex (dict) – cache of previously retrieved objects, used to resolve object relations. See icat.client.Client.searchUniqueKey() for details. If this is None, an internal cache will be used that is purged at the start of every new data chunk. |
|---|---|
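The way the three reader methods fit together, and what the objindex parameter changes, can be sketched with a toy backend. This is a simplified stand-in, not the library's implementation: `ToyReader`, its list-of-chunks "file", and the `key`/`ref`/`resolved` record fields are hypothetical. A real reader parses a data file and yields ICAT entity objects.

```python
from typing import Dict, Iterator, List, Optional


class ToyReader:
    """Hypothetical reader: the "file" is a list of chunks, each chunk a
    list of records with a "key" and an optional "ref" to another key."""

    def __init__(self, chunks: List[List[dict]]) -> None:
        self.chunks = chunks

    def getdata(self) -> Iterator[List[dict]]:
        # One backend-specific data object per chunk.
        yield from self.chunks

    def getobjs_from_data(self, data: List[dict],
                          objindex: Dict[str, dict]) -> Iterator[dict]:
        for record in data:
            obj = dict(record)
            ref = obj.get("ref")
            # Resolve the relation from the index.  None marks a miss that a
            # real backend would have to resolve by querying the server.
            obj["resolved"] = objindex.get(ref) if ref else None
            objindex[obj["key"]] = obj
            yield obj

    def getobjs(self, objindex: Optional[Dict[str, dict]] = None
                ) -> Iterator[dict]:
        reset = objindex is None
        for data in self.getdata():
            if reset:
                # Internal cache, purged at the start of every chunk.
                objindex = {}
            yield from self.getobjs_from_data(data, objindex)


chunks = [
    [{"key": "inv1"}, {"key": "ds1", "ref": "inv1"}],
    [{"key": "ds2", "ref": "inv1"}],
]
per_chunk = list(ToyReader(chunks).getobjs())   # internal, purged index
shared = list(ToyReader(chunks).getobjs({}))    # caller-supplied index
```

With the internal index, ds2 in the second chunk cannot resolve inv1, because the index was purged when that chunk started; with a caller-supplied dict, the reference is still found.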
Bases: object
Base class for backends that write a data file.
Write a header with some meta information to the data file.
Start a new data chunk.
If the current chunk contains any data, write it to the data file.
Add an entity object to the current data chunk.
Finalize the data file.
Write some entity objects to the current data chunk.
The objects are retrieved from the ICAT server by a search. The key index is used to serialize object relations in the data file. For object types that do not have an appropriate uniqueness constraint in the ICAT schema, a generic key is generated. These objects may only be referenced from the same chunk in the data file.
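The role of the key index can be illustrated with a small sketch. Everything here is hypothetical: objects are plain dicts, `KeyIndex` is not a library class, and the uniqueness constraints in `UNIQUE_ATTRS` are illustrative assumptions rather than the actual ICAT schema. Keys are built from the unique attributes (recursing into related objects); types without such a constraint get a generic counter-based key.

```python
from itertools import count
from typing import Dict, Optional, Tuple

# Illustrative uniqueness constraints (assumed, not the real ICAT schema).
# None marks a type without an appropriate uniqueness constraint.
UNIQUE_ATTRS: Dict[str, Optional[Tuple[str, ...]]] = {
    "Investigation": ("name", "visitId"),
    "Dataset": ("investigation", "name"),
    "Rule": None,
}


class KeyIndex:
    """Toy key index mapping (type, id) of an object to its key."""

    def __init__(self) -> None:
        self.keys: Dict[Tuple[str, int], str] = {}
        self._generic = count(1)

    def key_for(self, obj: dict) -> str:
        etype = obj["type"]
        ident = (etype, obj["id"])
        if ident in self.keys:
            return self.keys[ident]
        attrs = UNIQUE_ATTRS.get(etype)
        if attrs:
            parts = []
            for name in attrs:
                value = obj[name]
                if isinstance(value, dict):
                    # A related object contributes its own key.
                    value = self.key_for(value)
                parts.append(str(value))
            key = f"{etype}_" + "-".join(parts)
        else:
            # No uniqueness constraint: generate a generic key.  Objects
            # with such a key may only be referenced from the same chunk.
            key = f"{etype}_{next(self._generic)}"
        self.keys[ident] = key
        return key


idx = KeyIndex()
inv = {"type": "Investigation", "id": 7, "name": "inv1", "visitId": "v1"}
ds = {"type": "Dataset", "id": 3, "investigation": inv, "name": "raw"}
```

Caching keys by object id means that a related object referenced many times is only keyed once.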
| Parameters: |
|
|---|
Write a data chunk.
| Parameters: |
|
|---|
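The writer methods described above imply a call order: header first, then chunks that are each started by a call that flushes any pending data, objects added to the current chunk, and a final call that completes the file. The following toy writer is a hypothetical sketch of that protocol only; `ToyWriter`, its `flush` helper, and the line-based output format are assumptions, not the library's classes or format.

```python
from typing import List


class ToyWriter:
    """Hypothetical line-based writer illustrating the call order."""

    def __init__(self) -> None:
        self.lines: List[str] = []   # stands in for the output file
        self.chunk: List[str] = []   # objects of the current chunk

    def head(self) -> None:
        self.lines.append("# toy dump file")

    def flush(self) -> None:
        # Write the current chunk only if it contains any data.
        if self.chunk:
            self.lines.append("chunk: " + ", ".join(self.chunk))
            self.chunk = []

    def startdata(self) -> None:
        self.flush()

    def writeobj(self, key: str) -> None:
        self.chunk.append(key)

    def finalize(self) -> None:
        self.flush()
        self.lines.append("# end")


w = ToyWriter()
w.head()
w.startdata()
w.writeobj("inv1")
w.writeobj("ds1")
w.startdata()          # flushes the first chunk
w.startdata()          # empty chunk: nothing is written
w.writeobj("ds2")
w.finalize()           # flushes the last chunk
```

Because an empty chunk is never flushed, redundant calls to start a new chunk do not produce empty sections in the output.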
A register of all known backends.
Register a backend.
This function should be called by file-format-specific backends at initialization.
| Parameters: |
|
|---|
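A register of backends can be as simple as a dict from format name to a pair of classes, filled in by each backend module when it is imported. The sketch below assumes exactly that; `BACKENDS`, `register`, the base classes, and the CSV backend are all hypothetical and only mirror the idea, not the library code.

```python
from typing import Dict, List, Tuple


class BaseReader:
    """Stand-in for a DumpFileReader-like base class."""


class BaseWriter:
    """Stand-in for a DumpFileWriter-like base class."""


# Hypothetical register: format name -> (reader class, writer class).
BACKENDS: Dict[str, Tuple[type, type]] = {}


def register(formatname: str, reader: type, writer: type) -> None:
    """Hypothetical counterpart of the registration function."""
    BACKENDS[formatname] = (reader, writer)


# What a file-format-specific module might do at import time.
class CSVReader(BaseReader):
    pass


class CSVWriter(BaseWriter):
    pass


register("CSV", CSVReader, CSVWriter)


def known_formats() -> List[str]:
    return sorted(BACKENDS)
```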
Open a data file, either for reading or for writing.
Note that (subclasses of) icat.dumpfile.DumpFileReader and icat.dumpfile.DumpFileWriter may be used as context managers. This function is therefore suitable for use in a with statement.
>>> from icat.dumpfile import open_dumpfile
>>> with open_dumpfile(client, f, "XML", 'r') as dumpfile:
...     for obj in dumpfile.getobjs():
...         obj.create()
| Parameters: | |
|---|---|
| Returns: | an instance of the appropriate class. This is either the reader or the writer class, according to the mode, that has been registered by the backend. |
| Raises ValueError: | if the format is not known or if the mode does not start with “r” or “w”. |
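The dispatch described by the Returns and Raises entries can be sketched as follows: look up the format, pick the reader or writer class by the first character of the mode, and raise ValueError otherwise. This is a hypothetical toy; `open_toy_dumpfile`, the toy classes, and their file-only constructors are assumptions (the real function also takes the client). Closing the file when leaving the with block is likewise an assumption of this sketch.

```python
import io
from typing import IO, Dict, Tuple


class _ToyDumpFile:
    """Hypothetical base for the toy reader and writer."""

    def __init__(self, f: IO[str]) -> None:
        self.f = f

    def __enter__(self) -> "_ToyDumpFile":
        return self

    def __exit__(self, *exc_info) -> None:
        # Assumption for this sketch: leaving the with block closes the file.
        self.f.close()


class ToyReader(_ToyDumpFile):
    pass


class ToyWriter(_ToyDumpFile):
    pass


BACKENDS: Dict[str, Tuple[type, type]] = {"TOY": (ToyReader, ToyWriter)}


def open_toy_dumpfile(f: IO[str], formatname: str, mode: str) -> _ToyDumpFile:
    """Pick the reader or writer class registered for formatname."""
    try:
        reader, writer = BACKENDS[formatname]
    except KeyError:
        raise ValueError(f"unknown data file format {formatname!r}") from None
    if mode.startswith("r"):
        return reader(f)
    if mode.startswith("w"):
        return writer(f)
    raise ValueError(f"invalid mode {mode!r}: must start with 'r' or 'w'")


buf = io.StringIO("some data")
with open_toy_dumpfile(buf, "TOY", "r") as dumpfile:
    kind = type(dumpfile).__name__
```

Only the first character of the mode is inspected, so modes such as "wb" select the writer as well.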
Data files are partitioned into chunks. This is done to avoid having the whole file, e.g. the complete inventory of the ICAT, in memory at once. The problem is that objects contain references to other objects (e.g. Datafiles refer to Datasets, the latter refer to Investigations, and so forth). We keep an index of the objects in order to resolve these references. But there is a memory versus time tradeoff: we cannot keep all the objects in the index, as that would again mean holding the complete inventory of the ICAT. And we can’t know beforehand which objects are going to be referenced later on, so we don’t know which ones to keep in the index and which ones to discard. Fortunately, we can query objects that we have discarded back from the ICAT server with icat.client.Client.searchUniqueKey(). But this is expensive. So the strategy is as follows: keep all objects from the current chunk in the index and discard the complete index each time a chunk has been processed. This works well if objects mostly reference other objects from the same chunk and only a few references cross chunk boundaries.
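The memory versus time tradeoff can be made concrete with a small simulation that counts expensive server lookups and the peak index size for a given reference pattern. The data and the `simulate` function are hypothetical, and caching an object after it has been fetched from the server is an assumption of the sketch.

```python
from typing import List, Optional, Set, Tuple

Record = Tuple[str, Optional[str]]  # (object key, key of referenced object)


def simulate(chunks: List[List[Record]],
             purge_per_chunk: bool) -> Tuple[int, int]:
    """Return (server lookups, peak index size) for a toy reference pattern."""
    index: Set[str] = set()
    lookups = 0
    peak = 0
    for chunk in chunks:
        if purge_per_chunk:
            index = set()  # discard the complete index for each new chunk
        for key, ref in chunk:
            if ref is not None and ref not in index:
                lookups += 1    # miss: an expensive query to the server
                index.add(ref)  # assumed: the fetched object is cached
            index.add(key)
            peak = max(peak, len(index))
    return lookups, peak


chunks: List[List[Record]] = [
    [("inv1", None), ("ds1", "inv1"), ("df1", "ds1"), ("df2", "ds1")],
    [("inv2", None), ("ds2", "inv2"), ("df3", "ds2"), ("df4", "ds1")],
]
```

Here purging per chunk yields one server lookup (for df4, whose Dataset lives in the previous chunk) with a peak index of 5 entries, while keeping every object avoids lookups but lets the index grow to all 8 objects, and in general to the size of the whole file.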
Therefore, we want these chunks to be small enough to fit into memory, but at the same time large enough to keep as many relations between objects as possible local to a chunk. It is the responsibility of the writer of the data file to create the chunks in this manner.
The objects that get written to the data file, and how this file is organized, are controlled by lists of ICAT search expressions, see icat.dumpfile.DumpFileWriter.writeobjs(). There is some degree of flexibility: an object may include related objects in a one-to-many relation, just by including them in the search expression. In this case, these related objects should not have a search expression of their own. For instance, the search expression for Grouping may include UserGroup. The UserGroups will then be embedded in their respective Grouping in the data file, and there should be no separate search expression for UserGroup.
Objects related in a many-to-one relation must always be included in the search expression. This is also true if the object is indirectly related to one of the included objects. In this case, only a reference to the related object will be included in the data file. The related object must have its own list entry.
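The two rules above can be expressed as a consistency check on a planned list of search expressions. In this hypothetical sketch each entry is reduced to an entity type plus the set of types named in its includes, and the schema fragment in `MANY_TO_ONE` and `ONE_TO_MANY` is a heavily simplified assumption rather than the real ICAT schema. For an embedded object, the relation back to its embedding parent is treated as implied.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, heavily simplified schema fragment (not the real ICAT schema).
MANY_TO_ONE: Dict[str, Set[str]] = {
    "User": set(),
    "Grouping": set(),
    "UserGroup": {"User", "Grouping"},
}
ONE_TO_MANY: Dict[str, Set[str]] = {"Grouping": {"UserGroup"}}

Entry = Tuple[str, Set[str]]  # (entity type, types named in the includes)


def check_plan(entries: List[Entry]) -> List[str]:
    """Return a list of rule violations for a list of search expressions."""
    own = {etype for etype, _ in entries}
    problems: List[str] = []
    for etype, includes in entries:
        embedded = ONE_TO_MANY.get(etype, set()) & includes
        for child in sorted(embedded):
            if child in own:
                problems.append(
                    f"{child} is embedded in {etype} but has its own entry")
        # Many-to-one targets of the object and of its embedded objects;
        # the embedding parent itself is implied.
        needed = set(MANY_TO_ONE.get(etype, set()))
        for child in embedded:
            needed |= MANY_TO_ONE.get(child, set()) - {etype}
        for target in sorted(needed):
            if target not in includes:
                problems.append(
                    f"{etype}: many-to-one target {target} not included")
            if target not in own:
                problems.append(
                    f"{target} is referenced but has no entry of its own")
    return problems


good = [("User", set()), ("Grouping", {"UserGroup", "User"})]
bad = [("User", set()), ("Grouping", {"UserGroup"}),
       ("UserGroup", {"User", "Grouping"})]
```

In the good plan, UserGroups are embedded in their Grouping and the indirectly related User is both included and listed on its own; the bad plan violates both rules.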