This is a copy of a thread I started on the NeuralEnsemble mailing list (https://groups.google.com/forum/#!topic/neuralensemble/NYpXcjlTzFE)
I'm rewriting some parts of the NixIO to make them more efficient. One thing that's very inefficient in the current IO is the function that resolves naming conflicts between objects. While rethinking the functionality, I've realised that there is some ambiguity in the way writing works, in particular when rewriting objects. I'm starting this thread to request comments on how people expect the IO to work and what should be considered the right way of doing rewrites.
First, some background
The NIX data format requires unique names between objects of the same type. Due to the mapping between Neo and NIX object types, object names in the following groups must be unique:
- AnalogSignal, IrregularlySampledSignal: Map to DataArray
- Event, Epoch, SpikeTrain: Map to MultiTag
- ChannelIndex, Unit: Map to Source
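
To make the grouping concrete, here is a minimal sketch of a conflict check driven by that mapping. It uses plain `(type name, object name)` pairs rather than real Neo objects, and `find_conflicts` is a hypothetical helper, not part of the NixIO. It also assumes all objects share one parent container.

```python
from collections import Counter

# Neo type -> NIX type, as listed above.
NEO_TO_NIX = {
    "AnalogSignal": "DataArray",
    "IrregularlySampledSignal": "DataArray",
    "Event": "MultiTag",
    "Epoch": "MultiTag",
    "SpikeTrain": "MultiTag",
    "ChannelIndex": "Source",
    "Unit": "Source",
}


def find_conflicts(objects):
    """Return the (nix_type, name) pairs that occur more than once.

    `objects` is an iterable of (neo_type_name, name) pairs; this input
    shape is an assumption made for the illustration.
    """
    counts = Counter((NEO_TO_NIX[neo_type], name) for neo_type, name in objects)
    return sorted(key for key, count in counts.items() if count > 1)


objs = [
    ("AnalogSignal", "lfp"),
    ("IrregularlySampledSignal", "lfp"),  # clashes: both map to DataArray
    ("Event", "lfp"),                     # no clash: MultiTag is a separate group
    ("Epoch", "stim"),
]
conflicts = find_conflicts(objs)
print(conflicts)  # [('DataArray', 'lfp')]
```

Two Neo objects of different types can therefore collide in NIX even though Neo itself sees no conflict.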
Another important restriction of the NIX format is that object names are required: a name cannot be an empty string or None. I have noticed that it is not uncommon for objects in Neo to be anonymous, especially container types like Block and Segment, so the NixIO automatically generates names for anonymous objects during writing.
Current state of NIX writer
Currently, the NixIO has a method called 'resolve_name_conflicts' which accepts a list of objects and, as the name suggests, finds anonymous objects and objects with conflicting names and fixes both. Anonymous objects get a name based on their type and the number of objects of that type (e.g., neo.Block-2, neo.Block-3, etc.), and conflicting names get a number appended to them.
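
The behaviour described above can be sketched roughly as follows. This is not the NixIO implementation: `Obj` is a stand-in for a Neo object, `resolve_in_place` is a hypothetical name, and the exact numbering scheme is an assumption.

```python
class Obj:
    """Stand-in for a Neo object; only `name` matters here."""

    def __init__(self, name=None):
        self.name = name


def resolve_in_place(objects, type_label="neo.Obj"):
    """Name anonymous objects and de-duplicate names, modifying the objects."""
    taken = set()
    anon_count = 0
    for obj in objects:
        if not obj.name:
            # Anonymous: build a name from the type label and a counter.
            anon_count += 1
            obj.name = f"{type_label}-{anon_count}"
        base, suffix = obj.name, 1
        while obj.name in taken:
            # Conflict: append an increasing number until the name is free.
            suffix += 1
            obj.name = f"{base}-{suffix}"
        taken.add(obj.name)


objs = [Obj("a"), Obj(), Obj("a"), Obj()]
resolve_in_place(objs)
names = [o.name for o in objs]
print(names)  # ['a', 'neo.Obj-1', 'a-2', 'neo.Obj-2']
```

Note that the caller's objects carry the new names after the call returns.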
Issues with the current state
There are two things that I don't like about the current function:
- It renames the user's objects. This could cause issues for users who attach meaning to the names of objects, but more generally I believe it's wrong for a "save" function to modify the data it has been given in place. It may be impossible to preserve conflicting or invalid names across a save-load cycle (more on this later), but we can certainly avoid editing the objects that the user has passed to the writer and may still be working on after a successful save (during the same session or script).
- It's rather slow. Checking for conflicts slows down every write, and the cost is entirely wasted when there are no conflicts to resolve. An on-demand renaming would be preferable (I'm working on this now).
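
One way to address both points, sketched under assumptions: compute the names to store without touching the objects, and skip the renaming pass when every name is already present and unique. `names_for_writing` is a hypothetical helper, and `SimpleNamespace` stands in for Neo objects.

```python
from types import SimpleNamespace


def names_for_writing(objects, type_label="neo.Block"):
    """Return {id(obj): name_to_store} without modifying any object."""
    names = [getattr(obj, "name", None) for obj in objects]

    # Fast path: everything is named and unique, so nothing needs resolving.
    if all(names) and len(set(names)) == len(names):
        return {id(obj): name for obj, name in zip(objects, names)}

    taken, result, anon_count = set(), {}, 0
    for obj, name in zip(objects, names):
        if not name:
            anon_count += 1
            name = f"{type_label}-{anon_count}"
        candidate, suffix = name, 1
        while candidate in taken:
            suffix += 1
            candidate = f"{name}-{suffix}"
        taken.add(candidate)
        result[id(obj)] = candidate
    return result


objs = [
    SimpleNamespace(name="a"),
    SimpleNamespace(name=None),
    SimpleNamespace(name="a"),
]
mapping = names_for_writing(objs)
stored = [mapping[id(o)] for o in objs]
print(stored)                   # ['a', 'neo.Block-1', 'a-2']
print([o.name for o in objs])   # ['a', None, 'a']
```

The mapping only lives for the duration of the write, so the names the user sees in memory stay as they were.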
The bigger issue
The issues I mention above can be resolved in many ways, but each solution makes some assumption about how rewrites should work. This is where I could use some feedback. When writing an object to a file that already contains an object with the same name, should the new object overwrite the old, or should it be renamed and stored alongside? If we overwrite, how do we handle repeated writes of anonymous objects?
Consider the following example script:
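import neo
from neo.io import NixIO

filename = "rewrite_test.nix"  # hypothetical path, for illustration only

nixfile = NixIO(filename, "rw")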
block = neo.Block("datablock") # a simple block
nixfile.write_block(block)
block.name = "datablock-B" # same block renamed
nixfile.write_block(block)
print(len(nixfile.read_all_blocks()))
nixfile.close()
nixfile = NixIO(filename, "rw")
nixfile.write_block(block) # same block written again
print(len(nixfile.read_all_blocks()))
nixfile.close()
nixfile = NixIO(filename, "rw")
block = neo.Block("datablock-B",  # new block, reused name
                  description="a new datablock with an old name")
nixfile.write_block(block)
print(len(nixfile.read_all_blocks()))
nixfile.close()
nixfile = NixIO(filename, "rw")
blks = [neo.Block() for _ in range(10)] # 10 anonymous blocks
nixfile.write_all_blocks(blks)
print(len(nixfile.read_all_blocks()))
nixfile.write_all_blocks(blks) # write them again without changing anything
print(len(nixfile.read_all_blocks()))
blks = [neo.Block() for _ in range(10)] # 10 new anonymous blocks
nixfile.write_all_blocks(blks)
print(len(nixfile.read_all_blocks()))
nixfile.close()
nixfile = NixIO(filename, "rw")
for _ in range(10):  # write 10 anonymous blocks, one at a time
    nixfile.write_block(neo.Block())
print(len(nixfile.read_all_blocks()))
nixfile.close()
nixfile = NixIO(filename, "rw")
for idx in range(10):  # write 10 named blocks, one at a time
    nixfile.write_block(neo.Block(name=str(idx)))
print(len(nixfile.read_all_blocks()))
nixfile.close()

What should the number of blocks be each time we print the result of 'read_all_blocks'?
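
To show how much the answer depends on the chosen policy, here is a toy model of the first three sessions of the script. It is an in-memory stand-in, not the NixIO: `ToyFile` and its two policies are hypothetical, and it identifies blocks by name only.

```python
class ToyFile:
    """In-memory stand-in for a file that stores blocks by name."""

    def __init__(self, policy):
        self.policy = policy  # "overwrite" or "rename"
        self.blocks = {}

    def write(self, name, data):
        if name in self.blocks and self.policy == "rename":
            # Keep the old block and store the new one under a fresh name.
            suffix = 2
            while f"{name}-{suffix}" in self.blocks:
                suffix += 1
            name = f"{name}-{suffix}"
        self.blocks[name] = data


def run(policy):
    f = ToyFile(policy)
    f.write("datablock", "v1")
    f.write("datablock-B", "v1")  # same block renamed
    f.write("datablock-B", "v1")  # same block written again
    f.write("datablock-B", "v2")  # new block, reused name
    return len(f.blocks)


counts = {policy: run(policy) for policy in ("overwrite", "rename")}
print(counts)  # {'overwrite': 2, 'rename': 4}
```

A writer that tracked object identity instead of names could give yet another count, since it would treat the renamed block as an update of "datablock" rather than a new entry.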
I don't know if this is comparable to how other IOs handle rewrites. This issue begins with the naming of objects, but it goes beyond that. For instance, the naming issue can be circumvented entirely by giving each object stored in NIX a unique identifier for a name and just storing the Neo object name as an attribute. This would preserve the name for loading and even keep anonymous objects anonymous after a save-load cycle.
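
A minimal sketch of that identifier scheme, using plain dictionaries in place of NIX objects. The field names `nix_name` and `neo_name`, the `neo.block.` prefix, and both helper functions are assumptions made for illustration.

```python
import uuid


def to_stored(neo_name):
    """Build a stored record: a unique NIX-side name plus the Neo name as metadata."""
    return {
        "nix_name": f"neo.block.{uuid.uuid4().hex}",  # always unique, never empty
        "neo_name": neo_name,                          # may be None or a duplicate
    }


def from_stored(record):
    """Recover the Neo name exactly as it was before saving."""
    return record["neo_name"]


records = [to_stored(None), to_stored(None), to_stored("datablock")]
loaded = [from_stored(r) for r in records]
print(loaded)  # [None, None, 'datablock']
```

In this sketch, writing the same object twice still produces two records, because a new identifier is generated on every write. Detecting a rewrite would require remembering the identifier on the object, which is the open question of this thread.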
I'm curious to hear from Neo devs, IO devs, and users. What should happen when you do a "write" multiple times on the same object? What about if you load data, modify it, and then write it within a single session?
I can probably work with any (reasonable) expected behaviour, but while thinking about all the alternatives I realised that I was assuming too much.