Generic Key-Value File. More...

#include <rkeyvaluefile.h>

Inheritance diagram for RKeyValueFile< K >:
[legend]
Collaboration diagram for RKeyValueFile< K >:
[legend]

Public Member Functions

 RKeyValueFile (const RURI &uri, size_t blocksize, size_t nbcaches, size_t tolerance)
 
void Open (void)
 
virtual void Close (void)
 
void Clear (void)
 
size_t GetNbBlocks (void) const
 
CacheType GetCacheType (void) const
 
void SetCacheType (CacheType type)
 
void Flush (void)
 
void Write (const char *buffer, size_t nb)
 
template<class I , bool bOrder>
void Write (size_t &blockid, K &key, const RNumContainer< I, bOrder > &vec)
 
template<class C , bool bAlloc, bool bOrder>
void Write (size_t &blockid, K &key, const RContainer< C, bAlloc, bOrder > &cont)
 
void Read (char *buffer, size_t nb)
 
template<class I , bool bOrder>
void Read (size_t blockid, K &key, RNumContainer< I, bOrder > &vec)
 
template<class C , bool bAlloc, bool bOrder>
void Read (size_t blockid, K &key, RContainer< C, bAlloc, bOrder > &cont)
 
void EraseRecord (size_t blockid, K &key)
 
void Seek (size_t blockid, K &key)
 
void Seek (size_t &blockid, K &key, size_t size)
 
virtual ~RKeyValueFile (void)
 

Protected Member Functions

virtual void Open (RIO::ModeType mode)
 
 RBlockFile (const RURI &uri, size_t nbcaches)
 
 RBlockFile (const RURI &uri, size_t blocksize, size_t nbcaches)
 
void Open (void)
 
void Clear (void)
 
size_t GetNbBlocks (void) const
 
CacheType GetCacheType (void) const
 
void SetCacheType (CacheType type)
 
void Flush (void)
 
void Read (char *buffer, size_t nb)
 
void Write (const char *buffer, size_t nb)
 
void MoveBlock (size_t blockid, size_t start, size_t end, size_t size)
 
const char * GetPtr (size_t size)
 
void Seek (size_t blockid, size_t pos)
 
void SeekRel (long pos)
 
virtual ~RBlockFile (void)
 
- Protected Member Functions inherited from RIOFile
 RIOFile (void)
 
 RIOFile (const RURI &uri)
 
 RIOFile (RIOFile &file)
 
RURI GetRealName (void) const
 
void Open (const RURI &uri, RIO::ModeType mode)
 
bool IsOpen (void) const
 
size_t Read (char *buffer, size_t nb, bool move=true)
 
void Write (const char *buffer, size_t nb)
 
virtual void SeekRel (off_t pos)
 
virtual void SeekToEnd (void)
 
void Truncate (off_t newsize)
 
bool End (void) const
 
off_t GetSize (void) const
 
off_t GetPos (void) const
 
virtual ~RIOFile (void)
 
- Protected Member Functions inherited from RFile
 RFile (void)
 
 RFile (const RURI &uri)
 
 RFile (const RFile &file)
 
void Open (const RURI &uri, RIO::ModeType mode)
 
int Compare (const RFile &file) const
 
int Compare (const RFile *file) const
 
int Compare (const RString &uri) const
 
const RURIGetURI (void) const
 
void SetURI (const RURI &uri)
 
const RString GetFileName (void) const
 
virtual ~RFile (void)
 

Protected Attributes

RNumContainer< size_t, false > FreeSpaces
 
size_t Tolerance
 
size_t NbRecs
 
- Protected Attributes inherited from RBlockFile
CacheType Type
 
size_t BlockSize
 
RContainer< RBlockFileData,
true, true > 
Cache
 
RBlockFileDataCurrent
 
size_t CurrentPos
 
char * CurrentData
 
size_t NbBlocks
 
- Protected Attributes inherited from RIOFile
bool CanWrite
 
bool CanRead
 
- Protected Attributes inherited from RFile
RIO::ModeType Mode
 
RURI URI
 

Static Protected Attributes

static const size_t cLen =sizeof(size_t)
 
static const size_t cLen2 =2*sizeof(size_t)
 

Private Member Functions

size_t GetIndex (size_t block, K &key, bool &find)
 
void MoveRecords (size_t blockid, size_t entry, long rel)
 
void ModifyFreeSpace (size_t blockid, long rel)
 
void NewRecord (size_t blockid, K &key, size_t entry, size_t size)
 
virtual void Seek (off_t pos)
 

Additional Inherited Members

- Protected Types inherited from RBlockFile
enum  CacheType { WriteThrough, WriteBack }
 
- Static Protected Member Functions inherited from RFile
static RChar GetDirSeparator (void)
 
static void RemoveFile (const RURI &uri)
 
static void RenameFile (const RURI &olduri, const RURI &newuri)
 
static RURI GetTempFile (void)
 
static bool Exists (const RURI &uri)
 
static bool IsDir (const RURI &uri)
 

Detailed Description

template<class K>
class R::RKeyValueFile< K >

Generic Key-Value File.

The RKeyValueFile class represents a generic file for managing (key,value) pairs (such as an inverted file used by search engines).

The approach is based on Zobel, Moffat, and Sack-Davis (1993):

  • Each file is composed from several blocks.
  • Each block manages a set of pairs. In practice, a block contains an address table (storing for each key the position inside the block of the corresponding value), several values, and some free space.

It is supposed that all keys have the same size (for example an integer or a hash code). Moreover, The size of a value cannot exceed the size of a single block.

Each pair has a given key and a given block number. In practice, a block is composed from:

  1. The number of bytes free in the block (size_t).
  2. The number of records in the block (size_t).
  3. The block table representing for each key the corresponding address (key size,size_t).
  4. Some free spaces.
  5. The values at the different addresses.

The values are stored like a memory heap: starting from the end, new values have decreasing internal addresses.

The methods RKeyValueFile<K>::Seek are used to position the block to a specific value corresponding to a block number and a key. The basic RKeyValueFile<K>::Read and RKeyValueFile<K>::Write methods can then be used to read or write data.

The key must be managed through a class that must be given as parameter of this template. This class must define several methods:

class cKey
{
public:
cKey(const cKey& key); // Copy constructor.
size_t GetSize(void) const; // Size of a key.
int Compare(const char* data) const; // Compare method.
void Read(R::RKeyValueFile<cKey>& file); // Read the key from the current position.
void Write(R::RKeyValueFile<cKey>& file); // Write the key to the current position.
R::RString GetKey(void) const; // Build a string representing the key.
cKey& operator=(const cKey &src) // Assignment operator.
};

Look at RIntKey and RIntsKey for examples of implementation.

The class provides high level methods to manage RVectorInt objects:

RKeyValueFile<RIntKey> Test("/home/pfrancq/test.bin",10,2,1); // 10 Kb per blocks, 2 blocks in cache, tolerance of 1 Kb.
Test.Open();
// Write Vec
RVectorInt<size_t,false> Vec(30);
Vec.Insert(3); Vec.Insert(6); Vec.Insert(7); Vec.Insert(10); Vec.Insert(11);
size_t BlockNb(0); // No block is currently assign
RIntKey Key(300);
Test.Write(BlockNb,Key,Vec);
// Read Vec2
RVectorInt<size_t,false> Vec2(30);
Test.Read(BlockNb,Key,Vec2);
for(Vec2.Start();!Vec2.End();Vec2.Next())
cout<<"Read "<<Vec2()<<endl;

The class provides also high level methods to manage containers. The class of objects contained must provide three methods :

  • A static GetSizeT method returning the size needed to store an object.
  • A static Load method returning a pointer to the object to insert after loading it.
  • A Write method to store an object.
class MyClass
{
public:
size_t Id;
double Nb;
MyClass(size_t id,double nb) : Id(id), Nb(nb) {}
int Compare(const MyClass& c) const {return(CompareIds(Id,c.Id));}
static size_t GetSizeT(void) {return(sizeof(size_t)+sizeof(double));}
template<class K>void Write(RKeyValueFile<K>& file) const
{
file.Write((char*)&Id,sizeof(size_t));
file.Write((char*)&Nb,sizeof(double));
}
* template<class K> static MyClass* Read(RKeyValueFile<K>& file)
{
size_t id;
double nb;
file.Read((char*)&id,sizeof(size_t));
file.Read((char*)&nb,sizeof(double));
return(new MyClass(id,nb));
}
};
RIndexFile<RIntKey> Test("/home/pfrancq/test.bin",10,2,1); // 10 Kb per blocks, 2 blocks in cache, tolerance of 1 Kb.
Test.Open();
// Write Cont
RContainer<MyClass,true,true> Cont(30);
Cont.InsertPtr(new MyClass(2,23.0)); Cont.InsertPtr(new MyClass(1,2.0));
size_t BlockNb(0); // Start without any block
RIntKey Key(300);
Test.Write(BlockNb,Key,Cont);
// Read Cont2
RContainer<MyClass,true,true> Cont2(30);
Test.Read(BlockNb,Key,Cont2);
RCursor<MyClass> Cur(Cont2);
for(Cur.Start();!Cur.End();Cur.Next())
cout<<"Read "<<Cur()->Id<<" "<<Cur()->Nb<<endl;
Template Parameters
KClass corresponding the keys.

Constructor & Destructor Documentation

RKeyValueFile ( const RURI uri,
size_t  blocksize,
size_t  nbcaches,
size_t  tolerance 
)

Construct an index file.

Parameters
uriURI of the file.
blocksizeSize of a block (in KBytes).
nbcachesNumber of blocks managed in memory.
toleranceFix the size of the tolerance, i.e. minimal free size to add a new index (in KBytes).
virtual ~RKeyValueFile ( void  )
virtual

Destruct the file.

Member Function Documentation

virtual void Open ( RIO::ModeType  mode)
protectedvirtual

Open the file.

Parameters
modeThe open mode for the file.

Reimplemented from RBlockFile.

void Open ( void  )

Open the file in RIO::ReadWrite mode (the only one acceptable).

virtual void Close ( void  )
virtual

Close the file.

Reimplemented from RBlockFile.

void Clear ( void  )

Clear the file.

size_t GetNbBlocks ( void  ) const

Get the number of blocks.

CacheType GetCacheType ( void  ) const

Get the type of the cache.

void SetCacheType ( CacheType  type)

Set the type of the cache. If the type is set to WriteThrough, all the changed caches are saved.

Parameters
typeType.
void Flush ( void  )

Flush the caches : All the blocks in memory that are dirtied are save on disk.

size_t GetIndex ( size_t  block,
K &  key,
bool &  find 
)
private

Get the index of the a identifier in the current block address table.

Parameters
blockBlock number.
idIdentifier.
findWas it found ?
Returns
Position in the block address table.
void MoveRecords ( size_t  blockid,
size_t  entry,
long  rel 
)
private

Move all the records of all entries in the table.

Parameters
blockidIdentifier of the block.
entryFirst entry to move.
relNumber of bytes to increase or decrease the record addresses (increasing the size of record implies decreasing the record addresses).
void ModifyFreeSpace ( size_t  blockid,
long  rel 
)
private

Modify the free space associated with a given block.

Parameters
blockidIdentifier of the block.
relNumber of bytes to increase or decrease.
void NewRecord ( size_t  blockid,
K &  key,
size_t  entry,
size_t  size 
)
private

A new record of a given size is added to the block. Set the block to the good position to write the content.

Parameters
blockidIdentifier of the block.
indexidIdentifier of the index.
entryEntry in the index table.
sizeSize of the record.
virtual void Seek ( off_t  pos)
privatevirtual

Go to a specific position of the file. In fact this method should never be called and generates an error.

Parameters
posPosition to reach.

Reimplemented from RIOFile.

void Write ( const char *  buffer,
size_t  nb 
)

Write a number of bytes of a buffer in the current position of the file. The Seek(size_t&,size_t,size_t) must be called before to ensure the internal integrity of the file.

Parameters
bufferBuffer.
nbNumber of bytes to write.
void Write ( size_t &  blockid,
K &  key,
const RNumContainer< I, bOrder > &  vec 
)

Write a vector of integers associated to a given index in a given block.

Template Parameters
IType of the numbers contained.
bOrderDetermine is the container to write is ordered.
Parameters
blockidIdentifier of the block.
indexidIdentifier of the index.
vecVector to write.
void Write ( size_t &  blockid,
K &  key,
const RContainer< C, bAlloc, bOrder > &  cont 
)

Write a RContainer of associated to a given index in a given block.

Template Parameters
CClass of the object contained in the container.
bAllocThe container to write is responsible of the deallocation.
bOrderThe container to write is ordered.
Parameters
blockidIdentifier of the block.
indexidIdentifier of the index.
contContainer to write.
void Read ( char *  buffer,
size_t  nb 
)

Read a given number of bytes at the current position of the file.

Parameters
bufferBuffer (must be allocated).
nbNumber of bytes to read.
void Read ( size_t  blockid,
K &  key,
RNumContainer< I, bOrder > &  vec 
)

Read a vector of integers associated to a given index in a given block.

Template Parameters
IType of the numbers contained.
bOrderDetermine is the container to read is ordered.
Parameters
blockidIdentifier of the block.
indexidIdentifier of the index.
vecRead to write.
void Read ( size_t  blockid,
K &  key,
RContainer< C, bAlloc, bOrder > &  cont 
)

Read a RContainer of associated to a given index in a given block.

Template Parameters
CClass of the object contained in the container.
bAllocThe container to read is responsible of the deallocation.
bOrderThe container to read is ordered.
Parameters
blockidIdentifier of the block.
indexidIdentifier of the index.
contContainer to Read.
void EraseRecord ( size_t  blockid,
K &  key 
)

Erase a given record from the block file.

Parameters
blockidBlock number.
indexidIdentifier of the index.
void Seek ( size_t  blockid,
K &  key 
)

Go to a specific position of the file.

Parameters
blockidIdentifier of the block.
indexidIdentifier of the index.
void Seek ( size_t &  blockid,
K &  key,
size_t  size 
)

Go to a specific position of the file to write a given number of bytes associated with a given index. This method must be called before any call to Write(const char*,size_t).

If the identifier of the block is null, the method find a new block that can contain the given number of bytes. If the index exist but the number of bytes asked is different, internal moves are done to create the necessary space. If the block is full, a new one is searched.

Warning
This method supposes that the information of the index will be update after the call. In particular, if the amount of bytes needed implies to write in another block, the old content is lost.
Parameters
blockidIdentifier of the block. If null, a new block is searched and the identifier is updated.
indexidIdentifier of the index.
sizeNumber of bytes to read/write.

Field Documentation

const size_t cLen =sizeof(size_t)
staticprotected

Basic length to store positions, number of records and free spaces.

const size_t cLen2 =2*sizeof(size_t)
staticprotected

Double of the length to store positions, number of records and free spaces. In practice, it corresponds to the size of the "header" of a block.

RNumContainer<size_t,false> FreeSpaces
protected

Free spaces in each block.

size_t Tolerance
protected

Tolerance.

size_t NbRecs
protected

Number of records in the current block.