Difference between revisions of "Libmu dbm"

From Mailutils

Revision as of 12:50, 28 October 2011

The library libmu_dbm provides a uniform and consistent interface to various DBM-like database managers. Presently (as of version 2.99.93), the following backend libraries are supported:

Note: The Tokyo Cabinet support was available in previous versions. It is not yet re-implemented via libmu_dbm, but it will probably be in the future.

Header Files and Libraries

The data types and function prototypes of libmu_dbm are declared in mailutils/dbm.h[1]. Internal interfaces, which are of interest for implementors of new backends, are declared in mailutils/sys/dbm.h[2].

The loader (linker) arguments needed to link a program with libmu_dbm can be obtained by running:

 mu ldflags dbm

Data Types

A database file is represented by the mu_dbm_file_t type. It is an opaque type, declared as

  typedef struct _mu_dbm_file *mu_dbm_file_t;

A datum is a piece of information stored in the database. It is declared as:

  struct mu_dbm_datum
  {
    char *mu_dptr;               /* Data pointer */
    size_t mu_dsize;             /* Data size */
    void *mu_data;               /* Implementation-dependent data */
    struct mu_dbm_impl *mu_sys;  /* Pointer to implementation */
  };

Of all its fields, only mu_dptr and mu_dsize are of interest for application developers. The mu_dptr points to the actual data and mu_dsize keeps the number of bytes pointed to by mu_dptr.

When initializing an object of struct mu_dbm_datum type it is important to fill all the rest of its fields with zeros. The usual initialization sequence is therefore:

  struct mu_dbm_datum key;

  memset (&key, 0, sizeof key);
  key.mu_dptr = data;
  key.mu_dsize = size;
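
The initialization sequence above can be wrapped in a small helper. The following is a minimal sketch and not part of libmu_dbm: the name datum_from_string is hypothetical, and the structure is re-declared locally (copied from the declaration above) only so that the fragment compiles without the Mailutils headers. A real program would include mailutils/dbm.h instead.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in copy of the declaration shown above, so that this sketch
   compiles on its own.  Real code should include <mailutils/dbm.h>. */
struct mu_dbm_impl;
struct mu_dbm_datum
{
  char *mu_dptr;               /* Data pointer */
  size_t mu_dsize;             /* Data size */
  void *mu_data;               /* Implementation-dependent data */
  struct mu_dbm_impl *mu_sys;  /* Pointer to implementation */
};

/* Hypothetical helper: make DATUM refer to the bytes of the C string STR
   (without the terminating NUL), zeroing all other members first. */
static void
datum_from_string (struct mu_dbm_datum *datum, char *str)
{
  memset (datum, 0, sizeof *datum);
  datum->mu_dptr = str;
  datum->mu_dsize = strlen (str);
}
```

The helper does not copy the string, so the buffer passed as str must remain valid for as long as the datum is in use.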

Global Variables

  mu_url_t mu_dbm_hint;

This variable keeps a URL hint, which is used to supply missing URL parts for mu_dbm_create_from_url and mu_dbm_create calls (see next section). By default it contains only a scheme part pointing to the first available DBM implementation.

This variable is NULL before any of the following functions is called: mu_dbm_init, mu_dbm_get_hint, mu_dbm_register, mu_dbm_impl_iterator, mu_dbm_create_from_url or mu_dbm_create.

Therefore it is not recommended to use this variable directly. The preferred way is to use the mu_dbm_get_hint function instead. This function makes sure the variable is initialized before returning it.

Creating a Database Object

A variable of type mu_dbm_file_t is created using the following functions:

  int mu_dbm_create_from_url (mu_url_t url, mu_dbm_file_t *db);
  int mu_dbm_create (char *name, mu_dbm_file_t *db);

The former function creates a database object from a URL supplied as its first argument. The latter creates a database object from a file name or a URL in string form. On success, both functions store a pointer to the initialized struct _mu_dbm_file in the memory location pointed to by the db parameter and return 0. On error, they return a Mailutils error code and do not touch db.

Safety Checking

The following function verifies whether the database file is safe to use:

  int mu_dbm_safety_check (mu_dbm_file_t db);

It returns 0 if the file is OK, or an error code describing the problem otherwise.

There are two ways to set safety criteria. First, they can be configured using the URL parameters when calling mu_dbm_create_from_url or mu_dbm_create. Second, they can be set using the following functions:

  int mu_dbm_safety_set_flags (mu_dbm_file_t db, int flags);
  int mu_dbm_safety_set_owner (mu_dbm_file_t db, uid_t uid);

The mu_dbm_safety_set_flags function defines the set of safety criteria for this file. The flags argument has the same meaning as the mode argument to mu_file_safety_check. If it contains the MU_FILE_SAFETY_OWNER_MISMATCH bit, the file owner UID can be set using the mu_dbm_safety_set_owner function.

Whatever way was used to set them, the current safety criteria can be queried using the following two functions:

  int mu_dbm_safety_get_flags (mu_dbm_file_t db, int *flags);
  int mu_dbm_safety_get_owner (mu_dbm_file_t db, uid_t *uid);

Opening a Database

Once a mu_dbm_file_t is created, it can be opened:

  int mu_dbm_open (mu_dbm_file_t db, int flags, int mode);

The function mu_dbm_open opens a database described by db. The flags argument specifies how to open the database. The valid values (declared in mailutils/stream.h[3]) are:

Open database for reading only. If the database file does not exist, an error is returned.
Open database for reading and writing. If the database file does not exist, it is created.
Create an empty database and open it for reading and writing. If the database file already exists, all existing records will be removed from it.

The mode argument supplies the file mode for newly created files. Its meaning is the same as in the chmod(2) call.

Database Lookups

  int mu_dbm_fetch (mu_dbm_file_t db, struct mu_dbm_datum const *key,
		    struct mu_dbm_datum *ret);

The mu_dbm_fetch function looks up in the database db a record whose key matches key. If such a record is found, its associated data are stored in ret and 0 is returned. If there is no such key in the database, the function returns MU_ERR_NOENT. If an error occurs, it returns MU_ERR_FAILURE. The textual error description can then be obtained using mu_dbm_strerror.

The memory for data returned in ret is allocated by the library. To reclaim this memory, use the mu_dbm_datum_free function. If the ret parameter points to a datum initialized by a previous call to a libmu_dbm function (such as mu_dbm_fetch, mu_dbm_firstkey, etc.), it will be freed automatically. Therefore it is safe not to free an existing datum before passing it as the third argument to mu_dbm_fetch.

It is important to notice, however, that the datum pointed to by ret must always be initialized. This means, in practice, that it should either be returned from another libmu_dbm library call or initialized manually. For example:

  struct mu_dbm_datum key, content;
  int rc;

  /* Initialize search key */
  memset (&key, 0, sizeof key);
  key.mu_dptr = "user";
  key.mu_dsize = 4;
  /* Initialize content */
  memset (&content, 0, sizeof content);
  /* Look up the data */
  rc = mu_dbm_fetch (db, &key, &content);
  ...
  /* Find another key: */
  key.mu_dptr = "hostname";
  key.mu_dsize = 8;
  /* It is OK not to re-set content here, because it was initialized
     by the previous call to mu_dbm_fetch and will be freed automatically. */
  rc = mu_dbm_fetch (db, &key, &content);
  ...
  /* Free the content */
  mu_dbm_datum_free (&content);
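
A datum carries an explicit size, and the keys in the example above are stored without a terminating NUL. It is therefore safest not to assume that fetched data can be used directly as a C string. The sketch below copies the bytes of a datum into a freshly allocated, NUL-terminated buffer. The helper name datum_to_string is hypothetical and not part of libmu_dbm, and the structure is re-declared locally only so that the fragment compiles on its own.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in copy of the declaration from the Data Types section.
   Real code should include <mailutils/dbm.h> instead. */
struct mu_dbm_impl;
struct mu_dbm_datum
{
  char *mu_dptr;               /* Data pointer */
  size_t mu_dsize;             /* Data size */
  void *mu_data;               /* Implementation-dependent data */
  struct mu_dbm_impl *mu_sys;  /* Pointer to implementation */
};

/* Hypothetical helper: return a malloc'ed, NUL-terminated copy of the
   bytes held in DATUM, or NULL if memory is exhausted.  The caller
   releases the result with free(). */
static char *
datum_to_string (struct mu_dbm_datum const *datum)
{
  char *copy = malloc (datum->mu_dsize + 1);

  if (!copy)
    return NULL;
  if (datum->mu_dsize)
    memcpy (copy, datum->mu_dptr, datum->mu_dsize);
  copy[datum->mu_dsize] = '\0';
  return copy;
}
```

Because the copy is owned by the caller, it stays valid after the datum is released with mu_dbm_datum_free or reused by a later mu_dbm_fetch.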

Storing Records

  int mu_dbm_store (mu_dbm_file_t db, struct mu_dbm_datum const *key,
 		    struct mu_dbm_datum const *content, int replace);

The function mu_dbm_store inserts or replaces records in the database.

The parameters key and content supply the record key and associated data. The parameter replace determines the behavior of the function if a record with that key already exists in the database. If replace is 1, mu_dbm_store will replace the data part of the existing record with data from content. Otherwise, if replace is 0, the function will leave the record unchanged and return MU_ERR_EXISTS.

The function returns 0 on success and MU_ERR_FAILURE on error. In the latter case the textual description of the error can be obtained using mu_dbm_strerror.

Deleting a Record

  int mu_dbm_delete (mu_dbm_file_t db, struct mu_dbm_datum const *key);

This function deletes from the database a record with the given key. It returns 0 on success, MU_ERR_NOENT if there is no such key in the database, and MU_ERR_FAILURE on error. In the latter case the textual description of the error can be obtained using mu_dbm_strerror.

Sequential Access

The following two functions allow for accessing all items in the database:

  int mu_dbm_firstkey (mu_dbm_file_t db, struct mu_dbm_datum *ret);
  int mu_dbm_nextkey (mu_dbm_file_t db, struct mu_dbm_datum *ret);

The function mu_dbm_firstkey initializes db for sequential access and stores the first key in the datum pointed to by ret. The function returns 0 on success, MU_ERR_NOENT if the database is empty and MU_ERR_FAILURE on error.

The function mu_dbm_nextkey takes a pointer to the datum returned by a prior call to either of the two functions and locates the key immediately following this one. On success, it stores the new key in that same datum and returns 0. It returns MU_ERR_NOENT if all keys in the database have been visited and MU_ERR_FAILURE if a database error occurred.

This function takes care of the memory allocated for ret. This means that it is not necessary to call mu_dbm_datum_free (ret) either before calling this function or after it returns MU_ERR_NOENT (although doing so does not constitute an error). The usual sequential access code looks like:

  struct mu_dbm_datum key;
  int rc;

  memset (&key, 0, sizeof key);
  for (rc = mu_dbm_firstkey (db, &key);
       rc == 0;
       rc = mu_dbm_nextkey (db, &key))
    {
      do_something (&key);
    }
  if (rc != MU_ERR_NOENT)
    mu_dbm_datum_free (&key);

Notice that sequential access does not imply any particular ordering in which the keys will be visited. It only guarantees that each key in the database will be visited exactly once, provided that no database updates take place within the loop. The latter condition is important. You cannot store or delete records while doing sequential access, because doing so can change the order in which keys are returned. In other words, the following is wrong:

  struct mu_dbm_datum key;
  int rc;

  memset (&key, 0, sizeof key);
  for (rc = mu_dbm_firstkey (db, &key); 
       rc == 0; 
       rc = mu_dbm_nextkey (db, &key))
      if (key_matches_something (&key))
        /* This is wrong! */
        mu_dbm_delete (db, &key);
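
One common way around this restriction is to work in two passes: first walk the database and remember the matching keys, then delete them after the walk has finished. The sketch below illustrates the idea on a toy in-memory key table rather than on libmu_dbm itself, so that it is self-contained. All toy_ names, purge_matching and has_tmp_prefix are hypothetical. With libmu_dbm the first pass would also have to copy the bytes of each key, because the datum is reused by mu_dbm_nextkey.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a database.  Deleting a key moves the last key into
   the freed slot, so deletion changes the order of the remaining keys. */
#define TOY_MAX 16
struct toy_db
{
  const char *keys[TOY_MAX];
  size_t count;
  size_t cursor;
};

static int
toy_firstkey (struct toy_db *db, const char **key)
{
  db->cursor = 0;
  if (db->count == 0)
    return -1;
  *key = db->keys[0];
  return 0;
}

static int
toy_nextkey (struct toy_db *db, const char **key)
{
  if (++db->cursor >= db->count)
    return -1;
  *key = db->keys[db->cursor];
  return 0;
}

static int
toy_delete (struct toy_db *db, const char *key)
{
  size_t i;

  for (i = 0; i < db->count; i++)
    if (strcmp (db->keys[i], key) == 0)
      {
        db->keys[i] = db->keys[--db->count];
        return 0;
      }
  return -1;
}

/* Example predicate: select keys that begin with "tmp". */
static int
has_tmp_prefix (const char *key)
{
  return strncmp (key, "tmp", 3) == 0;
}

/* Pass one collects the matching keys; pass two deletes them once the
   traversal is over.  Returns the number of keys deleted. */
static size_t
purge_matching (struct toy_db *db, int (*match) (const char *))
{
  const char *doomed[TOY_MAX];
  const char *key;
  size_t n = 0, i;
  int rc;

  for (rc = toy_firstkey (db, &key); rc == 0; rc = toy_nextkey (db, &key))
    if (match (key))
      doomed[n++] = key;
  for (i = 0; i < n; i++)
    toy_delete (db, doomed[i]);
  return n;
}
```

Deleting inside the first loop would make this toy table skip keys, which is the kind of failure the restriction above guards against.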

Additional Functions

The function mu_dbm_get_fd can be used to obtain file descriptors of the underlying database:

  int mu_dbm_get_fd (mu_dbm_file_t db, int *pag, int *dir);

The function returns two descriptors, because some libraries (namely, NDBM) keep databases in two files: the so-called page and directory files. If so, the descriptors of these files are returned in the memory locations pointed to by pag and dir, respectively. If the underlying library stores the database in a single file, the descriptor of this file will be stored in both pag and dir. In any case, dir can be NULL if that information is of no interest to the caller. However, pag cannot be NULL, otherwise mu_dbm_get_fd will return ENOMEM.

The function mu_dbm_init initializes the library:

  void mu_dbm_init (void);

Normally it is not necessary to call this function because it is called automatically by any of the following functions: mu_dbm_get_hint, mu_dbm_register, mu_dbm_impl_iterator, mu_dbm_create_from_url and mu_dbm_create.

Return Codes

Apart from the return codes discussed above, each libmu_dbm function can return the following:

Not enough memory for the operation.
The db or some other required argument is NULL.
Database is not open. This error can be returned by any function taking mu_dbm_file_t as argument, except the following: mu_dbm_create_from_url, mu_dbm_create, mu_dbm_safety_get_owner, mu_dbm_safety_get_flags, mu_dbm_safety_set_owner, mu_dbm_safety_set_flags, mu_dbm_safety_check.
Function not implemented.

Error Reporting

The functions from libmu_dbm return MU_ERR_FAILURE if an error occurred in the underlying database implementation. A textual description of this error can be obtained using the following call:

  char const *mu_dbm_strerror (mu_dbm_file_t db);

The pointer returned by this function must not be freed or its contents altered by the application.

An example usage:

1   rc = mu_dbm_store (db, &key, &contents, replace);
2   if (rc)
3     mu_error ("cannot store datum: %s",
4	       rc == MU_ERR_FAILURE ? 
5                  mu_dbm_strerror (db) : mu_strerror (rc));

Notice how an appropriate error reporting function is selected in lines 4--5.

Closing the Database

An open database is closed using:

  int mu_dbm_close (mu_dbm_file_t db);

Unless intended for a subsequent re-opening, a closed database should be destroyed via a call to mu_dbm_destroy:

  void mu_dbm_destroy (mu_dbm_file_t *pdb);

This function reclaims the memory used by the mu_dbm_file_t object pointed to by its argument and initializes it to NULL before returning. Calling mu_dbm_destroy on an open database is OK: the function will call mu_dbm_close prior to freeing the memory. In fact, it is seldom necessary to call mu_dbm_close explicitly. Instead, it suffices to call mu_dbm_destroy once the database is no longer necessary.

API for Backend Implementors

The backend data and methods are defined in the following structure:

  struct mu_dbm_impl
  {
    char *_dbm_name;
    int (*_dbm_file_safety) (mu_dbm_file_t db, int mode, uid_t owner);
    int (*_dbm_get_fd) (mu_dbm_file_t db, int *pag, int *dir);
    int (*_dbm_open) (mu_dbm_file_t db, int flags, int mode);
    int (*_dbm_close) (mu_dbm_file_t db);
    int (*_dbm_fetch) (mu_dbm_file_t db, struct mu_dbm_datum const *key,
                       struct mu_dbm_datum *ret);
    int (*_dbm_store) (mu_dbm_file_t db, struct mu_dbm_datum const *key,
                       struct mu_dbm_datum const *contents, int replace);
    int (*_dbm_delete) (mu_dbm_file_t db,
                        struct mu_dbm_datum const *key);
    int (*_dbm_firstkey) (mu_dbm_file_t db, struct mu_dbm_datum *ret);
    int (*_dbm_nextkey) (mu_dbm_file_t db, struct mu_dbm_datum *ret);
    void (*_dbm_datum_free) (struct mu_dbm_datum *datum);
    char const *(*_dbm_strerror) (mu_dbm_file_t db);
  };

The _dbm_name member names the backend. It will be used as a scheme in URLs referring to databases of this type. The remaining members are methods that correspond to the API functions discussed above. Each API call is guaranteed to do the necessary error checking before calling the corresponding _dbm_ method.

The structure _mu_dbm_file is defined in mailutils/sys/dbm.h as follows:

  struct _mu_dbm_file
  {
    char *db_name;                /* Database name */
    void *db_descr;               /* Database descriptor */
    int db_safety_flags;          /* Safety checks */
    uid_t db_owner;               /* Database owner UID */
    struct mu_dbm_impl *db_sys;   /* Pointer to the database implementation */
    union _mu_dbm_errno db_errno; /* Error description for the latest failed call */
  };

The members db_name, db_safety_flags, db_owner and db_sys are initialized by the high-level API. Implementation functions should initialize db_descr and db_errno as necessary. The db_descr member keeps a pointer to the implementation-specific data describing the database. For example, in an NDBM implementation, it can hold a pointer to the DBM structure for the open database:

  static int
  _ndbm_open (mu_dbm_file_t db, int flags, int mode)
  {
    DBM *dbm;

    /* Open the database and store its handle in db_descr */
    dbm = dbm_open (db->db_name, _mu_flags_to_ndbm (flags), mode);
    if (!dbm)
      return MU_ERR_FAILURE;
    db->db_descr = dbm;
    return 0;
  }

The db_errno member is intended to keep the latest error value for use by the _dbm_strerror method. To satisfy most implementations, it is able to keep both numeric and generic pointer data:

  union _mu_dbm_errno
  {
    int n;            /* numeric value */
    void *p;          /* pointer to an error object */
  };

The implementation methods are supposed to keep its value up to date. For example, in the GDBM implementation, they store the value of gdbm_errno in db_errno.n:

  static int
  _gdbm_store (mu_dbm_file_t db,
               struct mu_dbm_datum const *key,
               struct mu_dbm_datum const *contents,
               int replace)
  {
    /* initialize local variables */
    ...
    /* Do the underlying call and return: */
    switch (gdbm_store ((GDBM_FILE)db->db_descr, key_datum, content_datum, replace))
      {
      case 0:
        break;

      case 1:
        return MU_ERR_EXISTS;

      case -1:
        /* Remember the GDBM error code for _dbm_strerror */
        db->db_errno.n = gdbm_errno;
        return MU_ERR_FAILURE;
      }
    return 0;
  }

A corresponding _dbm_strerror implementation uses the stored value:

  static char const *
  _gdbm_strerror (mu_dbm_file_t db)
  {
    return gdbm_strerror (db->db_errno.n);
  }
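
The GDBM example uses the numeric member of the union. For a backend whose library reports errors as objects or strings rather than numbers, the p member can play the same role. The following self-contained sketch shows the pattern on a toy backend. The toy_ names and TOY_FAILURE are hypothetical, and struct toy_file is a trimmed stand-in for struct _mu_dbm_file, kept only so that the fragment compiles without the Mailutils headers.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in copy of the union shown above. */
union _mu_dbm_errno
{
  int n;            /* numeric value */
  void *p;          /* pointer to an error object */
};

/* Trimmed stand-in for struct _mu_dbm_file: only db_errno is modelled. */
struct toy_file
{
  union _mu_dbm_errno db_errno; /* Error description for the latest failed call */
};

/* Placeholder standing in for MU_ERR_FAILURE. */
#define TOY_FAILURE 1

/* Hypothetical store method: reject empty keys and remember why. */
static int
toy_store (struct toy_file *db, char const *key)
{
  if (key[0] == '\0')
    {
      db->db_errno.p = (void *) "empty key";
      return TOY_FAILURE;
    }
  return 0;
}

/* Hypothetical strerror method: report the remembered error object. */
static char const *
toy_strerror (struct toy_file *db)
{
  return db->db_errno.p ? (char const *) db->db_errno.p : "no error";
}
```

In this sketch the error object is a static string, so nothing needs to be freed. A backend that stores dynamically allocated error objects would also have to decide when to release them.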

A properly initialized mu_dbm_impl structure must be registered with the library by calling the following function:

  int mu_dbm_register (struct mu_dbm_impl *impl);

Once done, it becomes available for use by the API.
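
To make the role of registration more concrete, the sketch below models a registry of backends that are looked up by scheme name, with the first registered backend serving as the default. It is a toy illustration only and makes no claim about how libmu_dbm stores its implementations internally. All toy_ names are hypothetical, and only the name member of the method table is modelled.

```c
#include <stddef.h>
#include <string.h>

/* Toy method table: only the name is modelled here. */
struct toy_impl
{
  char const *name;   /* plays the role of _dbm_name */
};

#define TOY_MAX_IMPL 8
static struct toy_impl const *toy_registry[TOY_MAX_IMPL];
static size_t toy_count;

/* Add IMPL to the registry.  Returns 0 on success, -1 if the table is full. */
static int
toy_register (struct toy_impl const *impl)
{
  if (toy_count == TOY_MAX_IMPL)
    return -1;
  toy_registry[toy_count++] = impl;
  return 0;
}

/* Find the implementation for SCHEME.  A NULL scheme selects the first
   registered implementation, mimicking a default hint. */
static struct toy_impl const *
toy_lookup (char const *scheme)
{
  size_t i;

  if (!scheme)
    return toy_count ? toy_registry[0] : NULL;
  for (i = 0; i < toy_count; i++)
    if (strcmp (toy_registry[i]->name, scheme) == 0)
      return toy_registry[i];
  return NULL;
}
```

A lookup for a scheme that was never registered returns NULL, which a caller would treat as an unsupported database type.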

Registered implementations can be accessed via an iterator returned by the mu_dbm_impl_iterator function:

  int mu_dbm_impl_iterator (mu_iterator_t *itr);

For example, the following code snippet lists available DBM implementations:

  mu_iterator_t itr;
  int i;

  /* stream is a FILE * supplied by the caller */
  mu_dbm_impl_iterator (&itr);
  for (mu_iterator_first (itr), i = 0; !mu_iterator_is_done (itr);
       mu_iterator_next (itr), i++)
    {
      struct mu_dbm_impl *impl;

      mu_iterator_current (itr, (void**)&impl);
      fprintf (stream, " ");
      fprintf (stream, "%s", impl->_dbm_name);
    }
  fputc ('\n', stream);
  mu_iterator_destroy (&itr);

For an example of using the backend implementation API see libmu_dbm/gdbm.c[4].