Debtags Frequently Asked Questions

General

What is Debtags?

Debtags is a set of categories to describe Debian packages.

It provides a vocabulary of categories as well as tag information for the packages.

Where can I find information about Debtags?

http://debtags.alioth.debian.org has a documentation section.

Also, various blog posts by Enrico Zini about Debtags can be found at http://www.enricozini.org/tags/debtags.html

Why aren't debtags yet (well) integrated with apt & co.?

They probably don't need to be integrated with apt, whose main purpose is to resolve dependencies and figure out what packages to install.

However, they should be integrated with higher-level package managers, like synaptic and aptitude. Why they aren't integrated yet is a question you have to ask their authors. I hope that my recent work on apt-xapian-index can provide a good way to integrate debtags support, and much more, into all sorts of existing applications.

What are future plans and perspectives?

Hopefully, to get package managers to use debtags.

There is a very interesting discussion going on about creating tags for use by the security team: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=436161

There is also some work on third-party tag sources: Miriam Ruiz, for example, is providing tags for parental rating of games: http://www.miriamruiz.es/tags/ Others can follow this example.

How many people are working on or around Debtags now?

There are about 6 active people, each working on something different:

  • Enrico Zini is working on the Debtags libraries and tools
  • Erich Schubert is working on the packagebrowser and the central tag repository
  • Benjamin Mesing is experimenting with automated Bayesian tagging, and also working on packagesearch, which uses debtags, among other data, to search for packages
  • Thaddeus H. Black is working on debram, with the intention of converging to debtags
  • Justin B. Rye is active on the vocabulary
  • Emanuele Rocca maintains the bash completion scripts for debtags

Plus, there are various occasional contributors adding tags, either with Erich's packagebrowser or with debtags-edit.

Are there projects out there using the tags?

Benjamin Mesing's packagesearch is probably the most mature package search application using Debtags at the moment. Enrico Zini's debtags-edit is another package search program, which also lets you enter and submit new tags.

Then there is libept, which allows access to tags together with all other sorts of information about packages. It has a command line interface called ept-cache.

Peter Rockai's adept is a recently developed KDE package manager which has been built with debtags support right from the start.

apt-xapian-index is an attempt to create a new system wide index for package information that also integrates debtags.

What is the status of the central tag repository?

Erich Schubert is doing a good job of maintaining the central tag repository, although he's very busy and there is always something more to do.

Tags are updated continuously, both with contributions from the packagebrowser and with contributions from debtags-edit. Every night, the central tag database is exported and made available to debtags update: the tags http://people.debian.org/~enrico/tags/ tag source is thus the most up-to-date tag source (and to our knowledge, also the only one) that is available for the main tag collection.

Is there a debtags mailing list?

Sure: there is an Alioth project which also includes the debtags-devel mailing list. You're more than welcome to subscribe to it!

Using the data

What is a facet?

A facet is a group of tags which describe the same quality of a package. For more information, see the notes on the theoretical foundation of debtags.

What does a notation such as "works-with::image:raster" mean?

It means that the facet (the point of view from which we look at the packages) is "works-with", and that the tag (what kind of data this package can handle) is "image:raster".

In other words, "works-with::image:raster" should be read as "Looking at what kind of data a package can handle, this package handles raster images".
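As a small illustration of the notation, here is a Python sketch that splits a full tag name at the first "::" into its facet and tag parts. The function name split_tag is made up for this example and is not part of any debtags library.

```python
def split_tag(full_name):
    """Split a full tag name such as 'works-with::image:raster' into (facet, tag)."""
    # Only the first "::" separates facet from tag; single colons
    # (as in "image:raster") belong to the tag part.
    facet, separator, tag = full_name.partition("::")
    if not separator:
        raise ValueError(f"not a facet::tag name: {full_name!r}")
    return facet, tag


facet, tag = split_tag("works-with::image:raster")
print(facet)  # works-with
print(tag)    # image:raster
```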

There seem to be long and short versions of the tags, e.g. mail::smtp vs. Electronic Mail::SMTP Protocol. Where can we find a mapping of those?

You can find it on the web at http://debtags.alioth.debian.org/tags/vocabulary.gz

If you have debtags installed in your system, you can also access the tag vocabulary locally at /var/lib/debtags/vocabulary
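The mapping can be rebuilt from the vocabulary itself. The Python sketch below assumes the vocabulary consists of blank-line separated stanzas with "Facet:" or "Tag:" and "Description:" fields, and that the long form is the facet description joined to the tag description. The sample text is a hand-written illustration based on the example in the question, not a copy of the real vocabulary, and the function names are invented for this sketch.

```python
import re

SAMPLE_VOCABULARY = """\
Facet: mail
Description: Electronic Mail

Tag: mail::smtp
Description: SMTP Protocol
"""


def parse_vocabulary(text):
    """Return {facet or tag name: short description} for each stanza."""
    names = {}
    for stanza in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in stanza.splitlines():
            # Indented lines continue a long description; ignore them here.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        name = fields.get("Facet") or fields.get("Tag")
        if name:
            names[name] = fields.get("Description", "")
    return names


def long_name(tag, names):
    """Build the long form of a tag from facet and tag descriptions."""
    facet = tag.split("::", 1)[0]
    return f"{names[facet]}::{names[tag]}"


names = parse_vocabulary(SAMPLE_VOCABULARY)
print(long_name("mail::smtp", names))  # Electronic Mail::SMTP Protocol
```

To try it on real data, read the text of /var/lib/debtags/vocabulary and pass it to parse_vocabulary instead of the sample.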

It seems you renamed/removed some tags. When you do that, do you also apply these changes to already tagged packages?

Yes, we do. You may, however, still see the old tags in the Packages file: that is because the ftp masters may take some time to install the updated tag database in the archive.

Providing new data

Is there a debtags-policy?

No, there hasn't been a need for it yet.

Can I create my own set of tags and add them to all the packages I want?

Definitely yes! You can add any tag sources to /etc/debtags/sources.list.

Try it yourself:
  1. create a directory /etc/debtags/personaltags

  2. create the file /etc/debtags/personaltags/vocabulary adding some facets and tags. For example, you could use this:

    Facet: personal
    Description: Personal preference
    
    Tag: personal::essential
    Description: I cannot live without it
    
    Tag: personal::useful
    Description: Tried it and found it useful
    
    Tag: personal::bad
    Description: Tried it and did not like it
    
    Tag: personal::interesting
    Description: It looks interesting, but I have not tried it yet
    
  3. create the file /etc/debtags/personaltags/tags-current adding some tag data. For example, you could use this:

    mmv: personal::essential
    mc: personal::essential
    xdiskusage: personal::essential
    buffy: personal::useful
    debtags: personal::interesting
    
  4. gzip both files. You should then have /etc/debtags/personaltags/vocabulary.gz and /etc/debtags/personaltags/tags-current.gz

  5. add the new tag source to /etc/debtags/sources.list:

    tags file:/etc/debtags/personaltags/
    
  6. run debtags update

This is it. You should now be able to run packagesearch or debtags-edit and find your own facets and tags. debtags-edit will also allow you to tag packages using your personal tags, and will save them in the local tag patch (in ~/.debtags/patch).
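Steps 1 to 4 can also be scripted. The following Python sketch writes a vocabulary and a tag file and gzips both; it is only an illustration, and the function name write_tag_source is invented here. It writes into a temporary directory so that it runs without root privileges: to use it for real you would pass /etc/debtags/personaltags instead, and you would still add the source line and run debtags update by hand (steps 5 and 6).

```python
import gzip
import tempfile
from pathlib import Path

VOCABULARY = """\
Facet: personal
Description: Personal preference

Tag: personal::essential
Description: I cannot live without it

Tag: personal::useful
Description: Tried it and found it useful
"""

# Package name -> list of personal tags, as in the example above.
TAGS = {
    "mmv": ["personal::essential"],
    "buffy": ["personal::useful"],
}


def write_tag_source(directory, vocabulary, tags):
    """Write vocabulary.gz and tags-current.gz into the given directory."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # One "package: tag, tag" line per package.
    lines = [
        f"{package}: {', '.join(sorted(package_tags))}\n"
        for package, package_tags in sorted(tags.items())
    ]
    with gzip.open(directory / "vocabulary.gz", "wt", encoding="utf-8") as out:
        out.write(vocabulary)
    with gzip.open(directory / "tags-current.gz", "wt", encoding="utf-8") as out:
        out.writelines(lines)
    return directory


target = write_tag_source(tempfile.mkdtemp(), VOCABULARY, TAGS)
with gzip.open(target / "tags-current.gz", "rt", encoding="utf-8") as source:
    print(source.read(), end="")
```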

What makes a tag good for being added to the vocabulary?

This is a list of rule-of-thumb criteria:

  • It should represent a clear, atomic concept
  • It should have a facet to fit in
  • There should be more than 6 or 7 packages in Debian that can make use of it

Remember that categorisation in Debtags happens with a combination of tags; this means that instead of having a "dvdplayer" tag, we have the combination "use::playing, works-with::video, hardware::storage:dvd".

These combinations also make it possible to create reasonable approximations of tags that should not be added because they are not yet used by many packages. For example, the tag "devel::lang:brainfuck" should not yet be added because the corresponding packages in Debian are too few, but it can be reasonably approximated using combinations of devel::interpreter, devel::compiler and use::entertaining.
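To make the idea of a combination concrete, here is a minimal Python sketch that treats each package as a set of tags and selects the packages carrying every tag of a combination. The package names and their tag sets are hypothetical, chosen only to illustrate the "dvdplayer" example above.

```python
# The combination that stands in for a hypothetical "dvdplayer" tag.
DVD_PLAYER = {"use::playing", "works-with::video", "hardware::storage:dvd"}

# Hypothetical packages and tags, for illustration only.
PACKAGES = {
    "example-dvd-player": {
        "use::playing", "works-with::video",
        "hardware::storage:dvd", "interface::x11",
    },
    "example-video-editor": {"use::editing", "works-with::video"},
}


def matching(packages, wanted):
    """Return the names of packages that carry every tag in `wanted`."""
    return sorted(name for name, tags in packages.items() if wanted <= tags)


print(matching(PACKAGES, DVD_PLAYER))  # ['example-dvd-player']
```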

Do you have tips for tagging?

Justin says:

The following tools have been particularly useful for working out what unfamiliar packages are all about.

  • apt-cache (obviously; but nb "apt-cache rdepends")
  • apt-file ("does it put anything in /usr/bin? In init.d/?")
  • debman, in debian-goodies ("what does its man page say?")
  • surfraw (instant lookups of packages.debian.org/foo)

Any reason why there are no license:: tags in debtags?

It has been tried, but we had discouraging replies.

The main problem is that the licensing information for a package is too complex to be represented in a single tag.

Please also read this thread in debian-devel for a discussion of other ways to implement this.

Integration in Debian

How can maintainers interact with debtags?

They can go to their DDPO page (http://qa.debian.org/developer.php) and click on the "Reports: debtags" link to view the Debtags situation of their packages and edit the categories.

When is Debtags going to be integrated into apt or aptitude or ...?

That is still a bit out of reach at the moment.

There are proof-of-concept implementations inside the debtags tool: debtags search, which is like apt-cache search but also shows tags; debtags grep, which shows packages matching a certain tag expression (try debtags grep 'use::editing && media::rasterimage'); and even debtags install, which does the same as debtags grep but also invokes apt-get to install the resulting packages.

The hope lies in libept, which is still in the making but will provide a single interface to all kinds of package metadata. It will hopefully be a solid and complete foundation to be used by package managers, and will also make Debtags information available to them.

In the meantime, if you want a graphical interface to look for packages you can use packagebrowser or debtags-edit.

When does a new package get tagged?

Anyone can tag new packages using the packagebrowser or debtags-edit, but the new tags show up in apt-cache only after Enrico has manually reviewed them.

When are the tags going to move into the control file?

Good tags are copied into the Packages file by means of an "override" file, which is a file that adds or overrides a field from the control file written by the package maintainers.

Tags are added to the override file after manual review. Think of the tags in the override file as the "stable" tags and the ones in the debtags database as the "unstable" tags.

Allowing the maintainers to specify tags in the control file could be difficult for many reasons:

  • Some tags are more easily added by people who are not the DD. The maintainer can add made-of::* and interface::*, but some other person could add works-with::* and accessibility::*.
  • Sometimes we reorganise the vocabulary (for example, moving protocol::icq to protocol::im:icq). Such reorganisations tend to happen quite often, and we can't ask all maintainers to handle the resulting changes.

What should I do if I'm [also] packaging for derivative distributions?

You can provide tag information specific to your target group of users. See "Can I create my own set of tags and add them to all the packages I want?".

With recent versions of debtags (>= 1.7.3), you can create a package that installs the tag data somewhere (say, in /usr/share/mydistro/tags) and installs a file under /etc/debtags/sources.list.d/ to automatically get debtags to use them.

Is there any plan to drop "Section:" ?

I don't think they'll be dropped, as they're serving a different purpose at the moment (that is, splitting the archive somehow). In my view, they should be ignored by package managers, using debtags instead.

How come there are different sets of tags in the Packages file and in /var/lib/debtags?

There are a few reasons:

  • Debtags supports merging different tag sources: for example, iterating.org provides a tag source with package rankings, and debtags is able to download it and merge it with the other tags. Tag sources are listed in /etc/debtags/sources.list. This also allows some of us to use the unreviewed tags on Alioth instead of the ones in the package database.
  • For many applications the tags are easier to access when aggregated in a small file rather than by parsing the very large package database.
  • Finally, the debtags database in /var/lib/debtags is also indexed for fast access.

Having the Packages file as the primary tag storage has never been the main idea, although it has turned out to be useful: it lets software such as apt-cache, aptitude and grep-dctrl use the tags without having to be modified to access an extra database.

The Packages file has tags like network::{client,server,service} and this breaks grep-dctrl

Those compressed tags are there because APT does not like long lines.

You can use debtags dumpavail or ept-cache dumpavail to feed data to grep-dctrl without the compressed tags.

debtags dumpavail also supports tag expressions, so you can even run commands like:

    debtags dumpavail 'role::program && game::*' | grep-dctrl <options>

ept-cache dumpavail instead supports all ept-cache search and sort options, so you can do something like:

    ept-cache dumpavail -t gui image editor -s p | grep-dctrl <options>
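If you are processing the Packages file with your own script rather than with these tools, the compressed notation can be expanded by hand. The Python sketch below assumes that a brace group only appears at the end of a tag name and is never nested, as in the network::{client,server,service} example; the function names are invented for this illustration and the input line is made up.

```python
import re

# A tag prefix followed by one brace group at the end, e.g. network::{a,b}.
_GROUP = re.compile(r"^(.*)\{([^{}]*)\}$")


def split_top_level(field):
    """Split a tag field on the commas that are outside of braces."""
    parts, current, depth = [], [], 0
    for char in field:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def expand_tags(field):
    """Expand compressed tags such as network::{client,server} into full names."""
    tags = []
    for item in split_top_level(field):
        match = _GROUP.match(item)
        if match:
            prefix, alternatives = match.groups()
            tags.extend(prefix + alt.strip() for alt in alternatives.split(","))
        else:
            tags.append(item)
    return tags


print(expand_tags("network::{client,server,service}, role::program"))
# ['network::client', 'network::server', 'network::service', 'role::program']
```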

Web interface

How does the web interface work?

It is explained in the web interface itself: go to http://debtags.alioth.debian.org/todo.html or http://debtags.alioth.debian.org/edit.html, choose a package, then click on the [help] link on top of the page.

Where do the tags added through the web interface get stored?

They are stored in a file on Alioth, which you can download at http://debtags.alioth.debian.org/tags/tags-current.gz

Enrico regularly fetches the updates to that file, does a manual review, then commits the reviewed updates to svn://svn.debian.org/debtags/tagdb/tags, which also gets uploaded to Debian.

Development

How can I experiment writing applications using debtags?

One way to start is to read the apt-xapian-index introduction and then follow the subsequent posts, which show how to use the index.

For C++, have a look at libept-dev, which allows access to both debtags and apt package data.

For Python, the python-debian package has a good debtags module and various interesting code examples.

Otherwise, you can just access the data files directly: when the debtags package is installed, you can find them in /var/lib/debtags.
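As a starting point for reading the data directly, here is a Python sketch that parses lines in the "package: tag, tag" format shown earlier for tags-current. It is an assumption of this sketch that the tag database under /var/lib/debtags uses the same line format; check the actual files on your system before relying on it. The sample data is taken from the personal tags example, and the function names are invented here.

```python
SAMPLE = """\
mmv: personal::essential
mc: personal::essential
buffy: personal::useful
debtags: personal::interesting
"""


def parse_tag_lines(lines):
    """Turn 'package: tag, tag' lines into {package: set of tags}."""
    database = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The first ":" ends the package name; tags contain "::" themselves.
        package, separator, tags = line.partition(":")
        if not separator:
            continue
        found = {tag.strip() for tag in tags.split(",") if tag.strip()}
        database.setdefault(package.strip(), set()).update(found)
    return database


def packages_with(database, tag):
    """Return the sorted names of all packages carrying the given tag."""
    return sorted(name for name, tags in database.items() if tag in tags)


database = parse_tag_lines(SAMPLE.splitlines())
print(packages_with(database, "personal::essential"))  # ['mc', 'mmv']
```

For a real file, pass an open file object to parse_tag_lines in place of SAMPLE.splitlines().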

And of course don't forget to subscribe to the debtags-devel mailing list, where you can ask for help.

How can I help?

There are three main things needing help:

  1. You can take care of the website, and keep it updated with news from the mailing list.
  2. You can try to use debtags functions (you can now do it from C++, Python and Perl!), and ask questions that could then be turned into Doxygen comments, HOWTOs, tutorials, FAQs, example code and other forms of documentation.
  3. If you have knowledge of some specific field and a knack for categorization, you can help improve the vocabulary.

Here are other things that would be needed, but might be a bit more difficult:

  • Help maintain library bindings for languages other than C++
  • Help improve the GUI tools
  • Help package all the various Debian packages related to Debtags
  • Help write more C++ test cases for the libraries
  • Help with i18n/l10n issues, to take Debtags on a trip outside of the C locale
  • Use libtagcoll1 to bring the Debtags faceted classification approach to domains other than Debian packages: think browser bookmarks, multimedia repositories, mp3 archives, documentation, launcher menus... the approach has big potential in so many fields!

Older questions

Aren't debram and debtags duplicating the same effort?

Yes, but only up to a point: they started as two parallel projects that didn't know about each other. Debtags has a more solid theoretical foundation, while debram has data for the entire set of packages in Sarge.

Thaddeus H. Black, the author of debram, intends to converge to debtags and is an active poster on the debtags-devel mailing list. For this reason the debram package suggests debtags, which is like saying "yes, I'm ok, but you might want to look at debtags as well".