FASTUS

Extracting Information from Real-World Texts

FASTUS is a (slightly permuted) acronym for Finite State Automata-based Text Understanding System. It is a system for extracting information from free text. English and Japanese versions of the system currently exist. Typical applications either mark text with annotations that indicate items of interest, such as the names of people or companies, or fill database templates with information that can then be entered into a relational database.
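To make the annotation mode concrete, here is a toy sketch of a hand-coded finite-state automaton that marks company names in a token stream. The states, the suffix list, and the `<ORG>` tag are hypothetical choices made for illustration. This is not FASTUS's actual grammar or code.

```python
# Toy finite-state annotator: a hypothetical illustration, not FASTUS code.
# It recognizes company names of the form  Capitalized+ (Inc.|Corp.|Ltd.)
# and wraps each match in an <ORG>...</ORG> annotation.

SUFFIXES = {"Inc.", "Corp.", "Ltd."}  # hypothetical company-name suffixes
DEAD, START, IN_NAME, ACCEPT = -1, 0, 1, 2


def step(state: int, token: str) -> int:
    """Transition function of a small deterministic automaton over tokens."""
    if state == START:
        # A name must begin with a capitalized word that is not a bare suffix.
        if token not in SUFFIXES and token[:1].isupper():
            return IN_NAME
        return DEAD
    if state == IN_NAME:
        if token in SUFFIXES:
            return ACCEPT
        return IN_NAME if token[:1].isupper() else DEAD
    return DEAD  # ACCEPT and DEAD have no outgoing transitions


def annotate(tokens: list[str]) -> str:
    """Scan left to right, tagging the span accepted from each position."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        state, end, j = START, None, i
        while j < len(tokens):
            state = step(state, tokens[j])
            if state == DEAD:
                break
            j += 1
            if state == ACCEPT:
                end = j  # index just past the recognized name
        if end is None:
            out.append(tokens[i])  # no name starts here; copy the token through
            i += 1
        else:
            out.append("<ORG>" + " ".join(tokens[i:end]) + "</ORG>")
            i = end
    return " ".join(out)


print(annotate("shares of Acme Widget Corp. rose sharply".split()))
# prints: shares of <ORG>Acme Widget Corp.</ORG> rose sharply
```

A single pattern like this is deliberately naive: a capitalized sentence-initial word would be swallowed into the name, so a practical extractor needs lexicons and more context than one automaton provides.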

Buried by an avalanche of text. FASTUS was developed in response to the needs of the intelligence community for scanning and processing huge volumes of written text. Government intelligence agencies collect information from around the world, from both classified and unclassified sources. Assimilating the important facts in this data can be a daunting task for an analyst. One analyst described the problem this way: "If I read every bit of information that might be important to what I am working on, it would be like reading War and Peace every day." FASTUS gives the analyst a tool for avoiding being overwhelmed by this flood of information.

FASTUS is most appropriate for information extraction tasks, rather than full text understanding. That is, it is most effective for tasks in which (1) only parts of the text contain relevant information, and (2) there is a relatively simple, predefined target representation into which the information is mapped.
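The two criteria can be illustrated with a toy template filler. The `AcquisitionTemplate` record, the name pattern, and the verbs it looks for are all hypothetical; the point is only that the output is a fixed, simple record (criterion 2) and that sentences which do not match are skipped (criterion 1). The pattern uses only regular constructs, so the language it describes could be recognized by a finite-state automaton, although Python's `re` engine is itself a backtracking matcher.

```python
import re
from dataclasses import dataclass


@dataclass
class AcquisitionTemplate:
    """Hypothetical predefined target representation (one database row)."""
    acquirer: str
    acquired: str


# Hypothetical pattern: "<Name> acquired|bought <Name>", where a name is a
# run of capitalized words. No backreferences, so the language is regular.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
PATTERN = re.compile(rf"({NAME}) (?:acquired|bought) ({NAME})")


def extract(text: str) -> list[AcquisitionTemplate]:
    """Fill one template per sentence that matches; ignore all others."""
    templates = []
    for sentence in re.split(r"(?<=\.)\s+", text):  # naive sentence split
        m = PATTERN.search(sentence)
        if m:  # sentences without a match are treated as irrelevant
            templates.append(AcquisitionTemplate(m.group(1), m.group(2)))
    return templates


text = ("The weather was mild. Acme Widgets acquired Beta Tools last week. "
        "Analysts were surprised.")
print(extract(text))
# prints: [AcquisitionTemplate(acquirer='Acme Widgets', acquired='Beta Tools')]
```

Only one of the three sentences contributes anything; the other two are discarded without any attempt to understand them, which is the trade-off that distinguishes information extraction from full text understanding.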

FASTUS has been under development since 1992. The system is implemented in Common Lisp and has been ported to several hardware platforms, including Macs and PCs.

If you're curious, you can read a paper (in HTML) about how FASTUS works.


If you want more information about how FASTUS can solve your text processing problem, drop us a line and let's talk about it.