Annotated Bibliography for Natural Language Processing and Requirements Specification


Daniel Popescu


December 7, 2005


This bibliography was created while I was working on my master's project. It describes all the literature used, especially since no book has been published on this topic.

Bibliography

1
Russell J. Abbott.
Program design by informal English descriptions.
Communications of the ACM, 26(11):882-894, 1983.
This paper was the first to describe the relation between informal English and data types. It develops an approach that uses nouns for data types and verbs for operators. The target language is Ada.
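Abbott's nouns-to-types, verbs-to-operators idea can be sketched in a few lines. The tiny part-of-speech lexicon below is an invented stand-in for a real tagger, and the procedure is an illustration of the mapping, not Abbott's actual method:

```python
# Sketch of an Abbott-style mapping: nouns -> candidate types,
# verbs -> candidate operations. The lexicon is a hand-coded stand-in
# for a real part-of-speech tagger and covers only this example.
POS_LEXICON = {
    "customer": "NOUN", "book": "NOUN", "library": "NOUN",
    "borrows": "VERB", "returns": "VERB",
}

def abbott_candidates(sentence):
    """Partition a sentence's words into candidate types and operations."""
    types, operations = [], []
    for word in sentence.lower().rstrip(".").split():
        tag = POS_LEXICON.get(word)
        if tag == "NOUN":
            types.append(word)
        elif tag == "VERB":
            operations.append(word)
    return types, operations

types, ops = abbott_candidates("The customer borrows a book.")
# types -> ['customer', 'book'], ops -> ['borrows']
```

Words outside the lexicon (articles, unknown terms) are simply ignored, which mirrors the fact that such heuristics only produce candidates for a human to review.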

2
D. Barker and K. Biskri.
Object-oriented analysis: Getting help from robust computational linguistic tools.
This paper demonstrates an implementation of a CASE tool that automatically creates an object-oriented model. The authors likewise use only syntactical knowledge.

3
Daniel Berry.
Natural language and requirements engineering - nu?
In www.ifi.unizh.ch/groups/req/IWRE/papers&presentations/Berry.pdf, accessed on 2.12.2005.
This keynote talk presents an overview of natural language and requirements specifications. Berry argues that RE is where the formal meets the informal; therefore, it is inevitable that natural language is used for requirements. He discusses the tradeoff between natural language and formal languages in terms of ambiguity and readability. Berry suggests three steps to improve the quality of natural language specifications.
  1. Learn to write less ambiguously and less imprecisely.
  2. Learn to detect ambiguity and imprecision.
  3. Use a restricted natural language which is inherently unambiguous and more precise.
Berry argues that the traditional approach of solving this problem by trying to get everybody to be formal is doomed, that researchers should focus instead on doing a better job on natural language specifications.

4
Thorsten Brants.
TnT: a statistical part-of-speech tagger.
In Proceedings of the Sixth Conference on Applied Natural Language Processing, 2000.

5
J. F. M. Burg and R. P. van de Riet.
The Impact of Linguistics on Conceptual Models: Consistency and Understandability.
In Proceedings of the First International Workshop on Applications of Natural Language to Databases (NLDB'95), pages 183-197, Versailles, France, 1995.
This paper incorporates NL theories and knowledge into conceptual modeling. The authors claim that the lexicon is the central repository of all terminology and related linguistic elements. The paper discusses lexicons such as WordNet.

6
John Carroll and Ted Briscoe.
High precision extraction of grammatical relations.
In Proceedings of the 7th International Workshop on Parsing Technologies, 2001.

7
Nancy Chinchor and Beth Sundheim.
MUC-5 evaluation metrics.
In MUC5 '93: Proceedings of the 5th conference on Message understanding, pages 69-78, Morristown, NJ, USA, 1993. Association for Computational Linguistics.
This paper explains in detail the evaluation metrics for information extraction systems. It complements [14].

8
C. Denger, J. Dörr, and E. Kamsties.
A survey on approaches for writing precise natural language requirements.
Technical report, Fraunhofer IESE, 2001.
This report surveys the state of the practice and the state of the art in techniques that aim at making natural language more precise. Ten contributions to this problem are summarized in this report.

9
F. Fabbrini, M. Fusani, S. Gnesi, and G. Lami.
The linguistic approach to the natural language requirements quality: Benefits of the use of an automatic tool.
In 26th Annual IEEE Computer Society - NASA Goddard Space Flight Center Software Engineering Workshop, November 2001.
This paper presents a methodology for analyzing natural language requirements based on a quality model that addresses a relevant part of the interpretation problems approachable at the linguistic level. Evaluating requirements documents with this method aims to support the passage from informal requirements to semi-formal/formal models. To support the methodology automatically, a tool called QuARS has been implemented. For example, the tool can highlight vagueness in a specification, indicated by an adverb like ``clearly'' (e.g., ``The C code shall be clearly commented.'').
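The keyword-based vagueness check described above can be sketched as a simple lookup; the word list below is an invented example, not QuARS's actual linguistic dictionary:

```python
# Flag requirement sentences containing vague qualifiers, in the spirit
# of QuARS. This word list is illustrative; the real tool relies on
# curated linguistic dictionaries, not this handful of words.
VAGUE_WORDS = {"clearly", "easily", "adequate", "fast", "user-friendly"}

def find_vagueness(requirement):
    """Return the vague words found in a requirement sentence."""
    words = requirement.lower().replace(".", "").split()
    return [w for w in words if w in VAGUE_WORDS]

hits = find_vagueness("The C code shall be clearly commented.")
# hits -> ['clearly']
```

A hit does not prove the requirement is defective, only that a human should inspect it, which is exactly how such quality indicators are meant to be used.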

10
N. Fuchs and R. Schwitter.
Attempto Controlled English (ACE), 1996.
This paper presents the controlled language Attempto Controlled English (ACE). ACE is a computer-processable subset of English for writing requirements specifications. Using ACE does not presuppose expertise in formal methods or computational linguistics. Specifications written in ACE are textual views of formal specifications in logic.

11
Norbert E. Fuchs, Uta Schwertel, and Rolf Schwitter.
Attempto Controlled English (ACE) language manual, version 3.0.
Technical Report 99.03, Department of Computer Science, University of Zurich, August 1999.
The specification of Attempto Controlled English (ACE) [10].

12
W. A. Gale and K. W. Church.
Identifying word correspondences in parallel texts.
In Proceedings of the DARPA SNL Workshop, 1991.

13
Vincenzo Gervasi and Bashar Nuseibeh.
Lightweight validation of natural language requirements.
Software: Practice and Experience, 32(2):113-133, 2002.
This paper presents a lightweight formal method for the partial validation of natural language requirements documents. Lightweight formal methods often perform partial analysis on partial specifications only; they do not require a commitment to translate an entire requirements document into a formal one. The methodology is general: the authors do not fix the properties to be checked within the specification. In the case study, however, they demonstrate some models. For instance, the VALSPACE model collects, for each data item, all the values the requirements mention as assignable to it. One of the properties they verify on this collection is that every non-constant data item has more than a single possible value. The case study is performed on a NASA Software Requirements Specification for the Node Control Software on the International Space Station.
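A toy version of the VALSPACE check (collect the assignable values per data item, then verify that every non-constant item admits more than one value) could look as follows; the item names and values are invented, not taken from the NASA case study:

```python
# Toy VALSPACE-style check: every data item not declared constant should
# admit more than one value across the whole requirements document.
def valspace_violations(value_space, constants=()):
    """Return data items that are not constant yet have at most one value."""
    return [item for item, values in value_space.items()
            if item not in constants and len(values) <= 1]

value_space = {                  # values gathered from hypothetical requirements
    "mode": {"idle", "active", "failed"},
    "max_retries": {3},          # mentioned with a single value only
}
violations = valspace_violations(value_space, constants=("MAX_BUFFER",))
# violations -> ['max_retries']
```

A flagged item either is an undeclared constant or points to requirements that never state its alternative values, both of which are worth a reviewer's attention.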

14
Ralph Grishman.
Information extraction: Techniques and challenges.
In SCIE '97: International Summer School on Information Extraction, pages 10-27, London, UK, 1997. Springer-Verlag.
This paper gives an overview of information extraction. It introduces all fundamental terms and metrics. Additionally, it presents the history of information extraction and of the Message Understanding Conference (MUC).

15
H. M. Harmain and Robert J. Gaizauskas.
CM-builder: A natural language-based CASE tool for object-oriented analysis.
Automated Software Engineering, 10(2):157-181, 2003.
This paper describes a natural language-based CASE tool called CM-Builder, which aims at supporting the analysis stage of software development in an object-oriented framework. CM-Builder uses robust natural language processing techniques to analyse software requirements texts written in English and builds an integrated discourse model of the processed text, represented in a semantic network. This semantic network is then used to automatically construct an initial UML class model. The paper also contributes an evaluation methodology for quantitatively assessing NL-based CASE tools.

16
Mats Heimdahl.
An example: The lift (elevator) problem.
http://www-users.cs.umn.edu/~heimdahl/formal-models/elevator.htm, accessed on 14.12.2005.

17
Donald Hindle.
Noun classification from predicate-argument structures.
In Meeting of the Association for Computational Linguistics, 1990.

18
Natalia Juristo, Ana Maria Moreno, and Marta López.
How to use linguistic instruments for object-oriented analysis.
IEEE Softw., 17(3):80-89, 2000.
This article proposes an approach that uses linguistic information from informal specifications during the process of creating an object-oriented model. The method helps to analyze this information semantically and syntactically and employs a semiformal procedure to extract OO components. The work defines a formal correspondence between linguistic patterns and conceptual patterns. An engineer writes a specification according to a controlled grammar (called SUL and DUL) and can then easily create an object-oriented model with the mapping rules this paper supplies. The copy in the IEEE online repository is hard to read, since only a black-and-white copy of the colored original is available. Therefore, Moreno's other paper in this bibliography [28] complements this one.

19
Christopher Kennedy and Branimir Boguraev.
Anaphora for everyone: pronominal anaphora resolution without a parser.
In Proceedings of the 16th conference on Computational linguistics, pages 113-118. Association for Computational Linguistics, 1996.
This paper presents an algorithm for anaphora resolution. Anaphora is the use of a linguistic unit, such as a pronoun, to refer back to another unit. In "The customer can buy text books and return them", "them" is an anaphor that an automated tool must resolve. This anaphora resolver does not need full parsing but works on the output of a part-of-speech tagger. A tool that includes this algorithm could preprocess the input text for the diagram-creation CASE tool.
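A naive recency-based resolver over POS-tagged tokens illustrates the idea of parser-free anaphora resolution. Kennedy and Boguraev's actual algorithm uses salience weighting and agreement filters; this sketch, with hand-assigned tags, is far simpler:

```python
# Naive pronoun resolution on POS-tagged tokens: link each pronoun to
# the most recent preceding noun. The tags are hand-assigned here; a
# real system would obtain them from a part-of-speech tagger.
def resolve_pronouns(tagged_tokens):
    """Map each pronoun to the nearest preceding noun, left to right."""
    links, last_noun = {}, None
    for word, tag in tagged_tokens:
        if tag == "NOUN":
            last_noun = word
        elif tag == "PRON" and last_noun is not None:
            links[word] = last_noun
    return links

tagged = [("customer", "NOUN"), ("can", "AUX"), ("buy", "VERB"),
          ("books", "NOUN"), ("and", "CONJ"), ("return", "VERB"),
          ("them", "PRON")]
links = resolve_pronouns(tagged)
# links -> {'them': 'books'}
```

Pure recency happens to pick the right antecedent in this example; the salience factors of the real algorithm exist precisely for the many cases where it does not.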

20
Leonid Kof.
Natural Language Processing for Requirements Engineering: Applicability to Large Requirements Documents.
In Alessandra Russo, Artur Garcez, and Tim Menzies, editors, Automated Software Engineering, Proceedings of the Workshops, Linz, Austria, September 21 2004.
This paper describes a case study on applying natural language processing to extract terms, classify them, and build a domain ontology. The paper introduces a 9-step methodology to achieve this. The case study uses a specification of 80 pages.

21
Sascha Konrad and Betty H.C. Cheng.
Automated Analysis of Natural Language Properties for UML Models.
In Jean-Michel Bruel, editor, Satellite Events at the MoDELS 2005 Conference, Lecture Notes in Computer Science, Volume 3844/2006, pages 48-57, Montego Bay, Jamaica, October 2-7, 2005. Springer-Verlag.

22
Beum-Seuk Lee and Barrett R. Bryant.
Automated conversion from requirements documentation to an object-oriented formal specification language.
In SAC '02: Proceedings of the 2002 ACM symposium on Applied computing, pages 932-936, New York, NY, USA, 2002. ACM Press.
This work describes an approach to convert a natural language specification into a formal specification language. It uses an intermediate language to bridge the two specifications, which allows reasoning about the content to resolve ambiguities.

23
David D. Lewis and Karen Sparck Jones.
Natural language processing for information retrieval.
Communications of the ACM, 39(1):92-101, 1996.

24
L. Mich.
NL-OOPS: from natural language to object-oriented requirements using the natural language processing system LOLITA.
Natural Language Engineering, 2(2):161-187, 1996.
NL-OOPS is a CASE tool that generates object-oriented models from natural language requirements documentation. Although more tools like this exist, it is the only one that is a long-term research project. It uses LOLITA, a large-scale NLP system that creates a semantic net from the input text. NL-OOPS derives candidate classes from the nodes of the semantic net; in a second phase, the classes with attributes and methods are created.

25
Luisa Mich, Mariangela Franch, and Pierluigi Novi Inverardi.
Market research for requirements analysis using linguistic tools.
Requir. Eng., 9(1):40-56, 2004.
This paper presents the results of an online market research intended to assess the economic advantages of developing a CASE tool that integrates linguistic analysis techniques for documents written in natural language and to verify the existence of potential demand for such a tool. This paper shows that there is a demand for such a CASE tool on the market.

26
Luisa Mich and Roberto Garigliano.
NL-OOPS: A requirements analysis tool based on natural language processing.
In Proceedings of Third International Conference on Data Mining Methods and Databases for Engineering, Bologna, Italy, 2002.
An update on Mich's original work. This paper is easier to read and demonstrates the tool with screenshots in a case study.

27
Diego Mollá, Rolf Schwitter, Fabio Rinaldi, James Dowdall, and Michael Hess.
NLP for answer extraction in technical domains.
In 11th EACL 2003, Proceedings of the Conference, 2003.
This paper describes a question answering system for technical domains. It motivates a special focus on technical terminology: terminology detection is needed to achieve high recall for natural language processing tools in an industrial context.

28
A. M. Moreno.
Results of the application of a linguistic approach to object-oriented analysis.
International Journal of Software Engineering and Knowledge Engineering, 8(4):449-459, December 1998.
This article complements [18].

29
Hiroshi Nakagawa and Tatsunori Mori.
A Simple but Powerful Automatic Term Extraction Method.
In 2nd International Workshop on Computational Terminology, 2002.
This paper presents a simple automatic term extraction method. The approach is based on statistics relating a compound noun to its component single-nouns. The authors present several scoring methods based on this idea to identify terms. A Perl script implementing the approach can be downloaded from the author's homepage; the homepage is in Japanese, but the tool is described in English.
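One of the paper's scoring ideas can be sketched as a geometric mean over the component single-nouns' neighbor counts: nouns that combine with many different neighbors in the corpus push a compound's score up. The counts and nouns below are invented, and this is a simplified reading of the scoring family, not the exact formula from the paper:

```python
# Sketch of a compound-noun score in the spirit of Nakagawa and Mori:
# the geometric mean over the compound's single-nouns of
# (#distinct left neighbors + 1) * (#distinct right neighbors + 1).
# The neighbor counts are invented for illustration.
def lr_score(compound, left_count, right_count):
    """Score a compound noun (a list of single-nouns) by its components."""
    product = 1.0
    for noun in compound:
        product *= (left_count.get(noun, 0) + 1) * (right_count.get(noun, 0) + 1)
    return product ** (1.0 / (2 * len(compound)))

left = {"trigram": 2, "statistics": 5}    # distinct left-neighbor counts
right = {"trigram": 3, "statistics": 1}   # distinct right-neighbor counts
score = lr_score(["trigram", "statistics"], left, right)
# score is the 4th root of (3*4)*(6*2) = 144, i.e. about 3.46
```

The geometric mean keeps scores comparable across compounds of different lengths, which is why it appears instead of a plain product.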

30
Sastry Nanduri and Spencer Rugaber.
Requirements validation via automated natural language parsing.
Journal of Management Information Systems, 12(3):9-19, Winter 1995-96.
Journal version of hicss.ps.
This paper presents an approach to automatically extract object-oriented models from a requirements specification. It uses syntactical information for extracting the model, applying extraction rules to the links of sentences parsed by the link grammar parser.

31
Scott P. Overmyer, Benoit Lavoie, and Owen Rambow.
Conceptual modeling through linguistic analysis using LIDA.
In ICSE '01: Proceedings of the 23rd International Conference on Software Engineering, pages 401-410, Washington, DC, USA, 2001. IEEE Computer Society.
This paper presents a methodology to assist an analyst in making the transition from natural language text to object-oriented models. The developed tool uses a part-of-speech tagger to identify nouns, which are then highlighted in the text; a human analyst decides whether a noun should become a class. Additionally, the tool integrates a frequency analysis of candidate nouns. The methodology suggests creating the model iteratively. The tool is an assistant; it does not create models on its own.

32
J. Rumbaugh.
Object-Oriented Modeling and Design.
Prentice Hall, 1991.
This book explains the OMT methodology. This object-oriented methodology introduces, among other things, heuristics for extracting objects; some heuristics are based on syntactical information. The implemented rules for the CASE tool are partially based on this book.

33
H. Schütze and J. O. Pedersen.
A cooccurrence-based thesaurus and two applications to information retrieval.
In Proceedings of the RIAO Conference, pages 266-274, 1994.

34
Manuel Serrano, Mario Piattini, Jose-Norberto Mazon, and Juan Trujillo.
Using WordNet Ontology to automatically enrich dimension hierarchies in a data warehouse.

35
D. D. Sleator and D. Temperley.
Parsing English with a link grammar.
In Third International Workshop on Parsing Technologies, 1993.
This paper describes the theory behind link grammars. Furthermore, it presents the link grammar parsing algorithm.

36
Frank Z. Smadja.
Retrieving collocations from text: Xtract.
Computational Linguistics, 19(1):143-177, 1994.

37
Richard Sutcliffe and Annette McElligott.
Using the link parser of sleator and temperley to analyse a software manual corpus, 1995.
This paper presents a case study on the link parser of Sleator and Temperley [35]. The authors parsed extracts from three different software manuals, measured the accuracy of the parser, and tried to find ways to improve it. In the end, they reached an acceptance rate of 90% after extending the dictionary with terms from the manuals.

38
M. Torii and K. Vijay-Shanker.
Using machine learning for anaphora resolution in medline abstracts.
In Proceedings of the Pacific Symposium on Computational Linguistics, 2005.

39
Manuel Wimmer and Gerhard Kramler.
Bridging Grammarware and Modelware.
In Jean-Michel Bruel, editor, Satellite Events at the MoDELS 2005 Conference, Lecture Notes in Computer Science, Volume 3844/2006, pages 159-168, Montego Bay, Jamaica, October 2-7, 2005. Springer-Verlag.
