



INTEGRATED MANAGEMENT OF LAND BASED ACTIVITIES
IN THE SÃO FRANCISCO RIVER BASIN PROJECT
ANA/GEF/UNEP/OAS
Activity 4.7C Metadata Based Reference System
Executive Summary of the Final Report
SÃO FRANCISCO BASIN'S INFORMATION NETWORK
Brasília - DF
Valdemar Santos Guimarães
Superintendency of Hydrological Information SIH
Agência Nacional de Águas ANA
Coordinator
Augusto Franco Malo da Silva Bragança
Agência Nacional de Águas - ANA
Consultant
Luiz Bursztyn
Collaborators
Francisca Neta A. Assunção
Iuri Marmo
Dimas Moreira Jr.
Liandra Silva Bezerra
Maria Rita Souza Fonseca
Wandersonn Nogueira Santos
Contract OAS CPR # 31519
March 2003
SÃO FRANCISCO BASIN'S INFORMATION NETWORK
Executive Summary
INTRODUCTION
Activity 4.7C ("Metadata Based Reference System") is part of Component IV
("Formulation of the Integrated Watershed Management Program") of the GEF São
Francisco Project. According to the Terms of Reference, the Activity's objectives include:
· Creating a database with references regarding information on the São Francisco
Basin, distributed among the diverse governmental institutions;
· Providing a mechanism for searching and disseminating this information, so as to
promote its sharing among current and potential users; and
· Bringing together representatives of these users in a type of committee, helping
to build a community of people interested in the network's
information and in its continuity.
The Activity was based at the Brazilian Water Agency (ANA), which provided the system's
hosting infrastructure, for the reasons presented below.
The ANA, the executing agency of the National Water Resources Policy established by Law
9,433/97 and part of the National Water Resources Management System, is responsible for
organizing, implementing and managing the National Water Resources Information System,
the SNIRH (articles 3 and 4, item XIV, of Law 9,984/2000). The SNIRH's basic
principles are decentralized data collection and processing, unified coordination of the
System, and guaranteed access to the information for all of society (art. 26,
items I to III, of Law 9,433/97).
The National Water Resources Council, the Brazilian Water Agency, the State and Federal
District Water Resources Councils, the Basin Committees, the Water Agencies and
governmental institutions related to water resources management are the participants in the
National Water Resources Management System, as stated in art. 33 of Law 9,433/97,
with the changes introduced by art. 30 of Law 9,984/00.
Responsibility for organizing, implementing and managing the SNIRH belongs to the
Superintendency of Hydrological Information (SIH/ANA), as stated in Resolution 09/2001
of ANA's Collegiate Directory (art. 29, item III), which approved the Agency's Internal
Rules, Demonstrative Chart of Commissioned Posts and Organizational
Structure.
To contribute to the structuring of the SNIRH, a water resources information network
was designed to gather, organize and make available existing water
resources information that may be of interest to water resources management.
The proposal to build these reference systems, in network form, contributes to this
objective. It will make it possible to store in a single repository the "data on the data", that is,
the metadata. The different institutions and their water resources data will be mapped and used
to support planning and management activities. The aim of this Activity is thus to create an
information network on the São Francisco River Basin.
The technique used in this mapping (collecting and registering metadata) has already been adopted by
several libraries throughout the world, allowing a huge volume of documents to circulate on the
Internet. The purpose of the present work is not to discuss metadata in detail, but to gather
essential information on the efforts to create the Metadata Based Reference System.
1. ACTIVITY'S CONTEXT
The design of the São Francisco Basin's Information Network (RISF) belongs to a broader
context, in which there are several similar national and international initiatives, such as those of
the Geospatial Data Clearinghouse, the Federal Geographic Data Committee (FGDC) and
the Baía de Guanabara's Information Center, among others.
The EPA Geospatial Data and the Federal Geographic Data Committee (FGDC) are North
American institutions that inspired this work. They can be visited at their Internet addresses
(www.epa.gov/nsdi and www.fgdc.gov). In Brazil, the Baía de Guanabara Information Center has
been a source of inspiration for similar projects.
An Information Network is an association of institutions that produce information on a given
subject and are willing to share it among themselves, with a view to reducing production and
distribution costs. In this case, the information relates to the São Francisco Basin.
Natural candidates for membership are governmental and non-governmental institutions holding
information of social interest. As the priority topics at this stage are of a scientific and
economic nature, governmental agencies, research institutes and universities working in the field
are likely to be the most interested.
Given the dimensions of the São Francisco Basin, this Activity was planned to be implemented in
stages. In this first stage, in view of the budget, the following federal institutions located in
Brasília are considered, in addition to the National Water Agency (ANA) itself:
· Company for the Development of the São Francisco and Parnaíba Basins (CODEVASF)
· National Electrical Energy Agency (ANEEL)
· Brazilian Institute of Environment and Renewable Natural Resources (IBAMA)
· Research and Mineral Resources Company (CPRM)
· National Institute of Meteorology (INMET)
· Water Resources Secretariat (SRH).
2. CONCEPTS AND BASIC DEFINITIONS
2.1. GEOGRAPHICAL INFORMATION
Geographical information is, by itself, descriptive data: it is a way of describing
objects and spatial phenomena. In other words, a geographical object or phenomenon is either
directly or indirectly georeferenced.
2.2. GEOGRAPHICAL INFORMATION SYSTEM (GIS)
The so-called Geographical Information System (GIS) is a system oriented towards the
processing of geographical data. In practical terms, this means that, built on a software package also
referred to as sig (in lower case, to differentiate it), the GIS becomes a data processing system that
observes and highlights the spatial representation of its data, in compliance with current cartographic
standards. The GIS is therefore the state of the art in geoprocessing, or geomatics, as this
field of information systems is known.
Like any information system, the GIS is the product of a social and technical process of construction,
dependent on the geographical information that supports it. It therefore depends on the proper
recognition and appropriation of the information that will constitute it.
2.3. SET OF GEOGRAPHICAL DATA (SGD)
The expanding use of this information technology (sig), inside and outside the geosciences, makes
the complete characterization, or description, of geographical information indispensable. It was
decided to call each such body of information a Set of Geographical Data (SGD). The description of an
SGD is aimed at its correct and efficient appropriation, as well as its reproduction, by the GISs created
in different research programs or institutions.
2.4. METADATA
It is fundamental to document an SGD at the time of its creation or reproduction. This record, data on
the data, or metadata (a definition already accepted in the technical literature), is as important as
the information itself. Only through the associated metadata can the digital SGDs created by a great
number of institutions, through research, projects or other activities, be effectively
appropriated by the GISs under development in the various public and private organizations.
Thus, to meet the need to search for and retrieve the geographical information
required to build a GIS, it was found necessary, alongside the processes of creating and
reproducing SGDs, to plan activities that guarantee their complete description.
This description should be in a language common to all those interested in later
appropriating the data for their own projects and activities.
The universal and institutional use of the SGDs depends solely on these efforts
toward a standardized, dependable and up-to-date description in the metadata. The metadata are as
important as the SGDs they describe, being their digital identity. The standardization
must be compatible with national and international standards. The American proposal,
advanced by the Federal Geographic Data Committee (FGDC), has been widely
accepted in many countries, including Brazil.
The experience of the Baía de Guanabara's Information Center also offers a useful
contribution. Its solution recommends that the FGDC structure for describing data
take into account the need to describe non-georeferenced SGDs in addition to georeferenced ones.
In other words, text, spreadsheets, tables and graphs should be covered, whether or not they contain
explicit georeferences.
It was considered useful to characterize these non-georeferenced SGDs by a simpler set of descriptors
than those proposed by the FGDC. This required evaluating another widely accepted
international standard, the Dublin Core (DC), and trying to make it compatible with the FGDC, as
other countries had already attempted. The DC, which takes its name from Dublin, Ohio, is widely used on
the Internet, especially for presenting library holdings (see http://purl.oclc.org/dc/).
The final structure resulted from combining the FGDC and DC proposals, with
the former predominating. The following FGDC descriptor groups were maintained:
metadata identification, quality and spatial organization of data, spatial reference, institutions and
attributes, distribution and reference.
As a standard must describe any type of geographical information, the FGDC proposes a great
number of descriptors for each of the above sections. There are, altogether, 334 distinct
elements, 119 of which are compound elements, existing only to group others. That is
the case of Section 1 (Identification Information), which includes details on the registered
data (title, author, publication date/place, issue number, format, series name/volume, editor,
cartographic scale, abstract, purpose, stage of development, update frequency,
geographical coordinates, keywords, location, etc.). Before adopting it, it would be wise to
discuss how extensive the set of descriptors should be.
A minimal set of elements must be supplied, thus assuring a simple reference for each of the
SGDs produced by the different institutions. In fact, it is on this basic set that the
indexing of the metadata bank is built, for queries.
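The idea of a minimal descriptor set that also drives indexing can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the field names, the choice of required descriptors and the sample records are hypothetical, since the report does not list the exact elements retained for the RISF.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Hypothetical minimal descriptor set (an assumption, not the RISF's actual list).
REQUIRED_FIELDS = ("title", "author", "institution", "abstract", "keywords")


@dataclass
class MetadataRecord:
    title: str
    author: str
    institution: str
    abstract: str
    keywords: list = field(default_factory=list)
    georeferenced: bool = False
    # (west, south, east, north); only meaningful for georeferenced SGDs
    bounding_box: Optional[Tuple[float, float, float, float]] = None


def missing_fields(record):
    """Return the names of required descriptors that are empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]


def build_index(records):
    """Map each lower-cased keyword to the positions of the records that use it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for keyword in record.keywords:
            index[keyword.lower()].add(position)
    return index


# Illustrative records only; the values are invented for the example.
records = [
    MetadataRecord("Sub-basin master plan", "SRH", "ANA",
                   "Planning document.", ["Hydrology", "Planning"]),
    MetadataRecord("Rainfall chart", "", "INMET", "Monthly rainfall.",
                   ["hydrology"], True, (-47.0, -21.0, -36.0, -7.0)),
]
index = build_index(records)
```

Here `missing_fields` flags the second record, whose author is empty, and `index` lets a query on a keyword reach both records regardless of capitalization. A non-georeferenced SGD simply leaves `bounding_box` unset, which mirrors the simpler description suggested for such data.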
3. REFERENCE SYSTEMS AND CLEARINGHOUSE
3.1. STORING AND CATALOGUING FOR SHARING
All efforts aimed at sharing the SGDs produced by governmental agencies
are a demonstration of integrity in public management. This sharing is fundamental for the
public activities that depend on the availability of geographical information.
Geographical information therefore deserves priority in actions aimed at
its dissemination and sharing, mainly in view of the high costs of producing
georeferenced SGDs, their great potential for re-use and the possibility of combining them with other
SGDs. Additionally, it plays a fundamental role in building GIS-based decision support
systems.
There seems to be agreement that one of the key points in developing a policy for sharing
geographical information is to make available a directory of the existing geographical
information, preferably in digital media. In the governmental context, it is first necessary to
prepare a directory of those holding the information.
The principle that "all digital data must also have a digital description" is the main driver of
initiatives throughout the world aimed at implementing digital data reference systems.
The World Wide Web (WWW) relies on standard metadata definitions so that its
search engines allow interested users to find and retrieve information. The importance of
metadata for search and consultation on the WWW has been growing with the accelerated
expansion of the network, which currently has over 36 million addresses.
With such a volume of data, and the general understanding that the network stores a lot of useless
information, the need for digital directories to support navigation on the web becomes evident.
The biggest problem, common to most information systems (of which the WWW is certainly the
largest), is that this need was not considered in their technical and operational
infrastructure.
The Internet and the WWW were not originally designed with a directory of their
contents in mind. The TCP/IP protocol suite, which supports the basic infrastructure of the Internet,
takes care only of conveying the information, while the Hypertext Transfer Protocol (HTTP)
handles the flow of information on the WWW. This means that, at the level of the current protocols,
there is no support for finding the information resources available on the web. For this reason,
solutions addressing this structural problem began to appear from the early days of the Internet and,
later, of the WWW.
Current web query and information retrieval tools are highly
sophisticated, overcoming the flaws in the basic protocols. The solutions are based on
two basic components, which may work together in some cases: directories, or catalogues,
and search engines.
Directories offer a more effective way of searching. Their content, although objectively
structured, is filled in subjectively, which gives them a pseudo-objective character. Filling them in
is a labor-intensive activity, focused on interpreting and classifying the information,
with little technological support. Search engines, on the other hand, have problems such as:
· Increasingly, information on the web is presented as the result of dynamic queries to
databases, in response to parameters supplied by users. Those databases belong to
the so-called "hidden web".
· The engines' components are automated, allowing web resources to be retrieved without user
intervention. As a result, the quality of the search depends directly on the
indexing of web resources.
· Engines usually do not search the whole WWW, giving preference to commercial
addresses, according to surveys by Lawrence & Giles.
· With automated searches, the engines present unreliable results that often do not
satisfy the users' interests, in spite of attempts to mitigate the problem by ranking
the responses.
· The exponential growth of information on the web might saturate the capacity of the
search engines, according to current surveys.
Evidently, the future points to the following:
(1) The development of reference systems for digital databanks, such as the one built into
the RISF and the one at the Baía de Guanabara's Information Center;
(2) The convenience of using standardized metadata, defined as the set of thematic
descriptors or collection of items of digital information [1]; and
(3) Intensified use of directories with their own search engines.
[1] Digital information means a file with data (a spreadsheet, for example).
3.2. DIGITAL DATA REFERENCE SYSTEM
If an Information Society is truly under development, the value of a
Reference System for the stored data is unquestionable. Directories are useful tools for managing
information banks, and their worth is proportional to the volume of those banks. A catalogue with
concise and well-structured descriptions should be as easy to control as the data it refers to. In this
sense, the act of cataloguing a collection of items must be understood as a process of knowledge
representation.
As information systems increasingly take over information-related activities, and
the volume of work with digital content grows, it becomes mandatory that the processes that generate
digital data also record its description and location, for future reference and use.
3.3. NATIONAL AND INTERNATIONAL INITIATIVES AND STANDARDS
In the 1990s, there was an enormous boost in the use of the Internet to disseminate
sets of geospatial data widely, with the understanding that they would be a basic element
of environmental management. In the United States, Europe, Australia and New Zealand, to
mention the most advanced initiatives, a common set of concerns emerged, motivated by
factors such as:
· The advent of the sig;
· Recognition that the duplication of efforts in using the sig should be reduced, through
integration among data producers and geospatial information users; and
· The certainty that an adequate infrastructure would allow the dissemination of digital
information, optimizing its shared use.
A search for the Internet addresses of similar projects shows that each of them followed its own
path in developing a policy for classifying and organizing digital information. However,
as would be expected, these paths slowly converged in the search for more universal
solutions. This happened both as a consequence of the Internet's need for standardization and in
view of the interests of information users and producers.
In this respect, the American proposal is paradigmatic. In 1994, the NSDI (National Spatial Data
Infrastructure) was officially created to coordinate the collection and management of
georeferenced data in the United States. At that time, it was thought that the recording of digital
maps by federal agencies was inefficient, due to the lack of planning and coordination of the
activities.
As the federal agencies' budgets were reduced in the early 1990s, while data requirements and
capacities grew, it became more important to promote integration between the private
and public sectors.
With that in mind, one of the basic objectives of creating a virtual data infrastructure was
to improve the understanding of the SGDs and of the strategies for collecting data for them. Those
strategies did not aim at the immediate sharing of the SGDs themselves, but at sharing
their collection and maintenance costs through data interchange agreements.
Through Executive Order 12906 (April 13th, 1994), the President of the United States
issued a directive for implementation of the National Spatial Data Infrastructure, with the
following definitions:
· National Spatial Data Infrastructure (NSDI): Technologies, policies, standards and human
resources for acquiring, processing, storing, disseminating and improving the use of
georeferenced data.
· Georeferenced Data: Information that identifies the geographical location and
characteristics of natural or constructed features and boundaries on the Earth's surface. This
information may be obtained by remote sensing, cartographic or surveying methods, among others.
This definition might include statistical data, at the discretion of the
collecting agency.
· The National Geospatial Data Clearinghouse: A network of georeferenced data
producers, managers and users, electronically connected.
These words, intentionally generic, allowed the development of a universal solution
for spatial data sharing. Unfortunately, the vague definition of georeferenced data allowed
some agencies to consider their information non-georeferenced, avoiding submission to the
Executive Order. The power of the authorities to enforce this type of regulation is limited.
Despite these limitations, a great number of organizations recognize the benefits of this
partnership. Among those benefits are free publicity, sharing of information, use of
low-cost or free software and reduced costs through common approaches to data
development.
3.4. CLEARINGHOUSE
The Geospatial Data Clearinghouse is a central body of the NSDI and was specifically cited
in the Executive Order as an electronic catalogue service, in harmony with the FGDC's standards
for digital spatial metadata.
Federal agencies in the United States are obliged to place their metadata on a node accessed
by the Clearinghouse. The GIS experts in those agencies must consult the georeferenced data
available in the Clearinghouse before collecting or producing new data. In this manner,
georeferenced SGDs can be used for multiple purposes besides those they were originally
meant for.
The Geospatial Data Clearinghouse, conceived and implemented by the U.S. Federal
Geographic Data Committee (FGDC) [2], has operated in the United States since 1996, with the
objective of providing services for the consultation and retrieval of all kinds of geographical
information. Originally meant to contribute to the sharing of data collected by the government, its
technology spread widely through different institutions (governmental, private and
academic) in the United States and in other countries. It connects GIS users to data suppliers
of all types.
The Clearinghouse provides the NSDI with a consolidated virtual space, where
geographical information can be found through simple searches. A user interested in finding
geographical information fills in a single request form, specifying geographic, date and text fields,
which is directed to all the registered servers.
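The single request form directed to all registered servers can be sketched as follows. This is an illustrative Python sketch, not the Clearinghouse's actual protocol or software: the `Entry` and `Query` classes, the in-memory "servers" and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    title: str
    bbox: tuple       # (west, south, east, north) in decimal degrees
    published: date


@dataclass
class Query:
    text: str         # text field of the request form
    bbox: tuple       # geographic field
    start: date       # date fields
    end: date


def boxes_overlap(a, b):
    """True when two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def matches(entry, query):
    """An entry matches when text, area and date all satisfy the query."""
    return (query.text.lower() in entry.title.lower()
            and boxes_overlap(entry.bbox, query.bbox)
            and query.start <= entry.published <= query.end)


def search_all(servers, query):
    """Send the same query to every registered server and merge the hits."""
    hits = []
    for name, entries in servers.items():
        hits.extend((name, e.title) for e in entries if matches(e, query))
    return hits


# Hypothetical registered servers, each holding its own metadata entries.
servers = {
    "server_a": [Entry("River basin hydrology map",
                       (-47.0, -21.0, -36.0, -7.0), date(1999, 5, 1))],
    "server_b": [Entry("Hydrology of a coastal bay",
                       (-43.5, -23.1, -42.9, -22.6), date(2001, 3, 15)),
                 Entry("Soil survey",
                       (-46.0, -20.0, -40.0, -10.0), date(2000, 1, 1))],
}
query = Query("hydrology", (-45.0, -15.0, -40.0, -10.0),
              date(1995, 1, 1), date(2002, 12, 31))
results = search_all(servers, query)
```

With these sample values only the entry on `server_a` satisfies all three fields: the coastal bay entry lies outside the requested area and the soil survey fails the text match. In a real deployment each server would be queried over the network using a common search protocol rather than through a shared dictionary.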
There is a great diversity of user interfaces in the Clearinghouse, but all of them use the same
search approach. This is achieved through the use of a common descriptive vocabulary (metadata) and
common search protocols, in addition to a registration system for metadata servers.
The Clearinghouse presupposes both ownership and sharing of the metadata. On the Internet, similar
activities took a centralized approach to metadata management, placing the metadata in a
single index stored on one or several replicated servers.
In an increasingly dynamic data management environment, keeping metadata and
their indices synchronized is becoming more and more difficult. This problem is noticed when, during a
web search, a "page-not-found" error shows up, indicating that the document has been moved.
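The synchronization problem between a central index and the sources it points to can be made concrete with a small Python sketch. The record identifiers and locations are invented for the example, and the comparison logic is only one simple way, assumed here, of detecting entries that would lead to a "page-not-found" error.

```python
def stale_entries(central_index, source_catalogue):
    """Return the ids whose indexed location no longer matches the source.

    Both arguments map a record id to the location of the document:
    central_index holds the location recorded when the index was built,
    source_catalogue holds the location currently known to the data owner.
    """
    stale = {}
    for record_id, indexed_location in central_index.items():
        current = source_catalogue.get(record_id)
        if current is None:
            stale[record_id] = "removed"
        elif current != indexed_location:
            stale[record_id] = "moved"
    return stale


# Hypothetical identifiers and locations, for illustration only.
central_index = {
    "doc-1": "http://example.org/maps/basin.html",
    "doc-2": "http://example.org/reports/plan.html",
    "doc-3": "http://example.org/tables/rain.html",
}
source_catalogue = {
    "doc-1": "http://example.org/maps/basin.html",
    "doc-2": "http://example.org/archive/plan.html",
}
report = stale_entries(central_index, source_catalogue)
```

In this example `doc-2` is reported as moved and `doc-3` as removed. When the metadata are kept by the data owner next to the data, this comparison becomes unnecessary, because there is no separate copy to fall out of date.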
The important thing, as already mentioned, is the joint management of the metadata and their
respective data in the same database. The organizations that already manage spatial data and are
interested in disseminating them are the best candidates for maintaining their own metadata. When
the metadata are kept together with the data, on the same server, they are more likely to be
updated and detailed than metadata held in external indices.
The Clearinghouse is more than a registration catalogue for consultation and data retrieval; it
includes request mechanisms, data search maps and other information. The metadata perform
three roles:
· Recording the location of the information;
· Detailing the content and structure of the information; and
· Providing instructions regarding its proper use.
A traditional directory, similar to those found in modern libraries, provides only information
regarding the location of the information. In the age of digital information, the boundaries
between directory and data are subtle, pointing to a broad form of information management, which might
be exploited by software or directly by the user, for multiple purposes.
[2] The FGDC began its work in 1992, and its metadata standard was approved in 1994.
3.5. INITIATIVES PRIOR TO THE CLEARINGHOUSE CONCEPT
3.5.1. Alexandria Project
Funded by the National Science Foundation, it is supported by the Alexandria Digital Library,
housed in the Davidson Library, in California. It is an attempt to build a distributed
library of geospatial digital resources, using traditional library cataloguing methods and a single
metadata repository. Its experience in the construction of t is considered important.
3.5.2. National Geospatial Data Framework (NGDF)
This British initiative aimed to solve a series of problems diagnosed by the Chorley
Report (Report of the Committee of Enquiry into the Handling of Geographic Information).
Prominent among those problems were improperly documented, incompatible and inconsistent SGDs.
Additionally, the initiative sought to expand the use of geospatial data, with a view to
developing a market for data and associated services and achieving greater data reliability.
The initiative presents itself as the provider of an infrastructure whose mission is to improve data
quality and availability, for the benefit of society, business expansion and
improved governmental administration, by means of better, more user-friendly services.
The core of the Project is a metadata base, which supports an Internet service that allows searches
for SGDs produced by the public and private sectors. The adopted metadata standard follows
closely the one proposed by the American initiative, while maintaining strong adherence
to the Draft ISO Standard 15046-15 (Geographic Information Metadata).
The initiative provides for a standard for metadata transfers, in compliance with the SGML
(ISO 8879) format and with the XML format proposed by the W3C. This standard regulates the coding of 42
descriptors, considered basic as "search metadata", in the above-mentioned formats.
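As an illustration of what coding search metadata in XML for transfer might look like, the Python sketch below serializes a small record and reads it back with the standard library. The element names and the three sample descriptors are hypothetical; they are not the 42 descriptors of the NGDF standard, which the report does not list.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize a flat dictionary of descriptors as a <metadata> XML element."""
    root = ET.Element("metadata")
    for name, value in record.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text):
    """Parse the XML produced by record_to_xml back into a dictionary."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


# Hypothetical descriptor names and values, for illustration only.
record = {"title": "Sub-basin master plan", "originator": "ANA", "pubdate": "2002"}
xml_text = record_to_xml(record)
restored = xml_to_record(xml_text)
```

Because sender and receiver agree on the element names, the record survives the round trip unchanged, which is the property a transfer standard is meant to guarantee between participating institutions.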
3.5.3. Australia New Zealand Land Information Council (ANZLIC)
It promotes a model similar to the American one, aimed at building a spatial data
infrastructure for Australia and New Zealand. The 1994/97 strategic plan set the
guidelines for establishing a geographical data reference system, with a view to the "maximization of
community access to geographical information, taking into account privacy and
confidentiality".
Since 1995, a special group has worked on the metadata standards to be adopted, following closely
the American proposal, defining the mandatory elements and discussing the creation,
maintenance and control of this metadata base, as well as specific problems associated with data
transfer among the participating agencies.
3.5.4. Canadian Geospatial Data Infrastructure (CGDI)
An initiative of the Inter-Agency Committee on Geomatics (IACG) and of the Canadian Council
on Geomatics to bring together public and private interests in the production, use
and dissemination of geospatial information. Five themes guide the initiative:
· Partnership among federal, provincial and municipal governments and industry.
· Policies focused on reducing bureaucratic barriers to the sharing of data.
· Geospatial standards developed at national and international levels.
· Access to geospatial data through a network of distributed resources.
· A digital base for classifying the geospatial data, to assure the efficacy of the value added
to the gathering and production of geospatial information.
The last theme is extremely important, as it addresses one of the greatest problems: the
adjustment and integration of the coordinate bases of the geospatial information produced by
the various institutions.
A by-product of the Canadian initiative is the implementation of the Canadian Earth Observation
Network (CEONet), aimed at the storage and dissemination of geospatial data through
homepages in a distributed network structure.
3.5.5. European Spatial Metadata Infrastructure (ESMI)
This initiative brings together public and private organizations from European countries, such as
Great Britain and Portugal, with the aim of establishing a service for the distribution of
geographical information. Supported by the European Union's INFO 2000 Program, the
Project develops Internet mechanisms for linking different geospatial data reference systems
through metadata (http://esmi.geodan.nl/).
3.5.6. LaClef
Another European Project within INFO 2000, aimed at disseminating geographical
information for the public sector. Building on its metadata system (GDDD), which includes interfaces
in several languages, the intention is to develop a prototype able to demonstrate the possibilities
of data distribution through an electronic market.
The GDDD, distributed free of charge on the Internet, is based on the American FGDC standards,
which were used as the basis for developing the European standard, ENV 12657. The metadata standards
for consistent referencing of georeferenced SGDs are already well
consolidated. Today, there is a noticeable effort to also define standards to regulate the transfer of
digital SGDs between interested parties.
4. CONSTRUCTION OF THE SÃO FRANCISCO'S INFORMATION NETWORK
For ANA, this initiative plays an important role. This reference system will help
in organizing the large data bank that is to be installed there, even while the data
are not yet in the desired digital format.
Between January and March of 2002, ANA developed software that serves as an engine for the
geospatial data information network, with the aim of using it in the Basins included in its
"management map".
This Activity, which creates an information network for the São Francisco Basin, is fully consistent
with that strategy.
4.1. ADAPTING THE SOFTWARE FOR USE IN THE SÃO FRANCISCO BASIN
The São Francisco's Information Network (RISF) was developed from an information system
previously built for a similar Project in the Baía de Guanabara and later adapted for the
Paraíba do Sul Basin. Over time, it underwent graphical and operational
modifications, to become the São Francisco Basin's site engine. By the close of
the Activity, the RISF was hosted on ANA's server (risf.ana.gov.br).
4.2. PERSONNEL RECRUITING, SELECTING AND TRAINING
To form the field team that would survey the existing data on the São Francisco Basin in the
institutions selected to participate in the Activity, students from the Geography College at UnB
were invited. The choice is justified by their presumed familiarity with the type of information
targeted for the network. At the conclusion of the work, in December of 2002, a group of 15
people had been trained, at ANA and at other participating institutions.
4.3. DEFINITION OF THEMES AND DATA OF INTEREST TO THE NETWORK
The themes of interest for the data surveys in the initial stages were defined jointly with
staff of the Superintendency of Hydrological Information (SIH/ANA) and reflect the Agency's
focus of interest. For the metadata, the source was the FGDC standards, previously adopted
for the Baía de Guanabara.
4.4. IDENTIFYING INSTITUTIONS WILLING TO CONTRIBUTE WITH DATA
Besides ANA, the following institutions were considered:
· The Company for the Development of the São Francisco and Parnaíba Basins
(CODEVASF), with which a Technical Cooperation Agreement allowed the survey and
registration of the SGDs;
· The National Electrical Energy Agency (ANEEL), which authorized the registration of
its information banks while a cooperation agreement was being prepared to formalize the
partnership;
· The Brazilian Institute of Environment and Renewable Natural Resources (IBAMA), which
allowed registration of the databanks on the São Francisco stored at the National
Center for Information, Environmental Technologies and Editorial Services (CNIA)
while the Cooperation Agreement was being processed;
· The Research and Mineral Resources Company (CPRM), which, in addition to the
documents found in the GEF São Francisco's banks, has been adding new records
using its own personnel;
· The National Institute of Meteorology (INMET), with which no agreements were
established;
· The National Department for Works Against Droughts (DNOCS), not considered at this
stage because its libraries are located in Fortaleza; and
· The Water Resources Secretariat (SRH), whose documentation of interest had already
been transferred to ANA, where it had already been registered.
4.5. EVALUATION OF EXISTING DATA IN THE PARTICIPATING INSTITUTIONS
Once the data to be obtained from each participating institution had been defined, it was possible
to assess their extent and status, and to estimate the work necessary for their registration. The
results were as follows:
ANA
There are more than 600 documents on the São Francisco, mostly sub-basin Master Plans
prepared by the SRH in partnership with the States.
CODEVASF
It holds more than 3,900 documents, excluding duplicates and those on themes outside the scope
of the work. It is estimated that more than 2,000 of those are important to the planning and
management of the Basin's water resources.
ANEEL
There are more than 10,000 documents on the São Francisco, but most of them are geographical
charts in which several sheets are needed to cover the Basin, so they result in few records.
IBAMA
About 200 documents that might interest the RISF were identified.
4.6. TRAINING PERSONNEL FROM THE PARTICIPATING INSTITUTIONS
One of the conditions for continuation of the Network, after the Activity ends, is the participation
of the involved institutions, which will be in charge of updating their data. The objective of this
training is to create a basis for this community. All partner institutions designated employees to
accompany the Activity's team in registering the data. However, there was no commitment
regarding those personnel continuing the job.
The expectation of changes in the leadership of those institutions was partly responsible for
their reluctance to take on new responsibilities. Supporting the System will also require
additional work, and the institutions already complain of lacking personnel for their current
routine tasks, so this new activity would be an overload.
4.7. SURVEY AND REGISTRATION OF THE EXISTING DATA
This activity took place at ANA, IBAMA, ANEEL, CODEVASF and CPRM. Some of the registered
documents were originally created by other agencies, which explains the longer list of
institutions in Chart 1.
4.8. CONSTITUTING RISF'S COLLEGIATE
Constitution of the RISF's Collegiate depends on the establishment of partnerships among ANA,
IBAMA, ANEEL, CPRM and CODEVASF, currently under negotiation. This Activity recommends
that ANA carry on this work, for which its General Secretariat has already been appointed;
creating the Collegiate will thus be one of its duties.
To that end, each of those institutions will appoint representatives. This group will meet
periodically to discuss matters related to the network, such as improvements to equipment,
servers and software.
5. RISF'S DESCRIPTION
5.1. COMPONENTS
The RISF is composed of three elements: a metadata bank, a database and a "magazine" about
the Basin (see Figure 1).
Figure 1. RISF'S COMPONENTS.
The "metadata bank" is a conventional database, consisting of a collection of tables implemented
in a relational database management system (SQL Server), which stores the SGDs' metadata.
These data are to be entered by the participating institutions and can be queried by diverse
criteria.
What is being called a database is actually a collection of SGDs in digital format, organized in
folders that might be stored on the server itself or on any other computer accessible to the
System. The portion residing on the RISF's server is used for searches that consider the SGD as a
whole (not only its associated metadata).
The third component, the "magazine", presents content in diverse formats (text, images, sounds,
etc.) to anyone navigating the Internet. This component is suitable for presenting articles
about the Basin, in the form of an on-line magazine.
5.2. SYSTEM'S ACTORS
Around these components, the RISF involves a group of actors, who may be classified as
data/metadata suppliers, System administrator, "magazine" editor and general public. The Report
defines the role of each of them.
5.3. SUBSYSTEMS
The RISF System is composed of two large subsystems: the Public Subsystem and the
Administration Subsystem. The first may be accessed by anybody navigating the Internet. It
includes the pages of the "magazine" and other services (searches, "talk to us", forums, etc.).
The homepage of the RISF is presented in Figure 2.
The Administration Subsystem (backoffice) provides the facilities required for maintaining the
webpages, the metadata bank and the SGDs stored for download. It is password protected, and is
displayed as shown in Figure 3.
Figure 2. RISF's Homepage.
Figure 3. Access to the Administration Subsystem (backoffice).
5.4. OPERATIONAL STRUCTURE OF THE PUBLIC SUBSYSTEM
The System's services and functions are the following:
HOMEPAGE - The "welcome page" presents an introductory text, stating objectives,
administrators and main services.
NEWS - Selection of additional pages, aimed at guiding regular users to the newly
inserted pages.
SEARCHES - Service providing direct access to the site's contents. As the site has three
distinct fields (metadata bank, database and "magazine"), the search engine is also divided into
three, one for each field.
Metadata Search - Implemented on the metadata bank (references to the documents catalogued
in the System). It may combine the following criteria cumulatively:
· Geographical location, by Municipality;
· Geographical location, by Basin/Sub-basin;
· Selection by period (initial and final dates);
· Selection by keyword; and
· Selection by specific content.
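The cumulative use of the Metadata Search criteria can be illustrated with a minimal sketch, in
which every criterion supplied must be satisfied at the same time. It is not the System's actual
code (the RISF runs on ASP and SQL Server); the record fields, the function name and the sample
records are assumptions made only for this illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MetadataRecord:
    # Hypothetical fields; the actual RISF metadata schema is not shown in this report.
    title: str
    municipality: str
    sub_basin: str
    published: date
    keywords: set = field(default_factory=set)
    abstract: str = ""


def search(records, municipality=None, sub_basin=None,
           start=None, end=None, keyword=None, text=None):
    """Return the records matching every criterion that was supplied (logical AND)."""
    results = []
    for r in records:
        if municipality and r.municipality.lower() != municipality.lower():
            continue
        if sub_basin and r.sub_basin.lower() != sub_basin.lower():
            continue
        if start and r.published < start:
            continue
        if end and r.published > end:
            continue
        if keyword and keyword.lower() not in {k.lower() for k in r.keywords}:
            continue
        if text and text.lower() not in (r.title + " " + r.abstract).lower():
            continue
        results.append(r)
    return results


# Illustrative records only, not real RISF entries.
SAMPLE = [
    MetadataRecord("Sub-basin master plan", "Pirapora", "Upper São Francisco",
                   date(1998, 5, 1), {"master plan", "water resources"}),
    MetadataRecord("Irrigation survey", "Petrolina", "Lower-Middle São Francisco",
                   date(2001, 3, 15), {"irrigation"}, "Survey of irrigated areas"),
]
```

Calling search(SAMPLE, municipality="Petrolina", keyword="irrigation") would narrow the result
to the second record, while a call with no criteria returns everything.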
Search by Content - Done over a subgroup of the catalogued documents, those with copies available
for download. This service reaches the contents of the files stored at the site (txt,
htm/html, pdf, doc, xls, mdb, among other extensions). In addition to those files, the search
covers the entire content of the metadata bank, as it is reproduced in *.txt format.
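A search by content of this kind can be sketched as a walk over the folder of stored files. The
sketch below is an assumption-laden illustration, not the RISF's search engine: the function name
is hypothetical, and only plain-text formats are read, since pdf, doc, xls and mdb files would
need format-specific text extraction.

```python
from pathlib import Path

# Assumption: only plain-text formats are read directly here; pdf, doc, xls and
# mdb files would need format-specific text extractors, which are omitted.
TEXT_EXTENSIONS = {".txt", ".htm", ".html"}


def search_by_content(root, term):
    """Return sorted paths (relative to root) of text files containing term.

    The comparison is case-insensitive.
    """
    root = Path(root)
    needle = term.lower()
    hits = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in TEXT_EXTENSIONS:
            continue
        content = path.read_text(encoding="utf-8", errors="ignore")
        if needle in content.lower():
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

Because the metadata bank is also reproduced in *.txt format, placing that export in the same
folder would make it reachable by the same walk.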
Search in the webpages - This search is done in the pages of the "magazine". It may also be
done on any page of the site, using the "search in the address" field located at the
bottom of the pages. The result of these searches is a list of the pages containing the
desired text.
TALK TO US - This page is used to exchange e-mails with the RISF's administrators. The
form presented allows the selection of one among several topics.
GUEST BOOK - Space reserved for users' critiques and suggestions. Information included here
may be read in the backoffice by the System's administrator, who will decide about its
publication in the RISF.
ADDRESS MAP - List of the addresses in the site, with a brief description of each one.
BASIN'S MAGAZINE - Set of pages and sections presenting the most prominent themes in the
Basin. These pages may be of two types: indexed and non-indexed.
6. POSITIVE AND NEGATIVE ASPECTS OF DATA COLLECTION
6.1. POSITIVE ASPECT
6.1.1. Integration among Institutions
Since there is room for integrated action among institutions with interests in the same
territory, especially in the production and exchange of technologies, this initiative brings
them closer together and promotes their integration.
6.1.2. Economy
By taking part in this joint effort, those institutions will save resources. Today, each one
generates the data it needs without checking whether they have already been produced by other
institutions. Those efforts could be shared, resulting in savings.
6.1.3. Improving quality of the information
Once made available to a larger number of users, the information tends to improve in its
standards of quality, with positive effects on form and content.
6.1.4. Quick location of the information
If the information keepers in each of the institutions make their "announcements" in the System,
using metadata, a search engine such as the one currently used will be able to locate them easily,
according to diverse criteria.
6.2. NEGATIVE ASPECT
6.2.1. Difficulties in establishing partnerships
Although it wasn't difficult to win the support of intermediate-level employees at the visited
institutions, introducing the System and proposing a partnership at the Directorate level was
remarkably difficult, due to bureaucratic barriers.
These obstacles reflect the need for formal collaboration agreements (Terms of Cooperation),
which take a long time to be formalized and require great effort from the directorates of the
institutions involved.
6.2.2. Data survey difficulties
Implementing the data survey at a participating institution requires at least an office with a
computer with Internet access. Each participant must provide these facilities to those
in charge of the work.
6.2.3. System updating difficulties
Even after the survey of the libraries, the work is not done. New data will continually be added
to the databanks, and the System will need constant updating. That updating should be done by the
data generators themselves. Therefore, their personnel must be trained for the activity and
regularly asked about it.
6.2.4. Difficulties in constituting and maintaining a Network administration body
As suggested in this Activity's Terms of Reference, it is essential to create a collegiate that
brings together all the participating institutions to coordinate all activities supporting the
System. This collegiate must be permanently in touch via the Internet. Nevertheless, there should
be periodic meetings or seminars to establish guidelines for improving the initiative.
7. STATISTICS ON THE OBTAINED DATA
The Activity anticipated the search for two types of information in the participating
institutions: the metadata and the documents themselves. The former are the descriptive
references to the documents found. The documents include articles and maps in digital format,
which allows their distribution via the Internet.
7.1. OBTAINED METADATA
Chart 1 presents statistics of the registered metadata, by hosting institution.
Chart 1. Number of metadata registers, by hosting institution (01/10/02-31/03/03)
Institution | Registers
ANA - National Water Agency | 662
ANEEL - National Electrical Energy Agency | 97
CHESF - Hydropower Company of the São Francisco | 34
CODEVASF - Company for the Development of the São Francisco and Parnaíba Basins | 338
CPRH - Company of the Environment of the State of Pernambuco | 3
CPRM - Research and Mineral Resources Company | 28
EBAPE - Supply and Rural Extension Enterprise of the State of Pernambuco | 5
EMBRAPA - Brazilian Enterprise of Agriculture and Livestock Research | 7
EMDAGRO - Enterprise of Agriculture and Livestock Research of the State of Sergipe | 9
Roberto Marinho Foundation - Globo Television Network | 1
FUNDAJ - João Pinheiro Foundation | 34
IBAMA - Brazilian Institute of the Environment and Renewable Natural Resources | 117
IBGE - Brazilian Institute of Geography and Statistics | 1
IHGS - Historic and Geographic Institute of Sergipe | 7
IMA - Institute of the Environment of Alagoas | 38
Tobias Barreto Institute | 14
LABMAR/UFAL - Nature and Ocean Sciences Integrated Laboratory | 8
Ministry of Agriculture | 1
Ministry of the National Integration | 1
SAGRI - Secretariat of Agriculture, Supply and Irrigation | 3
SEBRAE - Small and Micro-Enterprise Support Office | 1
SESC - Social Services of the Commerce | 6
SUDENE - Superintendence of the Development of the Northeast | 10
UFAL - Federal University of Alagoas | 24
UFPE - Federal University of Pernambuco | 25
UFRPE - Federal Rural University of Pernambuco | 1
UFSE - Federal University of Sergipe | 10
UnB - University of Brasília | 1
Total | 1,486
7.2. COLLECTED DOCUMENTS
Practically all the libraries surveyed are of the conventional type, holding hard copies, and for
this reason were not fully transferred to the databanks. Some documents that were in digital
format exceeded the recommended size for insertion (limited to 2 megabytes). Larger files would
take too long to download, impairing the server's capacity to handle additional requests.
8. CONCLUSIONS
8.1. ANA'S INTERNAL REINFORCEMENT
In view of the data it inherited, in addition to those it generates, ANA will certainly benefit
from the services made available by the São Francisco's Information Network.
8.2. TECHNOLOGY
The Network developed by this Activity, as a technical tool, is in line with other
initiatives adopted in the United States and in European countries. A simple search on
Google.com with the terms "Clearinghouse" and "geo" returned 16,900 links, and the first results
were systems similar to the present one.
The system used as the network's engine is technologically up-to-date, written in ASP,
with an SQL Server database management system. This System is installed at ANA.
8.3. CONTENT
Despite some initial difficulties regarding the installation of the System, the Activity has
registered more than 1,400 documents, including some of the most valuable technical data on the
São Francisco Basin, the Master Plans. The results obtained may be summarized as follows:
· Libraries of ANA's and CNIA/IBAMA interest;
· At ANEEL, registration covered 20% of all the data on themes of interest;
· At CPRM and CODEVASF, most of the registered metadata had already been covered by a survey
done for the GEF São Francisco Project.
8.4. QUICK SEARCH
To carry out any study on the basin, it is always necessary to conduct ample bibliographical
research, looking for previous studies in the institutions that executed them. This
takes a long time, to the detriment of the time available for the investigation itself.
Using the search engine of the RISF (http://risf.ana.gov.br), it is possible to find data of
interest in the libraries of ANA, IBAMA, ANEEL, CODEVASF and CPRM. This will be extended to
other institutions in the future. The replies given by the System include a classification of the
data, allowing the user to decide on their usefulness.
8.5. INTEGRATING THE INSTITUTIONS OPERATING IN THE BASIN
The implementation of the network creates the expectation that the participating institutions will
work in close integration and will effectively share their data, among themselves and with the
interested public. This will permit savings in the production of the information required for new
projects, which will be able to check what already exists in the network, thus avoiding
duplication of effort.
8.6. THE SITE OF THE SÃO FRANCISCO BASIN
As the network becomes a reference for those looking for information on the São Francisco
Basin, its electronic pages will be increasingly accessed by the interested public. Therefore,
these pages and links, besides organizing all the existing information on the Basin, will
disseminate the governmental actions in the area.
The available tools in the RISF will contribute to the constitution of a community with interests
focused on the Basin. However, this aspect must be taken into consideration in its administration.
9. RECOMMENDATIONS
9.1. INTERNALIZING THE RISF IN THE ANA
· Strengthening ANA's Library and Technical Archive, a unit subordinate to the General
Secretariat, which the Directorate appointed as responsible for carrying on the RISF at the
Agency.
· Making a contract for maintenance of the RISF System, at a total estimated cost of US$ 500.
The Agency's strengthening will be achieved through training in the use of the System and in
the filling of metadata registers. Two professionals are to be designated: an editor (with a
journalist profile and a background in social communication) and a technician with expertise in
metadata quality control and knowledge of the type of information kept by the participating
institutions. It is anticipated that these two contracts will correspond to monthly wages of
US$ 800 and US$ 600, respectively.
9.2. CONTINUATION OF THE RISF
· Persuading directors and training the personnel working with libraries and technical
archives in the participating institutions (CODEVASF, IBAMA, CPRM and ANEEL), with the
objective of keeping the Network up to date with the inclusion of new data.
· Continuing with the establishment of Terms of Cooperation between ANA and the institutions,
with a view to formalizing the existing partnerships. The instrument of cooperation must
include, as an obligation of the parties, the permanent updating of the metadata registers. This
obligation should be part of the network's future internal operational rules.
· Aiming at the constitution of a collegiate body for the RISF's administration.
9.3. INCREASING THE BASIN'S DATA COVERAGE
· Establishing Terms of Cooperation with potential partner institutions, with a view to
expanding the Network.
Only a small group of Federal Administration institutions, all located in Brasília, had their
databanks worked on in this initial stage. The survey carried out by the Activity's team
recommends partnerships with the following:
· Ministry of the National Integration;
· Ministry of the Cities;
· Ministry of Agriculture and Supply;
· Brazilian Water Resources Association (ABRH);
· National Operator of the Electrical System (ONS);
· Thermal-Electrical Expansion Committee (CAET);
· National Association of Municipal Sanitation Works (ASSEMAE);
· National Research in Basic Sanitation (PNSB/IBGE);
· Brazilian Enterprise of Agriculture and Livestock Research (EMBRAPA);
· Hydropower Company of the São Francisco (CHESF), with headquarters in Recife;
· National Department of Works Against Droughts (DNOCS), with headquarters in Fortaleza;
· Superintendence of the Development of the Northeast (SUDENE), in Recife;
· São Francisco's Navigation Company, in Pirapora;
· Regional Directories of the Company for the Development of the São Francisco and Parnaíba
Basins (CODEVASF); and
· Federal Universities located in the basin.
Besides those, at the local level, the State Environment Organs (OEMAs), environment-related
NGOs and research institutes in the Basin deserve mention, as well as the municipal
environment secretariats of the most important Municipalities.
9.4. EXTENDING THE SYSTEM TO OTHER BASINS
· Looking for financial support for the projects.
9.5. IMPROVING THE SERVICES RENDERED BY THE SYSTEM
Contact with other institutions, besides ANA, brought new ideas for the System:
· Sharing responsibilities for production of the contents.
Assuming that each participant will have pages in the RISF, the CPRM requested autonomy to
create and update its own webpages. It wants to handle its contents directly, even if they are to
be reviewed by an editor. This request, if approved by the other participants, might be
implemented in the System through changes in its programs.
· Incorporating Otto Pfafstetter's Method of Basin Classification.
The ANA uses the basin ordering method developed by Otto Pfafstetter [3]. With this method, a
sub-basin is coded in a way that its rank reflects its relative position with reference to other
sub-basins in the same basin. The RISF may use this property to make its search engine more
effective in searches by sub-basins.
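How such codes could support searches by sub-basin can be sketched as follows. The sketch assumes
the usual form of the Pfafstetter system, in which each subdivision appends one digit to the code
of the parent basin, even digits denote tributaries and odd digits denote interbasins along the
main stem. The function names and the codes used are hypothetical and are not actual ANA codes.

```python
def is_within(code, basin):
    """True if sub-basin `code` lies inside `basin` (its code starts with the basin's)."""
    return code.startswith(basin)


def sub_basins_in(codes, basin):
    """Select, from a list of codes, those nested inside `basin`."""
    return [c for c in codes if is_within(c, basin)]


def is_downstream(candidate, reference):
    """True if `candidate` lies on the flow path downstream of `reference`.

    Rule as commonly stated for Pfafstetter codes: drop the shared leading
    digits; the candidate's first remaining digit must be lower than the
    reference's, and all of its remaining digits must be odd (main-stem
    interbasins).
    """
    i = 0
    while i < min(len(candidate), len(reference)) and candidate[i] == reference[i]:
        i += 1
    rest_c, rest_r = candidate[i:], reference[i:]
    if not rest_c or not rest_r:
        return False  # identical or nested codes: not handled in this sketch
    return rest_c[0] < rest_r[0] and all(d in "13579" for d in rest_c)
```

With documents indexed by such a code, a search for a basin could return everything registered
for its sub-basins through a simple prefix comparison, without consulting a map.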
· Document search by geographical coordinate.
As data on the geographical coordinates of the registered documents accumulate, the System
may offer another type of search criterion, based on those coordinates. Following the methodology
used by the FGDC, in the United States, a map can be used as a base on which the region of
interest is marked. Once the precise geographical coordinates are defined, the
System will be able to provide information on the requested area.
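A search of this kind can be sketched with bounding rectangles, following the west, east, south
and north bounding coordinates that the FGDC metadata standard records for each data set. The
class, the function and the sample coordinates below are assumptions made for illustration; they
are not part of the RISF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Decimal degrees; west/east are longitudes, south/north are latitudes.
    west: float
    east: float
    south: float
    north: float

    def intersects(self, other):
        """True if the two rectangles overlap or touch."""
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


def search_by_area(documents, area):
    """Return the titles of the documents whose bounding box meets the chosen area."""
    return [title for title, box in documents if box.intersects(area)]


# Illustrative entries only, with made-up coordinates.
DOCUMENTS = [
    ("Document A", BoundingBox(west=-45.5, east=-44.0, south=-18.0, north=-16.5)),
    ("Document B", BoundingBox(west=-41.0, east=-40.0, south=-10.0, north=-9.0)),
]
```

The rectangle marked by the user on the base map would be passed as the area argument, and the
System would list the documents whose coverage meets it.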
9.6. TERMS OF REFERENCE FOR CONSULTING SERVICES (STAGE II)
Description
The several institutions operating in the basin generate information which should be shared,
contributing to greater efficiency of the projects that use it, thus improving the quality of
their products and reducing costs.
The RISF, created through the GEF São Francisco Project (ANA/GEF/UNEP/OAS), promotes
this initiative, using the databanks of four governmental institutions located in Brasília. Other
institutions, at other administrative levels and in other locations, also have libraries of
importance to the São Francisco, which could be added to the Network. These Terms of Reference
have that purpose.
[3] PFAFSTETTER, O - Classifying Basins Codification Methodology, Rio de Janeiro, RJ; DNOCS, 1989.
Antecedents
In February 2001, the Baía de Guanabara's Information Center (CIBG) was created, inspired
by the Federal Geographic Data Committee (FGDC). The CIBG's experience encouraged, besides
the RISF, another system for the Paraíba do Sul Basin, which is still waiting for its data to be
entered.
Objectives
· Creating a database on the São Francisco Basin, with references to the information
distributed among the diverse institutions operating in the Region.
· Creating data search and dissemination mechanisms, to promote their sharing among
current and potential users.
· Aggregating the institutional users of those data into a collegiate type of organization
around the information network. This organization will be responsible for the
maintenance of the bank of references, assuring its continuity.
Participating Organizations
The new project will also be hosted by the ANA, in Brasília. Other Federal and State institutions
that hold information on the São Francisco Basin will participate. All the institutions listed in
the previous item 9.3A will be invited to participate in the Network.
Other institutions might be added to the list, depending on how readily they offer access to
their data. In the second stage of implementation of the System, the focus will fall on those
Federal organizations not included in the first stage, and on the governmental bodies in the
States located in the Basin. Other institutions might be contacted in a third stage.
Methodology and Activities
After the RISF starts to operate and is accessible through the Internet, the work methodology may
be described in the following terms:
· Establishment of contacts and Terms of Reference between the ANA and the institutions with
potential to participate with data of interest.
· Evaluation of the existing data banks at each of the participating institutions.
· Constituting a team of "cataloguers".
· Recruitment and selection of personnel for data registration.
· Training the personnel.
· Improving the software used, to implement the suggestions presented in item 7.4 of the
Activity's Final Report.
· Training the personnel at participating institutions.
· Survey and registration of the data in each of the visited institutions.
· Constitution of the collegiate of data users.
· Providing means for implementing the project's infrastructure.
Products
· Expanded Information Network (RISF).
· Metadata bank created, with information on the São Francisco Basin.
Anticipated duration and costs
· It is anticipated that the Project will be implemented over a 2-year period.
The anticipated costs are:
Item | R$ | US$ (US$ 1 = R$ 3.50)
Project coordination | 108,000 | 30,857
10 trainees (including a coordinator) | 185,800 | 53,085
Travelling expenses (tickets and per diem allowances) | 36,600 | 10,457
Rental of computers / Office material | 30,000 | 8,571
Computer infrastructure | 38,000 | 10,857
Contingencies | 16,600 | 4,742
Total | 415,000 | 118,570
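The conversion behind the table can be checked with a short calculation, using the stated rate
of US$ 1 = R$ 3.50. The sketch assumes that the US$ column drops the cents, which reproduces each
item above; the variable and function names are only illustrative.

```python
RATE = 3.50  # R$ per US$, as stated in the table

COSTS_BRL = {
    "Project coordination": 108_000,
    "10 trainees (including a coordinator)": 185_800,
    "Travelling expenses (tickets and per diem allowances)": 36_600,
    "Rental of computers / Office material": 30_000,
    "Computer infrastructure": 38_000,
    "Contingencies": 16_600,
}


def to_usd(brl, rate=RATE):
    """Convert R$ to whole US$, dropping the cents (assumed from the table's values)."""
    return int(brl / rate)


total_brl = sum(COSTS_BRL.values())                       # sum of the R$ column
items_usd = {item: to_usd(v) for item, v in COSTS_BRL.items()}
total_usd_from_items = sum(items_usd.values())            # sum of the truncated US$ items
total_usd_converted = to_usd(total_brl)                   # R$ total converted in one step
```

The R$ items add up to the stated total of 415,000. Summing the truncated US$ items gives 118,569,
while converting the R$ total directly gives 118,571, so the 118,570 shown in the table differs
from both by one dollar, presumably because of rounding.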