This section describes how content is crawled and indexed by Krugle Enterprise. Once your data is crawled and indexed, it can then be searched by your organization's users using the client features in Krugle Enterprise. Krugle "Projects" define the collections of content that are searchable with Krugle Enterprise. Krugle Projects can be defined by an Administrator using the Krugle Administration Console.
Projects can refer to a single file or any/all files in any data repository. A data repository can be a file system, a source code management system, an issue tracking system, a database or any other information system that can be accessed over your network.
Factors to consider when setting Krugle Projects
The first step in setting up your Krugle Projects is deciding which content you want to define in Krugle Enterprise. Some of the factors that should be considered in defining individual Krugle Projects include:
- Access control requirements.
- Update (change) frequency of the content.
- The need to easily distinguish one body of content from a similar (branched/forked) set of files when searching with Krugle.
- Granularity of file modification and user access reports.
- Whether or not files from certain sources should be listed ahead of (or below) similar files in Krugle search results.
- The need to easily filter search results by specific versions of version managed content.
- The need to identify persons or groups responsible for the creation, maintenance or protection of content.
- Descriptions that will best result in a match between user queries and the content.
Importance of defining Krugle Projects
The following scenarios illustrate how Projects in Krugle can be defined to ensure good search and reporting results:
- When Krugle users search on content, they can limit their search to include or exclude specified Projects.
- When Projects are defined, the Project creator can specify access control settings for each project. This ensures that only those users with proper credentials will be able to access content in the project.
- The Krugle overview page is organized by Projects. This allows you to see project activity trends related to the Projects you define.
- You can define a Project to control the scope of one or more of the reports in Krugle Enterprise.
- When users search on code they will see files from "boosted" Projects near the top of the list. Administrators can set the Project Rank priority to increase (or decrease) the visibility of selected Projects. This allows companies to ensure that code libraries, reuse components and other valued content collections have appropriate visibility in search results.
- If organizations have multiple releases or versions of a particular body of code or content, they can define each version as a separate Project. When you do this, it is best to include the version identifier in the Krugle Project Name so that the differences between projects are clear to users who later find this information.
- Each content record or file has an easily accessible hypertext link to project information. This link gives users quick access to descriptions and references that help them fully understand the context of the content found in Krugle. As a result, users can quickly locate related source content, relevant documentation, and contacts for the content in question.
- User activity (views, downloads, etc.) can be aggregated by Project to monitor activity for reporting.
- Each Project contains a description; this description is used to assist in matching search results to content. The more unique information entered into a project description, the more likely users are to create relevant matches for the unique terms in the project description.
Projects are most commonly defined as collections of standalone content files, code or data records or libraries of content. These groups of files are commonly interrelated and maintained and accessed as a set of files. Most importantly, these file groupings fit the logical context and expectations of users who will be using Krugle.
Once you've decided how to organize your code into projects, collect the information needed to define each Project. At a minimum, the following information is required for each Project:
- A unique Project name that will allow you and your users to identify your content grouping in Krugle.
- Information (network domains, transfer protocols, access credentials, etc.) that Krugle Enterprise needs to access the systems that manage the files that will be included in the project.
- A keyword rich description of the Project.
If you lack the time or information needed to group content collections into individual projects, the simplest way to make your information searchable in Krugle is to associate a single Krugle Project with all files in each file system, repository or version control system.
The downside of having only one Project per content repository is that less information (in the form of project metadata) is available to create effective queries and refine code search results.
You can add or remove Projects at any time after the initial configuration. This will allow for progressive refinement of Projects managed by Krugle Enterprise.
Some points to consider
- Krugle Basic comes preloaded with an SCMI connector for TFS Work Items.
- If Projects are specified for files that will be accessed through a Krugle Source Content Management Integration SCMI module (instead of directly from a SCM repository using the built-in client connector), it is first necessary to configure and install the appropriate SCMI script on a host system within your organization's network. Consult the Krugle Enterprise SCMI Integration Guide for more information.
Krugle Enterprise Data Access Mechanisms
For a given data repository, Krugle Enterprise can access files through one of two different mechanisms. Before defining a project in Krugle it is important that you understand these approaches:
1. The first approach uses content access services that are built into Krugle Enterprise. These components support specific systems (e.g. Linux file system, Windows file system, Team Foundation Server for code only, SVN, Perforce, CVS) and are configured and managed directly through settings in the Krugle Administration Console.
2. The second approach - using the Krugle SCMI connector interface - accesses data through connection services that are maintained outside of the Krugle Enterprise Appliance. The SCMI connector approach is not limited to specific SCM systems and is managed outside of the Krugle Administration Console. The only information required by Krugle Enterprise for a SCMI Data Repository is the information needed to access the SCMI connector code.
Krugle Basic Note - To download and install SCMI connectors, please contact firstname.lastname@example.org
A Krugle Project consists of one or more Data Sets. A Krugle Data Set is defined by reference to (i) a single Data Repository and (ii) a Data Set Location within that Data Repository.
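The relationship between Projects, Data Sets and Data Repositories can be pictured as a small data model. The following is only an illustrative sketch: the class and field names are hypothetical and do not correspond to any Krugle API or storage format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataRepository:
    # Hypothetical fields; a real repository also carries connection details.
    name: str
    repo_type: str  # e.g. "SVN", "Perforce", "SCMI"


@dataclass(frozen=True)
class DataSet:
    # A Data Set is one Data Repository plus one location inside it.
    repository: DataRepository
    location: str


@dataclass
class Project:
    name: str
    data_sets: List[DataSet] = field(default_factory=list)

    def validate(self) -> None:
        # A Project consists of one or more Data Sets.
        if not self.data_sets:
            raise ValueError(f"Project {self.name!r} needs at least one Data Set")


svn = DataRepository(name="svn-main", repo_type="SVN")
project = Project(name="Billing 2.1",
                  data_sets=[DataSet(svn, "billing/branches/2.1")])
project.validate()  # passes: the Project has one Data Set
```

Thinking in these terms before opening the Administration Console can make it easier to decide which repository locations belong to which Project.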
This section explains how Data Sets are defined for different Data Repository types. Krugle supports a wide variety of repositories, including file systems, issue tracking systems and version control systems such as SVN, Microsoft Team Foundation Server, Perforce and Rational ClearCase.
Interactive Entry of Project Information
Adding a New Project to Krugle
To define a Krugle Project interactively, first sign in to the Krugle Enterprise Console and navigate to the Projects section:
- Sign in to the Krugle Administration Console from an internet browser.
- Enter the host name URL assigned to Krugle Enterprise, with ":8080" or ":admin" appended to the end of the URL. The Sign in dialog of the administration console will appear.
- Enter valid administration credentials (e.g. those specified during initial installation of Krugle Enterprise) to Sign In to Krugle.
- Click the Projects tab.
- Click the "Add New Project" link located in the upper right corner of the Projects Summary page.
Specify Project Metadata
- From the Add/Edit Project page enter a name for your Project.
- OPTIONAL: Click the Advanced Settings link to access optional metadata fields for your Project. Complete the information in these fields to control how project results are ranked in Krugle search results and to help users access key online information related to this project. This section also allows you to specify access control rules for information in the Data Repository and set the frequency of automatic updating. See the section labeled "Project Information" below for a description of each field.
- Click the Next button.
After specifying the Project Name and optional metadata, you define the Data Sets associated with this Project. The first step in adding a Data Set is to specify a Data Repository.
- If you haven't created a Data Repository for the data you want searched in Krugle, select "Create New Data Repository" from the Data Repository dropdown list. If you wish to use a Data Repository that you've already created, select it from the Data Repository dropdown list.
- Then, follow the appropriate instructions below for either (i) a new Data Repository or (ii) an existing Data Repository.
When creating a new Data Repository, first specify the repository type and host location:
- Select your "Data Repository Type" from the dropdown list
- Enter the "Data Repository Host Location". This is the basic network address for the server that hosts your data repository. NOTE: check with your network administrator if you don't know the Data Repository Host Location.
- Click Next after entering the Location.
- Follow the link below that corresponds to your repository type for complete instructions on how to complete the connection details for that Data Repository type:
Creating A Data Set with An Existing Data Repository
If the Data Repository that you want to use for your Data Set has already been defined in Krugle:
- Select the Data Repository name from the dropdown list in the "Add a Data Set to this project" area.
- Enter the Data Set specification in the field(s) beneath the name just selected. This information specifies the location of data/content within the data repository. Click Next.
- Click the "Add Data Set" button. This will add your Data Set to the current project, as indicated by a list entry in the upper "Data Sets" section of the Add/Edit project page.
Mass Import of Project Information
The Mass Import feature allows an Administrator to upload the definitions for multiple Projects with a single action. It is recommended that you verify proper operation of Krugle Enterprise and familiarize yourself with the interactive Project definition (previous section) before using the Mass Import feature. When importing a large number of projects, it is also recommended that you divide the mass import project collection into smaller groups, organized by repository. Start by importing several projects in a single file and increase the number of projects per mass import file as you progress.
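One way to follow this advice is to group the project list by repository and cut each group into batches that grow as you gain confidence. The sketch below assumes each project is a dictionary with hypothetical "name" and "repository" keys; the batch sizes are arbitrary examples, not Krugle recommendations.

```python
from collections import defaultdict


def batches_by_repository(projects, sizes=(3, 10, 25)):
    """Yield lists of projects, one repository at a time, in growing batch sizes."""
    grouped = defaultdict(list)
    for p in projects:
        grouped[p["repository"]].append(p)

    step = 0  # index into sizes; stays at the last size once it is reached
    for repo in sorted(grouped):
        remaining = grouped[repo]
        while remaining:
            size = sizes[min(step, len(sizes) - 1)]
            yield remaining[:size]
            remaining = remaining[size:]
            step += 1


demo = ([{"name": f"a{i}", "repository": "A"} for i in range(5)]
        + [{"name": f"b{i}", "repository": "B"} for i in range(2)])
for batch in batches_by_repository(demo):
    print([p["name"] for p in batch])
```

Each yielded batch would then be saved as its own mass import file and imported separately.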
To use Mass Import:
- Create a list of the Projects that you want Krugle Enterprise to manage.
- For each Project on the list, collect the Project Information described in the next section, "Project Information".
- Assemble all Project information in a mass import CSV file. An Excel template is available. To access the template, go to the Projects section, click the Mass Import button and then click the "Download Sample Import Demo" link. Open the template in Excel, enter the project information and then choose "Save As...". Select CSV and respond "Yes" when asked if you want to keep the workbook in the CSV format (and leave out incompatible features).
- Convert/save the table to .CSV format (if needed).
- Click the Projects tab.
- Click the Mass Import button.
- Click the Browse button and specify the .csv file that contains the Project information.
- Click the Import button.
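If the project list already lives in another system, the CSV file can be produced by a script instead of Excel. The following is a minimal sketch using Python's standard csv module. The column headers shown are placeholders chosen for illustration; the authoritative header row is the one in the template downloaded from the Mass Import page, and it should replace COLUMNS below.

```python
import csv
import io

# Placeholder headers: replace with the header row from the downloaded template.
COLUMNS = ["Project Name", "Data Repository Name", "Data Set Location", "Description"]


def write_mass_import(projects, stream):
    """Write one CSV row per project; unknown keys are ignored, missing ones left blank."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for p in projects:
        writer.writerow({col: p.get(col, "") for col in COLUMNS})


buffer = io.StringIO()
write_mass_import(
    [{
        "Project Name": "Billing 2.1",
        "Data Repository Name": "svn-main",
        "Data Set Location": "billing/branches/2.1",
        "Description": "Billing service, release 2.1",
    }],
    buffer,
)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline="") so that the csv module controls line endings.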
The Mass Import feature can only be used to create the first reference to a Project. Once a Project has been defined in Krugle Enterprise, changes to that Project's specification can only be made through the Krugle Enterprise Administration Console.
If a Mass Import file contains a Project that is already defined in Krugle, the instance of the duplicated Project in the Mass Import file will be ignored during the Mass Import process.
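Because duplicated Projects are ignored rather than reported as errors, it can help to check a list for duplicates before uploading it. The sketch below compares names without regard to case, since Krugle Project names are not case sensitive. The function name is hypothetical, the list of existing names has to be collected by the Administrator, and Python's casefold() is used here only as an approximation of Krugle's own comparison.

```python
def find_duplicates(import_names, existing_names=()):
    """Return names from import_names that repeat an earlier or existing name."""
    seen = {name.casefold() for name in existing_names}
    duplicates = []
    for name in import_names:
        key = name.casefold()
        if key in seen:
            duplicates.append(name)
        else:
            seen.add(key)
    return duplicates


print(find_duplicates(["Billing", "billing", "Search"], existing_names=["SEARCH"]))
```

Any name returned by this check would be skipped by the import, so the corresponding row should be renamed or removed from the file.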
The following information is REQUIRED for each project defined in a mass import definition:
Project Name: A name that uniquely identifies a collection of content in Krugle Enterprise. This Project name can be used as a query filter by the end user and will be used in Project based reports and analysis. Whenever possible, use a descriptive name that will be familiar to users. A unique project name is required for each Project. Note: Krugle Project names are NOT case sensitive.
Data Repository Name: The network location specified by a Krugle Enterprise Administrator to identify the data repository or source code management system. The Data Repository Name is commonly defined by a Host Name and a Root Path.
Data Set Location: The Data Set Location uniquely identifies a Data Set in the Data Repository. The Location Identifier is listed below for each repository type:
|Repository type||Project Identifier|
|SVN or CVS||Relative path from root path to the project root|
|SCMI||Relative path from root path to the SCMI script|
The following information is used to fully specify a Data Repository name. This information is required ONLY for the first reference of a Repository name in the Mass Import file. This information is NOT required for a project entry if the Data Repository name for that project has been specified previously.
Data Repository Type: The type of data repository (File system, CVS, SVN, Perforce, Team Foundation Server, SCMI, etc.). NOTE: If the Data Repository Type is "SCMI", it is first necessary to configure and install the appropriate SCMI connector script on a host system within your organization's network. Consult the SCMI SDK documentation from Krugle for more information.
Login: The login name required to access the Data Repository.
Password: The password required to access the Data Repository system (or in the case of SCMI, the username and password required - if any - to access the SCMI host).
Connection Type: The network communications protocol used to connect to and communicate with the Data Repository.
Host: The name for the system that hosts the Data Repository.
Port: The connection port used to communicate with the server that hosts the SCM system repository.
Data Repository Root Path: The root path location of the Data Repository on the Host system.
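The "first reference" rule can be checked mechanically before a file is uploaded. The sketch below assumes each row is a dictionary keyed by the field names used in this section, and it treats all seven repository fields as mandatory on the first reference, which may be stricter than some repository types actually need (for example, an SCMI host without credentials). The function name is hypothetical.

```python
REPOSITORY_FIELDS = [
    "Data Repository Type", "Login", "Password", "Connection Type",
    "Host", "Port", "Data Repository Root Path",
]


def missing_repository_details(rows):
    """Return {repository name: [missing fields]}, checking first references only."""
    problems = {}
    seen = set()
    for row in rows:
        repo = row["Data Repository Name"]
        if repo in seen:
            continue  # later references may omit the connection details
        seen.add(repo)
        missing = [f for f in REPOSITORY_FIELDS if not row.get(f)]
        if missing:
            problems[repo] = missing
    return problems


full = dict.fromkeys(REPOSITORY_FIELDS, "x")
rows = [
    {"Data Repository Name": "svn-main", **full},
    {"Data Repository Name": "svn-main"},  # second reference: details omitted
    {"Data Repository Name": "p4-tools", **{**full, "Host": "", "Port": ""}},
]
print(missing_repository_details(rows))
```

Row order matters here, just as it does in the import file: the complete definition must appear on the earliest row that names the repository.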
The following information is optional in the Mass Import definition of a Krugle Project. Its specification will improve the quality of search results and aid user understanding. Because of this, it is recommended that this information be provided whenever practical.
Description: This is a human readable description of the content in this Project. A one to two paragraph summary of the Project's capabilities, technologies, related Projects, Project dependencies, etc. will help future users of the Project better understand and use the information contained in this Project. The use of unique terms in the description will improve search matching for those unique terms.
Homepage URL: The homepage or project page for this Project. Use this optional URL reference to provide users with one-click access to non-code related information, the Project wiki, etc.
Documentation URL: This optional URL reference can be used to direct users to specifications, reference documentation and similar Project documents.
Knowledgebase URL: This optional URL reference can provide users with a shortcut to an appropriate knowledge base from the Project description page. The knowledge base can reference information such as development notes, hints, tips or discussions.
Bug Database URL: This optional URL reference can provide users with a shortcut to the Project bug database from the Project description page.
Owner: A person who can be contacted with questions or issues about this Project. It is usually best to enter the email address of the person responsible for the Project.
License: A code license type to be associated with all files in the Project. This field is optional. By default, the License field is empty.
Access Control: This setting specifies the LDAP groups that have access to this Project. When the LDAP server is specified, LDAP groups from the "Group domain" are identified in the "Access Control" list. Select a single group from this list or Ctrl-click to select multiple groups. In order to see or access code files in a particular Project, a user must belong to one or more groups listed in the Access Control settings for that Project. The default setting, --Unrestricted--, will allow all users to access the Project.
After initial LDAP group settings are changed, the project must be rebuilt (see chapter 3) for the changes to take effect.
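The access rule described above amounts to a set intersection test. The sketch below is an illustration of that rule only, not Krugle's implementation; representing the --Unrestricted-- setting as an empty collection of groups is an assumption made for the example.

```python
def can_access(user_groups, project_groups):
    """True if the Project is unrestricted or the user shares at least one group."""
    if not project_groups:  # assumption: empty means --Unrestricted--
        return True
    return not set(user_groups).isdisjoint(project_groups)


print(can_access({"dev", "qa"}, {"qa", "release"}))  # shares "qa"
print(can_access({"sales"}, {"qa", "release"}))      # no group in common
print(can_access({"sales"}, set()))                  # unrestricted Project
```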
The mass import feature is designed as an alternative to manual definition of projects. Important differences in capabilities between manual entry and mass import are:
- Mass import specifications will be ignored for any Project already contained in the project list.
- Mass import projects can have only one Data Repository.