White paper on Maintenance And Troubleshooting
COURTESY :- vrindawan.in
Wikipedia
A white paper is a report or guide that informs readers concisely about a complex issue and presents the issuing body’s philosophy on the matter. It is meant to help readers understand an issue, solve a problem, or make a decision. A white paper is the first document researchers should read to better understand a core concept or idea.
The term originated in the 1920s to mean a type of position paper or industry report published by some department of the UK government.
Since the 1990s, this type of document has proliferated in business. Today, a business-to-business (B2B) white paper is closer to a marketing presentation, a form of content meant to persuade customers and partners and promote a certain product or viewpoint. That makes B2B white papers a type of grey literature.
The term white paper originated with the British government and many point to the Churchill White Paper of 1922 as the earliest well-known example under this name. Gertrude Bell, the British explorer and diplomat, was possibly the first woman to write a white paper. Her 149-page report was entitled “Review of the Civil Administration of Mesopotamia” and was presented to Parliament in 1920. In the British government, a white paper is usually the less extensive version of the so-called blue book, both terms being derived from the colour of the document’s cover.
White papers are a “tool of participatory democracy … not [an] unalterable policy commitment”. “White papers have tried to perform the dual role of presenting firm government policies while at the same time inviting opinions upon them.”
In Canada, a white paper is “a policy document, approved by Cabinet, tabled in the House of Commons and made available to the general public”. The “provision of policy information through the use of white and green papers can help to create an awareness of policy issues among parliamentarians and the public and to encourage an exchange of information and analysis. They can also serve as educational techniques.”
White papers are a way the government can present policy preferences before it introduces legislation. Publishing a white paper tests public opinion on controversial policy issues and helps the government gauge its probable impact.
By contrast, green papers, which are issued much more frequently, are more open-ended. Also known as consultation documents, green papers may merely propose a strategy to implement in the details of other legislation, or they may set out proposals on which the government wishes to obtain public views and opinion.
Examples of governmental white papers include, in Australia, the White Paper on Full Employment and, in the United Kingdom, the White Paper of 1939 and the 1966 Defence White Paper.
In Israeli history, the White Paper of 1939 – marking a sharp turn against Zionism in British policy and at the time greeted with great anger by the Jewish Yishuv community in Mandatory Palestine – is remembered as “The White Paper” (in Hebrew Ha’Sefer Ha’Lavan הספר הלבן – literally “The White Book”).
Since the early 1990s, the terms “white paper” or “whitepaper” have been applied to documents used as marketing or sales tools in business. These white papers are long-form content designed to promote the products or services from a specific company. As a marketing tool, these papers use selected facts and logical arguments to build a case favorable to the company sponsoring the document.
B2B (business-to-business) white papers are often used to generate sales leads, establish thought leadership, make a business case, grow email lists, grow audiences, increase sales, or inform and persuade readers. The audiences for a B2B white paper can include prospective customers, channel partners, journalists, analysts, investors, or any other stakeholders.
White papers are considered to be a form of content marketing or inbound marketing; in other words, sponsored content available on the web with or without registration, intended to raise the visibility of the sponsor in search engine results and build web traffic. Many B2B white papers argue that one particular technology, product, ideology, or methodology is superior to all others for solving a specific business problem. They may also present research findings, list a set of questions or tips about a certain business issue, or highlight a particular product or service from a vendor.
The technical meaning of maintenance involves functional checks, servicing, repairing or replacing of necessary devices, equipment, machinery, building infrastructure, and supporting utilities in industrial, business, and residential installations. Over time, this has come to include multiple wordings that describe various cost-effective practices to keep equipment operational; these activities occur either before or after a failure.
Maintenance functions can be defined as maintenance, repair and overhaul (MRO); the abbreviation MRO is also used for maintenance, repair and operations. Over time, the terminology of maintenance and MRO has begun to be standardized. The United States Department of Defense uses the following definitions:
- Any activity—such as tests, measurements, replacements, adjustments, and repairs—intended to retain or restore a functional unit in or to a specified state in which the unit can perform its required functions.
- All action taken to retain material in a serviceable condition or to restore it to serviceability. It includes inspections, testing, servicing, classification as to serviceability, repair, rebuilding, and reclamation.
- All supply and repair action taken to keep a force in condition to carry out its mission.
- The routine recurring work required to keep a facility (plant, building, structure, ground facility, utility system, or other real property) in such condition that it may be continuously used, at its original or designed capacity and efficiency for its intended purpose.
Maintenance is strictly connected to the utilization stage of the product or technical system, in which the concept of maintainability must be included. In this scenario, maintainability is considered as the ability of an item, under stated conditions of use, to be retained in or restored to a state in which it can perform its required functions, using prescribed procedures and resources.
In some domains, such as aircraft maintenance, the terms maintenance, repair and overhaul also include inspection, rebuilding, alteration and the supply of spare parts, accessories, raw materials, adhesives, sealants, coatings and consumables for aircraft maintenance at the utilization stage. In international civil aviation, maintenance means:
- The performance of tasks required to ensure the continuing airworthiness of an aircraft, including any one or combination of overhaul, inspection, replacement, defect rectification, and the embodiment of a modification or a repair.
This definition covers all activities for which aviation regulations require issuance of a maintenance release document (aircraft certificate of return to service – CRS).
The marine and air transportation, offshore structures, industrial plant and facility management industries depend on maintenance, repair and overhaul (MRO) including scheduled or preventive paint maintenance programmes to maintain and restore coatings applied to steel in environments subject to attack from erosion, corrosion and environmental pollution.
The basic types of maintenance falling under MRO include:
- Preventive maintenance, where equipment is checked and serviced in a planned manner (at scheduled points in time or continuously)
- Corrective maintenance, where equipment is repaired or replaced after wear, malfunction or breakdown
- Reinforcement
Architectural conservation employs MRO to preserve, rehabilitate, restore, or reconstruct historical structures with stone, brick, glass, metal, and wood which match the original constituent materials where possible, or with suitable polymer technologies when not.
Preventive maintenance (PM) is “a routine for periodically inspecting” with the goal of “noticing small problems and fixing them before major ones develop.” Ideally, “nothing breaks down.”
The main goal behind PM is for the equipment to make it from one planned service to the next planned service without any failures caused by fatigue, neglect, or normal wear (preventable items), which Planned Maintenance and Condition Based Maintenance help to achieve by replacing worn components before they actually fail. Maintenance activities include partial or complete overhauls at specified periods, oil changes, lubrication, minor adjustments, and so on. In addition, workers can record equipment deterioration so they know to replace or repair worn parts before they cause system failure.
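The planned-service idea can be sketched in a few lines of Python. This is a minimal illustration, not a maintenance standard: the component names, the hour-based intervals, the 10% safety margin, and the `due_for_service` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str                    # hypothetical component label
    service_interval_h: float    # planned running hours between services (assumed)
    hours_since_service: float   # running hours logged since the last service


def due_for_service(components, margin=0.1):
    """Return names of components that are within `margin` of their interval, or past it."""
    due = []
    for c in components:
        # Flag the part slightly early so it is serviced before it fails.
        if c.hours_since_service >= c.service_interval_h * (1 - margin):
            due.append(c.name)
    return due


fleet = [
    Component("bearing", 500, 460),
    Component("oil filter", 250, 100),
    Component("drive belt", 1000, 1005),
]
print(due_for_service(fleet))
```

With these made-up figures the bearing and the drive belt are flagged, while the oil filter is left until a later service.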
The New York Times gave an example of “machinery that is not lubricated on schedule” that functions “until a bearing burns out.” Preventive maintenance contracts are generally a fixed cost, whereas improper maintenance introduces a variable cost: replacement of major equipment.
Troubleshooting is a form of problem solving, often applied to repair failed products or processes on a machine or a system. It is a logical, systematic search for the source of a problem in order to solve it, and make the product or process operational again. Troubleshooting is needed to identify the symptoms. Determining the most likely cause is a process of elimination—eliminating potential causes of a problem. Finally, troubleshooting requires confirmation that the solution restores the product or process to its working state.
In general, troubleshooting is the identification or diagnosis of “trouble” in the management flow of a system caused by a failure of some kind. The problem is initially described as symptoms of malfunction, and troubleshooting is the process of determining and remedying the causes of these symptoms.
A system can be described in terms of its expected, desired or intended behavior (usually, for artificial systems, its purpose). Events or inputs to the system are expected to generate specific results or outputs. (For example, selecting the “print” option from various computer applications is intended to result in a hard copy emerging from some specific device). Any unexpected or undesirable behavior is a symptom. Troubleshooting is the process of isolating the specific cause or causes of the symptom. Frequently the symptom is a failure of the product or process to produce any results. (Nothing was printed, for example). Corrective action can then be taken to prevent further failures of a similar kind.
The methods of forensic engineering are useful in tracing problems in products or processes, and a wide range of analytical techniques are available to determine the cause or causes of specific failures. Corrective action can then be taken to prevent further failure of a similar kind. Preventive action is possible using failure mode and effects analysis (FMEA) and fault tree analysis (FTA) before full-scale production, and these methods can also be used for failure analysis.
Usually troubleshooting is applied to something that has suddenly stopped working, since its previously working state forms the expectations about its continued behavior. So the initial focus is often on recent changes to the system or to the environment in which it exists. (For example, a printer that “was working when it was plugged in over there”). However, there is a well known principle that correlation does not imply causality. (For example, the failure of a device shortly after it has been plugged into a different outlet doesn’t necessarily mean that the events were related. The failure could have been a matter of coincidence.) Therefore, troubleshooting demands critical thinking rather than magical thinking.
It is useful to consider the common experience we have with light bulbs. Light bulbs “burn out” more or less at random; eventually the repeated heating and cooling of the filament, and fluctuations in the power supplied to it, cause the filament to crack or vaporize. The same principle applies to most other electronic devices, and similar principles apply to mechanical devices. Some failures are part of the normal wear-and-tear of components in a system.
The first basic principle of troubleshooting is to be able to reproduce the problem at will. The second is to reduce the “system” to its simplest form that still shows the problem. The third is to “know what you are looking for”: in other words, to fully understand the way the system is supposed to work, so that you can “spot” the error when it happens.
A troubleshooter could check each component in a system one by one, substituting known good components for each potentially suspect one. However, this process of “serial substitution” can be considered degenerate when components are substituted without regard to a hypothesis concerning how their failure could result in the symptoms being diagnosed.
Simple and intermediate systems are characterized by lists or trees of dependencies among their components or subsystems. More complex systems contain cyclical dependencies or interactions (feedback loops). Such systems are less amenable to “bisection” troubleshooting techniques.
It also helps to start from a known good state, the best example being a computer reboot. A cognitive walkthrough is also worth trying. Comprehensive documentation produced by proficient technical writers is very helpful, especially if it provides a theory of operation for the subject device or system.
A common cause of problems is bad design, for example bad human factors design, where a device could be inserted backward or upside down due to the lack of an appropriate forcing function (behavior-shaping constraint), or a lack of error-tolerant design. This is especially bad if accompanied by habituation, where the user just doesn’t notice the incorrect usage, for instance if two parts have different functions but share a common case so that it is not apparent on a casual inspection which part is being used.
Troubleshooting can also take the form of a systematic checklist, troubleshooting procedure, flowchart or table that is made before a problem occurs. Developing troubleshooting procedures in advance allows sufficient thought about the steps to take in troubleshooting and organizing the troubleshooting into the most efficient troubleshooting process. Troubleshooting tables can be computerized to make them more efficient for users.
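A troubleshooting flowchart prepared in advance can be computerized as a small decision tree. The sketch below is only one possible encoding; the printer questions and actions are illustrative assumptions, not a vendor procedure, and `FLOWCHART` and `walk` are hypothetical names.

```python
# Each node is either a question with "yes"/"no" branches or a final action.
FLOWCHART = {
    "question": "Is the power light on?",
    "no": {"action": "Check the power cable and outlet."},
    "yes": {
        "question": "Did the job reach the print server?",
        "no": {"action": "Check the user's network connection and driver."},
        "yes": {"action": "Check the link between server and printer."},
    },
}


def walk(node, answers):
    """Follow the flowchart with a sequence of 'yes'/'no' answers and return the action."""
    answers = iter(answers)
    while "action" not in node:
        # Each answer selects the next branch, narrowing the possible causes.
        node = node[next(answers)]
    return node["action"]


print(walk(FLOWCHART, ["yes", "no"]))
```

Keeping the procedure as data rather than as nested `if` statements means the table can be reviewed and extended without touching the code that walks it.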
Some computerized troubleshooting services (such as Primefax, later renamed MaxServ) immediately show the top 10 solutions with the highest probability of fixing the underlying problem. The technician can either answer additional questions to advance through the troubleshooting procedure, each step narrowing the list of solutions, or immediately implement the solution he feels will fix the problem. These services give a rebate if the technician takes an additional step after the problem is solved: report back the solution that actually fixed the problem. The computer uses these reports to update its estimates of which solutions have the highest probability of fixing that particular set of symptoms.
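The feedback loop described here can be sketched as follows. How such services actually computed their estimates is not stated in the text; this sketch assumes the simplest possible rule, the relative frequency of confirmed fixes per symptom set, and the class name, method names, and sample symptoms are hypothetical.

```python
from collections import Counter, defaultdict


class SolutionRanker:
    """Rank fixes for a symptom set by how often technicians reported them working."""

    def __init__(self):
        # Maps a frozenset of symptoms to a count of confirmed fixes.
        self._reports = defaultdict(Counter)

    def report_fix(self, symptoms, solution):
        """Record the solution that actually fixed this set of symptoms."""
        self._reports[frozenset(symptoms)][solution] += 1

    def top_solutions(self, symptoms, n=10):
        """Return up to n (solution, estimated probability) pairs, most likely first."""
        counts = self._reports.get(frozenset(symptoms), Counter())
        total = sum(counts.values())
        return [(s, c / total) for s, c in counts.most_common(n)]


ranker = SolutionRanker()
for fix in ["replace fuser", "replace fuser", "clean rollers", "replace fuser"]:
    ranker.report_fix({"paper jam", "smudged output"}, fix)
print(ranker.top_solutions({"paper jam", "smudged output"}))
```

Each new report shifts the estimates, so the ranking improves as more technicians report which fix worked.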
Efficient methodical troubleshooting starts with a clear understanding of the expected behavior of the system and the symptoms being observed. From there the troubleshooter forms hypotheses on potential causes, and devises (or perhaps references a standardized checklist of) tests to eliminate these prospective causes. This approach is often called “divide and conquer”.
Troubleshooters commonly use two strategies. The first is to check for frequently encountered or easily tested conditions (for example, checking that a printer’s light is on and that its cable is firmly seated at both ends). This is often referred to as “milking the front panel.”
The second is to “bisect” the system (for example, in a network printing system, checking whether the job reached the server to determine whether the problem lies in the subsystems “towards” the user’s end or “towards” the device).
This latter technique can be particularly efficient in systems with long chains of serialized dependencies or interactions among their components. It is simply the application of a binary search across the range of dependencies and is often referred to as “half-splitting”. It is similar to the game of “twenty questions”: anyone can isolate one option out of a million by dividing the set of alternatives in half 20 times (because 2^10 = 1024 and 2^20 = 1,048,576).
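Half-splitting can be written as an ordinary binary search over an ordered chain of stages. The sketch below assumes a single break in the chain, so that every stage before the fault passes the test and every stage from the fault onward fails; the stage names and the `reaches` test are hypothetical.

```python
import math


def half_split(stages, reaches):
    """Return (first stage the signal fails to reach, or None; number of tests run).

    `stages` is an ordered chain of dependencies and `reaches(stage)` returns
    True if the signal (job, packet, voltage) arrives at that stage.
    """
    lo, hi = 0, len(stages)  # the first failing stage has an index in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if reaches(stages[mid]):
            lo = mid + 1   # the fault lies downstream of mid
        else:
            hi = mid       # the fault is at mid or upstream of it
    return (stages[lo] if lo < len(stages) else None), tests


chain = ["application", "driver", "spooler", "network", "print server", "printer"]
broken_at = chain.index("network")  # simulated fault for the demonstration
faulty, tests = half_split(chain, lambda s: chain.index(s) < broken_at)
print(faulty, tests)

# Halvings needed to isolate one alternative out of a million.
print(math.ceil(math.log2(1_000_000)))
```

For the six-stage chain the fault is found in three tests, and the last line reproduces the “twenty questions” figure of 20 halvings for a million alternatives.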
One of the core principles of troubleshooting is that reproducible problems can be reliably isolated and resolved. Often considerable effort and emphasis in troubleshooting is placed on reproducibility … on finding a procedure to reliably induce the symptom to occur.
Some of the most difficult troubleshooting issues relate to symptoms which occur intermittently. In electronics this often is the result of components that are thermally sensitive (since resistance of a circuit varies with the temperature of the conductors in it). Compressed air can be used to cool specific spots on a circuit board and a heat gun can be used to raise the temperatures; thus troubleshooting of electronics systems frequently entails applying these tools in order to reproduce a problem.
In computer programming, race conditions often lead to intermittent symptoms that are extremely difficult to reproduce. Various techniques can be used to force the particular function or module to be called more rapidly than it would be in normal operation (analogous to “heating up” a component in a hardware circuit), while other techniques can be used to introduce greater delays in, or force synchronization among, other modules or interacting processes.
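Forcing synchronization to expose a race can be shown with a deliberately unsafe counter. In this sketch the barrier is a hypothetical test hook, added only for diagnosis: it makes both threads finish reading before either one writes, so the lost update happens on every run instead of occasionally.

```python
import threading

counter = 0
# Hypothetical test hook: both threads must reach this point before either continues.
both_have_read = threading.Barrier(2)


def unsafe_increment():
    global counter
    value = counter          # read the shared value
    both_have_read.wait()    # forced synchronization: both reads happen before any write
    counter = value + 1      # write: the second writer overwrites the first


threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

Two increments leave the counter at 1 rather than 2. Once the symptom is reproducible in this way, a fix such as guarding the read and write with a lock can be verified against the same hook.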
Intermittent issues can be thus defined:
“An intermittent is a problem for which there is no known procedure to consistently reproduce its symptom.” (Steven Litt)
In particular, he asserts that there is a distinction between the frequency of occurrence and a “known procedure to consistently reproduce” an issue. For example, knowing that an intermittent problem occurs “within” an hour of a particular stimulus or event … but that sometimes it happens in five minutes and other times it takes almost an hour … does not constitute a “known procedure” even if the stimulus does increase the frequency of observable exhibitions of the symptom.
Nevertheless, sometimes troubleshooters must resort to statistical methods … and can only find procedures to increase the symptom’s occurrence to a point at which serial substitution or some other technique is feasible. In such cases, even when the symptom seems to disappear for significantly longer periods, there is low confidence that the root cause has been found and that the problem is truly solved.
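One way to put a number on that confidence is to ask how many trials in a row must pass before a clean streak is unlikely to be luck. The sketch below assumes that, while the fault is present, each trial fails independently with a known probability; real intermittent faults may violate this, and the function name and example rates are hypothetical.

```python
import math


def clean_trials_needed(failure_rate, confidence=0.95):
    """Trials that must pass in a row before a fix is credible at `confidence`.

    If the fault were still present, a streak of n clean trials would occur
    with probability (1 - failure_rate) ** n; n is chosen to push that
    probability down to 1 - confidence or below.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    alpha = 1 - confidence  # accepted chance of a clean streak despite an unfixed fault
    return math.ceil(math.log(alpha) / math.log(1 - failure_rate))


print(clean_trials_needed(0.10))  # symptom seen in 1 of 10 trials before the fix
print(clean_trials_needed(0.01))  # symptom seen in 1 of 100 trials before the fix
```

Under these assumptions a fault that showed up in 10% of trials needs 29 clean trials, while one that showed up in 1% of trials needs 299. This is why raising the symptom's frequency before testing a fix pays off.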