Wednesday, February 7, 2007

Measurements and Statistics for Stovers

The Berkeley Darfur stove development experience, in which the efficiencies of different designs were compared, indicates that noise in fuel-use measurements will make it difficult to demonstrate efficiency improvements clearly. Even with a very good testing method (one designed specifically to reduce measurement uncertainty), a 30% difference in performance between two tests (with different designs and/or cooks) may not be enough to show a real improvement. The statistical side of implementation is non-trivial; it deserves plenty of attention, along with good data-retention practices.

In manufacturing we often need to compare two “populations” to see whether they differ, say because the products are made on two different injection molding machines, during different shifts, or at different factories, or because we are bringing out a new design. Here is how we go through the process, with just enough statistics to do the job:

  • Develop a test – a carefully written procedure plus a testing station – that minimizes test-to-test variation, measures the right things, and uses what we know about the product’s end application to make the test robust and reliable. For a stove, we know that the start-up phase is awkward, so the test must be long enough to reduce its impact; we also know that evaporative losses from the pot are a problem because they change from test to test. So we might decide that boiling more liters of water (a typical cooking surrogate; at this stage there may be no need to duplicate actual cooking, since that test comes later) for a longer time will reduce test-to-test variation. The test should be designed to highlight the differences between a good product and a worse one as easily as possible, so extreme conditions may be considered: for a stove, perhaps a steady wind (if this is a realistic condition), since wind has been shown to differentiate otherwise similar stoves.
  • Analyze the test itself – a test is not good enough if too many variables affect the results. We must try hard to make sure that the differences between tests are due only to random variations, such as weather conditions, slight operator differences, and random construction variations between stoves. Systematic variations, such as gross differences between manufacturing shops, poorly trained operators, or changes in construction materials, are the things we ultimately want to test for (lapses in quality), so while we are evaluating the test itself their impact should be eliminated as much as possible. In statistics we say that the test results should follow a “normal distribution” (also called Gaussian, http://en.wikipedia.org/wiki/Normal_distribution): the variable being measured, say the fuel consumed for a specific task such as boiling 5 liters of water for 45 minutes, has a central value that defines the average (mean), and a bell shape due to random fluctuations about this mean. There are many other types of population distributions (bimodal, etc.), but the normal distribution is the one whose properties are fully described by the mean and standard deviation, so the traditional statistical tools apply. In Excel you can use the FREQUENCY function to sort your data points (each one a fuel-consumption value from an individual test) into small ranges, or "bins", and then plot the number of test results in each bin against the bin ranges (just like in the figure). If your plot doesn't look like this figure, your testing method needs work before it can be considered reliable.
  • Demonstrate that the testing process is “well behaved.” The basic tool is the “gauge R&R” (repeatability and reproducibility) analysis, which examines how repeatability (the same person testing over and over) and reproducibility (different people testing) affect the outcome of controlled testing. For this test, with one particular stove, several cooks each perform the same task several times; the results (say time to boil or amount of wood used) are then put into a special spreadsheet that determines mathematically how much of the observed variation (ideally there should be none; it is the same stove, after all) is due to the testing method and stove, and how much is due to differences between the cooks’ techniques. It’s a challenge, but this kind of approach, technical without being obsessive, may be the way engineers can help most: new stove designs and implementation approaches can be clearly shown to be effective, and funding agencies can invest with confidence that they will get results and that people will be helped. Until then, we can only make crude, wishful projections about a project’s potential impact, and that is not good enough. More detail on R&R measurements (from NIST) is at http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc4.htm, but remember that you just have to plug your measurement results into a spreadsheet and the conclusions fall out.
  • Improve the testing if necessary. If the distribution is not normal, or the gauge R&R variation is too large (i.e. we can’t reasonably tell whether one stove is better than another, even though one should be), then the test must be modified. Possible changes include a clearer written procedure, better operator training, or changes to the test itself (such as narrowing the allowable temperature range in a simmer test, or using a longer boil time to reduce the effect of start-up transients or brief operator errors). Remember, a poor test wastes time: it may require more tests (so that results can be averaged) or superfluous ones (like repeatedly testing the control stove). The goal is the minimum amount of testing that can demonstrate the smallest difference between dissimilar stoves, and the result of each individual test must be trustworthy so that conclusions are as clear as possible. The R&R method is well worth the effort. A 3x3 evaluation (3 operators each test the same stove 3 times, done just once to evaluate the test procedure) may take 1 day, but it cuts new-stove testing time in half for the rest of the program’s life, because the control stove no longer needs to be tested every time. And regular R&R testing at different locations (such as Khartoum, Nyala, and the IDP camps) eliminates uncertainty about place-to-place testing variations, a real worry when testing is geographically distributed and can only be lightly supervised.
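The binning step for checking the bell shape can also be done outside Excel. This is a small standard-library Python sketch whose counting rule is meant to mirror Excel's FREQUENCY function: each bin counts values above the previous upper edge, up to and including its own, with one extra bin for anything above the last edge. The fuel-use numbers and bin edges are made up for illustration.

```python
import bisect

def frequency(data, upper_edges):
    """Count values per bin, in the style of Excel's FREQUENCY function.

    Bin i holds values v with upper_edges[i-1] < v <= upper_edges[i];
    one extra final bin holds values above the last edge.
    """
    edges = sorted(upper_edges)
    counts = [0] * (len(edges) + 1)
    for v in data:
        # bisect_left returns the index of the first edge >= v,
        # which is exactly the bin that v belongs to.
        counts[bisect.bisect_left(edges, v)] += 1
    return counts

def print_histogram(data, upper_edges):
    """Print one row of '#' marks per bin to eyeball the shape."""
    edges = sorted(upper_edges)
    labels = [f"<= {e:g}" for e in edges] + [f"> {edges[-1]:g}"]
    for label, count in zip(labels, frequency(data, edges)):
        print(f"{label:>8} | {'#' * count}")

# HYPOTHETICAL fuel use (kg of wood) from 13 repeated tests of one stove
fuel_kg = [1.9, 2.1, 2.2, 2.3, 2.3, 2.4, 2.4, 2.4, 2.5, 2.5, 2.6, 2.7, 2.9]
bin_edges = [2.0, 2.2, 2.4, 2.6, 2.8]

if __name__ == "__main__":
    print_histogram(fuel_kg, bin_edges)
```

A single, roughly symmetric peak is what you hope to see; two humps, or a long tail on one side, suggest that something systematic is creeping into the test.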
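For a single stove, as in the 3x3 evaluation described above, the gauge R&R arithmetic reduces to a one-way analysis of variance with the cook as the only factor. The Python sketch below is a stand-in for the "special spreadsheet" under that assumption, not the NIST procedure itself; full gauge R&R studies usually include several parts, which adds a part-to-part term. The function name and the fuel-use figures are hypothetical.

```python
from statistics import fmean

def gauge_rr_one_stove(results):
    """Split test-to-test variance for ONE stove into repeatability
    (same cook, repeated tests) and reproducibility (between cooks).

    results: dict mapping cook name -> list of repeated measurements,
             with the same number of repeats for every cook.
    """
    groups = list(results.values())
    k = len(groups)     # number of cooks
    r = len(groups[0])  # repeats per cook
    if k < 2 or r < 2 or any(len(g) != r for g in groups):
        raise ValueError("need >= 2 cooks, each with the same number (>= 2) of repeats")

    cook_means = [fmean(g) for g in groups]
    grand_mean = fmean(cook_means)  # equal group sizes, so this is the overall mean

    # One-way ANOVA sums of squares and mean squares
    ss_between = r * sum((m - grand_mean) ** 2 for m in cook_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, cook_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (r - 1))

    repeatability = ms_within                                 # within-cook variance
    reproducibility = max(0.0, (ms_between - ms_within) / r)  # between-cook variance
    total = repeatability + reproducibility
    return {
        "repeatability_var": repeatability,
        "reproducibility_var": reproducibility,
        "total_var": total,
        "pct_repeatability": 100 * repeatability / total if total else 0.0,
        "pct_reproducibility": 100 * reproducibility / total if total else 0.0,
    }

# HYPOTHETICAL 3x3 study: kg of wood for the same task on the same stove
study = {
    "cook A": [2.40, 2.50, 2.45],
    "cook B": [2.60, 2.55, 2.65],
    "cook C": [2.35, 2.40, 2.45],
}

if __name__ == "__main__":
    for name, value in gauge_rr_one_stove(study).items():
        print(f"{name:>20}: {value:.4f}")
```

With these made-up numbers about 80% of the test-to-test variance comes from differences between cooks, which would point toward a clearer procedure and better operator training rather than changes to the stove.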

With Brian in Khartoum, now making stoves for Darfur, it is worth talking about quality control for new product introduction. There are several ways to make sure that your new stove performs in every way as you designed it to, and you don’t necessarily have to build a fire in each one. In manufacturing we use several statistical techniques to show, with a stated degree of confidence, that the manufacturing effort is good. The process goes something like this:

  • Determine what it is about your stove that makes it work right: tight construction, the right pot gap width, weight (are the right materials used throughout?), time to boil X liters of water (firepower), amount of wood for a specific task (efficiency), etc. You are looking for a few measurable things that correlate with new stove owners being pleased; you want that correlation to be as reliable as possible, and you want the measurements to be as quick and easy as possible.
  • For just a few stoves you’ll be handling every one to see if it feels right. If you know your stove well enough, you should be able to spot bad product, but LOTS of stoves means too much handling, so this start-up phase is the time to practice inspection and testing. Do sloppy stoves mean poor efficiency and short life in homes? How sloppy is too sloppy? Nothing is perfect, so some issues (small gaps, poor joints, etc.) are OK and have to be passed, while others are unacceptable; don’t sweat the small stuff.
  • Decent measurements might be weight, air gap width using a standard pot (or no pot, just a measurement), and fuel efficiency. Often you need to do the efficiency measurement only when you suspect something has changed (new manufacturer, new materials, different city, etc.), since this test is the hardest and has the most error. In any case you should do a gauge R&R of each measurement and check that the results are normally distributed, so that you can trust your measurements; how efficient your stove is will be something people want to talk about. Accumulate measurement results all the time, and they will add up to a good record that you can someday publish to show the stove’s effectiveness.
  • Set specifications: define measurements that show clearly whether a stove is acceptable or not. The stove maker can use them, you can, and people in different cities can. Any spec is better than no spec; you can’t talk about quality unless you have at least a little information on how things vary. Doing this in unfamiliar countries is a challenge, but maybe you will get lucky and everything will be perfect, with no variation to measure!
  • The best quality technique is 6 Sigma (http://en.wikipedia.org/wiki/Six_Sigma, and see the section on DMAIC), where only a few defective parts per million are allowed (the usual figure is 3.4 defects per million opportunities). Here you measure parts even as they are made, so bad stoves are never created. If the parts are handmade and weighing scales are available, weighing them can be a quick way of checking craftsmanship. There are lots of variations of 6 Sigma; I practice the “lean” version, where the main goal is to eliminate as many parts and manufacturing operations as possible. Fewer parts = fewer things to worry about. Under ideal circumstances you do very little inspecting at the end, since why would there be a bad stove? But if you can’t be there while things are made, all you can do is simplify the design and ask to get the first few as soon as possible.
  • The government and other folks use the AQL method (Acceptable Quality Level), where tables tell you how many stoves out of each lot to test so that you don’t accept too many bad stoves (all based on sampling statistics). The website http://www.sqconline.com/ lets you enter your desired quality level into an online form and tells you how many stoves to test, but you still have to decide what an effective measurement is. I have used this method but am not excited by it: it assumes that you accept some bad parts and that you don’t have good enough control of manufacturing. And that is true if you don’t have a good enough relationship with your machine shop.
  • Continuous improvement is always part of the equation. Your first stoves need to be good or people will be disappointed, and then you’ll spend the rest of your life answering questions and fixing things; half of quality is about not wanting to waste time like this. Nip problems in the bud as quickly as possible: bring them to the attention of your stove man and emphasize why problems hurt his chances of future business. Keep reminding him of his defects; details about problems will hopefully keep him thinking about improvement. Unfortunately, this is also about the time when you need competitors, so that you have negotiating power. Having a second supplier is a great thing, because splitting business between them (even if one is a little more expensive) keeps both on their toes. One can make a much smaller share, but not having two will cause you problems eventually.
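Once specifications exist, a common way to summarize how well a batch meets them is the process capability index, Cpk = min(USL - mean, mean - LSL) / (3 s), where LSL and USL are the lower and upper spec limits and s is the sample standard deviation. The Python sketch below checks hypothetical stove weights against made-up limits of 3.80 to 4.20 kg and, assuming the weights are roughly normal, estimates the fraction of stoves that would fall outside spec.

```python
from statistics import NormalDist, fmean, stdev

def out_of_spec(values, lsl, usl):
    """Return the measurements outside the [lsl, usl] specification."""
    return [v for v in values if not (lsl <= v <= usl)]

def capability(values, lsl, usl):
    """Return (Cpk, estimated fraction outside spec).

    Cpk = min(USL - mean, mean - LSL) / (3 * s). The out-of-spec
    estimate ASSUMES the measurements are roughly normally distributed.
    """
    mu = fmean(values)
    s = stdev(values)  # sample standard deviation (n - 1)
    cpk = min(usl - mu, mu - lsl) / (3 * s)
    dist = NormalDist(mu, s)
    frac_out = dist.cdf(lsl) + (1 - dist.cdf(usl))
    return cpk, frac_out

# HYPOTHETICAL stove weights (kg) and made-up spec limits
weights = [4.10, 4.05, 3.98, 4.02, 4.07, 3.95, 4.00, 4.03]
LSL, USL = 3.80, 4.20

if __name__ == "__main__":
    cpk, frac_out = capability(weights, LSL, USL)
    print(f"out of spec now: {out_of_spec(weights, LSL, USL)}")
    print(f"Cpk = {cpk:.2f}, estimated fraction outside spec = {frac_out:.1e}")
```

Here every stove passes, but a Cpk near 1.2 means the spread is fairly large compared with the tolerance; a Cpk of 1.33 or more is a common industrial target, and reducing the weight scatter would be the first step toward it.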
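The arithmetic behind an acceptance sampling plan can be sketched with the binomial distribution: inspect n stoves from a lot and accept the lot if no more than c are defective. The Python sketch below computes the probability of accepting a lot for a given true defect rate (the plan's operating characteristic). The plan n = 50, c = 1 is an arbitrary example, not a value taken from an AQL table; in practice the tables (or an online calculator) choose n and c for you.

```python
from math import comb

def prob_accept(n: int, c: int, p: float) -> float:
    """Probability of accepting a lot under a single-sampling plan.

    Inspect n stoves and accept the lot if at most c are defective.
    ASSUMES each stove is defective independently with probability p
    (binomial model, reasonable when the lot is much larger than n).
    """
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))

if __name__ == "__main__":
    n, c = 50, 1  # HYPOTHETICAL plan, not taken from an AQL table
    for p in (0.01, 0.02, 0.05, 0.10):
        print(f"true defect rate {p:4.0%}: P(accept lot) = {prob_accept(n, c, p):.3f}")
```

With this plan a lot that is 1% defective is accepted about 91% of the time, and even a 5%-defective lot still gets through about 28% of the time; this is the sense in which AQL sampling accepts some bad parts.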