Tuesday, March 19, 2019

Ethiopian ET302 similarities to Lion Air JT610

Reports from Ethiopian investigators have implicated the same Angle of Attack (AoA) sensor malfunction that was observed on Lion Air. The Lion Air captain's AoA sensor read about 22 degrees higher than the First Officer's AoA sensor (a large bias error). Initial assessment of Lion Air AoA failure modes did not reveal any obvious electrical malfunction that could create the bias. The simplest explanation was that the AoA vane had been bent, causing a gross aerodynamic offset in the readings. If ET302 encountered exactly the same offset, some other factor must be in play, because it is not conceivable that a second vane was bent in exactly the same way. For example, the ARINC 429 representation of AoA uses two's complement fractional binary notation (BNR). It is interesting to note that bit 26 represents 22.5 degrees, which would be the bit "flipping" between the Captain and F/O AoA values (all other bits would match). Is it possible that the ARINC 429 word is getting corrupted (software defect)? If the ET302 offset was something like 20 or 24 degrees, this theory falls apart.
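
To make the bit-flip idea concrete, here is a minimal Python sketch of a BNR field. It assumes a full range of +/-180 degrees with the sign in bit 29 and the most significant data bit in bit 28, which is what gives bit 26 a weight of 22.5 degrees. The field width (bits 17 to 29) and all function names are illustrative assumptions, not taken from any 737 interface document.

```python
# Sketch only: a two's complement BNR field with an assumed +/-180 degree
# range, sign in bit 29, most significant data bit in bit 28.
# The field width (bits 17..29) is an assumption for illustration.

FULL_RANGE_DEG = 180.0
SIGN_BIT = 29
LSB_BIT = 17
N_BITS = SIGN_BIT - LSB_BIT + 1  # 13 bits including sign


def bit_weight_deg(bit: int) -> float:
    """Weight in degrees of a data bit: bit 28 -> 90, 27 -> 45, 26 -> 22.5."""
    return FULL_RANGE_DEG / (1 << (SIGN_BIT - bit))


def encode_bnr(angle_deg: float) -> int:
    """Quantize an angle and return the two's complement field as an int."""
    raw = round(angle_deg / bit_weight_deg(LSB_BIT))
    return raw & ((1 << N_BITS) - 1)


def decode_bnr(field: int) -> float:
    """Convert the two's complement field back to degrees."""
    if field & (1 << (N_BITS - 1)):  # sign bit set, so the value is negative
        field -= 1 << N_BITS
    return field * bit_weight_deg(LSB_BIT)


def flip_bit(field: int, bit: int) -> int:
    """Toggle one ARINC 429 bit position within the field."""
    return field ^ (1 << (bit - LSB_BIT))


if __name__ == "__main__":
    good = encode_bnr(5.0)          # a plausible F/O reading
    corrupted = flip_bit(good, 26)  # same word with only bit 26 toggled
    print(decode_bnr(good), decode_bnr(corrupted))
```

Under these assumptions, toggling bit 26 alone shifts the decoded angle by exactly 22.5 degrees while every other bit still matches. An offset of 20 or 24 degrees could not be produced by a single flipped bit.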

With this in mind, what are the issues left to restore 737 MAX airworthiness?

AoA malfunction modes are well known. The following is a list from a paper prepared in part by Airbus.

What has become apparent is that Boeing did not adequately test MCAS against these AoA failure scenarios. AoA readings that are valid but misleading were accepted by MCAS processing without any restriction. Because AoA did not respond as expected (e.g. recovery from approach to stall) but remained above the trip point, the MCAS logic was allowed to reset with any pilot manual trim command, and about five seconds later it would apply another nose-down trim command.

The functional hazard assessment for MCAS appears to be based only on a single application of nose-down trim. Whether that application was assessed as a 0.6 deg "threat" or a 2.5 deg "threat" bears on the hazard classification. Whether it is minor, major, or hazardous is the question.

If assessed against "unlimited", it would have been hazardous. That must also be assessed in the context of the aft column cutout switch being disabled.
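
The difference between a single application and "unlimited" can be sketched with simple arithmetic. The Python below is a toy model, not MCAS logic: the increment comes from the figures above, while `pilot_recovery_deg` (how much nose-up trim the pilot puts back before each reset) is a purely hypothetical parameter.

```python
# Toy model: net nose-down stabilizer trim after repeated applications.
# pilot_recovery_deg is a hypothetical value, chosen only for illustration.

def cumulative_nose_down(increment_deg: float,
                         activations: int,
                         pilot_recovery_deg: float = 0.0) -> list[float]:
    """Return the net nose-down trim (degrees) after each activation."""
    net = 0.0
    history = []
    for _ in range(activations):
        net += increment_deg                      # one nose-down application
        net = max(0.0, net - pilot_recovery_deg)  # partial pilot trim-back
        history.append(net)
    return history


if __name__ == "__main__":
    print(cumulative_nose_down(0.6, 1))       # the single-application view
    print(cumulative_nose_down(2.5, 4, 1.5))  # repeated resets, partial recovery
```

With a 2.5 deg increment and a pilot who recovers only 1.5 deg each cycle, the net mistrim grows by a full degree on every reset. That is why the assessment changes so much once repeated activation is considered.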

The decision to disable the aft column cutout switch may not have been assessed as a hazard at all. Yet, this was the most threatening change made by Boeing, and very likely took away the last thread for survival on JT610 and I fear ET302. 

How MCAS malfunctioned is irrelevant; the possibility exists that MCAS could malfunction. The aft column cutout switch has been a long-standing safety feature. Human factors must be taken into account. In the scenario where the stabilizer is running away nose down, the pilot may fixate only on pulling the column back in response. They may not be mentally able to trim back or cut out the trim; instead they just keep pulling. That is where the aft column cutout switch saves the day. It very well could have been the last thing that could have saved JT610.
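
The principle of a column cutout can be sketched in a few lines of Python: trim that opposes a firmly displaced column is blocked. The threshold value, the sign convention (aft positive) and all names are hypothetical, chosen only to illustrate the idea, and the `cutout_active` flag stands in for the cutout being disabled.

```python
# Sketch of a column cutout: block trim that opposes the column.
# Threshold and names are hypothetical, not Boeing values.
from enum import Enum


class TrimCmd(Enum):
    NONE = 0
    NOSE_UP = 1
    NOSE_DOWN = 2


CUTOUT_THRESHOLD_DEG = 3.0  # assumed column displacement, aft positive


def trim_allowed(cmd: TrimCmd, column_deg: float,
                 cutout_active: bool = True) -> bool:
    """Return True if the trim command may drive the stabilizer."""
    if not cutout_active:
        return True  # cutout disabled: nothing blocks the command
    if cmd is TrimCmd.NOSE_DOWN and column_deg > CUTOUT_THRESHOLD_DEG:
        return False  # pilot is pulling aft, so nose-down trim is cut out
    if cmd is TrimCmd.NOSE_UP and column_deg < -CUTOUT_THRESHOLD_DEG:
        return False  # pilot is pushing forward, so nose-up trim is cut out
    return True


if __name__ == "__main__":
    # Pilot pulling hard against a nose-down runaway:
    print(trim_allowed(TrimCmd.NOSE_DOWN, 8.0))                       # blocked
    print(trim_allowed(TrimCmd.NOSE_DOWN, 8.0, cutout_active=False))  # not blocked
```

In this sketch the pilot does nothing except pull, and the runaway still stops. With the cutout disabled, the same pull has no effect on the trim command.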

Going forward, Boeing must demonstrate how MCAS malfunction will be limited in trim authority. Associated with this must be an assessment of the need to restore the aft column cutout switch.

Recognition of a stabilizer runaway has generally been associated with a continuous movement. MCAS has created a new runaway mode that is related to "uncommanded" motion. Speed Trim is another feature on the 737 that applies stabilizer trim in manual flight based on airspeed. Speed Trim stabilizer movements are intermittent. Pilots are used to the stabilizer moving under manual flight. An MCAS runaway can be misinterpreted as Speed Trim commands and, for this reason, may not be recognized as a failure or hazard.

Speed Trim has existed since the 737 classic series without any notable failure or incident. It was implemented as a single thread solution, which appears to be the model for MCAS. Given the MCAS scenarios, does Speed Trim bear scrutiny for the same concerns as MCAS - susceptibility to a single sensor malfunction? Should Speed Trim be redesigned along with MCAS to apply the same protections?

Finally, Boeing changed the 737 MAX cutout switches. There appears to be no awareness of the change, yet they function differently. Before, there was a switch that disabled the autopilot stab trim command alone. The flight crew could continue to use electric trim with the autopilot trim disabled. The 737 MAX cutout switches (PRI, B/U) are supposed to be thrown together. If so, why two switches? Was any assessment made on the workload or performance of a 737 MAX crew having to manually crank the stabilizer, when all other 737 models could have retained electric trim?

In summary

  1. The source of the AoA vane error must be found and fixed or explained. The flight deck effects of stick shaker, elevator feel shift module (EFSM) activation, Airspeed and Altitude disagree may overwhelm some flight crews (feared for ET302 in particular). The fact that these features have existed on 737 for decades without any reported incident is in marked contrast.
  2. MCAS authority and ability to reset must be rectified.
  3. MCAS use of a single input and without the ability to reject misleading data must be rectified. It is better for MCAS to be fail-safe on AoA disagree unless the flight scenario where MCAS is needed overlaps sufficiently with AoA disagree.
  4. The aft-column cutout switch must either be restored with MCAS, or there must be evidence that recovery from an MCAS failure would not benefit from it.
  5. Does Speed Trim bear a redesign to address its single sensor aspects and to make it fail-safe?
  6. Does a special alert need to be applied with MCAS application to allow the flight crew to recognize MCAS separate from Speed Trim?
  7. Should the ability to use electric stab trim be retained if the autopilot trim command malfunctions?
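
As a rough illustration of item 3, a fail-safe input stage compares both AoA sources and inhibits the function when they disagree. The Python sketch below is one possible shape for such a check. The disagree threshold and the choice to average the two readings are assumptions for illustration, not a description of any Boeing design.

```python
# Sketch: cross-check two AoA sources and fail safe on disagreement.
# The threshold and the averaging are illustrative assumptions.
from typing import Optional

MAX_DISAGREE_DEG = 5.0  # hypothetical tolerance between the two vanes


def aoa_for_mcas(left_deg: float, right_deg: float) -> Optional[float]:
    """Return an AoA value to act on, or None to inhibit the function."""
    if abs(left_deg - right_deg) > MAX_DISAGREE_DEG:
        return None  # sources disagree: do nothing rather than trust one
    return (left_deg + right_deg) / 2.0


if __name__ == "__main__":
    print(aoa_for_mcas(27.0, 5.0))  # roughly the Lion Air split: inhibited
    print(aoa_for_mcas(5.0, 6.0))   # sources agree: use the combined value
```

A 22 degree split like the one seen on Lion Air would inhibit the function outright in this sketch. The open question raised in item 3 remains whether the scenarios that need MCAS overlap with AoA disagree.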

The slings and arrows against Boeing and FAA certification processes are flying. A criminal probe is somewhat counter-productive in any investigation, as it tends to chill the open dialogue that is the foundation of air safety. Everyone I know who has been or is a Designated Engineering Representative (DER) or ODA Engineering Unit Member (EUM) is fully dedicated to safety and above reproach. Yet the test program for MCAS and the FHA do not appear to have been done adequately, which is troubling.

Common Type Rating is a bit of a disease, much more a concern to me than the certification processes. There is only one right answer if Common Type Rating is the goal. Management and the design community would be highly motivated to stay in their lane and not rock the boat with anything that could jeopardize Common Type Rating. It seems possible information may not have been openly shared amongst all those involved. In particular, the existence of MCAS may not have been fully communicated internally and the hazard presented by MCAS may not have been fully communicated or understood. 

A follower has examined the 737 MAX Airplane Maintenance Manual (AMM), specifically chapter 27 autoflight. He reports there is no mention of MCAS. If this is true, it is a significant departure from prior standards. For example, speed trim is described in great detail. 

Stay tuned!

Peter Lemme

peter @ satcom.guru
Follow me on twitter: @Satcom_Guru
Copyright 2019 satcom.guru All Rights Reserved

Peter Lemme has been a leader in avionics engineering for 38 years. He offers independent consulting services largely focused on avionics and L, Ku, and Ka band satellite communications to aircraft. Peter chaired the SAE-ITC AEEC Ku/Ka-band satcom subcommittee for more than ten years, developing ARINC 791 and 792 characteristics, and continues as a member. He contributes to the Network Infrastructure and Interfaces (NIS) subcommittee developing Project Paper 848, standard for Media Independent Secure Offboard Network.

Peter was Boeing avionics supervisor for 767 and 747-400 data link recording, data link reporting, and satellite communications. He was an FAA designated engineering representative (DER) for ACARS, satellite communications, DFDAU, DFDR, ACMS and printers. Peter was lead engineer for Thrust Management System (757, 767, 747-400), also supervisor for satellite communications for 777, and was manager of terminal-area projects (GLS, MLS, enhanced vision).

An instrument-rated private pilot, single engine land and sea, Peter has enjoyed perspectives from both operating and designing airplanes. Hundreds of hours of flight test analysis and thousands of hours in simulators have given him an appreciation for the many aspects that drive aviation; whether tandem complexity, policy, human, or technical; and the difficulties and challenges to achieving success.


  1. Good thinking Peter. We both have problems with it being a typical sensor failure, now you might have found it. I'm sure Boeing has too.


    1. I hope so. I expect they have at least the vane removed from the Lion Air airplane before the JT043 flight. It was presenting invalid information. It will be telling if "no fault found", as then the problem is surely downstream.

  2. Have read a number of your articles on this subject and am interested in knowing which A/D converter and ARINC 429 chip / chip sets are used in the 737 MAX, primarily because more than one of the analog sensors exhibited what appeared to be "noisy" data. Depending on the parameters, sensor ranges and voltage output spans, the issue could be a flawed A/D converter; i.e. a hardware issue related to a lot number and not a software issue. Could you respond with any known info?

    1. I have no information. This is done in the Stall Management Yaw Damper (SMYD) on 737NG. I don't know what was changed with the MAX.

  3. I can't remember ever having seen single bit flips in serial transmissions. shift by one, errors in word (dis)assembly for transmit register width smaller than word width yes.

    Assumed AoA offset is the visible error. Apparently mentioned by crew were other "integral" air data errors?
    Q: is it a "decide source and pass through" or a "munge and pass on" data path solution?

    1. Memory over-write would be my first concern for bit flipping. But it is a reach for sure. The issue with Airspeed and Altitude was the erroneous AoA applied the wrong correction to the static pressure source, which caused the disagree with the other side based on the correct AoA.

    2. Thank you.
      So apparently the whole "lobe" works on that damaged AoA data.

      you'd see full word(byte?) erroneous value.
      I've seen funnies from flags being not saved in an interrupt routine but touched in the isr nonetheless.

      what kind of processor would I have to expect?

    3. I have no idea on any of the specific hardware. There are generic 429 chipsets that do all the heavy lifting in an integrated manner. There are memory pages that are written by inputs and read by the processor. I spent my formative years machine coding and debugging. The idea that a bit flipped is not my favored answer. If we knew what the bias was on ET302 it would help. But also take note that the vane replaced prior to JT043 was out of range - perhaps a very large bias was being applied. Too little information to know anything. I expect the Indonesian Report will have some answers on what they found with that vane, and any testing or analysis that was conducted. The concern is whether there is something else that needs to be fixed. The latest thoughts are around what has changed with the SMYD on the MAX, whether it is a "function" now, and whether it relies solely on ADIRU AoA via ARINC 429, versus the NG where it had a direct resolver analog interface.

  4. It is impossible to restore the aft-column cutout switches. MCAS will activate in a high AoA regime where the column is pulled aft pretty heavily. Therefore keeping the aft switches would render MCAS functionless & useless.

    1. I know! But is it OK to ignore it then? Which takes precedence? I would argue we have lost two airplanes due to this feature as the last resort or saving grace, and that MCAS will never even save one. Did Boeing pursue every (any) other aerodynamic patch as an alternative?

  5. Hi Peter,
    I agree, that an aerodynamical fix would be the way to go. Right now they have the problem of having a bit of fly by wire but without the redundancy usually incorporated in such systems.

