Thursday, May 16, 2019

The Beat Goes On

On Wednesday, May 15, the House Transportation and Infrastructure Committee's Aviation Subcommittee convened a hearing with the NTSB and FAA regarding 737 MAX airworthiness. The information provided was variously new, still confusing, important, repeated, wrong, left out, and offensive.


Representing the NTSB was Chairman Sumwalt.

Representing the FAA was acting Administrator Elwell.

Information that had not been heard before

-->The FAA discussed a third forthcoming change to MCAS. It was already known that MCAS would be inhibited if the two AOA vanes disagreed by more than 5.5 degrees, and that MCAS would trigger only once while AOA remains above the trip threshold. What is new is that MCAS stabilizer authority will be limited based on available elevator pitch-up authority: the elevator must still be able to command a 1.5g pitch-up after one MCAS stabilizer trim command. This new feature addresses a major concern that MCAS, even when operating as designed, could create a mistrim that overwhelms elevator authority, which was the case in both accidents. The final MCAS stabilizer trim in both accidents, applied "open loop" at the time, would now be limited, which would have stopped the forced dives. A good thing!
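
The first two changes described above can be sketched as simple gating logic. This is only an illustration, not Boeing's implementation: the class and parameter names are hypothetical, the trip threshold value is a made-up placeholder, and using the lower of the two vane readings is an assumption. Only the 5.5 degree disagreement limit comes from the hearing. The third change, the elevator-authority limit, is not modeled.

```python
from dataclasses import dataclass

AOA_DISAGREE_LIMIT_DEG = 5.5   # from the hearing, as reported above
AOA_TRIP_THRESHOLD_DEG = 12.0  # hypothetical placeholder, not a real value


@dataclass
class McasGate:
    """Hypothetical gate deciding whether one MCAS trim command may be issued."""
    fired_this_event: bool = False

    def may_trigger(self, aoa_left_deg: float, aoa_right_deg: float) -> bool:
        # Change 1: inhibit if the two vanes disagree by more than the limit.
        if abs(aoa_left_deg - aoa_right_deg) > AOA_DISAGREE_LIMIT_DEG:
            return False
        # Assumption: use the lower vane so one high reading cannot trip alone.
        aoa = min(aoa_left_deg, aoa_right_deg)
        if aoa <= AOA_TRIP_THRESHOLD_DEG:
            # AOA is back below the threshold: the event is over, so re-arm.
            self.fired_this_event = False
            return False
        # Change 2: only one command while AOA stays above the threshold.
        if self.fired_this_event:
            return False
        self.fired_this_event = True
        return True
```

With a large bias on one vane, as in the accident flights, the disagreement check alone blocks the command in this sketch, regardless of the threshold chosen.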

-->NTSB professes that the Ethiopian government was not willing to collaborate with the NTSB until the FAA had grounded the airplane. 

Information that is Still Confusing

-->FAA stated, on more than one occasion, that MCAS was a critical flight control system. Critical is a charged word.

Keep in mind that the availability of MCAS is a different factor than considering the malfunction of MCAS. A critical function would generally be expected to be available. In this sense, redundancy would be a welcome attribute.

The MCAS function is available from either the Left FCC or the Right FCC. The Speed Trim and MCAS functions default to the Left FCC but swap to the Right FCC and back again after each flight. If reset, the process starts on the Left FCC again. If the master FCC, the one hosting the Speed Trim and MCAS functions, fails in some manner, the off-side FCC can take over. In this sense, MCAS is fail-operative.

The conditions causing the FCC to switch master role are not entirely clear, and this is a factor in a true fail-operative design, where constant monitoring reveals a failure and engages a working channel automatically.
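
The master-role alternation described above can be pictured with a small state model. This is a sketch under stated assumptions: the class and method names are hypothetical, and the failure handover is a guess at the behavior, since the actual switching conditions are not clear.

```python
class FccMasterSelector:
    """Hypothetical model of which FCC hosts Speed Trim and MCAS."""

    def __init__(self) -> None:
        self.master = "LEFT"  # default side, per the description above

    def _swap(self) -> None:
        self.master = "RIGHT" if self.master == "LEFT" else "LEFT"

    def reset(self) -> None:
        # After a reset, the process starts on the Left FCC again.
        self.master = "LEFT"

    def end_of_flight(self) -> None:
        # The master role swaps sides after each flight.
        self._swap()

    def on_master_failure(self) -> None:
        # Assumption: a detected failure hands the role to the off-side FCC.
        # What counts as a detected failure is not known from the source.
        self._swap()
```

The open question is entirely inside `on_master_failure`: a true fail-operative design depends on what monitoring calls it, and how quickly.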

Arriving in a flight condition that requires MCAS involves flying into the limit flight envelope, for which AC 25-7D establishes a probability of 10^-5.


One can argue that MCAS failure is independent of flying into or beyond the Limit Flight Envelope. Therefore the probability of MCAS being both failed and needed is not the same as needing MCAS to work all the time. This gives relief for the MCAS architecture to be single thread; the backup of the other FCC is a bonus.
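
The independence argument above is just a product of two probabilities. The sketch below takes the limit flight envelope probability cited above from AC 25-7D as 1e-5; the MCAS failure probability is a hypothetical placeholder chosen only to show the arithmetic, not a real figure.

```python
P_LIMIT_ENVELOPE = 1e-5  # limit flight envelope probability, as cited above
P_MCAS_FAILED = 1e-3     # hypothetical placeholder, not a real figure


def p_needed_and_failed(p_needed: float, p_failed: float) -> float:
    """Joint probability of two events, assuming they are independent."""
    return p_needed * p_failed


joint = p_needed_and_failed(P_LIMIT_ENVELOPE, P_MCAS_FAILED)
print(f"P(MCAS needed and failed) = {joint:.1e}")
```

Under these assumed numbers the joint probability is on the order of 1e-8, far smaller than either factor alone, which is the relief the single-thread architecture relies on. The argument only holds if the two events really are independent.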

MCAS malfunction is the hazard presented by MCAS. The hazard is mitigated by the flight crew recognizing the MCAS malfunction and taking action: notably the runaway stabilizer checklist and using the cutout switches. If the pilot takes responsibility for preventing a hazardous situation, then the system creating the hazard is afforded relief from protecting against the hazard itself.

The expectation that MCAS would issue only one trim command per malfunction goes to constraining the hazard (whether it be 0.6 or 2.5 degrees of stab trim). Instead, MCAS commanded incessantly, and this persistence was deadly. In light of what happened in all three cases (JT043, JT610, ET302), where MCAS malfunction was difficult to detect due to stall warning and concurrent Speed Trim commands, its malfunction is clearly HAZARDOUS. The accidents make clear the human aspects and limitations; they cannot simply be ignored by assuming better training will solve the problem.

If MCAS malfunction is HAZARDOUS, then MCAS must be built as a fail-safe design. The Left FCC AND the Right FCC would each make MCAS command calculations, and both would be required to agree before any stab trim command is issued. This is equally relevant to the Speed Trim System. Command mode stab trim is not subject to the same rule, as it can be disconnected.

What is being proposed is input signal processing (comparing AOA vanes), but otherwise it is still single thread; a single failure can still drive a malfunction.
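
The difference between comparing inputs and comparing outputs can be made concrete. The sketch below shows the dual-channel agreement idea: each FCC computes its own command, and nothing is sent to the stabilizer unless the two agree. The function name and the tolerance value are hypothetical, chosen only for illustration.

```python
from typing import Optional

AGREEMENT_TOLERANCE_DEG = 0.1  # hypothetical tolerance, for illustration only


def voted_stab_command(left_cmd_deg: float, right_cmd_deg: float) -> Optional[float]:
    """Return a trim command only if both channels agree; otherwise None."""
    if abs(left_cmd_deg - right_cmd_deg) > AGREEMENT_TOLERANCE_DEG:
        # Disagreement: fail passive, no stab trim command is issued.
        return None
    return (left_cmd_deg + right_cmd_deg) / 2.0


# One channel commands 2.5 degrees while the other commands nothing.
print(voted_stab_command(2.5, 0.0))
```

In this arrangement a fault inside one FCC produces no command at all rather than an erroneous one, which a vane comparison feeding a single computation channel cannot guarantee.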

--> The expectation that AOA DISAGREE was a basic feature independent from AOA Indicator, as a maintenance alert, was apparently part of MCAS. If so, why was it not also driving MCAS test criteria? Did one engineer advocate for AOA DISAGREE for fear of AOA malfunction, yet the engineer that was responsible for MCAS never tested AOA malfunction???

Other Information that is Important

-->FAA confirmed that AOA DISAGREE was an advisory alert designed for maintenance alone. FAA stated AOA DISAGREE has no pilot action associated to it, and therefore is non-safety. FAA stated that AOA DISAGREE would not have prevented either accident.

FAA did not acknowledge that AOA DISAGREE would have been noted after JT043, as a maintenance item. That entry would have directed maintenance to the AOA vane, which would have quickly revealed the large bias error. The vane would have been repaired, and JT610 would not have crashed.

-->FAA confirmed that one stick shaker on, the other off, is an AOA DISAGREE itself.

-->The FAA Technical Advisory Board (TAB), comprising NASA, the FAA Tech Center, and the Air Force, is a part of the decision-making process for ungrounding.

-->FAA Joint Authorities Technical Review (JATR) will not be a part of the ungrounding decision making.


Other Information that kept coming up

-->Use of video recording of the flight deck has not been adopted.

-->Use of telemetry has not been adopted. 

However, ADS-B was a part of the data used, which is technically telemetry. Telemetry is inherently unreliable and would only augment the crash recorders.

-->The Colgan Air Accident.


Information that was Wrong

-->You cannot ascertain AOA by looking out the window. This claim was egregious. AOA is particularly hard to visualize from looking outside while in a turn, especially when nearing accelerated stall.

-->High speed itself did not prevent manual trim on ET302. In fact, high speed would have aided trimming nose up in the situation at the end of ET302 flight, if elevator was not commanded nose up as well.

-->Boeing offers the AOA Indicator on all models.

-->Airspeed on Ethiopian got to about 380 knots before the dive, and accelerated to over 400 knots only during the dive. (note the blue, Right Airspeed values as accurate).


-->The FAA reiterated that the malfunction of MCAS was simply a runaway stabilizer, that when the yoke forces don't align with expectations, assume runaway stabilizer.

Speed Trim System adds nose-up trim as the airplane accelerates after takeoff. This is contrary to the trim command the autopilot would apply, in command, under the same circumstance, as well as what a pilot would expect to do themselves.

The pilot is trained to expect STS to apply stabilizer trim under manual flight and that it will impact control forces. 

MCAS malfunction was not a true runaway; it is not a continuous motion "to the stops". MCAS malfunction is intermittent. You cannot deny the fact that the Lion Air crews (JT043 and JT610) could not identify the malfunction as a runaway. Yet the FAA STILL insists it is blatantly obvious.

Information that was Left Out

-->FAA alluded to the fact that failing to trim out column forces before hitting the cutout switch left Ethiopian in a grave situation, but did not elaborate. 

No one asked about the issues of manual trimming, especially with mistrim. It was disappointing that the Aviation Week story showing the issue in the simulator was not mentioned.

No one pushed back on the FAA AD instructions that just said you "can" trim out the forces, not that you "must" trim out the forces before hitting the cutout.

Information that was Offensive

--> There were repeated blanket attacks degrading the capability of foreign governments, airlines, and pilots.

As a retort, NTSB said that Boeing must take into account the lowest common denominator when selling the airplane. That is the fairest response to such a blanket charge.

--> There was a statement alluding to the Ethiopian pilots falsifying airman logbook entries.

-->The flight crew's capability and skill were questioned, specifically as being a major contributor to the accident.

The criticism made no mention of the issues with having stall warning go off, the difficulty in picking out an intermittent trim command, the ambiguity of the AD regarding the need to trim before cutout, or the instructions for dealing with manual trim while mistrimmed.

Mistakes were made. The flight crews could have prevented both accidents. There are contributing factors. There are other aspects relating to those contributing factors. But the primary cause of the accident has not been declared. It is hugely presumptuous to cast the pilots as the guilty party. It is apparent that MCAS precipitated the danger, that it malfunctioned well outside of its boundaries, and in a manner not anticipated by its design.




Peter Lemme

peter @ satcom.guru
Follow me on twitter: @Satcom_Guru
Copyright 2019 satcom.guru All Rights Reserved


Peter Lemme has been a leader in avionics engineering for 38 years. He offers independent consulting services largely focused on avionics and L, Ku, and Ka band satellite communications to aircraft. Peter chaired the SAE-ITC AEEC Ku/Ka-band satcom subcommittee for more than ten years, developing ARINC 791 and 792 characteristics, and continues as a member. He contributes to the Network Infrastructure and Interfaces (NIS) subcommittee developing Project Paper 848, standard for Media Independent Secure Offboard Network.

Peter was Boeing avionics supervisor for 767 and 747-400 data link recording, data link reporting, and satellite communications. He was an FAA designated engineering representative (DER) for ACARS, satellite communications, DFDAU, DFDR, ACMS and printers. Peter was lead engineer for Thrust Management System (757, 767, 747-400), also supervisor for satellite communications for 777, and was manager of terminal-area projects (GLS, MLS, enhanced vision).

An instrument-rated private pilot, single engine land and sea, Peter has enjoyed perspectives from both operating and designing airplanes. Hundreds of hours of flight test analysis and thousands of hours in simulators have given him an appreciation for the many aspects that drive aviation; whether tandem complexity, policy, human, or technical; and the difficulties and challenges to achieving success.

7 comments:

  1. Thanks for yet another substantive analysis Peter Lemme. Elwell's argument on grounding the plane is nonsensical. He claims "data" is needed but the data used were interpreted with hand waving, nothing rigorous, and (as you've pointed out) turn out to be wrongly interpreted. So it was all a heuristic, which defaults to what pilots suspected in November and what was known before ET302 as being enough to have grounded the plane - then. It is astounding that Elwell is allowed to continue parroting the line about how the MCAS malfunction was easily recognizable as a runaway stabilizer, but, isn't this why an engineer would not have tested AoA malfunction, plain and simple? Lastly, I think it is clear that another model for these two crashes is needed because it is likely that the Boeing-FAA light emphasis on training and withholding information about MCAS is causally bound up with the failures by flight crews to prevent both accidents. If we look back at the early 727 accidents by today's standards, "pilot error" might also be characterized differently, it seems to me.

  2. Thanks Peter, as always interesting and informative.

    But what do we have to do to include the (present?) lowest common denominator? At first thought two routes seem logical; - more scenarios must be handled by the FCC and we need to look at raising the level of the lowest common denominator.

    The FCC functions must be robust, i.e. fault tolerant on both the sensor side (typically AOAs) and the final side (typically Hstab + elevator). The FCC should be able to act as the third pilot, and do most of what the 'other two' should do. More automation, in other words.

    We know that in technology nothing is 100% (safe). Computer systems like the FCC/FCS do one of three things: they do as expected (excellent), they do nothing (not that bad - I do the job), or they act on their own (could be dangerous). We know that two of the three sensors can freeze in the same position, so the 'impossible is possible'.

    The other subject to look at is the 'lowest possible denominator'. What can we do to improve the 'flying skills' of pilots, in particular in handling the non-normal stuff? How much of the pilot's flying time (on the type) has been reached by 'pushing buttons', talking to ATC and other crew members? Is the distance between the lowest common denominator and the engineers/pilots who design, build and test aircraft 'an ocean'? That the LA pilots failed in saving the aircraft following what happened in the days prior to the accident is one thing. But what had the EA pilots and EA learned from the LA accident?


    Peter, how many pilots (rough percentage) would recover from an MCAS runaway? We could test in a simulator and find out - we could have different classes - would be interesting. You may argue that the competitors may prepare/train prior to their test. Even more interesting, now we can verify(?) that flying skills are improved by training and focus.

    Replies
    1. We are safest either having complete trust in automation, or none at all. It is the middle ground where danger lurks. The action to shut off automation when it malfunctions is paramount. But there was no switch that simply turned off MCAS, and the one switch that could do it surgically was removed. The answer is to fly manually, and that means to be proficient, and that takes practice.

    2. The switch was removed because the motor that switch controlled was removed. AFAIK the same switch/motor situation is on the NG.

      The other problem is that arguments over what defines "runaway" are bringing out really stupid evaluations. Are pilots supposed to let the stab get to a stop and, if so, how long are they to wait to see that the stab is actually there? (maybe it pauses 0.00001 units just shy and therefore it's not really a runaway?) Just because there's momentary interruption after it has gone far enough that letting go of the controls will crash the plane doesn't mean it's not a runaway. It's a runaway the moment it's a problem to hold the desired AoA or pitch.

  3. So MCAS was required for certification? If one were a technical writer, how would "runaway" be defined? Uncommanded, or appearing to not function like normal autocommanded operation? If it can't be succinctly put into words, how should someone be expected to recognize it?

  4. I'd like to know:
    Was MCAS needed for certification? Probably yes.
    Would MCAS be needed if the NG were certified today? Let's see the AOA versus restoring moment data.
    How much do the 7 and 10 differ in terms of restoring moment approaching stall?
    How much do the NG and MAX differ?
    What is the range of the stabilizer incidence?
    What is the range of electric stabilizer?
    What is the new limited range of MCAS 2.0 on the stabilizer?
    What is the speed of electric pilot trim?

  5. This SLF with a bit of long ago non computer test experience thinks the discussion is either deliberate pettifogging or MOST probably a simple way of duplication for TEST various simultaneous failures of various kinds instead of the cosmic ray bit flipping ad nauseam discussions which have evolved. Can you shed some light on this? Thank you
