Friday, March 15, 2019

What have we learned this week?

The crash of Ethiopian ET302 brought a tragic beginning to what must be one of the worst weeks in aviation, ending with the grounding of the Boeing 737 MAX. What do Lion Air JT043 and JT610 teach us? Did we, as an industry, do everything we should have after JT610? Accusations of impropriety leveled at Boeing and the FAA always seem to be at the ready. Adding to the week was an update on Atlas 5Y3591, which at first seemed to be one thing but, with a slight wording change, thankfully seems to be another altogether.

The impact of Ethiopian ET302 must have been nearly vertical in profile. The end of the short flight came with immense force, literally leaving just a hole in the ground about 200 feet on a side.

The FDR and the CVR memory units are under analysis at a BEA facility. Investigators may get their first look at the data by this weekend.

Late-breaking reports say that the stabilizer trim jackscrew was located and was observed in a "troubling" configuration. A steep dive would suggest it was in a severe nose-down position.

Lion Air JT610 encountered severe nose down trim due to MCAS command and succumbed in a steep dive.



Looking for any similarities between the ET302 data and JT610/JT043, I have noted that a pitch down at what appears to be flap retraction altitude, causing a loss of altitude and a build-up of speed, was evident in all three flight records.

ET302 struggled at the point of taking off. The initial climb out is usually done in a fixed configuration up to about 1000 feet above the field. If an excessive Angle of Attack (AoA) reading was encountered, as in JT610, one stick shaker would go off at liftoff and alerts would indicate airspeed and altitude disagree. If the pilot flying is the one with the stick shaker, and if the airspeed was uncertain, the natural tendency might be to nudge the stick forward and lower the nose. This could explain the shallow climbout and brief level-offs. 

At some point, both pilots should be able to figure out which airspeed is correct by reference to the standby instruments. The corresponding airspeeds and lack of stick shaker on the other side should offer some confidence the airplane is not stalling, that the indications are erroneous. 

Upon flap retraction, if the AoA vane was reading too high and if the MCAS was referring to that vane, then MCAS would trim the stabilizer down. The 400 foot descent after climbing 1000 feet on ET302 matches both JT610 and JT043 on this point.

What happens next on ET302 is that the pilot does not seem to be paying attention to airspeed and is flying level very low to the ground. After about 30 seconds the plane climbs away to the end of the data from Flightradar24.com. We don't know why the pilot flying did not continue to climb right away. One explanation is the pilot was heavily distracted. 

What did we learn from JT610?

The AoA vane reading was about 22 deg higher than it should be. Otherwise it moved up and down in sync with the other AoA vane.  The source of this error is not known. I had concluded it must have been bent, presenting a gross aerodynamic offset. The proposition that a second 737 MAX has encountered another vane with a significant offset is troubling. I confess, I don't track AoA vanes. But could two fail this way with this frequency?  
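The "constant offset, otherwise in sync" signature can be illustrated with a short sketch. The sample values and the function name below are made up for illustration and are not taken from the JT610 recorder data. The idea is simply that an offset vane shows a large mean difference with almost no spread, whereas a stuck vane would show a spread that grows as the good vane moves.

```python
from statistics import mean, pstdev

def vane_offset(left_deg, right_deg):
    """Return (mean difference, spread of the difference) between two AoA traces."""
    diffs = [l - r for l, r in zip(left_deg, right_deg)]
    return mean(diffs), pstdev(diffs)

# Hypothetical samples: the left vane reads about 22 deg above the right vane
# while tracking its movement.
right = [2.0, 4.0, 6.0, 5.0, 3.0]
left = [24.0, 26.0, 28.0, 27.0, 25.0]

offset, spread = vane_offset(left, right)
print(f"offset={offset:.1f} deg, spread={spread:.1f} deg")
```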

The source of the AoA vane error is a significant unknown.

The systems that consume AoA accepted the value as valid, because it was received as valid data, and the value was in an acceptable range. A brick-wall architecture purposely does not cross-compare inputs, to prevent common-mode failures.  

A fail-safe design uses two channels to prevent a single fault from producing erroneous commands.

What is missing is an absolute test, or a third AoA vane. The absolute test I envision looks at the AoA value during the takeoff roll at around 80-100 knots airspeed. At that point the vane reading is determined by the attitude of the airplane on its gear, so a gross AoA error would be relatively easy to detect. Without knowing whether a failed vane is stuck or merely has an aerodynamic offset, that vane should not be used for the rest of the flight.
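A minimal sketch of such a takeoff-roll check follows. Everything numeric here is an assumption for illustration: the expected on-ground AoA (`expected_deg`), the tolerance (`tol_deg`), and the function names are hypothetical, and only the 80-100 knot window comes from the text. Treating an untested vane as unusable is also a conservative choice of mine, not a known requirement.

```python
def aoa_plausible_on_roll(airspeed_kt, aoa_deg, expected_deg=0.0,
                          tol_deg=5.0, window_kt=(80.0, 100.0)):
    """Return True/False inside the speed window, None outside it (no test made)."""
    lo, hi = window_kt
    if not (lo <= airspeed_kt <= hi):
        return None
    return abs(aoa_deg - expected_deg) <= tol_deg

def vane_usable_for_flight(samples, **kw):
    """A vane is usable only if it was tested and every test in the window passed."""
    results = [aoa_plausible_on_roll(v, a, **kw) for v, a in samples]
    tested = [r for r in results if r is not None]
    return bool(tested) and all(tested)

# Hypothetical (airspeed, AoA) samples from a takeoff roll.
good_vane = [(60.0, 10.0), (85.0, 1.0), (95.0, 0.5)]
bad_vane = [(85.0, 22.0), (95.0, 23.0)]
print(vane_usable_for_flight(good_vane), vane_usable_for_flight(bad_vane))
```

A vane that fails the check stays latched out for the flight in this sketch, because the function is evaluated once on the roll and never revisited.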

A single AoA vane with a gross error should not trip stick shaker and should not be used for static pressure source compensation. 

The presence of an AoA disagree should be annunciated for crew awareness and for maintenance.

The presence of an AoA disagree should be accounted for in fault isolation to associate with the other flight deck effects.
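An AoA disagree monitor of the kind described can be sketched as a simple comparator with a persistence count. The threshold and persistence values below are placeholders I chose for illustration, not certified values, and the function name is hypothetical.

```python
def aoa_disagree(left_deg, right_deg, threshold_deg=10.0, persist=3):
    """Return True once the vanes differ by more than threshold_deg
    for `persist` consecutive samples."""
    run = 0
    for l, r in zip(left_deg, right_deg):
        run = run + 1 if abs(l - r) > threshold_deg else 0
        if run >= persist:
            return True
    return False

# A sustained offset trips the monitor; isolated spikes do not.
print(aoa_disagree([24.0, 25.0, 26.0], [2.0, 3.0, 4.0]))
print(aoa_disagree([2.0, 24.0, 3.0, 25.0], [2.0, 3.0, 3.0, 4.0]))
```

The same flag could feed both the crew annunciation and the maintenance fault record, so that the stick shaker and airspeed/altitude disagree effects are tied back to one root cause.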

Boeing is apparently preparing a software update to address the MCAS issues that were revealed in JT610.

The first issue was letting MCAS operate on a single vane. Boeing can revise MCAS to be a legitimate fail-safe design by fully utilizing both Flight Control Computer (FCC) channels in a brick-wall fashion. Any software patch to stub in a voted AoA vane on one side may not be fully fail-safe. As each FCC has a dual processor, both processors should agree for any command to be issued, yet this still may not be as compelling as using both FCC channels.
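To make the two-channel idea concrete, here is a rough sketch in which each channel computes its own command from its own vane, and a command is issued only when both channels agree. This is not Boeing's implementation. The trigger angle, the trim step, the sign convention (negative is nose down), and the function names are all assumptions for illustration.

```python
def mcas_channel(aoa_deg, trigger_deg=14.0, step=-2.5):
    """One channel: command a nose-down step (negative) if its own vane is high."""
    return step if aoa_deg > trigger_deg else 0.0

def fail_safe_trim_command(cmd_a, cmd_b, tol=0.1):
    """Issue a command only when both channels agree; otherwise None (no command)."""
    if abs(cmd_a - cmd_b) <= tol:
        return (cmd_a + cmd_b) / 2
    return None

# Left vane erroneously high, right vane normal: the channels disagree.
print(fail_safe_trim_command(mcas_channel(24.0), mcas_channel(2.0)))
# Both vanes genuinely high: the channels agree and the step is issued.
print(fail_safe_trim_command(mcas_channel(16.0), mcas_channel(16.0)))
```

The inputs are never cross-compared in this sketch. Only the outputs are, which keeps each channel independent while preventing a single bad vane from producing a command.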

The second issue has to do with the authority of MCAS. From my perspective, MCAS should have whatever authority it needs to meet its functional requirements. The extent of that authority would have to be accounted for in the hazard assessment. As the authority increases, so does the hazard.

What was most troubling is the ability of MCAS to re-trigger even though the AoA never showed a recovery after the first trim command. It was this aspect that allowed MCAS to trim all the way to the limits, and it was this persistence that eventually wore down the JT610 crew. MCAS should get one "trim step command" and then it needs to stand down until whatever happened is resolved - it doesn't get to keep trimming.
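The "one trim step, then stand down" behavior amounts to a latch that re-arms only after the AoA has recovered. The sketch below shows that logic in isolation. The class name and the trigger and reset angles are hypothetical, and what counts as "resolved" is reduced here to the AoA dropping below a reset value.

```python
class OneShotTrim:
    """Allow a single trim step per high-AoA event; re-arm only after AoA recovers."""

    def __init__(self, trigger_deg=14.0, reset_deg=10.0):
        self.trigger_deg = trigger_deg
        self.reset_deg = reset_deg
        self.armed = True

    def update(self, aoa_deg):
        """Return True when one trim step should be commanded on this sample."""
        if self.armed and aoa_deg > self.trigger_deg:
            self.armed = False  # stand down until AoA shows a recovery
            return True
        if not self.armed and aoa_deg < self.reset_deg:
            self.armed = True
        return False

# A vane stuck high produces exactly one step, not a repeating series.
latch = OneShotTrim()
stuck_high = [latch.update(a) for a in [24.0] * 5]
print(stuck_high)
```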

The last issue with MCAS is the most significant. The flight condition that MCAS is designed for (accelerated stall) involves the column being pulled back when MCAS needs to trim nose down. This creates a conflict with the aft-column cutout. The aft-column cutout stops a nose-down "mistrim" command.

The aft-column cutout involves a signal from the column cutout switch to the FCC. The FCC software uses the input to turn off its trim commands.
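In software terms, the column cutout acts as a gate on the trim command: a command that opposes the column is not passed on. The following is a simplified sketch of that gate, with a sign convention (negative is nose down) and names that are my own assumptions rather than anything from the FCC software.

```python
def trim_command_allowed(cmd, aft_column_cutout, forward_column_cutout=False):
    """Gate an automatic trim command against the column cutout switches.

    cmd < 0 is nose-down trim, cmd > 0 is nose-up trim (assumed convention).
    """
    if cmd < 0 and aft_column_cutout:
        return False  # pilot is pulling back: block nose-down trim
    if cmd > 0 and forward_column_cutout:
        return False  # pilot is pushing forward: block nose-up trim
    return True

# Nose-down command while the pilot holds the column aft: blocked.
print(trim_command_allowed(-2.5, aft_column_cutout=True))
# Same command with the column near neutral: allowed.
print(trim_command_allowed(-2.5, aft_column_cutout=False))
```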

The FCC removed the aft column cutout feature from MCAS commands. MCAS can trim nose down in spite of the aft column cutout being asserted.  This is a grave error and must be resolved.

Human factors cannot be denied. In the event of uncommanded nose-down stabilizer motion, the pilot will pull back on the column. That much is certain. As the airplane pitches over, the pilot will pull back harder. With the aft column cutout, the trim command will cease, the situation will stabilize, and the pilot can compose themselves, trim the stab back up, and maybe hit the trim cutout switch.

Without the aft column cutout, some human beings may freeze up.  They may not get to the step of trying to trim the stab back up, but instead fixate on pulling the column back.  I fear this is exactly what happened in JT610.

The column cutout switches have served aviation well. Boeing NEVER should have removed aft-column cutout for MCAS. Aft-column cutout must be restored. If that means MCAS cannot work as planned, THEN SO BE IT, take MCAS out.

The FAA issued an AD after JT610 crashed. The AD brought attention to the AoA vane error indications and the use of the cutout switches to disable uncommanded stabilizer motion. In the four months since then, I wonder how many line pilots were exposed to these circumstances in a flight simulator. What we fear in ET302, in spite of the AD, is that the pilot was not able to cope with the situation. The FAA expected that the crew of ET302 would recognize the stick shaker and disagree alerts as an AoA vane issue and would be ready to cut out the stabilizer trim if MCAS did anything.

What is feared is that the ET302 crew made no correlation to the AD, reacted to the stick shaker and disagree alerts, became overwhelmed with workload, and ultimately just pulled on the column to fight MCAS, never hitting the cutout. That thinking flies in the face of the AD. For this reason, the FAA wanted to see data to prove that the AD had not been effective. We don't have that data as I write this.

The FAA was roundly criticized for not rushing to ground the 737 MAX after ET302 crashed. The issue at hand is credible data. The FAA was waiting for confirmation before taking any action. The issue was politicized. Aviation is driven by data and regulation, not conjecture and fear. 

Rushing to judgement is the issue here. We always say wait for the investigation, don't speculate. Yet the world seemed to flip and say the opposite, err on the side of caution. Aviation inherently errs on the side of caution, because we are data driven.  Without the data, you don't take the action. Going forward, I wonder if this is the new reality, where open speculation outside of the official investigation becomes actionable.

Associated with the FAA criticism are complaints that the FAA is too cozy with Boeing when it comes to regulation. As a point of reference, I was an FAA Designated Engineering Representative (DER) as a Boeing employee for 747-400 and 767 satellite communications, data link, and flight recording (including the Flight Data Recorder, FDR). I trained the DERs that would follow me. Boeing Airworthiness was a parallel organization that sliced across all the programs and disciplines to represent the DERs and to care for them. Since the Organization Designation Authorization (ODA) process came to be, the DERs became Airworthiness Representatives, now Engineering Unit Members (EUM). The FAA made the change to save industry money.

Any assertion that the EUMs are not doing their best to do the right thing is particularly galling.  Please don't disparage these individuals categorically. Please accept that they are humans who can make mistakes. There are more layers between the EUM and anyone at the FAA who has pertinent system knowledge than in days gone by. It could be better, but the FAA needs to take the lead on that, and our government seems wholly focused on abandoning regulation.

In the meantime, have more faith in the EUMs and in the integrity of the ODA.

Finally, the NTSB update on 5Y3591 left open a path to suggest the plane could have been crashed intentionally. I greatly regret taking the wording from the NTSB too literally and assuming that was what they meant.  I had a long discussion with a great pilot associate who pieced together another thread. His inputs and the NTSB wording change leave me much less convinced of any malicious act.

Stay tuned!


Peter Lemme

peter @ satcom.guru
Follow me on twitter: @Satcom_Guru
Copyright 2019 satcom.guru All Rights Reserved

Peter Lemme has been a leader in avionics engineering for 38 years. He offers independent consulting services largely focused on avionics and L, Ku, and Ka band satellite communications to aircraft. Peter chaired the SAE-ITC AEEC Ku/Ka-band satcom subcommittee for more than ten years, developing ARINC 791 and 792 characteristics, and continues as a member. He contributes to the Network Infrastructure and Interfaces (NIS) subcommittee developing Project Paper 848, standard for Media Independent Secure Offboard Network.

Peter was Boeing avionics supervisor for 767 and 747-400 data link recording, data link reporting, and satellite communications. He was an FAA designated engineering representative (DER) for ACARS, satellite communications, DFDAU, DFDR, ACMS and printers. Peter was lead engineer for Thrust Management System (757, 767, 747-400), also supervisor for satellite communications for 777, and was manager of terminal-area projects (GLS, MLS, enhanced vision).

An instrument-rated private pilot, single engine land and sea, Peter has enjoyed perspectives from both operating and designing airplanes. Hundreds of hours of flight test analysis and thousands of hours in simulators have given him an appreciation for the many aspects that drive aviation; whether tandem complexity, policy, human, or technical; and the difficulties and challenges to achieving success.

7 comments:

  1. "I wonder if this is the new reality, where open speculation outside of the official investigation becomes actionable."

    This is in a way self-inflicted, isn't it?
    The FAA's stance was not a neutral "we will wait and see"
    but "everything is safe and super OK".

    in both cases (787, MAX) overconfidence in decisions and bad process
    came to light afterwards.

    Replies
    1. The FAA issued an AD informing the flight crews of the flight deck effects possible with an AoA vane erroneous indication, the potential for MCAS to kick in, and how to respond with cutout switch. With no data from Ethiopian, the FAA expected that the MCAS issues would not be a factor - the AD should have ensured that.

  2. great analysis Peter ! thank you

  3. Hi Peter. Thanks for your analysis. As you were associated with Satcom and connected developments, I have a question on the MAX. The MAX has a fault reporting system similar to the B777 or B787, where maintenance messages are set and down-linked to the maintenance engineering base continuously. Are you aware of any maintenance messages (fault codes) sent by the two crashed planes? I am asking this question because everyone was waiting for FDR data to determine the cause. Normally, in every incident, these fault messages give an early clue.

    Replies
    1. Boeing has the ONS available for the MAX. From this, it seems likely that AoA is reported routinely. The onboard system may only report when on the ground, after a flight. The situation prior to JT043 (Lion Air) was a failed AoA (out of range). The situation on JT043 and JT610 was AoA erroneous output - not a failure, not reported as a failure. We don't know for certain on ET302 yet.
      http://www.boeing.com/commercial/aeromagazine/articles/2014_q3/pdf/AERO_2014q3.pdf

    2. Thanks Peter. If the airline has signed up for BHM (Boeing health monitoring), the downlink is continuous through ACARS using VHF data or Satcom.
      That was so in my airline operating the B777. I am sure ET, with a large fleet of B777s and B787s, would have it. Ram
