Fix NetApp Burts with Ansible

Hi *,

during my last NetApp SAM meeting, I noticed several to-dos.

In some cases, NetApp offers an Ansible script that fixes an issue directly on the https://activeiq.netapp.com/ page.

In my case, the findings pointed to the "Applications experience latency due to a single slow SSD" KB and the "Unexpected bootargument is set" KB. Both issues have to be fixed manually.

Both KBs were flagged for more than 50 nodes...

That's why I created the following Ansible playbook for the bootarg KB:

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    # shared connection parameters, merged into each task via the <<: *login merge key
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false
  
  tasks:
    - name: run ontap cli command
      delegate_to: localhost
      netapp.ontap.na_ontap_ssh_command:
        command: 'node run -node * -command bootargs unset bootarg.gb.override.lmgr.veto'
        privilege: diag
        <<: *login

    - name: Send message
      delegate_to: localhost
      netapp.ontap.na_ontap_autosupport_invoke:
        autosupport_message: "fix A boot argument that is only expected to be set during ONTAP update is still set."
        <<: *login

#https://kb.netapp.com/onprem/ontap/os/Unexpected_bootargument_is_set_-_Active_IQ_Wellness_Risk

For the "Applications experience latency due to a single slow SSD" KB, I simply recycled the playbook above and replaced the command.

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false
  tasks:
    - name: run ontap cli command
      delegate_to: localhost
      netapp.ontap.na_ontap_ssh_command:
        command: 'system node run -node * options disk.latency_check_ssd.fail_enable on'
        privilege: diag
        <<: *login

    - name: Send message
      delegate_to: localhost
      netapp.ontap.na_ontap_autosupport_invoke:
        autosupport_message: "fix Burt 1479263 - Applications experience latency due to a single slow SSD"
        <<: *login

#https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1479263

Please note that you should only run these commands against affected systems.
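
One way to do that is to keep the affected clusters in a dedicated inventory group and point the play at that group only. This is just a minimal sketch; the group name affected_nodes and the host names are placeholders, not anything from the KBs:

# inventory.yml - only the clusters flagged by Active IQ
all:
  children:
    affected_nodes:
      hosts:
        cluster01.example.com:
        cluster02.example.com:

In the playbooks above you would then replace hosts: all with hosts: affected_nodes, or keep hosts: all and call ansible-playbook with --limit affected_nodes.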

Both Ansible playbooks enabled me to fix the issues with very little effort.

With the AutoSupport invoke, the changes were pushed into the NetApp data warehouse and the systems were removed from the affected list after 24 hours.

The playbooks to fix the ONTAP boot arg issue and the SSD issue can be found in my GitHub repo.

Cheers 

Eric

NetApp Keystone Login Banner

Hello *,

as mentioned in my session at NetApp INSIGHT this year, I have created a smart login banner for NetApp Keystone Systems.

The banner makes NetApp Keystone systems more visible and reminds admins/users that support cases are opened via a different path than usual. It also points out that QoS values should not be adjusted.

 

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      prompt: ansible user password
      private: yes
  
  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false
  tasks:
    - name: modify motd cluster - REST
      delegate_to: localhost
      netapp.ontap.na_ontap_login_messages:
        motd_message: "\n authorized access only \n\n Keystone System, please be aware of special QoS values \n Keystone Germany Hotline +49 8006 2730 17 \n Email keystone.services@netapp.com \n For any escalations, please reach out to keystone.escalations@netapp.com \n\n case management https://netappgssc.service-now.com/csm"
        show_cluster_motd: True
        <<: *login

The latest version can be found in my GitHub repo.

And yes, if you are not in the German region, you should replace the phone number with one of the following numbers (see the sketch after the list for one way to turn the number into a variable):

  • Keystone USA and Canada: +1 866 363 8277 (toll free) / 1 919 629 3399 (non-toll-free)
  • Keystone Belgium: +32 8005 8797 
  • Keystone UK: +44 8000 2600 66 
  • Keystone Austria: +43 8003 0101 0 
  • Keystone Netherlands: +31 8003 8000 13 
  • Keystone Switzerland: +41 8005 6689 5 
  • Keystone Germany: +49 8006 2730 17 
  • Keystone Japan: +81 800 6000 140
  • Keystone Australia: +61 1800 0072 01 
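
If you roll the banner out to several regions, you can put the hotline into a variable instead of maintaining one playbook per country. This is only a sketch using my own naming: keystone_hotline is not an official variable, set it per cluster via group_vars/host_vars or --extra-vars.

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      prompt: ansible user password
      private: yes

  vars:
    # hypothetical per-region variable, e.g. defined in group_vars or passed via --extra-vars
    keystone_hotline: "+49 8006 2730 17"
    login: &login
      username: ansible
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false

  tasks:
    - name: modify motd cluster - REST
      delegate_to: localhost
      netapp.ontap.na_ontap_login_messages:
        motd_message: "\n authorized access only \n\n Keystone System, please be aware of special QoS values \n Keystone Hotline {{ keystone_hotline }} \n Email keystone.services@netapp.com \n For any escalations, please reach out to keystone.escalations@netapp.com \n\n case management https://netappgssc.service-now.com/csm"
        show_cluster_motd: True
        <<: *login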

Cheers
Eric

ONTAP SSL Security Hardening

Hi *,

during this year I ran twice into an ONTAP hardening issue that was related to SnapMirror failures.

For internal hardening projects, I created the following Ansible playbook.

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false

  tasks:
    - name: Modify SSH algorithms at cluster level
      delegate_to: localhost
      netapp.ontap.na_ontap_security_ssh:
        # no vserver given, so the cluster-level SSH configuration is modified
        ciphers: ["aes256-ctr","aes192-ctr","aes128-ctr","aes128-gcm","aes256-gcm"]
        key_exchange_algorithms: ["diffie-hellman-group-exchange-sha256","ecdh-sha2-nistp256","ecdh-sha2-nistp384","ecdh-sha2-nistp521","curve25519-sha256"]
        mac_algorithms: ["hmac-sha2-256","hmac-sha2-512","hmac-sha2-256-etm","hmac-sha2-512-etm","umac-64","umac-128","umac-64-etm","umac-128-etm"]
        max_authentication_retry_count: 6
        <<: *login

    - name: Modify SSL Security Config
      delegate_to: localhost
      netapp.ontap.na_ontap_security_config:
        #name: ssl
        is_fips_enabled: false
        supported_cipher_suites: ["TLS_PSK_WITH_AES_256_GCM_SHA384","TLS_RSA_WITH_AES_128_GCM_SHA256","TLS_RSA_WITH_AES_256_GCM_SHA384","TLS_DHE_DSS_WITH_AES_128_GCM_SHA256","TLS_DHE_DSS_WITH_AES_256_GCM_SHA384","TLS_DHE_RSA_WITH_AES_128_GCM_SHA256","TLS_DHE_RSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"]
        #supported_ciphers:  'ALL:!LOW:!aNULL:!EXP:!eNULL:!3DES:!RC4:!SHA1'
        supported_protocols: ['TLSv1.2']
        <<: *login

# src https://docs.ansible.com/ansible/latest/collections/netapp/ontap/na_ontap_security_ssh_module.html
# src https://docs.ansible.com/ansible/latest/collections/netapp/ontap/na_ontap_security_config_module.html

I was asked to remove TLS_PSK_WITH_AES_256_GCM_SHA384.
This was a terrible idea: removing TLS_PSK_WITH_AES_256_GCM_SHA384 causes SnapMirror to stop working, and new cluster peerings can no longer be created.

The peering errors looked like this:

yourcluster::> cluster peer create -peer-addrs x.x.x.x,x.x.x.y -applications snapmirror -ipspace snapspace -address-family ipv4
 
Notice: Use a generated passphrase or choose a passphrase of 8 or more characters. To ensure the authenticity of the peering
        relationship, use a phrase or sequence of characters that would be hard to guess.
 
Enter the passphrase:
Confirm the passphrase:
 
Error: command failed: Using peer-address x.x.x.x: An introductory RPC to the peer address "x.x.x.x" failed to connect: RPC:
       Remote system error [from mgwd on node "nodeA" (VSID: -1) to xcintro at x.x.x.x ]. Verify that the peer
       address is correct, and then try the operation again.
 
yourcluster::>

There is a NetApp Knowledge Base article regarding this issue -> https://kb.netapp.com/onprem/ontap/metrocluster/Connectivity_to_peer_cluster_is_broken_with_missing_PSK_Cipher

Please be careful about removing TLS_PSK_WITH_AES_256_GCM_SHA384 from your ONTAP SSL cipher list.
And please also keep in mind that the SSL config hardening change only takes effect after a takeover/giveback.
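
If the PSK suite has already been removed, you can push the cipher list back with the same na_ontap_security_config task as above and then do the takeover/giveback. The list below is only a shortened example to illustrate the idea; use your own approved list, just make sure TLS_PSK_WITH_AES_256_GCM_SHA384 is in it.

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    login: &login
      username: ansible
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false

  tasks:
    - name: Restore SSL cipher suites including the PSK suite
      delegate_to: localhost
      netapp.ontap.na_ontap_security_config:
        is_fips_enabled: false
        # TLS_PSK_WITH_AES_256_GCM_SHA384 is needed for cluster peering / SnapMirror
        supported_cipher_suites: ["TLS_PSK_WITH_AES_256_GCM_SHA384","TLS_RSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"]
        supported_protocols: ['TLSv1.2']
        <<: *login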

The current version of my hardening playbook can be found in my GitHub repo.

Cheers
Eric