Fix NetApp Burts with Ansible

Hi *,

During my last NetApp SAM meeting, I noticed several to-dos.

In some cases, NetApp offers an Ansible script that fixes the issue directly on the page.

In my case, the findings pointed to two KB articles: "Applications experience latency due to a single slow SSD" and "Unexpected bootargument is set". Both bugs need to be fixed manually.

Both KBs were flagged for more than 50 nodes...

That's why I created the following Ansible playbook for the bootarg KB:

- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false

  tasks:
    # Module names are assumed from the parameters used (netapp.ontap collection).
    - name: run ontap cli command
      delegate_to: localhost
      netapp.ontap.na_ontap_command:
        command: 'node run -node * -command bootargs unset'
        privilege: diag
        <<: *login

    - name: Send message
      delegate_to: localhost
      netapp.ontap.na_ontap_autosupport_invoke:
        autosupport_message: "fix A boot argument that is only expected to be set during ONTAP update is still set."
        <<: *login


For the "Applications experience latency due to a single slow SSD" KB, I simply reused the playbook above and replaced the command.

- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false

  tasks:
    # Module names are assumed from the parameters used (netapp.ontap collection).
    - name: run ontap cli command
      delegate_to: localhost
      netapp.ontap.na_ontap_command:
        command: 'system node run -node * options disk.latency_check_ssd.fail_enable on'
        privilege: diag
        <<: *login

    - name: Send message
      delegate_to: localhost
      netapp.ontap.na_ontap_autosupport_invoke:
        autosupport_message: "fix Burt 1479263 - Applications experience latency due to a single slow SSD"
        <<: *login


Please note that you should only run these commands against affected systems.
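
One way to keep the run away from healthy systems is a dedicated inventory group per KB. This is only a minimal sketch, assuming a YAML inventory; the group names and host names are hypothetical and not taken from my repo:

```yaml
# Hypothetical inventory: one group per KB, listing only the affected clusters.
all:
  children:
    affected_bootarg:
      hosts:
        cluster01.example.com:
        cluster02.example.com:
    affected_slow_ssd:
      hosts:
        cluster03.example.com:
```

With groups like these, the play header could say hosts: affected_bootarg instead of hosts: all, or the run could be restricted with the --limit option of ansible-playbook.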

These two Ansible playbooks let me fix both issues with very little effort.

By invoking AutoSupport, the changes were pushed into the NetApp data warehouse, and the systems were removed from the affected list after 24 hours.

The playbooks for the ONTAP boot arg issue and the SSD issue can be found in my GitHub repo.



NetApp Keystone Login Banner

Hello *,

As mentioned in my session at NetApp INSIGHT this year, I have created a smart login banner for NetApp Keystone systems.

The banner makes NetApp Keystone systems more visible and reminds admins and users that support cases are opened via a different path than usual. It also points out that QoS values should not be adjusted.


- hosts: all
  gather_facts: no
  vars_prompt:
    - name: ansible user password
      private: yes

  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false

  tasks:
    # Module name is assumed from the parameters used (netapp.ontap collection).
    - name: modify motd cluster - REST
      delegate_to: localhost
      netapp.ontap.na_ontap_login_messages:
        motd_message: "\n authorized access only \n\n Keystone System, please be aware of special QoS values \n Keystone Germany Hotline +49 8006 2730 17 \n Email \n For any escalations, please reachout \n\n case management"
        show_cluster_motd: True
        <<: *login

The latest version can be found in my GitHub repo.

And yes, if you are not in the German region, you should replace the phone number with one of the following numbers:

  • Keystone USA and Canada: +1 866 363 8277 (toll-free) / +1 919 629 3399 (non-toll-free)
  • Keystone Belgium: +32 8005 8797 
  • Keystone UK: +44 8000 2600 66 
  • Keystone Austria: +43 8003 0101 0 
  • Keystone Netherlands: +31 8003 8000 13 
  • Keystone Switzerland: +41 8005 6689 5 
  • Keystone Germany: +49 8006 2730 17 
  • Keystone Japan: +81 800 6000 140
  • Keystone Australia: +61 1800 0072 01 
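
If you look after systems in more than one region, the hotline could be picked from a lookup table instead of being edited by hand. The following is only a sketch: the variable names keystone_region and keystone_hotlines are hypothetical, and the task just prints the line so you can check it before it goes into the banner.

```yaml
# Sketch only: keystone_region and keystone_hotlines are hypothetical variable names.
- hosts: all
  gather_facts: no
  vars:
    keystone_region: "Germany"
    keystone_hotlines:
      "USA and Canada": "+1 866 363 8277"
      "Belgium": "+32 8005 8797"
      "UK": "+44 8000 2600 66"
      "Austria": "+43 8003 0101 0"
      "Netherlands": "+31 8003 8000 13"
      "Switzerland": "+41 8005 6689 5"
      "Germany": "+49 8006 2730 17"
      "Japan": "+81 800 6000 140"
      "Australia": "+61 1800 0072 01"
  tasks:
    # Prints the hotline line on the control node; nothing is changed on the cluster.
    - name: preview the hotline line for the banner
      delegate_to: localhost
      ansible.builtin.debug:
        msg: "Keystone {{ keystone_region }} Hotline {{ keystone_hotlines[keystone_region] }}"
```

The same Jinja2 expression could then be used inside motd_message, and the region could be overridden per run with -e keystone_region=UK.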


ONTAP SSL Security Hardening

Hi *,
This year I ran twice into an ONTAP hardening issue that was related to SnapMirror failures.

For internal hardening projects, I created the following Ansible playbook.

- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false

  tasks:
    # Module names are assumed from the parameters used (netapp.ontap collection).
    - name: Modify SSH algorithms at cluster level
      delegate_to: localhost
      netapp.ontap.na_ontap_security_ssh:
        ciphers: ["aes256-ctr","aes192-ctr","aes128-ctr","aes128-gcm","aes256-gcm"]
        key_exchange_algorithms: ["diffie-hellman-group-exchange-sha256","ecdh-sha2-nistp256","ecdh-sha2-nistp384","ecdh-sha2-nistp521","curve25519-sha256"]
        mac_algorithms: ["hmac-sha2-256","hmac-sha2-512","hmac-sha2-256-etm","hmac-sha2-512-etm","umac-64","umac-128","umac-64-etm","umac-128-etm"]
        max_authentication_retry_count: 6
        <<: *login

    - name: Modify SSL Security Config
      delegate_to: localhost
      netapp.ontap.na_ontap_security_config:
        #name: ssl
        is_fips_enabled: false
        #supported_ciphers:  'ALL:!LOW:!aNULL:!EXP:!eNULL:!3DES:!RC4:!SHA1'
        supported_protocols: ['TLSv1.2']
        <<: *login

# src
# src

I was asked to remove TLS_PSK_WITH_AES_256_GCM_SHA384.
This was a terrible idea. Removing TLS_PSK_WITH_AES_256_GCM_SHA384 causes SnapMirror to stop working and prevents new cluster peering relationships from being created.

The peering errors looked like this:

yourcluster::> cluster peer create -peer-addrs x.x.x.x,x.x.x.y -applications snapmirror -ipspace snapspace -address-family ipv4
Notice: Use a generated passphrase or choose a passphrase of 8 or more characters. To ensure the authenticity of the peering
        relationship, use a phrase or sequence of characters that would be hard to guess.
Enter the passphrase:
Confirm the passphrase:
Error: command failed: Using peer-address x.x.x.x: An introductory RPC to the peer address "x.x.x.x" failed to connect: RPC:
       Remote system error [from mgwd on node "nodeA" (VSID: -1) to xcintro at x.x.x.x ]. Verify that the peer
       address is correct, and then try the operation again.

There is a NetApp Knowledge Base article regarding this issue ->

Please be careful about removing TLS_PSK_WITH_AES_256_GCM_SHA384 from your ONTAP SSL cipher list.
And please also keep in mind that hardening changes to the SSL config take effect only after a takeover/giveback.
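
To avoid repeating my mistake, a small guard in front of the hardening tasks could stop the run when the PSK cipher suite is missing from the planned list. This is only a sketch under assumptions: planned_cipher_suites and required_cipher_suite are hypothetical variables, and the second list entry is just an example of another TLS 1.2 suite.

```yaml
# Sketch only: both variables are hypothetical and not part of my hardening playbook.
- hosts: localhost
  gather_facts: no
  vars:
    required_cipher_suite: TLS_PSK_WITH_AES_256_GCM_SHA384
    planned_cipher_suites:
      - TLS_PSK_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
  tasks:
    # Fails the play before anything is changed if the PSK suite was dropped from the list.
    - name: refuse to continue without the PSK cipher suite
      ansible.builtin.assert:
        that:
          - required_cipher_suite in planned_cipher_suites
        fail_msg: "{{ required_cipher_suite }} is missing, SnapMirror and cluster peering may break"
        success_msg: "{{ required_cipher_suite }} is still in the planned cipher list"
```

The check runs only on the control node, so it is safe to try out before touching any cluster.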

The current version of my hardening playbook can be found in my GitHub repo.
