Getting started with NetApp Autonomous Ransomware Protection (ARP)

Before you begin

ARP has been available since ONTAP 9.10.1. It focuses on detecting ransomware attacks on file workloads.
In the case of an attack, ARP automatically creates a snapshot and can forward the event via BlueXP or syslog to an external SIEM solution.
Keep in mind that ARP should not be the only layer of protection in your ransomware strategy:
smart snapshot policies, tamper-proof snapshots, SnapMirror/SnapVault, multi-admin verification, and role-based access control should also be part of it.

ARP uses three signals to detect ransomware attacks: entropy, file extensions, and file IOPS.
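NetApp has not published the details of the detection model, but the entropy signal is easy to illustrate: encrypted files have near-maximal Shannon entropy, while typical documents score much lower. A minimal sketch of the idea (not ARP's actual implementation):

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

# Typical text stays well below the 8-bit maximum ...
plain = b"quarterly report: revenue grew by 4% in Q3 " * 100
# ... while encrypted (or compressed) data is close to it.
encrypted_like = os.urandom(4096)

print(f"plain text:   {shannon_entropy(plain):.2f} bits/byte")
print(f"random bytes: {shannon_entropy(encrypted_like):.2f} bits/byte")
```

A sudden jump of many files toward 8 bits/byte is the kind of indicator a detector like ARP can combine with mass extension renames and file IOPS spikes.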

Before you begin, you should have an ONTAP system running a current patch level; several ARP-related bugs have been fixed in recent releases. Enabling ARP requires an ONTAP system with a valid ONTAP One license.

About this article

This article provides a brief overview of how to enable ARP on NetApp volumes.
The second part covers event log forwarding to a syslog destination.

Enabling ARP

There are two ways to enable ARP: the first is GUI based, the second is CLI based.
Before ARP can be enabled, you must first initialize the learning mode.
The learning mode should run for at least 30 days.

 

To start the learning mode, you can use the following CLI command:
security anti-ransomware volume dry-run -volume your_vol01 -vserver svm1
or using the task via the GUI:
(Screenshots, 2024-09-16: starting the ARP learning mode in System Manager)

After the learning period (30 days), you can enable the ARP function.
During the learning phase, ARP uses machine learning to learn how the file system is typically accessed.
What ARP learns about the access behavior is stored in a workload, which can be inspected for troubleshooting purposes.
ONTAP 9.16.1 changes the way ARP is enabled: with 9.16.1, the learning mode is deprecated; you simply enable ARP, and it works with a pre-trained model.
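On releases before 9.16.1, switching a volume from learning to active mode and inspecting the learned workload might look like this (volume/SVM names are the placeholders from above):

```
security anti-ransomware volume enable -volume your_vol01 -vserver svm1
security anti-ransomware volume show -vserver svm1 -volume your_vol01
security anti-ransomware volume workload-behavior show -volume your_vol01 -vserver svm1
```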

Forwarding ARP events

The first step is to specify an event notification destination:
event notification destination create -name your_siem -syslog <syslog-fqdn>
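A quick check that the destination was created:

```
event notification destination show
```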

The event filter needs to be configured with some ARP rules.
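Note that the rules assume the filter already exists; if not, create it first:

```
event filter create -filter-name arp
```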

event filter rule add -filter-name arp -type include -message-name callhome.arw.activity.seen -severity *
event filter rule add -filter-name arp -type include -message-name arw.volume.state -severity *
event filter rule add -filter-name arp -type include -message-name arw.vserver.state -severity *
event filter rule add -filter-name arp -type include -message-name arw.analytics.* -severity *

The last step is to attach the event filter to the forwarding destination:

yourcluster::> event notification create -filter-name arp -destinations your_siem

yourcluster::> event notification show -filter-name *arp*
ID   Filter Name                     Destinations
---- ------------------------------  -----------------
4    arp                             your_siem

yourcluster::>

To check whether the configuration works as expected, you can use the following command:

event generate -message-name arw.volume.state -node node01 -values Test,Test,Test,Test,Test
and check whether the filter does what it should.
yourcluster::*> event notification history show -destination your_siem
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
11/27/2023 17:25:09 node01
                                     NOTICE        arw.volume.state: Anti-ransomware state was changed to "Test" on volume "Test" (UUID: "Test") in Vserver "Test" (UUID: "Test").
11/27/2023 17:20:15 node01
                                     NOTICE        arw.volume.state: Anti-ransomware state was changed to "Test" on volume "Test" (UUID: "Test") in Vserver "Test" (UUID: "Test").
11/27/2023 17:13:23 node01
                                     NOTICE        arw.volume.state: Anti-ransomware state was changed to "Test" on volume "Test" (UUID: "Test") in Vserver "Test" (UUID: "Test").
11/27/2023 17:04:24 node01
                                     NOTICE        arw.volume.state: Anti-ransomware state was changed to "Test" on volume "Test" (UUID: "Test") in Vserver "Test" (UUID: "Test").
4 entries were displayed.

yourcluster::*>
Feel free to try out the NetApp lab "Protection and Recovery From Ransomware v4.2".
 
Cheers ✌,
Eric

Fix NetApp BURTs with Ansible

Hi *,

During my last NetApp SAM meeting, I noticed several to-dos.

In some cases, NetApp offers an Ansible script that fixes an issue directly on the https://activeiq.netapp.com/ page.

For my specific cases, the issues pointed to the "Applications experience latency due to a single slow SSD" KB and the "Unexpected bootargument is set" KB. Both bugs need to be fixed manually.

Both KBs were flagged for more than 50 nodes...

That's why I created the following Ansible playbook for the bootarg KB:

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false
  
  tasks:
    - name: run ontap cli command
      delegate_to: localhost
      netapp.ontap.na_ontap_ssh_command:
        command: 'node run -node * -command bootargs unset bootarg.gb.override.lmgr.veto'
        privilege: diag
        <<: *login

    - name: Send message
      delegate_to: localhost
      netapp.ontap.na_ontap_autosupport_invoke:
        autosupport_message: "fix A boot argument that is only expected to be set during ONTAP update is still set."
        <<: *login

#https://kb.netapp.com/onprem/ontap/os/Unexpected_bootargument_is_set_-_Active_IQ_Wellness_Risk
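A run against an inventory of the affected clusters might look like the following (inventory and file names are hypothetical):

```
ansible-playbook -i affected_clusters.ini fix_bootarg.yml
```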

For the "Applications experience latency due to a single slow SSD" KB, I simply recycled the playbook above and replaced the command.

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false
  tasks:
    - name: run ontap cli command
      delegate_to: localhost
      netapp.ontap.na_ontap_ssh_command:
        command: 'system node run -node * options disk.latency_check_ssd.fail_enable on'
        privilege: diag
        <<: *login

    - name: Send message
      delegate_to: localhost
      netapp.ontap.na_ontap_autosupport_invoke:
        autosupport_message: "fix Burt 1479263 - Applications experience latency due to a single slow SSD"
        <<: *login

#https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1479263

Please note that you should only run these commands against affected systems.

Both Ansible playbooks enabled me to fix both issues with very little effort.

With the AutoSupport invoke, the changes were pushed into the NetApp data warehouse, and the systems were removed from the affected list after 24 hours.

The fixes for the ONTAP boot-arg issue and the SSD issue can be found in my GitHub repo.

Cheers 

Eric

NetApp Keystone Login Banner

Hello *,

As mentioned in my session at NetApp INSIGHT this year, I have created a smart login banner for NetApp Keystone systems.

The banner makes NetApp Keystone systems more visible and reminds admins/users that support cases are opened via a different path than usual. It also points out that QoS values should not be adjusted.

 

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      prompt: ansible user password
      private: yes
  
  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false
  tasks:
    - name: modify motd cluster - REST
      delegate_to: localhost
      netapp.ontap.na_ontap_login_messages:
        motd_message: "\n authorized access only \n\n Keystone System, please be aware of special QoS values \n Keystone Germany Hotline +49 8006 2730 17 \n Email keystone.services@netapp.com \n For any escalations, please reach out to keystone.escalations@netapp.com \n\n case management https://netappgssc.service-now.com/csm"
        show_cluster_motd: True
        <<: *login
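Afterwards, the banner can be verified on the cluster (the cluster name is a placeholder):

```
security login motd show -vserver yourcluster
```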

The latest version can be found in my GitHub repo.

And yes, if you are not in the German region, you should replace the phone number with one of the following numbers:

  • Keystone USA and Canada: +1 866 363 8277 (toll free) / 1 919 629 3399 (non-toll-free)
  • Keystone Belgium: +32 8005 8797 
  • Keystone UK: +44 8000 2600 66 
  • Keystone Austria: +43 8003 0101 0 
  • Keystone Netherlands: +31 8003 8000 13 
  • Keystone Switzerland: +41 8005 6689 5 
  • Keystone Germany: +49 8006 2730 17 
  • Keystone Japan: +81 800 6000 140
  • Keystone Australia: +61 1800 0072 01 

Cheers
Eric

ONTAP SSL Security Hardening

Hi *,
During this year, I ran into an ONTAP hardening issue twice, both times related to SnapMirror failures.

For internal hardening projects, I created the following Ansible playbook.

---
- hosts: all
  gather_facts: no
  vars_prompt:
    - name: password
      private: yes

  vars:
    login: &login
      username: ansible #"{{ username }}"
      password: "{{ password }}"
      hostname: "{{ inventory_hostname }}"
      https: yes
      validate_certs: false

  tasks:
    - name: Modify SSH algorithms at cluster level
      delegate_to: localhost
      netapp.ontap.na_ontap_security_ssh:
        vserver:
        ciphers: ["aes256-ctr","aes192-ctr","aes128-ctr","aes128-gcm","aes256-gcm"]
        key_exchange_algorithms: ["diffie-hellman-group-exchange-sha256","ecdh-sha2-nistp256","ecdh-sha2-nistp384","ecdh-sha2-nistp521","curve25519-sha256"]
        mac_algorithms: ["hmac-sha2-256","hmac-sha2-512","hmac-sha2-256-etm","hmac-sha2-512-etm","umac-64","umac-128","umac-64-etm","umac-128-etm"]
        max_authentication_retry_count: 6
        <<: *login

    - name: Modify SSL Security Config
      delegate_to: localhost
      netapp.ontap.na_ontap_security_config:
        #name: ssl
        is_fips_enabled: false
        supported_cipher_suites: ["TLS_PSK_WITH_AES_256_GCM_SHA384","TLS_RSA_WITH_AES_128_GCM_SHA256","TLS_RSA_WITH_AES_256_GCM_SHA384","TLS_DHE_DSS_WITH_AES_128_GCM_SHA256","TLS_DHE_DSS_WITH_AES_256_GCM_SHA384","TLS_DHE_RSA_WITH_AES_128_GCM_SHA256","TLS_DHE_RSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"]
        #supported_ciphers:  'ALL:!LOW:!aNULL:!EXP:!eNULL:!3DES:!RC4:!SHA1'
        supported_protocols: ['TLSv1.2']
        <<: *login

# src https://docs.ansible.com/ansible/latest/collections/netapp/ontap/na_ontap_security_ssh_module.html
# src https://docs.ansible.com/ansible/latest/collections/netapp/ontap/na_ontap_security_config_module.html
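The active SSL settings can be checked on the cluster afterwards (advanced privilege level required):

```
set -privilege advanced
security config show
```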

I was asked to remove TLS_PSK_WITH_AES_256_GCM_SHA384.
This was a terrible idea: removing TLS_PSK_WITH_AES_256_GCM_SHA384 causes SnapMirror to stop working and prevents new cluster peerings from being created.

Peering errors looked like:

yourcluster::> cluster peer create -peer-addrs x.x.x.x,x.x.x.y -applications snapmirror -ipspace snapspace -address-family ipv4
 
Notice: Use a generated passphrase or choose a passphrase of 8 or more characters. To ensure the authenticity of the peering
        relationship, use a phrase or sequence of characters that would be hard to guess.
 
Enter the passphrase:
Confirm the passphrase:
 
Error: command failed: Using peer-address x.x.x.x: An introductory RPC to the peer address "x.x.x.x" failed to connect: RPC:
       Remote system error [from mgwd on node "nodeA" (VSID: -1) to xcintro at x.x.x.x ]. Verify that the peer
       address is correct, and then try the operation again.
 
yourcluster::>

There is a NetApp Knowledge Base article regarding this issue -> https://kb.netapp.com/onprem/ontap/metrocluster/Connectivity_to_peer_cluster_is_broken_with_missing_PSK_Cipher

Please be careful not to remove TLS_PSK_WITH_AES_256_GCM_SHA384 from your ONTAP SSL cipher list.
And please also keep in mind that the hardening change to the SSL config only takes effect after a takeover/giveback.

The current version of my hardening playbook can be found in my GitHub repo.

Cheers
Eric