Effects of sss_cache on the memory cache
This document (000020646) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 12 SP5
SUSE Linux Enterprise Server 15 All Service Packs
SUSE Linux Enterprise Server 12 SP5
Situation
# id xxx
id: 'xxx': no such user
2022-03-28T17:32:09.780060+02:00 dcplnx25719280 nss: Starting up
2022-03-28T17:32:11.792566+02:00 dcplnx25719280 nss: Starting up
2022-03-28T17:32:15.805214+02:00 dcplnx25719280 nss: Starting up
2022-03-28T17:32:15.808333+02:00 dcplnx25719280 sssd: Exiting the SSSD. Could not restart critical service [nss].
2022-03-28T19:21:46.296848+02:00 dcplnx25719280 systemd[1]: Starting System Security Services Daemon...
2022-03-28T19:21:46.313220+02:00 dcplnx25719280 sssd: Starting up
2022-03-28T19:21:46.323196+02:00 dcplnx25719280 be[LDAPS]: Starting up
2022-03-28T19:21:46.342295+02:00 dcplnx25719280 pam: Starting up
2022-03-28T19:21:46.342689+02:00 dcplnx25719280 sudo: Starting up
2022-03-28T19:21:46.343671+02:00 dcplnx25719280 nss: Starting up
2022-03-28T19:21:46.344005+02:00 dcplnx25719280 ssh: Starting up
2022-03-28T19:21:46.374240+02:00 dcplnx25719280 nss: Starting up
2022-03-28T19:21:52.452237+02:00 dcplnx25719280 sssd: Exiting the SSSD. Could not restart critical service [nss].
2022-03-28T19:21:52.452388+02:00 dcplnx25719280 sudo: Shutting down
2022-03-28T19:21:52.454401+02:00 dcplnx25719280 ssh: Shutting down
2022-03-28T19:21:52.456898+02:00 dcplnx25719280 pam: Shutting down
2022-03-28T19:21:52.459478+02:00 dcplnx25719280 be[LDAPS]: Shutting down
2022-03-28T19:21:52.464080+02:00 dcplnx25719280 systemd[1]: sssd.service: Main process exited, code=exited, status=1/FAILURE
2022-03-28T19:21:52.464473+02:00 dcplnx25719280 systemd[1]: Failed to start System Security Services Daemon.
# sss_debuglevel 0x00ff
(2022-03-28 19:21:46): [be[LDAPS]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/cache_LDAPS.ldb): tdb_transaction_prepare_commit: expansion failed
(2022-03-28 19:21:46): [be[LDAPS]] [ldb] (0x0010): Failure during prepare_write): IO Error -> Protocol error
(2022-03-28 19:21:46): [be[LDAPS]] [ldb] (0x0010): cancel called but no ldb transactions are active!
(2022-03-28 19:21:52): [be[LDAPS]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(2022-03-28 19:17:51): [nss] [sss_mc_create_file] (0x0010): Failed to mark mmap file /var/lib/sss/mc/passwd as recycled: 28(No space left on device)
(2022-03-28 19:17:51): [nss] [sss_mc_create_file] (0x0010): Failed to mark mmap file /var/lib/sss/mc/group as recycled: 28(No space left on device)
Resolution
As a workaround, set the environment variable SSS_NSS_USE_MEMCACHE to "NO". Processes started with this setting do not use the memory cache at all, so slower user and group lookups are likely.
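For a service managed by systemd, one way to apply this setting is a drop-in file. The following is a minimal sketch and is not taken from the original article: the unit name example-daemon.service, the drop-in file name, and the helper write_memcache_dropin are hypothetical. Only the variable SSS_NSS_USE_MEMCACHE and its value "NO" come from this document.

```shell
#!/bin/sh
# Sketch: create a systemd drop-in that disables the SSSD memory cache
# for one service. Unit name and file name are placeholders (assumptions).
write_memcache_dropin() {
    unit="$1"                          # e.g. example-daemon.service (hypothetical)
    root="${2:-/etc/systemd/system}"   # usual location for local drop-ins
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # Processes of this service will start with SSS_NSS_USE_MEMCACHE=NO.
    cat > "$dir/10-no-sss-memcache.conf" <<'EOF'
[Service]
Environment=SSS_NSS_USE_MEMCACHE=NO
EOF
}

# Usage (as root), followed by a reload and a restart of the unit:
#   write_memcache_dropin example-daemon.service
#   systemctl daemon-reload
#   systemctl restart example-daemon.service
```

The optional second argument only redirects the output directory, which makes it possible to inspect the generated file before placing it under /etc/systemd/system.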
Cause
EFFECTS ON THE FAST MEMORY CACHE
sss_cache also invalidates the memory cache. Since the memory cache is a file which is mapped into the memory of each process which called SSSD to resolve users or groups, the file cannot be truncated. A special flag is set in the header of the file to indicate that the content is invalid, and then the file is unlinked by SSSD's NSS responder and a new cache file is created.
Whenever a process now does a new lookup for a user or a group, it will see the flag, close the old memory cache file and map the new one into its memory. When all processes which had opened the old memory cache file have closed it while looking up a user or a group, the kernel can release the occupied disk space and the old memory cache file is finally removed completely.
A special case is long running processes which do user or group lookups only at startup, e.g. to determine the name of the user the process is running as. For those lookups the memory cache file is mapped into the memory of the process. But since there will be no further lookups, this process would never detect that the memory cache file was invalidated, and hence the file will be kept in memory and will occupy disk space until the process stops.
As a result, calling sss_cache might increase the disk usage, because old memory cache files cannot be removed from the disk while they are still mapped by long running processes.
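The condition described above can be observed on a running system: a process that still maps an unlinked memory cache file shows the path with a "(deleted)" suffix in /proc/PID/maps. The following is a minimal sketch, assuming the cache directory /var/lib/sss/mc seen in the log messages above. The function name find_stale_mc_mappers is hypothetical, and its optional argument exists only so the function can be tried against a test directory instead of /proc.

```shell
#!/bin/sh
# Sketch: print the PIDs of processes that still map a deleted SSSD
# memory cache file. Run as root to see the maps of all processes.
find_stale_mc_mappers() {
    proc_root="${1:-/proc}"
    for maps in "$proc_root"/[0-9]*/maps; do
        # Skip processes that vanished or whose maps are not readable.
        [ -r "$maps" ] || continue
        # The kernel appends " (deleted)" to the path of an unlinked file.
        if grep -q '/var/lib/sss/mc/.*(deleted)' "$maps" 2>/dev/null; then
            pid="${maps%/maps}"    # strip the trailing /maps
            echo "${pid##*/}"      # keep only the PID component
        fi
    done
}

find_stale_mc_mappers
```

Processes reported here are candidates for a restart, or for running with SSS_NSS_USE_MEMCACHE set to "NO", since they keep the old cache file's disk space allocated.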
A possible work-around for long running processes which are looking up users and groups only at startup or very rarely is to run them with the environment variable SSS_NSS_USE_MEMCACHE set to "NO" so that they won't use the memory cache at all and not map the memory cache file into the memory. In general a better solution is to tune the cache timeout parameters so that they meet the local expectations and calling sss_cache is not needed.
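Before tuning the timeouts, it helps to see which ones are currently set. The following is a small sketch, assuming the default configuration file /etc/sssd/sssd.conf and the options entry_cache_timeout (domain sections) and memcache_timeout ([nss] section) as documented in sssd.conf(5). The helper name show_sssd_timeouts is hypothetical, and the sketch does not look at snippet files in a conf.d directory. Suitable values depend on local requirements and are not stated in this article.

```shell
#!/bin/sh
# Sketch: print section headers and any cache timeout options that are
# explicitly set in an sssd.conf file. Options that are not listed are
# left at their defaults.
show_sssd_timeouts() {
    conf="${1:-/etc/sssd/sssd.conf}"
    grep -E '^[[:space:]]*(\[|(entry_cache_timeout|memcache_timeout)[[:space:]]*=)' "$conf"
}

# Usage (as root, since sssd.conf is normally readable by root only):
#   show_sssd_timeouts
#   show_sssd_timeouts /path/to/a/copy/of/sssd.conf
```

Section headers are printed along with the options so that each timeout can be attributed to the section it belongs to.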
Source:
https://github.com/SSSD/sssd/commit/b9e60ae067696782e3a52f58172f13077b5ea0f2
Background:
https://docs.pagure.org/sssd.sssd/design_pages/fast_nss_cache.html
Additional Information
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020646
- Creation Date: 28-Apr-2022
- Modified Date: 06-May-2022
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com