tests: fix UTF-8 detection, per-test LC_* settings, CI coverage

- runtests: fix `codeset-utf8` feature detection. Before this patch it
  detected whether the calling environment had UTF-8 enabled. If not,
  all UTF-8 tests were skipped. After this patch, it detects whether
  UTF-8 is supported by the calling environment, regardless of what is
  currently enabled.
  Follow-up to 0b70b23ef4 #15039

- GHA/linux: sync `codeset-test` to also reset `LC_CTYPE` and
  `LC_NUMERIC`. To give it more spin.
  Follow-up to c221c0ee59 #17938

- GHA/macos: fix to actually enable `codeset-test`. Also set `LC_ALL`,
  which seems necessary to trigger issues.
  Follow-up to c221c0ee59 #17938

- tests/data: replace `LC_CTYPE` env with `LC_ALL` in all tests
  requiring a locale. This also avoids potential issues with a blank or
  unset `LC_ALL`, as seen earlier, and ensures that the override works
  on all platforms (as tested in CI).
  A slight downside is that this now resets the language/culture to `C`.
  Ref: b4c9982382 #4743
  Ref: 23208e330a #4738

- replace `en_US.UTF-8` with `C.UTF-8` to be language/culture-agnostic.

- TEST-SUITE.md: drop `UTF-8` as a requirement for tests.
  Tests shall work (or at least be skipped) without UTF-8 support.

Tests requiring UTF-8 locale:
165, 962, 963, 964, 965, 966, 967, 1448, 1560, 2046, 2047
Tests requiring UTF-8 locale, but passing without one anyway:
955, 956, 957, 958, 959, 960, 961, 968, 1034, 1035

Spec 1997: https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html
Spec 2008: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html
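
As a rough illustration of the precedence rules in the specs above, the following shell sketch resolves which variable ends up governing the `LC_CTYPE` category. The helper name `effective_ctype` is hypothetical and not part of the test suite, and the final fallback to `C` is an assumption, since POSIX leaves that default implementation-defined.

```shell
#!/bin/sh
# Hypothetical helper (not part of runtests): print the value that governs
# the LC_CTYPE category, following the POSIX precedence
# LC_ALL > LC_CTYPE > LANG. Empty values are treated like unset ones.
# Arguments: $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG (pass '' for unset).
effective_ctype() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  elif [ -n "${2:-}" ]; then
    printf '%s\n' "$2"
  elif [ -n "${3:-}" ]; then
    printf '%s\n' "$3"
  else
    # Assumption: the implementation-defined default is "C".
    printf 'C\n'
  fi
}

# Old per-test setup: a blank LC_ALL lets LC_CTYPE take effect.
effective_ctype '' 'en_US.UTF-8' 'C'        # prints en_US.UTF-8

# New per-test setup: LC_ALL wins regardless of the other variables.
effective_ctype 'C.UTF-8' 'C' 'fr_FR.UTF-8' # prints C.UTF-8

# CI `codeset-test` setup: LC_ALL=C overrides everything else.
effective_ctype 'C' 'C' 'C'                 # prints C
```

A non-empty `LC_ALL` always takes priority, so setting it directly in the tests does not depend on how the calling environment left the other variables.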

Ref: c221c0ee59 #17938
Ref: 7cf8414fab
Ref: 4c140a5628
Ref: 28faaacee2 #2436
Ref: ecd1d020ab

Closes #17988
Viktor Szakats 2025-07-16 04:25:08 +02:00
parent c37e06c642
commit 1cc8a5235f
25 changed files with 51 additions and 45 deletions

@@ -685,7 +685,12 @@ jobs:
 fi
 fi
 [ -x ~/venv/bin/activate ] && source ~/venv/bin/activate
-[[ "${MATRIX_INSTALL_STEPS}" = *'codeset-test'* ]] && export LC_ALL=C
+if [[ "${MATRIX_INSTALL_STEPS}" = *'codeset-test'* ]]; then
+locale || true
+export LC_ALL=C
+export LC_CTYPE=C
+export LC_NUMERIC=fr_FR.UTF-8
+fi
 if [ "${MATRIX_BUILD}" = 'cmake' ]; then
 cmake --build bld --verbose --target "${TEST_TARGET}"
 else

@@ -214,6 +214,7 @@ jobs:
 MATRIX_BUILD: ${{ matrix.build.generate && 'cmake' || 'autotools' }}
 MATRIX_COMPILER: '${{ matrix.compiler }}'
 MATRIX_INSTALL: '${{ matrix.build.install }}'
+MATRIX_INSTALL_STEPS: '${{ matrix.build.install_steps }}'
 MATRIX_MACOS_VERSION_MIN: '${{ matrix.build.macos-version-min }}'
 strategy:
 fail-fast: false
@@ -381,7 +382,6 @@ jobs:
 MATRIX_CHKPREFILL: '${{ matrix.build.chkprefill }}'
 MATRIX_CONFIGURE: '${{ matrix.build.configure }}'
 MATRIX_GENERATE: '${{ matrix.build.generate }}'
-MATRIX_INSTALL_STEPS: '${{ matrix.build.install_steps }}'
 run: |
 if [[ "${MATRIX_COMPILER}" = 'gcc'* ]]; then
 sysroot="$("${CC}" --print-sysroot)" # Must match the SDK gcc was built for
@@ -481,6 +481,8 @@ jobs:
 TFLAGS="-j20 ${TFLAGS}"
 source ~/venv/bin/activate
 if [[ "${MATRIX_INSTALL_STEPS}" = *'codeset-test'* ]]; then
+locale || true
+export LC_ALL=C
 export LC_CTYPE=C
 export LC_NUMERIC=fr_FR.UTF-8
 fi

@@ -53,7 +53,6 @@ SPDX-License-Identifier: curl
 - `openssl` (the command line tool, for generating test server certificates)
 - `openssh` or `SunSSH` (for SCP and SFTP tests)
 - `nghttpx` (for HTTP/2 and HTTP/3 tests)
-- An available `en_US.UTF-8` locale
 ### Installation of impacket

@@ -29,8 +29,7 @@ proxy
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 HTTP over proxy with malformatted IDN host name

@@ -27,8 +27,7 @@ proxy
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 HTTP over proxy with too long IDN host name

@@ -43,8 +43,7 @@ IDN
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 Redirect following to UTF-8 IDN host name

@@ -13,7 +13,7 @@ urlapi
 none
 </server>
 <setenv>
-LC_ALL=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <features>
 file

@@ -33,8 +33,7 @@ proxy
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 HTTP over proxy with IDN host name

@@ -43,8 +43,7 @@ IDN
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 Connection reuse with IDN host name

@@ -44,8 +44,7 @@ proxy
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 Connection reuse with IDN host name over HTTP proxy

@@ -24,8 +24,7 @@ smtp
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP without SMTPUTF8 support - UTF-8 based sender (local part only)

@@ -21,8 +21,7 @@ smtp
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP without SMTPUTF8 support - UTF-8 based recipient (local part only)

@@ -22,8 +22,7 @@ smtp
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP VRFY without SMTPUTF8 support - UTF-8 recipient (local part only)

@@ -22,8 +22,7 @@ smtp
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP external VRFY without SMTPUTF8 - UTF-8 recipient (local part only)

@@ -25,8 +25,7 @@ smtp
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP without SMTPUTF8 support - UTF-8 based sender (host part only)

@@ -22,8 +22,7 @@ smtp
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP without SMTPUTF8 support - UTF-8 based recipient (host part only)

@@ -23,8 +23,7 @@ smtp
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP external VRFY without SMTPUTF8 - UTF-8 recipient (host part only)

@@ -23,8 +23,7 @@ IDN
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP without SMTPUTF8 support - UTF-8 based sender (host part only)

@@ -23,8 +23,7 @@ IDN
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP without SMTPUTF8 support (IDN) - UTF-8 recipient (host part only)

@@ -24,8 +24,7 @@ IDN
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP external VRFY without SMTPUTF8 (IDN) - UTF-8 recipient (host part)

@@ -26,8 +26,7 @@ IDN
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP with SMTPUTF8 support - UTF-8 based sender

@@ -26,8 +26,7 @@ IDN
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP with SMTPUTF8 support - UTF-8 based recipient

@@ -30,8 +30,7 @@ IDN
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP external VRFY with SMTPUTF8 support

@@ -27,8 +27,7 @@ IDN
 codeset-utf8
 </features>
 <setenv>
-LC_ALL=en_US.UTF-8
-LC_CTYPE=en_US.UTF-8
+LC_ALL=C.UTF-8
 </setenv>
 <name>
 SMTP VRFY with SMTPUTF8 support

@@ -83,6 +83,7 @@ BEGIN {
 use Digest::MD5 qw(md5);
 use List::Util 'sum';
 use I18N::Langinfo qw(langinfo CODESET);
+use POSIX qw(setlocale LC_ALL);
 use serverhelp qw(
 server_exe
@@ -484,6 +485,25 @@ sub parseprotocols {
 push @protocols, 'none';
 }
+#######################################################################
+# Check if the operating environment supports UTF-8.
+sub is_utf8_supported {
+my $result;
+my $old_LC_ALL;
+my $was_defined = defined $ENV{'LC_ALL'};
+if($was_defined) {
+$old_LC_ALL = $ENV{'LC_ALL'};
+}
+setlocale(LC_ALL, $ENV{'LC_ALL'} = "C.UTF-8");
+$result = lc(langinfo(CODESET())) eq "utf-8";
+if($was_defined) {
+$ENV{'LC_ALL'} = $old_LC_ALL;
+}
+else {
+delete $ENV{'LC_ALL'};
+}
+return $result;
+}
 #######################################################################
 # Check & display information about curl and the host the test suite runs on.
@@ -808,7 +828,7 @@ sub checksystemfeatures {
 # Use this as a proxy for any cryptographic authentication
 $feature{"crypto"} = $feature{"NTLM"} || $feature{"Kerberos"} || $feature{"SPNEGO"};
 $feature{"local-http"} = servers::localhttp();
-$feature{"codeset-utf8"} = lc(langinfo(CODESET())) eq "utf-8";
+$feature{"codeset-utf8"} = is_utf8_supported();
 if($feature{"codeset-utf8"}) {
 $ENV{'CURL_TEST_HAVE_CODESET_UTF8'} = 1;
 }
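
The probe in `is_utf8_supported` can be approximated from a shell for quick manual checks on a given machine. This is a minimal sketch, assuming a POSIX `locale` utility is available. The shell function names are hypothetical, and the result may differ from what Perl's `langinfo(CODESET)` reports on some platforms.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given codeset name is UTF-8,
# compared case-insensitively like lc(...) eq "utf-8" in the Perl code.
is_utf8_codeset() {
  [ "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" = 'utf-8' ]
}

# Hypothetical shell counterpart of the Perl probe: request C.UTF-8 for a
# single command and check which codeset the system reports for it.
is_utf8_supported() {
  # The prefix assignment changes LC_ALL only for this one `locale` call,
  # so the caller's environment is left untouched.
  probe_charmap="$(LC_ALL=C.UTF-8 locale charmap 2>/dev/null)" || return 1
  is_utf8_codeset "${probe_charmap}"
}

if is_utf8_supported; then
  echo 'codeset-utf8: supported'
else
  echo 'codeset-utf8: not supported'
fi
```

Running it with `LC_ALL=C` in the calling environment should give the same answer as running it without, which is the behavior the patch aims for.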