<#

Prerequisites: PowerShell v3+
License: MIT
Author: Michael Klement

DOWNLOAD and DEFINITION OF THE FUNCTION:

  irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw/Invoke-WithEncoding.ps1 | iex

The above directly defines the function below in your session and offers guidance
for making it available in future sessions too.

DOWNLOAD ONLY:

  irm https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19/raw > Invoke-WithEncoding.ps1

The above downloads to the specified file, which you then need to dot-source
to make the function available in the current session:

  . ./Invoke-WithEncoding.ps1

To learn what the function does:
  * see the next comment block
  * or, once downloaded and defined, invoke the function with -? or pass its name to Get-Help.

To define an ALIAS for the function, (also) add something like the following to your $PROFILE:

  Set-Alias ien Invoke-WithEncoding

#>

function Invoke-WithEncoding {
  <#
.SYNOPSIS
Invokes a native (external) program with the specified character encoding.

.DESCRIPTION
Invokes a native (external) program using the specified encoding to both
send data to and receive data from via the pipeline.

Note:

* Even though there's no formal parameter, pipeline input *is* supported.
* However, for technical reasons all pipeline input is *collected in full* first,
  and so is all output.
* This command ensures that decoding of the native program output into
  .NET strings is performed, as would invariably happen on capturing output or
  piping output to a different command, so that, on Windows, encoding
  mismatches aren't masked by direct-to-console output printing correctly.

The previous encoding settings are restored when this command exits.

.PARAMETER ScriptBlock
The script block containing the native-program call(s) to perform.

Note that if you use the pipeline to pipe text to this command and you
have *multiple* native-program calls, only the *first* one will receive
the input.

.PARAMETER Encoding
The character encoding to use as a temporary override while executing the
command(s).

You may pass a [System.Text.Encoding] instance directly, a code-page number
(e.g. 850), or an encoding name (e.g. 'utf-8').
Additionally, 'ansi' and 'oem' are supported to refer to the system's active
ANSI/OEM code page.

The resulting encoding is temporarily set as follows:

  $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = <encoding>

Note that $OutputEncoding is also set, to ensure consistency with the console
settings, whereas the default $OutputEncoding value differs, except if you
use PowerShell (Core) 7+ *and* have *system-wide* UTF-8 support enabled
(available in Windows 10).
See the NOTES section (Get-Help -Full) for more information.

.PARAMETER WindowsOnly
Indicates that the encoding should only be applied when running *on Windows*,
which is helpful for programs that only exhibit nonstandard behavior on Windows.

For instance, Python works as expected on Unix-like platforms, but
unexpectedly uses the active *ANSI* rather than OEM code page on Windows.
Using -WindowsOnly allows you to use the same invocation on both platforms,
without the need for a conditional.

.PARAMETER InputObject
An aux. parameter to enable input from the pipeline. Do not use it directly.

.EXAMPLE
Invoke-WithEncoding -Encoding Ansi -WindowsOnly { python -c "print('eé')" }

Calls Python to print an ASCII-range and an accented character, using ANSI
encoding to decode the output, which Python unconditionally uses, but only
on Windows.

.EXAMPLE
'eé' | Invoke-WithEncoding -Encoding utf8 { node -pe "require('fs').readFileSync(0).toString().trim()" }

Pipes string 'eé' to a Node.js command that simply relays its stdin input
to stdout, using UTF-8 encoding to send input and receive output.

.EXAMPLE
Invoke-WithEncoding -Encoding utf8 { node -pe "'eé'" } | ForEach-Object { $_.ToCharArray().ForEach({ '0x' + ([int] $_).ToString('x') + " ($_)" }) }

Calls Node.js to print an ASCII and an accented character, using UTF-8
encoding to decode the output, which Node.js unconditionally uses, and examines
the output string's Unicode code points in hex. format.

.NOTES
Given that most Unix-like systems nowadays default to UTF-8 encoding, where
no encoding problems are to be expected, this command is primarily useful
on Windows.

To make a console / Windows Terminal window use UTF-8 consistently,
run the following (which you may place in your $PROFILE file):

  $global:OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = [System.Text.UTF8Encoding]::new()

For background information, including how to enable UTF-8 system-wide in
Windows 10, see https://stackoverflow.com/a/57134096/45375

#>

  # ALSO STORED AS A GIST AT: https://gist.github.com/mklement0/ef57aea441ea8bd43387a7d7edfc6c19

  [CmdletBinding(PositionalBinding = $false)]
  param(
    [Parameter(Mandatory, Position = 0)]
    $Encoding, # [System.Text.Encoding] instance, code-page number, or encoding name.
    [Parameter(Mandatory, Position = 1)]
    [scriptblock] $ScriptBlock,
    [Parameter(ValueFromPipeline)]
    $InputObject,
    [switch] $WindowsOnly
  )

  Set-StrictMode -Version 1; $ErrorActionPreference = 'Stop'

  # Prevent direct use of -InputObject.
  # Note that mistaken attempts to provide *both* pipeline input and use -InputObject will
  # cause PowerShell itself to complain *for each input object*, with
  # "The input object cannot be bound to any parameters, ..."
  if (-not $MyInvocation.ExpectingInput -and $InputObject) {
    Throw "Direct use of -InputObject is not supported. Please use the pipeline."
  }

  # Get the active ANSI and OEM encodings.
  # Note:
  #  * On Windows, we query the *registry* to reliably get the *system locale*'s code pages,
  #    given that [cultureinfo]::CurrentCulture.TextInfo.ANSI/OEMCodePage can be *overridden*
  #    on a per-user basis (it reflects the user's / thread's culture).
  #  * On Unix, our only option is to use [cultureinfo]::CurrentCulture.TextInfo.ANSI/OEMCodePage
  $ansiEncoding = if ($env:OS -eq 'Windows_NT') {
    [System.Text.Encoding]::GetEncoding([int] (Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP).ACP)
  } else {
    [Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage)
  }
  $oemEncoding = if ($env:OS -eq 'Windows_NT') {
    [System.Text.Encoding]::GetEncoding([int] (Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage OEMCP).OEMCP)
  } else {
    [Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.OEMCodePage)
  }

  # Validate the -Encoding argument, if any:
  if ($null -ne $Encoding -and $Encoding -isnot [System.Text.Encoding]) {
    # As a courtesy, accept 'ANSI' and 'OEM' to represent the active ANSI / OEM encoding.
    if ($Encoding -is [string] -and $Encoding -in 'ansi', 'oem') {
      $Encoding = @{ ansi = $ansiEncoding; oem = $oemEncoding }[$Encoding]
    } else {
      # Code-page number or encoding name (e.g., 'unicode', 'utf-8')
      # As a courtesy, also accept 'utf8' instead of 'utf-8', etc.
      # NOTE: UTF-32 is NOT supported: it fails on assigning to [Console]::InputEncoding / [Console]::OutputEncoding
      if ($Encoding -is [string]) { $Encoding = $Encoding -replace '^utf(\d)', 'utf-$1' }
      if ($Encoding -match '^(utf-|unicode$)' -and $Encoding -ne 'utf-7') {
        # !! [System.Text.Encoding]::GetEncoding('utf-.*|unicode') calls return an encoding *with BOM*, which we do NOT want.
        # !! so we explicitly create one without.
        # !! Note: UTF-32 isn't supported anyway, and identifiers such as 'utf-16be' for BE encodings are seemingly not supported.
        $Encoding = switch ($Encoding) {
          'utf-8' { [System.Text.Utf8Encoding]::new() }
          { $_ -in 'unicode', 'utf-16', 'utf-16le' } { [System.Text.UnicodeEncoding]::new($false, $false) }
          default { [System.Text.Encoding]::GetEncoding($Encoding) }
        }
      } else {
        $Encoding = [System.Text.Encoding]::GetEncoding($Encoding)
      }
    }
  }

  $ignoreEncoding = $WindowsOnly -and $env:OS -ne 'Windows_NT'

  try {

    if ($ignoreEncoding) {
      Write-Verbose "Non-Windows platform: ignoring specified encoding, as requested."
    } else {
      Write-Verbose "Temporarily setting encoding to: $($Encoding.WebName)"
      # Save the currently active encodings for later restoration.
      $prevIn, $prevOut = [Console]::InputEncoding, [Console]::OutputEncoding
      # Set in-, output and $OutputEncoding to the specified encoding.
      $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = $Encoding
    }

    # Note:
    #  * Since this is an *advanced* function and there is no process {}
    #    block, $input is an [object[]] array that is empty if there's no
    #    pipeline input.
    if ($Input) {
      # There is pipeline input: we must patch it into the script block.
      # Note: In order to patch the equivalent of `$Input | ...` into
      #       the script block, we simply stringify and invoke the patched
      #       command with Invoke-Expression - hypothetically, state from
      #       the original script block could be lost in this implicit re-creation,
      #       but this is probably not a real-world concern.
      $collectedInput = $Input # We must use an aux. variable, because $Input is redefined in the Invoke-Expression context.
      # !! Do NOT try to force enumeration with @($Input), as we want to preserve *streaming* behavior.
      # Capture the original command text first: a script-block literal is not a closure,
      # so referencing $ScriptBlock inside the new block would resolve to the new block itself.
      $originalCommand = $ScriptBlock.ToString()
      $ScriptBlock = { Invoke-Expression ('$collectedInput | {0}' -f $originalCommand) }
    }

    # * Invoke in a *streaming* manner, so as to also support indefinitely
    #   running external programs that periodically produce output, such as
    #   `mosquitto_sub`
    # * This precludes using $output = & $ScriptBlock
    # * Streaming requires us to output lines *as they're received*,
    #   yet we don't want to just pass them through as-is, as that could cause
    #   the false appearance that everything is fine on Windows with CLIs that
    #   use WriteConsole() with Unicode support when stdout is directly connected
    #   to a console.
    # * Therefore, we must force *decoding* of each line, which we can simply
    #   achieve by enclosing it in (...)
    # * On Unix, it is additionally necessary to restore the original
    #   console output encoding *before* outputting the decoded output - otherwise
    #   even correctly decoded input will *print incorrectly* or vice versa.
    & $ScriptBlock | ForEach-Object {
      # To ensure correct *printing to the terminal* (strictly speaking needed on Unix only),
      # temporarily revert to the original output encoding.
      if (-not $ignoreEncoding) { [Console]::OutputEncoding = $prevOut }
      ($_)
      # Restore the target encoding *for decoding* for the next output line.
      if (-not $ignoreEncoding) { [Console]::OutputEncoding = $Encoding }
    }

  } finally {
    # This should also cover aborting with ^C
    if (-not $ignoreEncoding) {
      # Restore original encodings.
      # Note: No need to restore $OutputEncoding - it was set as a *local*
      #       variable only that will go out of scope automatically.
      [Console]::InputEncoding, [Console]::OutputEncoding = $prevIn, $prevOut
    }
  }

} # end of function

# --------------------------------
# GENERIC INSTALLATION HELPER CODE
# --------------------------------
#    Provides guidance for making the function persistently available when
#    this script is either directly invoked from the originating Gist or
#    dot-sourced after download.
#    IMPORTANT:
#       * DO NOT USE `exit` in the code below, because it would exit
#         the calling shell when Invoke-Expression is used to directly
#         execute this script's content from GitHub.
#       * Because the typical invocation is DOT-SOURCED (via Invoke-Expression),
#         do not define variables or alter the session state via Set-StrictMode, ...
#         *except in child scopes*, via & { ... }

if ($MyInvocation.Line -eq '') {
  # Most likely, this code is being executed via Invoke-Expression directly
  # from gist.github.com

  # To simulate for testing with a local script, use the following:
  # Note: Be sure to use a path and to use "/" as the separator.
  #  iex (Get-Content -Raw ./script.ps1)

  # Derive the function name from the invocation command, via the enclosing
  # script name presumed to be contained in the URL.
  # NOTE: Unfortunately, when invoked via Invoke-Expression, $MyInvocation.MyCommand.ScriptBlock
  #       with the actual script content is NOT available, so we cannot extract
  #       the function name this way.
  & {

    param($invocationCmdLine)

    # Try to extract the function name from the URL.
    $funcName = $invocationCmdLine -replace '^.+/(.+?)(?:\.ps1).*$', '$1'
    if ($funcName -eq $invocationCmdLine) {
      # Function name could not be extracted, just provide a generic message.
      # Note: Hypothetically, we could try to extract the Gist ID from the URL
      #       and use the REST API to determine the first filename.
      Write-Verbose -Verbose "Function is now defined in this session."
    } else {
      # Indicate that the function is now defined and also show how to
      # add it to the $PROFILE or convert it to a script file.
      Write-Verbose -Verbose @"
Function `"$funcName`" is now defined in this session.

* If you want to add this function to your `$PROFILE, run the following:

   "``nfunction $funcName {``n`${function:$funcName}``n}" | Add-Content `$PROFILE

* If you want to convert this function into a script file that you can invoke
  directly, run:

   "`${function:$funcName}" | Set-Content $funcName.ps1 -Encoding $('utf8' + ('', 'bom')[[bool] (Get-Variable -ErrorAction Ignore IsCoreCLR -ValueOnly)])

"@
    }

  } $MyInvocation.MyCommand.Definition # Pass the original invocation command line to the script block.

} else {
  # Invocation presumably as a local file after manual download,
  # either dot-sourced (as it should be) or mistakenly directly.

  & {

    param($originalInvocation)

    # Parse this file to reliably extract the name of the embedded function,
    # irrespective of the name of the script file.
    $ast = $originalInvocation.MyCommand.ScriptBlock.Ast
    $funcName = $ast.Find( { $args[0] -is [System.Management.Automation.Language.FunctionDefinitionAst] }, $false).Name

    if ($originalInvocation.InvocationName -eq '.') {
      # Being dot-sourced as a file.
      # Provide a hint that the function is now loaded and provide
      # guidance for how to add it to the $PROFILE.
      # Write-Verbose -Verbose @"
      # Function `"$funcName`" is now defined in this session.
      # If you want to add this function to your `$PROFILE, run the following:
      #    "``nfunction $funcName {``n`${function:$funcName}``n}" | Add-Content `$PROFILE
      # "@
    } else {
      # Mistakenly directly invoked.
      # Issue a warning that the function definition didn't take effect and
      # provide guidance for reinvocation and adding to the $PROFILE.
      Write-Warning @"
This script contains a definition for function "$funcName", but this definition
only takes effect if you dot-source this script.

To define this function for the current session, run:

  . "$($originalInvocation.MyCommand.Path)"

"@
    }

  } $MyInvocation # Pass the original invocation info to the helper script block.

}