c++ – Visual Studio cannot output Unicode Characters


I’ve made a program for school that goes through a plain text file and builds a concordance of its words. It takes each word, strips non-alphabetic characters from the front and back, and inserts it into a binary search tree. When the text contains Unicode characters, the output shows seemingly random characters in place of the multibyte character: for example, “yarns—and,” is output as “yarnsùand.” I spent hours on this months ago and again this week without solving it. What should I do?

This article seemed useful: https://www.codeproject.com/Articles/38242/Reading-UTF-8-with-C-streams#mozTocId353176 However, reading UTF-8 ought to be a solved problem by now, so writing a custom facet didn’t seem worthwhile, and for that reason I didn’t try it.

Here is an MRE of the bug.

#include <string>
#include <iostream>
#include <fstream> 
#include <windows.h>
#include <consoleapi2.h>
using namespace std;

int main()
{
    wfstream file;
    file.open("Example.txt", ios::in);
    // Changes buffer from char to wchar_t
    wchar_t* buffer = new wchar_t[100];
    file.rdbuf()->pubsetbuf(buffer, 100);
    wchar_t CurrentStreamCharacter = file.get();
    wstring NewWord = L"";
    while (file) 
    {
        NewWord.push_back(CurrentStreamCharacter);
        CurrentStreamCharacter = file.get();
    }
    //SetConsoleOutputCP(65001);
    wcout << NewWord << endl;
    wcout << "yarns—and even convictions. The Lawyer—the best of old fellows—had,";
    return 0;
}

Here is the text in Example.txt.

yarns—and even convictions. The Lawyer—the best of old fellows—had,
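
For comparison, here is a minimal sketch of one possible way around the problem: read the file as raw bytes and decode the UTF-8 by hand, rather than relying on wfstream to do the conversion. It assumes Example.txt is saved as UTF-8 (if it is saved in a legacy single-byte code page instead, the dash is a single byte and this decoder would report it as U+FFFD). The names read_file_bytes and decode_utf8 are hypothetical helpers made up for this illustration, not part of any library.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole file as raw bytes, with no character
// conversion. Returns an empty string if the file cannot be opened.
std::string read_file_bytes(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Malformed or truncated sequences become U+FFFD. This sketch does not
// reject overlong encodings or surrogate code points.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(U'\uFFFD');
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= bytes.size()) { ok = false; break; }
            const unsigned char next = static_cast<unsigned char>(bytes[i]);
            if ((next & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (next & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : U'\uFFFD');
    }
    return out;
}
```

With these, decode_utf8(read_file_bytes("Example.txt")) should yield a single element, U+2014, for each dash instead of three separate bytes. Printing is a separate step that this sketch does not cover: on Windows, one option is to convert the code points to UTF-16 and call _setmode(_fileno(stdout), _O_U16TEXT) (from <io.h> and <fcntl.h>) before writing to wcout.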
