Encoding
Download source code: ZIP file
If you like this software, consider donating to me at this link: http://peteroupc.github.io/
A portable library in C# and Java that implements character encodings used in Web pages and email.
It implements the Encoding Standard, which is a candidate recommendation at the time of this writing.
How to Install
The C# implementation is available in the NuGet Package Gallery under the name PeterO.Encoding. To install this library as a NuGet package, enter Install-Package PeterO.Encoding in the NuGet Package Manager Console.
The Java implementation is available as an artifact in the Central Repository. To add this library to a Maven project, add the following to the dependencies section in your pom.xml file:
<dependency>
  <groupId>com.github.peteroupc</groupId>
  <artifactId>encoding</artifactId>
  <version>0.6.0</version>
</dependency>
In other Java-based environments, the library can be referred to by its group ID (com.github.peteroupc), artifact ID (encoding), and version, as given above.
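For instance, in a Gradle build the same coordinates would typically be written as a single dependency line. This is a sketch that assumes a standard Gradle setup with the Central Repository configured and the `implementation` configuration available; adjust it to your own build.

```groovy
// build.gradle (assumes the Maven Central repository is enabled)
repositories {
    mavenCentral()
}

dependencies {
    // groupId:artifactId:version, as given in the Maven snippet above
    implementation 'com.github.peteroupc:encoding:0.6.0'
}
```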
Documentation
See the Java API documentation.
See the C# (.NET) API documentation.
Examples
In C#.
// Namespaces used by the examples in this section
using System.IO;
using PeterO.Text;

// Reads text from a UTF-8/UTF-16/UTF-32 file
public static string ReadTextFromFile(string filename) {
  using (var stream = new FileStream(filename, FileMode.Open)) {
    return new CharacterReader(stream, 2).InputToString();
  }
}
// Reads text from a SHIFT-JIS stream, but uses UTF-8/UTF-16
// instead if it detects byte order marks
// (the method name here is illustrative only)
public static string ReadShiftJisFromFile(string filename) {
  using (var stream = new FileStream(filename, FileMode.Open)) {
    return Encodings.GetEncoding("shift_jis")
      .GetDecoderInputSkipBom(stream).InputToString();
  }
}
// Writes text in UTF-8 to a file
using (var stream = new FileStream(filename, FileMode.Create)) {
  var str = "Hello world!";
  str.EncodeToWriter(Encodings.UTF8, stream);
}
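To make the byte order mark fallback in the SHIFT-JIS example more concrete, here is a minimal Java sketch of the general idea. It uses only the Java standard library and does not call this library's API; the class name `BomSniffer` and its method names are hypothetical, and the library's actual detection logic may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper only; NOT part of this library.
final class BomSniffer {
  private BomSniffer() {}

  // Decodes the bytes as UTF-8 or UTF-16 if they begin with the
  // matching byte order mark (which is skipped); otherwise decodes
  // all of the bytes with the given fallback charset.
  static String decode(byte[] data, Charset fallback) {
    if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
      return decodeFrom(data, 3, StandardCharsets.UTF_8);
    }
    if (startsWith(data, 0xFE, 0xFF)) {
      return decodeFrom(data, 2, StandardCharsets.UTF_16BE);
    }
    if (startsWith(data, 0xFF, 0xFE)) {
      return decodeFrom(data, 2, StandardCharsets.UTF_16LE);
    }
    return new String(data, fallback);
  }

  // Returns true if the data begins with the given byte values.
  private static boolean startsWith(byte[] data, int... prefix) {
    if (data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if ((data[i] & 0xFF) != prefix[i]) {
        return false;
      }
    }
    return true;
  }

  // Decodes the bytes from the given offset to the end of the array.
  private static String decodeFrom(byte[] data, int offset, Charset cs) {
    return new String(data, offset, data.length - offset, cs);
  }
}
```

A call such as `BomSniffer.decode(bytes, Charset.forName("Shift_JIS"))` would mirror the C# example, assuming the Java runtime in use provides the Shift_JIS charset.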
History
Version 0.6.0:
- Bug fixes
Version 0.5.1:
- Fixed issue in .NET 2.0 and 4.0 assemblies where resources were inadvertently left out of build.
Version 0.5.0:
- Separate aliases and encodings for email are used, for better conformance to MIME.
- New methods added to Encodings class.
- Endian-independent UTF-16 encoding added for email.
- ISO-2022-JP-2 and ISO-2022-KR encodings added for email.
- .NET 2 and .NET 4 assemblies added to NuGet package.
Version 0.4.0:
- Updated to latest Encoding Standard draft as of Jun. 28, 2018, except for a bug fix in one encoding.
Version 0.3.2:
- Version change needed to properly refer to version.
Version 0.3.1:
- Marked assembly as CLS-compliant.
Version 0.3:
- Converted project to .NET Standard
Version 0.2.1:
- Fix ResolveAliasForEmail method to conform to new behavior in version 0.2.0
Version 0.2.0:
- Update implementation to latest candidate recommendation of Encoding Standard
- ResolveAlias may return a mixed-case encoding name (as opposed to a lower-case one).
- Add overloads to CharacterReader constructor
- Add IReader interface
- Deprecated some methods of DataIO
- Add a few overloads in Encodings class, especially EncodeToWriter
- Bug fixes
Version 0.1.0:
- First release
About
Written by Peter O. in 2015.
Any copyright is dedicated to the Public Domain. http://creativecommons.org/publicdomain/zero/1.0/
If you like this, you should donate to Peter O. at: http://peteroupc.github.io/