  HTML Character Entities, Problems For RSS Readers
Daniel Franklin
« Posted: September 26, 2007, 04:09:00 PM »


Entity2NCR has a confusing name but a simple purpose: it converts HTML character entities to their numeric equivalents.

HTML Explained

The Hypertext Markup Language (HTML) is a simple markup language used to create hypertext documents that are platform independent. HTML documents are SGML documents with generic semantics appropriate for representing information from a wide range of domains. It can represent hypertext news, mail, documentation and hypermedia as well as menus of options, database query results and simple structured documents with in-line graphics. It can likewise represent hypertext views of existing bodies of information.

The World Wide Web (WWW) has been using HTML since 1990, making it one of the most widely used computer languages in the world. HTML owes its popularity to being the markup used to publish content on the web. Programmers were also quick to recognize how easy it is to learn.

This ease of coding contributed significantly to the proliferation of web sites. However, HTML is not a complete programming language, because it lacks conditional tests and flow control statements. Some implementations offer extensions that accomplish these functions, but those extensions are not part of the HTML standards. Embedding code from a suitable programming language inside HTML provides the power of a real programming language.

A character entity can be written in two ways in HTML: as a symbolic reference or as a numeric reference. A symbolic reference starts with an ampersand and ends with a semicolon. Between the two is the name of the symbol, which is generally a shortened version of its full description. The name is case sensitive and is usually lower case, though there are exceptions.

Numeric references also start with an ampersand and end with a semicolon, but between them is a number preceded by a hash. These are less memorable than symbolic references, but each one identifies a single character directly by its character code, so a parser does not need a table of entity names to resolve it. The number can be written in decimal or, with an x after the hash, in hexadecimal. Symbolic references are sometimes referred to as entity references, while numeric references are also called numeric character references (NCRs), which is where the "NCR" in Entity2NCR comes from.
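As a small illustration of the two notations, the sketch below uses Python's standard html module (not anything from the plugin itself) to decode a symbolic reference, a decimal numeric reference and a hexadecimal numeric reference for the copyright sign. The choice of the copyright sign is only an example.

```python
import html

# Three spellings of the copyright sign: symbolic, decimal, hexadecimal.
references = ["&copy;", "&#169;", "&#xA9;"]

# html.unescape resolves both symbolic and numeric references.
decoded = [html.unescape(ref) for ref in references]

# All three spellings should decode to one and the same character.
same_character = len(set(decoded)) == 1
code_point = ord(decoded[0])

print(decoded, same_character, code_point)
```

The number in the decimal reference is simply the character's code point, which is why no name table is needed to resolve it.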

Most unusual characters can be entered directly without any problem. However, HTML character entities can be used in case one does encounter a problem. Lines and paragraphs are automatically recognized. A couple of blank lines are added when paragraphs are not recognized.

A character entity is a way to display special characters that are normally reserved for use in HTML. For instance, the less than (<) and greater than (>) symbols are used as part of the HTML tag structure, so both are reserved for that use. If these symbols need to be displayed on one's site, the character entities &lt; and &gt; can be used.
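To make this concrete, here is a minimal sketch that escapes the reserved symbols with html.escape from Python's standard library. The sample string is made up for illustration.

```python
import html

# Hypothetical text containing characters that HTML reserves for markup.
snippet = "if a < b and b > c: print('R&D')"

# quote=False escapes only &, < and >, leaving quotation marks alone.
escaped = html.escape(snippet, quote=False)

print(escaped)
```

The ampersand is escaped as well, since it is the character that starts every entity reference.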

Problems

Many WordPress users run afoul of character entities appearing in their comment RSS feeds, which many RSS/syndication readers fail on. RSS is XML, and XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so a strict parser rejects HTML names such as &raquo;. The WordPress plugin Entity2NCR seeks to resolve this by converting various HTML character entities such as &raquo;, &amp;, &copy; and so on to their numeric equivalents. This plugin is for RSS output, but can also be adapted to posts if the user so wishes.
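The idea behind such a conversion can be sketched in a few lines of Python. This is not the plugin's code (the plugin is PHP and carries its own table). The function names entities_to_ncr and parses_as_xml are hypothetical, and the lookup table is the one in Python's html.entities module.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Matches named references such as &raquo; or &copy;.
NAMED_REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def entities_to_ncr(text: str) -> str:
    """Replace known named entities with decimal numeric references."""
    def replace(match):
        name = match.group(1)
        if name in name2codepoint:
            return f"&#{name2codepoint[name]};"
        return match.group(0)  # unknown name: leave it untouched
    return NAMED_REFERENCE.sub(replace, text)

def parses_as_xml(fragment: str) -> bool:
    """Report whether a strict XML parser accepts the fragment as element text."""
    try:
        ET.fromstring(f"<title>{fragment}</title>")
        return True
    except ET.ParseError:
        return False

# Made-up comment text for illustration.
comment = "Tom &amp; Jerry &raquo; &copy; 2007"
converted = entities_to_ncr(comment)

print(converted)
print(parses_as_xml(comment), parses_as_xml(converted))
```

The unconverted text fails in the XML parser because &raquo; and &copy; are undefined there, while the numeric form parses cleanly.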

Entity2NCR should not be installed by users running WordPress 1.5.1 or above. Doing so only causes problems, because the plugin's function has the same name as one in the WordPress core. Upgrading to the most recent version is recommended, since the plugin's functionality is already incorporated. Before installing 1.5.1, the user should deactivate Entity2NCR from the Plugin Admin and delete its file from the wp-content/plugins directory, since it would only take up space unnecessarily.

Entity2NCR is installed by downloading the zip file, extracting Entity2NCR.php from it, uploading that file to the wp-content/plugins/ directory and activating the plugin in WordPress. Entity2NCR handles the standard assortment of HTML character entities plus some of the more unusual and obscure ones. While the plugin primarily focuses on RSS output, both from posts and comments, it can also convert character entities in the regular content of one's blog. To enable this, the user removes the comment marker from the add_filter line at the end of the plugin for each WordPress function Entity2NCR should work on.

The RSS 2.0 spec can produce feeds that are valid, accurate and useful, but it is too vague. The contents of a feed should reflect the best possible representation of the article content, yet the spec does not say what to do if an article title contains HTML code or entities. It also leaves a lot of other things unsaid. In fact, an entire industry has sprung up around interpreting and fixing the various semantic differences between feeds. RSS application developers need to agree on basic answers to these fundamental questions instead of holding endless, conflicting discussions that help no one.

Attribute Values

An HTML author should always put attribute values in quotes, although the formal rules allow the quotes to be omitted in some cases. By default, SGML requires that all attribute values be delimited using either double quotation marks or single quotation marks. Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors can also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;), or use the character entity reference &quot; for double quotes. In certain cases the value of an attribute may be specified without any quotation marks, provided that it contains only letters, digits, hyphens, periods, underscores and colons. It is highly recommended to use quotation marks even when it is possible to omit them.
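The quoting rules above can be sketched as a small Python check. The helper names may_omit_quotes and quoted_attribute are hypothetical. The character set is the one HTML 4 allows in unquoted values, and html.escape from the standard library does the escaping.

```python
import html
import re

# Characters HTML 4 allows in an unquoted attribute value:
# letters, digits, periods, underscores, colons and hyphens.
UNQUOTED_SAFE = re.compile(r"[A-Za-z0-9._:-]+")

def may_omit_quotes(value: str) -> bool:
    """True if the value could legally be written without quotes."""
    return UNQUOTED_SAFE.fullmatch(value) is not None

def quoted_attribute(name: str, value: str) -> str:
    """Always quote the value, escaping quotation marks inside it."""
    return f'{name}="{html.escape(value, quote=True)}"'

print(may_omit_quotes("left"), may_omit_quotes("two words"))
print(quoted_attribute("title", 'Say "hi"'))
```

Always calling something like quoted_attribute removes the need to remember when omission is allowed, which is the recommendation made above.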

There are several reasons to always use quotes around attribute values. It is much easier, since there is no need to memorize and recall the rules for when omission is allowed. Quotes are also always required in XML. And when an HTML file is edited later, it is easy to forget to add the quotes to an attribute value that has been changed in a way that makes them mandatory. The only drawbacks are the extra typing and the additional storage and transmission time, which are minor, since quotes make up just a small fraction of an HTML file.
