Overview
In the dynamic world of technology consulting, we often find ourselves learning new things in the most unexpected ways. This is a story from my time enabling a team at a major bank in Indonesia. During this project, we encountered an unusual challenge: handling duplicate fields in JSON objects from an external system we needed to integrate with.
The Challenge of Duplicate Fields in JSON
Typically, JSON parsers keep only the last occurrence of a duplicate field; this is, for example, the behavior of ECMAScript's JSON.parse. The JSON syntax itself does not impose restrictions on the strings used as names, nor does it require name strings to be unique. This lack of constraints can lead to complications when all instances of duplicate fields must be considered and mapped into a list, as was our requirement.
Reference: Duplicate fields sit in a grey area of the JSON specification. The ECMA-404 standard does not explicitly forbid duplicate keys, but it also does not define any semantics for handling them. This ambiguity means that behavior can vary depending on the JSON parser being used. ECMA-404 Standard
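To make the problem concrete, here is a minimal illustration (the field names below are invented for this post, not the client's actual payload):

String payload = "{ \"customer\": \"A-102\", \"phone\": \"0812111\", \"phone\": \"0813222\" }";

// A typical last-wins parser surfaces only:   { "customer": "A-102", "phone": "0813222" }
// What we needed instead was a list:          { "customer": "A-102", "phone": ["0812111", "0813222"] }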
Crafting a Solution: Insights on Tweaking the JSON Library to Handle Duplicate Keys
When working with JSON data, we often rely on established libraries to parse and process it efficiently. Sometimes, however, these libraries do not meet a specific requirement, as was the case with the challenge above. To address it, I modified the widely used org.json library to preserve all values associated with duplicate keys. Here's how I did it:
Choosing the Right Library
I chose the org.json library for several reasons:
- Popularity and Reliability: It is one of the most widely-used libraries for JSON processing.
- Simplicity and Ease of Use: The library offers a straightforward API, making it easy to integrate and extend.
- Community Support: Extensive documentation and community support make it easier to troubleshoot and enhance.
Identifying the Key Classes
The first step in modifying the library was to understand its internal workings, specifically how it parses JSON elements. Through my exploration, I identified two crucial classes:
- JSONObject: This class represents a JSON object and handles the storage of key-value pairs.
- JSONTokener: This class is responsible for tokenizing the JSON input and feeding it to the JSONObject for parsing.
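For context, here is how the two classes cooperate in stock org.json, as a minimal, self-contained example:

import org.json.JSONObject;
import org.json.JSONTokener;

public class StockParsingExample {
    public static void main(String[] args) {
        // The tokener scans the raw characters; the JSONObject constructor
        // consumes tokens from it to build the key-value map.
        JSONTokener tokener = new JSONTokener("{\"name\":\"Alice\",\"age\":30}");
        JSONObject obj = new JSONObject(tokener);
        System.out.println(obj.getString("name")); // prints: Alice
    }
}

Any change to how duplicate keys are stored therefore has to touch both ends of this hand-off.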
Modifying the Necessary Classes
To implement custom handling for duplicate keys, I extended both the JSONObject and JSONTokener classes. Here's a high-level overview of the changes:
- Extending JSONObject: I overrode the methods responsible for adding key-value pairs to check for duplicate keys. When a duplicate key is detected, the values are stored in a list so that every instance is preserved (see the sketch after this list).
- Extending JSONTokener: I modified the tokenizer to support the extended behavior of the JSONObject, ensuring seamless integration between the two classes.
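The published library is the authoritative reference; the sketch below is only a minimal illustration of the idea, with an invented class name. It shows how an overridden put can fold repeated keys into a JSONArray; the extended tokenizer then has to construct this subclass (including for nested objects) instead of the stock JSONObject.

import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

// Illustrative sketch only -- the class name and details are hypothetical.
public class DuplicatePreservingJSONObject extends JSONObject {

    @Override
    public JSONObject put(String key, Object value) throws JSONException {
        if (this.has(key)) {
            Object existing = this.get(key);
            if (existing instanceof JSONArray) {
                // Already collecting duplicates for this key: append the new value.
                ((JSONArray) existing).put(value);
            } else {
                // First duplicate: replace the scalar with a list holding both values.
                super.put(key, new JSONArray().put(existing).put(value));
            }
            return this;
        }
        return super.put(key, value);
    }

    // Note: depending on the org.json version, the parsing constructor may add
    // entries via putOnce (which rejects duplicates), so that path would need
    // the same treatment in a real implementation.
}

With a matching tokenizer in place, parsing the earlier payload would surface "phone" as a JSONArray holding both values instead of only the last one.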
Next Steps!
Having successfully modified the org.json library to handle duplicate keys, I wanted to share my work with the broader development community. However, I had never published a library to Maven Central before; until then I had only ever consumed libraries from the repository while working on various software projects. The prospect of publishing my own library filled me with excitement, and I set out to learn how it is done.
Sharing the Knowledge: Publishing to Maven Central
In the spirit of sharing knowledge and contributing to the developer community, I published the modified JSON parsing library to the Maven Central Repository. This was another learning curve: in software development, the path to success is often paved with unexpected challenges, and publishing a library to Maven Central was no different. Here is how I navigated the process of bringing my work to a broader audience.
Step 1: Initiating the Request
The adventure began with a visit to the Sonatype Central Portal, where I initiated my request to publish. Following the guidelines provided by Sonatype's publishing instructions, I embarked on the journey with a sense of excitement and curiosity.
Step 2: Preparing the Prerequisites
Before diving into the technicalities, I had to ensure I met all prerequisites:
- Approval for OSSRH: Gaining approval to publish to the Open Source Software Repository Hosting (OSSRH) was the first milestone.
- GPG Keys: Generating and preparing my GPG key was crucial. This key would serve as a digital signature, ensuring the integrity and authenticity of my library.
I also saved my passphrase and updated the settings.xml file of my Maven configuration:
<settings>
  <servers>
    <server>
      <id>ossrh</id>
      <username>${env.OSS_USERNAME}</username>
      <password>${env.OSS_PASSWORD}</password>
    </server>
  </servers>
</settings>
Step 3: Trials with Keyserver
Handling the GPG (GNU Privacy Guard) keys took some trial and error. I used the following commands to send my key to and retrieve it from the keyserver, where <KEY-ID> is the long key ID (or full fingerprint) of the key:
gpg --keyserver hkp://keys.openpgp.org --send-keys <KEY-ID>
gpg --keyserver hkp://keys.openpgp.org --recv-keys <KEY-ID>
Step 4: Validation and Verification
Validating in Nexus: Once the keys were set up, the next step was validating my project in Nexus. I used the Nexus search to confirm everything was in order.
Getting the GPG key ID:
The GPG key ID is essentially the low-order 32 bits (short key ID) or 64 bits (long key ID) of the key's fingerprint, which is a hash of the public key. Long key IDs are recommended to reduce the risk of collisions.
If you’re on the instance where you created the GPG keyring, you can use the following command to list your keys and get the ID:
gpg --list-secret-keys --keyid-format=long
Note: If you are on a different instance and only have the public key on hand, you can also extract the GPG/PGP key ID from the public key file. You can follow these guidelines from Stack Exchange.
Step 5: Verification and Release
The final step was verifying my library in the releases repository. To confirm everything was correctly in place, I checked the release at the Releases Repository.
Additionally, I associated my email address with my GPG key, following GitHub's documentation: Associating Email with GPG Key
Success! After following these steps, my library was successfully published. You can find it at Duplicate Keys JSON Library.
This journey was not just about publishing a library; it was about growth, learning, and contributing to the developer community. Each step taught me something new, enriching my skills and preparing me for future challenges.
Although this specific library has not yet seen widespread use, the experience enriched my understanding of both JSON handling and the process of publishing to Maven Central.
Conclusion
Tech consulting is often a journey of discovery, filled with unexpected challenges and opportunities to learn. From tweaking a JSON parser to handle duplicate fields to navigating the intricacies of publishing to Maven Central, each step is a testament to the continuous learning and adaptation that defines our field. This experience not only solved a critical problem for our client but also added a valuable tool to my professional toolkit, ready to be used in future projects.