{
"summary": "Read the context carefully.\nThe context may contain {} data.\n",
"summary_image": "Read the image carefully.\nThe image may contain {} data.\n",
"get_image_text": "Instructions:\nYou are given an image of a page from a PDF document. Extract **all visible text** from the image while preserving the original order, structure, and any associated context as closely as possible. Ensure that:\n\n1. **All textual elements are included**, such as headings, body text, tables, and labels.\n2. **Numerical data, symbols, and special characters** are preserved accurately.\n3. Text in structured formats (e.g., tables, lists) is retained in a logical and readable format.\n4. Any text embedded in graphical elements, if clearly readable, is also included.\n5. The text is clean, readable, and free of formatting artifacts or errors.\n\nDo not include non-textual elements such as images or graphics unless they contain text that can be meaningfully extracted.\n\n### Output Format:\nOutput the result in JSON format, for example: \n{\"text\": \"Text from image\"}\n\nAnswer: \n[Extracted Text Here, retaining logical structure and all content]",
"image_features":
[
"1. Identify the text in the PDF page image.",
"2. Identify and format all of the tables in the PDF page image.",
"Table contents should be in Markdown format,",
"ensuring the table structure and contents are exactly as in the PDF page image.",
"The format should be: |Column1|Column2|\n|---|---|\n|Row1Col1|Row1Col2|",
"Each cell in the table(s) should be placed in the correct row and column.",
"3. Extract data from the parsed text and table(s) contents above.",
"3.1 Use the parsed text and table(s) contents above as the context.",
"3.2 Please extract data from the context."
],
"data_business_features": {
"common": [
"General rules:",
"- 1. The data is in the context, possibly in table(s), semi-table(s) or paragraphs.",
"- 2. Fund name: ",
"a. The full fund name is the main fund name + the sub-fund name, e.g. if the main fund name is Black Rock European and the sub-fund name is Growth, the full fund name is: Black Rock European Growth.",
"b. The sub-fund name may appear as a value in the first column or first row of the table.",
"b.1 Fund name example:",
"---- Example Start ----",
"Summary information\nCapital International Fund Audited Annual Report 2023 | 15\nFootnotes are on page 17.\nCapital Group Multi-Sector \nIncome Fund (LUX) \n(CGMSILU)\nCapital Group US High Yield \nFund (LUX) (CGUSHYLU)\nCapital Group Emerging \nMarkets Debt Fund (LUX) \n(CGEMDLU)",
"---- Example End ----",
"Fund names: Capital International Group Multi-Sector Income Fund (LUX), Capital International Group US High Yield Fund (LUX), Capital International Group Emerging Markets Debt Fund (LUX)",
"\n",
"c. If there are multiple fund names in the context, retrieve the fund name closest above the numerical value.",
"c.1 Fund name example:",
"---- Example Start ----",
"AXA World Funds ACT Emerging Markets Bonds\nAXA World Funds \n \nAdditional Unaudited Appendix \n\nƒ$GGLWLRQDO8QDXGLWHG$SSHQGL[$118$/5(3257$;$:RUOG)XQGV\nExpense Ratios (continued) \n \nCalculated TER (1) \nSwiss method \nApplied\nService Fee (2)\nOngoing \nCharges (3) \n \nwith performance \nfees \nwithout performance \nfees \n \nAXA World Funds - ACT Emerging Markets Short Duration Bonds Low Carbon \nA Capitalisation CHF Hedged \n1.26% \n1.26% \n0.26% \n1.29%",
"---- Example End ----",
"Correct fund name: AXA World Funds - ACT Emerging Markets Short Duration Bonds Low Carbon",
"\n",
"d. In some table formats, the fund name is at the end of the row; in that case, extract the fund name from the end of the row.",
"---Example Start---",
"\nTotal\nTransaction Costs\nPerformance Fees\nManagement fees and costs\nIndirect Fee\nManagement fees\nMLC diversified investment\noption\n1.49% p.a.\n0.01% p.a.\n0.06% p.a.\n0.07% p.a.\n1.35% p.a.\nMLC Horizon 2\nIncome Portfolio\n",
"---Example End---",
"Correct fund name: MLC Horizon 2 Income Portfolio",
"\n",
"e. Fund and share relationship:",
"One fund may have multiple share classes, each with its own share class level data values.",
"If no specific share name can be found, set the share name to the fund name.",
"---Example Start---",
"\nTotal\nTransaction Costs\nPerformance Fees\nManagement fees and costs\nIndirect Fee\nManagement fees\nMLC diversified investment\noption\n1.49% p.a.\n0.01% p.a.\n0.06% p.a.\n0.07% p.a.\n1.35% p.a.\nMLC Horizon 2\nIncome Portfolio\n",
"---Example End---",
"Correct fund name: MLC Horizon 2 Income Portfolio",
"Correct share name: MLC Horizon 2 Income Portfolio",
"\n",
"- 3. Only extract the latest data from the context:",
"If there are multiple data values in the same row, extract the latest one.",
"- 4. Reported names:",
"Only output values that have a significant reported name.",
"- Multiple data columns with the same reported name but a different postfix:",
"If there are multiple reported names with different postfix text, apply this priority rule:",
"The postfix text is in brackets, e.g. (gross) and (net); pick the values from (net).",
"---Example Start---",
"\n Investment option \nInvestment option \nmanagement \ncosts1 \n% p.a. \n(A)\nLifeplan \nadministration fee \n(gross)2 \n% p.a. \n(B)\nLifeplan \nadministration fee \n(net) \n% p.a. \n(C)\nTotal Management \nfees and costs \n(gross) \n% p.a. \n(A + B)\nTotal Management \nfees and costs \n(net) \n% p.a. \n(A + C)\nAllan Gray Australian Equity Fund \u2013 Class A\n0.77\n0.60\n0.42\n1.37\n1.19\n",
"---Example End---",
"The output should be:",
"{\"data\": [{\"fund name\": \"Allan Gray Australian Equity Fund\", \"share name\": \"Class A\", \"management_fee_and_costs\": 1.19, \"management_fee\": 0.77, \"administration_fees\": 0.42}]}",
"- 5. Never extract the following words as fund names:",
"\"Ready-made portfolios\", \"Simple choice\", \"Build-your-own portfolio\"."
],
"investment_level": {
"total_annual_dollar_based_charges": "Total annual dollar-based charges are share level data.",
"management_fee_and_costs": "Management fee and costs are share level data.",
"management_fee": "Management fee is share level data.",
"performance_fee_costs": "Performance fee costs are share class level data.",
"performance_fee": "Performance fee is share class level data.",
"buy_spread": "Buy spread is share class level data.",
"sell_spread": "Sell spread is share class level data.",
"establishment_fee": "Establishment fee is share class level data.",
"contribution_fee": "Contribution fee is share class level data.",
"withdrawal_fee": "Withdrawal fee is share class level data.",
"switching_fee": "Switching fee is share class level data.",
"activity_fee": "Activity fee is share class level data.",
"exit_fee": "Exit fee is share class level data.",
"administration_fees": "Administration fees are share class level data.",
"interposed_vehicle_performance_fee_cost": "Interposed vehicle performance fee cost is share class level data.",
"additional_hurdle": "Additional hurdle is share class level data.",
"benchmark_name": "Benchmark name is fund level data.",
"reference_rate": "Reference rate is share class level data.",
"crystallisation_frequency": "Crystallisation frequency is share class level data.",
"date_of_last_hwm_reset": "Date of last HWM reset is share class level data.",
"date_of_last_performance_fee_restructure": "Date of last performance fee restructure is share class level data.",
"high_water_mark_type": "High water mark type is share class level data.",
"minimum_initial_investment": "Minimum initial investment is share class level data.",
"recoverable_expenses": "Recoverable expenses are share class level data.",
"indirect_costs": "Indirect costs are share class level data."
},
"data_value_range": {
"total_annual_dollar_based_charges": "Total annual dollar-based charges are a decimal number; the value can be more than 100, e.g. 625.00",
"management_fee_and_costs": "Management fee and costs are a percentage; the value should be less than 100.",
"management_fee": "Management fee is a percentage; the value should be less than 100.",
"performance_fee_costs": "Performance fee costs are a percentage; the value should be less than 100.",
"performance_fee": "Performance fee is a percentage; the value should be less than 100.",
"buy_spread": "Buy spread is a percentage; the value should be less than 100.",
"sell_spread": "Sell spread is a percentage; the value should be less than 100.",
"establishment_fee": "Establishment fee is a percentage; the value should be less than 100.",
"contribution_fee": "Contribution fee is a percentage; the value should be less than 100.",
"withdrawal_fee": "Withdrawal fee is a percentage; the value should be less than 100.",
"switching_fee": "Switching fee is a percentage; the value should be less than 100.",
"activity_fee": "Activity fee is a percentage; the value should be less than 100.",
"exit_fee": "Exit fee is a percentage; the value should be less than 100.",
"administration_fees": "Administration fees are a percentage; the value should be less than 100.",
"interposed_vehicle_performance_fee_cost": "Interposed vehicle performance fee cost is a percentage; the value should be less than 100.",
"additional_hurdle": "Additional hurdle is a percentage; the value should be less than 100.",
"benchmark_name": "Benchmark name is an index fund name; the value should be text.",
"reference_rate": "Reference rate is a percentage; the value should be less than 100.",
"crystallisation_frequency": "Crystallisation frequency is a text value.",
"date_of_last_hwm_reset": "Date of last HWM reset is a date; the value should be in date format.",
"date_of_last_performance_fee_restructure": "Date of last performance fee restructure is a date; the value should be in date format, e.g. 12 August 2022",
"high_water_mark_type": "High water mark type is a text value.",
"minimum_initial_investment": "Minimum initial investment is a decimal number; the value can be more than 100, e.g. 625.00",
"recoverable_expenses": "Recoverable expenses are a percentage; the value should be less than 100.",
"indirect_costs": "Indirect costs are a percentage; the value should be less than 100."
},
"special_rule": {
"management_fee_and_costs": [
"If there are multiple Management fees and costs reported names, apply this priority rule:",
"A. If both \"Total Management fees and costs (gross)\" and \"Total Management fees and costs (net)\" are present, pick the values from \"Total Management fees and costs (net)\".",
"---Example Start---",
"\n Investment option \nInvestment option \nmanagement \ncosts1 \n% p.a. \n(A)\nLifeplan \nadministration fee \n(gross)2 \n% p.a. \n(B)\nLifeplan \nadministration fee \n(net) \n% p.a. \n(C)\nTotal Management \nfees and costs \n(gross) \n% p.a. \n(A + B)\nTotal Management \nfees and costs \n(net) \n% p.a. \n(A + C)\nAllan Gray Australian Equity Fund \u2013 Class A\n0.77\n0.60\n0.42\n1.37\n1.19\n",
"---Example End---",
"The output should be:",
"{\"data\": [{\"fund name\": \"Allan Gray Australian Equity Fund\", \"share name\": \"Class A\", \"management_fee_and_costs\": 1.19, \"management_fee\": 0.77, \"administration_fees\": 0.42}]}",
"\n",
"If there are multiple Management fees and costs sub-columns, apply this rule:",
"B. If both \"Management fees\" and \"Indirect fee\" columns are present, sum their values: \"Management fees\" + \"Indirect fee\".",
"---Example Start---",
"\n\nManagement fees \nManagement fees and costs \nIndirect Fee \nPerformance Fees \nTransaction Costs \nTotal \nMLC diversified investment \noption \nMLC Horizon 2 \nIncome Portfolio \n1.35% p.a. \n0.07% p.a. \n0.06% p.a. \n0.01% p.a. \n1.49% p.a. \n",
"---Example End---",
"The output should be:",
"{\"data\": [{\"fund name\": \"MLC Horizon 2 Income Portfolio\", \"share name\": \"MLC Horizon 2 Income Portfolio\", \"management_fee_and_costs\": 1.42, \"management_fee\": 1.35, \"indirect_costs\": 0.07, \"performance_fee\": 0.06}]}",
"\n",
"C. If only \"Management fees and costs\" is found, output the same value for both data point keys: \"management_fee_and_costs\" and \"management_fee\".",
"---Example 1 Start---",
"The fees and costs for managing \nyour investment \nManagement fees and costs \n1 \n• \nSPDR World: 0.30% per annum of net asset \nvalue. This is reduced to 0.18% per annum of net \nasset value with effect from 14 February 2022.",
"---Example 1 End---",
"The output should be:",
"{\"data\": [{\"fund name\": \"SPDR World\", \"share name\": \"SPDR World\", \"management_fee_and_costs\": 0.18, \"management_fee\": 0.18}]}",
"---Example 2 Start---",
"Management Fees and Costs \n\nAs at the date of this PDS, Management Fees and Costs will be capped at: \n\n• 0.18% pa of net asset value for SPDR World \n\n• 0.21% pa of net asset value for SPDR World (Hedged) \n\n",
"---Example 2 End---",
"The output should be:",
"{\"data\": [{\"fund name\": \"SPDR World\", \"share name\": \"SPDR World\", \"management_fee_and_costs\": 0.18, \"management_fee\": 0.18}, {\"fund name\": \"SPDR World (Hedged)\", \"share name\": \"SPDR World (Hedged)\", \"management_fee_and_costs\": 0.21, \"management_fee\": 0.21}]}"
],
"buy_spread": [
"Do not extract buy_spread or sell_spread values from the following reported names: ",
"Transaction costs buy/sell spread recovery, Transaction costs reducing return of the investment option (net transaction costs)"
],
"minimum_initial_investment": [
"Minimum initial investment is fund level data and is an integer, e.g. 100, 1,000, 5,000, 10,000.",
"---Example 1 Start---",
"The minimum investment per Pension Plan account is \n$20,000. The minimum initial investment in any \ninvestment option is $5,000.\n\nPerpetual WealthFocus Pension Plan",
"---Example 1 End---",
"The output should be:",
"{\"data\": [{\"fund name\": \"Perpetual WealthFocus Pension Plan\", \"share name\": \"\", \"minimum_initial_investment\": 5000}]}",
"\n",
"---Example 2 Start---",
"Prime Super \n\n5 Initial investment amount \n\nThe minimum net total initial investment amount is $10,000. Please note before you open your pension account: If you \nhave made personal contributions into super and wish to claim a tax deduction, you will have to lodge a Notice of \nIntent to Claim form with the relevant super fund (including Prime Super) before you roll your super into the Income \nStreams account.",
"---Example 2 End---",
"The output should be:",
"{\"data\": [{\"fund name\": \"Prime Super\", \"share name\": \"\", \"minimum_initial_investment\": 10000}]}"
]
}
},
"special_cases": {
"common": [
{
"title": "Latest data with time series data:",
"contents": [
"Case 1:",
"Some data tables have multiple date columns; extract the data from the latest date column:",
"- Get the dates from the column headers.",
"- Only extract data from the column whose header is the latest date.",
"-- Common case:",
"The latest date column is usually the first data point value column.",
"If the value in the latest date column is N/A or -, ignore it; do not fall back to an older date column.",
"-----Example Start-----",
"I-class income shares\n\n31.10.22\n30.04.22\n30.04.21\n30.04.20\n\npence per share\npence per share\npence per share\npence per share\nOther information\nOperating charges**\nN/A\n—\n0.90%\n0.90%",
"-----Example End-----",
"The output should be:",
"{\"data\": []}"
]
},
{
"title": "Don't extract data with a number range statement",
"contents":[
"If the value is stated as a number range, e.g. \"up to\", \"from ... to\" or \"between ... and\", ignore the value.",
"Example 1:",
"-----Example Start-----",
"A-Class\nB-Class\nC-Class\n",
"Management fee\nUp to 1.00%\nUp to 1.20%\nUp to 1.50%\n",
"-----Example End-----",
"The output should be:",
"{\"data\": []}",
"Example 2:",
"-----Example Start-----",
"5 year average performance fees range estimated\nto be between 0.00% pa and 0.07% pa2,3.\n",
"-----Example End-----",
"The output should be:",
"{\"data\": []}",
"Example 3:",
"-----Example Start-----",
"Buy-sell spread ranges from 0.00% to 0.25%\ndepending on the investment option",
"-----Example End-----",
"The output should be:",
"{\"data\": []}"
]
}
]
},
"output_requirement": {
"common": [
"If possible, output the fund name, share name, and data point values.",
"If a fund name is found and a sub-fund name exists, output the fund name + sub-fund name, e.g. if the fund name is \"Black Rock European\" and the sub-fund name is \"Growth\", the output fund name should be: \"Black Rock European Growth\".",
"Only output data points that have a relevant value.",
"Don't ignore data points with negative values, e.g. -0.12, -1.13.",
"Don't ignore data points with an explicit zero value, e.g. 0, 0.00.",
"Don't extract data whose values are -, *, **, N/A, N/A%, N/A %, or NONE; these mean the value is NULL, so skip them.",
"Also output the reported name of each data point as it appears in the context.",
"Example:",
"---Example Start---",
"\n Investment option \nInvestment option \nmanagement \ncosts1 \n% p.a. \n(A)\nLifeplan \nadministration fee \n(gross)2 \n% p.a. \n(B)\nLifeplan \nadministration fee \n(net) \n% p.a. \n(C)\nTotal Management \nfees and costs \n(gross) \n% p.a. \n(A + B)\nTotal Management \nfees and costs \n(net) \n% p.a. \n(A + C)\nAllan Gray Australian Equity Fund \u2013 Class A\n0.77\n0.60\n0.42\n1.37\n1.19\nAlphinity Sustainable Share Fund\n0.95\n0.60\n0.42\n1.55\n1.37\nAntipodes Global Fund\n1.20\n0.60\n0.42\n1.80\n1.62\n",
"---Example End---",
"Output:",
"{\"data\": [{\"fund name\": \"Allan Gray Australian Equity Fund\", \"share name\": \"Class A\", \"management_fee_and_costs\": 1.19, \"management_fee\": 0.77, \"administration_fees\": 0.42}, {\"fund name\": \"Alphinity Sustainable Share Fund\", \"share name\": \"Alphinity Sustainable Share Fund\", \"management_fee_and_costs\": 1.37, \"management_fee\": 0.95, \"administration_fees\": 0.42}, {\"fund name\": \"Antipodes Global Fund\", \"share name\": \"Antipodes Global Fund\", \"management_fee_and_costs\": 1.62, \"management_fee\": 1.20, \"administration_fees\": 0.42}]}",
"Fund level data: (\"fund name\" and \"datapoint_name\") and share level data: (\"fund name\", \"share name\", \"datapoint_name\") should be output separately.",
"The output should be in JSON format, as in the example(s) below:"
],
"fund_level": [
"[{\"fund name\": \"fund 1 - sub fund name 1\",\"benchmark_name\": \"S&P 500 Index Fund\"}, {\"fund name\": \"fund 2 - sub fund name 2\",\"benchmark_name\": \"FTSE All Share\"}]"
],
"share_level": {
"fund_name": [
"fund 1",
"fund 2",
"fund 3"
],
"share_name": [
"share 1",
"share 2",
"share 3"
],
"total_annual_dollar_based_charges_value": [125.00, 95.00, 26.00],
"management_fee_and_costs_value": [2.63, 1.58, 2.55],
"management_fee_value": [0.85, 1.10, 0.23],
"performance_fee_value": [0.03, 0.21, 0.08],
"performance_fee_costs_value": [0.05, 0.25, 0.09],
"buy_spread_value": [0.10, 0.15, 0.12],
"sell_spread_value": [0.10, 0.10, 0.15],
"establishment_fee_value": [0.75, 1.20, 0.25],
"contribution_fee_value": [0.82, 0.10, 0.13],
"withdrawal_fee_value": [0.75, 0.15, 1.23],
"switching_fee_value": [0.25, 0.35, 1.53],
"activity_fee_value": [0.63, 0.25, 1.27],
"exit_fee_value": [0.35, 0.18, 1.21],
"administration_fees_value": [0.10, 0.15, 0.08],
"interposed_vehicle_performance_fee_cost_value": [0.02, 0.15, 0.03],
"additional_hurdle_value": [0.02, 0.15, 0.03],
"reference_rate_value": [0.02, 0.15, 0.03],
"crystallisation_frequency_value": ["Monthly", "Bi-Annually", "Annually"],
"date_of_last_hwm_reset_value": ["29 March 2023", "18 April 2024", "19 October 2021"],
"date_of_last_performance_fee_restructure_value": ["12 August 2022", "15 March 2024", "11 November 2023"],
"high_water_mark_type_value": ["Total Return", "Excess Return", "Both TR & ER"],
"minimum_initial_investment_value": [0, 5000, 10000],
"recoverable_expenses_value": [0.12, 0.05, 0.06],
"indirect_costs_value": [0.12, 0.16, 0.02]
},
"dp_reported_name" : {
"total_annual_dollar_based_charges": "Total annual dollar based charges",
"management_fee_and_costs": "Management fee and costs",
"management_fee": "Management fee",
"performance_fee": "Performance fee",
"performance_fee_costs": "Performance fee costs",
"buy_spread": "Buy spread",
"sell_spread": "Sell spread",
"establishment_fee": "Establishment fee",
"contribution_fee": "Contribution fee",
"withdrawal_fee": "Withdrawal fee",
"switching_fee": "Switching fee",
"activity_fee": "Activity fee",
"exit_fee": "Exit fee",
"administration_fees": "Administration fee",
"interposed_vehicle_performance_fee_cost": "Interposed vehicle performance fee cost",
"additional_hurdle": "Additional hurdle",
"benchmark_name": "Benchmark name",
"reference_rate": "Reference rate",
"crystallisation_frequency": "Crystallisation frequency",
"date_of_last_hwm_reset": "Date of last hwm reset",
"date_of_last_performance_fee_restructure": "Date of last performance fee restructure",
"high_water_mark_type": "High-water mark type",
"minimum_initial_investment": "Minimum initial investment",
"recoverable_expenses": "Recoverable expenses",
"indirect_costs": "Indirect cost"
}
},
"end": [
"Only output JSON data.",
"Don't output values that do not exist in the context.",
"If no fund name or share class name can be found in the context, output empty JSON data: {\"data\": []}"
]
}
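The `output_requirement`, `data_value_range` and number-range rules in the configuration above describe how raw cell values should be filtered. A minimal Python sketch of that filtering as a post-processing check on extracted strings follows; the function `parse_percentage`, the regular expressions, and the choice to treat em and en dashes as NULL are assumptions for illustration, not part of this repository.

```python
import re
from typing import Optional

# Values that output_requirement says mean NULL and must be skipped.
NULL_MARKERS = {"-", "*", "**", "N/A", "N/A%", "N/A %", "NONE"}
# Assumption: em dash and en dash characters from PDF text are also treated as NULL.
DASH_MARKERS = {"\u2014", "\u2013"}

# Phrases that mark a number range statement ("up to", "from ... to", "between ... and").
RANGE_PATTERN = re.compile(
    r"\bup to\b|\bfrom\b.*\bto\b|\bbetween\b.*\band\b",
    re.IGNORECASE,
)
NUMBER_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")


def parse_percentage(raw: str) -> Optional[float]:
    """Return the percentage in `raw`, or None if the value should be skipped."""
    text = raw.strip()
    if text.upper() in NULL_MARKERS or text in DASH_MARKERS:
        return None
    if RANGE_PATTERN.search(text):
        return None  # range statements are ignored, not reduced to one bound
    match = NUMBER_PATTERN.search(text)
    if match is None:
        return None
    value = float(match.group())
    # data_value_range: percentage data points should be less than 100.
    # Negative values and explicit zeros are kept, as output_requirement asks.
    return value if value < 100 else None
```

For example, `parse_percentage("1.35% p.a.")` yields `1.35`, while `parse_percentage("Up to 1.20%")` yields `None`. Dollar-based data points such as `total_annual_dollar_based_charges` may exceed 100, so this check would only apply to the percentage data points.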
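The priority rules in `special_rule.management_fee_and_costs` (A: take the net total over the gross total; B: add "Management fees" and "Indirect fee"; C: copy a single "Management fees and costs" figure to both keys) can be expressed as a small resolver. This is a sketch assuming the reported column names have already been normalised to lowercase strings such as `"total management fees and costs (net)"`. Those keys and the name `resolve_management_fees` are hypothetical.

```python
from typing import Dict


def resolve_management_fees(columns: Dict[str, float]) -> Dict[str, float]:
    """Apply rules A, B and C for management_fee_and_costs to one row.

    `columns` maps a lowercase reported column name to its numeric value
    (hypothetical normalised keys; adapt them to the real reported names).
    """
    out: Dict[str, float] = {}
    net_total = columns.get("total management fees and costs (net)")
    management = columns.get("management fees")
    indirect = columns.get("indirect fee")
    single = columns.get("management fees and costs")

    if net_total is not None:
        # Rule A: prefer the (net) total over the (gross) total.
        out["management_fee_and_costs"] = net_total
    elif management is not None and indirect is not None:
        # Rule B: "Management fees" + "Indirect fee"; rounding avoids float noise.
        out["management_fee_and_costs"] = round(management + indirect, 2)
        out["management_fee"] = management
        out["indirect_costs"] = indirect
    elif single is not None:
        # Rule C: only "Management fees and costs" is reported, so copy it to both keys.
        out["management_fee_and_costs"] = single
        out["management_fee"] = single
    return out
```

With the MLC Horizon 2 row (`1.35` and `0.07`), rule B gives `1.42`, which matches the expected output in the configuration. Rounding to two decimals is an assumption chosen because the source figures have two decimal places.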
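Case 1 under `special_cases` says to read only the column with the latest date, and to skip the value if that cell is N/A or a dash. The following sketch implements that selection under two assumptions: each table row is available as a mapping from date header to cell text, and headers use the `DD.MM.YY` layout seen in the example. The helper `latest_column_value` is hypothetical.

```python
from datetime import datetime
from typing import Dict, Optional

# Null markers from the prompt, plus the em dash seen in the example table (assumption).
NULL_MARKERS = {"-", "\u2014", "N/A", "NONE"}


def latest_column_value(row: Dict[str, str], date_format: str = "%d.%m.%y") -> Optional[str]:
    """Return the cell under the latest date header, or None if it is null.

    Older date columns are never used as a fallback.
    """
    if not row:
        return None
    # Compare headers as real dates rather than strings.
    latest_header = max(row, key=lambda header: datetime.strptime(header, date_format))
    value = row[latest_header].strip()
    return None if value.upper() in NULL_MARKERS else value
```

For the "Operating charges" row in the example (`31.10.22` is N/A, older columns hold values), the function returns `None`. This matches the expected `{"data": []}` because the older 0.90% figures are not used as a substitute.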